Wednesday, 12 October 2016

The Natural Language Processing Dictionary

A
An abstract noun is a noun that does not describe a physical object, for example philosophy. Contrast concrete noun.
An accepter is a program (or algorithm) that takes as input a grammar and a string of terminal symbols from the alphabet of that grammar, and outputs yes (or something equivalent) if the string is a sentence of the grammar, and no otherwise. Contrast parser.
An active arc is a structure used by a chart parser as it attempts to parse a sentence. It is derived ultimately from a rule of the grammar being used, and consists of:
·        name for the arc,
·        type - the phrasal category being sought,
·        list of found constituents, i.e. the constituents required by the grammar rule that have already been found,
·        list of types of the constituents not yet found,
·        from position, indicating the position in the sentence of the start of the first found constituent, and
·        to position, indicating the position in the sentence of the end of the last found constituent.
The symbol → is used to separate the type from the list of found constituents, and a dot is used to separate the list of found constituents from the list of types of constituents not yet found.
active chart = chart (in a chart parser)
Sentences in English may be in active or passive form. The active form makes the one who is performing the action in the sentence (termed the agent in semantics) the grammatical subject. This serves to focus attention on the agent. Example: "John ate the pizza". The alternative, the passive voice, makes the thing acted on into the grammatical subject, thus focussing attention on that thing, rather than on the agent. Example: "The pizza was eaten by John." Many writers appear to believe that use of the passive seems more formal and dignified, and consequently it is over-used in technical writing. For example, they might write "The following experiments were performed" when it would be clearer to say "We [i.e. the authors] performed the following experiments."
Contrast mood, tense, and aspect.
ADJ: symbol used in grammar rules for an adjective.
An adjective is a word that modifies a noun by specifying an attribute of the noun. Examples include adjectives of colour, like red, size or shape, like round or large, along with thousands of less classifiable adjectives like willing, onerous, etc. In grammar rules, we use the symbol ADJ for the pre-terminal category of adjectives.
Adjectives are also used as the complements of sentences with verbs like "be" and "seem" - "He is happy", "He seems drunk".
ADJ is a lexical grammatical category.
Adjective Phrase (or adjectival phrase) is a phrasal grammatical category, usually abbreviated to ADJP. Adjective phrases range from simple adjectives (like "green" in "the green grass") through short lists of adjectives, possibly modified by an adverb or two (like "really large, cream" in "that really large, cream building"), to fairly complicated constructs like "angry that he had been ignored" in "Jack was angry that he had been ignored". The longer adjective phrases frequently take the form of an adjective followed by a complement, which might be a "that"+Sentence complement (as in "angry that he had been ignored"), a PP complement, or a "to"+VP complement.
The longer ADJPs are most often found as complements of verbs such as "be" and "seem".
ADJP: symbol used in grammar rules for an adjective phrase.
ADV: symbol used in grammar rules for an adverb.
An adverb is a word that modifies a verb, ("strongly", in "she swam strongly") an adjective, ("very", in "a very strong swimmer") or another adverb ("very", in "she swam very strongly").
Many adverbs end with the morpheme -ly, which converts an adjective X into an adverb meaning something like "in an X manner" - thus "bravely" = "in a brave manner". Other adverbs include intensifiers like "very" and "extremely". There are also adverbs of time (like "today", "tomorrow", "then" - as in "I gave him the book then"), frequency ("never", "often"), and place ("here", "there", and "everywhere").
ADV is a lexical grammatical category.
Adverbial phrases are phrases that perform one of the functions of an adverb. They include simple phrases that express some of the same types of concepts that a single adverb might express, such as frequency - "every week", duration - "for three weeks", time - "at lunchtime", and manner - "this way" ("Do it this way"), or "by holding his head under water for one minute".
Adverbial Phrase is a phrasal grammatical category. Adverbial phrase is usually abbreviated to ADVP.
ADVP: symbol used in grammar rules for an adverbial phrase.
AGENT is a case used in logical forms. It signifies the entity that is acting in an event. It normally corresponds to the syntactic subject of an active voice declarative sentence. In the logical form for a state description, the term EXPERIENCER is used for the corresponding entity. AGENTs appear in the frame-like structures used to describe logical forms: e.g. the following, representing "John breaks it with the hammer":
break1(e1,
       agent[name(j1, 'John')],
       theme[pro(i1, it1)],
       instr[the1(h1, hammer1)])
Agreement is the phenomenon in many languages in which words must take certain inflections depending on the company they keep. A simple case occurs with verbs in the third person singular form and their singular subjects: "Jane likes cheese" is correct, but * "Jane like cheese" and * "My dogs likes cheese" are not, because the subjects and verbs do not agree on the number feature. The name used in the lecture notes for the agreement feature is agr. The possible values of the agr feature are 1s, 2s, 3s, 1p, 2p, 3p, signifying 1st person singular, 2nd person singular, ..., 3rd person plural. Pronouns like "I" and "me" have agr=1s, "you" has agr={2s,2p} as it is not possible to distinguish singular from plural in this case, and so on. Definite noun phrases like "the green ball" have agr=3s.
Allen refers to the book by James Allen, Natural Language Processing, second edition, Benjamin Cummings, 1995.
The "alphabet" of a grammar is the set of symbols that it uses, including the terminal symbols (which are like words) and the non-terminal symbols which include the grammatical categories like N (noun), V (verb), NP (noun phrase), S ( sentence), etc.
See also context-free grammar, and context-sensitive grammar.
An ambiguity is a situation where more than one meaning is possible in a sentence. We consider three types of ambiguity:
·        word-sense ambiguity
·        structural ambiguity
·        referential ambiguity
There can be situations where more than one of these is present.
An anaphor is an expression that refers back to a previous expression in a natural language discourse. For example: "Mary died. She was very old." The word she refers to Mary, and is described as an anaphoric reference to Mary. Mary is described as the antecedent of she. Anaphoric references are frequently pronouns, as in the example, but may also be definite noun phrases, as in: "Ronald Reagan frowned. The President was clearly worried by this issue." Here The President is an anaphoric reference to Ronald Reagan. The antecedent may in some cases not be explicitly mentioned in a previous sentence - as in "John got out his pencil. He found that the lead was broken." The lead here refers to a subpart of his pencil. The antecedent need not be in the immediately preceding sentence; it could be further back, or in the same sentence, as in "John got out his pencil, but found that the lead was broken." In all our examples so far the anaphor and the antecedent are noun phrases, but VP and sentence anaphora is also possible, as in "I have today dismissed the prime minister. It was my duty in the circumstances." Here It is an anaphoric reference to the VP dismissed the prime minister.
For a fairly complete and quite entertaining treatment of anaphora, see Hirst, G. Anaphora in Natural Language Understanding: A Survey Springer Lecture Notes in Computer Science 119, Berlin: Springer, 1981.
animate is a feature of some noun phrases. It indicates that the thing described by the noun phrase is alive, and so capable of acting, i.e. being the agent of some act. This feature could be used to distinguish between The hammer broke the window and The boy broke the window - in the former, the hammer is not animate, so cannot be the agent of the break action (it is in fact the instrument), while the boy is animate, so can be the agent.
antecedent: see anaphor.
Apposition is a grammatical relation between a word and a noun phrase that follows it. It frequently expresses equality or a set membership relationship. For example, "Rudolph the red-nosed reindeer [had a very shiny nose]" - here Rudolph = the unique red-nosed reindeer. Another example, "Freewheelin' Franklin, an underground comic-strip character, [was into drugs and rock music]", expresses a set membership relation: Freewheelin' Franklin is a member of the set of underground comic-strip characters.
Words like "the", "a", and "an" in English. They are a kind of determiner. See also the quantifying logical operator THE.
The phrase "I am reading" is in the progressive aspect, signifying that the action is still in progress. Contrast this with "I read" which does not likely refer to an action that is currently in progress. Aspect goes further than this, but we shall not pursue the details of aspect in this subject. If interested, you could try Huddleston, R., "Introduction to the Grammar of English" Cambridge, 1984, pp. 157-158 and elsewhere.
ATN: see augmented transition network.
An augmented grammar is what you get if you take grammar rules (usually from a context-free grammar) and add extra information to them, usually in the form of feature information. For example, the grammar rule s → np vp can be augmented by adding feature information to indicate that say the agr feature for the vp and the np must agree:
s(agr(?agr)) → np(agr(?agr)) vp(agr(?agr))
In Prolog, we would write something like:
s(P1, P3, Agr) :- np(P1, P2, Agr), vp(P2, P3, Agr).
Actually, this is too strict - the agr feature of a VP, in particular, is usually ambiguous - for example the verb "love" (and so any VP of which it is the main verb) has agr=[1s,2s,1p,2p,3p], and we would want it to agree with the NP "we", which has agr=[1p]. This can be achieved by computing the intersection of the agr of the NP and the VP and setting the agr of the S to be this intersection, provided it is non-empty. If it is empty, then the S goal should not succeed.
s(P1, P3, SAgr) :-
    np(P1, P2, NPAgr),
    vp(P2, P3, VPAgr),
    intersection(NPAgr, VPAgr, SAgr),
    nonempty(SAgr).
where intersection computes the intersection of two lists (regarded as sets) and binds the third argument to this intersection, and nonempty succeeds if its argument is not the empty list.
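For completeness, here is one way the two helper predicates might be written. These are sketch definitions only (many Prolog systems already provide a library intersection/3):
% intersection(+Xs, +Ys, -Zs): Zs is the list of elements of Xs
% that also occur in Ys (lists regarded as sets).
intersection([], _, []).
intersection([X|Xs], Ys, [X|Zs]) :-
    member(X, Ys), !,
    intersection(Xs, Ys, Zs).
intersection([_|Xs], Ys, Zs) :-
    intersection(Xs, Ys, Zs).

% nonempty(+List): succeeds if List is not the empty list.
nonempty([_|_]).

% ?- intersection(['1s','2s','1p','2p','3p'], ['1p'], Agr), nonempty(Agr).
% Agr = ['1p'].
(Note that atoms like 1s must be quoted in Prolog, since they begin with a digit.)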
Augmented grammar rules are also used to record sem and var features in computing logical forms, and to express the relationship between the sem and var of the left-hand side and the sem(s) and var(s) of the right-hand side. For example, for the rule vp → v (i.e. an intransitive verb), the augmented rule with sem feature could be:
vp(sem(lambda(X, ?semv(?varv, X))), var(?varv)) →
    v(subcat(none), sem(?semv), var(?varv))
where subcat none indicates that this only works with an intransitive verb.
An augmented transition network (ATN) is a parsing formalism for augmented context-free grammars. Not covered in the current version of COMP9414, but described in Allen.
AUX: symbol used in grammar rules for an auxiliary verb.
A "helper" verb, not the main verb. For example, in "He would have read the book", "would" and "have" are auxiliaries. A reasonably complete list of auxiliary verbs in English is:
Auxiliary                                  Example
do/does/did                                I did read
have/has/had/having                        He has read
be/am/are/is/was/were/been/being           He is reading
shall/will/should/would                    He should read
can, could                                 She can read
may, might, must                           She might read
Complex groupings of auxiliaries can occur, as in "The child may have been being taken to the movies".
Some auxiliaries (do, be, and have) can also occur as verbs in their own right.
Auxiliary verb is often abbreviated to AUX.
AUX is a lexical grammatical category.
B
Bayes' rule relates the conditional probability Pr(A | B) to Pr(B | A) for two events A and B. The rule states that
Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B)
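As a minimal sketch, the rule translates directly into Prolog arithmetic. The predicate name bayes/4 is invented for this illustration; the numbers in the sample query anticipate the horse-racing example under conditional probability, computing Pr(Rain | Win) from Pr(Win | Rain) = 0.5, Pr(Rain) = 0.3 and Pr(Win) = 0.2:
% bayes(+PrBgivenA, +PrA, +PrB, -PrAgivenB):
% computes Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B).
bayes(PrBgivenA, PrA, PrB, PrAgivenB) :-
    PrAgivenB is PrBgivenA * PrA / PrB.

% ?- bayes(0.5, 0.3, 0.2, P).
% P = 0.75.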
BELIEVE is a modal operator in the language for representing logical forms. BELIEVE and other operators like it have some unexpected properties such as failure of substitutivity. For more details, read page 237 in Allen. Page 542 ff. provides yet more on belief in NLP (but this material is well beyond the needs of COMP9414).
A bigram is a pair of things, but usually a pair of lexical categories. Suppose that we are concerned with two lexical categories L1 and L2. The term bigram is used in statistical NLP in connection with the conditional probability that a word will belong to L2 given that the preceding word was in L1. This probability is written Pr(L2 | L1), or more fully Prob(w[i] in L2 | w[i-1] in L1). For example, in the phrase "The flies", given that The is tagged with ART, we would be concerned with the conditional probabilities Pr(N | ART) and Pr(V | ART) given that flies can be tagged with N and V.
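Such probabilities are usually estimated from counts over a tagged corpus. Here is a minimal sketch with made-up counts; count/2, count2/3 and bigram_prob/3 are hypothetical names for this illustration:
% Hypothetical counts from a tagged corpus:
% count(Tag, N) - Tag occurred N times;
% count2(Tag1, Tag2, N) - Tag1 was immediately followed by Tag2 N times.
count(art, 300).
count2(art, n, 250).
count2(art, adj, 40).
count2(art, v, 10).

% bigram_prob(+L1, +L2, -P): estimate of Pr(L2 | L1).
bigram_prob(L1, L2, P) :-
    count2(L1, L2, N2),
    count(L1, N1),
    P is N2 / N1.

% ?- bigram_prob(art, n, P).
% P = 0.8333...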
bottom-up parser
The chart parser described in lectures is a bottom-up parser, and can parse sentences, using any context-free grammar, in cubic time: i.e., in time proportional to the cube of the number of words in the sentence.
A bound morpheme is a prefix or suffix, which cannot stand as a word in its own right, but which can be attached to a free morpheme to modify its meaning. For example, "happy" is a free morpheme, which becomes "unhappily" when the prefix "un-" and suffix "-ly", both bound morphemes, are attached.
C
Cardinals are number words like one, two, four, twenty, fifty, hundred, million. Contrast ordinal.
The term case is used in two different (though related) senses in NLP and linguistics. Originally it referred to what is now termed syntactic case. Syntactic case essentially depends on the relationship between a noun (or noun phrase) and the verb that governs it. For example, in "Mary ate the pizza", "Mary" is in the nominative or subject case, and "the pizza" is in the accusative or object case. Other languages may have a wider range of cases. English has remnants of a couple more cases - genitive (relating to possession, as with the pronoun "his") and dative (only with ditransitive verbs - the indirect object of the verb is said to be in the dative case).
Notice that in "The pizza was eaten by Mary", "the pizza" becomes the syntactic subject, whereas it was the syntactic object in the equivalent sentence "Mary ate the pizza".
With semantic case, which is the primary sense in which we are concerned with the term case in COMP9414, the focus is on the meaning-relationship between the verb and the noun or noun phrase. Since this does not change between "Mary ate the pizza" and "The pizza was eaten by Mary", we want to use the same semantic case for "the pizza" in both sentences. The term used for the semantic case of "the pizza" is theme. Similarly, the semantic case of "Mary" in both versions of the sentence is agent. Other cases frequently used include instrument, coagent, experiencer, at-loc, from-loc, and to-loc; at-poss, from-poss, and to-poss; at-value, from-value, and to-value; at-time, from-time, and to-time; and beneficiary.
Semantic cases are also referred to as thematic roles.
Opposite of anaphor, and much rarer in actual language use. A cataphor is a phrase that is explained by text that comes after the phrase. Example: "Although he loved fishing, Paul went skating with his girlfriend." Here he is a cataphoric reference to Paul.
CFG: see context-free grammar.
A chart is a data structure used in parsing. It consists of a collection of active arcs (sometimes also called edges), together with a collection of constituents (sometimes also called inactive arcs or inactive edges).
See also chart parsing.
chart parsing
A chart parser is a variety of parsing algorithm that maintains a table of well-formed substrings found so far in the sentence being parsed. While the chart techniques can be incorporated into a range of parsing algorithms, they were studied in lectures in the context of a particular bottom-up parsing algorithm.
That algorithm will now be summarized:
to parse a sentence S using a grammar G and lexicon L:
1.    Initially there are no constituents or active arcs
2.    Scan the next word w of the sentence, which lies between positions i and i+1 in the sentence.
3.    Look up the word w in the lexicon L. For each lexical category C to which w belongs, create a new constituent of type C, from i to i+1.
4.    Look up the grammar G. For each category C found in the step just performed, and each grammar rule R whose right-hand side begins with C, create a new active arc whose rule is R, with the dot in the rule immediately after the first category on the right-hand side, and from i to i+1.
5.    If any of the active arcs can have their dots advanced (this is only possible if the arc was created in a previous cycle of this algorithm) then advance them.
6.    If any active arcs are now completed (that is, the dot is now after the last category on the right-hand side of the active arc's rule), then convert that active-arc to a constituent (or inactive arc), and go to step 4.
7.    If there are any more words in the sentence, go to step 2.
to check if an active arc can have its dot advanced
1.    Let the active arc be ARCx: C → C[1] ... C[j] . C[j+1] ... C[n] from m to n.
2.    If there is a constituent of type C[j+1] from n to p, then the dot can be advanced.
The resulting new active arc will be:
ARCy: C → C[1] ... C[j+1] . C[j+2] ... C[n] from m to p
where y is a natural number that has not yet been used in an arc-name.
Example: For the active arc ARC2: NP → ART1 . ADJ N from 2 to 3 if there is a constituent ADJ2: ADJ → "green" from 3 to 4 (so that the to position, 3, and the type, ADJ, of the constituent of the active arc immediately after the dot, match the from position, 3, and the type, ADJ, of the constituent ADJ2) then the active arc ARC2 can be extended, i.e. have its dot advanced, creating a new active arc, say ARC3: NP → ART1 ADJ2 . N from 2 to 4.
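The extension step can be sketched in Prolog. The term representations used here - arc(Type, Found, ToFind, From, To), with arc names omitted for simplicity, and constituent(Name, Type, From, To) - are assumed for this illustration, not the lecture notation:
% extend(+Arc, +Constituent, -NewArc): advance the dot over the
% constituent when its type matches the next category sought and
% its start position matches the arc's end position.
extend(arc(Type, Found, [Next|Rest], From, To),
       constituent(CName, Next, To, CTo),
       arc(Type, NewFound, Rest, From, CTo)) :-
    append(Found, [CName], NewFound).

% Mirroring the ARC2/ADJ2 example above:
% ?- extend(arc(np, [art1], [adj, n], 2, 3),
%           constituent(adj2, adj, 3, 4), NewArc).
% NewArc = arc(np, [art1, adj2], [n], 2, 4).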
The Chomsky hierarchy is an ordering of types of grammar according to generality. The classification in fact only depends on the type of grammar rule or production used. The grammar types described in COMP9414 included:
·        unrestricted grammars (rules of the form a → b with no restrictions on the strings a and b)
·        context sensitive grammars (rules of the form a → b with the restriction length(a) <= length(b))
·        context free grammars (rules of the form X → b where X is a single non-terminal symbol)
·        regular grammars (rules of the form X → a and X → aN where X and N are non-terminal symbols, and a is a terminal symbol.)
Named after the linguist Noam Chomsky.
CNP: symbol used in grammar rules for a common noun phrase.
You really need to know what an agent is before proceeding. A co-agent is someone who acts with the agent in a sentence. In a sentence with a prepositional phrase introduced by the preposition with, if the object of the preposition is animate, that object is likely to be a coagent. "Jane ate the pizza with her mother" - her mother is the coagent.
co-refer
Two items (an anaphor and its antecedent) that describe the same thing are said to co-refer.
common noun
A common noun is a noun that describes a type, for example woman or philosophy, rather than an individual, such as Amelia Earhart. Contrast proper noun.
A common noun phrase is a phrasal grammatical category of chiefly technical significance. Examples include "man", "big man", and "man with the pizza", but not these same phrases with "the" or "a" in front - that is, "the man with the pizza", etc., are NPs, not CNPs. The need for the category CNP as a separate named object arises from the way articles like "the" act on a CNP. The word "the", regarded as a natural language quantifier, acts on the whole of the CNP that it precedes: it's "the [man with the pizza]", not "the [man] with the pizza". For this reason, it makes sense to make phrases like "man with the pizza" into syntactic objects in their own right, so that the semantic interpretation phase does not need to reorganize the structural description of the sentence in order to be able to interpret it.
A complement is a grammatical structure required in a sentence, typically to complete the meaning of a verb or adjective. For example, the verb "believe" can take a sentential complement, that is, be followed by a sentence, as in "I believe you are standing on my foot."
There is a wide variety of complement structures. Some are illustrated in the entry for subcategorization.
An example of an adjective with a complement is "thirsty for blood", as in "The football crowd was thirsty for blood after the home team was defeated." This is a PP-complement. Another would be "keen to get out of the stadium", a TO-INF complement, as in "The away-team supporters were keen to get out of the stadium."
Compositional semantics signifies a system of constructing logical forms for sentences or parts of sentences in such a way that the meanings of the components of the sentence (or phrase) are used to construct the meanings of the whole sentence (or whole phrase). For example, in "three brown dogs", the meaning of the phrase is constructed in an obvious way from the meanings of three, brown, and dogs. By way of contrast, a phrase like "kick the bucket" (when read as meaning "die") does not have compositional semantics, as the meaning of the whole ("die") is unrelated to the meanings of the component words.
The semantic system described in COMP9414 assumes compositional semantics.
A concrete noun is a noun that describes a physical object, for example apple. Contrast abstract noun.
The conditional probability of event B given event A is the probability that B will occur given that we know that A has occurred. The example used in lecture notes was that of a horse Harry that won 20 races out of 100 starts, but of the 30 of these races that were run in the rain, Harry won 15. So while the probability that Harry would win a race (in general) would be estimated as 20/100, the conditional probability Pr(Win | Rain) would be estimated as 15/30 = 0.5. The formal definition of Pr(B | A) is Pr(B & A) / Pr(A). In the case of B = Win and A = Rain, Pr(B & A) is the probability that it will be raining and Harry will win (which on the data given above is 15/100), while Pr(A) is the probability that it will be raining, or 30/100. So again Pr(B | A) = 0.15/0.30 = 0.5
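The formal definition translates directly into a one-line Prolog sketch (cond_prob/3 is an invented name; the query uses the figures from Harry's races):
% cond_prob(+PrAandB, +PrA, -PrBgivenA):
% the formal definition Pr(B | A) = Pr(B & A) / Pr(A).
cond_prob(PrAB, PrA, P) :-
    P is PrAB / PrA.

% ?- cond_prob(0.15, 0.30, P).
% P = 0.5.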
CONJ: symbol used in grammar rules for a conjunction.
A conjunction is a word used to join two sentences together to make a larger sentence. Conjunctions include coordinate conjunctions, like "and", "or" and "but": "Jim is happy and Mary is proud", "India will win the test match or I'm a monkey's uncle".
There are also subordinate conjunctions, like "if" and "when", as in "I will play with you if you will lend me your marbles" and "I will lend you this book when you return the last one you borrowed".
Conjunctions may also be used to join nouns, adjectives, adverbs, verbs, phrases ...
Examples:
·        nouns: Boys and girls [come out to play].
·        adjectives: [The team colours are] black and yellow.
·        adverbs: [He was] well and truly [beaten].
·        verbs: [Mary] played and won [her match].
·        phrases: across the river and into the trees; [She] fell down and hit her head.
Conjunction is often abbreviated to CONJ.
CONJ is a lexical grammatical category.
A constituent, in parsing, is a lexical or phrasal category that has been found in a sentence being parsed, or alternatively one that is being sought for but has not yet been found.
See active arc. When an active arc is completed (when all its sub-constituents are found), the active arc becomes a constituent.
Constituents are used to create new active arcs - when there is a constituent X1 of type X, and a grammar rule whose right hand side starts with the grammar symbol X, then a new active arc of type X may be created, with the constituent X1 listed as a found constituent for the active arc (the only one, so far).
The components of a constituent, as recorded in the chart parsing algorithm described in lectures, are as follows, using the example NP1: NP → ART1 ADJ1 N1 from 0 to 3:
·        name - NP1 (usually formed from the type + a number)
·        type - NP (a phrasal or lexical category of the grammar)
·        decomposition - ART1 ADJ1 N1 (ART1, ADJ1 and N1 would be the names of other constituents already found)
·        from - 0 (sentence position of the left end of this NP)
·        to - 3 (sentence position of the right end of this NP)
context-free
See context-free grammar and Chomsky hierarchy and contrast with context-sensitive grammar.
A context-free grammar is defined to be a 5-tuple (P, A, N, T, S) with components as follows:
·        P - a set of grammar rules or productions, that is, items of the form X → a, where X is a member of the set N, that is, a non-terminal symbol, and a is a string over the alphabet A. An example would be the rule NP → ART ADJ N, which signifies that a Noun Phrase can be an ARTicle followed by an ADJective followed by a Noun, or N → horse, which signifies that horse is a Noun. NP, ART, ADJ, and N are all non-terminal symbols, and horse is a terminal symbol.
·        A - the alphabet of the grammar, equal to the disjoint union of N and T
·        N - the set of non-terminal symbols (i.e. grammatical or phrasal categories)
·        T - the set of terminal symbols (i.e. words of the language that the grammar defines)
·        S - the distinguished non-terminal, normally interpreted as representing a full sentence (or program, in the case of a programming language grammar)
context-sensitive
See context-sensitive grammar and Chomsky hierarchy and contrast with context-free grammar.
A context-sensitive grammar is a grammar with context-sensitive rules. There are two equivalent formulations of the definition of a context-sensitive grammar rule (cf. Chomsky hierarchy):
·        rules of the form a → b where a and b are strings of alphabet symbols, with the restriction that length(a) <= length(b)
·        rules of the form l X r → l b r where l, r, and b are (possibly empty) strings of alphabet symbols, and X is a non-terminal. l and r are referred to as the left and right context for X → b in the context-sensitive rule.
Context-sensitive grammars are more powerful than context-free grammars, but they are much harder to work with.
A corpus is a large body of natural language text used for accumulating statistics on natural language text. The plural is corpora. Corpora often include extra information such as a tag for each word indicating its part-of-speech, and perhaps the parse tree for each sentence.
See also statistical NLP.
A count noun is a noun of a type that can be counted. Thus horse is a count noun, but water is not. Contrast mass noun.
CSG: see context-sensitive grammar.
D
DE: see discourse entity.
DE list
See history list.
declarative: see indicative.
A demonstrative is a kind of determiner, that is, an ingredient of noun phrases. This class of words includes "this", "that", "these", and "those". They are part of the reference system of English. That is, they are used to tell which of a number of possibilities for the interpretation of the rest of the noun phrase is in fact intended. Demonstratives are most useful in spoken language, and are often accompanied by a pointing gesture.
A derivation of a sentence of a grammar is, in effect, a proof that the sentence can be derived from the start symbol of the grammar using the grammar rules and a rewriting process. For example, given the grammar 1. S → NP VP, 2. NP → ART N, 3. VP → V, and lexical rules 4. ART → "the", 5. N → "cat", and 6. V → "miaowed", we can derive the sentence "the cat miaowed" as follows:
S
⇒ NP VP (rule 1)
⇒ ART N VP (rule 2)
⇒ the N VP (rule 4)
⇒ the cat VP (rule 5)
⇒ the cat V (rule 3)
⇒ the cat miaowed (rule 6)
One can then write S ⇒* "the cat miaowed": i.e. ⇒* is the symbol for the derivation relation. The symbol ⇒ is referred to as direct derivation. A sentential form is any string that can be derived (in the sense defined above) from the start symbol S.
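In Prolog, this grammar can be written directly as a definite clause grammar (DCG), and the built-in phrase/2 then acts as an accepter for it. This encoding is standard Prolog, not the mechanism used in lectures:
s  --> np, vp.        % rule 1
np --> art, n.        % rule 2
vp --> v.             % rule 3
art --> [the].        % rule 4 (lexical)
n  --> [cat].         % rule 5 (lexical)
v  --> [miaowed].     % rule 6 (lexical)

% ?- phrase(s, [the, cat, miaowed]).
% true.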
Descriptive grammar is the sense in which the term grammar is primarily used in Natural Language Processing. A grammar is a formalism for describing the syntax of a language. Contrast prescriptive grammar.
Determiners are one of the ingredients of noun phrases. Along with cardinals and ordinals, they make up the set of specifiers, which assist in reference - that is, determining exactly which of several possible alternative objects in the world is referred to by a noun phrase. They come in several varieties - articles, demonstratives, possessives, and quantifying determiners.
A discourse entity (DE) is something mentioned in a sentence that could act as a possible antecedent for an anaphoric reference, e.g. noun phrases, verb phrases and sentences. For example, with the sentence "Jack lost his wallet in his car", the DEs would include representations of "Jack", "his wallet", "his car", "lost his wallet in his car" and the whole sentence. The whole sentence could serve as the antecedent for "it" in a follow-up sentence like "He couldn't understand it" (while "Jack" would be the antecedent of "He").
Sometimes discourse entities have a more complex relation to the text. For example, in "Three boys each bought a pizza", clearly "Three boys" gives rise to a DE that is a set of three objects of type boy (B1: |B1| = 3 and B1 subset_of {x | Boy(x)}), but "a pizza", in this context, gives rise to a representation of a set P1 of three pizzas (whereas in the usual case "a pizza" would give rise to a DE representing a single pizza):
P1 = {p | pizza(p) and exists(b) : Boy(b) and p = pizza_bought_by(b)}.
The function "pizza_bought_by" is the Skolem function referred to in lectures as "sk4".
See history list.
See context-free grammar.
A ditransitive verb is a verb in English that can take two objects, like give, as in "He gave his mother a bunch of flowers". Here "his mother" is the indirect object and "a bunch of flowers" is the direct object. The same sentence can also be expressed as "He gave a bunch of flowers to his mother", with the direct and indirect objects in the opposite order, and the indirect object marked by the preposition "to". The preposition in such cases is usually "to", or "for" (as in "He bought his mother a bunch of flowers" = "He bought a bunch of flowers for his mother").
Ditransitive verbs can appear with just one or even no syntactic objects ("I gave two dollars", "I gave at the office") - their distinguishing characteristic is that they can have two objects, unlike intransitive and transitive verbs.
E
Ellipsis refers to situations in which sentences are abbreviated by leaving out parts of them that are to be understood from the context. For example, if someone asks "What is your name?" and the reply is "John Smith" then this can be viewed as an elliptical form of the full sentence "My name is John Smith".
Ellipsis causes problems for NLP since it is necessary to infer the rest of the sentence from the context.
"ellipsis" is also the name of the symbol "..." used when something is omitted from a piece of text, as in "Parts of speech include nouns, verbs, adjectives, adverbs, determiners, ... - the list goes on and on."
"elliptical" is the adjectival form of "ellipsis".
An embedded sentence is a sentence that is contained inside another sentence. Some examples, with the embedded sentence quoted after each:
·        John believes that Mary likes pizza (embedded: "Mary likes pizza")
·        If Mary likes pizza then she may come to our pizza party. (embedded: "Mary likes pizza")
·        If Joan liked pizza, then she would come to our pizza party. (embedded: "Joan liked pizza")
A noun phrase is said to evoke a discourse entity if the noun phrase refers to something related to a previously mentioned discourse entity (but not to an already-mentioned DE). For example, in "Jack lost his wallet in his car. Later he found it under the front seat.", the phrase "the front seat" evokes a discourse entity that has not actually been mentioned, but which is in a sense already present as part of the DE created by the phrase "his car".
See also anaphor.
"exists" is a textual way of writing the existential quantifier, which is otherwise written as an back-to-front capital E. It corresponds fairly closely to the English word "some". Thus,
exists(X, likes(X, spinach))
would be read as "for some entity X, X likes spinach" or just "something likes spinach". This might be too broad a statement, as it could be satisfied, for example, by a snail X that liked spinach. It is common therefore to restrict the proposition to something like:
exists(X, is_person(X) and likes(X, spinach))
i.e. "Some person likes icecream." That is, we are restricting the type of X to persons. In some cases, it is more reasonable to abbreviate the type restriction as follows:
exists(X : person, likes(X, spinach))
See also forall and Skolem functions.
Experiencer is a case that usually fills a similar syntactic role to the agent but where the entity involved cannot be said to act. It is thus associated with the use of particular verbs like "remember", as in "Jim remembered his homework when he got to school". Here "Jim" is the experiencer of the "remember" situation.
F
Failure of substitutivity: in some situations, things that are equal cannot be substituted for each other in logical forms. Consider believe(sue1, happy1(jack1)) - jack1 may equal john22 (i.e. the individual known as Jack may also be called John, e.g. by other people), but believe(sue1, happy1(john22)) - Sue believes John is happy - may not be true, e.g. because Sue may not know that jack1 = john22. Thus john22 cannot be substituted for jack1, even though they are equal in some sense. See also Allen pp. 237-238.
Features can be thought of as slots in a lexicon entry or in structures used to build a logical form. They record syntactic or semantic information about the word or phrase. Examples include the agr agreement feature, the sem feature that records the logical form of a word or phrase, and the var feature that records the variable used to name the referent of a phrase in a logical form.
first person
One of the choices for the person feature. A sentence is "in the first person" if the subject of the sentence is the speaker, or the speaker and some other individual(s), as in "I like pizza" and "We like pizza".
"I" and "we" are first-person pronouns, as are "me", "us". Other words with the first-person feature include "mine", "my", "myself", "ours", "our", and "ourselves".
This stands for First Order Predicate Calculus, a standard formulation of logic that has logical operators like and, or, and not, predicate symbols and constants and functions, and terms built from these, together with the quantifiers forall and exists. It is common for semantic representation systems in NLP to be expressed in languages that resemble or are based on FOPC, though sometimes they add significant features of more elaborate logical systems.
"forall" is a textual way of writing the universal quantifier, which is otherwise written as an upside-down capital A. It corresponds fairly closely to the English words "each" and "every". Thus,
forall(X, likes(X, icecream))
would be read as "for every entity X, X likes icecream" or just "everything likes icecream". This would be too broad a statement, as it would allege that, for example, rocks like icecream. It is usual therefore to restrict the proposition to something like:
forall(X, is_person(X) => likes(X, icecream))
i.e. "Every person likes icecream." That is, we are restricting the type of X to persons. In some cases, it is more reasonable to abbreviate the type restriction as follows:
forall(X : person, likes(X, icecream))
See also exists.
A free morpheme is a basic or root form of a word, to which can be attached bound morphemes that modify the meaning. For example, "happy" is a free morpheme, which becomes "unhappy" when the prefix "un-", a bound morpheme, is attached.
See modal operators - tense and tense - future.
See tense.
G
Gender is one of the features of a noun phrase. In English, gender is only marked in third-person singular pronouns and associated words. The possible values of the gender feature are masculine, feminine, and neuter.
type                              masculine   feminine   neuter   example
pronoun (nominative)              he          she        it       He hit the ball.
pronoun (accusative)              him         her        it       Frank hit him.
pronoun (possessive adjective)    his         her        its      Frank hit his arm.
pronoun (possessive)              his         hers       its      The ball is his.
pronoun (reflexive)               himself     herself    itself   Frank hurt himself.
Generalized phrase structure grammar (GPSG) is an alternative grammatical formalism, in which, among other things, the non-terminal symbols of context-free grammars are replaced by sets of features, and the grammar rules show the relationships between these objects much as context-free rules show the relationships between grammar symbols in a CFG.
If there is a derivation of a sentence from a grammar, then the grammar is said to generate the sentence.
GPSG: see generalized phrase structure grammar.
1.    A system for describing a language, the rules of a language.
2.    A formal system for describing the syntax of a language. In COMP9414, we are principally concerned with context-free grammars, sometimes augmented.
See also Chomsky hierarchy.
See Chomsky hierarchy and context-free grammars.
H
head-driven phrase structure grammar
A head feature is one for which the feature value on a parent category must be the same as the value on the head subconstituent. Each phrasal category has associated with it a head subconstituent - N, NAME or PRO or CNP for NPs, VP for S, V for VP, P (= PREP) for PP.
For example, var is a head feature for a range of phrasal categories, including S. This means that an S gets its var feature by copying the var feature of its head subconstituent, namely its VP.
Head features are discussed on pages 94-96 of Allen.
head subconstituent: see head feature.
head-driven phrase structure grammar
A Hidden Markov Model, for our purposes in COMP9414, is a set of states (lexical categories in our case) with directed edges (cf. directed graphs) labelled with transition probabilities that indicate the probability of moving to the state at the end of the directed edge, given that one is now in the state at the start of the edge. The states are also labelled with a function which indicates the probabilities of outputting different symbols if in that state (while in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output from a state/lexical category is a word belonging to that lexical category. Here is an example:
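(The original diagram is not reproduced here. The Prolog facts below sketch the fragment of the model implied by the worked calculation that follows; the state names and probabilities are read off that calculation, and trans/3, emit/3 and hmm_prob/4 are invented names.)
% trans(From, To, P): probability of a transition from From to To.
trans(start, n,   1.0).
trans(n,     v,   1.0).
trans(v,     adj, 0.5).
trans(adj,   adj, 0.1).
trans(adj,   n,   0.9).

% emit(State, Word, P): probability of outputting Word in state State.
emit(n,   dogcatchers, 0.3).
emit(v,   catch,       0.2).
emit(adj, old,         0.6).
emit(adj, red,         0.2).
emit(n,   fish,        0.5).

% hmm_prob(+Prev, +Tags, +Words, -P): probability of generating Words
% through the lexical-category sequence Tags, starting from state Prev.
hmm_prob(_, [], [], 1.0).
hmm_prob(Prev, [T|Ts], [W|Ws], P) :-
    trans(Prev, T, Pt),
    emit(T, W, Pe),
    hmm_prob(T, Ts, Ws, Rest),
    P is Pt * Pe * Rest.

% ?- hmm_prob(start, [n, v, adj, adj, n],
%             [dogcatchers, catch, old, red, fish], P).
% P = 0.000162 (approximately).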
Using this model, the probability of generating "dogcatchers catch old red fish" can be calculated as follows: first work out the probability of the lexical category sequence → N → V → ADJ → ADJ → N, which is 1 × 1 × 0.5 × 0.1 × 0.9 = 0.045, and then multiply this by the product of the output probabilities of the words, i.e. by 0.3 × 0.2 × 0.6 × 0.2 × 0.5 = 0.0036, for a final probability of 0.000162.
This is the list of discourse entities mentioned in recent sentences, ordered from most recent to least recent. Some versions also include the syntactic and semantic analyses of the previous sentence (or previous clause of a compound sentence). Some versions keep only the last few sentences worth of discourse entities, others keep all the discourse entities since the start of the discourse.
HMM: see Hidden Markov Model.
I
ill-formed text
Much "naturally occurring" text contains some or many typographical errors or other errors. Industrial-strength parsers have to be able to deal with these, just as people can deal with typos and ungrammaticality. Such a parser is called a robust parser.
An imperative sentence is one that expresses a command, as opposed to a question or a statement. See also WH-question, Y/N-question, indicative, subjunctive, and mood.
Two events A and B are said to be statistically independent if Pr(B | A) = Pr(B) - i.e. whether or not A is a fact has no effect on the probability that B will occur. Using the definition of conditional probability, this can be reformulated as Pr(A and B) = Pr(A) × Pr(B).
An indicative sentence is one that makes a statement, as opposed to a question or a command. See also WH-question, Y/N-question, imperative, subjunctive, and mood.
A form of verbs. In English, this form is introduced by the word "to" - the infinitive particle. Examples: to go, to rather prefer, to have owned. Verb phrases that contain an infinitive verb construction are referred to as infinitive verb phrases. See vp:inf and np_vp:inf in the article on subcategorization.
An inflection is a type of bound morpheme, with a grammatical function. For example, the suffix "-ing" is an inflection which, when attached to a verb, forms the present participle of that verb. Other inflections in English form the other parts of verbs (such as the past tense and past participle forms), and the plural of nouns.
Some words inflect regularly, and some inflect irregularly, like the plural form "children" of "child", and the past tense and past participle forms "broke" and "broken" of the verb "break".
Instrument is a semantic case, frequently appearing as a prepositional phrase introduced by the preposition "with". For example, in "Mary ate the pizza with her fingers", the prepositional phrase "with her fingers" indicates the instrument used in the action described by the sentence.
An intensifier is a kind of adverb, used to indicate the level or intensity of an adjective or another adverb. Examples include "very", "slightly", "rather", "somewhat" and "extremely". An example of use with an adjective: "Steve was somewhat tired". An example of use with an adverb: "Mary ran very quickly".
INTERJ: symbol used in grammar rules for an interjection.
Interjection is often abbreviated to INTERJ.
INTERJ is a lexical grammatical category. It usually appears as a single word utterance, indicating some strong emotion or reaction to something. Examples include: "Oh!", "Ouch!", "No!", "Hurray!" and a range of blasphemies and obscenities, starting with "Damn!".
An intransitive verb is a verb that can take no syntactic object, like laugh, as in "He laughed loudly", or "She laughed at his remark". Contrast ditransitive and transitive. See also subcategorization.
J
K
knowledge base
See knowledge representation language.
See knowledge representation language.
The term knowledge representation language (KRL) is used to refer to the language used by a particular system to encode the knowledge. The collection of knowledge used by the system is referred to as a knowledge base (KB).
KRL: see knowledge representation language.
L
Lambda reduction is the process of applying a lambda-expression to its argument (in general, arguments, but the examples we've seen in COMP9414 have all been single-argument lambda-expressions). A lambda expression is a formula of the form (lambda ?x P(?x)), in an Allen-like notation, or lambda(X, p(X)) in a Prolog-ish notation. P(?x) (or p(X)) signifies a formula involving the variable ?x (or X). The lambda-expression can be viewed as a function to be applied to an argument. The result of applying lambda(X, p(X)) to an argument a is p(a) - that is, the formula p(X) with all the instances of the variable X replaced by a. Using a more clearly NLP example, if we apply lambda(X, eat1(l1, X, pizza1)) to mary1 we get eat1(l1, mary1, pizza1).
Prolog code for lambda-reduction is:
% lambda_reduce(+Lambda, +Argument, -Result): unify the lambda
% variable with Argument; Result is then the instantiated body.
lambda_reduce(lambda(X, Predicate), Argument, Predicate) :-
    X = Argument.
Applying this to an actual example:
: lambda_reduce(
         lambda(X, eats(e1, X, the1(p1, pizza1))),
         name(m1, 'Mary'),
         Result) ?

X = name(m1, 'Mary')
Result = eats(e1, name(m1, 'Mary'), the1(p1, pizza1))
The language generated by a grammar is the set of all sentences that can be derived from the start symbol S of the grammar using the grammar rules. Less formally, it is the set of all sentences that "follow from" or are consistent with the grammar rules.
Left-to-right parsing is parsing that processes the words of the sentence from left to right (i.e. from beginning to end), as opposed to right-to-left (or end-to-beginning) parsing. Logically it may not matter which direction parsing proceeds in, and the parser will work, eventually, in either direction. However, right-to-left parsing is likely to be less intuitive than left-to-right. If the sentence is damaged (e.g. by the presence of a mis-spelled word) it may help to use a parsing algorithm that incorporates both left-to-right and right-to-left strategies, to allow one to parse material to the right of the error.
A lemma is a set of lexemes with the same stem, the same major part-of-speech, and the same word-sense, e.g. {cat, cats}.
A lexeme is a fancy name for a word, including any suffix or prefix. Contrast free and bound morphemes.
Lexical functional grammar is a grammatical formalism, not covered in COMP9414.
The lexical generation probability is the probability that a particular lexical category (in context or out of context) will give rise to a particular word. For example, suppose that in a system with a very small lexicon there are only two nouns, say cat and dog. Given a corpus of sentences using this lexicon, one could count the number of times that the two words cat and dog occurred as a noun, say ncats and ndogs. Then the lexical generation probability for cat as a noun would be ncats/(ncats+ndogs), written symbolically as Pr(cat | N).
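A sketch of this estimate in Prolog, using hypothetical counts (count_as/3 and lexgen_prob/3 are invented names; sum_list/2 is as in SWI-Prolog):
% count_as(Word, Cat, N): Word was tagged Cat N times in the corpus.
count_as(cat, n, 30).
count_as(dog, n, 70).

% lexgen_prob(+Word, +Cat, -P): estimate of Pr(Word | Cat).
lexgen_prob(Word, Cat, P) :-
    count_as(Word, Cat, C),
    findall(N, count_as(_, Cat, N), Ns),
    sum_list(Ns, Total),
    P is C / Total.

% ?- lexgen_prob(cat, n, P).
% P = 0.3.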
A rule of a grammar (particularly a context-free grammar) of the form X → w, where w is a single word. In most lexicons, all the lexical insertion rules for a particular word are "collapsed" into a single lexical entry, like
"pig": N V ADJ.
"pig" is familiar as a N, but also occurs as a verb ("Jane pigged herself on pizza") and an adjective, in the phrase "pig iron", for example.
lexical symbol, lexical category
Synonymous with part-of-speech (POS). Also called a pre-terminal symbol. A kind of non-terminal symbol of a grammar - a non-terminal is a lexical symbol if it can appear in a lexical insertion rule. Examples are N, V, ADJ, PREP, INTERJ, ADV. Non-examples include NP, VP, PP and S (these are phrasal categories). The term lexical category signifies the collection of all words that belong to a particular lexical symbol, for example, the collection of all Nouns or the collection of all ADJectives.
Contrast with phrasal category.
A lexicon is a collection of information about the words of a language, in particular the lexical categories to which they belong. A lexicon is usually structured as a collection of lexical entries, like ("pig" N V ADJ). "pig" is familiar as a N, but also occurs as a verb ("Jane pigged herself on pizza") and an adjective, in the phrase "pig iron", for example. In practice, a lexical entry will include further information about the roles the word plays, such as feature information - for example, whether a verb is transitive, intransitive, ditransitive, etc., and what form the verb takes (e.g. present participle, or past tense, etc.)
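One possible (assumed) Prolog encoding of such entries, reduced to just the category information; lex/2 and can_be/2 are invented names for this sketch:
% lex(Word, Categories): the lexical categories Word can belong to.
lex(pig,   [n, v, adj]).
lex(horse, [n]).
lex(the,   [art]).

% can_be(+Word, +Cat): Word can function as lexical category Cat.
can_be(Word, Cat) :-
    lex(Word, Cats),
    member(Cat, Cats).

% ?- can_be(pig, v).
% true.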
LFG: see lexical functional grammar.
The local discourse context, or just local context includes the syntactic and semantic analysis of the preceding sentence, together with a list of objects mentioned in the sentence that could be antecedents for later pronouns and definite noun phrases. Thus the local context is used for the reference stage of NLP. See also history list.
Logical forms are expressions in a special language, resembling FOPC (first order predicate calculus) and used to encode the meanings (out of context) of NLP sentences. The logical form language used in the book by James Allen includes:
terms
constants or expressions that describe objects: fido1, jack1
predicates
constants or expressions that describe relations or properties, like bites1. Each predicate has an associated number of arguments - bites1 is binary.
propositions
a predicate followed by the appropriate number of arguments: bites1(fido1, jack1), dog1(fido1) - Fido is a dog. More complex propositions can be constructed using logical operators not(loves1(sue1, jack1)), &(bites1(fido1, jack1), dog1(fido1)).
quantifiers
English has some precise quantifier-like words: some, all, each, every, the, a, as well as vague ones: most, many, a few. The logical form language has quantifiers to encode the meanings of each quantifier-like word.
variables
are needed because of the quantifiers, and because, while the words in a sentence in many cases give us the types of the objects, states and events being discussed, it is not until a later stage of processing (reference) that we know to what instances of those types the words refer.
Variables in logical form language, unlike in FOPC, persist beyond the "scope" of the quantifier. E.g. A man came in. He went to the table. The first sentence introduces a new object of type man1. The He, in the second sentence refers to this object.
NL quantifiers are typically restricted in the range of objects that the variable ranges over. In Most dogs bark the variable in the most1 quantifier is restricted to dog1 objects: most1(d1 : dog1(d1), barks1(d1)).
predicate operators
A predicate operator takes a predicate as an argument and produces a new predicate. For example, we can take a predicate like cat1 (a unary predicate true of a single object of type cat1) and apply the predicate operator plur that converts singular predicates into the corresponding plural predicate plur(cat1), which is true of any set of cats with more than one member.
modal operators
Modal operators are used to represent certain verbs like believe, know, want, that express attitudes to other propositions, and for tense, and other purposes. Sue believes Jack is happy becomes
believe(sue1, happy1(jack1))
With tenses, we use the modal operators pres, past, fut, as in:
pres(sees1)(john1, fido1)
past(sees1)(john1, fido1)
fut(sees1)(john1, fido1)
logical operators
The logical operators are and, or, not, => (implies), and <=> (equivalent to); and is sometimes written as &. They are used to connect propositions to make larger propositions: e.g.
is-blue(sky1) and is-green(grass1) or can-fly(pig1)
M
A parsing technique, not covered in COMP9414.
A mass noun is a noun that cannot be counted. Water is a mass noun, as is sand (if you want to count sand, you refer to grains). Contrast count noun.
A modal auxiliary is distinguished syntactically by the fact that it forces the main verb that follows it to take the infinitive form. For example, "can", "do", "will" are modal ("she can eat the pizza", "she does eat pizza", "she will eat pizza") but "be" and "have" are not ("she is eating pizza", "she has eaten pizza").
As far as we are concerned in COMP9414, modal operators are a feature of the logical form language used to represent certain epistemic verbs like "believe", "know" and other verbs like "want", and the tense operators, which convert an untensed logical form into a tensed one.
Thus if likes1(jack1, sue1) is a formula in the logical form language, then we can construct logical forms like know(mary1, likes1(jack1, sue1)) meaning that Mary knows that Jack likes Sue. Similarly for believe(mary1, likes1(jack1, sue1)) and want(marg1, own(marg1, (?obj : &(porsche1(?obj), fire_engine_red(?obj))))) - that's Marg wants to own a fire-engine red Porsche.
The tense operators include fut, pres, and past, representing future, present and past. For example, fut(likes1(jack1, sue1)) would represent Jack will like Sue.
See also failure of substitutivity.
The mood of a sentence indicates whether it is a statement, a command, or a question (or a counter-factual). The main moods are summarized below; see also the articles on individual moods.
·        indicative - a plain statement: John eats the pizza
·        imperative - a command: Eat the pizza!
·        WH-question - a question with a phrasal answer, often starting with a question-word beginning with "wh": Who is eating the pizza? What is John eating? What is John doing to the pizza?
·        Y/N-question - a question with a yes/no answer: Did John eat the pizza?
·        subjunctive - an embedded sentence that is counter-factual but must be expressed to, e.g., explain a possible consequence: If John were to eat more pizza he would be sick.
A morpheme is a unit of language immediately below the word level. See free morpheme and bound morpheme, and morphology.
Morphology is the study of the analysis of words into morphemes, and conversely of the synthesis of words from morphemes.
most1 is a rather vague natural language quantifier, corresponding to the word "most" in English. "Many", "a few", and "several" are other quantifier-type expressions that are similarly problematical in their interpretation.
N
N
symbol used in grammar rules for a noun.
n-gram
An n-gram is an n-tuple of things, but usually of lexical categories. Suppose that we are concerned with n lexical categories L1, L2, ..., Ln. The term n-gram is used in statistical NLP in connection with the conditional probability that a word will belong to Ln given that the preceding words were in Ln-1, ..., L2, L1. This probability is written Pr(Ln | Ln-1 ... L2 L1), or more fully Prob(w[i] in Ln | w[i-1] in Ln-1, ..., w[i-n+1] in L1). See also bigram and trigram, and p. 197 in Allen.
A nominal is a noun functioning as an adjective, as with the word "wood" in "wood fire". Longer expressions constructed from nominals are possible. It can be difficult to infer the meaning of a nominal compound (like "wood fire") from the meanings of the individual words - for instance, while "wood fire" presumably means a fire made with wood, "brain damage" means damage to a brain, rather than damage made with a brain. Another example: "noun modifier" could on the face of it either mean a noun that acts as a modifier (i.e. a nominal as just defined) or a modifier of a noun.
In fact, noun modifier is a synonym for nominal.
non-terminal
A non-terminal symbol of a grammar is a symbol that represents a lexical or phrasal category in a language. Examples in English would include N, V, ADJ, ADV (lexical categories) and NP, VP, ADJP, ADVP and S (phrasal categories). See also terminal symbol and context-free grammar.
A noun is a word describing a (real or abstract) object. See also mass noun, count noun, common noun, abstract noun, proper noun, and concrete noun.
Contrast verb, adjective, adverb, preposition, conjunction, and interjection.
Noun is often abbreviated to N.
N is a lexical grammatical category.
noun modifier: see nominal.
Noun Phrase is a phrasal grammatical category. Noun phrase is usually abbreviated to NP. NPs have a noun as their head, together with (optionally) some of the following:
adjectives, nominal modifiers (i.e. other nouns, acting as though they were adjectives), certain kinds of adverbs that modify the adjectives, as with "very" in "very bright lights", participles functioning as adjectives (as in "hired man" and "firing squad"), cardinals, ordinals, determiners, and quantifiers. There are constraints on the way these ingredients can be put together. Here are some examples of noun phrases: Ships (as in Ships are expensive to build), three ships (cardinal + noun), all three ships (quantifier + cardinal + noun), the ships (determiner + noun), enemy ships (nominal + noun), large, grey ships (adjective + adjective + noun), the first three ships (determiner + ordinal + cardinal + noun), my ships (possessive + noun).
NP: symbol used in grammar rules for a noun phrase.
The term grammatical number refers to whether the concept described consists of a single unit (singular number), like "this pen", or to more than one unit (plural number), like "these pens", or "three pens".
In some languages other than English, there may be different distinctions drawn - some languages distinguish between one, two, and many, rather than just one and many as in English.
Nouns in English are mostly marked for number - see plural.
Pronouns and certain determiners may also be marked for number. For example, "this" is singular, but "these" is plural, and "he" is singular, while "they" is plural.
O
The object of a sentence is the noun phrase that appears after the verb in a declarative English sentence. For example, in The cat ate the pizza, the pizza is the object. In The pizza was eaten by the cat, there is no object. Object noun phrases can be arbitrarily long and complex. For example, in He ate a pizza with lots of pepperoni, pineapple, capsicum, mushrooms, anchovies, olives, and vegemite, the object is a pizza with lots of pepperoni, pineapple, capsicum, mushrooms, anchovies, olives, and vegemite. [No, I do not have shares in a pizza company.]
See also ditransitivetransitive, and intransitive.
An ordinal is a form of number word that indicates rank rather than value. Thus "one, two, three, four, five, six, seven" are cardinal numbers, whose corresponding ordinal numbers are "first, second, third, fourth, fifth, sixth, seventh".
Output probability: the same as lexical generation probability, but used in the context of a Hidden Markov Model.
P
A parse tree is a way of representing the output of a parser, particularly with a context-free grammar. Each phrasal constituent found during parsing becomes a branch node of the parse tree. The words of the sentence become the leaves of the parse tree. As there can be more than one parse for a single sentence, so there can be more than one parse tree. For example, for the sentence "He ate the pizza", with respect to the grammar with rules
S → NP VP, NP → PRO, NP → ART N, VP → V NP,
and lexicon
("ate" V) ("he" PRO) ("pizza" N) ("the" ART)
the parse tree is
[Parse tree diagram: http://www.cse.unsw.edu.au/~billw/dictionaries/pix/parsetree.gif]
Note that this graphical representation of the parse tree is unsuitable for further computer processing, so the parse tree is normally represented in some other way internally in NLP systems. For example, in a Prolog-like notation, the tree above could be represented as:
s(np(pro("He")),
  vp(v("ate"),
     np(art("the"), n("pizza")))).
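As an aside, such a Prolog term is easy to process. Here is a minimal sketch (an illustration added here, assuming SWI-Prolog, where double-quoted text is a string) of a predicate yield/2 - the name is our own invention - that recovers the words of the sentence from such a tree:

% yield(+Tree, -Words): collect the leaves (the words) of a parse tree
% represented as a Prolog term like the one above.
yield(Leaf, [Leaf]) :-
    atomic(Leaf).                      % a word at a leaf
yield(Tree, Words) :-
    compound(Tree),
    Tree =.. [_Category | Subtrees],   % strip off the category label
    maplist(yield, Subtrees, Lists),
    append(Lists, Words).              % concatenate the sub-yields

% ?- yield(s(np(pro("He")), vp(v("ate"), np(art("the"), n("pizza")))), W).
% W = ["He", "ate", "the", "pizza"]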
A parser is an algorithm (or a program that implements that algorithm) that takes a grammar, a lexicon, and a string of words, and decides whether the string of words can be derived from the grammar and lexicon (i.e. is a sentence with respect to the grammar and lexicon).
If so, it produces as output some kind of representation of the way (or ways) in which the sentence can be derived from the grammar and lexicon. A common way of doing this is to output (a) parse tree(s).
The process of going through a corpus of sentences and labelling each word in each sentence with its part of speech. A tagged corpus is a corpus that has been so labelled. A tag is one of the labels. Large-scale corpora might use tag-sets with around 35-40 different tags (for English). See Allen Fig. 7.3 p. 196 for an example of a tag-set.
part of speech, POS
Synonymous with lexical category: the role, like noun, verb, adjective, adverb, pronoun, preposition, etc., that a word is either playing in a particular sentence (e.g. like is acting as a verb in "I like pizza") or that it can play in some sentence: e.g. like can act as a verb, noun, adjective, adverb, preposition, and conjunction. (It can also act as a "filled pause", as do um, er, and uh - e.g. "He's, like, a pizza chef in this, like, fast food joint downtown.")
Participles come in two varieties (in English) - present participles and past participles. (Often abbreviated to PRESPART and PASTPART or something equivalent, like ING and EN). Present participles are variants on verbs; they end in "-ing", as in "setting", "being", "eating", "hiring". Past participles end in "-ed", "-en", or a few other possibilities, as in "set" (past participle the same as the infinitive form of the verb), "been", "eaten", "hired", "flown" (from "fly").
Participles are used in constructing tensed forms of verbs, as in "he is eating", "you are hired", and also as though they were adjectives in phrases like "a flying horse" and "a hired man".
In some cases, present participles have become accepted as nouns representing an instance of the action that the underlying verb describes, as with "meeting".
PRESPART and PASTPART are lexical grammatical categories.
A particle is usually a word that "normally" functions as a preposition, but can also modify the sense of a verb. Not all prepositions can be particles. An example of a word functioning as a particle is "up", in "The mugger beat up his victim". Here "beat up" functions as a unit that determines the action being described. A telltale sign of a particle is that it can often be separated from the verb, as in "The mugger beat the victim up". Sometimes it can be non-trivial for an NLP system to tell whether a word is being used as a particle or as a preposition. For example, in "Eat up your dinner", "up" is definitely a particle, but in "He eats up the street", "up" is a preposition, but it takes real-world knowledge to be sure of this, as the alternative possibility, that the person being referred to is eating the street, is syntactically reasonable (though not pragmatically reasonable, unless perhaps "he" refers to a bug-eyed asphalt-eating alien.)
See also phrasal verb.
Both active and passive voice are described in the article on active voice.
See modal operators - tense and tense.
An abbreviation for Past Participle, particularly in grammar rules.
See tense.
object.
Person is a feature of English noun phrases that is principally of significance with pronouns and related forms. The possible values of person are first person signifying the speaker (possibly with his/her companions), second person signifying the person addressed (possibly with his/her companions), and third person signifying anybody else, i.e. not speaker or person addressed or companion of either.
Below is a table of the forms of pronouns, etc. in English, classified by person and syntactic case:
case                 | first person     | second person               | third person
nominative           | I/we             | thou/you/ye                 | he/she/it/they
accusative           | me/us            | thee/you/ye                 | him/her/it/them
possessive adjective | my/our           | thy/your                    | his/her/its/their
possessive pronoun   | mine/ours        | thine/yours                 | his/hers/its/theirs
reflexive            | myself/ourselves | thyself/yourself/yourselves | himself/herself/itself/themselves
A low-level classification of linguistic sounds - phones are the acoustic patterns that are significant and distinguishable in some human language. Particular languages may group together several phones and regard them as equivalent. For example, the L-sounds at the beginning and end of the word "loyal", termed "light L" and "dark L" by linguists, are treated as the same sound in English, but are distinct sounds in some other languages. Light L and dark L are termed allophones of L in English. Similarly, the L and R sounds of English are regarded as equivalent in some other languages.
Start by reading about phones. Phonemes are the groups of phones (i.e. allophones) regarded as linguistically equivalent by speakers of a particular language. Thus native English speakers hear light L and dark L as the same sound, namely the phoneme L, unless trained to do otherwise. One or more phonemes make up a morpheme.
The study of acoustic signals from a linguistic viewpoint, that is, how acoustic signals are classified into phones.
The study of phones, and how they are grouped together in particular human languages to form phonemes.
A kind of non-terminal symbol of a grammar - a non-terminal denotes a phrasal category if it cannot appear in a lexical insertion rule, that is, a rule of the form X → w, where w is a word. Examples include NP, VP, PP, ADJP, ADVP, and S. Non-examples include N, V, ADJ, PREP, INTERJ, and ADV (see lexical category).
Contrast with lexical category.
A phrasal verb is one whose meaning is completed by the use of a particle. Different particles can give rise to different meanings. The verb "take" participates in a number of phrasal verb constructs - for example:
take in  | deceive                 | He was taken in by the swindler.
take in  | help, esp. with housing | The homeless refugees were taken in by the Sisters of Mercy.
take up  | accept                  | They took up the offer of help.
take off | remove                  | She took off her hat.
A unit of language larger than a word but smaller than a sentence. Examples include noun phrases, verb phrases, adjectival phrases, and adverbial phrases.
See also phrasal categories.
See tense.
predicate operator that handles plurals. plur transforms a predicate like book1 into a predicate plur(book1). If book1 is true of any single book, then plur(book1) is true of any set of books with more than one member. Thus "the books fell" could be represented by the(x : plur(book1)(x), past(fall1(x))).
noun in a form that signifies more than one of whatever the base form of the noun refers to. For example, the plural of "pizza" is "pizzas". While most plurals in English are formed by adding "s" or "es", or occasionally doubling the last letter and adding "es", there are a number of exceptions. Some derive from words borrowed from other languages, like "criterion"/"criteria", "minimum"/"minima", "cherub"/"cherubim", and "vertex"/"vertices". Others derive from Old English words that formed plurals in nonstandard ways, like "man"/"men", "mouse"/"mice", and "child"/"children".
This is a name applied to two English pronoun forms that indicate possession. There are possessive adjectives and possessive pronouns. They are tabulated below:
person & number                  | possessive adjective | possessive pronoun
first person singular            | my                   | mine
first person plural              | our                  | ours
second person singular (archaic) | thy                  | thine
second person (modern)           | your                 | yours
third person singular            | his/her/its          | his/hers/its
third person plural              | their                | theirs
Abbreviation for prepositional phrase.
The problem of deciding what component of a sentence should be modified by a prepositional phrase appearing in the sentence. In the classic example "The boy saw the man on the hill with the telescope", "with the telescope" could modify "hill" (so the man is on the "hill with the telescope") or it could modify "saw" (so the boy "saw with the telescope"). The first attachment corresponds to a grammar rule like np → np pp, while the second corresponds to a grammar rule like vp → v np pp. Both rules should be present to capture both readings, but this inevitably leads to a multiplicity of parses. The problem of choosing between the parses is normally deferred to the semantic and pragmatic phases of processing.
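The two attachments can be visualised with labelled bracketings (an illustration added here; the structures follow the two grammar rules just mentioned):
[S [NP The boy] [VP saw [NP [NP the man] [PP on [NP [NP the hill] [PP with the telescope]]]]]] - the telescope is on the hill
[S [NP The boy] [VP saw [NP [NP the man] [PP on the hill]] [PP with the telescope]]] - the seeing was done with the telescope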
Pragmatics can be described as the study of meaning in context, to be contrasted with semantics, which covers meaning out of context. For example, if someone says "the door is open", there is a single logical form for this. However, there is much to be done beyond producing the logical form, in really understanding the sentence. To begin with, it is necessary to know which door "the door" refers to. Beyond that, we need to know what the intention of the speaker (say, or the writer) is in making this utterance. It could be a pure statement of fact, it could be an explanation of how the cat got in, or it could be a tacit request to the person addressed to close the door.
It is also possible for a sentence to be well-formed at the lexical, syntactic, and semantic levels, but ill-formed at the pragmatic level because it is inappropriate or inexplicable in context. For example, "Try to hit the person next to you as hard as you can" would be pragmatically ill-formed in almost every conceivable situation in a lecture on natural language processing, except in quotes as an example like this. (It might however, be quite appropriate in some settings at a martial arts lesson.)
pre-terminal
See lexical category.
This term is used in (at least) three senses:
1.    In NLP, equivalent to verb phrase, used in the analysis of a sentence into a subject and predicate.
2.    In logic, a predicate is a logical formula involving predicate symbols, variables, terms, quantifiers and logical connectives.
3.    In Prolog, a predicate is a relation defined by a set of clauses whose heads share the same name and arity.
Predicate operators form a part of the logical form language. They transform one predicate into another predicate. For example, the predicate operator PLUR transforms a singular predicate like (DOG x) which is true if x is a dog, into a plural equivalent (PLUR DOG) such that ((PLUR DOG) x) is true if x is a set of more than one dog.
A predictive parser is a parsing algorithm that operates top-down, starting with the start symbol, and predicting or guessing which grammar rule to use to rewrite the current sentential form. Alternative grammar rules are stacked so that they can be explored (using backtracking) if the current sequence of guesses turns out to be wrong.
On general context-free grammars, a vanilla predictive parser takes exponential parsing time (i.e. it can be very very slow). See also bottom-up parsers.
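For concreteness, here is the example grammar from the parse tree entry written as a Prolog definite clause grammar (DCG) - a sketch added here, not taken from Allen. Prolog's execution of a DCG query is itself a top-down, backtracking (predictive) parse:

s  --> np, vp.
np --> pro.
np --> art, n.
vp --> v, np.

pro --> [he].
art --> [the].
n   --> [pizza].
v   --> [ate].

% ?- phrase(s, [he, ate, the, pizza]).
% true.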
symbol used in grammar rules for a preposition.
A preposition is a part of speech that is used to indicate the role that the noun phrase that follows it plays in the sentence. For example, the preposition "with" often signals that the NP that follows is the instrument of the action described in the sentence, as in "She ate the pizza with her fingers". It can also indicate the co-agent, especially if the NP describes something that is animate: "She ate the pizza with her friends". As well as signalling a relationship between a noun phrase and the main verb of the sentence, a preposition can indicate a relationship between a noun phrase and another noun phrase. This is particularly the case with "of", as in "The length of the ruler is 40 cm".
Prepositions are the head items of prepositional phrases.
Preposition is often abbreviated to PREP.
PREP is a lexical grammatical category.
Prepositional phrase is a phrasal grammatical category. Prepositional phrase is usually abbreviated to PP. PPs serve to modify a noun phrase or a verb or verb phrase. For example, in "The house on the hill is green", the prepositional phrase "on the hill" modifies "the house", while in "He shot the deer with a rifle", "with a rifle" is a prepositional phrase that modifies "shot ..." (except in the extremely rare case that the deer has a rifle :-)).
Prepositional phrases normally consist of a preposition followed by a noun phrase.
The main exception is the possessive clitic 's, as in "my uncle's car", where the 's functions as a preposition ("my uncle's car" = "the car of my uncle") but follows its noun phrase, where an ordinary preposition would precede it.
Occasionally other structures are seen, such as "these errors notwithstanding", an allowable variant on "notwithstanding these errors" ("notwithstanding" is a preposition).
PP is a phrasal grammatical category.
See also PP attachment.
See modal operators - tense and tense - present.
When we think of grammar, we often think of the rules of good grammar that we may have been taught when younger. In English, these may have included things like "never split an infinitive", i.e. do not put an adverb between the word "to" and the verb, as in "I want to really enjoy this subject." (The origin of this rule is said to be the fact that infinitive-splitting is grammatically impossible in Latin: some early grammarians sought to transfer the rule to English for some twisted reason.) Grammar in this sense is called "prescriptive grammar" and has nothing to do with "descriptive grammar", which is what we are concerned with in NLP.
An abbreviation for Present Participle, particularly in grammar rules.
= grammar rule - see Chomsky hierarchy and context free grammar.
See aspect.
A proper noun is a noun that names an individual, such as Amelia Earhart, rather than a type, for example woman, or philosophy. Proper nouns are often compound, as with Amelia Earhart: Amelia and Earhart would each rank as proper nouns in their own right.
Contrast common noun.
A proposition is a statement of which it is possible to decide whether it is true or false. There are atomic propositions, like "Mary likes pizza", and compound ones involving logical connectives, as in "Mary likes pizza and Paul does not".
Q
Umbrella term for adjectives and nominals or noun modifiers.
1.    (in semantic interpretation) - objects in the logical form language that correspond to the various words and groups of words that act in language in the way that quantifiers do in formal logic systems. Obvious examples of such words in English include all, each, every, some, most, many, and several. Less obvious examples include the, which is similar in effect to "there exists a unique" - thus when we refer to, say, "the green box", we are indicating that there exists, in the current discourse context, a unique green box that is being referred to. This phrase would be represented in the logical form language by an expression like the(b1 : &(box1(b1), green1(b1))).
NL quantifiers are typically restricted in the range of objects that the variable ranges over. In "Most dogs bark", the variable in the most1 quantifier is restricted to dog1 objects: most1(d1 : dog1(d1), barks1(d1)).
2.    In logic, this term refers to the logical operators "forall" and "exists".
3.    This term is also sometimes used for quantifying determiners.
A quantifying determiner, in English, is one of a fairly small class of words like "all", "both", "some", "most", "few", "more" that behave in a similar way to quantifiers in logic.
See also determiners.
R
Reference, in NLP, is the problem/methods of deciding to what real-world objects various natural language expressions refer. Described in Chapter 14 of Allen. See also anaphor, cataphor, co-refer, discourse entity, history list, and local discourse context.
A type of ambiguity where what is uncertain is what is being referred to by a particular natural language expression. For example, in "John hit Paul. He was angry with him.", it is not entirely clear to whom the pronouns "he" and "him" refer - it could be that the second sentence explains the first, in which case the "he" is John, or it could be that the second sentence gives a consequence of the first, in which case the "he" is Paul.
1.    Regular grammar = right-linear grammar - see Chomsky hierarchy.
2.    A regular verb, noun, etc. is one that inflects in a regular way. "save" is a regular verb and "house" is a regular noun. On the other hand, "break" (with past tense "broke" and past participle "broken") is an irregular verb, and "mouse" (with plural form "mice") is an irregular noun.
Relative clauses involve sentence forms used as modifiers in noun phrases. These clauses are often introduced by relative pronouns such as who, which, and that. For example, "The man who gave Barry the money". See Allen p. 34.
The rewriting process is what is used in derivation to get from one sentential form to the next.
The process is as follows with context free grammars: pick a non-terminal X in the current string (or sentential form) and a grammar rule whose left-hand side is that non-terminal X. Replace X in the current string by the right-hand side of the grammar rule, to obtain a new current string. This definition also works for regular grammars. A single step in the rewriting process is called a direct derivation. For an example, see derivation.
The process is similar with context-sensitive grammars and unrestricted grammars, except that instead of picking a non-terminal X in the current string, we find a substring of the current string that matches the left-hand side of some context-sensitive or unrestricted grammar rule, and replace it with the right-hand side of that grammar rule.
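As a quick illustration, using the example grammar and lexicon given under parse tree, one derivation of "he ate the pizza" runs:
S ⇒ NP VP ⇒ PRO VP ⇒ he VP ⇒ he V NP ⇒ he ate NP ⇒ he ate ART N ⇒ he ate the N ⇒ he ate the pizza
Each ⇒ is a direct derivation: one non-terminal is replaced using one grammar rule (or one lexicon entry).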
right-linear grammar
See Chomsky hierarchy.
right-to-left parsing
See the article on left-to-right parsing.
A parser or other NLP algorithm is robust if it can recover from or otherwise handle ill-formed or otherwise deviant natural language expressions. For example, we would not want a syntactic parser to give up as soon as it encounters a word that is not in its lexicon - preferably it should try to infer the lexical category and continue parsing (as humans do when they first encounter a new word).
S
S
symbol used in grammar rules for a sentence.
One of the choices for the person feature. A sentence is "in the second person" if the subject of the sentence is the person(s) addressed as in "you like pizza" and the archaic "Ye like pizza" and "Thou likest pizza".
"you", "thou" and "ye" are second-person pronouns, as is "thee". Other words with the second-person feature include "yours", "thine", "your", "thy", "yourself", "yourselves", and "thyself".
A variant on a context free grammar, in which the non-terminals correspond to semantic rather than syntactic concepts. A system of this type encodes, in its grammar rules, semantic knowledge about the types of sentences likely to appear in the input. For example, a system to handle text about ships and ports might encode in its grammar rules the information that the subject of a sentence about "docking" must be a ship:
docksentence → shipnp dockvp
The problem with semantic grammar is that coverage of a significant portion of a language would require a huge number of rules, and developing those rules would involve a massive analysis of the meanings that they are to encode.
Semantics is the study of meaning (as opposed to form/syntax, for example) in language. Normally semantics is restricted to "meaning out of context" - that is, to meaning so far as it can be determined without taking context into account. See Allen page 10 for the different levels of language analysis, and chapters 8-12 for detailed treatment (we covered parts of chapters 8 and 9 only in COMP9414).
See also logical form.
Sentence is the level of language above phrase. Above sentence are pragmatic-level structures that interconnect sentences into paragraphs, etc., using concepts such as cohesion. They are beyond the scope of this subject.
Sentences are sometimes classified into simple, compound, complex, and compound-complex, according to the absence or presence of conjunctions, relative clauses (phrases with verb groups that are introduced by "which", "that", etc.) or both. They may also be analysed into subject and predicate.
Sentence is often abbreviated to S (see also start symbol).
S is a phrasal grammatical category.
See article on derivation.
shift-reduce parser
A type of parsing algorithm, not discussed in COMP9414.
simple future
See tense.
See tense.
See tense.
noun in a form that signifies one of whatever type of object the noun refers to. For example, dog is singular, whereas dogs is plural.
Universally quantified variables can be handled (and are handled in Prolog) simply by assuming that any variable is universally quantified. Existentially quantified variables must thus be removed in some way. This is handled by a technique called skolemization.
In its simplest form, skolemization replaces the variable with a new constant, called a Skolem constant. For example, the formula:
exists(y, forall(x, loves(x, y)))
would be encoded as an expression such as
loves(X, sk1),
where sk1 is a new constant that stands for the object that is asserted to exist, i.e. the person (or whatever) that is loved by every X.
Quantifier scoping dependencies are shown using new functions called Skolem functions. For example, the formula:
forall(y, exists(x, loves(x,y)))
would be encoded as an expression such as
loves(sk2(Y), Y),
where sk2 is a new function that produces a potentially new object for each value of Y.
See Skolem functions.
A term from the pragmatic end of language use: when we say (or write) something, each utterance has a purpose and, if effective, accomplishes an act of some type. Examples of different types of speech act include:
ask, request, inform, deny, congratulate, confirm, promise. Not covered in COMP9414, except to point out that it is ultimately vital to understanding language. For some discussion, see Allen p. 542 ff., and compare surface speech act.
The start symbol of a grammar is another name for the "distinguished non-terminal" of the grammar. Details at context-free grammar. The start symbol of most NLP grammars is S (for sentence).
A group of techniques relying on mathematical statistics, used in NLP to, for example, find the most likely lexical categories or parses for a sentence. Often the techniques are based on frequency information collected by analysing very large corpora of sentences in a single language, to find out, for example, how many times a particular word (dog, perhaps) has been used with a particular part of speech. The sentences in the corpus have usually been tagged in some way (sometimes manually), so that the part of speech of each occurrence of each word is known. Sometimes the sentences are hand-parsed as well (a treebank).
See chapter 7 in Allen, and also Bayes' rule, bigram, trigram, n-gram, conditional probability, statistical independence, Hidden Markov Model, and Viterbi algorithm.
bound morpheme.
string
A "string over an alphabet A" means a sequence of symbols taken from the alphabet A, where by alphabet we mean just a set of symbols that we are using in a similar way to the way that we use, say, the Latin alphabet to make up words. Thus a word (in English) is a string over the alphabet {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z} (plus arguably a few other items like hyphen and apostrophe). A construct like "ART ADJ NOUN" is a string over an alphabet that includes the symbols ART, ADJ, and NOUN. Similarly "NP of NP" is a string (over some alphabet that includes the symbols "NP" and "of") that has two non-terminal symbols and one terminal symbol (namely "of").
A form of ambiguity in which what is in doubt is the syntactic structure of the sentence or fragment of language in question. An example of pure structural ambiguity is "old men and women" which is ambiguous in that it is not clear whether the adjective old applies to the women or just to the men. Frequently structural ambiguity occurs in conjunction with word-sense ambiguity, as in "the red eyes water" which could signify "the communist looks at water":
s(np(art(the), n(red)), vp(v(eyes), np(n(water))))
or alternatively "the reddened eyes drip tear fluid"
s(np(art(the), adj(red), n(eyes)), vp(v(water)))
See also referential ambiguity.
The name for the feature used to record the subcategorization of a verb or adjective.
Verbs and some adjectives admit complement structures. They are said to subcategorize the structures that they can be followed by. For example, some verbs can be followed by two noun phrases (like "Jack gave Mary food"), some by at most one (like "Jack kicked the dog"), and some by none (like "Jack laughed"). We would record this by saying that the verbs have subcat np_np, or subcat np, or subcat none. Further examples are shown below (taken from Figures 4.2 and 4.4 in Allen):
Value      | Example Verb | Example of Use
none       | laugh        | Jack laughed
np         | find         | Jack found a key
np_np      | give         | Jack gave Sue the paper
vp:inf     | want         | Jack wants to fly
np_vp:inf  | tell         | Jack told the man to go
vp:ing     | keep         | Jack keeps hoping for the best
np_vp:ing  | catch        | Jack caught Sam looking at his desk
np_vp:base | watch        | Jack watched Sam look at his desk
np_pp:to   | give         | Jack gave the key to the man
pp:loc     | be           | Jack is at the store
np_pp:loc  | put          | Jack put the box in the corner
pp:mot     | go           | Jack went to the store
np_pp:mot  | take         | Jack took the hat to the party
adjp       | be, seem     | Jack is happy
np_adjp    | keep         | Jack kept the dinner hot
s:that     | believe      | Jack believed that sharks wear wedding rings
s:for      | hope         | Jack hoped for Mary to eat the pizza.
Notice that several verbs (give, be, keep) among the examples have more than one subcat. This is not unusual. As an example of subcategorization by adjectives, notice that "Freddo was happy to be a frog" is OK, so happy subcategorizes vp:inf, but "Freddo was green to ..." cannot be completed in any way, so green does not subcategorize vp:inf.
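In an implemented system, subcat values typically live in the lexicon. Here is a minimal hypothetical sketch in Prolog (the predicate name lex/3 and the exact frame names are our own invention):

% lex(Word, Category, SubcatFrame)
lex(laugh, v, subcat(none)).
lex(find,  v, subcat(np)).
lex(give,  v, subcat(np_np)).
lex(give,  v, subcat(np_pp(to))).   % the same verb may list several frames

A parser can then consult lex/3 to decide which complement structures to look for after a given verb.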
The subject of a sentence is the noun phrase that appears before the verb in a declarative English sentence. For example, in "The cat sat on the mat", "The cat" is the subject. In "The mat was sat on", "The mat" is the subject. Subject noun phrases can be arbitrarily long and complex, and may not look like "typical" noun phrases. For example, in "Surfing the net caused him to fail his course", the subject is "surfing the net". [Please excuse the subliminal message.]
A subjunctive sentence is an embedded sentence that expresses a proposition that is counterfactual (not true), such as "If John were to eat more pizza, he would make himself sick", as opposed to a Y/N-question, a WH-question, a command, or a statement. As can be seen from the example, the subjunctive form of a verb resembles the past form in modern English, even though it frequently refers to a possible future action or state (as in our example). The past forms of some modals (e.g. "should", "would" which were originally the past forms of verbs "shall" and "will") are used for little else in modern English.
See also Y/N-question, WH-question, imperative, indicative, and mood.
See failure of substitutivity.
This term refers to classifying the sentence into standard syntactic types - assertion, command, and the two kinds of question: yes/no-questions and wh-questions. See Allen p. 250.
To be contrasted with (pragmatic) speech acts.
Not a part of COMP9414 Artificial Intelligence.
A figure of speech in which a single word appears to be in the same relationship to two others, but must be understood in a different sense with each of the two other words (the "pair"). See also zeugma. Example: "I'm leaving for greener pastures and ten days."
One of a couple of dozen little-used terms for figures of speech.
Syntax means the rules of language that primarily concern the form of phrases and sentences, as distinct from the substructure of words (see morphology) or the meaning of phrases and sentences in or out of context (see pragmatics and semantics).
An alternative approach to linguistic grammar, driven from the functional (rather than the structural) end of language. Not covered in COMP9414. See p.95 (Box 4.3) in Allen.
T
1.    See part of speech tagging.
2.    As TAG, it is also an acronym for Tree-Adjoining Grammar. TAGs were not mentioned elsewhere in COMP9414, but are mentioned here just in case you run into them in a book somewhere.
See part of speech tagging.
The tense of a verb or a sentence relates to the time when the action or state described by the verb or sentence occurred/occurs/will occur. The main contrast in tense is between past, present and future. Most verbs in English indicate the past/present distinction by inflection (a few, like "set", are invariant in this respect). Thus "break" and "breaks" are the present tense forms of the verb "break", and "broke" is the past tense form. The future tense is constructed, in English, by using the auxiliaries "shall" and "will" with the verb - "it will break", for example. Here is a list of the major tense forms:
Form                       | Example                    | Meaning
present                    | He drives a Ford           | The "drive" action occurs in the present, though it suggests that this is habitual - it may have occurred in the past and may continue in the future.
simple past                | He drove a Ford            | The action occurred at some time in the past.
simple future              | He will drive a Ford       | The action will occur at some time in the future.
past perfect or pluperfect | He had driven a Ford       | At some time in the past, it was true to say "He drove a Ford".
future perfect             | He will have driven a Ford | At some point in the future, it will be true to say "He drove a Ford".
See also participle.
Contrast mood and aspect.
1.    Used in the logical form language to describe constants and expressions that describe objects.
2.    Used in FOPC to refer to a class of objects that may be defined, recursively, as follows:
o   a constant is a term;
o   a variable is a term;
o   a function f applied to a suitable number of terms t1, t2, ..., tn is a term: f(t1, t2, ..., tn).
3.    Used to refer to certain types of Prolog language constructs
A terminal symbol of a grammar is a symbol that can appear in a sentence of the grammar. In effect, a terminal symbol is a word of the language described by the grammar.
See also non-terminal symbol and context-free grammar.
The word the gives rise to an important NL quantifier written as THE in Allen or as the or the1 in the Prolog notation used in COMP9414 assignments. Thus the dog barks has logical form:
(THE d1 : (DOG1 d1) (BARKS1 d1))
(Allen),
or
the(d1 : dog1(d1), barks1(d1))
(COMP9414)

Here d1 is the variable over which THE quantifies.
This is also written as (BARKS1 <THE d1 DOG1>) in Allen's notation, and as barks1(the(d1, dog1)) in the Prolog notation.
semantic case.
Term used for the noun phrase that follows the verb in an active voice, indicative sentence in English. Also referred to as the object or sometimes the victim.
One of the choices for the person feature. A sentence is "in the third person" if the subject of the sentence is neither the speaker nor the person(s) addressed, as in "she likes pizza" and "they like pizza".
"she", "he", "it", and "they" are third-person pronouns, as are "her", "him", and "them". Other words with the third-person feature include "hers", "his", "theirs", "their", "herself", "himself", "itself" and "themselves".
top-down parser
A parser that starts by hypothesizing an S (see start symbol) and proceeds to refine its hypothesis by expanding S using a grammar rule which has S as its left-hand side (see rewriting process), successively refining the non-terminals so produced, and so on until there are no non-terminals left (only terminals).
See also predictive parser.
A verb that can take a single syntactic object, like eat, as in "He ate the pizza". Sometimes transitive verbs appear without their object, as in "He ate slowly" - the distinguishing characteristic of transitive verbs is that they can take an object (unlike intransitive verbs), and they cannot take two objects (as ditransitive verbs can). See also subcategorization.
A trigram is a triple of things, but usually a triple of lexical categories. Suppose that we are concerned with three lexical categories L1, L2 and L3. The term trigram is used in statistical NLP in connection with the conditional probability that a word will belong to L3 given that the preceding words were in L1 and L2. This probability is written Pr(L3 | L2 L1), or more fully Prob(wi ∈ L3 | wi–1 ∈ L2 & wi–2 ∈ L1). For example, in the phrase "The green flies", given that The is tagged with ART, and green with ADJ, we would be concerned with the conditional probabilities Pr(N | ADJ ART) and Pr(V | ADJ ART), given that flies can be tagged with N and V. See also bigram and n-gram.
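To continue the example with purely invented numbers: if a tagged corpus yielded Pr(N | ADJ ART) = 0.8 and Pr(V | ADJ ART) = 0.1, then (other things being equal) a trigram-based tagger would prefer to tag flies as N in "The green flies", treating the phrase as a noun phrase rather than as a statement about green things that fly.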
U
See Chomsky hierarchy.
V
V
symbol used in grammar rules for a verb.
feature used as part of the logical form generation system described in lectures. It is used to provide a "discourse variable" that corresponds to the constituent it belongs to. It is useful in handling certain types of modifiers - for example, if we have a ball b1 and it turns out to be red, then we can assert (in the logical form for "red ball") that the object is both red and a ball, by including &(ball1(b1), red1(b1)). Grammar rules, once augmented to handle logical forms, usually give explicit instructions on how the var feature is propagated. For example, the rule for intransitive VPs:
VP(var(?v), sem(lambda(A2, ?semv(?v, A2)))) →
    V(subcat(none), var(?v), sem(?semv))
indicates that the VP's var feature is derived from that of its V subconstituent, and shows how the feature (?v) is also incorporated into the sem of the VP.
var features are described in Allen on page 268 ff.
A word describing an action or state or attitude. Examples of each of these would be "ate" in "Jane ate the pizza", "is" in "Jane is happy", and "believed" in "Jane believed Paul ate the pizza".
Verbs are one of the major sources of inflection in English, with most verbs having five distinct forms (like "eat", with "eat"/"eats"/"eating"/"ate"/"eaten"). The verb "be" is the most irregular, with forms "be", "am", "is", "are", "being", "was", "were", "been", plus some archaic forms, like "art" as in "thou art".
Verb is often abbreviated to V.
V is a lexical grammatical category.
The structure which follows the verb or verb group in a sentence.
Example                           | Type of complement
Jane laughed.                     | empty
Jane ate the pizza.               | NP
Jane believed Paul ate the pizza. | S
Jane wanted to eat the pizza.     | to+VP
Jane gave Paul the pizza.         | NP+NP
Jane was happy to eat the pizza.  | ADJP
See also verb phrase.
This term is used for a sequence of words headed by a verb together with auxiliaries, and possibly adverbs and the negative particle "not".
For example, in "Jane may not have eaten all the pizza", the verb group is "may not have eaten".
Verb Phrase is a phrasal grammatical category. Verb phrase is usually abbreviated to VP. A verb phrase normally consists of a verb or verb group and a complement, together possibly with adverbial modifiers and PP modifiers. The simplest complements are noun phrases, but sentential complements and similar structures are also possible.
The logical form of a verb phrase is a lambda-expression. For example, the logical form of "likes pizza" would be something like λ(X, likes1(st1, X, pizza1)), where st1 is the var feature variable for the state of liking (pizza), and likes1 and pizza1 are the semantic interpretations of the verb "likes" and the noun "pizza", respectively.
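Applying the lambda-expression to a subject meaning then assembles the logical form of a sentence: applying λ(X, likes1(st1, X, pizza1)) to john1 (an assumed constant standing for "John") yields likes1(st1, john1, pizza1) as the logical form of "John likes pizza".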
See also predicate.
A feature of verbs that signifies what form of the verb is present - particularly useful with verbs that are irregular in some of their forms, or where a particular form of the verb is required by a particular syntactic rule (for example, modal auxiliaries force the infinitive form of the verb - VFORM inf).
VFORM   | Example                                                | Comment
base    | break, be, set, decide                                 | base form
pres    | break, breaks, am, is, are, set, sets, decide, decides | simple present tense
past    | broke, was, were, set, decided                         | simple past tense
fin     | -                                                      | finite = tensed = pres or past
ing     | breaking, being, setting, deciding                     | present participle
pastprt | broken, been, set, decided                             | past participle
inf     | -                                                      | used for infinitive forms with to
object.
The Viterbi algorithm is an algorithm applicable in a range of situations that allows a space that apparently has an exponential number of points in it to be searched in polynomial time.
The Viterbi algorithm was not actually described in detail in COMP9414, but was referred to in the section on statistical NLP in connection with a method for finding the most likely sequence of tags for a sequence of words. Reference: Allen p. 201 ff., especially from p. 202.
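The essential idea, stated here (as a supplement to the text above) for the tagging problem: if v(t, L) is the probability of the best tag sequence for the first t words that ends in tag L, then
v(1, L) = Pr(L | start) × Pr(word1 | L)
v(t, L) = max over L' of [ v(t−1, L') × Pr(L | L') × Pr(wordt | L) ]
Recording which L' achieved each maximum allows the best overall tag sequence to be read off at the end. The work is proportional to (number of words) × (number of tags)², rather than exponential in the number of words.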
symbol used in grammar rules for a verb phrase.
W
wh-question
A question introduced by a wh-word such as "who", "what", "which", "when", "where", "why", or "how" - for example, "What did Jane eat?". Contrast y/n question.
Words are units of language. They are built of morphemes and are used to build phrases (which are in turn used to build sentences).
·        See also lexeme
·        See also terminal symbol
One of several possible meanings for a word, particularly one of several with the same part of speech. For example, dog as a noun has at least the following senses: canine animal ("the dog barked"), a type of fastening ("dog the hatches"), a low person ("You filthy dog!"), and a constellation ("Canis Major is also called the Great Dog").
A kind of ambiguity where what is in doubt is what sense of a word is intended. One classic example is in the sentence "John shot some bucks". Here there are (at least) two readings - one corresponding to interpreting "bucks" as meaning male deer, and "shot" meaning to kill, wound or damage with a projectile weapon (gun or arrow), and the other corresponding to interpreting "shot" as meaning "waste", and "bucks" as meaning dollars. Other readings (such as damaging some dollars) are possible but semantically implausible. Notice that all readings mentioned have the same syntactic structure, as in each case, "shot" is a verb and "bucks" is a noun.
See also structural ambiguity and referential ambiguity.
X
Y
y/n question
A question that expects the answer "yes" or "no" - for example, "Did Jane eat the pizza?". Contrast wh-question.
Z
Not a part of COMP9414 Artificial Intelligence, but it allows us to avoid having an empty list of Z-concepts in the NLP Dictionary. :-)
A zeugma is a syllepsis in which the single word fails to give meaning to one of its pair. Example: "She greeted him with arms and expectations wide."
One of a couple of dozen little-used terms for figures of speech.

