An abstract noun is a noun that does not describe a physical object, for example philosophy. Contrast concrete noun.
An accepter is a program (or algorithm) that takes as input a grammar and a string of terminal symbols from the alphabet of that grammar, and outputs yes (or something equivalent) if the string is a sentence of the grammar, and no otherwise. Contrast parser.
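To make this concrete, here is a minimal sketch of an accepter in Prolog DCG notation; the toy grammar and the predicate name accept/1 are illustrative assumptions, not part of the course material.

% A toy context-free grammar in DCG notation.
s --> np, vp.
np --> [the], [cat].
vp --> [miaowed].

% accept(+Words) outputs "yes" (succeeds) if Words is a sentence
% of the grammar, and "no" (fails) otherwise.
accept(Words) :- phrase(s, Words).

The query ?- accept([the, cat, miaowed]). succeeds, while ?- accept([miaowed, the, cat]). fails. A parser would additionally return the structure it found.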
An active arc is a structure used by a chart parser as it attempts to parse a sentence. It is derived ultimately from a rule of the grammar being used, and consists of:
· a name for the arc,
· a type - the phrasal category being sought,
· a list of found constituents, i.e. those constituents required by the grammar rule that have already been found,
· a list of the types of the constituents not yet found,
· a from position, indicating the position in the sentence of the start of the first found constituent, and
· a to position, indicating the position in the sentence of the end of the last found constituent.
The symbol → is used to separate the type from the list of found constituents, and a dot is used to separate the list of found constituents from the list of types of constituents not yet found.
= chart (in a chart parser)
Sentences
in English may be in active or passive form. The active form makes the one who is
performing the action in the sentence (termed the agent in semantics) the grammatical subject. This serves to focus
attention on the agent. Example: "John ate the pizza". The
alternative, the passive voice, makes the thing acted on into the grammatical
subject, thus focussing attention on that thing, rather than on the agent.
Example: "The pizza was eaten by John." Many writers appear to believe
that use of the passive seems more formal and dignified, and consequently it is
over-used in technical writing. For example, they might write "The
following experiments were performed" when it would be clearer to say
"We [i.e. the authors] performed the following experiments."
Contrast mood, tense, and aspect.
ADJ: symbol used in grammar rules for an adjective.
An adjective is a word
that modifies a noun by specifying an attribute of the noun. Examples include
adjectives of colour, like red, size or shape, like round or large,
along with thousands of less classifiable adjectives like willing, onerous,
etc. In grammar rules, we use the symbol ADJ for the pre-terminal category of adjectives.
Adjectives
are also used as the complements of sentences with verbs like
"be" and "seem" - "He is happy", "He seems
drunk".
ADJ
is a lexical grammatical
category.
Adjective Phrase (or
adjectival phrase) is a phrasal grammatical category. Adjective phrase is usually abbreviated to
ADJP. They range from simple adjectives (like "green" in "the
green grass") through short lists of adjectives possibly modified by an
adverb or so (like "really large, cream" in "that really large,
cream building") to fairly complicated constructs like "angry that he
had been ignored" in "Jack was angry that he had been ignored".
The longer adjective phrases frequently take the form of an adjective
followed by a complement, which might be a "that"+Sentence
complement (as in "angry that he had been ignored"), or a PP complement or a "to"+VP complement.
The longer ADJPs are most often found as complements of verbs such as "be" and "seem".
ADJP: symbol used in grammar rules for an adjective phrase.
ADV: symbol used in grammar rules for an adverb.
An adverb is a word that
modifies a verb, ("strongly", in "she swam strongly") an adjective, ("very", in "a very strong swimmer") or
another adverb ("very", in "she swam very strongly").
Many
adverbs end with the morpheme -ly, which converts an adjective X into an adverb meaning something like "in an X
manner" - thus "bravely" = "in a brave manner". Other
adverbs include intensifiers like "very" and
"extremely". There are also adverbs of time (like "today",
"tomorrow", "then" - as in "I gave him the book
then"), frequency ("never", "often"), and place
("here", "there", and "everywhere").
ADV
is a lexical grammatical
category.
Adverbial phrases are
phrases that perform one of the functions of an adverb. They include simple
phrases that express some of the same types of concepts that a single adverb
might express, such as frequency - "every week", duration - "for
three weeks", time - "at lunchtime", and manner - "this
way" ("Do it this way"), or "by holding his head under
water for one minute".
Adverbial
Phrase is a phrasal grammatical
category. Adverbial phrase is
usually abbreviated to ADVP.
ADVP: symbol used in grammar rules for an adverbial phrase.
AGENT is a case used in logical forms. It signifies the entity that is acting in an event.
It normally corresponds to the syntactic subject of an active voice declarative sentence. In the logical form for a state
description, the term EXPERIENCER is used for the corresponding entity.
AGENTs appear in the frame-like structures used to describe logical forms:
e.g. the following, representing "John breaks it with the hammer":
break1(e1,
       agent[name(j1, 'John')],
       theme[pro(i1, it1)],
       instr[the<h1, hammer1>])
Agreement is the
phenomenon in many languages in which words must take certain inflections
depending on the company they keep. A simple case occurs with verbs in the
third person singular form and their singular subjects: "Jane likes
cheese" is correct, but * "Jane like cheese" and * "My dogs
likes cheese" are not, because the subjects and verbs do not agree on the number feature. The name used in the lecture notes for the agreement feature is agr.
The possible values of the agr feature are 1s, 2s, 3s, 1p, 2p,
3p, signifying 1st person singular, 2nd person singular, ..., 3rd person
plural. Pronouns like "I" and "me" have agr=1s,
"you" has agr={2s,2p} as it is not possible to
distinguish singular from plural in this case, and so on. Definite noun phrases
like "the green ball" have agr=3s.
This refers to the book by James Allen, Natural Language Understanding, second edition, Benjamin/Cummings, 1995.
The "alphabet"
of a grammar is the set of symbols that it uses, including the terminal symbols (which are like words) and the non-terminal symbols which include the grammatical categories
like N (noun), V (verb), NP (noun phrase),
S (sentence), etc.
See
also context-free grammar, and context-sensitive grammar.
An ambiguity is a
situation where more than one meaning is possible in a sentence. We consider
three types of ambiguity:
· word-sense ambiguity
· structural ambiguity
· referential ambiguity
There
can be situations where more than one of these is present.
An anaphor is an
expression that refers back to a previous expression in a natural language
discourse. For example: "Mary died. She was very
old." The word she refers to Mary, and is
described as an anaphoric reference to Mary. Mary is
described as the antecedent of she. Anaphoric
references are frequently pronouns, as in the example, but may also be definite
noun phrases, as in: "Ronald Reagan frowned. The President was clearly
worried by this issue." Here The President is an
anaphoric reference to Ronald Reagan. The antecedent may in some cases not be explicitly mentioned in a previous sentence - as in "John got out his pencil. He found that the lead was broken." The lead here refers to a subpart of his pencil. The antecedent need not be in the immediately preceding sentence; it could be further back, or in the same sentence, as in "John got out his pencil, but found that the lead was broken." In all our
examples so far the anaphor and the antecedent are noun phrases, but VP and
sentence-anaphora is also possible, as in "I have today dismissed the
prime minister. It was my duty in the circumstances." Here It is
an anaphoric reference to the VP dismissed the prime minister.
For a fairly complete and quite entertaining
treatment of anaphora, see Hirst, G. Anaphora in Natural Language
Understanding: A Survey Springer Lecture Notes in Computer Science
119, Berlin: Springer, 1981.
A feature of some noun phrases. It indicates that the thing described by the noun phrase is
alive, and so capable of acting, i.e. being the agent of some act. This feature
could be used to distinguish between The hammer broke the window and The
boy broke the window - in the former, the hammer is not animate, so
cannot be the agent of the break action (it is in fact the instrument), while the boy is animate, so can be the agent.
See anaphor.
A grammatical relation
between a word and a noun phrase that
follows. It frequently expresses equality or a set membership relationship. For
example, "Rudolph the red-nosed reindeer [had a very shiny nose]" -
here Rudolph = the unique red-nosed reindeer. Another example,
"Freewheelin' Franklin, an underground comic-strip character, [was into
drugs and rock music]", expresses a set membership relation:
Freewheeling_Franklin in "underground comic-strip characters".
Words like
"the", "a", and "an" in English. They are a kind
of determiner. See also the quantifying logical operator THE.
The phrase "I am
reading" is in the progressive aspect, signifying that the
action is still in progress. Contrast this with "I read", which probably does not refer to an action that is currently in progress. Aspect
goes further than this, but we shall not pursue the details of aspect in this
subject. If interested, you could try Huddleston, R., "Introduction to the
Grammar of English" Cambridge, 1984, pp. 157-158 and elsewhere.
= augmented
transition network
An augmented grammar is
what you get if you take grammar rules (usually from a context-free grammar) and add extra information to them, usually in the form of feature information. For example, the grammar rule s → np vp can be
augmented by adding feature information to indicate that, say, the agr feature for the vp and the np must agree:
s(agr(?agr)) →
np(agr(?agr)) vp(agr(?agr))
In
Prolog, we would write something like:
s(P1, P3, Agr) :- np(P1, P2, Agr), vp(P2, P3, Agr).
Actually, this is too restrictive - the agr feature
of a VP, in particular, is usually fairly ambiguous - for example the verb
"love" (and so any VP of which it is the main verb) has agr=[1s,2s,1p,2p,3p],
and we would want it to agree with the NP "we" which has agr=[1p].
This can be achieved by computing the intersection of the agr of
the NP and the VP and setting the agr of the S to be this
intersection, provided it is non-empty. If it is empty, then the S goal should
not succeed.
s(P1, P3, SAgr) :-
    np(P1, P2, NPAgr),
    vp(P2, P3, VPAgr),
    intersection(NPAgr, VPAgr, SAgr),
    nonempty(SAgr).
where intersection computes the intersection of two lists
(regarded as sets) and binds the third argument to this intersection, and nonempty succeeds if its
argument is not the empty list.
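A minimal sketch of the missing pieces, assuming SWI-Prolog (whose lists library provides intersection/3), a difference-list encoding of the positions P1, P2, P3, and two hypothetical lexical rules:

:- use_module(library(lists)).  % provides intersection/3

% nonempty(+List) succeeds if List is not the empty list.
nonempty([_|_]).

% Hypothetical lexical rules; agr values are held as lists of atoms.
np([we|Rest], Rest, ['1p']).
vp([love|Rest], Rest, ['1s','2s','1p','2p','3p']).

s(P1, P3, SAgr) :-
    np(P1, P2, NPAgr),
    vp(P2, P3, VPAgr),
    intersection(NPAgr, VPAgr, SAgr),
    nonempty(SAgr).

With these definitions, ?- s([we, love], [], Agr). succeeds with Agr = ['1p'].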
Augmented
grammar rules are also used to record sem and var features
in computing logical forms, and to express the relationship between the sem and var of
the left-hand side and the sem(s) and var(s) of the
right-hand side. For example, for the rule vp → v (i.e. an intransitive verb),
the augmented rule with sem feature could be:
vp(sem(lambda(X,
?semv(?varv, X))), var(?varv)) →
v(subcat(none), sem(?semv), var(?varv))
where subcat(none) indicates that this only works with an intransitive
verb.
A
parsing formalism for augmented context free grammars. Not covered in current
version of COMP9414, but described in Allen.
AUX: symbol used in grammar rules for an auxiliary verb.
A "helper"
verb, not the main verb. For example, in "He would have read the
book", "would" and "have" are auxiliaries. A
reasonably complete list of auxiliary verbs in English is:
Auxiliary                          Example
do/does/did                        I did read
have/has/had/having                He has read
be/am/are/is/was/were/been/being   He is reading
shall/will/should/would            He should read
can, could                         She can read
may, might, must                   She might read
Complex groupings of auxiliaries can occur, as
in "The child may have been being taken to the
movies".
Some auxiliaries (do, be, and have)
can also occur as verbs in their own right.
Auxiliary verb is often abbreviated to AUX.
AUX is a lexical grammatical category.
Bayes' rule relates the conditional probability Pr(A | B) to Pr(B | A) for two events A and B. The rule states that
Pr(A | B) = Pr(B | A) × Pr(A) / Pr(B)
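As a minimal sketch, the rule can be coded directly in Prolog; the predicate name bayes/4 is just for illustration:

% bayes(+PrBgivenA, +PrA, +PrB, -PrAgivenB):
% computes Pr(A | B) = Pr(B | A) * Pr(A) / Pr(B).
bayes(PrBgivenA, PrA, PrB, PrAgivenB) :-
    PrAgivenB is PrBgivenA * PrA / PrB.

Using the horse-racing figures from the conditional probability entry, ?- bayes(0.5, 0.3, 0.2, P). gives P = 0.75, i.e. Pr(Rain | Win) = 0.75.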
BELIEVE is a modal operator in
the language for representing logical forms.
BELIEVE and other operators like it have some unexpected properties such as failure of substitutivity. For more details, read page 237 in Allen. Page 542 ff. provides yet more on belief in NLP (but this
material is well beyond the needs of COMP9414).
A bigram is a pair of
things, but usually a pair of lexical categories. Suppose that we are concerned
with two lexical categories L1 and L2. The term bigram is used in statistical
NLP in connection with the conditional probability that a word will belong to L2 given that
the preceding word was in L1. This probability is written Pr(L2 | L1), or more
fully Prob(w[i] in L2 | w[i-1] in L1). For example, in the phrase "The
flies", given that The is tagged with ART, we would be concerned with the conditional
probabilities Pr(N | ART) and Pr(V | ART) given that flies can
be tagged with N and V.
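A minimal sketch of how such a probability might be estimated from a tagged corpus, here reduced to a plain list of tags; the predicate names are hypothetical:

% count_pairs(+A, +B, +Tags, -N): N is the number of adjacent
% occurrences of tag A followed by tag B in the list Tags.
count_pairs(_, _, [], 0).
count_pairs(A, B, [A,B|T], N) :- !, count_pairs(A, B, [B|T], N0), N is N0 + 1.
count_pairs(A, B, [_|T], N) :- count_pairs(A, B, T, N).

% count_tag(+A, +Tags, -N): N is the number of occurrences of A in Tags.
count_tag(_, [], 0).
count_tag(A, [A|T], N) :- !, count_tag(A, T, N0), N is N0 + 1.
count_tag(A, [_|T], N) :- count_tag(A, T, N).

% bigram_prob(+L1, +L2, +Tags, -P): estimate Pr(L2 | L1) by counting.
bigram_prob(L1, L2, Tags, P) :-
    count_pairs(L1, L2, Tags, N12),
    count_tag(L1, Tags, N1),
    N1 > 0,
    P is N12 / N1.

For example, ?- bigram_prob(art, n, [art, n, v, art, adj, n], P). gives P = 0.5, since one of the two occurrences of art is followed directly by n.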
bottom-up parser
The
chart parser described in lectures is a bottom-up parser, and can parse
sentences, using any context-free grammar, in cubic time: i.e., in time
proportional to the cube of the number of words in the sentence.
A
bound morpheme is a prefix or suffix, which cannot stand as a word in its own
right, but which can be attached to a free morpheme and
modify the meaning of the free morpheme. For example, "happy" is a
free morpheme, which becomes "unhappily" when the prefix
"un-", and suffix "-ly", both bound morphemes, are
attached.
Number words like one,
two, four, twenty, fifty, hundred, million. Contrast ordinal.
The term case is
used in two different (though related) senses in NLP and linguistics.
Originally it referred to what is now termed syntactic case.
Syntactic case essentially depends on the relationship between a noun (or noun
phrase) and the verb that governs it. For example, in "Mary ate the
pizza", "Mary" is in the nominative or subject case, and
"the pizza" is in the accusative or object case. Other languages may
have a wider range of cases. English has remnants of a couple more cases -
genitive (relating to possession, as with the pronoun "his") and
dative (only with ditransitive verbs - the indirect object of the verb is said to be in the
dative case).
Notice
that in "The pizza was eaten by Mary", "the pizza" becomes
the syntactic subject, whereas it was the syntactic object in the equivalent
sentence "Mary ate the pizza".
With semantic
case, which is the primary sense in which we are concerned with the term case in
COMP9414, the focus is on the meaning-relationship between the verb and the
noun or noun phrase. Since this does not change between "Mary ate the
pizza" and "The pizza was eaten by Mary", we want to use the
same semantic case for "the pizza" in both sentences. The term used
for the semantic case of "the pizza" is theme. Similarly, the semantic case of "Mary" in both
versions of the sentence is agent. Other cases frequently
used include instrument, coagent, experiencer, at-loc, from-loc, and to-loc, at-poss, from-poss,
and to-poss, at-value, from-value, and to-value, at-time, from-time,
and to-time, and beneficiary.
Semantic
cases are also referred to as thematic roles.
Opposite of anaphor, and much rarer in
actual language use. A cataphor is a phrase that is explained by text that
comes after the phrase. Example: "Although he loved
fishing, Paul went skating with his girlfriend." Here he is
a cataphoric reference to Paul.
= context-free
grammar
A chart is a data
structure used in parsing. It consists of a collection of active
arcs (sometimes also
called edges), together with a collection of constituents (sometimes also called inactive arcs or inactive edges).
chart parsing
A
chart parser is a variety of parsing algorithm that maintains a table of well-formed
substrings found so far in the sentence being parsed. While the chart
techniques can be incorporated into a range of parsing algorithms, they were
studied in lectures in the context of a particular bottom-up parsing algorithm.
That algorithm will now be summarized:
to parse a sentence S
using a grammar G and lexicon L:
1. Initially there are no constituents or active
arcs
2. Scan the next word w of the
sentence, which lies between positions i and i+1 in
the sentence.
3. Look up the word w in the
lexicon L. For each lexical category C to which w belongs,
create a new constituent of type C, from i to i+1.
4. Look up the grammar G. For each category C found
in the step just performed, and each grammar rule R whose
right-hand side begins with C, create a new active arc whose rule
is R, with the dot in the rule immediately after the first category
on the right-hand side, and from i to i+1.
5. If any of the active arcs can have their dots
advanced (this is only possible if the arc was created in a previous cycle of
this algorithm) then advance them.
6. If any active arcs are now completed (that is,
the dot is now after the last category on the right-hand side of the active
arc's rule), then convert that active-arc to a constituent (or inactive arc),
and go to step 4.
7. If there are any more words in the sentence, go
to step 2.
to check if an active
arc can have its dot advanced
1. Let the active arc be ARCx: C → C[1] ... C[j] . C[j+1] ... C[n] from m to n.
2. If there is a constituent of type C[j+1] from n to p, then the dot can be advanced.
The resulting new active arc will be:
ARCy: C → C[1] ... C[j+1] . C[j+2] ... C[n] from m to p
where y is a natural number that has not yet been used in an arc-name.
Example: For the active arc ARC2: NP → ART1 . ADJ N from 2 to 3, if there is a constituent ADJ2: ADJ → "green" from 3 to 4 (so that the to position, 3, of the active arc and the type, ADJ, immediately after its dot match the from position, 3, and the type, ADJ, of the constituent ADJ2), then the active arc ARC2 can be extended, i.e. have its dot advanced, creating a new active arc, say ARC3: NP → ART1 ADJ2 . N from 2 to 4.
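The dot-advancing operation can be sketched in Prolog. The term shapes arc(Name, Type, Found, ToFind, From, To) and constituent(Name, Type, From, To) are representational assumptions made for this illustration:

% extend_arc(+Arc, +Constituent, -NewArc): advance the dot of Arc over
% Constituent, provided the constituent's type is the next type sought
% and its from position equals the arc's to position.
extend_arc(arc(_Name, Type, Found, [Next|Rest], From, To),
           constituent(CName, Next, To, P),
           arc(new, Type, NewFound, Rest, From, P)) :-  % a real parser would generate a fresh name
    append(Found, [CName], NewFound).

Repeating the example above: ?- extend_arc(arc(arc2, np, [art1], [adj, n], 2, 3), constituent(adj2, adj, 3, 4), A). gives A = arc(new, np, [art1, adj2], [n], 2, 4).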
The
Chomsky hierarchy is an ordering of types of grammar according to
generality. The classification in fact only depends on the type of grammar
rule or production used. The grammar types described
in COMP9414 included:
· unrestricted grammars (rules of the form a → b with no restrictions on the strings a and b)
· context-sensitive grammars (rules of the form a → b with the restriction length(a) <= length(b))
· context-free grammars (rules of the form X → b where X is a single non-terminal symbol)
· regular grammars (rules of the form X → a and X → aN where X and N are non-terminal symbols, and a is a terminal symbol)
Named
after the linguist Noam Chomsky.
CNP: symbol used in grammar rules for a common noun phrase.
You really need to know
what an agent is before proceeding. A co-agent is someone who acts with the
agent in a sentence. In a sentence with a prepositional phrase introduced by the preposition with, an animate object of the preposition is likely to be a coagent: in "Jane ate the pizza with her mother", her mother is the coagent.
co-refer
common noun
A common noun is a noun that describes a type, for example woman or philosophy, rather than an individual, such as Amelia Earhart. Contrast proper noun.
A common noun phrase is
a phrasal grammatical category of chiefly technical significance.
Examples include "man" "big man" "man with the
pizza", but not these same phrases with "the" or "a"
in front - that is, "the man with the pizza", etc., are NPs, not CNPs. The need for the category CNP as a separate named
object arises from the way articles like
"the" act on a CNP. The word "the", regarded as a natural
language quantifier, acts on the whole of the CNP that it precedes:
it's "the[man with the pizza]", not "the[man] with the
pizza". For this reason, it makes sense to make phrases like "man
with the pizza" into syntactic objects in their own right, so that the
semantic interpretation phase does not need to reorganize the structural
description of the sentence in order to be able to interpret it.
A complement is a
grammatical structure required in a sentence, typically to complete the meaning of a verb or adjective. For example, the verb "believe" can
take a sentential complement, that is, be followed by a sentence,
as in "I believe you are standing on my foot."
There
is a wide variety of complement structures. Some are illustrated in the entry
for subcategorization.
An
example of an adjective with a complement is "thirsty for blood", as
in "The football crowd was thirsty for blood after the home team was
defeated." This is a PP-complement. Another would be "keen to get out
of the stadium", a TO-INF complement, as in "The away-team supporters
were keen to get out of the stadium."
Compositional
semantics signifies a system of constructing logical forms for
sentences or parts of sentences in such a way that the meanings of the
components of the sentence (or phrase) are used to construct the meanings of
the whole sentence (or whole phrase). For example, in "three brown
dogs", the meaning of the phrase is constructed in an obvious way from the
meanings of three, brown and dogs. By way of
contrast, a phrase like "kick the bucket" (when read as meaning
"die") does not have compositional semantics, as the meaning of the whole
("die") is unrelated to the meanings of the component words.
The
semantic system described in COMP9414 assumes compositional semantics.
A
concrete noun is a noun that describes a physical object, for example apple.
Contrast abstract noun.
The conditional
probability of event B given event A is the probability that B
will occur given that we know that A has occurred. The example used in lecture
notes was that of a horse Harry that won 20 races out of 100 starts, but of the
30 of these races that were run in the rain, Harry won 15. So while the
probability that Harry would win a race (in general) would be estimated as
20/100, the conditional probability Pr(Win | Rain) would be estimated as 15/30
= 0.5. The formal definition of Pr(B | A) is Pr(B & A) / Pr(A). In the case
of B = Win and A = Rain, Pr(B & A) is the probability that it will be
raining and Harry will win (which on the data given above is 15/100), while
Pr(A) is the probability that it will be raining, or 30/100. So again Pr(B | A)
= 0.15/0.30 = 0.5
CONJ: symbol used in grammar rules for a conjunction.
A
conjunction is a word used to join two sentences together to make a larger
sentence. Conjunctions include coordinate conjunctions, like
"and", "or" and "but": "Jim is happy and
Mary is proud", "India will win the test match or I'm a monkey's
uncle".
There are also subordinate conjunctions,
like "if" and "when", as in "I will play with you if you
will lend me your marbles" and "I will lend you this book when you
return the last one you borrowed".
Conjunctions may also be used to join nouns,
adjectives, adverbs, verbs, phrases ...
Examples:
nouns: Boys and girls [come out to play].
adjectives: [The team colours are] black and yellow.
adverbs: [He was] well and truly [beaten].
verbs: [Mary] played and won [her match].
phrases: across the river and into the trees; [She] fell down and hit her head.
Conjunction is often abbreviated to CONJ.
CONJ is a lexical grammatical category.
A
constituent, in parsing, is a lexical or phrasal category that has been found in a sentence being parsed, or
alternatively one that is being sought for but has not yet been found.
See active arc. When an active arc is completed (when all its sub-constituents
are found), the active arc becomes a constituent.
Constituents are used to create new active arcs - when there is a constituent X1 of type X, and a grammar rule whose right-hand side starts with the grammar symbol X, then a new active arc based on that rule may be created, with the constituent X1 listed as a found constituent for the active arc (the only one, so far).
The components of a constituent, as recorded in the chart parsing algorithm described in lectures, are illustrated below for the example NP1: NP → ART1 ADJ1 N1 from 0 to 3:

component      example        notes
name           NP1            usually formed from the type + a number
type           NP             a phrasal or lexical category of the grammar
decomposition  ART1 ADJ1 N1   ART1, ADJ1 and N1 would be the names of other constituents already found
from           0              sentence position of the left end of this NP
to             3              sentence position of the right end of this NP
context-free
A context-free grammar is defined to be a 5-tuple (P, A, N, T, S) with components as follows:
P: a set of grammar rules or productions, that is, items of the form X → a, where X is a member of the set N (that is, a non-terminal symbol) and a is a string over the alphabet A. An example would be the rule NP → ART ADJ N, which signifies that a Noun Phrase can be an ARTicle followed by an ADJective followed by a Noun, or N → horse, which signifies that horse is a Noun. NP, ART, ADJ, and N are all non-terminal symbols, and horse is a terminal symbol.
A: the alphabet of the grammar, equal to the disjoint union of N and T.
N: the set of non-terminal symbols (i.e. grammatical or phrasal categories).
T: the set of terminal symbols (i.e. words of the language that the grammar defines).
S: a distinguished non-terminal, normally interpreted as representing a full sentence (or program, in the case of a programming language grammar).
context-sensitive
A
context-sensitive grammar is a grammar with context-sensitive rules. There are
two equivalent formulations of the definition of a context-sensitive grammar
rule (cf. Chomsky hierarchy):
· rules of the form a → b where a and b are strings of alphabet symbols, with the restriction that length(a) <= length(b)
· rules of the form l X r → l b r where l, r, and b are (possibly empty) strings of alphabet symbols, and X is a non-terminal. l and r are referred to as the left and right context for X → b in the context-sensitive rule.
Context-sensitive
grammars are more powerful than context-free grammars, but they are much harder
to work with.
A corpus is a large body
of natural language text used for accumulating statistics on natural language
text. The plural is corpora. Corpora often include extra
information such as a tag for each word indicating its part-of-speech, and perhaps the parse tree for each sentence.
See
also statistical NLP.
A noun of a type that can
be counted. Thus horse is a count noun, but water is
not. Contrast mass noun.
= context-sensitive
grammar
= discourse
entity
DE list
= indicative.
A
kind of determiner, that is, an ingredient of noun phrases.
This class of words includes "this", "that",
"these", and "those". They are part of the reference system of English. That is, they is used to tell which of a
number of possibilities for the interpretation of the rest of the noun phrase is in fact intended. Demonstratives are most useful in
spoken language, and are often accompanied by a pointing gesture.
A derivation of a sentence of a grammar is, in effect, a proof that the sentence can be derived from
the start symbol of the grammar using the grammar rules and
a rewriting process. For example, given the grammar 1. S → NP VP, 2.
NP → ART N, 3. VP → V, and lexical rules 4. ART → "the", 5. N →
"cat", and 6. V → "miaowed", we can derive the sentence
"the cat miaowed" as follows:
S ⇒ NP VP (rule 1)
  ⇒ ART N VP (rule 2)
  ⇒ the N VP (rule 4)
  ⇒ the cat VP (rule 5)
  ⇒ the cat V (rule 3)
  ⇒ the cat miaowed (rule 6)
One can then write S ⇒* "the cat miaowed": i.e. ⇒* is the symbol for the derivation
relation. The symbol ⇒ is referred to as direct
derivation. A sentential form is any string that can be
derived (in the sense defined above) from the start symbol S.
The sense in which the
term grammar is primarily used in Natural Language Processing. A grammar is a
formalism for describing the syntax of a language. Contrast prescriptive grammar.
Determiners are one of
the ingredients of noun phrases. Along with cardinals and ordinals, they make up the set of specifiers,
which assist in reference - that is, determining exactly which of
several possible alternative objects in the world is referred to
by a noun phrase. They come in several varieties - articles, demonstratives, possessives,
and quantifying determiners.
A discourse entity (DE)
is a something mentioned in a sentence that could act as a possible antecedent
for an anaphoric reference, e.g. noun phrases, verb phrases and sentences.
For example, with the sentence "Jack lost his wallet in his car", the
DEs would include representations of "Jack" "his wallet",
"his car", "lost his wallet in his car" and the whole
sentence. The whole sentence could serve as the antecedent for "it"
in a follow-up sentence like "He couldn't understand it" (while
"Jack" would be the antecedent of "He").
Sometimes
discourse entities have a more complex relation to the text. For example, in
"Three boys each bought a pizza", clearly "Three boys"
gives rise to a DE that is a set of three objects of type boy (B1: |B1| = 3 and B1
subset_of {x|Boy(x)}), but "a pizza", in this context, gives rise to
a representation of a set P1 of three pizzas (whereas in the usual case "a
pizza" would give rise to a DE representing a single pizza.)
P1 = {p | pizza(p) and exists(b) : Boy(b) and p = pizza_bought_by(b)}.
The function "pizza_bought_by" is the Skolem function referred to in lectures as "sk4".
See history list.
See context-free grammar.
A
verb in English that can take two objects, like give, as in "He gave his
mother a bunch of flowers". Here "his mother" is the indirect
object and "a bunch of flowers" is the direct object.
The same sentence can also be expressed as "He gave a bunch of flowers to his
mother", with the direct and indirect objects in the opposite order, and
the indirect object marked by the preposition "to".
The preposition in such cases is usually "to" or "for" (as in "He bought his mother a bunch of flowers" = "He bought a bunch of flowers for his mother").
Ditransitive verbs can appear with just one or even no syntactic objects ("I gave two dollars", "I gave at the office") - their distinguishing characteristic is that they can have two objects, unlike intransitive and transitive verbs.
Ellipsis refers to
situations in which sentences are abbreviated by leaving out parts of them that
are to be understood from the context. For example, if someone asks "What
is your name?" and the reply is "John Smith" then this can be
viewed as an elliptical form of the full sentence "My name is John Smith".
Ellipsis causes problems for NLP since it is
necessary to infer the rest of the sentence from the context.
"ellipsis" is also the name of the
symbol "..." used when something is omitted from a piece of text, as
in "Parts of speech include nouns, verbs, adjectives, adverbs, determiners,
... - the list goes on and on."
"elliptical" is the adjectival form of
"ellipsis".
An
embedded sentence is a sentence that is contained inside another sentence. Some
examples, with the embedded sentence in italics:
· John believes that Mary likes pizza
· If Mary likes pizza then she may come to our pizza party.
· If Joan liked pizza then she would come to our pizza party.
A noun phrase is said to evoke a discourse entity if the noun phrase refers to something related to a previously mentioned discourse entity, but not itself already mentioned. For example, in "Jack lost his wallet in his
car. Later he found it under the front seat.", the phrase "the front
seat" evokes a discourse entity that has not actually been mentioned, but
which is in a sense already present as part of the DE
created by the phrase "his car".
See
also anaphor.
"exists" is a
textual way of writing the existential quantifier, which is
otherwise written as a back-to-front capital E. It corresponds fairly closely
to the English word "some". Thus,
exists(X, likes(X, spinach))
would be read as "for some entity X, X
likes spinach" or just "something likes spinach". This might be
too broad a statement, as it could be satisfied, for example, by a snail X that
liked spinach. It is common therefore to restrict the proposition to something
like:
exists(X, is_person(X) and likes(X, spinach))
i.e. "Some person likes icecream."
That is, we are restricting the type of X to persons. In some cases, it is more
reasonable to abbreviate the type restriction as follows:
exists(X : person, likes(X, spinach))
See also forall and Skolem functions.
Experiencer is
a case that usually fills a similar syntactic role to the agent but where the entity involved cannot be said to act.
It is thus associated with the use of particular verbs like
"remember", as in "Jim remembered his homework when he got to
school". Here "Jim" is the experiencer of the
"remember" situation.
In some situations,
things that are equal cannot be substituted for each other in logical forms.
Consider believe(sue1, happy1(jack1)) - jack1 may = john22 (i.e. the individual known as Jack may also be called John, e.g. by other people), but Sue believes John is happy may not be true, e.g. because Sue may not know that jack1 = john22. Thus john22 cannot be substituted for jack1, even though they are equal in some sense. See also Allen pp. 237-238.
Features can be thought
of as slots in a lexicon entry or in
structures used to build a logical
form. They record syntactic
or semantic information about the word or phrase. Examples include the agr agreement feature, the sem feature that
records the logical form of a word or phrase, and the var feature that
records the variable used to name the referent of a phrase in a logical form.
first person
One of the choices for
the person feature. A sentence is "in the first person" if
the subject of the sentence is the speaker, or the
speaker and some other individual(s), as in "I like pizza" and
"We like pizza".
"I"
and "we" are first-person pronouns, as are "me",
"us". Other words with the first-person feature include
"mine", "my", "myself", "ours",
"our", and "ourselves".
This stands for First
Order Predicate Calculus, a standard formulation of logic that has logical
operators like and, or, and not, predicate
symbols and constants and functions, and terms built from these, together with the quantifiers forall and exists. It is common for
semantic representation systems in NLP to be expressed in languages that
resemble or are based on FOPC, though sometimes they add significant features
of more elaborate logical systems.
"forall" is a
textual way of writing the universal quantifier, which is otherwise
written as an upside-down capital A. It corresponds fairly closely to the
English words "each" and "every". Thus,
forall(X, likes(X, icecream))
would be read as "for every entity X, X
likes icecream" or just "everything likes icecream". This would
be too broad a statement, as it would allege that, for example, rocks like
icecream. It is usual therefore to restrict the proposition to something like:
forall(X, is_person(X) ⇒ likes(X, icecream))
i.e. "Every person likes icecream."
That is, we are restricting the type of X to persons. In some cases, it is more
reasonable to abbreviate the type restriction as follows:
forall(X : person, likes(X, icecream))
See also exists.
A
free morpheme is a basic or root form of a word, to which can be attached bound morphemes that
modify the meaning. For example, "happy" is a free morpheme, which
becomes "unhappy" when the prefix "un-", a bound morpheme,
is attached.
See modal
operators - tense and tense - future.
See tense.
One of the features of a noun phrase. In English, gender is only marked in third-person singular pronouns and associated words. The possible values of the gender feature are masculine, feminine, and neuter.

type                            masculine   feminine   neuter   example
pronoun (nominative)            he          she        it       He hit the ball.
pronoun (accusative)            him         her        it       Frank hit him.
pronoun (possessive adjective)  his         her        its      Frank hit his arm.
pronoun (possessive)            his         hers       its      The ball is his.
pronoun (reflexive)             himself     herself    itself   Frank hurt himself.
An
alternative grammatical formalism, in which, among other things, the
non-terminal symbols of context-free grammars are replaced by sets of features, and the
grammar rules show the relationships between these objects much as context-free
rules show the relationships between grammar symbols in a CFG.
If there is a derivation of
a sentence from a grammar, then the grammar is said to generate the
sentence.
= generalized
phrase structure grammar
1. A system for describing a language, the rules of
a language.
2. A formal system for describing the syntax of a language. In
COMP9414, we are principally concerned with context-free grammars, sometimes augmented.
See
also Chomsky hierarchy.
See Chomsky hierarchy and context-free
grammars.
= head-driven
phrase structure grammar
A
head feature is one for which the feature value on a parent category must
be the same as the value on the head subconstituent. Each phrasal category has
associated with it a head subconstituent - N, NAME or PRO or
CNP for NPs, VP for S, V for VP, P (= PREP) for PP.
For example, var is a head feature for a range of phrasal categories, including S. This means that an S gets its var feature by copying the var feature of its head subconstituent, namely its VP. Head features are discussed on pages 94-96 of Allen.
See head feature.
head-driven phrase
structure grammar
A
Hidden Markov Model, for our purposes in COMP9414, is a set of states (lexical
categories in our case) with directed edges (cf. directed graphs) labelled with transition
probabilities that indicate the probability of moving to the state at
the end of the directed edge, given that one is now in the state at the start
of the edge. The states are also labelled with a function which indicates the
probabilities of outputting different symbols if in that state (while in a
state, one outputs a single symbol before moving to the next state). In our
case, the symbol output from a state/lexical category is a word belonging to
that lexical category. Here is an example (presented in the original notes as a state-transition diagram, which is not reproduced here):
Using
this model, the probability of generating "dogcatchers catch old red
fish" can be calculated as follows: first work out the probability of the
lexical category sequence → N → V → ADJ → ADJ → N, which is 1 × 1 × 0.5 ×
0.1 × 0.9 = 0.045, and then multiply this by the product of the output
probabilities of the words, i.e. by 0.3 × 0.2 × 0.6 × 0.2 × 0.5 = 0.0036, for a
final probability of 0.000162.
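A minimal sketch of this calculation in Prolog. Since the example diagram is not reproduced above, the trans/3 (transition) and out/3 (output) facts below are reconstructed from the worked numbers and should be read as assumptions:

% trans(FromState, ToState, Probability)
trans(start, n, 1.0).
trans(n, v, 1.0).
trans(v, adj, 0.5).
trans(adj, adj, 0.1).
trans(adj, n, 0.9).

% out(State, Word, Probability)
out(n, dogcatchers, 0.3).
out(v, catch, 0.2).
out(adj, old, 0.6).
out(adj, red, 0.2).
out(n, fish, 0.5).

% seq_prob(+State, +CatWordPairs, -P): probability of generating the
% given category/word sequence starting from State.
seq_prob(_, [], 1.0).
seq_prob(State, [Cat-Word|Rest], P) :-
    trans(State, Cat, PT),
    out(Cat, Word, PO),
    seq_prob(Cat, Rest, P0),
    P is PT * PO * P0.

The query ?- seq_prob(start, [n-dogcatchers, v-catch, adj-old, adj-red, n-fish], P). gives P ≈ 0.000162, matching the calculation above.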
This
is the list of discourse entities mentioned in recent sentences, ordered
from most recent to least recent. Some versions also include the syntactic and
semantic analyses of the previous sentence (or previous clause of a compound
sentence). Some versions keep only the last few sentences worth of discourse
entities, others keep all the discourse entities since the start of the
discourse.
= Hidden
Markov model
ill-formed text
Much "naturally occurring" text
contains some or many typographical errors or other errors. Industrial-strength
parsers have to be able to deal with these, just as people can deal with typos
and ungrammaticality. Such a parser is called a robust parser.
An imperative sentence
is one that expresses a command, as opposed to a question or a statement. See
also WH-question, Y/N-question, indicative, subjunctive, and mood.
Two
events A and B are said to be statistically independent if Pr(B | A) = Pr(B) -
i.e. whether or not A is a fact has no effect on the probability that B will
occur. Using the definition of conditional probability, this can be reformulated as Pr(A and B) = Pr(A)
× Pr(B).
An indicative sentence
is one that makes a statement, as opposed to a question or a command. See also WH-question, Y/N-question, imperative, subjunctive,
and mood.
A form of verbs. In English, this form
is introduced by the word "to" - the infinitive particle.
Examples: to go, to rather prefer, to have
owned. Verb phrases that have an infinitive verb construction in them are referred to as infinitive verb phrases; constructions that are not infinitive are referred to as non-infinitive. See vp:inf and np_vp:inf
in the article on subcategorization.
See
also here for the distinction between
"infinite" and "infinitive".
An inflection is a type
of bound morpheme, with a grammatical function. For example, the
suffix "-ing" is an inflection which, when attached to a participle form of the verb. Other inflections in English form the
other parts of verbs (such as the past tense and past
participle forms), and the plural of nouns.
Some
words inflect regularly, and some inflect irregularly, like the plural form
"children" of "child", and the past tense and past
participle forms "broke" and "broken" of the verb
"break".
A semantic case, frequently appearing
as a prepositional phrase introduced by the preposition
"with". For example, in "Mary ate the pizza with her
fingers", the prepositional phrase "with her fingers" indicates
the instrument used in the action described by the sentence.
A
kind of adverb, used to indicate the level or intensity of an adjective or another adverb. Examples include "very",
"slightly", "rather", "somewhat" and
"extremely". An example of use with an adjective: "Steve was
somewhat tired". An example of use with an adverb: "Mary ran very
quickly".
INTERJ: symbol used in grammar rules for an interjection.
Interjection
is often abbreviated to INTERJ.
INTERJ is a lexical grammatical category. It usually appears as a single word utterance, indicating some
strong emotion or reaction to something. Examples include: "Oh!",
"Ouch!", "No!", "Hurray!" and a range of
blasphemies and obscenities, starting with "Damn!".
A
verb that can take no syntactic object, like laugh, as in "He laughed
loudly", or "She laughed at his remark". Contrast ditransitive and transitive. See also subcategorization.
= knowledge
base
See knowledge representation language.
See knowledge representation language.
The
term knowledge representation language (KRL) is used to refer to the language
used by a particular system to encode the knowledge. The collection of
knowledge used by the system is referred to as a knowledge base (KB).
= knowledge
representation language
The process of applying
a lambda-expression to its argument (in general, arguments, but the examples
we've seen in COMP9414 have all been single argument lambda-expressions). A
lambda expression is a formula of the form (lambda ?x P(?x)), in an Allen-like notation, or lambda(X, p(X)) in
a Prolog-ish notation. P(?x) (or p(X)) signifies a formula involving the
variable ?x (or X). The lambda-expression can be viewed as a function to be
applied to an argument. The result of applying lambda(X, p(X)) to an argument a
is p(a) - that is, the formula p(X) with all the instances of the variable X
replaced by a. Using a more clearly NLP example, if we apply lambda(X, eat1(l1,
X, pizza1)) to mary1 we get eat1(l1, mary1, pizza1)).
Prolog
code for lambda-reduction is:
lambda_reduce(lambda(X, Predicate), Argument, Predicate) :-
X = Argument.
Applying this to an actual example:
?- lambda_reduce(
lambda(X, eats(e1,
X, the1(p1, pizza1))),
name(m1, 'Mary'),
Result) ?
X = name(m1, 'Mary')
Result = eats(e1, name(m1, 'Mary'), the1(p1, pizza1))
The language generated
by a grammar is the set of all sentences that can be derived from the start symbol S
of the grammar using the grammar rules. Less formally, it is the set of all
sentences that "follow from" or are consistent with the grammar
rules.
Parsing that processes the words of the sentence from left to right (i.e. from beginning to end), as opposed
to right-to-left (or end-to-beginning) parsing. Logically
it may not matter which direction parsing proceeds in, and the parser will work,
eventually, in either direction. However, right-to-left parsing is likely to be
less intuitive than left-to-right. If the sentence is damaged (e.g. by the
presence of a mis-spelled word) it may help to use a parsing algorithm that
incorporates both left-to-right and right-to-left strategies, to allow one to
parse material to the right of the error.
A set of word forms with the same stem, the same major part-of-speech, and the same word-sense. E.g. {cat, cats}.
Fancy name for a word,
including any suffix or prefix. Contrast free and bound morphemes.
A
grammatical formalism, not covered in COMP9414.
The probability that a
particular lexical category (in context or out of context) will give rise to a
particular word. For example, suppose that, in a system with a very small lexicon, there are only two nouns, say cat and dog. Given a corpus of sentences using this lexicon, one could count the number of times that the two words cat and dog occurred (as nouns), say ncats and ndogs. Then the lexical generation probability for cat as a noun would be ncats/(ncats+ndogs), written symbolically as Pr(cat | N).
A
rule of a grammar (particularly a context-free grammar) of
the form X → w, where w is a single word. In most lexicons, all the lexical insertion rules for a
particular word are "collapsed" into a single lexical entry, like
"pig": N V ADJ.
"pig"
is familiar as an N, but also occurs as a verb ("Jane pigged herself on
pizza") and an adjective, in the phrase "pig iron", for example.
Synonymous with part-of-speech (POS). Also called a pre-terminal symbol. A kind of non-terminal symbol of a grammar - a non-terminal is a lexical symbol if it can appear in a
lexical insertion rule. Examples are N, V, ADJ, PREP, INTERJ, ADV. Non-examples
include NP, VP, PP and S (these are non-terminals). The term lexical
category signifies the collection of all words that belong to a
particular lexical symbol, for example, the collection of all Nouns or the
collection of all ADJectives.
Contrast
with phrasal category.
A lexicon is a collection of information about the words of a language, including the lexical categories to which they belong. A lexicon is usually structured as a collection of lexical entries, like ("pig" N V ADJ).
"pig" is familiar as a N, but also occurs as a verb ("Jane
pigged herself on pizza") and an adjective, in the phrase "pig
iron", for example. In practice, a lexical entry will include further
information about the roles the word plays, such as feature information - for
example, whether a verb is transitive, intransitive, ditransitive, etc., what
form the verb takes (e.g. present participle, or past tense, etc.)
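In a Prolog-based system, such a lexicon might be stored as facts along the following lines; the lex/3 format with a feature list is an illustrative assumption:

% lex(Word, Category, Features)
lex(pig, n, [agr('3s')]).
lex(pig, v, []).
lex(pig, adj, []).
lex(ate, v, [form(past), subcat(np)]).  % transitive, past tense

% categories(+Word, -Cats): all lexical categories of Word.
categories(Word, Cats) :- findall(C, lex(Word, C, _), Cats).

The query ?- categories(pig, Cats). then gives Cats = [n, v, adj], the "collapsed" entry described under lexical insertion rule.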
= lexical
functional grammar
The local discourse
context, or just local context includes the syntactic and semantic analysis of
the preceding sentence, together with a list of objects mentioned in the
sentence that could be antecedents for later pronouns and definite noun phrases.
Thus the local context is used for the reference stage of NLP. See
also history list.
Logical
forms are expressions in a special language, resembling FOPC (first order predicate calculus) and used to encode the meanings (out of
context) of NLP sentences. The logical form language used in the book by James Allen includes:
terms
constants or expressions that describe objects:
fido1, jack1
predicates
constants or expressions that describe relations
or properties, like bites1. Each predicate has an associated
number of arguments - bites1 is binary.
propositions
a predicate followed by the appropriate number
of arguments: bites1(fido1, jack1), dog1(fido1) - Fido is a dog. More complex propositions can be constructed using logical operators: not(loves1(sue1,
jack1)), &(bites1(fido1, jack1), dog1(fido1)).
quantifiers
English has some precise quantifier-like words: some, all, each, every, the, a, as well as vague ones: most, many, a few. The logical form language has quantifiers to encode the meanings of each quantifier-like word.
variables
are needed because of the quantifiers, and
because while the words in a sentence in many cases give us the types of the
objects, states and events being discussed, but it is not until a later stage
of processing (reference) that we know to what instances of those types the
words refer.
Variables
in logical form language, unlike in FOPC, persist beyond the "scope"
of the quantifier. E.g. A man came in. He went to the table. The
first sentence introduces a new object of type man1. The He in the
second sentence refers to this object.
NL
quantifiers are typically restricted in the range of objects that the variable
ranges over. In Most dogs bark the variable in the most1
quantifier is restricted to dog1 objects: most1(d1 : dog1(d1), barks1(d1)).
predicate operators
A predicate operator takes a predicate as an
argument and produces a new predicate. For example, we can take a predicate
like cat1 (a unary predicate true of a single object of type cat1) and apply
the predicate operator plur, which converts a singular predicate into the corresponding plural predicate: plur(cat1) is true of any set of cats with
more than one member.
modal operators
Modal operators are used to represent certain verbs like believe, know,
want, which express attitudes to other propositions, and for tense and other purposes. Sue believes Jack is happy becomes
believe(sue1,
happy1(jack1))
With
tenses, we use the modal operators pres, past, fut, as in:
pres(sees1)(john1, fido1)
past(sees1)(john1, fido1)
fut(sees1)(john1, fido1)
logical operators
The operators and, or, not, ⇒ (implies), and ⇔ (equivalent to). and is sometimes written as &. They are used to connect propositions to make larger propositions: e.g.
is-blue(sky1) and is-green(grass1) or can-fly(pig1)
A
parsing technique, not covered in COMP9414.
A noun that cannot be
counted. Water is a mass noun, as is sand (if
you want to count sand, you refer to grains). Contrast count noun.
A modal auxiliary is distinguished syntactically by the fact
that it forces the main verb that follows it to take the infinitive form. For example, "can", "do",
"will" are modal ("she can eat the pizza", "she does
eat pizza", "she will eat pizza") but "be" and
"have" are not ("she is eating pizza", "she has eaten
pizza").
As far as we are
concerned in COMP9414, modal operators are a feature of the logical form language
used to represent certain epistemic verbs like
"believe", "know" and other verbs like "want",
and the tense operators, which convert an untensed
logical form into a tensed one.
Thus
if likes1(jack1, sue1) is a formula in the logical form language, then we can
construct logical forms like know(mary1, likes1(jack1, sue1)) meaning that Mary
knows that Jack likes Sue. Similarly for believe(mary1, likes1(jack1, sue1))
and want(marg1, own(marg1, (?obj : &(porsche1(?obj),
fire_engine_red(?obj))))) - that's Marg wants to own a fire-engine red
Porsche.
The tense operators include fut, pres, and past, representing future, present and past. For example, fut(likes1(jack1, sue1)) would represent Jack will like Sue.
See
also failure of substitutivity.
See also articles on
individual moods.
indicative - A plain statement. Example: John eats the pizza.
imperative - A command. Example: Eat the pizza!
WH-question - A question with a phrasal answer, often starting with a question-word beginning with "wh". Examples: Who is eating the pizza? What is John eating? What is John doing to the pizza?
Y/N-question - A question with a yes/no answer. Example: Did John eat the pizza?
subjunctive - An embedded sentence that is counter-factual but must be expressed in order to, e.g., explain a possible consequence. Example: If John were to eat more pizza he would be sick.
A unit of language
immediately below the word level. See free morpheme and bound morpheme, and morphology.
The study of the
analysis of words into morphemes, and conversely of the synthesis of words from morphemes.
A rather vague natural
language quantifier, corresponding to the word "most" in English.
"Many", "a few", and "several" are other
quantifier-type expressions that are similarly problematical in their
interpretation.
N: symbol used in grammar rules for a noun.
n-gram
An n-gram is an n-tuple of things,
but usually of lexical categories. Suppose that we are concerned with n lexical
categories L1, L2, ..., Ln.
The term n-gram is used in statistical NLP in connection with the conditional
probability that a word will
belong to Ln given that the preceding words were in L1, L2,
..., Ln–1. This probability is written Pr(Ln | Ln–1...L2 L1),
or more fully Prob(wi ∈ Ln | wi–1 ∈ Ln–1 ∧ ... ∧ wi–n+1 ∈ L1). See also bigram and trigram,
and p. 197 in Allen.
Word for a noun functioning as an adjective, as with the word "wood" in "wood fire".
Longer expressions constructed from nominals are possible. It can be difficult
to infer the meaning of the nominal compound (like "wood fire") from
the meanings of the individual words - for instance, while "wood
fire" presumably means a fire made with wood, "brain damage"
means damage to a brain, rather than damage made with a brain. Another example:
"noun modifier" could on the face of it either mean a noun that acts
as a modifier (i.e. a nominal as just defined) or a modifier of a noun.
In
fact, noun modifier is a synonym for nominal.
non-terminal
A non-terminal symbol of
a grammar is a symbol that represents a lexical or phrasal category in a language. Examples in
English would include N, V, ADJ, ADV (lexical categories) and NP, VP, ADJP,
ADVP and S (phrasal categories). See also terminal symbol and context-free
grammar.
A noun is a word
describing a (real or abstract) object. See also mass
noun, count noun, common noun, abstract noun, proper noun, and concrete
noun.
Contrast verb, adjective, adverb, preposition, conjunction, and interjection.
Noun
is often abbreviated to N.
N
is a lexical grammatical
category.
= nominal.
Noun Phrase is a phrasal grammatical category. Noun phrase is usually abbreviated to NP. NPs have a noun as their head, together with (optionally) some of the
following:
adjectives, nominal modifiers (i.e. other nouns, acting as though they were adjectives), certain kinds of adverbs that modify the adjectives, as with "very" in "very bright lights", participles functioning as adjectives (as in "hired man" and "firing squad"), cardinals, ordinals, determiners, and quantifiers. There are constraints on the way these ingredients can be put together. Here are some examples of noun phrases: Ships (as in Ships are expensive to build), three ships (cardinal + noun), all three ships (quantifier + cardinal + noun), the ships (determiner + noun), enemy ships (nominal + noun), large, grey ships (adjective + adjective + noun), the first three ships (determiner + ordinal + cardinal + noun), my ships (possessive + noun).
NP: symbol used in grammar rules for a noun phrase.
The term grammatical
number refers to whether the concept described consists of a single unit (singular number), like "this pen", or to
more than one unit (plural number), like "these pens", or
"three pens".
In
some languages other than English, there may be different distinctions drawn -
some languages distinguish between one, two, and many, rather than just one and
many as in English.
Nouns in English are
mostly marked for number - see plural.
Pronouns and certain determiners may also be marked for number. For example, "this"
is singular, but "these" is plural, and "he" is singular,
while "they" is plural.
The object of a sentence is the noun phrase that appears after the verb in a declarative English sentence. For example, in The cat ate the
pizza, the pizza is the object. In The pizza was
eaten by the cat, there is no object. Object noun phrases can be
arbitrarily long and complex. For example, in He ate a pizza with lots
of pepperoni, pineapple, capsicum, mushrooms, anchovies, olives, and vegemite,
the object is a pizza with lots of pepperoni, pineapple, capsicum,
mushrooms, anchovies, olives, and vegemite. [No, I do not have shares in a
pizza company.]
See
also ditransitive, transitive,
and intransitive.
A form of number word
that indicates rank rather than value. Thus "one, two, three, four, five,
six, seven" are cardinal numbers, whose corresponding ordinal numbers are
"first, second, third, fourth, fifth, sixth, seventh".
= lexical generation probability, but used in the context of a Hidden Markov Model.
A parse tree is a way of
representing the output of a parser, particularly with a context-free grammar. Each phrasal constituent found
during parsing becomes a branch node of the parse tree. The words of the
sentence become the leaves of the parse tree. As there can be more than one
parse for a single sentence, there can be more than one parse tree. For example, for
the sentence "He ate the pizza", with respect to the grammar with
rules
S → NP VP, NP → PRO, NP
→ ART N, VP → V NP,
and lexicon
("ate" V)
("he" PRO) ("pizza" N) ("the" ART)
the parse tree is:

              S
             /  \
           NP    VP
           |    /  \
          PRO  V    NP
           |   |   /  \
          He  ate ART   N
                   |    |
                  the  pizza
Note that this graphical representation of the
parse tree is unsuitable for further computer processing, so the parse tree is
normally represented in some other way internally in NLP systems. For example,
in a Prolog-like notation, the tree above could be represented as:
s(np(pro("He")),
vp(v("ate"),
np(art("the"), n("pizza")))).
A parser is an algorithm
(or a program that implements that algorithm) that takes a grammar, a lexicon, and a string of words,
decides whether the string of words can be derived from the grammar and lexicon
(i.e. is a sentence with respect to the grammar and lexicon).
If
so, it produces as output some kind of representation of the way (or ways) in
which the sentence can be derived from the grammar and lexicon. A common way of
doing this is to output (a) parse tree(s).
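As an illustration, a grammar and lexicon like the ones in the parse tree entry above can be rendered directly as a Prolog definite clause grammar (DCG). The following is a sketch, not code from the course materials, and the predicate names are assumptions. Each non-terminal carries its parse tree as an argument:

% Grammar: S → NP VP, NP → PRO, NP → ART N, VP → V NP
s(s(NP, VP))   --> np(NP), vp(VP).
np(np(PRO))    --> pro(PRO).
np(np(ART, N)) --> art(ART), n(N).
vp(vp(V, NP))  --> v(V), np(NP).

% Lexicon
pro(pro(he))  --> [he].
v(v(ate))     --> [ate].
art(art(the)) --> [the].
n(n(pizza))   --> [pizza].

The query phrase(s(T), [he, ate, the, pizza]) then binds T to s(np(pro(he)), vp(v(ate), np(art(the), n(pizza)))), essentially the representation shown in the parse tree entry. Note that Prolog executes a DCG as a top-down backtracking parser - in effect, a predictive parser.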
The process of going
through a corpus of sentences and labelling each word in each sentence with
its part of speech. A tagged corpus is a corpus that has been
so labelled. A tag is one of the labels. Large-scale corpora
might use tag-sets with around 35-40 different tags (for
English). See Allen Fig. 7.3 p. 196 for an example of a
tag-set.
part of speech, POS
Synonymous with lexical category: the role, like noun, verb, adjective, adverb, pronoun,
preposition, etc. that a word is either playing in a particular sentence (e.g. like is
acting as a preposition in I like pizza) or that it can play in
some sentence: e.g. like can act as a verb, noun, adjective,
adverb, preposition, and conjunction. (It can also act as a "filled
pause", as do um, er, and uh -
e.g. He's, like, a pizza chef in this, like, fast food joint
downtown.)
Participles come in two
varieties (in English) - present participles and past
participles. (Often abbreviated to PRESPART and PASTPART or something
equivalent, like ING and EN). Present participles are variants on verbs; they end in
"-ing", as in "setting", "being",
"eating", "hiring". Past participles end in
"-ed", "-en", or a few other possibilities, as in
"set" (past participle the same as theinfinitive form of the verb), "been", "eaten",
"hired", "flown" (from "fly").
Participles
are used in constructing tensed forms of verbs, as in "he is eating",
"you are hired", and also as though they were adjectives in phrases
like "a flying horse" and "a hired man".
In
some cases, present participles have become accepted as nouns representing an
instance of the action that the underlying verb describes, as with
"meeting".
PRESPART
and PASTPART are lexical grammatical
categories.
A particle is usually a
word that "normally" functions as a preposition,
but can also modify the sense of a verb. Not all prepositions
can be particles. An example of a word functioning as a particle is
"up", in "The mugger beat up his victim". Here "beat
up" functions as a unit that determines the action being described. A
telltale sign of a particle is that it can often be separated from the verb, as
in "The mugger beat the victim up". Sometimes it can be non-trivial
for an NLP system to tell whether a word is being used as a particle or as a
preposition. For example, in "Eat up your dinner", "up" is
definitely a particle, but in "He eats up the street", "up"
is a preposition, but it takes real-world knowledge to be sure of this, as the
alternative possibility, that the person being referred to is eating the
street, is syntactically reasonable (though not pragmatically reasonable,
unless perhaps "he" refers to a bug-eyed asphalt-eating alien.)
See
also phrasal verb.
Both active and passive
voice are described in the article on active voice.
See modal
operators - tense and tense.
An abbreviation for Past Participle,
particularly in grammar rules.
See tense.
= object.
Person is a feature of
English noun phrases that is principally of significance with pronouns and related forms. The possible values of
person are first person signifying the speaker (possibly with
his/her companions), second
person signifying the
person addressed (possibly with his/her companions), and third person signifying anybody else, i.e. not speaker or person
addressed or companion of either.
Below
is a table of the forms of pronouns, etc. in English, classified by person and
syntactic case:
case                 | first person     | second person               | third person
nominative           | I/we             | thou/you/ye                 | he/she/it/they
accusative           | me/us            | thee/you/ye                 | him/her/it/them
possessive adjective | my/our           | thy/your                    | his/her/its/their
possessive pronoun   | mine/ours        | thine/yours                 | his/hers/its/theirs
reflexive            | myself/ourselves | thyself/yourself/yourselves | himself/herself/itself/themselves
A low-level
classification of linguistic sounds - phones are the acoustic patterns that are
significant and distinguishable in some human language. Particular languages
may group together several phones and regard them as equivalent. For example,
the L-sounds at the beginning and end of the English word "loyal",
termed "light L" and "dark L" by
linguists, are treated as distinct sounds in some languages, but light L
and dark L are termed allophones of
L in English. Similarly, the L and R sounds of English are regarded as equivalent
in some other languages.
Start by reading about phones. Phonemes are the
groups of phones (i.e. allophones) regarded as linguistically equivalent by
speakers of a particular language. Thus native English speakers hear light L
and dark L as the same sound, namely the phoneme L, unless trained to do
otherwise. One or more phonemes make up a morpheme.
The study of acoustic
signals from a linguistic viewpoint, that is, how acoustic signals are
classified into phones.
The study of phones, and how they are
grouped together in particular human languages to form phonemes.
A kind of non-terminal symbol of a grammar - a non-terminal
determines a phrasal category if it cannot appear on the left-hand side of a lexical insertion rule,
that is, a rule of the form X → w, where w is a word.
Examples include NP, VP, PP, ADJP, ADVP and S. Non-examples include N, V, ADJ,
PREP, INTERJ, ADV (see lexical
category).
Contrast
with lexical category.
A
phrasal verb is one whose meaning is completed by the use of a particle. Different particles
can give rise to different meanings. The verb "take" participates in
a number of phrasal verb constructs - for example:
take in  | deceive                 | He was taken in by the swindler
take in  | help, esp. with housing | The homeless refugees were taken in by the Sisters of Mercy
take up  | accept                  | They took up the offer of help.
take off | remove                  | She took off her hat.
A unit of language
larger than a word but smaller than a sentence. Examples include noun
phrases, verb phrases, adjectival phrases, and adverbial phrases.
See
also phrasal categories.
See tense.
A predicate
operator that handles
plurals. plur transforms a predicate like book1 into a predicate plur(book1).
If book1 is true of any book, then plur(book1) is true of any set of books with
more than one member. Thus "the books fell" could be represented by
the(X : plur(book1)(X), past(fall1(X))).
A noun in a form that signifies more than one of
whatever the base form of the noun refers to. For example, the plural of
"pizza" is "pizzas". While most plurals in English are
formed by adding "s" or "es", or occasionally doubling the
last letter and adding "es", there are a number of exceptions. Some
derive from words borrowed from other languages, like
"criterion"/"criteria", "minimum"/"minima",
"cherub"/"cherubim", and
"vertex"/"vertices". Others derive from Old English words
that formed plurals in nonstandard ways, like "man"/"men",
"mouse"/"mice", and "child"/"children".
This is a name applied
to two English pronoun forms that indicate possession. There are possessive
adjectives and possessive pronouns. They are tabulated below:
person & number                  | possessive adjective | possessive pronoun
first person singular            | my                   | mine
first person plural              | our                  | ours
second person singular (archaic) | thy                  | thine
second person (modern)           | your                 | yours
third person singular            | his/her/its          | his/hers/its
third person plural              | their                | theirs
Abbreviation for prepositional
phrase.
The problem of deciding
what component of a sentence should be modified by a prepositional phrase
appearing in the sentence. In the classic example "The boy saw the man on
the hill with the telescope", "with the telescope" could modify
"hill" (so the man is on the "hill with the telescope") or
it could modify "saw" (so the boy "saw with the
telescope"). The first attachment corresponds to a grammar rule like np →
np pp, while the second corresponds to a grammar rule like vp → v np pp. Both
rules should be present to capture both readings, but this inevitably leads to
a multiplicity of parses. The problem of choosing between the parses is normally
deferred to the semantic and pragmatic phases of processing.
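To make the two attachments concrete, here is a sketch of a toy Prolog DCG containing both rules. The names are assumptions, the NP rule is written without left recursion (which a DCG cannot execute directly), and the sentence is simplified to contain just one PP:

% Both attachment rules side by side
s(s(NP, VP))      --> np(NP), vp(VP).
vp(vp(V, NP))     --> v(V), np(NP).
vp(vp(V, NP, PP)) --> v(V), np(NP), pp(PP).   % PP attaches to the VP
np(np(D, N))      --> det(D), n(N).
np(np(D, N, PP))  --> det(D), n(N), pp(PP).   % PP attaches to the NP
pp(pp(P, NP))     --> prep(P), np(NP).

% Lexicon
det(art(the))    --> [the].
n(n(boy))        --> [boy].
n(n(man))        --> [man].
n(n(telescope))  --> [telescope].
v(v(saw))        --> [saw].
prep(prep(with)) --> [with].

The query phrase(s(T), [the, boy, saw, the, man, with, the, telescope]) succeeds twice on backtracking: once with the PP inside the object NP (the man has the telescope), and once with the PP as a sister of the verb (the seeing was done with the telescope).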
Pragmatics can be
described as the study of meaning in context, to be contrasted with semantics,
which covers meaning out of context. For example, if someone says "the
door is open", there is a single logical form for this. However, there is
much to be done beyond producing the logical form, in really understanding the
sentence. To begin with, it is necessary to know which door "the
door" refers to. Beyond that, we need to know what the intention of the
speaker (say, or the writer) is in making this utterance. It could be a pure statement
of fact, it could be an explanation of how the cat got in, or it could be a
tacit request to the person addressed to close the door.
It is also possible for a sentence to be
well-formed at the lexical, syntactic, and semantic levels, but ill-formed at
the pragmatic level because it is inappropriate or inexplicable in context. For
example, "Try to hit the person next to you as hard as you can" would
be pragmatically ill-formed in almost every conceivable situation in a lecture
on natural language processing, except in quotes as an example like this. (It
might however, be quite appropriate in some settings at a martial arts lesson.)
pre-terminal
= lexical category - a non-terminal symbol, such as N, V, or ADJ, that can be rewritten directly as a word by a lexical insertion rule of the form X → w. Contrast phrasal category.
This term is used in (at
least) three senses:
1. In NLP, equivalent to verb phrase,
used in the analysis of a sentence into a subject and predicate.
2.
In logic, a predicate is
a logical formula involving predicate symbols, variables, terms, quantifiers
and logical connectives.
3.
In Prolog - see here.
Predicate
operators form a part of the logical form language. They transform one predicate into another predicate. For example, the predicate operator PLUR transforms a singular predicate like (DOG x) which is true
if x is a dog, into a plural equivalent (PLUR DOG) such that ((PLUR DOG) x) is
true if x is a set of more than one dog.
A
predictive parser is a parsing algorithm that operates top-down, starting with the start symbol,
and predicting or guessing which grammar rule to use to rewrite the current
sentential form. Alternative grammar rules are stacked so that they can be
explored (using backtracking) if the current sequence of guesses turns out to
be wrong.
On
general context-free grammars, a vanilla predictive parser takes exponential
parsing time (i.e. it can be very very slow). See also bottom-up parsers.
symbol used in grammar
rules for a preposition.
A
preposition is a part of speech that is used to indicate the role that the noun phrase that
follows it plays in the sentence. For example, the preposition "with"
often signals that the NP that follows is the instrument of
the action described in the sentence. For example "She ate the pizza with
her fingers". It can also indicate the co-agent, especially if the NP describes something that is animate. For example, "She ate the pizza with her friends". As
well as signalling a relationship between a noun phrase and the main verb of
the sentence, a preposition can indicate a relationship between a noun phrase
and another noun phrase. This is particularly the case with "of", as
in "The length of the ruler is 40 cm".
Prepositions
are the head items of prepositional
phrases.
Preposition
is often abbreviated to PREP.
PREP
is a lexical grammatical
category.
Prepositional phrase is
a phrasal grammatical category. Prepositional phrase is usually abbreviated to
PP. PPs serve to modify a noun phrase or a verb or verb phrase. For example, in
"The house on the hill is green", the prepositional phrase "on
the hill" modifies "the house", while in "He shot the deer
with a rifle", "with a rifle" is a prepositional phrase that
modifies "shot ..." (except in the extremely rare case that the deer
has a rifle :→).
Prepositional
phrases normally consist of a preposition followed
by a noun phrase.
The
main exception is the possessive clitic 's, as in "my uncle's
car", where the 's functions as a preposition ("my
uncle's car" = "the car of my uncle") but follows the
noun phrase that the preposition normally precedes.
Occasionally
other structures are seen, such as "these errors notwithstanding", an
allowable variant on "notwithstanding these errors"
("notwithstanding" is a preposition).
PP
is a phrasal grammatical
category.
See
also PP attachment.
See modal
operators - tense and tense - present.
When
we think of grammar, we often think of the rules of good grammar that we may
have been taught when younger. In English, these may have included things like
"never split an infinitive" i.e. do not put an adverb between the
word "to" and the verb, as in "I want to really enjoy this
subject." (The origin of this rule is said to be the fact that
infinitive-splitting is grammatically impossible in Latin: some early
grammarians sought to transfer the rule to English for some twisted reason.)
Grammar in this sense is called "prescriptive grammar" and has
nothing to do with "descriptive grammar", which is what we are concerned with in
NLP.
An abbreviation for
Present Participle, particularly in grammar rules.
= grammar rule - see Chomsky hierarchy and context
free grammar.
See aspect.
A proper noun is a noun that names an
individual, such as Amelia Earhart, rather than a type, for example woman,
or philosophy. Proper nouns are often compound. Amelia and Earhart would
each rank as proper nouns in their own right.
Contrast common noun.
A
proposition is a statement of which it is possible to decide whether it is true
or false. There are atomic propositions, like "Mary likes pizza", and
compound ones involving logical connectives, as in "Mary likes pizza and John likes pasta".
Umbrella term for adjectives and nominals or noun modifiers.
1. (in semantic interpretation) - objects in the logical form language
that correspond to the various words and groups of words that act in language
in the way that quantifiers do in formal logic systems. Obvious examples of
such words in English include all, each, every, some, most, many, and several.
Less obvious examples include the, which is similar in effect to
"there exists a unique" - thus when we refer to, say, "the green
box", we are indicating that there exists, in the current discourse
context, a unique green box that is being referred to. This phrase would be
represented in the logical form language by an expression like the(b1 :
&(box1(b1), green1(b1))).
NL
quantifiers are typically restricted in the range of objects that the variable
ranges over. In Most dogs bark the variable in the MOST1
quantifier is restricted to dog1 objects: most1(d1 : dog1(d1), barks1(d1))
2.
In logic, this term
refers to the logical operators "forall" and "exists".
3.
This term is also
sometimes used for quantifying
determiners.
A quantifying
determiner, in English, is one of a fairly small class of words like
"all", "both", "some", "most",
"few", "more" that behave in a similar way to quantifiers
in logic.
See also determiners.
Reference, in NLP, is
the problem/methods of deciding to what real-world objects various natural
language expressions refer. Described in Chapter 14 of Allen. See also anaphor, cataphor, co-refer, discourse entity, history list,
and local discourse context.
A
type of ambiguity where what is uncertain is what is being referred to by a
particular natural language expression. For example, in John hit Paul.
He was angry with him. it is not entirely clear to whom the pronouns
"he" and "him" refer - it could be that the second sentence
explains the first, in which case the "he" is John, or it could be
that the second sentence gives a consequence of the first, in which case the
"he" is Paul.
1. Regular grammar = right-linear grammar - see Chomsky hierarchy.
2.
A regular verb, noun,
etc. is one that inflects in a regular way. "save" is a
regular verb and "house" is a regular noun. On the other hand,
"break" (with past tense "broke" and past participle
"broken") is an irregular verb, and
"mouse" (with plural form "mice") is an irregular noun.
Relative clauses involve
sentence forms used as modifiers in noun phrases. These clauses are often introduced by relative
pronouns such as who, which and that.
For example, "The man who gave Barry the money". See Allen p. 34.
The rewriting process is
what is used in derivation to get from one sentential form to
the next.
The
process is as follows with context free grammars: pick a non-terminal X in the current string (or sentential form)
and a grammar rule whose left-hand side is that non-terminal X. Replace X in
the current string by the right-hand side of the grammar rule, to obtain a new
current string. This definition also works for regular grammars. A single step in the rewriting process is
called a direct derivation. For an example, see derivation.
The
process is similar with context-sensitive grammars and unrestricted grammars, except that instead of picking a non-terminal
X in the current string, we find a substring of the current
string that matches the left-hand side of some context-sensitive or
unrestricted grammar rule, and replace it with the right-hand side of that
grammar rule.
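As a quick illustration using the toy context-free grammar from the parse tree entry above (writing ⇒ for each direct derivation step, and rewriting the leftmost non-terminal each time):

S ⇒ NP VP ⇒ PRO VP ⇒ he VP ⇒ he V NP ⇒ he ate NP ⇒ he ate ART N ⇒ he ate the N ⇒ he ate the pizza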
right-linear grammar
= regular grammar - see Chomsky hierarchy.
right-to-left parsing
Parsing that proceeds through the input string from the last word back towards the first, rather than in the usual left-to-right order.
A parser or other NLP
algorithm is robust if it can recover from or otherwise handle ill-formed or
otherwise deviant natural language expressions. For example, we would not want
a syntactic parser to give up as soon as it encounters a word that is not in
its lexicon - preferably it should try to infer the lexical category and
continue parsing (as humans do when they first encounter a new word).
symbol used in grammar
rules for a sentence.
One
of the choices for the person feature. A sentence is "in the second person" if
the subject of the sentence is the person(s) addressed
as in "you like pizza" and the archaic "Ye like pizza" and
"Thou likest pizza".
"you",
"thou" and "ye" are second-person pronouns, as is "thee". Other words with the second-person
feature include "yours", "thine", "your",
"thy", "yourself", "yourselves", and
"thyself".
A
variant on a context free grammar, in which the non-terminals correspond to
semantic rather than syntactic concepts. A system of this type encodes semantic
knowledge about the types of sentences likely to appear in the input in the
grammar rules. For example, a system to handle text about ships and ports might
encode in its grammar rules the information that the subject of a sentence about
"docking" must be a ship:
docksentence → shipnp dockvp
The
problem with semantic grammar is that for coverage of a significant portion of
a language, a huge number of rules would be required, and a massive analysis of
the meanings that those rules encode would be needed in their
development.
Semantics is the study
of meaning (as opposed to form/syntax, for example) in language. Normally
semantics is restricted to "meaning out of context" - that is, to
meaning so far as it can be determined without taking context into account. See Allen page 10 for the
different levels of language analysis, and chapters 8-12 for detailed treatment
(we covered parts of chapters 8 and 9 only in COMP9414).
See
also logical form.
Sentence is the level of
language above phrase. Above sentence are pragmatic-level structures that interconnect
sentences into paragraphs, etc., using concepts such as cohesion. They are
beyond the scope of this subject.
Sentences
are sometimes classified into simple, compound, complex, and compound-complex,
according to the absence or presence of conjunctions, relative clauses (phrases with verb groups that are
introduced by "which", "that", etc.) or both. They may also
be analysed into subject and predicate.
Sentence
is often abbreviated to S (see also start symbol).
S
is a phrasal grammatical
category.
See
article on derivation.
shift-reduce parser
A type of parsing algorithm, not discussed in
COMP9414.
See tense.
See tense.
See tense.
A noun in a form that
signifies one of whatever type of object the noun refers to. For example, dog is
singular, whereas dogs is plural.
Universally quantified
variables can be handled (and are handled in Prolog) simply by assuming that
any variable is universally quantified. Existentially quantified variables must
thus be removed in some way. This is handled by a technique called skolemization.
In its simplest form, skolemization replaces the
variable with a new constant, called a Skolem constant. For
example, the formula:
exists(y, forall(x, loves(x, y)))
would be encoded as an expression such as
loves(X, sk1),
where sk1 is a new constant that stands for the
object that is asserted to exist, i.e. the person (or whatever) that is loved
by every X.
Quantifier scoping dependencies are shown using
new functions called Skolem functions. For
example, the formula:
forall(y, exists(x, loves(x, y)))
would be encoded as an expression such as
loves(sk2(Y), Y),
where sk2 is a new function that produces a
potentially new object for each value of Y.
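The transformation is mechanical enough to sketch in a few lines of Prolog. This is a toy illustration, not course code: the formula encoding (Prolog variables for logical variables, exists/2 and forall/2 for quantifiers) is an assumption, and only formulas with all quantifiers at the front (prenex form) are handled.

:- use_module(library(gensym)).

% skolemize(+Formula, -Skolemized)
% Universal variables are left as free Prolog variables; each existential
% variable is bound to a Skolem constant, or to a Skolem function of the
% universal variables in whose scope it appears.
skolemize(forall(X, F), G) :- !, skolemize_univ(F, [X], G).
skolemize(exists(Y, F), G) :- !, gensym(sk, Y), skolemize(F, G).
skolemize(F, F).

skolemize_univ(forall(X, F), Univ, G) :- !, skolemize_univ(F, [X|Univ], G).
skolemize_univ(exists(Y, F), Univ, G) :- !,
    gensym(sk, Name),
    Y =.. [Name|Univ],            % e.g. Y = sk2(X): a Skolem function
    skolemize_univ(F, Univ, G).
skolemize_univ(F, _, F).

With this sketch, skolemize(exists(Y, forall(X, loves(X, Y))), G) gives G = loves(X, sk1), and skolemize(forall(X, exists(Y, loves(X, Y))), G) gives G = loves(X, sk2(X)), mirroring the two examples above (the sk1/sk2 names depend on the gensym counter).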
See Skolem functions
A term from the
pragmatic end of language use: when we say (or write) something, each utterance
has a purpose and, if effective, accomplishes an act of some type. Examples of
different types of speech act include: ask, request, inform, deny, congratulate,
confirm, and promise. Not covered in COMP9414, except to point out
that it is ultimately vital to understanding language. For some discussion, see Allen p. 542 ff., and
compare surface speech act.
The
start symbol of a grammar is another name for the "distinguished
non-terminal" of the grammar. Details at context-free grammar. The start symbol of most NLP grammars is S (for sentence).
A group of techniques
relying on mathematical statistics and used in NLP to, for example, find the
most likely lexical categories or parses for a sentence. Often the techniques
are based on frequency information collected by analysing very large corpora of sentences in a
single language, to find out, for example, how many times a particular word (dog,
perhaps) has been used with a particular part of speech. The sentences in the
corpus have usually been tagged in some way
(sometimes manually) so that the part of speech of each occurrence of each
word is known. Sometimes the sentences are hand-parsed as well
(a treebank).
See
chapter 7 in Allen, and also Bayes' rule, bigram, trigram, n-gram, conditional probability, statistical independence, Hidden Markov Model, and Viterbi algorithm.
string
A "string over an alphabet
A" means a sequence of symbols taken from the alphabet A, where by alphabet we mean just a set of symbols that we are
using in a similar way to the way that we use, say, the Latin alphabet to make
up words. Thus a word (in English) is a string over the alphabet {a, b, c, d,
e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z} (plus
arguably a few other items like hyphen and apostrophe). A construct like "ART
ADJ NOUN" is a string over an alphabet that includes the symbols ART, ADJ,
and NOUN. Similarly "NP of NP" is a string (over some alphabet that
includes the symbols "NP" and "of") that has two non-terminal symbols and one terminal symbol (namely
"of").
A
form of ambiguity in which what is in doubt is the syntactic structure of the
sentence or fragment of language in question. An example of pure structural
ambiguity is "old men and women" which is ambiguous in that it is not
clear whether the adjective old applies to the women or just to the men.
Frequently structural ambiguity occurs in conjunction with word-sense ambiguity, as in "the red eyes water" which could signify "the
communist looks at water":
s(np(art(the), n(red)), vp(v(eyes), np(n(water))))
or alternatively "the reddened eyes drip tear fluid"
s(np(art(the), adj(red), n(eyes)), vp(v(water)))
See
also referential ambiguity.
The name for the feature
used to record the subcategorization of a verb or adjective.
Verbs and some adjectives admit complement structures.
They are said to subcategorize the structures that they can be
followed by. For example, some verbs can be followed by two noun phrases (like Jack
gave Mary food), some by at most one (like Jack kicked the dog),
and some by none (like Jack laughed). We would record this by
saying that the verbs have subcat np_np, or subcat np,
or subcat none. Further examples are shown below (taken from Figures
4.2 and 4.4 in Allen):
Value      | Example Verb | Example of Use
none       | laugh        | Jack laughed
np         | find         | Jack found a key
np_np      | give         | Jack gave Sue the paper
vp:inf     | want         | Jack wants to fly
np_vp:inf  | tell         | Jack told the man to go
vp:ing     | keep         | Jack keeps hoping for the best
np_vp:ing  | catch        | Jack caught Sam looking at his desk
np_vp:base | watch        | Jack watched Sam look at his desk
np_pp:to   | give         | Jack gave the key to the man
pp:loc     | be           | Jack is at the store
np_pp:loc  | put          | Jack put the box in the corner
pp:mot     | go           | Jack went to the store
np_pp:mot  | take         | Jack took the hat to the party
adjp       | be, seem     | Jack is happy
np_adjp    | keep         | Jack kept the dinner hot
s:that     | believe      | Jack believed that sharks wear wedding rings
s:for      | hope         | Jack hoped for Mary to eat the pizza.
Notice
that several verbs (give, be, keep) among the examples have more than one subcat.
This is not unusual. As an example of subcategorization by adjectives, notice
that "Freddo was happy to be a frog" is OK, so happy subcategorizes
vp:inf, but "Freddo was green to ..." cannot be completed in any way,
so green does not subcategorize vp:inf.
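One way to make such a table machine-usable is to record subcat values as features on lexical entries. A minimal Prolog sketch (the lex/3 format is an assumption, not the course's actual lexicon representation):

% Lexical entries carrying a subcat feature
lex(laugh, v,   subcat(none)).
lex(find,  v,   subcat(np)).
lex(give,  v,   subcat(np_np)).
lex(give,  v,   subcat(np_pp:to)).   % "give" has more than one subcat
lex(want,  v,   subcat(vp:inf)).
lex(happy, adj, subcat(vp:inf)).     % adjectives subcategorize too

A grammar rule can then insist on a particular value - a ditransitive VP rule, for instance, would look up lex(Verb, v, subcat(np_np)).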
The subject of a sentence is the noun phrase that appears before the verb in a declarative English sentence. For example, in The cat sat on the
mat, The cat is the subject. In The mat was sat on, The
mat is the subject. Subject noun phrases can be arbitrarily long and
complex, and may not look like "typical" noun phrases. For example,
in Surfing the net caused him to fail his course, the subject is surfing
the net. [Please excuse the subliminal message.]
A
subjunctive sentence is an embedded sentence that expresses a proposition that
is counterfactual (not true), such as "If John were to eat more
pizza, he would make himself sick", as opposed to a Y/N-question, a
WH-question, a command, or a statement. As can be seen from the example, the
subjunctive form of a verb resembles the past form in modern English, even
though it frequently refers to a possible future action or state (as in our
example). The past forms of some modals (e.g. "should",
"would" which were originally the past forms of verbs
"shall" and "will") are used for little else in modern
English.
See also Y/N-question, WH-question, imperative, indicative,
and mood.
See failure of substitutivity.
This
term refers to analysing the type of sentence into standard syntactic
categories - assertion, command, and the two kinds of question:
yes/no-questions and wh-questions. See Allen p. 250.
To
be contrasted with (pragmatic) speech acts.
Not a part of COMP9414
Artificial Intelligence.
A figure of speech in
which a single word appears to be in the same relationship to two others, but
must be understood in a different sense with each of the two other words (the
"pair"), as in I'm leaving for greener pastures and ten days. See also zeugma.
One of a couple of dozen little-used terms for figures of speech.
Syntax means the rules
of language that primarily concern the form of phrases and
sentences, as distinct from the substructure of words (see morphology) or the meaning of phrases and sentences in or
out of context (see pragmatics and semantics).
An alternative approach
to linguistic grammar, driven from the functional (rather than the structural)
end of language. Not covered in COMP9414. See p.95 (Box 4.3) in Allen.
1. See part of speech tagging.
2.
As TAG, it
is also an acronym for Tree-Adjoining Grammar. TAGs were not mentioned
elsewhere in COMP9414, but are mentioned here just in case you run into them in
a book somewhere.
See part of speech tagging.
The tense of a verb or a sentence relates to the time when the action or state described by
the verb or sentence occurred/occurs/will occur. The main contrast in tense
is between past, present and future. Most verbs in English indicate the
past/present distinction by inflection (a
few, like "set", are invariant in this respect). Thus
"break" and "breaks" are the present tense forms of the
verb "break", and "broke" is the past tense form. The
future tense is constructed, in English, by using the auxiliaries "shall" and "will" with the verb -
"it will break", for example. Here is a list of the six major tense
forms:
Form                       | Example                    | Meaning
present                    | He drives a Ford           | The "drive" action occurs in the present, though it suggests that this is habitual - it may have occurred in the past and may continue in the future.
simple past                | He drove a Ford            | The action occurred at some time in the past.
simple future              | He will drive a Ford       | The action will occur at some time in the future.
past perfect or pluperfect | He had driven a Ford       | At some time in the past, it was true to say "He drove a Ford".
future perfect             | He will have driven a Ford | At some point in the future, it will be true to say "He drove a Ford".
See also participle.
Contrast tense, mood and aspect.
1. Used in the logical
form language to
describe constants and expressions that describe objects.
2.
Used in FOPC to refer to a class of objects that may be defined,
recursively, as follows:
·
a constant is a term;
·
a variable is a term;
·
a function f applied to a suitable number of terms t1, t2, ..., tn is a term: f(t1, t2, ..., tn).
3.
Used to refer to certain
types of Prolog language
constructs.
A terminal symbol of a grammar is a symbol that
can appear in a sentence of the grammar. In effect, a terminal
symbol is a word of the language described by the grammar.
See
also non-terminal symbol and context-free grammar.
The word the gives
rise to an important NL quantifier written as THE in Allen or as the or the1 in the Prolog
notation used in COMP9414 assignments. Thus the dog barks has
logical form:
(THE d1 : (DOG1 d1) (BARKS1 d1))   (Allen)
or
the(d1 : dog1(d1), barks1(d1))   (COMP9414)
Here d1 is the variable over
which THE quantifies.
This is also written as (BARKS1 <THE d1 DOG1>) in Allen's notation, and as barks1(the(d1, dog1)) in the Prolog notation.
= semantic case.
Term used for the noun
phrase that follows the verb in an active voice, indicative sentence in English. Also referred to as the object or sometimes the victim.
One
of the choices for the person feature. A sentence is "in the third person" if
the subject of the sentence is neither the speaker nor
the person(s) addressed, as in "she likes pizza" and "they like
pizza".
"she",
"he", "it", and "they" are third-person pronouns, as are "her", "him", and "them".
Other words with the third-person feature include "hers",
"his", "theirs", "their", "herself",
"himself", "itself" and "themselves".
top-down parser
A parser that starts by hypothesizing an S (see start symbol)
and proceeds to refine its hypothesis by expanding S using a grammar rule which
has S as its left-hand side (see rewriting process), successively refining the non-terminals so produced, and so
on until there are no non-terminals left (only terminals).
See
also predictive parser.
A verb that can take a
single syntactic object, like eat, as in "He ate the pizza".
Sometimes transitive verbs appear without their object, as in "He ate
slowly" - the distinguishing characteristic of transitive verbs is that
they can take an object (unlike intransitive verbs), and they cannot take two objects (as ditransitive verbs can). See also subcategorization.
A trigram is a triple of
things, but usually a triple of lexical categories. Suppose that we are
concerned with three lexical categories L1, L2 and L3.
The term trigram is used in statistical NLP in connection with the conditional probability that a word will belong to L3 given
that the preceding words were in L1 and L2.
This probability is written Pr(L3 | L2 L1),
or more fully Prob(wi ∈ L3 | wi–1 ∈ L2 ∧ wi–2 ∈ L1). For example, in the
phrase "The green flies", given that The is tagged with ART, and green with ADJ, we would be
concerned with the conditional probabilities Pr(N | ADJ ART) and Pr(V | ADJ
ART) given that flies can be tagged with N and V. See also bigram and n-gram.
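Such probabilities are usually estimated from a tagged corpus by relative-frequency counting. Here is a small SWI-Prolog sketch; the tags/1 corpus representation is an assumption, and real taggers would also smooth the counts:

% tags/1: each clause holds the tag sequence of one corpus sentence
tags([art, adj, n, v]).
tags([art, n, v, art, n]).
tags([pro, v, art, adj, n]).

% trigram(C1, C2, C3) succeeds once per occurrence of the triple
trigram(C1, C2, C3) :-
    tags(Seq),
    append(_, [C1, C2, C3|_], Seq).

% Estimate Pr(C3 | C1 C2) = count(C1 C2 C3) / count(C1 C2 followed by anything)
trigram_prob(C1, C2, C3, P) :-
    aggregate_all(count, trigram(C1, C2, C3), N),
    aggregate_all(count, trigram(C1, C2, _), D),
    D > 0,
    P is N / D.

With this toy corpus, trigram_prob(art, adj, n, P) gives P = 1.0: every occurrence of ART ADJ is followed by N.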
See Chomsky hierarchy.
symbol used in grammar
rules for a verb.
A feature used as part of
the logical form generation system described in lectures. It is used to provide
a "discourse variable" that corresponds to the constituent it belongs
to. It is useful in handling certain types of modifiers - for example, if we
have a ball b1 and it turns out to be red, then we can assert
(in the logical form for "red ball") that the object is both red and
a ball, by including &(ball1(b1), red1(b1)). Grammar rules,
once augmented to handle logical forms, usually give explicit instructions on
how to incorporate the var feature. For example, the rule
for intransitive VPs:
VP(var(?v), sem(lambda(A2, ?semv(?v, A2)))) →
V(subcat(none), var(?v), sem(?semv))
indicates that the VP's var feature
is derived from that of its V subconstituent and shows how the feature (?v) is
also incorporated into the sem of the VP.
var features are
described in Allen on page 268 ff.
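For concreteness, one way (an assumption, not the course's actual implementation) to render the intransitive-VP rule above as a Prolog DCG, with the var and sem features carried as arguments:

% Intransitive-VP rule: the VP inherits its var from the V, and its sem
% wraps the verb's sem around that variable
vp(var(V), sem(lambda(A2, Sem))) -->
    v(subcat(none), var(V), sem(SemV)),
    { Sem =.. [SemV, V, A2] }.        % builds e.g. laughs1(e1, A2)

% A toy lexical entry supplying the verb's sem and a discourse variable
v(subcat(none), var(e1), sem(laughs1)) --> [laughs].

The query phrase(vp(var(E), sem(S)), [laughs]) then binds E = e1 and S = lambda(A2, laughs1(e1, A2)).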
A word describing an
action or state or attitude. Examples of each of these would be "ate"
in "Jane ate the pizza", "is" in "Jane is happy",
and "believed" in "Jane believed Paul at the pizza".
Verbs are one of the major sources of inflection in English, with most verbs having five
distinct forms (like "eat" with
"eat"/"eats"/"eating"/"ate"/"eaten".
The verb "be" is the most irregular, with forms "be",
"am", "is", "are", "being",
"was", "were", "been", plus some archaic forms,
like "art" as in "thou art".
Verb
is often abbreviated to V.
V
is a lexical grammatical
category.
The structure which
follows the verb or verb group in a sentence.
Example                           | Type of complement
Jane laughed.                     | empty
Jane ate the pizza.               | NP
Jane believed Paul ate the pizza. | S
Jane wanted to eat the pizza.     | to+VP
Jane gave Paul the pizza.         | NP+NP
Jane was happy to eat the pizza.  | ADJP
See also verb phrase.
This term is used for a
sequence of words headed by a verb together with auxiliaries, and possibly adverbs and the negative
particle "not".
For
example, in "Jane may not have eaten all the pizza", the verb group
is "may not have eaten".
Verb Phrase is a phrasal grammatical category. Verb phrase is usually abbreviated to VP. A verb phrase normally
consists of a verb or verb group and
a complement, together possibly with adverbial modifiers and PP modifiers. The
simplest complements are noun phrases,
but sentential complements and similar structures are also possible.
The logical form of a verb phrase is a lambda-expression. For example, the logical form of "likes pizza" would
be something like λ(X, likes1(st1, X, pizza1)), where st1 is the var feature variable for the state of liking (pizza), and likes1 and
pizza1 are the semantic interpretations of the verb "likes" and the
noun "pizza", respectively.
See
also predicate.
A feature of verbs that signifies what form of the
verb is present - particularly useful with verbs that are irregular in some of their
forms, or where a particular form of the verb is required by a particular
syntactic rule (for example, modal auxiliaries force the infinitive form of the
verb - VFORM inf).
VFORM   | Example                                                 | Comment
base    | break, be, set, decide                                  | base form
pres    | break, breaks, am, is, are, set, sets, decide, decides  | simple present tense
past    | broke, was, were, set, decided                          | simple past tense
fin     | -                                                       | finite = tensed = pres or past
ing     | breaking, being, setting, deciding                      | present participle
pastprt | broken, been, set, decided                              | past participle
inf     | -                                                       | used for infinitive forms with to
= object.
The Viterbi algorithm is
an algorithm applicable in a range of situations that allows a space that
apparently has an exponential number of points in it to be searched in
polynomial time.
The Viterbi algorithm was not actually described
in detail in COMP9414, but was referred to in the section on statistical NLP in
connection with a method for finding the most likely sequence of tags for a
sequence of words. Reference: Allen p. 201 ff.,
especially from p. 202.
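For the curious, the recurrence can be sketched in a few lines of SWI-Prolog. This is purely an illustration, not course material: tag/1 (the tagset), start(T, P) = Pr(T at the first position), trans(T1, T2, P) = Pr(T2 | T1), and emit(T, W, P) = Pr(W | T) are assumed model predicates. A state is a pair P-RevTags: the probability of the best tag sequence found so far, with that sequence held in reverse.

% viterbi(+Words, -Tags): most likely tag sequence for Words
viterbi([W|Ws], Tags) :-
    findall(P-[T],
            ( tag(T), start(T, PS), emit(T, W, PE), P is PS*PE ),
            S0),
    foldl(vstep, Ws, S0, SF),
    aggregate_all(max(P-R), member(P-R, SF), _-Best),
    reverse(Best, Tags).

% One step: for each tag T2 of the next word, keep only the best
% extension of the existing states - this pruning is what keeps the
% search polynomial rather than exponential
vstep(W, S0, S1) :-
    findall(P2-[T2|Prev],
            ( tag(T2), emit(T2, W, PE),
              aggregate_all(max(P1-Prev1),
                            ( member(P0-[T1|Rest], S0),
                              trans(T1, T2, PT),
                              P1 is P0*PT,
                              Prev1 = [T1|Rest] ),
                            PBest-Prev),
              P2 is PBest*PE ),
            S1).

At each word the algorithm keeps at most one state per tag, so the work is proportional to (number of words) × (number of tags)², not to the number of possible tag sequences.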
symbol used in grammar
rules for a verb phrase.
wh-question
A WH-question sentence is one that expresses a
question whose answer is not merely yes or no, as opposed to a Y/N-question, a
command, or a statement. See also Y/N-question, imperative, indicative, subjunctive, and mood.
Words are units of
language. They are built of morphemes and are used to build phrases (which are in turn
used to build sentences).
·
See also lexeme
·
See also terminal symbol
One of several possible
meanings for a word, particularly one of several with the same part of speech.
For example, dog as a noun has at least the following senses:
canine animal, a type of fastening, a low person, a constellation - the
dog barked, dog the hatches, You filthy dog! Canis major is also called the
great dog.
A
kind of ambiguity where what is in doubt is what sense of a word is intended.
One classic example is in the sentence "John shot some bucks". Here
there are (at least) two readings - one corresponding to interpreting
"bucks" as meaning male deer, and "shot" meaning to kill,
wound or damage with a projectile weapon (gun or arrow), and the other
corresponding to interpreting "shot" as meaning "waste",
and "bucks" as meaning dollars. Other readings (such as damaging some
dollars) are possible but semantically implausible. Notice that all readings
mentioned have the same syntactic structure, as in each case, "shot"
is a verb and "bucks" is a noun.
See also structural ambiguity and referential
ambiguity.
y/n question
A Y/N-question sentence is one that expresses a
question whose answer is either yes or no, as opposed to a WH-question, a
command, or a statement. See also WH-question, imperative, indicative, subjunctive, and mood.
Not a part of COMP9414
Artificial Intelligence, but it allows us to avoid having an empty list of
Z-concepts in the NLP Dictionary. :-)
A zeugma is a syllepsis in which the single word fails to give
meaning to one of its pair. She greeted him with arms and expectations
wide.
One
of a couple of dozen little-used terms for figures of speech.