"ako"
signifies "a kind of". It acts as a relation between two types of
things, and specifies that one is a subset of the other. For example, chair ako furniture signifies that the
concept of chair is a subconcept of the concept of furniture, or to put it another
way, the set of all chairs is a subset of the set of all furniture.
"ako" acts like an operator in the
iProlog frame implementation. For example, assuming that a furniture frame already
existed, chair ako furniture with ... would have the
effect of creating a newgeneric frame with all the slots of the furniture generic frame, together with whatever extra slots were specified after the with.
"ako"
should be contrasted with isa.
Backward
chaining is a means of utilizing a set of condition-action rules. In backward chaining, we work back from possible conclusions of
the system to the evidence, using the rules backwards. Thus backward chaining
behaves in a goal-driven manner.
One
needs to know which possible conclusions of the system one wishes to test for.
Suppose, for example, in a medical diagnosis expert system, that one wished to
know if the data on the patient supported the conclusion that the patient had
some particular disease, D.
In
backward-chaining, the goal (initially) is to find evidence for disease D. To
achieve this, one would search for all rules whose action-part included a
conclusion that the patient had disease D. One would then take each such rule and
examine, in turn, the condition part of the rule. To support the disease D
hypothesis, one has to show that these conditions are true. Thus these
conditions now become the goals of the backward-chaining production system. If
the conditions are not supported directly by the contents of working memory, we
need to find rules whose action-parts include these conditions as their
conclusions. And so on, until either we have established a chain of reasoning
demonstrating that the patient has disease D, or until we can find no more
rules whose action-parts include conditions that are now among our list of
goals.
Backward-chaining
is to be contrasted with forward chaining.
Best-first search,
rather than plunging as deep as possible into the tree (as in depth-first search), or traversing each level of the tree in succession (as in breadth-first search), uses a heuristic to decide at each stage
which is the best place to continue the search. Here is a more technical discussion of best-first search.
Breadth-first
search is best understood with respect to a tree, though it can be
applied to graphs in general, provided a starting
node is nominated.
A
complete breadth first search traverses every node of the tree or graph,
starting from the root node or starting node, first processing, checking, or
inspecting the root/starting node. In future we'll just say it
"processes" the node. Next it processes the neighbours of the
root/starting node (in some order), and then the neighbours of the neighbours,
and so on, until all the nodes have been processed.
In
the case of a tree, no node will be visited twice - this is a property of
trees. In the case of a graph, whether a directed acyclic graph or a general graph, some nodes will be
visited twice. On the second and subsequent visits, the node and its neighbours
should be ignored. Thus a breadth-first algorithm on such graphs needs to mark
each node as visited when it first encounters it, and check
each node it comes to process to see if it has already been visited.
For
the tree shown below, the order of visiting for a breadth first search would
be: A B C D E F G H I J
A
/ \
/ \
B C
/ \ / \
D E F
G
/ \
\
H I
J
Compare depth first search.
A certainty factor is a
number, often in the range -1 to +1, which is associated with a condition or an
action of a rule. In more detail, each component of a condition may have an
certainty factor associated with it - for example if the condition is of the
form A and B, then there could be a certainty factor for A and
a certainty factor for B.
A
certainty factor of 1 means that the fact (or proposition) is highly certain. A
certainty factor of 0 means no information about whether the proposition is
true or not. A certainty factor of -1 means that the proposition is certainly
false. A certainty factor of 0.7 means that the proposition is quite likely to
be true, and so on.
The
certainty factors of conditions are associated with facts held in working memory. Certainty factors for actions are stored as part of the rules.
Rules
for manipulating certainty factors are given in the lecture notes on uncertain reasoning.
However,
here is a simple example. Suppose that there is a rule
if P then Q (0.7)
meaning that if P is true, then, with certainty
factor 0.7, Q follows. Suppose also that P is stored in working memory with an
associated certainty factor of 0.8. Suppose that the rule above fires (see alsomatch-resolve-act cycle). Then Q will be added to working memory with
an associated certainty factor of 0.7 * 0.8 = 0.56.
A
condition-action rule, also called a production or production
rule, is a rule of the form
if condition then action.
The
condition may be a compound one using connectives like and, or,
and not. The action, too, may be compound. The action can affect
the value of working memory variables, or take some real world action, or potentially do
other things, including stopping the production system.
The
knowledge of many expert systems is principally stored in their collections of
rules.
See
also inference engine.
Conflict
resolution in a forward-chaining inference engine decides which of several rules that could be
fired (because their condition part matches the contents of working memory should actually be fired. Conflict resolution proceeds by
sorting the rules into some order, and then using the rule that is first in
that particular ordering. There are quite a number of possible orderings that
could be used:
Specificity Ordering
If a rule's condition part is a superset of
another, use the first rule since it is more specialised for the current task.
For example, you might have two satisfied rules whose condition parts are:
(i) X is a dog and X likes cheese and X chases cars
(ii) X is a dog and X likes cheese
Then (i) is more specific than (ii) so under specificity ordering, you would use (i).
(i) X is a dog and X likes cheese and X chases cars
(ii) X is a dog and X likes cheese
Then (i) is more specific than (ii) so under specificity ordering, you would use (i).
Rule Ordering
Choose the first rule in the text, ordered
top-to-bottom.
Data Ordering
Arrange the data in a priority list. Choose the
rule that applies to data that have the highest priority.
Size Ordering
Choose the rule that has the largest number of
conditions. For example, you might have two satisfied rules whose condition
parts are:
(ii) X is a dog and X likes cheese
(iii) X is a dog and X bites postmen and X chases cars and X has rabies
Then (iii) is has larger size (i.e. more conditions) than (ii) so under size ordering, you would use (iii). Note that neither rule is more specific than the other, so this is different from specificity ordering, which can be viewed as a special case of size ordering. Size ordering will work in cases where specificity ordering will not. Conflict resolution by size ordering can of course fail if there is tie for the satisfied rule with the most conditions.
(ii) X is a dog and X likes cheese
(iii) X is a dog and X bites postmen and X chases cars and X has rabies
Then (iii) is has larger size (i.e. more conditions) than (ii) so under size ordering, you would use (iii). Note that neither rule is more specific than the other, so this is different from specificity ordering, which can be viewed as a special case of size ordering. Size ordering will work in cases where specificity ordering will not. Conflict resolution by size ordering can of course fail if there is tie for the satisfied rule with the most conditions.
Recency Ordering
The most recently used rule has highest
priority. The least recently used rule has highest priority. The most recently
used datum has highest priority. The least recently used datum has highest
priority.
Context Limiting
Reduce the likelihood of conflict by separating
the rules into groups, only some of which are active at any one time. Have a
procedure that activates and deactivates groups.
In frames, a default (value) is a
value to be assigned to a particular slot in an instance frame if a specific value has not be assigned or added to
that slot. The default is specified as part of the corresponding generic frame, and thus is inherited by the instance
frame when the instance frame is created.
A demon is a facet of a slot in a frame which causes some
action to be taken when the frame is accessed in certain types of ways. For
example, an if-needed demon is activated or triggered if
the value of the slot is required and a value has not yet been stored in the
slot, and it should calculate or otherwise obtain a value for the slot, while a range demon
is triggered if a new value is added to the slot, to check that the value added
is permissible for this particular slot.
Here
is a list of the demon types supported by the iProlog frame implementation:
if_removed
if_replaced
if_needed
demons are triggered
when there is no value present in an instance frame and a value must be
computed from a generic frame.
if_new
range
is triggered when a new
value is added. The value must satisfy the range constraint specified for the
slot.
help
is triggered when the
range demon is triggered and returns false.
The following are not demons but demon-related
slots in a frame.
multi_valued
means
that the slot may contain more than one value.
For examples, see the lecture notes.
Depth-first search is
best understood with respect to a tree, though it can be applied to graphs in general, provided a starting node is
nominated.
A
complete depth first search traverses every node of the tree or graph, starting
from the root node or starting node, first processing, checking, or inspecting
the root/starting node. In future we'll just say it "processes" the
node. Next it considers (but does not yet process) the neighbours of the
root/starting node. The neighbours should be ordered in some way. Suppose that
the first neighbour is called F1. Depth-first search proceeds to search first
the subtree or subgraph composed of the neighbours of F1. Suppose that the
first of F1's neighbours is F2. Depth-first search proceeds by searching the
subtree or subgraph composed of the neighbours of F2. And so on, until the
bottom of the tree/graph is reached. Thus the algorithm can be expressed recursively as follows (for a tree):
to depthFirstSearch a tree
with root R
if tree is empty
then % we're finished
else
let N1, N2, ..., Nk be the
neighbours of R
depthFirstSearch the subtree with root
N1
depthFirstSearch the subtree with root
N2
...
depthFirstSearch the subtree with root
Nk
In the case of a tree, no node will be visited
twice - this is a property of trees. In the case of a graph, whether a directed acyclic graph or a general graph, some nodes will be
visited twice. On the second and subsequent visits, the node and its neighbours
should be ignored. Thus a depth-first algorithm on such graphs needs to mark
each node as visited when it first encounters it, and check
each node it comes to process to see if it has already been visited.
For
the tree shown below, the order of first visiting for a depth first search
would be: A B D H E C F I G J
A
/ \
/
\
B C
/ \ / \
D E F
G
/ \
\
H I
J
In the case of binary trees there are 3 common variants of depth-first search called pre-order,
in-order, and post-order traversal. The variants distinguish between first
visiting a node, and processingthat node, i.e. doing
something with the data stored in the node (e.g. printing out the name of the
node). There are six possible orders for three objects, of which the three
orders commonly used are as follows:
·
process node, visit left
subtree, visit right subtree. This variant is called pre-order traversal. In
order traversal of the binary tree shown above processes the nodes in the order A B D H E C F I G Jas for simple depth-first search.
·
visit left subtree,
process node, visit right subtree. This variant is called in-order traversal.
In order traversal of the binary tree shown above processes the nodes in the
order H D B E A F I C G J.
·
visit left subtree,
visit right subtree, process node. This variant is called post-order traversal.
In order traversal of the binary tree shown above processes the nodes in the
order H D E B I F J G C A.
Compare breadth first search.
directed acyclic graph
see graph
directed graph
see graph
A component of a graph.
An expert system is a
computer system intended to perform at the level of a human expert in whatever
domain it aspires to deal with. Early expert systems included systems aimed at
medical diagnosis. All expert systems must confront the issue of knowledge representation
- that is, how the program will encode the knowledge of the human expert.
Knowledge representation methods include production rules, frames, ripple-down rules, semantic networks. Many or most expert systems are rule-based systems.
Facets are the
components of a slot in a frame. Most slots would have
a value facet and a default facet.
Many would also have facets corresponding to various demons such as a range or if-neededdemon
- each demon is a facet.
A production is said to be ready to fire if its
condition part matches the contents of working memory.
Forward
chaining is a means of utilizing a set of condition-action rules. In this mode of operation, a rule-based system is data-driven.
We use the data to decide which rules can fire, then we fire one of those
rules, which may add to the data in working memory and
then we repeat this process until we (hopefully) establish a conclusion.
For
more details of this method of using rules, see inference engine.
Forward-chaining
is to be contrasted with backward chaining.
Frames are a knowledge
representation technique. They resemble an extended form of record (as in
Pascal and Modula-2) or struct (using C terminology) or class (in Java) in that
they have a number ofslots which are like fields in a record or struct, or variable in
a class. Unlike a record/struct/class, it is possible to add slots to a frame
dynamically (i.e. while the program is executing) and the contents of the slot
need not be a simple value. If there is no value present in a slot, the frame
system may use a default for frames of that type, or there may be a demon present to help compute a value for the slot. Documentation
for frames as implemented in iProlog is available at http://www.cse.unsw.edu.au/~claude/teaching/AI/notes/prolog/Frames/Frames.html. (This link is outside my web space and may
break.)
Demons
in frames differ from methods in a Java class in that a demon is associated
with a particular slot, whereas a Java method is not so linked to a particular
variable.
A frame that serves as a
template for building instance
frames. For example, a generic
frame might describe the "elephant" concept in general, giving
defaults for various elephant features (number of legs, ears, presence of trunk
and tusks, colour, size, weight, habitat, membership of the class of mammals,
etc.), which an instance frame would describe a particular elephant, say
"Dumbo", who might have a missing tusk and who would thus have the
default for number of tusks overridden by specifically setting number of tusks
to 1. Instance frames are said to inherit their slots from the generic frame used to create them. Generic frames
may also inherit slots from other generic frames of which they are a subconcept
(as with mammal and elephant - elephant
inherits all the properties of mammal that are encoded in the mammal generic
frame - warm blood, bear young alive, etc.)
See article on operators
and states.
This material should be familiar to you if you
have studied discrete mathematics, either in a mathematics department, or, for
example, in the subject COMP9020 Foundations of Computer Science.
Technically, a
(directed) graph is a set V of vertices or nodes,
together with a set E of directed edges, which are ordered pairs of
vertices. Graphs are widely used in computer science as a modelling tool. A
simple example of a graph would be V = {1, 2, 3} and E = {(1,2), (3,1)}, which
could be drawn as:
3 -----> 1 ----->
2
Usually it is convenient to have names or
"labels" for the nodes - technically this could be expressed as a
mapping from V to a set VLabels of labels for vertices. Sometimes it is useful
(as in semantic
networks), to have labels for the edges. Again,
technically this can be expressed as a function from E to a set ELabels of
labels for edges.
It is also possible to have undirected graphs,
in which the edges are not ordered but rather unordered pairs. Consider the possibility
of edges from a node to itself - sometimes these could be useful, sometimes
not.
A directed cycle in a directed
graph is a sequence of edges (v1, v2), (v2, v3), ..., (vn, v1) such that the
second vertex of the final edge is the same as the first vertex of the first
edge. Here is a simple example:
1 ------> 3
^ |
| |
| v
2 <------ 4
The in-degree of
a vertex v in a directed graph is the number of directed edges (u, v) such that
v is the second vertex of the edge. In the following example, vertex b has
in-degree 2 and the other vertices have in-degree 0. In the example above, all
vertices have in-degree 1.
a
|
|
v
b <------ c
The out-degree of
a vertex u in a directed graph is the number of directed edges (u, v) such that
u is the first vertex of the edge. In the example above, vertex b has
out-degree 0 and the other vertices have out-degree 1.
Directed acyclic graphs are graphs with no directed cycles.
Trees are a special kind of directed graph, in
which there is a special vertex, called the root, and which has
in-degree 0, and every other vertex has in-degree 1.
Binary trees are a special kind of tree, in which every
node has out-degree at most 2.
Graphs can be represented in a variety of ways.
One can represent them directly as lists or other collections of (directed)
edges:
edge(1, 3).
edge(3, 4).
edge(4, 2).
edge(2, 1).
or
[[1, 3], [3, 4], [4, 2], [2, 1]]
It is also possible to
use adjacency matrices, a representation using a matrix
of 0s and 1s:
0 0 1 0
1 0 0 0 .... the 1 in the (2,1)-position tells us (2,1) is an edge
0 0 0 1
0 1 0 0
1 0 0 0 .... the 1 in the (2,1)-position tells us (2,1) is an edge
0 0 0 1
0 1 0 0
The structures represented using graphs in
Artificial Intelligence (and elsewhere) frequently need to be searched. There
are a number of graph search algorithms for this purpose.
See breadth first search, depth
first search, and best first search. There are many other
search concepts - see the textbook by Bratko or elsewhere: look for minimax
search, and alpha-beta pruning for a start, if interested.
A heuristic is a fancy name for a "rule of
thumb" - a rule or approach that doesn't always work or doesn't always
produce completely optimal results, but which goes some way towards solving a
particularly difficult problem for which no optimal or perfect solution is
available.
A rule-based system requires some kind of
program to manipulate the rules - for example to decide which ones are ready to fire. (i.e. which ones have
conditions that match the contents of working
memory). The program that does this is called an
inference engine, because in many rule-based systems, the task of the system is
to infer something, e.g. a diagnosis, from the data using the rules. See also match-resolve-act
cycle.
Inheritance is a property of semantic networks and frames. It works as follows.
In the case of a semantic network, if I try to
use the network to retrieve, say, the number of legs of a node (with name
"Dumbo") the system will first look to see if the "Dumbo"
node has an explicit "legs" link. If so, it is followed. If not,
inheritance is applied, and, since "Dumbo" is an object rather than a
type, the isa link is followed.
Assuming "Dumbo" isa elephant, we then check the
"elephant" node to see if it has a link labelled "legs". If
so, we use it. If not, we look to see if "elephant" has a ako link, perhaps to
"mammal". If mammal has a link labelled "legs" then we use
it. If not, then we look for further "ako" links from either
"elephant" or "mammal", and so on, until we either find the
"legs" or run out of semantic network to search.
In the case of a frame, the effect is the same,
but the details are different - an "elephant" frame would have been
constructed using the information in a "mammal" frame, since elephant ako mammal. The "Dumbo" instance
frame will have been constructed using the
"elephant" generic frame since Dumbo isa elephant. Thus elephant inherits all the
properties of mammal and Dumbo inherits all the
properties of elephant. At each stage, there is an opportunity to
change e.g. the number of legs if Dumbo should happen to an aberrant elephant
in the matter of legs.
See article on operators and states.
See generic
frames.
"isa"
signifies "is a". It acts as a relation between an object and a type,
and specifies that the object is a member of the type. For example, Fido isa dog signifies that the
object Fido is a member of the set of all dogs.
"isa" acts like an operator in the
iProlog frame implementation. For example, X isa dog with ... would have the
effect of binding X to a new instance
frame with
all the slots of the dog generic frame, together with whatever extra information (such
as slot values) was provided after the with.
"isa"
should be contrasted with ako.
The
match-resolve-act cycle is the algorithm performed by a forward-chaining inference engine. It can be expressed as follows:
loop
1. match all condition parts of condition-action rules
against working memory and collect all
the rules that match;
2. if more than one match, resolve
which to use;
3. perform the action for the chosen rule
until action is
STOP or no conditions match
Step 2 is called conflict resolution. There are a number of conflict resolution
strategies.
A component of a graph.
May also be a component of a neural network (see node in a neural network.
May also be a component of a neural network (see node in a neural network.
In problem solving, the
term operator is used to describe one of the procedures that
can be used to move from one state to another. One starts at
an initial state and uses the allowable operators to move towards the
or a goal state.
The sequence of states and operators (or
sometimes just the states) that lead from the initial state to the goal state
is referred to as a path. The trick lies in choosing operators that
do in fact lie along a path, and preferably a short or cheap path towards the
or a goal state. Sometimes it is possible to navigate intelligently through state
space, but sometimes blind backtracking search through state space is the only
possibility.
In
most cases, it is necessary to check, at each state, which operators are
feasible at this particular point. If, for example, the states were physical
positions in some real or simulated terrain, the operators moved one in
different physical directions, it would be necessary to check for example that
there were no barriers in some directions, or that there were no bad
consequences for steps in certain directions. For example, a step to the West
might land lead to a state which could not be escaped from with any of the
available operators.
path
See article on operators
and states.
Procedural attachment
refers to the practice of attaching small procedures called demons to slots in frames. Demons in frames resemble methods in
object-oriented languages such as Java, and historically precede them.
production or production rule
See condition-action rule.
Not covered in COMP9414.
Discussed in COMP9416.
A rule-based system is
one based on condition-action rules. See also backward chaining, forward chaining, inference engine, working memory,
and match-resolve-act cycle.
Search is a prevalent
metaphor in artificial intelligence. Many types of problems that do not
immediately present themselves as requiring search can be transformed into
search problems. An example is problem solving, which can be viewed in many
cases as search a state space, using operators to move from one
state to the next.
Particular
kinds of search are described under the headings breadth-first search, depth-first search, and best-first
search
Semantic networks are a
knowledge representation technique. More specifically, it is a way of recording
all the relevant relationships between members of set of objects and types.
"Object" means an individual (a particular person, or other
particular animal or object, such as a particular cat, tree, chair, brick,
etc.). "Type" means a set of related objects - the set of all
persons, cats, trees, chairs, bricks, mammals, plants, furniture, etc. Possible
relationships include the special set-theoretic relationships isa (set
membership) and ako(the subset relation), and also general
relationships like likes, child-of. Technically a
semantic network is a node- and edge-labelled directed graph, and they are
frequently depicted that way. Here is a pair of labelled nodes and a single
labelled edge (relationship) between them (there could be more than one
relationship between a single pair):
Here is a larger fragment of a semantic net,
showing 4 labelled nodes (Fifi, cat, mammal, milk) and three labelled edges
(isa, ako, likes) between them.
A slot in a frame is like a field in a record or struct in
languages like Pascal, Modula-2 and C. However, slots can be added dynamically
to frames, and slots contain substructure, called facets. The facets would normally include a value, perhaps a default, quite likely some demons, and possibly some
flags like the iProlog frame system's cache and multi_valued facets.
state
See the article on operators and states.
tree
see graph
A component of a graph. The irregular plural of vertex is vertices.
The working memory of a rule-based system is a store of information used by the system to decide which
of the condition-action rules is able to be fired. The contents of the working
memory when the system was started up would normally include the input data -
e.g. the patient's symptoms and signs in the case of a medical diagnosis
system. Subsequently, the working memory might be used to store intermediate
conclusions and any other information inferred by the system from the data
(using the condition-action rules.
The
term "working memory" is also used in cognitive psychology, where it
refers to the limited store of "chunks" (roughly, items in memory)
available at the same time for conscious processing.
No comments:
Post a Comment