Thus the history of computer programming has tended toward more sophisticated forms of abstraction, and promises to continue to do so. These abstractions, however, often come at a price: slower execution and greater memory consumption. Unfortunately, this conflicts with the broad principle in programming language design that ``you shouldn't have to pay for a feature you do not use.'' Smart compiler optimizations, however, can often avoid these disadvantages, if they can determine how individual uses of an abstraction will behave.
But it was the groundbreaking work of Cousot and Cousot that finally provided a solid theoretical foundation for static analysis, which they called abstract interpretation. It is perhaps not coincidental that the very development that has created the growing need for analysis --- abstraction --- also lies at the heart of the solution. Abstract interpretation is based on the idea of viewing the analysis of a program as an abstraction of the program's behavior. The generality of this view allows abstract interpretation to abstract the concept of analysis altogether, providing a single framework that encompasses all the standard analysis techniques.
The foundation on which abstract interpretation is built is a formal understanding of how a program should behave. This discipline of formality benefits programming language implementation in general, as it precisely defines what every program should do. This, in turn, makes it possible to prove that an analysis is correct. As analyzers and the program properties they seek to determine become more subtle and more complex to compute, this proof of correctness will become increasingly important. Fortunately, the framework of abstract interpretation comes with a guarantee of correctness; only the individual analyses need to be proved.
We are quite accustomed to abstracting values, although we may not think of it this way. For example, we know when we multiply two numbers together, they will obey the rule of signs: a product will be negative whenever exactly one of the multiplicands is negative. This rule isn't quite right, however, because if the other multiplicand is zero, the product will be zero. We specify the rule more accurately in this table:
    ×    | negative | zero | positive
---------+----------+------+---------
negative | positive | zero | negative
zero     | zero     | zero | zero
positive | negative | zero | positive
We use this rule intuitively to verify the correctness of multiplications (though the ubiquity of hand-held calculators is rapidly relegating this sort of ``sanity test'' to the history books), and feel foolish if we arrive at an answer that defies this rule without noticing the error. What we are doing here is merely abstracting the values we are multiplying to their signs, performing an abstraction of the multiplication operation on these abstract values, and verifying that the abstraction of the value we have computed matches the expected abstraction.
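This sign abstraction is easy to make concrete. The following sketch (in Python; the names sign and abs_mul are ours, not from the text) implements the multiplication table above and checks it against concrete multiplication:

```python
def sign(n):
    """Abstract a concrete number to one of the three sign values."""
    if n < 0:
        return "negative"
    if n == 0:
        return "zero"
    return "positive"

def abs_mul(a, b):
    """Abstract multiplication on signs, following the table above."""
    if a == "zero" or b == "zero":
        return "zero"
    return "positive" if a == b else "negative"

# Soundness check: abstracting then multiplying agrees with
# multiplying then abstracting.
for x in (-7, 0, 3):
    for y in (-2, 0, 5):
        assert abs_mul(sign(x), sign(y)) == sign(x * y)
```

The final loop is exactly the ``sanity test'' described above: the abstraction of the computed value must match the abstractly computed value.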
We can use exactly this sort of reasoning in analyzing computer programs. In analyzing a program, one usually faces many possible pairs of values to be multiplied rather than a single pair, but the principle is the same.
Consider the following program fragment:
1: x := 10;
2: for i := 1 to 10 do
3:    x := x + i × 2;
An analyzer will easily determine that after line 1, x will be positive. It can also determine that i must always be positive, since it ranges from 1 to 10. In analyzing line 3, we find that we must compute the abstraction: positive + positive × positive.
For this we will need a rule-of-signs abstraction for addition:
    +    | negative | zero     | positive
---------+----------+----------+---------
negative | negative | negative | unknown
zero     | negative | zero     | positive
positive | unknown  | positive | positive
Referring to this table, as well as the table for multiplication, we see that, as we would expect, x must always be positive.
Notice, though, the problem in the rule of signs for addition: we cannot predict the sign of the result of adding a negative and a positive number. None of the values negative, zero, or positive can be chosen, because any number can result from adding a negative and positive number. Therefore we must add a new possibility, unknown, which means we don't know the sign. Almost every analysis must include the possibility that no pertinent information can be derived, so there is almost always such an abstract value.
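A sketch of the addition table, extended with the unknown value (again hypothetical Python, with unknown simply a fourth string value):

```python
def abs_add(a, b):
    """Abstract addition on signs; "unknown" means the sign cannot be predicted."""
    if a == "unknown" or b == "unknown":
        return "unknown"
    if a == "zero":
        return b
    if b == "zero":
        return a
    # Both arguments are strictly signed: a common sign is preserved,
    # while opposite signs can produce any result.
    return a if a == b else "unknown"

# The problematic entry discussed above:
assert abs_add("negative", "positive") == "unknown"
```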
In this example, we were immediately able to determine the sign of x. In general, however, the correct result may take some iterations of analysis.
Consider this program fragment:
1: x := 0;
2: y := 1;
3: while y > 0 do
4:    y := y + x;
5:    x := x + (-1);
In this case we conclude that after line 2, x is zero and y is positive. After line 4, the rule-of-signs table for addition tells us that y will still be positive. However, line 5 finds us adding zero and a negative number, giving us a negative number.
If we were to stop there, we would erroneously conclude that after line 4, y is always positive. Since lines 4 and 5 comprise a loop, we must reanalyze line 4, using the combination of analyses we have after line 5 and after line 2. In both cases we know that y is positive, but after line 2, x was zero, while after line 5 it was negative. We would like to conclude, then, that before line 4, x is less than or equal to zero, but we don't have any such abstract value. None of our ordinary abstract values are applicable, so we can only conclude that it must be unknown. We have not included rows and columns for unknown in our rule-of-signs tables, but in most cases, if at least one argument of either operation is unknown, the result will be unknown as well. (One exception to this is multiplication by zero: even if you do not know anything about a number, you know that its product with zero must be zero. This shows that it is sometimes possible to have more precise abstract information after an operation than before.) So on reanalysis we conclude that after line 4, y is unknown as well.
In this case, our analysis was unable to determine any useful information about the values of x and y inside the loop. If we included an abstract value for ``less than or equal to zero,'' however, we could have concluded that throughout the loop, x is less than or equal to zero.
This example illustrates that in some cases we must repeat our analysis in order to ensure we have the correct results. We have also seen that, not surprisingly, the appropriate choice of abstract values is very important to the descriptiveness of an analysis.
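The iterative reanalysis of the second fragment can be sketched as a small fixpoint computation. In this hypothetical Python rendering, join computes the least upper bound of two sign values, and we reanalyze the loop body until the abstract state stops changing:

```python
def join(a, b):
    """Least upper bound of two sign values."""
    return a if a == b else "unknown"

def abs_add(a, b):
    """Rule-of-signs addition, as in the table above."""
    if a == "unknown" or b == "unknown":
        return "unknown"
    if a == "zero":
        return b
    if b == "zero":
        return a
    return a if a == b else "unknown"

# Abstract state after lines 1 and 2: x is zero, y is positive.
x, y = "zero", "positive"
while True:
    y2 = abs_add(y, x)             # line 4: y := y + x
    x2 = abs_add(x, "negative")    # line 5: x := x + (-1)
    # Join with the state entering the loop from above (lines 1-2).
    x2, y2 = join("zero", x2), join("positive", y2)
    if (x2, y2) == (x, y):
        break                      # fixpoint reached
    x, y = x2, y2

# As concluded above, both x and y come out unknown inside the loop.
assert x == "unknown" and y == "unknown"
```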
We can abstract these simple statements, giving them short names: R for ``it is raining'' and W for ``I will get wet.'' Then we can write R → W for ``if it is raining, then I will get wet,'' R ∧ W for ``it is raining and I will get wet,'' and ¬R for ``it is not raining.'' The rule of modus ponens tells us that from R and R → W we can conclude W. Similarly, modus tollens lets us conclude ¬R from R → W and ¬W.
This approach to logic works well as far as it goes, but doesn't allow one to capture the relationships among entities. For example, we might like a more sophisticated abstraction of raining and getting wet that says ``for every person x, if it is raining, and x does not have an umbrella, and x is outside, then x will get wet.'' In 1879, Gottlob Frege proposed the Begriffsschrift, or ``idea writing,'' which has come to be known as predicate calculus. (Frege's own Begriffsschrift notation was two-dimensional, and was therefore a significant burden to typesetters. Perhaps for this reason, it is no longer actively used. However, the principles and approach he outlined in 1879 are still in use today.) In predicate calculus, one might write
∀x . p(x) ∧ r ∧ ¬u(x) ∧ o(x) → w(x)
for this. Here x is called a variable; p, r, u, o, and w are called predicates; and p(x), r, u(x), o(x), and w(x) are called atomic formulae, or atoms. In this example, the atom p(x) means x is a person, u(x) means x has an umbrella, o(x) means x is outside, and w(x) means x gets wet. The universal quantification ∀x . ... is read ``for all x, it is the case that ....'' Predicate calculus has a dual existential quantification ∃x . ..., which is read ``there exists an x such that ....'' Note that the scope of the quantifiers is important. For example, (∀x . p(x)) ∨ (∀x . q(x)) means that either p holds for everything, or q does, while ∀x . (p(x) ∨ q(x)) means that for any x you want to pick, either p or q will hold for it. If we confine our discourse to integers and take p(x) to mean that x is even and q(x) to mean that x is odd, then clearly the latter holds, because every integer is either even or odd, while the former doesn't, because it's not true that all integers are even, nor all odd. It is this ability to make sweeping statements provided by the quantifiers, together with the ability to make statements about relationships among things given by predicates with multiple arguments, and the proof techniques that go with them, that makes predicate calculus so powerful.
The advent of computers brought interest in automatic theorem proving. A very successful early automatic theorem prover was devised by J. Alan Robinson, using what he called resolution. Robinson restricted his attention to formulae in clausal form (which was shown by Skolem to be fully general). A clause is a disjunction of literals, where a literal is an atom or a negated atom, in which all variables are universally quantified over the entire clause. Robinson was able to prove that resolution is sound, meaning it can only prove statements that are valid, that is, statements which must be true. He also showed that resolution is complete, meaning that it can prove any valid statement in a finite number of steps. Unfortunately, resolution is not able to decide conclusively, in a finite number of steps, whether or not a statement is true. However, Alonzo Church had already shown that no automatic method would be able to do that, so resolution is as powerful as we could reasonably expect.
The basic idea of resolution is to prove an atom follows from a set of axioms by contradiction. Given a collection of clauses, which constitute the axioms of the system, we negate the atom to be proved and add it to the set of axioms (this is possible since we can view a single negated atom as a clause). Let us call this collection the ``store.'' Then we repeatedly choose two clauses from the store which have atoms that are identical, or can be made to be identical, except that precisely one is negated (we shall explain what we mean by ``can be made to be identical'' shortly, but for now consider only atoms which are identical). ``Resolving'' these two clauses on the matching atoms simply creates a new clause made of the disjunction of the two selected clauses with the matching atoms removed, and adds this clause to the store. This process of choosing two clauses and creating a new one from them continues until an empty clause is produced, which means a contradiction is found (because the empty clause is equivalent to false).
For example, from the two clauses

a ∨ b ∨ c
¬c ∨ d

we would create the new clause

a ∨ b ∨ d
The reasoning behind this is simple. We know that c must be either true or false. If it is false, then from the first clause we know that a ∨ b must be true; if it is true, then the second clause tells us that d must be true. Therefore, a ∨ b ∨ d must be true in either case.

Let us return to our ``getting wet'' example. Suppose we have the following clauses:
r                                   (it is raining)
p(Peter)                            (Peter is a person)
¬u(Peter)                           (Peter does not have an umbrella)
o(Peter)                            (Peter is outside)
¬p(x) ∨ ¬r ∨ u(x) ∨ ¬o(x) ∨ w(x)    (for all x, either x is not a person, or it is not raining, or x has an umbrella, or x is not outside, or x gets wet)
In this example, Peter is a term, or constant expression. We use this set of axioms to prove that Peter gets wet by adding to this set the new clause
¬w(Peter)    (Peter does not get wet)
and repeatedly resolving pairs of literals. Immediately we see that there are many pairs of clauses that could be resolved, but let us choose the new clause, ¬w(Peter), and the axiom ¬p(x) ∨ ¬r ∨ u(x) ∨ ¬o(x) ∨ w(x).
Here we see that w(Peter) is not identical to w(x). But we can make them identical by remembering that the variable x is universally quantified. If x refers to everything, then surely it refers to Peter! But if we make the x in w(x) refer to Peter specifically, then we had better make all the other x's in the new clause we are adding refer to Peter as well. This process of making two atoms identical by binding variables is called unification.
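Unification can be sketched in a few lines. The Python below is an illustration under simplifying assumptions, not the algorithm as Robinson stated it: variables are strings beginning with an upper-case letter, compound terms are tuples whose first element is the functor, everything else is a constant, and there is no occurs check. Following the convention Prolog would later adopt, the constant Peter is written peter here.

```python
def is_var(t):
    """Variables are strings starting with an upper-case letter."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    """Follow a variable's bindings to its current value."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def unify(t1, t2, subst=None):
    """Return a substitution making t1 and t2 identical, or None."""
    subst = dict(subst or {})
    t1, t2 = walk(t1, subst), walk(t2, subst)
    if t1 == t2:
        return subst
    if is_var(t1):
        subst[t1] = t2            # bind the variable (no occurs check)
        return subst
    if is_var(t2):
        subst[t2] = t1
        return subst
    if isinstance(t1, tuple) and isinstance(t2, tuple) and len(t1) == len(t2):
        for a, b in zip(t1, t2):
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                   # different constants or functors

# Unifying w(X) with w(peter) binds X to peter.
assert unify(("w", "X"), ("w", "peter")) == {"X": "peter"}
```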
Thus we create the new clause
¬p(Peter) ∨ ¬r ∨ u(Peter) ∨ ¬o(Peter)    (either Peter is not a person, or it is not raining, or Peter has an umbrella, or Peter is not outside).
We can immediately resolve this new clause with the second clause, giving us:
¬r ∨ u(Peter) ∨ ¬o(Peter)    (either it is not raining, or Peter has an umbrella, or Peter is not outside).
Next we resolve this new clause with the first clause, getting
u(Peter) ∨ ¬o(Peter)    (either Peter has an umbrella, or Peter is not outside).
Next we appeal to the third clause, giving us

¬o(Peter)    (Peter is not outside).
Finally, we resolve this with the fourth clause, which leaves us with the empty clause, meaning we have demonstrated a contradiction, so our original negated atom was wrong, so the unnegated atom w(Peter) must be true.
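The resolution step used repeatedly above is easy to mechanize. Here is a propositional sketch (the representation is ours, not from the text): a clause is a frozenset of literals, and a literal is a pair of an atom name and a polarity.

```python
def resolve(c1, c2):
    """Return every resolvent of two clauses.

    A clause is a frozenset of (atom, polarity) literals; resolving on
    an atom removes it from both clauses and unions what remains.
    """
    resolvents = []
    for (atom, polarity) in c1:
        if (atom, not polarity) in c2:
            resolvents.append(
                (c1 - {(atom, polarity)}) | (c2 - {(atom, not polarity)}))
    return resolvents

# The earlier example: (a or b or c) resolved with ((not c) or d)
# yields the single resolvent a or b or d.
c1 = frozenset({("a", True), ("b", True), ("c", True)})
c2 = frozenset({("c", False), ("d", True)})
assert resolve(c1, c2) == [frozenset({("a", True), ("b", True), ("d", True)})]
```

Resolving two complementary unit clauses yields the empty clause, the contradiction sought by the proof procedure.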
This technique is rather laborious, but the individual steps are simple, making it very suitable for implementation by computer. However, we have left open the question of how to choose which two atoms from which two clauses to resolve. A poor algorithm for choosing what to resolve could keep choosing the same two atoms in the same two clauses, and so run forever. Loveland and Luckham independently suggested that one of the clauses to resolve should always be the most recently added clause. This is known as linear resolution. This avoids the possibility of repeatedly resolving the same clauses, since one of the clauses would always be new. Furthermore, it allows us to avoid adding the new clauses to the store at all; we can simply have a ``current'' clause, and repeatedly choose another clause to resolve with, always adding the new atoms to the current clause.
However, we are still left with two decisions to make: we must choose one of the literals of the current clause to resolve with, and given this choice, we must choose a matching literal in some other clause. The way we select a literal to resolve with is called our selection rule; the way we choose a matching literal to resolve with the selected literal, and what we do when we cannot find a matching literal, is our search rule.
One efficient search rule is known as depth-first search. With this approach, we do not keep the entire current clause explicitly at all. When we must select a literal from the current clause to resolve with, we always choose one of the remaining literals that was added to the current clause most recently, and then choose another clause with a literal which can be unified with the selected literal (of course, precisely one of them must be negative). Once we have done this, we remove the selected literal from the current clause and push the current clause on a stack, and make the newly selected clause, with the resolved literal removed, be the new current clause. When (if) the current clause becomes empty, we pop the previous one from the stack making it the current clause, and continue from there. When the stack becomes empty, we have succeeded in proving a contradiction.
Sometimes, however, we will be unable to find a clause with a literal that can be unified with our selected literal. In this case we must backtrack and reconsider a choice we made in selecting a matching clause earlier in the computation. This means that we will need a second stack to keep track of choices we make that we may later need to reconsider.
One way to narrow the search is to encode a directionality into the axioms. We can do this by requiring that each clause have precisely one positive literal; all the others must be negative. Such a clause is called a Horn clause, after the logician Alfred Horn who first studied them. By restricting ourselves to using only Horn clauses, we ensure that all the literals in the current goal are always negative (because the positive literal in each new clause is removed before being incorporated into the current clause). Since only one atom in each Horn clause is positive, this cuts down the search space dramatically. This is known as SLD resolution.
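For the propositional case (no variables, a simplifying assumption of ours), SLD resolution with the depth-first search just described fits in a few lines. In this sketch, a program maps each atom to the list of bodies of the Horn clauses whose head it is; the goal list plays the role of the all-negative current clause:

```python
def sld(program, goals):
    """Depth-first SLD derivation: succeed if the goal list can be
    reduced to the empty clause."""
    if not goals:
        return True                      # empty clause: contradiction found
    selected, rest = goals[0], goals[1:]  # leftmost selection rule
    for body in program.get(selected, []):
        # Resolve the selected atom with a clause head, replacing it by
        # that clause's body; failure backtracks to the next clause.
        if sld(program, body + rest):
            return True
    return False

# The "getting wet" axioms, propositionalized for Peter,
# with n for "has no umbrella":
program = {
    "r": [[]], "p": [[]], "n": [[]], "o": [[]],   # unit clauses
    "w": [["p", "r", "n", "o"]],
}
assert sld(program, ["w"])         # Peter gets wet
assert not sld(program, ["q"])     # an atom with no clauses fails
```

The Python call stack here stands in for the explicit stack of current clauses described above.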
In 1974, Robert Kowalski proposed that we look at predicate calculus as a programming methodology. This new paradigm he named logic programming. The central idea is to view a Horn clause as an implication; in fact, we view the set of clauses with positive literals for the same predicate as the definition of that predicate. To emphasize what is being defined, Kowalski proposed that the order of the implication, and the direction of the arrow, be reversed. Recall that our original formulation of the ``rule of getting wet'' was
∀x . p(x) ∧ r ∧ ¬u(x) ∧ o(x) → w(x).
The negated literal ¬u(x) means that this is not a Horn clause (in clausal form it becomes a second positive literal), so let us replace ¬u(x) with n(x) (meaning x has no umbrella). Since all variables in each clause are universally quantified over that clause, there is no need to explicitly write the quantification. Thus as a logic program, our axioms of wetness would be written:
r
p(Peter)
n(Peter)
o(Peter)
w(x) ← p(x) ∧ r ∧ n(x) ∧ o(x)
The part of a clause to the left of the arrow is called the clause head, and the part to the right is the body. A clause with an empty body, such as the first four clauses above, is called a unit clause. Such clauses state things that are unconditionally true. In this view, we think of the computation as a constructive proof, rather than a proof by contradiction, and think of all the literals in the body of each clause as being positive, rather than negative. When we seek to prove w(x), for example, we try to find an x that satisfies p(x) ∧ r ∧ n(x) ∧ o(x). Thus the query w(x) can be thought of as a request to (constructively) prove that ∃x . w(x), that is, to find an x such that w(x). This last view accords well with the way a programmer would like to view the query.
If programs could use only simple terms like Peter, then logic programming wouldn't be very practical. Fortunately, Frege's predicate calculus allows compound terms, which have structure. Such terms can represent many of the sorts of data structures commonly used in programming. For example, a list node can be represented as a term
n(h,t)
where h is the head, or first element, of the list, and t is the tail, or the next pointer, as it would be called in a conventional programming language. In this example n is the functor of the term, and its arity, or the number of arguments it takes, is two. We often call the functor of a term its principal functor to distinguish it from the functors of its various subterms. For example, the list of the first 3 counting numbers would be written n(1, n(2, n(3, nil))). Here nil is used as the list terminator; it is a simple term, like Peter, and both may be viewed as functors of arity 0. A predicate to append two lists together can be defined using recursion, that is, defined in terms of itself, this way:
a(nil, l, l)
a(n(h,t_{1}), l, n(h,t_{2})) ← a(t_{1}, l, t_{2})
The first clause says that the result of appending the empty list to the front of any list l is l itself. The second clause says that if appending t_{1} to l yields t_{2}, then appending h followed by t_{1} to l yields h followed by t_{2}. Another way of putting this is to say that appending a list beginning with h and following with t_{1} to l gives a list that begins with h and follows with t_{2} if it is true that appending t_{1} to l yields t_{2}. Given this definition, a logic programming system would be able to append two lists together. Interestingly, using only this definition, a logic programming system could also find what tail must be appended to one list to get another, or even find all the pairs of lists that can be appended together to form a given list. This ability to use a single definition in different ``directions'', called modes, is a pleasant result of thinking about relationships among the arguments of a predicate rather than thinking in terms of inputs and outputs.
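The ability to run append ``backwards'' can be imitated in a conventional language, though only by writing a separate enumeration for that mode. This hypothetical Python generator produces every pair of lists that append to form a given list, which is what a logic programming system gets for free from the single definition above:

```python
def splits(lst):
    """Enumerate every (front, back) pair with front + back == lst."""
    for i in range(len(lst) + 1):
        yield lst[:i], lst[i:]

# All pairs of lists that can be appended to form [1, 2, 3]:
assert list(splits([1, 2, 3])) == [
    ([], [1, 2, 3]),
    ([1], [2, 3]),
    ([1, 2], [3]),
    ([1, 2, 3], []),
]
```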
In parallel with Kowalski's work, Alain Colmerauer was investigating natural language understanding by computers. He needed a programming language in which to write his system, and was taken by Kowalski's proposal. With a team at Marseille, he implemented the first logic programming language, called Prolog, after ``programmation en logique.''
The restriction to Horn clauses advocated by Kowalski and others was shown by Horn not to reduce the expressiveness of the logic. However, it is somewhat unnatural for a programming methodology. Keith Clark suggested that a negative literal could be proved by trying to prove the positive literal: if the proof succeeds, then the negation fails; if the proof fails, then the negation succeeds. This is called negation as failure. To do this, though, we must assume that the logic program as written already specifies everything of interest that is true. For example, suppose we wish to show that Walter is not wet in the clauses of our ``rules of getting wet.'' We try to show that he is, and quickly fail, and therefore we conclude that Walter indeed is not wet. However, it may be the case that Walter is swimming, and people who are swimming are wet. The axioms as written do not exclude this possibility.
For this reason, Clark introduced the concept of the completion of a program, which has come to be known as the Clark completion. This is obtained by collecting all the clauses whose heads define the same predicate and considering them to define not only when that predicate holds, but also when it does not. If a program contains all and only these clauses for a predicate w, for example:
w(x) ← p(x) ∧ r ∧ ¬u(x) ∧ o(x)
w(x) ← p(x) ∧ swimming(x)
w(x) ← p(x) ∧ bathing(x)
then the Clark completion of w would be
∀x . (w(x) ↔ ((p(x) ∧ r ∧ ¬u(x) ∧ o(x)) ∨ (p(x) ∧ swimming(x)) ∨ (p(x) ∧ bathing(x))))
The Marseille Prolog implementation was an interpreter, and as such, it paid a significant interpretation overhead. In 1977, David H. D. Warren implemented the first Prolog compiler, which formed the core of the very popular early Prolog implementation known as DEC-10 Prolog. This new Prolog proved to be efficient enough for serious work; in fact, it was shown to be as efficient as LISP, which was the language of choice at the time for artificial intelligence research.
The syntax of the original Marseille Prolog was varied somewhat in DEC-10 Prolog, and it is the latter's syntax that has become popularly accepted, and formed the basis of ISO standard Prolog. We shall henceforth use the generic word ``Prolog'' when referring to common features of most Prolog systems descending from DEC-10 Prolog (which are too numerous to list).
Since it is the most common logic programming language, it is worth giving a brief description of Prolog syntax. Prolog predicate and functor names begin with lower case letters, while a variable name begins with an upper case letter. The left-pointing arrow is written as :-. Unit clauses are written without the :-, and each clause is terminated with a full stop (period) character. Conjunction is indicated by an infix comma, and disjunction by an infix semicolon. It is common practice in Prolog to refer to a predicate or functor by giving only the name and arity, separated by a forward slash. Prolog performs arithmetic with the is/2 built-in predicate, which has an infix notation; for example, one would write X1 is X+1 to unify X1 with the successor of X. Due to their ubiquity in Prolog programs, lists are written with the special notation [H_{1}, ..., H_{n}|T], where H_{1}, ..., H_{n} are the first n elements of the list and T is the tail. If T is empty, the |T part is usually omitted. Thus the list of the first 3 counting numbers would be written [1,2,3], the empty list is written [], and the list whose head is A and tail is B is written [A|B].
The Prolog language did not (and still does not) implement the logic programming paradigm perfectly faithfully. Several compromises were made in this respect, in order to make it a practical programming language. One compromise was in the implementation of unification. When unifying a variable with a term, it is necessary to ensure that the variable does not already appear within that term (the occurs check). For example, one might wish to unify X and f(X); this unification should fail. Most Prolog systems do not perform this test on the grounds that it is very expensive and the need rarely arises. (The ISO Prolog standard does contain a proper unification operation, but it must be invoked explicitly when it is wanted.)
Another way in which Prolog diverges from pure logic programming is in the existence of a cut primitive. Cut prunes the search space of a computation; thus derivations which involve cuts may not be complete, and derivations involving cut and negation as failure may not even be sound. In fact, this feature is often used by Prolog programmers to achieve a result that does not match the logical reading of the program. Prolog also makes it possible for the program to modify itself while it is running. This makes it very convenient to implement a Prolog programming environment, as the facilities of the environment can be implemented in Prolog itself. It can, however, make it difficult to understand a program, since any part of it may change at any time. More modern Prologs, and the recently approved ISO Prolog standard, require the programmer to explicitly declare which predicates will be modified at runtime. In using a simple depth-first search, Prolog also compromises resolution's completeness in order to achieve high speed. It is quite possible for Prolog to pursue an infinite computation when there is a finite solution present. Consider this example:
p(X) :- p(X).
p(a).
According to this program, p(a) is certainly true. Unfortunately, an attempt to prove p(a) will never terminate, as it will always try the first clause for p/1 first, only trying the second clause when the first one has failed. But the first clause will never fail (or succeed, for that matter).
Prolog's support for calculation on integers and floating point numbers leads to some difficulties, due to Prolog's very simple left-to-right selection rule. Consider this program:
length([], 0).
length([H|T], L) :- length(T, M), L is M + 1.
This predicate will determine the length of a list. We would also like to be able to use this predicate to produce a list of a given length, but unfortunately it will loop forever if used in this mode, because in the recursive invocation, both arguments will be unbound. For this to work, the arithmetic computation must be performed before the recursive call, and the calculation itself must be inverted, to calculate M is L - 1. For reasons such as this, Prolog also provides a stable of primitives that do not fit into predicate calculus. These primitives allow the programmer to determine whether or not a variable is bound, to see if two variables are bound to one another, and to perform other similar tests.
All of these non-logical features violate some of the fundamental precepts of logic. For example, in Prolog it is not uncommon to find that a, b is not the same as b, a, even when both terminate without errors. Therefore, features such as these must be used with great care.
Prolog's ability to invoke predicates in different modes means that each predicate must be prepared for any argument to be either an input or an output (or a combination of the two). That is, when a predicate is invoked, each argument may be bound to a term, or it may be unbound. If it is bound, that means that its principal functor is determined; if all of its functors are determined, then we say the term is ground. If it is unbound, it may later be bound to any term. Determining which of these cases apply is important to the correct functioning of a predicate, but it does take some time. Warren's DEC-10 Prolog compiler allowed users to declare which arguments would always be bound, and which would always be unbound, when certain predicates were invoked. This further improved the performance of the code the compiler generated. However, this information was difficult to determine and difficult to maintain, and if erroneous declarations were made, the results would be unpredictable. Due to this danger, very few programmers made use of these declarations.
In later Prolog implementations, Peter Van Roy and Andrew Taylor were able to further improve the performance of Prolog programs. In some particular circumstances, Van Roy's compiler was able to generate code that ran faster than that generated by a heavily optimizing C compiler.
Zoltan Somogyi proposed a new logic programming language, called Ptah, designed for very efficient parallel execution. The language was based upon a system of strong types and modes, that is, user-declared and compiler-checked types and modes. Somogyi argued that strong typing had been found to be very useful in other programming languages in early detection of certain sorts of common errors, and this benefit should be brought to logic programming. By extension, he argued, logic programming languages could enjoy a similar benefit through the use of strong modes, that is, user-declared instantiation states for each variable at the time each predicate is called and when it succeeds. These declarations have the added benefit of allowing much more efficient implementation. The price of this improved error detection and efficiency is expressiveness: some perfectly valid Horn clause logic programs are not valid Ptah programs, no matter what declarations are given.
Although Ptah was never implemented, a group at the University of Melbourne implemented many of the ideas of Ptah in the Mercury language. Mercury adopts most of Prolog's syntax, but does not support Prolog's non-logical features. It does, however, use a strict depth-first search, and so, like Prolog, is incomplete. (To mitigate this flaw somewhat, Speirs et al. incorporated an efficient termination analyzer into the Mercury compiler. Although it cannot always determine which atoms will terminate, as this property is undecidable, it will warn about any atoms which it cannot prove will terminate.) Mercury's mode system allows the compiler to invoke literals in the body of a clause in different orders depending on the mode. Thus, for example, the predicate length/2 defined above would work perfectly well in Mercury. Because Mercury achieves performance often several times better than Prolog's without non-logical features, it demonstrates conclusively that such features are not needed to achieve high performance. This research is currently ongoing, but the group has already distributed several versions of the Mercury system.
The paradigm of constraint programming grew up alongside logic programming. In his 1979 thesis, Alan Borning presented a system for specifying constraints graphically. The following year, in his thesis, Guy Steele proposed that computation could be performed by progressively constraining the values that can be taken by variables until the variables are completely determined. This is a paradigm familiar in the mathematical community, which has developed many techniques over the years for solving such systems of constraints.
In fact, one could view unification as solving systems of constraints on variables. For example, the unification x = f(y,z) can be thought of as constraining x to be a term whose principal functor is f with arity 2. Jaffar and Lassez proposed this view, arguing that logic programming was the perfect complement to constraint programming. Constraint programming brought a proper handling of arithmetic constraints, and the proper perspective of SLD resolution to the marriage, and logic programming brought a sound logical foundation, as well as strong support for programming with data structures. Thus was born the paradigm of constraint logic programming.
Implementations soon followed, with CHIP developed by Dincbas et al. at the ECRC in Munich, Prolog-III developed by Colmerauer in Marseille, and CLP(R) developed by Jaffar, Lassez et al. first in Melbourne and then at IBM. Since this time, several other systems have appeared, and much of the current research into logic programming includes constraint logic programming.