INDUCTIVE EQUATIONAL LOGIC PROGRAMMING

Inductive Logic Programming (ILP) is an area of research that is at the intersection of Machine Learning and Logic Programming. An ILP system uses positive and negative facts (examples) and optional background knowledge to induce a logic program that 1) accurately describes the facts and 2) successfully predicts the outcome of unseen examples. This thesis introduces a new ILP algorithm implemented in Equational Logic that takes a hybrid approach to induction, using bottom-up generalization combined with inverse narrowing to create recursive equations. We also introduce a framework for the induction of conditional equations from positive ground examples.

We build on the research into inductive logic programming and inductive concept learning using equational logic begun by Hamel [1,2,3] and Shen [4], as well as on the inductive processes used in the functional logic programming system FLIP [5].
Additionally, we present a framework for the induction of conditional equations in equational logic. Initial results of this research show that it can be a powerful addition to the field of inductive logic programming.

Statement of Problem
While the problem of inductive logic programming (ILP) in first-order predicate logic systems and traditional attribute-value representation languages has been well researched, the use of equational logic as the representation language remains a fairly open problem in the field of ILP.
An inductive logic programming system's learning algorithm essentially has three parts: representation, search, and evaluation [6]. Because the representation language of equational logic has been thoroughly established and formalized, this dissertation has concentrated on the search and evaluation of an ILP algorithm.
In general, the search algorithm of ILP systems is a set covering algorithm. We now give a brief overview of a typical ILP problem so that the reader has an understanding of what is involved.
Suppose we would like to know whether or not to play tennis given the day's weather attributes. We give our learning algorithm a set of positive and negative examples from past observations, using propositional calculus as the representation language. We have four weather attributes for outlook, temperature, humidity, and wind speed as operands and use the logical conjunction operator ∧ to connect them. If all of the operands are true, then the outcome, written after the implication arrow →, is either Play Tennis or Do Not Play Tennis. Based on this input knowledge, the learning algorithm is able to induce a set of rules that tell us when we should play tennis. Here X and Y are variables that can represent any value for their attribute. The algorithm searched the hypothesis space and discovered three rules that cover all of the examples given to it. That is, this set of rules accounts for all of the Play Tennis and Do Not Play Tennis examples, and thus a solution was found.

We can now evaluate the algorithm by testing unseen examples against these rules and comparing the results with the actual values. If Outlook=Sunny, Temp=Cool, Humidity=Low, and Wind=Weak and the outcome was to play tennis, then this would be a positive test. However, if Outlook=Overcast, Temp=Mild, Humidity=High, and Wind=Weak and tennis was not played, this would be a negative test, as the first rule states that the outcome should be to play tennis when those conditions are true.
In ILP, one of the primary goals is for a solution to be complete and consistent: it covers all of the positive examples given as input (complete), but none of the negative examples (consistent). Another goal is for that solution to accurately predict or classify unseen data. The goal of this dissertation is to implement an inductive logic programming system using equational logic as the representation language.
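The completeness and consistency check described above can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis system: the attribute encoding, the use of None as a don't-care variable, and the example rule are all invented for this illustration.

```python
# Rules map each weather attribute to a required value; None plays the
# role of a variable and matches any value.
ATTRS = ("outlook", "temp", "humidity", "wind")

def covers(rule, example):
    """True if the rule matches the example (None = don't-care variable)."""
    return all(rule[a] is None or rule[a] == example[a] for a in ATTRS)

def complete_and_consistent(rules, positives, negatives):
    """Complete: every positive example covered. Consistent: no negative covered."""
    def covered(e):
        return any(covers(r, e) for r in rules)
    return all(covered(e) for e in positives) and not any(covered(e) for e in negatives)

# One illustrative rule: play tennis whenever the outlook is overcast.
rule = {"outlook": "overcast", "temp": None, "humidity": None, "wind": None}
pos = [{"outlook": "overcast", "temp": "hot", "humidity": "high", "wind": "weak"}]
neg = [{"outlook": "sunny", "temp": "hot", "humidity": "high", "wind": "weak"}]
print(complete_and_consistent([rule], pos, neg))  # True
```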

Contribution
The significant contribution of this thesis is a new method for learning equational logic programs from given example equations. This algorithm uses a novel hybrid approach to equation induction that combines bottom-up induction for equation generalization with inverse narrowing for the discovery of recursive equations.
We show that using sorted equational logic for inductive logic programming is an effective representation language. We also introduce a framework for inducing conditional equations. Conditional equations are shown to be a powerful tool in equational logic.

Related Work

Inductive Logic Programming
Inductive logic programming (ILP) is the intersection of inductive machine learning and logic programming. ILP can be stated as the discovery of a theory from positive and negative facts using optional background knowledge [7]. A complete and consistent program is called correct, and a correct program is considered a solution in terms of inductive logic programming. Notice that the program P = E+ (the set of positive examples itself) is a solution, but it would be useless in the prediction of new, unseen examples, as any example e ∉ E+ would always be classified as negative.
The relationship between induction and deduction is interesting. In philosophy, induction is the study of the derivation of general statements from specific instances. In The Principles of Science [8], Jevons demonstrated that inductive inference could be performed by reversing the deductive rules of inference. In deduction, we are given a theory, or set of premises, that is assumed to be true and use it to prove that certain statements hold. In inductive logic, we are given a set of facts, and a theory is induced that explains those facts. Figure 1 summarizes this relationship [1]. The challenge for ILP is to create a system where the machine can learn these hypotheses automatically, given the facts and background knowledge.

ILP Methods
In a broad sense, inductive logic programming can be viewed as a search of the hypothesis space for a solution to a given input theory and possible background knowledge. Traditionally, these search techniques in ILP used two strategies, namely top-down and bottom-up. Figure 2 shows the generality lattice of clause formulae [9]. At the base of the lattice is a clause in its most specific state, i.e. a ground clause or a clause with no variables. At the top of the lattice is a most general clause, or a clause with no literals. In equational logic, the term playtennis(overcast,hot,normal,weak) is in its most specific state, while playtennis(OutlookVar,TempVar,HumidityVar,WindVar) is its most general form.

Bottom-Up ILP
While top-down strategies successively specialize a general starting clause, bottom-up approaches begin with a specific ground clause (usually a positive example) and generalize. Generalizations are created by inverting logical resolution. If a generalization covers a negative example, then it is discarded. The term playtennis(overcast,hot,normal,weak) generalizes to playtennis(overcast,hot,HumidityVar,weak). This is an example of generalizing a subterm, the literal constant normal, by replacing it with a variable, HumidityVar.
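The single generalization step described above, replacing one constant argument with a variable, can be sketched as follows. This is a hedged illustration: the tuple encoding of terms and the function name generalize_once are assumptions of this sketch, not part of the system.

```python
# A term is a tuple (operator, *args); variables are named strings.
def generalize_once(term, var_names):
    """Yield every term obtained by turning a single argument into a variable."""
    op, args = term[0], list(term[1:])
    for i, name in enumerate(var_names):
        new_args = list(args)
        new_args[i] = name
        yield (op, *new_args)

ground = ("playtennis", "overcast", "hot", "normal", "weak")
variables = ["OutlookVar", "TempVar", "HumidityVar", "WindVar"]
for g in generalize_once(ground, variables):
    print(g)
# one of the results is ('playtennis', 'overcast', 'hot', 'HumidityVar', 'weak')
```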

Inductive Logic Programming in Predicate Logic
There have been many systems built that induce first-order logic programs. In the early 1990s, Stephen Muggleton [9] officially coined the term Inductive Logic Programming and has contributed significant work to the field with his Golem and Progol ILP systems. Prior to this, Plotkin [10] laid the theoretical groundwork for generalization with his work on least general generalization. Claude Sammut developed a system, Marvin [14], which contributed to the field of ILP in several ways. Marvin was one of the first learners to test its generalizations by showing the training engine instances of the hypothesis. Marvin's generalization procedure would also become the groundwork for Muggleton's absorption operator and Rouveirol's saturation operator [15]. Finally, Marvin was one of the only ILP systems that combined both generalization and specialization, using the latter as a way to refine inconsistent generalizations.
Quinlan's First Order Inductive Learner (FOIL) [16] learns function-free Horn clauses using a top-down search guided by an information-based heuristic. Stephen Muggleton has been one of the most prominent researchers in ILP over the last twenty years. His first foray into the field was with the DUCE system [17], which used rewrite operators to generalize a theory composed of Horn clauses into a smaller one. These operators were the operational equivalent of inverse resolution.

Evolutionary Equational Logic Programming
Hamel [1,2,3] and Shen [4] have produced ILP algorithms in equational logic using a genetic algorithm for searching the space of possible solution programs.
These genetic programming engines were implemented in the OBJ3 and Maude equational logic programming languages, respectively. While these systems are able to accurately learn equational logic programs, there are several limitations, which include:
• Implementation of only a subset of equational logic. Conditional equations are not supported.
• Some solutions produced are technically correct, yet presented in a way that is algebraically incorrect. This was a result of the way the underlying Maude rewrite engine considers equations in order [4].
• These systems used significant memory resources and computational time due to the stochastic nature of genetic algorithms.
The theory behind the evolutionary algorithms implemented in these systems used mutation and cross-over for equation induction. Mutation replaces a term in an equation with a randomly generated term of the same sort. Cross-over generates new equations from two parent equations by selecting cross-over points (subterms) and replacing the cross-over point in Parent A with the cross-over point in Parent B. Equations are good candidates for this type of algorithm as they can easily be represented as a graph, as shown in Figure 3. Potential cross-over points would be any of the edges of the graph, and the nodes (terms) are where mutation occurs.

Functional Inductive Logic Programming
The work that is most closely related to this thesis is the FLIP system [19, 20]. FLIP is a system for the induction of functional logic programs from input example equations. There are several limitations to the FLIP system. First, it is not typed (many-sorted). Terms are simply represented as sets of symbols, so 0 + 1 = 1 and 0 + 1 = true are perfectly acceptable input functions, as FLIP does not check that the sorts of the right hand sides of the two functions are different.
Secondly, there is no concept of conditional functions in the FLIP system. We believe conditional equations to be a powerful aspect of equational logic and have begun work on implementing them into inductive equational logic.
Finally, for each positive input function, the system generates almost all possible generalizations of that function in the initial step of the induction process. Then, at any given iteration of the induction process, the FLIP system continues to generate all the possible generalizations and hypothesis programs for each newly created function. We believe this is an inefficiency of the algorithm because many of these programs will be unsound and are therefore discarded immediately.
We address each of these limitations in our implementation of inductive equational logic programming.

A Few Notes
Throughout this dissertation, we use the Peano notation for natural numbers in many of the examples and in several of our experiments in Chapter 5. The Peano notation uses the successor function to define the naturals. That is, there is a natural number 0 and every natural number X has a natural number successor, denoted s(X). Therefore we can represent the natural numbers as 0 = 0, s(0) = 1, s(s(0)) = 2, s(s(s(0))) = 3, and so on.
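As a small illustration, Peano numerals can be encoded and decoded in a few lines of Python. The nested-tuple encoding below is invented for this sketch; the thesis itself works with BOBJ terms.

```python
# 0 is encoded as ("0",) and s(X) as ("s", X).
def peano(n):
    """Build the Peano numeral for the non-negative integer n."""
    t = ("0",)
    for _ in range(n):
        t = ("s", t)
    return t

def to_int(t):
    """Decode a Peano numeral back to an integer."""
    return 0 if t == ("0",) else 1 + to_int(t[1])

print(peano(3))           # ('s', ('s', ('s', ('0',))))
print(to_int(peano(3)))   # 3
```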
Additionally, we use capital letters such as X and Y to represent variables and lower case letters, such as a and c, for literals and functions.

Structure of Thesis
The remainder of this dissertation is structured as follows.
Chapter 2 presents the preliminaries and background work in equational logic and equational logic programming.
Chapter 3 describes the process of inverse narrowing, which we use as a method of inducing recursive equations.
Chapter 4 presents our algorithm in detail. We present the implemented algorithms in pseudocode and review the methods used.
Chapter 5 discusses experiments using our algorithm and the results from those tests.
Chapter 6 is an overview of conditional equations and how to handle them in inductive equational logic programming.
Chapter 7 concludes the dissertation with some final remarks and directions for future work in this area.

Equational Logic
Equational logic is a subset of first-order logic. It deals with logic sentences where the only logical operator is the binary predicate for equality, typically written with the standard equals sign = [21]. It is the logic of substituting equals for equals, using algebras as models and term rewriting as the operational semantics [1]. In 1935, Birkhoff developed the general theory of algebras as a fully mathematical discipline [22]. He also proved two theorems: a completeness theorem for equational logic and a theorem which provides a purely algebraic characterization of equational classes [21].
In equational logic, equations are built from the equality operator and first-order terms. Equations are expressions of the form l = r where l and r are terms.
For the remainder of this dissertation, we abbreviate the left hand side and right hand side of an equation as LHS and RHS. Terms are well-formed expressions built from a set of operator symbols (functions), each with an arity (the number of operands the operator takes), and a set of variables. A term is either a variable, an operator applied to terms, or a constant (an operator of arity 0).
A term u is a subterm of a term t if u is t, or if t is f(t1, t2, ..., tn) and u is a subterm of some ti. Subterms appear at occurrences within a term, which are defined in Definition 4 [23]. A sort s is a type or kind of object, such as integers, booleans, lists, and so on. Variables of a term may be instantiated by a substitution, which is a mapping from a subset of the variables to terms [24].

Definition 5. A substitution is a mapping θ : X → TΣ(X) which maps variables to terms of the same sort.
In this dissertation, substitutions are represented as sets of variable/replacement-term pairs, such as {X/0, Y/s(0)}; when the substitution is applied to a term or equation, any occurrences of X and Y are replaced with 0 and s(0), respectively.
Substitutions play an important role in one of the main ideas of logic programming called unification. In unification, given a set of terms with variables, we want to find a substitution that will make all the terms (syntactically) equal.
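Applying a substitution such as {X/0, Y/s(0)} can be sketched in Python. The encoding (uppercase strings as variables, tuples as compound terms) and the function name are assumptions of this illustration.

```python
def is_var(t):
    """Variables are uppercase strings; everything else is an operator term."""
    return isinstance(t, str) and t[:1].isupper()

def apply_subst(theta, t):
    """Replace every variable occurrence in t by its image under theta."""
    if is_var(t):
        return theta.get(t, t)
    return (t[0],) + tuple(apply_subst(theta, a) for a in t[1:])

# The substitution {X/0, Y/s(0)} applied to sum(X, Y):
theta = {"X": ("0",), "Y": ("s", ("0",))}
term = ("sum", "X", "Y")
print(apply_subst(theta, term))  # ('sum', ('0',), ('s', ('0',)))
```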

Definition 6. A substitution θ is a unifier for a set {E1, E2, ..., En} iff θ(E1) = θ(E2) = ... = θ(En).

Definition 7. A unifier θ is a most general unifier (mgu) for a set of equations E = {E1, E2, ..., En} iff for each unifier λ of E there exists a substitution µ such that λ = µ ◦ θ.

The idea of the most general unifier is that θ is less specific (more general) than any other unifier λ. That is, we can substitute literal terms for some of the variables in θ and produce λ.
From equations, terms, variables, and sorts, we can construct theories, or hypotheses, that describe a concept. Theories include an equational signature, which defines the operations and sorts of the theory, and a set of equations.
Definition 8. An equational signature is a pair (S, Σ), where S is a set of sorts and Σ is an (S* × S)-sorted set of operation names. We usually abbreviate (S, Σ) as Σ.

Definition 9. A Σ-equation is an expression of the form (∀X) l = r, where X is a set of variables and l, r ∈ TΣ(X) are terms over the signature Σ and the variables X. If l and r contain no variables, i.e. X = ∅, then we say the equation is ground.
From a Σ-theory, new equalities can be deduced using inference rules. The inference rules for equational deduction are shown in Figure 5 [25]. Let us work through some proofs to better explain these inference rules. First, assume the following axioms on the evenness of natural numbers are true:

Axiom 1. even(s(0)) = false
Axiom 2. even(0) = true
Axiom 3. even(s(s(X))) = even(X)

Figure 4. Axioms of Evenness of Natural Numbers
Using the axioms in Figure 4 and the inference rules in Figure 5, we are able to prove the following theorems:

Theorem 1. even(s(s(s(0)))) = false

Theorem 2. even(s(s(s(s(0))))) = true

Proof. We prove Theorem 2; Theorem 1 is proved analogously.
(i) The goal is even(s(s(s(s(0))))) = true.
(ii) Using the Leibniz rule, substitute s(s(0)) for X in Axiom 3 to obtain even(s(s(s(s(0))))) = even(s(s(0))).
(iii) Applying this same procedure, using 0 for X and the RHS of (ii), gives us even(s(s(0))) = even(0).
(iv) By Axiom 2 and the RHS of (iii): even(0) = true.
(v) Finally, through Transitivity we have even(s(s(s(s(0))))) = even(s(s(0))) = even(0) = true.

For the inference rules below, p[X := e] denotes textual substitution of expression e for variable X in expression p. A = B represents equality for A and B of the same sort, and A ≡ B is equivalence only of sort Boolean. A = B and A ≡ B have the same meaning for Booleans.

Symmetry
If p = q is a theorem, then so is q = p

Substitution
If p is a theorem, then so is p[X := e]

Transitivity
If p = q and q = r are theorems, then so is p = r

Leibniz
If p = q is a theorem, then so is e[X := p] = e[X := q]

Equanimity
If p and p ≡ q are theorems, then so is q

Figure 5. Inference Rules of Equational Logic

We say that an equation (∀X) t = t′ is deducible from a theory (Σ, E) if there is a deduction from E using the inference rules whose last equation is (∀X) t = t′. We write this as E ⊢ (∀X) t = t′.
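The evenness proof can also be replayed mechanically by treating the axioms as rewrite rules. The sketch below collapses the Leibniz and Transitivity steps into direct evaluation; the nested-tuple encoding of Peano numerals is an assumption of this illustration.

```python
# even(t) for a Peano numeral t, computed by repeatedly applying
# Axiom 3 (even(s(s(X))) = even(X)) and then Axiom 1 or 2.
def even(t):
    # Axiom 3: strip pairs of successors.
    while t[0] == "s" and t[1][0] == "s":
        t = t[1][1]
    # Axiom 2: even(0) = true; Axiom 1: even(s(0)) = false.
    return "true" if t == ("0",) else "false"

s4 = ("s", ("s", ("s", ("s", ("0",)))))  # the numeral 4
print(even(s4))  # true
```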
A basic question for equational logic is: when does an equation follow from a set of other equations? Or, when is an equation or term a logical consequence from other equations? The semantic notion of logical consequence is when an equation is true in a Σ-theory. The syntactic notion is the axioms and rules of inference. These two notions are equivalent, and this equivalence is the soundness and completeness of equational logic.
Soundness means that only equations that correspond to valid arguments are derivable in a theory. That is, all theorems of the theory are universally valid.
Completeness means that all equations that correspond to a valid argument can be derived in the theory. That is, a theory Γ is complete iff Γ |= A implies Γ ⊢ A for any equation A [26].

Theorem 3 (Soundness and Completeness of Equational Logic). Given a set of equations E and an arbitrary equation (∀X) t = t′, we have E |= (∀X) t = t′ iff E ⊢ (∀X) t = t′.
The proofs of the soundness and completeness theorems of equational logic have been shown in [25,26].

Programming with Equations
Goguen [27] has stated that "any reasonable computational process can be specified purely equationally." From a programming view, computation in equational logic is the reduction of an input term to an equivalent normal form using a given set of equations and symbols of the programming language. If a set of equations can be used as a term rewriting system, then we can compute with it using an equation as a rewrite rule [28].
Before we go into the operational semantics of equational logic programming, we should define the notion of a logic programming language. A program P over a logic Λ is a set of Σ-sentences, written Sen(Σ); a query q is a sentence of the form (∃X) q(X), where X is a set of variables; and an answer a to a query is an assignment from X to terms such that q(a) is in Sen(Σ) and P ⊢Σ q(a), where q(a) is the result of substituting a(x) into q for each x ∈ X [29].
Let us now describe a computing scenario using equations: A programmer inputs a sequence of equations as an equational logic program. She then may query the program with questions such as "What is X?" or "Is X equivalent to Y?" The program will respond with an answer such as "X = Z" in the former case or "true/false" in the latter.
Programming with equations and reasoning about equations are closely related. Reasoning may involve determining whether an equation is a consequence of a given equational theory or whether it is true [30]. As we can see in the example above, this is what our programmer is trying to determine using the programming system.

Rewriting as Operational Semantics
The operational semantics of equational logic is rewriting. That is, given a set of equations l1 = r1, l2 = r2, ..., ln = rn, the equations are used as rewrite rules to replace "equals for equals." These rules are repeatedly applied to terms containing a subterm that matches some li, which is then replaced by the corresponding ri. Rewriting is one-directional, so the converse is never used (unlike in equational logic).
Definition 10. A term t rewrites to a term t′ using an equation l = r if there is a subterm t|u of t at a given occurrence u of t such that l matches t|u via a substitution σ, and t′ is obtained by replacing the subterm t|u = σ(l) with the term σ(r).

Given a signature Σ and a set of variables X, a Σ-rewrite rule is a pair of terms l → r such that l and r have the same sort and all variables in r also appear in l. A Σ-rewrite system, or term rewrite system, is a set of Σ-rewrite rules. A term rewrite system is terminating if there is no infinite sequence of rewrites t1 → t2 → t3 → ...

Evans [31] and then Knuth and Bendix [32] were the first to propose rewriting as the way to operationalize equational deduction. The goal was to establish term rewriting systems for proving the validity of equalities in first-order equational theories [24].
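A single rewrite step at the root of a term can be sketched as one-sided unification (matching) followed by instantiation of the rule's right hand side. The encoding (uppercase strings as variables, tuples as compound terms) and all function names are assumptions of this illustration, not the thesis implementation.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, sigma):
    """Extend sigma so that sigma(pattern) == term, or return None."""
    if is_var(pattern):
        if pattern in sigma:
            return sigma if sigma[pattern] == term else None
        return {**sigma, pattern: term}
    if is_var(term) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        sigma = match(p, t, sigma)
        if sigma is None:
            return None
    return sigma

def apply_subst(sigma, t):
    if is_var(t):
        return sigma.get(t, t)
    return (t[0],) + tuple(apply_subst(sigma, a) for a in t[1:])

def rewrite_root(rule, term):
    """Rewrite at the root with rule (l, r), or return None if l does not match."""
    l, r = rule
    sigma = match(l, term, {})
    return None if sigma is None else apply_subst(sigma, r)

# The rule 0 + X = X applied to the term 0 + s(0):
rule = (("+", ("0",), "X"), "X")
print(rewrite_root(rule, ("+", ("0",), ("s", ("0",)))))  # ('s', ('0',))
```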
The OBJ family of languages use term rewriting as operational semantics, using equations as rewrite rules. Equations are viewed as rewrite rules which are applied with the command red, for reduce, followed by a term, a space, then a period [29]. A reduction in OBJ evaluates a term within its given Σ-theory.

BOBJ
The BOBJ equational logic programming language originates from Goguen's original development of the OBJ family of languages. OBJ-2 [33] and OBJ-3 [34] are based on order-sorted equational logic, and BOBJ is the most recent implementation that includes new techniques for increased rewrite speed.
The original goal for BOBJ was to be a language for prototyping, algebraic specification, and verification. Many of the interesting features of the language are not utilized in the implementation of our algorithms for this dissertation, such as behavioral rewriting, cobasis generation, and modulo attributes including associativity and commutativity. We are primarily interested in the equational logic language parser and the ordinary rewrite engine for order-sorted equational logic, which we use to parse our input programs and to test hypotheses, respectively. The tennis theory is defined in Listing 2.1, which ends with the equation

17   eq playtennis(OutlookVar, mild, normal, WindVar) = true .

end
First, we define our theory with the obj keyword (for object). On line 2, we define the sorts that this theory will contain. Lines 4-6 declare three variables and their sorts. Lines 8-11 declare operators of arity 0, which are interpreted as literals in the theory; we must specify their sorts just as with variables. Line 13 declares our playtennis operator, which takes four arguments of sorts Outlook, Temp, Humidity, and Wind and returns a Boolean result. BOBJ predefines several types in the system, of which Boolean is one, so we do not need to define it with the others. Finally, on lines 15-17 we have three equations over the playtennis operator that use the operators and variables defined above.
With the tennis Σ-theory defined in BOBJ, a programmer can then query the system as in Listing 2.2. Here, we are asking BOBJ to reduce the term playtennis(sunny, mild, normal, weak) in the theory defined in Listing 2.1, and the system returns the value true.

CHAPTER 3 Inverse Narrowing

Narrowing as Equational Logic Unification
As stated in 2.1, unification is a deductive method used in many programming languages to solve equations among symbolic terms. A unification algorithm is a process to determine if two expressions are the same based on some assumptions.
If they are, the algorithm finds the unifiers (the assumptions). Unification is a special form of pattern matching: where pattern matching finds a substitution that can make a pattern t equal to a term t′, with free variables allowed only in the pattern, unification allows free variables in both the term and the pattern.
Narrowing is a computational method for solving equations by computing unifiers with respect to a Σ-theory. While term rewriting uses pattern matching, narrowing performs unification to reduce a term [35]. Therefore, substitutions can be applied to both the pattern and the term in narrowing as well as its inverse operation, which we will discuss in the next section.
Consider the equations that define the concept of the sum of natural numbers, 0 + X = X and s(X) + Y = s(X + Y). The term U + 0, where U is a variable, narrows to 0 as follows. First, U + 0 is instantiated to 0 + 0 by narrowing with the left hand side 0 + X using the unifier β = {U/0, X/0}. This narrowing is notated as {U → 0}.
Then the narrowed term 0 + 0 is rewritten to 0 via the first equation above. We follow [23] by defining narrowing in Definition 11.
Definition 11. A term t narrows to a term t′ iff there is an occurrence u ∈ Ō(t) (the set of non-variable occurrences of t), a new variant l = r of a rule, and a most general unifier θ = mgu(t|u, l) such that t′ = θ(t[u ← r]).
It has been shown that narrowing is a complete method for solving equations in a terminating term rewriting system [36]. Here, completeness means that for every solution to an equation, a more general solution can be found through narrowing [37]. The soundness of narrowing means that every substitution computed by narrowing is in fact a solution of the equation being solved.
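The U + 0 example can be reproduced with a small unification routine: unify the term with a rule's left hand side, then replace it with the instantiated right hand side. This is a sketch under the same tuple encoding as the other illustrations in this chapter; the occurs check is omitted for brevity.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply_subst(s, t):
    """Apply substitution s to term t, chasing bindings through chains."""
    if is_var(t):
        return apply_subst(s, s[t]) if t in s else t
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def unify(a, b, s=None):
    """Return a most general unifier of a and b, or None (no occurs check)."""
    s = {} if s is None else s
    a, b = apply_subst(s, a), apply_subst(s, b)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

# Narrow U + 0 at the root with the rule 0 + X = X:
rule_l, rule_r = ("+", ("0",), "X"), "X"
term = ("+", "U", ("0",))
beta = unify(term, rule_l)
print(apply_subst(beta, term))    # ('+', ('0',), ('0',)) -- the instantiated term 0 + 0
print(apply_subst(beta, rule_r))  # ('0',) -- the narrowed result
```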

Inverse Narrowing for Equation Induction
Logic programming is the programmatic deduction of logical formulae, and induction can be thought of as the inverse of deduction; inductive inference rules can therefore be created by inverting deductive rules. It follows that, given the above definition of equation narrowing, we can look to its inverse operation for equation induction. The definition of inverse narrowing is given in [5].
Definition 12. Given an equational logic program P, a term t inversely narrows to a term t′ iff there is an occurrence u ∈ O(t) (the set of occurrences of t), a new variant l = r of a rule from P, and a most general unifier θ = mgu(t|u, r) such that t′ = θ(t[u ← l]).
To clarify the above, let us look at an example. Suppose we want to attempt to inverse narrow between the two equations sum(X, 0) = X and sum(X, s(0)) = s(X). The right hand side of the second equation, i.e. s(X), can be unified with the right hand side of the first equation, i.e. the variable X, creating the new term t1 = sum(s(X), 0). That is to say, t1 can be narrowed to s(X) using the first equation. The resulting equation is sum(X, s(0)) = sum(s(X), 0).
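The sum example can be replayed mechanically: unify the two right hand sides, then apply the resulting substitution to the LHS of the first equation. This is an illustrative sketch, not the thesis implementation; the encoding and helper names are assumptions of this example.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def apply_subst(s, t):
    if is_var(t):
        return apply_subst(s, s[t]) if t in s else t
    return (t[0],) + tuple(apply_subst(s, a) for a in t[1:])

def unify(a, b, s=None):
    """Most general unifier of a and b, or None (no occurs check)."""
    s = {} if s is None else s
    a, b = apply_subst(s, a), apply_subst(s, b)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if a[0] != b[0] or len(a) != len(b):
        return None
    for x, y in zip(a[1:], b[1:]):
        s = unify(x, y, s)
        if s is None:
            return None
    return s

# e1: sum(X1, 0) = X1     (variables renamed apart, a "new variant")
# e2: sum(X2, s(0)) = s(X2)
l1, r1 = ("sum", "X1", ("0",)), "X1"
l2, r2 = ("sum", "X2", ("s", ("0",))), ("s", "X2")

theta = unify(r1, r2)             # {'X1': ('s', 'X2')}
new_rhs = apply_subst(theta, l1)  # ('sum', ('s', 'X2'), ('0',))
print(l2, "=", new_rhs)           # the induced equation sum(X2, s(0)) = sum(s(X2), 0)
```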
Algorithm 1 defines our inverse narrowing algorithm in pseudocode. It uses a helper function that finds the most general unifier between two terms (Algorithm 2). The algorithm takes two Σ-theories, p1 and p2, as input. For each equation e1 ∈ p1 and each equation e2 ∈ p2, we attempt to inverse narrow e2 with e1 by finding the most general unifier between the RHSs of the two equations. If this most general unifier exists, we create a substitution θ from it. This substitution is then applied to the LHS of e1 to create a new term t′, which is used as the RHS of a new equation. The new equation is then added to the set of narrowed equations to be returned.
Algorithm 1: Inverse Narrowing (inverseNarrow)
input : two Σ-theories p1 and p2
output: a set of narrowed equations, NE

The most general unifier algorithm (Algorithm 2) takes two terms as input and initializes an index k to 0 and the set Sk to contain the input terms. If Sk contains two identical terms, then the terms are unified and the current σ is returned; otherwise the algorithm finds the disagreement set Dk of Sk. While Dk is not empty, check whether either of the two terms in the disagreement set is a variable. If one of the terms is a variable and that variable is not a subterm of the second term, then create a substitution from these terms, add it to the mgu σ, and apply the substitution to the terms in Sk (storing the new terms in Sk+1). Next, find the disagreement set Dk+1 of Sk+1, increment k, and continue with the next loop iteration. Example 1 shows how the mgu of two terms is found with this algorithm.
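The disagreement-set procedure can be sketched for the two-term case as follows. This is an illustrative Python rendering of the ideas in Algorithm 2, with an occurs check; the encoding and function names are assumptions of this sketch, not the actual implementation.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def occurs(v, t):
    """True if variable v occurs inside term t."""
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a) for a in t[1:])

def apply_one(x, y, t):
    """Apply the single binding {x/y} to term t."""
    if t == x:
        return y
    if is_var(t) or not isinstance(t, tuple):
        return t
    return (t[0],) + tuple(apply_one(x, y, a) for a in t[1:])

def disagreement(a, b):
    """Leftmost pair of subterms at which a and b differ, or None."""
    if a == b:
        return None
    if is_var(a) or is_var(b) or a[0] != b[0] or len(a) != len(b):
        return (a, b)
    for x, y in zip(a[1:], b[1:]):
        d = disagreement(x, y)
        if d is not None:
            return d
    return None

def mgu(a, b):
    """Most general unifier of two terms via disagreement sets, or None."""
    sigma = {}
    while True:
        d = disagreement(a, b)
        if d is None:
            return sigma
        x, y = d
        if is_var(y):                 # orient the pair so x is the variable
            x, y = y, x
        if not is_var(x) or occurs(x, y):
            return None               # symbol clash or occurs-check failure
        sigma = {k: apply_one(x, y, v) for k, v in sigma.items()}
        sigma[x] = y
        a, b = apply_one(x, y, a), apply_one(x, y, b)

print(mgu(("p", ("a",), "X"), ("p", "Z", ("h", ("b",)))))
# {'Z': ('a',), 'X': ('h', ('b',))}
```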
Example 1. Find the most general unifier of p(a, X) and p(Z, h(b)):
• D0 = {a, Z}. Z is a variable and a is not a subterm of Z, so add the substitution to the mgu: σ = {Z/a}.
• Applying the substitution gives p(a, X) and p(a, h(b)), so D1 = {X, h(b)}. X is a variable and is not a subterm of h(b), so add the substitution: σ = {Z/a, X/h(b)}.
• The terms are now identical, so σ = {Z/a, X/h(b)} is the most general unifier.
Two terms cannot be unified if we find a disagreement set Dx where neither of the terms in Dx is a variable, or if the variable is a subterm of the second term in Dx. Example 2 shows two terms that cannot be unified.
Although a general disagreement set algorithm could be applied to a set of any number of terms to find their disagreement, our algorithm is implemented to work on specifically two terms, as that is all that is needed for our most general unifier used in inverse narrowing.

CHAPTER 4 Induction of Equational Logic Programs
We have implemented an inductive learning engine in the BOBJ equational logic programming language [29,38,39]. The learning algorithm uses a hybrid of bottom-up generalization and inverse narrowing for the creation of recursive equations. In this chapter we describe the algorithm in detail.

A Hybrid Approach to Induction

Induce
The induction process is run by first loading a valid equational logic theory using BOBJ's in operation, then calling our newly created induce command which runs on the currently loaded module in BOBJ. By default, conditions are not used in the induction process (see Chapter 6), neither are background knowledge equations. However, both of these can be induced if the appropriate flags are set.
Appendix A shows the output of a full run of the algorithm on the SUM example presented in the next chapter.

Initialization

Inverse Narrowing
If no solution is found after the initial GE-1 equation creation, the algorithm enters a two-loop process. At each iteration of the inner loop, we select the two programs with the best covering factor that are not marked as having been used for inverse narrowing. We use these two programs, p1 and p2, to run the inverse narrowing procedure (described in Chapter 3) on their equations.

Equation Pruning
For classification problems, we are able to implement a pruning operator on theories. When a solution is found, the system tries to prune equations to obtain a more concise theory. If the LHS of an equation is subsumed by the LHS of some other equation in the theory, then that equation is removed from the theory, and the covering factor is recalculated to ensure that the removal did not reduce it and thus make the solution unsound.
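The subsumption test behind this pruning step can be sketched with one-sided unification (matching): one LHS subsumes another if some substitution makes the first equal to the second. The encoding and the example terms below are assumptions of this illustration, not the thesis implementation.

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, sigma):
    """Extend sigma so that sigma(pattern) == term, or return None."""
    if is_var(pattern):
        if pattern in sigma:
            return sigma if sigma[pattern] == term else None
        return {**sigma, pattern: term}
    if is_var(term) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        sigma = match(p, t, sigma)
        if sigma is None:
            return None
    return sigma

def subsumes(general, specific):
    """True if specific is an instance of general, i.e. general subsumes it."""
    return match(general, specific, {}) is not None

# A generalized LHS subsumes a ground LHS it was built from:
general  = ("playtennis", "OutlookVar", ("mild",), ("normal",), "WindVar")
specific = ("playtennis", ("sunny",), ("mild",), ("normal",), ("weak",))
print(subsumes(general, specific))  # True
```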

Negative Knowledge Representation
Our system allows the programmer to represent negative knowledge in two ways. The first method is by letting the right hand side of the equation be the Boolean value false. For example, even(s(0)) = false.
The second way to represent negative knowledge in the system is syntactic: the programmer supplies an equation that is known not to hold, such as sum(s(0), 0) = 0.

Background Knowledge
The programmer in our system also has the option to include background knowledge. Equations are identified as background knowledge using the [background] declaration. Alternatively, the shorthand syntax [back] can be used. By default, background equations are not considered in the generalization process, only used when reducing equations in the generated hypotheses. However, the user can run the induction process with an optional flag, in which case background knowledge equations will also be generalized.

CHAPTER 5 Experiments and Results
We now discuss several experimental input programs and the results that our algorithm produced. First, we show how the algorithm performed on the trivial example of learning the definition of the stack data structure. We then show how the system performed on some classification problems. Finally, we ran the induction algorithm on input programs where the solution produced concept definitions with recursive equations. For each experiment we present the input program used, the solution our system was able to find, and some discussion of each problem. All of these experiments were run on an Intel Core i5 microprocessor with two CPU cores and a clock speed of 1.4 GHz. The computer has 4 GB of DDR3 RAM and runs Java Runtime Environment 1.8.0_31 on Mac OS X version 10.10.5.

Trivial Example
We first present the example of learning the concept of a stack data structure.
While rather trivial, this experiment shows how our system was able to learn a concept using standard ILP bottom-up induction.

Stack
We define a stack with the following Σ-theory, where a stack is built from a sequence of push operations, elements are literals that can be pushed onto the stack, and the top operator returns an element. Listing 5.1 is the input program for this experiment.

Discussion
We can see from the solution that the system learned the concept of a stack's top operator, which is defined as the last element pushed onto the stack.

Stack - Multiple Terms
In the first Stack example, we showed how the system learned the definition of a stack with a single defining term, namely, top. Next, we show how the system is capable of learning multiple terms simultaneously. Our input program is shown below; the solution includes the equation:

[STACK50] eq top(push(StackVar1, ElementVar)) = ElementVar .

Discussion
Like the previous example, our system was able to discover the definition of top, which returns an Element, and pop, which returns a Stack object with the top element removed. This is an important result because learning multiple predicates in first-order logic ILP systems turns out to be a difficult task [40].

Classification Problems
The three data sets we have chosen were taken from the University of California, Irvine's Machine Learning Repository [41].

Car Buying
In this example, the equations describe the concept of whether or not a customer bought a car based on six attributes: the price (low, medium, high, very high), maintenance cost (low, medium, high, very high), number of doors, number of passengers it can seat, size of the trunk (small, medium, big), and the safety rating (low, medium, high) [42]. For input, we used 54 positive and 5 negative equations. For brevity, we have omitted the input program. Listing 5.5 is the solution program discovered by our system.

Discussion
If we look at the solution program, we see that it induced three equations by bottom-up generalization, and the fourth equation is recursive and uses equation one to evaluate to true.

Voting Patterns
In this example, we show how the system first found a solution, then through equation pruning was able to produce a more compact solution theory. The concept to learn in this example is the prediction of which way a Congressperson is most likely to vote (Democrat or Republican), based on their yes/no vote for nine previous votes [43]. We used a subset of ten examples from the dataset for this experiment. The solution found includes the following:

ops democrat republican : -> Party .
[VOTE884] eq vote(VoteOutcomeVar, y, y, VoteOutcomeVar1, y, y, n, n, n) = democrat .
[VOTE884] eq vote(n, y, n, VoteOutcomeVar, VoteOutcomeVar1, y, n, n, VoteOutcomeVar2) = republican .

Play Tennis
Finally, we return to the classic machine learning classification problem of learning when to play tennis [44]. Our input equations take four weather attributes: outlook, temperature, humidity, and wind. The solution found includes the following equations:

[TENNIS101] eq playtennis(rain, mild, HumidityVar, weak) = true .
[TENNIS101] eq playtennis(overcast, hot, HumidityVar, weak) = playtennis(rain, mild, HumidityVar, weak) .
[TENNIS101] eq playtennis(OutlookVar, cool, HumidityVar, weak) = true .
[TENNIS101] eq playtennis(overcast, TempVar, HumidityVar, strong) = true .

Recursive Problems
In this section we show the results of the induction algorithm on some interesting recursive problems.

Sum
The next test case was to learn the definition of the sum operation. This is the first example of the system finding a solution with recursive equations. The input includes equations such as:

eq sum(0, s(0)) = s(0) .

Discussion
The induced solution program is shown in Listing 5.11. Looking at the solution, we see that SUM is an inductive program. The base case states that zero plus any natural number is that natural number. The inductive step is the second equation.
It may not be clear on viewing this theory that this is a valid solution for SUM, so let us walk through an example: sum(s(s(0)), s(s(0))), or 2 + 2. On reduction, this term would unify with the LHS of equation two, with the substitution {NatVar1/s(0), NatVar/s(s(0))}, rewriting to the RHS as sum(s(0), s(s(s(0)))).
On the next reduce step, the term would unify with the LHS of equation two again, with the substitution {NatVar1/0, NatVar/s(s(s(0)))}, rewriting to sum(0, s(s(s(s(0))))). Finally, the term unifies with the LHS of equation one, substituting NatVar with s(s(s(s(0)))) and rewriting to the RHS. With no more possibilities, the reduction is complete and the result returned is s(s(s(s(0)))), which is correct.
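This reduction can be mirrored by a small Python sketch of the induced SUM theory; this is an illustration under an assumed tuple representation of Peano terms, not the BOBJ engine.

```python
# Sketch of the induced SUM theory as term rewriting on Peano numerals.
# Naturals are "0" or ("s", N); equation names follow the discussion above.
def s(n):
    return ("s", n)

def reduce_sum(m, n):
    # Equation two (inductive step): sum(s(X), Y) = sum(X, s(Y))
    while isinstance(m, tuple):
        m, n = m[1], s(n)
    # Equation one (base case): sum(0, Y) = Y
    return n

two = s(s("0"))
print(reduce_sum(two, two))  # ('s', ('s', ('s', ('s', '0'))))
```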

Even
Here, the system attempts to learn the concept of evenness of the natural numbers. Listing 5.12 is our input Σ-theory and Listing 5.13 is the solution that the system found. The input includes positive examples such as:

eq even(s(s(0))) = true .

*** negative examples
eq even(s(0)) = false .

Discussion
Again, the solution produced is an inductive program. The base case states that zero is even. The inductive equation states that a natural number is even if two less than that natural number is also even.
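As a sketch (an illustration under an assumed tuple representation, not the thesis system), the two induced equations behave as follows:

```python
# Sketch of the induced EVEN theory: even(0) = true (base equation),
# even(s(s(N))) = even(N) (inductive equation). Naturals are "0" or ("s", N).
def even(n):
    # inductive equation: strip two successors per rewrite step
    while isinstance(n, tuple) and isinstance(n[1], tuple):
        n = n[1][1]
    # base equation applies only to 0; even(s(0)) is irreducible,
    # hence false under the closed world assumption
    return n == "0"
```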

Less Than
The next experiment, another simple example of a recursive theory, is the concept of less than, shown in Listing 5.14. Input equations use the operation lt, which takes two terms. The term reduces to true if the first argument is less than the second, and false otherwise.

Discussion
We can see that the base equation defines that zero is less than any natural number, and the recursive equation states that the successor of a natural number NatVar1 is less than the successor of another natural number NatVar if NatVar1 is less than NatVar.
This solution also brings up an interesting point about the closed world assumption. If we attempt to reduce the term lt(s(0), 0) in this program, there is no sequence of rewrite steps that can be performed, and BOBJ returns the unreduced term: result Bool: lt(s(0), 0). However, if we try to reduce lt(s(0), 0) == true, then BOBJ returns result Bool: false. That is, if something cannot be proven true in a theory, then it is assumed to be false.
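The closed world behavior can be sketched in Python; this is an illustration under an assumed tuple representation of Peano terms, not the BOBJ reduction engine.

```python
# Sketch of the induced LT theory: lt(0, s(N)) = true (base equation),
# lt(s(M), s(N)) = lt(M, N) (recursive equation). Naturals are "0" or
# ("s", N). A term that no equation rewrites stays irreducible,
# mirroring how BOBJ returns the unreduced term.
def lt(m, n):
    # recursive equation: strip one successor from each side
    while isinstance(m, tuple) and isinstance(n, tuple):
        m, n = m[1], n[1]
    # base equation: lt(0, s(N)) = true
    if m == "0" and isinstance(n, tuple):
        return True
    return None  # irreducible: not provably true, so assumed false
```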

Length
The next experiment learns the length of a stack. The input includes the negative example:

[negative] length(push(v, c)) = 0 .

Discussion
The solution theory defines the length of an empty stack as 0 (the base equation), and the recursive equation defines the length of a stack with one element pushed onto it as one more than the length of that stack.
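These two equations can be sketched as follows; this is an illustration under an assumed tuple representation, not the induced BOBJ theory itself, and it returns a plain integer rather than a Peano numeral for readability.

```python
# Sketch of the induced LENGTH theory: length(empty) = 0 (base equation),
# length(push(S, E)) = s(length(S)) (recursive equation).
# A stack is "empty" or ("push", S, E).
def length(stack):
    n = 0
    while stack != "empty":   # recursive equation: one successor per push
        stack = stack[1]
        n += 1
    return n                  # base equation: length(empty) = 0
```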

Drop
This experiment, while it seems trivial at first, actually highlights an interesting attribute of recursive equations. The concept to learn is dropping items from a list of natural numbers. Using Peano notation for the naturals, the system treats each term as a symbolic representation of a natural number, but does not know, for example, that s(s(0)) represents the number 2. The input includes equations such as:

eq drop(0, add(empty, j)) = add(empty, j) .

[negative] drop(0, add(empty, a)) = empty .
endo
The operator add( , ) represents adding an element to a list. The drop operator takes a natural number and removes that many elements from a list.

Discussion
The solution found is another recursive program, where the base case defines that dropping zero elements from a list returns the list. The induction case continues to drop elements from the list until the first equation is reached. It is useful to walk through an example by reducing the term drop(s(s(0)), add(add(add(add(empty,a),b),c),w)) in this theory. On the first reduction step, the term unifies with the LHS of equation 2, using substitution {NatVar0/s(0), ListVar/add(add(add(empty,a),b),c), ElementVar0/w}, and is rewritten to drop(s(0), add(add(add(empty,a),b),c)).
At the next reduction step, the term is unified again with the LHS of equation 2, using substitution {NatVar0/0, ListVar/add(add(empty,a),b), ElementVar0/c}.
The term is then rewritten to drop(0, add(add(empty,a),b)). This term then unifies with the LHS of equation 1 on the next reduction step, substituting ListVar with add(add(empty,a),b). This term in its final form is the original list, with two elements removed.
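The induced DROP equations can be sketched in Python; this is an illustration under an assumed tuple representation of lists and Peano naturals, not the system's output.

```python
# Sketch of the induced DROP theory: drop(0, L) = L (base equation),
# drop(s(N), add(L, E)) = drop(N, L) (recursive equation).
# Lists are "empty" or ("add", L, E); naturals are "0" or ("s", N).
def drop(n, lst):
    # recursive equation: peel one successor and one element per step
    while isinstance(n, tuple) and isinstance(lst, tuple):
        n, lst = n[1], lst[1]
    # base equation: drop(0, L) = L; anything else is irreducible
    return lst if n == "0" else None
```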

Conclusion
In this chapter we have shown the results of several experiments using our inductive learning engine in equational logic. These results are very promising as the system was able to find a solution in each case, and the running time was under one second in all but one of the experiments. The longer running time for the car buying classification experiment was expected due to the greater number of examples and each equation having more attributes (subterms) for the concept to be learned.

CHAPTER 6 Conditional Equations
In Equational Logic, equations can also contain conditions on them. A conditional Σ-equation consists of three terms, say l, r, and c, over variables from a given ground signature Ξ, such that l and r are of the same sort, and c is of sort Boolean. The notation "(∀Ξ) l = r if c" is used. This conditional Σ-equation is satisfied by a Σ-theory iff for every substitution θ, we have θ(l) = θ(r) whenever θ(c) = true [45]. In this chapter, we present our approach to an initial framework for inducing conditions in the system.

Induction of Conditional Equations
When the input equational theory contains conditional equations, the obvious way to handle them is to treat the condition as just another term in the equation and generalize the condition with respect to the rest of the equation. That is, if a term in the LHS of the equation is generalized, we check for that term in the condition and generalize it as well. When performing inverse narrowing between equations with conditions, the condition is simply carried over to the newly generated equations, or dropped if the condition was not part of the original equation.
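The rule that a generalized LHS subterm must also be generalized in the condition can be sketched as follows; this is an illustration under an assumed tuple term representation, not the system's actual generalization code.

```python
# Replace every occurrence of `target` in a term with the variable `var`.
def replace(term, target, var):
    if term == target:
        return var
    if isinstance(term, tuple):
        return tuple(replace(t, target, var) for t in term)
    return term

# Generalizing a subterm of the LHS applies the same substitution to the
# condition, keeping the two consistent.
def generalize_conditional(lhs, cond, target, var):
    return replace(lhs, target, var), replace(cond, target, var)
```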

Example
In this section, we introduce an example input program and the solution that our induction engine was able to produce using conditional equations. For this example, we would like to learn the definition of set membership [29]. The input program for this example is shown in Listing 6.1 and includes positive examples such as:

eq a in insert(b, insert(a, empty)) = true .
eq b in insert(b, insert(a, empty)) = true .
eq c in insert(a, insert(b, insert(c, empty))) = true .
eq b in insert(a, insert(b, insert(c, empty))) = true .
eq a in insert(a, insert(b, insert(c, empty))) = true .
eq d in insert(a, insert(b, insert(c, insert(d, empty)))) = true .

The solution found contains one base equation that says that an item is a member if it is the first item in the set. The second equation is the conditional equation found, which states that if an item is not the first item in the set, it may still be a member if it is in the rest of the set. We can think of the condition as checking whether the item is in the tail of the set. This is an interesting solution, as the condition of the second equation is essentially handling the recursion.
Let us see how this works in reduction. Assume we try to reduce the following term in this theory: a in insert(b, insert(a, empty)). This term would unify with the second equation, with the substitution {ItemVar/a, ItemVar1/b, SetVar/insert(a, empty)}. The condition would then be checked and the term would be rewritten to a in insert(a, empty). This would then be unified with the first equation, rewriting to true and returning this result.
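The two induced equations can be mirrored by a small Python sketch; this is an illustration under an assumed tuple representation of sets, not the BOBJ theory itself.

```python
# Sketch of the induced membership theory. A set is "empty" or
# ("insert", Item, Rest), matching insert(Item, Rest) in the listing.
def member(item, s):
    if s == "empty":
        return False           # no equation applies (closed world)
    if s[1] == item:
        return True            # base equation: ItemVar in insert(ItemVar, SetVar)
    return member(item, s[2])  # conditional equation checks the tail
```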
It is important to note that the current algorithm for condition creation is limited to simple, one term conditions. More complex conditions that use disjunction and conjunction are discussed in Chapter 7.

Parallel Execution
At any iteration of the algorithm, there are multiple possible hypothesis programs that could be a solution. This aspect makes it a good candidate for parallelization (multi-threading). At the end of each iteration, instead of checking all possible hypothesis programs for their covering factor and negative coverings one at a time, a parallel algorithm could check several simultaneously. This could improve execution time, but it would come at a cost of resource allocation.

Sophisticated Pruning Operator
Our current pruning operator only works on classification problems where the right hand side terms of the equations are of sort Boolean. More research needs to be conducted to see if other types of equations can be pruned and how.

Hypothesis Selection
Minimum Description Length has been a common method for hypothesis selection in many ILP systems, and our system has shown that it is indeed sufficient. Ockham's razor states that if two theories explain the same facts, then the simpler theory is preferred [46]. However, more research could be done to see if there are better heuristics for solution hypothesis selection. Alternative methods have been studied in [47] and could be explored further in IELP.

Conclusions
We have presented a new method for the induction of logic programs using equational logic as the representation language. We have shown that a hybrid approach to induction, combining the bottom-up generalization found in many predicate logic ILP systems with inverse narrowing for recursive equation creation, is able to find solution programs quickly and efficiently.
We have also implemented a framework for the induction of conditional equations. Preliminary results have shown that the induction of conditions in inductive equational logic programming is an interesting field of research to explore.