Systematic Analysis of Algorithms

Abstract

The limits and methods involved in the systematic analysis of algorithms are explored. A review of the existing work in this field is presented. A specific method of systematic analysis is developed. The method consists of (1) the translation of algorithm loop structures into recursive subroutines and recursive subroutine references, and (2) the semantic manipulation of expressions representing the joint probability distribution function of the program variables. A new delta function is introduced to describe the effects of conditional statements on the joint probability density function of the program variables. The method is applied to several simple algorithms, sorting and searching algorithms, and a tree insertion/deletion algorithm.

CHAPTER 1 INTRODUCTION

This chapter is divided into two parts.
In the first part we will state and discuss the problem in computer science that will be addressed in the rest of the thesis.
In the second part we will give an overview of the remaining chapters of the thesis.

Statement of the Problem
This thesis is concerned with the systematic analysis of algorithms. In order to understand what it is about, we must answer these three questions: 1. What are algorithms?
2. What is the analysis of algorithms?
3. What is the systematic analysis of algorithms?
We will also be discussing a fourth question: 4. What are the limits of systematic analysis?
This will involve a short discussion of computability and decidability.

What are Algorithms?

Horowitz and Sahni [7] give this definition of an algorithm: "Algorithm has come to refer to a precise method useable by a computer for the solution of a problem." In order to be considered an algorithm, the method must have the following characteristics: input, output, definiteness, finiteness, and effectiveness.

What is the Analysis of Algorithms?

When we talk about the analysis of an algorithm, we will only be concerned with its time behavior unless otherwise stated.
What is the Systematic Analysis of Algorithms?
There are two basic ways to approach the analysis of algorithms. The first way is to approach each algorithm as a separate new problem and to find the solution by appealing to previous experience with similar problems. The second way is to make up general rules which apply to "all" algorithms and to apply these rules step by step to the algorithm being studied.
The first way is very well suited to humans, who come equipped with a great deal of problem-solving and pattern-recognition ability. It is not so well suited to the digital computers of today because they are not so equipped.
The more systematic approach of the second way to analyze algorithms is better suited to implementation by digital computers. We shall say that the human approach involves ad hoc procedures and the computer approach involves systematic procedures.
What are the Limits of Systematic Analysis?
The gross limits of systematic or automatic algorithm analysis are known.
1. We know that systems can be built which will analyze simple programs [1,3,4].
2. We know that no completely automatic system or complete formal system can be constructed which can analyze all algorithms. This fact is firmly established by computability theory [15].
In between the simple programs and all possible programs there is a lot of ground which can be covered.
What We Can Do

Wegbreit [1] has built a system which can analyze simple LISP programs automatically. Cohen and Zuckerman [3] have built a system which greatly aids in the analysis of algorithms written in an ALGOL-like programming language.
Their system helps the analyst with the details of the analysis while requiring the analyst to provide the branching probabilities. Wegbreit [2] developed a formal system for the verification of program performance. His technique can also be used to provide the branching probabilities which are needed. Recently, Ramshaw [5] has shown that there are problems with Wegbreit's probabilistic approach and has developed a formal system which he calls the Frequency System. There are problems with the Frequency System, which Ramshaw points out in his thesis [5]. We will show that some of the problems in the Frequency System can be overcome.

What We Cannot Do
Douglas R. Hofstadter [15] gives a beautiful exposition of the nature of the whole question of computability and decidability and the wide-ranging and unexpected topics upon which it touches. The formal study of this subject springs from Gödel's Theorem, which Hofstadter paraphrases: "All consistent axiomatic formulations of number theory include undecidable propositions." The undecidability of the Halting Problem is an example of one such "undecidable proposition." Stated in terms of a Turing Machine, the Halting Problem is this: Can one construct a Turing Machine which can decide whether any other Turing Machine will halt for any input, when given an input tape containing a description of the other Turing Machine and its input?
A negative answer to this question was given in 1937 by Alan Turing. The argument which he used is called a diagonal method. This method was discovered by Georg Cantor, the founder of set theory. It involves feeding a hypothetical Turing Machine, which could decide whether any other Turing Machine would halt for any input, a description of itself which has been modified in a particularly diabolical manner.
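To make the diagonal construction concrete, here is a minimal sketch in Python (our illustration, not the thesis's; the names halts and diabolical are hypothetical):

def halts(program, data):
    # Hypothetical decider: True if program(data) eventually halts.
    # The diagonal argument shows that no such function can exist.
    raise NotImplementedError

def diabolical(program):
    # Feed the decider a program together with its own description...
    if halts(program, program):
        while True:      # ...and do the opposite: loop forever if it would halt
            pass
    else:
        return           # ...or halt at once if it would loop

# Asking halts(diabolical, diabolical) is self-contradictory: if the answer
# is True, then diabolical loops forever; if False, it halts immediately.
# Either answer refutes the decider, so halts cannot be implemented.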
Hofstadter's book [15] devotes much of its 740 pages to the variety of topics to which this method may be applied. It appears to us that undecidability and incompleteness creep into formal systems when statements which can be interpreted as being about the system itself are allowed.
In our discussions we will try to avoid these kinds of questions, and thereby the completeness problem.

Overview of the Thesis
We have chosen to organize this thesis along the lines which were taken in the development of the research upon which it is based. We feel that the road taken is interesting in and of itself. For this reason we will point out the "dead-ends" which periodically blocked our path.
The first step which we took was a survey of the work which had been done in this field.
In Chapter 2, we will discuss the current state of the art of algorithm analysis.
We will point out the areas where results are firmly established and the benefits of particular procedures that are known. We will examine some of the recent advances both to see how they work and to discover the kinds of problems which they cannot solve.
When this survey was completed we formulated a plan.
The approach which we used was to start from the program statements themselves. We attempted to determine just how much could be learned from manipulations of the programs using various translation schemata. We restricted ourselves to programs written in a "structured" language. SPARKS, developed by Horowitz and Sahni [7,9], was chosen as the language for representing algorithms for the same reasons they used it in their books.
Our initial work revealed a transformation which proved to be effective in analyzing several deterministic algorithms in a straightforward manner.
Chapter 3 describes this technique which involves the transformation of all looping structures of a program into a series of recursive subroutines and recursive subroutine calls. Because this process is designed to follow the syntax of the algorithm, we refer to this as a "syntax-directed translation." The program characteristic to be analyzed is selected, and the recursive program statements are transformed into recurrence equations. The analysis is done by solving the recurrence equations. This is not always easy [8]. For this reason we concerned ourselves with solving as well as setting up the recursions.
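As a small illustration of this transformation (a sketch of ours, with Python standing in for SPARKS), consider a simple counting loop and its recursive translation:

def loop_version(n):
    # Original looping structure: execute the body n times.
    count = 0
    while n > 0:
        n -= 1
        count += 1
    return count

def recursive_version(n):
    # Syntax-directed translation: the loop becomes a recursive subroutine.
    if n <= 0:
        return 0                          # the loop exit test becomes the base case
    return 1 + recursive_version(n - 1)   # one body execution plus the recursive call

# Selecting "number of body executions" as the characteristic to analyze
# gives the recurrence T(0) = 0, T(n) = 1 + T(n-1), whose solution is T(n) = n.
assert loop_version(10) == recursive_version(10) == 10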
In Chapter 3, we will examine some very simple, deterministic algorithms (i.e., ones for which we know the inputs exactly), then some very simple probabilistic algorithms (i.e., ones where we only know some characteristics of the inputs). While looking at these examples we will discover the "problem of the conditional statement." We started with the FINDMAX algorithm, which was analyzed both by Knuth [6] and by Ramshaw [5]. We soon discovered that when the statistical behavior of algorithms is being analyzed, the distribution from which the input data is drawn is an important factor in the running time. While we could solve the problems relating to distributions in algorithms such as FINDMAX, we often found ourselves using information from "outside the system".
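FINDMAX makes this concrete. In the following sketch (our illustration, not the thesis's SPARKS text), the number of updates of XMAX depends on the input distribution, not just the input size; for a random permutation the expected count is the harmonic number H_n minus 1:

import random

def findmax_updates(a):
    xmax, updates = a[0], 0
    for x in a[1:]:
        if x > xmax:     # branching probability depends on the input distribution
            xmax, updates = x, updates + 1
    return updates

n, trials = 20, 20000
avg = sum(findmax_updates(random.sample(range(1000), n)) for _ in range(trials)) / trials
h_n = sum(1.0 / k for k in range(1, n + 1))
print(avg, h_n - 1)      # empirically close to H_n - 1 for random inputs
# A sorted input forces n - 1 updates; a reverse-sorted input forces none.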
Chapter 4 presents our formal approach for handling the conditional statement.
This approach is to use statements about the distributions of program variables directly in the analysis of the algorithms. We found that we had to study the propagation of the distributions of the program variables through the program. As a result, we developed a "calculus" for the behavior of the distributions themselves.
We will use this method to analyze the probabilistic algorithms from Chapter 3.
We will then move on and apply the techniques to some sorting and searching algorithms in Chapter 5, and to a miscellaneous problem in Chapter 6. Chapter 7 is a summary of the work and an outline of possible future efforts.

The following section closely follows Davies [14], and in the examples which follow, his notation applies. The distinguishing characteristic of Wegbreit's probability system is that it sets out to calculate the branching probabilities in order to determine average computation time.
In such systems, an atomic assertion Pr(P) = e states that the probability that the predicate P is true is equal to the real-valued expression e. Ramshaw has shown [5] that systems of this form have problems with a very simple program which he calls the Leapfrog Problem:

Leapfrog: if K = 0 then K <- K + 2 endif

We assume that K can take on the values of 1 and 0 with equal probability, i.e., Pr(K=0) = 1/2 and Pr(K=1) = 1/2. The output assertion which one would expect to get is Pr(K=1) = 1/2 and Pr(K=2) = 1/2. However, all that can be asserted using a Floyd-Hoare system is that K = 1 or K = 2. This is not particularly informative or of much use in subsequent portions of the program, since all of the information about the distribution of the input has been lost.
In this way he "avoids the rescalings that are associated with taking conditional probabilities." Ramshaw's frequency "is like probability in every way except that it doesn't always have to add up to one." He defines a frequentistic state as a collection of deterministic states with their associated frequencies. Atomic assertions are statements of the form Fr(P) = e, where P is a predicate and e is a real-valued expression.
Ramshaw applies his frequency system successfully to the Leapfrog problem.
His input assertion is Fr(K=0) = 1/2 ∧ Fr(K=1) = 1/2. This means that the frequency associated with the state K=0 is 1/2 and the frequency associated with the state K=1 is also 1/2. The total frequency associated with the variable K is 1. So far we have followed Ramshaw's thesis closely. The following is a slightly different interpretation of the application of his method which arrives at the same answer.
We present it here in this way because it seems a little more formal than his presentation. Each atomic assertion in the input assertion is individually resolved with the branch atomic assertion, in the manner of theorem-proving systems.
If there is a contradiction, then that conjunct of the input assertion is dropped. In the TRUE branch (where K = 0), the conjunct Fr(K=1) = 1/2 is a contradiction and is dropped.
In the FALSE branch (where K ≠ 0), the conjunct Fr(K=0) = 1/2 is a contradiction and is dropped, while Fr(K=1) = 1/2 is a valid assertion.
In the TRUE branch, the assignment statement changes the deterministic states of K to have the value K+2.
The assignment statement maps all of the frequencies of the states of K in this branch into the frequency of the state K+2.
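The mechanics just described (restriction at the if-test, the assignment map, and the addition of frequencies at the join that follows) can be mimicked directly. Here is a minimal sketch in Python, with frequentistic states modeled as maps from values to frequencies; the representation is our own, not Ramshaw's:

from fractions import Fraction

def restrict(state, pred):
    # Resolve each deterministic state against the branch predicate;
    # contradictory states are simply dropped, with no rescaling.
    return {v: f for v, f in state.items() if pred(v)}

def assign(state, fn):
    # The assignment statement maps each deterministic state through fn.
    out = {}
    for v, f in state.items():
        out[fn(v)] = out.get(fn(v), Fraction(0)) + f
    return out

def join(a, b):
    # At the join, frequencies from the two branches are added.
    out = dict(a)
    for v, f in b.items():
        out[v] = out.get(v, Fraction(0)) + f
    return out

k = {0: Fraction(1, 2), 1: Fraction(1, 2)}      # Fr(K=0) = 1/2, Fr(K=1) = 1/2
true_branch = assign(restrict(k, lambda v: v == 0), lambda v: v + 2)
false_branch = restrict(k, lambda v: v != 0)
print(join(true_branch, false_branch))           # {2: 1/2, 1: 1/2}, Ramshaw's output assertion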
At the final join, the output assertion is the conjunction of the two branch assertions: Fr(K=2) = 1/2 from the TRUE branch and Fr(K=1) = 1/2 from the FALSE branch. Unlike the case with the restriction at the if-test, a contradiction at the join (which must be between atomic assertions) is resolved by adding the frequencies involved, and we arrive at Ramshaw's output assertion: Fr(K=1) = 1/2 ∧ Fr(K=2) = 1/2. This result is a little more useful! It says that K is either 1 or 2 and that it takes on either value with equal probability. Now, one would think that all this would lead to a very powerful method. It does. Ramshaw shows how to apply this straightforward approach to the COINFLIP algorithm in Chapter 5 of his thesis [5]. His analysis is very similar to the one that we will give in Chapter 4. But, instead of continuing to use the more straightforward approach, Ramshaw follows Kozen's semantics for probabilistic programs, applies measure theory, and shifts to a "theorem-proving" approach. He uses the following rule of consequence to prove theorems about the conditional statement:

⊢ [A|P] S [B],   ⊢ [A|¬P] T [C]
---------------------------------
⊢ [A] if P then S else T fi [B+C]
This rule of consequence means that, if the truth of predicate A given that P is true implies that B is true after the execution of program section S, and if the truth of predicate A given that P is false implies the truth of predicate C after the execution of program section T, then, if A is true before the if statement involving P, S, and T, it follows that either B or C is true afterward. Ramshaw's frequency system can handle some of the programs which Wegbreit's can't, because Ramshaw avoids problems of renormalizing probabilities.
But because Ramshaw chose to use this rule of consequence for the if statement, his system still can't handle the "useless test": if R then nothing else nothing endif.
Ramshaw must include a special rule of consequence for the "useless test" (one that says that nothing happens).
This seems to be symptomatic of those formal systems of algorithm analysis which have grown from the work in program verification based on theorem proving.
We have just given a taste of Ramshaw's frequency system.
Readers who are interested in learning more about it should see Ramshaw's dissertation [5].

Automatic Analyzers
We now turn our attention to the current state of automatic analysis. We will look at two systems which have been reported in the literature: Wegbreit's METRIC [1] and the PL/EL system of Cohen and Zuckerman [3]. In the latter, the PL statements are compiled by the PL compiler into a symbolic formula representing the time spent executing the program. This "object deck" is then presented to the EL processor.
The EL processor, in turn, provides a human operator with the means to manipulate the symbolic formula into answers. EL runs in an interactive mode. It allows the operator to bind formal or numerical values to the execution counts of loops and to assign formal or numerical values to the probabilities of boolean expressions.
Here, as with METRIC, the operator has to provide the critical data on the branching probabilities. The branching probabilities of different conditional statements are assumed to be independent of each other. This seems to be the most serious defect in the automatic analyzers to date.
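The defect is easy to exhibit. In this sketch (our example), two conditional statements test perfectly correlated conditions; multiplying their individual branching probabilities, as the analyzers do, mispredicts how often both TRUE branches are taken:

import random

both, trials = 0, 100000
for _ in range(trials):
    x = random.randint(0, 9)
    a = x < 5        # first conditional: TRUE with probability 1/2
    b = x < 5        # second conditional: the same test, fully correlated
    both += a and b

print(both / trials)   # close to 0.5 in reality
print(0.5 * 0.5)       # 0.25 predicted under the independence assumption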

CHAPTER 3 SYNTAX DIRECTED TRANSLATION APPROACH
In this chapter, we will discuss our approach to the systematic analysis of algorithms. The presentation follows the order in which the work actually progressed. Our research was sparked by the arrival of Ramshaw's thesis [5].
It seemed to us, at the time, that the theorem-proving approach was overly mathematical. There must be, we said, a way to look at this which is more closely related to the code and more understandable by programmers. Wegbreit's article on METRIC [1] got us thinking about the utility of translating program loops into recursive subroutines.
Loops make the analysis of algorithms interesting. G. S. Lueker, in a recent tutorial "Some Techniques for Solving Recurrences" [16], gives an excellent introduction to these methods. Advanced techniques can be found in Knuth [6], and especially Jonassen and Knuth [8]. We shall list some of the techniques mentioned by Lueker [16], with a worked instance after the list:
1. Summing factors -- where one tries to manipulate the recurrence relations by addition of expressions for adjacent terms in the hope that the sum will "telescope" into a few terms, one of which is the nth term. This method is particularly powerful for handling probabilistic aspects of solutions.
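As a worked instance of summing factors (our example, not Lueker's), take the familiar recurrence T(n) = T(n-1) + n with T(0) = 0. Writing the relation for adjacent terms and adding, the left-hand sides telescope:

\begin{align*}
T(n) - T(n-1) &= n \\
T(n-1) - T(n-2) &= n - 1 \\
&\ \,\vdots \\
T(1) - T(0) &= 1
\end{align*}

Summing, the left side collapses to T(n) - T(0), so

\begin{equation*}
T(n) = \sum_{k=1}^{n} k = \frac{n(n+1)}{2}.
\end{equation*}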
Our work in this thesis involved some very familiar recurrences for which the answers were easily guessed.
Translating Loops into Recursive Subroutines

We will limit our discussion to algorithms expressed using structured programming constructs only. This is not a particularly restrictive limitation, since the structured programming constructs are all that is theoretically needed to describe any algorithm.
For this reason, and the fact that such programs are easier to maintain, most new programming is being done using structured programming methods.
We will adopt SPARKS as the language for expressing algorithms. The following algorithm is a modification of one by Horowitz and Sahni [10].
Only program variable R3 has any effect on the course of the recursion. Let i be the mathematical variable which corresponds to R3, and T be the number of calls on the subroutine. The subroutine T1 is called from the main program with i = R1, and the recursion is solved accordingly. Wegbreit describes the essential idea as mapping a recursive procedure P into a new recursive procedure whose value is the performance measure of P. We are interested in the number of times that a value is printed; the recurrence relation for it follows from the program text. Note that i_o - 2 is also odd.
We now examine the case when i_e is even. Here we have T_a(i_e) = 1 + T_a(i_e + 1). Now, i_e + 1 is odd, so we may substitute the expression for the odd case. Since the recursions for the odd and even cases have been transformed to eliminate the dependence on parity, we have new recurrence relations whose solution is easily shown to be T_a(i) = i.

The problem with the conditional statement stems from the normalizations required when taking probabilities, so why not, we reasoned, put off taking the probabilities as long as possible? Ramshaw's thesis [5] was a key to this. We observed his abandoning of his raw frequencies in favor of asserting predicates about frequencies. Another key factor in our choosing this direction was Jonassen and Knuth's paper on "A Trivial Algorithm Whose Analysis Isn't" [8].
Here were these nice joint probability distribution functions (p.d.f.s) which appeared from "directly translating the algorithm into mathematical formalism." We set out to find the rules that had to have been used to get to these simple recurrence relations. Because we took so many wrong turns on our way to our final ideas, we will abandon our historical presentation in favor of a more expository one. We also have to abandon our initial assessment that Ramshaw's approach was "too mathematical".

We perform the analysis of an algorithm's behavior by manipulating these distributions to find probabilities for various conditions. We can then use this information in any of the analysis techniques (e.g., those given in Chapters 2 and 3) which work for known branching probabilities.
We begin by associating a random variable with each algorithm or program variable. We will follow Ramshaw [5] and differentiate between the two by continuing to represent algorithm variables by upper-case character strings and representing the corresponding random variable by the same characters in lower-case letters. For example, the random variable xmax is associated with the program variable XMAX.
We will deal with the discrete type of random variable in our formalism because all values within a computer can be mapped onto a finite set of integers. By using discrete representations, we avoid the need for the concept of "differential equality" which Ramshaw [5] introduced to bridge the gap between continuous variables and program equality expressions. We will develop a notation which is very close to the calculus of finite differences. Some of the rules which we will use will be derived from analogous rules in continuous probability theory and the calculus of continuous variables.
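A discrete joint p.d.f. over program variables can be modeled very directly. The sketch below (our notation, not the thesis's formal development) represents a p.d.f. as a map from values to probabilities and the delta function as an indicator factor, so that a conditional test simply multiplies the p.d.f. by a delta:

from fractions import Fraction

def delta(cond):
    # Discrete delta function: 1 when the condition holds, 0 otherwise.
    return Fraction(1) if cond else Fraction(0)

# p.d.f. of a program variable X, uniform on {1, 2, 3, 4}.
f = {x: Fraction(1, 4) for x in range(1, 5)}

# Effect of the test "X > 2": multiply the p.d.f. by delta(x > 2).
after_test = {x: p * delta(x > 2) for x, p in f.items()}
print(after_test)   # mass survives only where the predicate holds; no rescaling is done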
Equations (4-1) can be generalized to any finite number of program variables by thinking of X as a vector of the n ordered program variables and x as an n-dimensional random vector. The random variables form a vector space in R^n, and the joint p.d.f. is a functional over that space. We will not present a formal proof, but it is easy to see that several useful properties hold. In general, if we wish to keep the equations in terms of the original variables, corresponding forms can be written. It is now time to examine the general assignment statement between two program variables.
We will use a memory-to-register, register-to-memory model for the assignment statement. This will allow us to have the statement X <- X be a NOOP in the formalism without any special rules. We introduce the notation f_x for the p.d.f. of the random variable x. We will replace the input array A(I) of random variables by repeated calls to a random number generator. This simplifies the notation somewhat without sacrificing generality. We will return to the array notation when we deal with the sorting algorithms. To get a handle on what is going on, we will follow the first few iterations of the program.
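Under this model, an assignment pushes the joint p.d.f. through the assigned expression. A minimal sketch (ours), reusing the map representation above, shows that X <- X is indeed a NOOP while other assignments fold probability mass together:

from fractions import Fraction

def assign_x(joint, expr):
    # X <- expr(X, Y): push each point of the joint p.d.f. through the assignment.
    out = {}
    for (x, y), p in joint.items():
        key = (expr(x, y), y)
        out[key] = out.get(key, Fraction(0)) + p
    return out

joint = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
assert assign_x(joint, lambda x, y: x) == joint   # X <- X is a NOOP, no special rule needed
print(assign_x(joint, lambda x, y: x * y))        # X <- X*Y merges (0,0) and (1,0); masses add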
In doing so we will drop the termination delta function. The initial call is made with

h(n,m,c,i) = δ(c=0) · f(m) · δ(i=2).

Applying the rules, we find that

h(n,m,c-1,i-1) = δ(c=1) · f(m) · δ(i=3) and h(n,m,c,i-1) = δ(c=0) · f(m) · δ(i=3).

Continuing in this manner yields the answer given by Hogg [12]. The recursion for f_c(c) is the same as Knuth's [6] and Ramshaw's [5].

Insertion Sort

The first step is to convert the loops to recursive subroutine calls. We will number the statements so that they may be related back to the original program. We will also insert a counter variable, Y, to keep track of the number of times an EXCHANGE takes place. The Appendix contains the complete analysis.
From it we can develop the form which the distribution of a "sorted" list takes. When analyzing sorting algorithms, three different types of input distributions are usually used. These represent the initially sorted list, the initially reverse-sorted list, and the initially "random" list. These three sometimes cover the best, worst, and average case execution times, although not necessarily in that order.
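The three standard inputs are easy to compare empirically. In this sketch (our example, using an ordinary insertion sort in Python), we count EXCHANGE operations on sorted, reverse-sorted, and random inputs:

import random

def exchanges(a):
    b, y = list(a), 0                         # Y counts EXCHANGEs, as in the text
    for j in range(1, len(b)):
        i = j
        while i > 0 and b[i - 1] > b[i]:
            b[i - 1], b[i] = b[i], b[i - 1]   # EXCHANGE
            y += 1
            i -= 1
    return y

n = 16
print(exchanges(range(n)))                    # initially sorted: 0 exchanges (best case)
print(exchanges(range(n, 0, -1)))             # reverse sorted: n(n-1)/2 exchanges (worst case)
avg = sum(exchanges(random.sample(range(100), n)) for _ in range(2000)) / 2000
print(avg, n * (n - 1) / 4)                   # random input: about n(n-1)/4 on average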
In some more exotic algorithms, there is a more complicated input distribution which leads to the best or worst case behavior. Our approach can be used to determine the best and worst case distributions, although we will not dwell on this. The best case performance for Insertion Sort comes when the EXCHANGE never takes place, and the worst case performance comes when the exchange always takes place.

"Improved" Insertion Sort

The work shown in the Appendix points to an easy way to improve the relative performance of the "oblivious" insertion sort, although the order of its running time remains the same. We note from the analysis that the portion of the joint p.d.f. that fails the test at statement 5 is already in sorted order. This suggests that we could exit from the INNER loop at this point without affecting the algorithm's ability to sort.
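For concreteness, here is a sketch (ours, with Python standing in for SPARKS) of the oblivious INNER loop beside the improved version that exits as soon as the test at statement 5 fails; both leave the list sorted:

def inner_oblivious(b, j):
    # Scan the whole prefix even after the element has settled.
    for i in range(j, 0, -1):
        if b[i - 1] > b[i]:
            b[i - 1], b[i] = b[i], b[i - 1]

def inner_improved(b, j):
    # The portion that fails the test is already in sorted order, so exit early.
    for i in range(j, 0, -1):
        if b[i - 1] > b[i]:
            b[i - 1], b[i] = b[i], b[i - 1]
        else:
            break

def sort_with(inner, a):
    b = list(a)
    for j in range(1, len(b)):       # the OUTER loop
        inner(b, j)
    return b

data = [5, 2, 4, 6, 1, 3]
assert sort_with(inner_oblivious, data) == sort_with(inner_improved, data) == sorted(data)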
Even such "obvious" improvements often have hidden side effects.
Luckily, our method will let us not only calculate the improvement in performance from this change, but also prove that the modified algorithm still sorts! It also turns out that the distribution of I gives a direct indication of the algorithm's performance. For this reason, we will delete the counter variable Y. In the true branch, the test contributes its delta factor to the joint p.d.f. In the false branch we have

δ(i ≥ 1) · δ(i=j) · δ(j<n) · δ(j=1).

The EXCHANGE swaps the values of b_2 and b_1, and this sends the false branch joint p.d.f. back to OUTER.
We see now that this test "traps" all of the joint p.d.f. and collapses the old joint p.d.f. on i. In the oblivious version, this was a trivial operation.
Here it destroys information about the distribution of I in the last iteration. In the false branch, the exchange yields a term which again escapes in the form

δ(i=2) · δ(j<n) · δ(j=2) · 2 · δ(b_2 ≥ b_1).

At the join we have only the true branch joint p.d.f. left, and this gets through to statement 5 in INNER.
In the true branch (multiply by δ(b_1 > b_2) and simplify), we again have only the true branch joint p.d.f. left at the join. Statement 5a sets I to zero in this case, and the next call of INNER returns this joint p.d.f.
The three sets of joint p.d.f.s meet and are added here. Incrementing J and going back into OUTER at 9b, the pattern is by now clear. It is even easier to show the form that the result takes at the end. If we collapse this on i, then we get the same result as before. Therefore, the change in the program has not changed its ability to sort. This form also tells us some other things about the algorithm's behavior.

We now believe that our work has formalized this "reasoning almost directly from the code", because, when applied to this algorithm, it proceeds directly to equations 2.1, 2.2, and 2.3 of Jonassen and Knuth [8].
Basically, the algorithm involves the insertion and deletion of keys in a binary tree structure. Jonassen and Knuth [8] give the graphical and word-procedure representation of the algorithm; we will only present the algorithm as a SPARKS program. We will use Ramshaw's [5] notation for the tuples representing the condition of the tree. Furthermore, we will adopt the convention that after an assignment the "from" variables are set to zero ("killed"). This is not really necessary, but it does simplify the notation, since the "from" variables are no longer needed after the assignment.

After 5: δ(x>y) · f(x) · f(y)
After 6: δ(s=F) · δ(v<w) · f(v) · f(w)

After 7, this is what we expected: either tree is equally likely, and the joint p.d.f. is that of a sorted list of two variables. Rather than continue to follow an explicit example through the algorithm, as we have done in the past, we will define unknown functions to represent the various tree forms.
After 11: δ(s=F) · f_k(v,w) · f(r) · δ(v<w) · δ(r<v) · δ(t=A) · δ(x=r) · δ(y=v) · δ(z=w)

Using the convention of "killing" the old variables, this becomes

δ(t=A) · f_k(y,z) · f(x) · δ(x<y<z).

Note that this convention simplifies the assignments to <t;x,y,z>, because the distributions of these variables are always zero at this point. We have the sum of the six arms of the case statement.
It is at this point that, by looking ahead, we see that the next general functions should be defined as:

a_k(x,y,z) = f_k(y,z) · f(x)
b_k(x,y,z) = f_k(x,z) · f(y)
c_k(x,y,z) = f_k(x,y) · f(z) + g_k(y,z) · f(x)

With f(x) = δ(0<x<1) for a uniform distribution, these are equations 2.1 in Jonassen and Knuth [8].
After 20: δ(t=A) · a_k(x,y,z) · (1/2) · δ(u=x) · δ(s=F) · δ(v=y) · δ(w=z) · δ(x<y<z)

We now apply the convention of setting t, x, y, and z to zero. This is done by "integration" over these variables using Theorem 5. We will use our summation notation, which is defined to work the same as integration if the functions are taken to be continuous. Remember that if a variable of integration appears in an Anderson delta function and is equal to a free variable, then the effect is the same as a change of variable. In this case y and z appear this way, while x appears only with respect to other variables of integration.
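The change-of-variable effect of summing over a delta is easy to check numerically. This sketch (our illustration of the rule, not Theorem 5 itself) sums f(v) · δ(v=y) over the variable of integration v, which simply renames v to y:

from fractions import Fraction

f = {1: Fraction(1, 2), 2: Fraction(1, 2)}    # marginal p.d.f. f(v)

def sum_out_v(ys):
    # sum over v of f(v) * delta(v = y), for each value of the free variable y
    return {y: sum((p for v, p in f.items() if v == y), Fraction(0)) for y in ys}

print(sum_out_v([1, 2, 3]))   # {1: 1/2, 2: 1/2, 3: 0} -- the function f, now in y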
Comparing the true and false branches, at this point we must decide whether the probability that b_i = b_j is going to be significant or not. If we choose to deal with continuous distributions, then this probability is zero. Likewise, if we say that the discrete elements are distinct, we have the same thing. We will do this so that we can write the joined joint p.d.f.