Towards Efficient Stochastic Optimization of Functions of Convex Sets

Let D be a DAG and let X be any non-empty subset of D's vertices. X is a convex set of D if D contains no path that originates in X, then visits one or more vertices not in X, and then re-enters X. This work presents basic convexity algorithms for creating, growing, and shrinking convex sets using two different approaches: predecessor and successor sets, and topological sorts. It shows that the algorithms based on predecessor and successor sets typically have higher asymptotic running times than those based on topological sorts. However, when creating a convex set from a potentially non-convex set of “seed” vertices, the use of predecessor and successor sets permits the creation of the uniquely smallest convex superset of the seeds. This work also considers the problem of stochastically searching for a global minimum over all convex sets of a given DAG, using the basic convexity algorithms described above. It demonstrates the existence of an algorithm for which, as the number of iterations grows without bound, the probability of covering the entire search space approaches one. This work presents one possible mapping which extends its optimization of a single parallel task to the optimization of task-parallel programs with multiple tasks. This mapping is studied experimentally. The results demonstrate a significant (32%) speed improvement over a human-crafted parallelization of the same program, suggesting the possible merit of this work's approach to automated parallel-program optimization.

This work develops a machine-learning technique to automate the work of adapting existing computer codes to take advantage of multi-processor / multi-core computers.
This work focuses on a model of parallel programming called task-parallelism, in which a computer source code is divided into sections called tasks (see Section 2.10). When the resulting program is run, different tasks can run on different processor cores, assuming that certain interdependencies are satisfied. In this way parallel execution is obtained.
For all but the smallest programs, there are typically many different valid ways to divide a program code into tasks. Generally, different groupings (called task sets) result in different running times for the resulting program, due to factors such as cache warmth and thread scheduling overhead. A number of different approaches may be taken to execute a program code that has been divided according to some task set. In this work's proof-of-concept (see Chapter 8), the program and specified task set are converted into a multi-threaded C++ program, which is then compiled to a native executable using any standard C++ compiler. This translation process is described in detail in Section 3.3.
The full task-set optimization problem (see Section 3.4) is the optimization problem of finding a task set which leads to a parallel program with an approximately minimal running time.
While the full task-set optimization problem is the motivation for this present research, this work focuses on a restricted form of the problem, called the single-task optimization problem (see Section 3.5). The relationship between these two problems is briefly discussed below, and in much more detail in Chapter 3. Because each task is also a convex set (see Section 2.9), the single-task optimization problem is also called the single-convex-set optimization problem. This work treats these two terms as interchangeable.
For this work, any program to be optimized is represented as a dataflow graph (DFG) (see Section 3.1). When representing program codes as dataflow graphs, we represent an individual task as some subset of the DFG's vertices. Figure 1 (page 19) provides an example of a dataflow graph for a program which computes a simple mathematical expression. A task set is a set of tasks, and thus is modeled as a set of vertex sets. In this work, the pair of a DFG and a task set is considered to be a complete specification of a task-parallel program. Other details which may affect a task-parallel program's running time, such as how the (DFG, task-set) pair is mapped to C++ source code, are outside the scope of this research and are treated as fixed details during the optimization process. Figure 5 (pages 88-89) provides an example of different task sets defined over the same DFG.
The key complicating factor for the optimization problems considered in this work is that not all task sets are valid. For a given program, some task sets would cause the resulting program to deadlock during execution. This problem is partially avoided by requiring each task to be a convex set. However, as was discovered in the latter stages of this research effort, the convexity of each individual task in a set is not sufficient to ensure the absence of deadlock at runtime. This issue is treated in depth in Chapter 3.

Three Problem Levels
This thesis develops three broad groups of algorithms: (1) the Full Task-set Optimization Algorithm, (2) the Single Convex-Set Optimization Algorithm, and (3) Basic Convexity Algorithms.
These three groups of algorithms, and the relationships between the groups, are described briefly below.

Full Task-set Optimization Algorithm
The full task-set optimization algorithm directly addresses the automatic parallelization problem which motivates this thesis. This algorithm is studied in the proof-of-concept described in Chapter 8.
In the early stages of this research, our goal was to focus on the development and study of an algorithm which directly addressed the full task-set optimization problem.
Furthermore, we strove to ensure that given an unbounded number of iterations, the probability of the algorithm finding a globally optimal value would approach one.
Our early approach to this algorithm was to develop another algorithm which mutates a single convex set (see Subsection 1.2.2, below). We designed the full task-set optimization algorithm to execute one instance of the single convex-set mutation algorithm for each task in the evolving task set. The evolved task set would simply be the collection of convex sets evolved in this manner. We assumed that as the full task-set optimization algorithm ran for an unbounded number of iterations, the probability of it discovering a globally optimal task set would approach one. We expected this to be an emergent property resulting from trivially combining individual convex sets whose mutation algorithm had a similar probability property.
This approach to the research was necessarily changed when we discovered that some collections of convex sets did not form a valid task set. When a task set contains exactly one task, deadlock is avoided precisely when that task is a convex set. However, when a task set contains multiple tasks, deadlock may arise even when each individual task is a convex set. This problem is discussed in detail in Subsection 3.3.1.3.
To address this problem, we modified our full task-set optimization algorithm to alter individual tasks, as necessary, to eliminate any deadlock which might arise when combining those tasks into a single task set. Our approach is described in Appendix E. However, this resulted in a non-trivial mapping from individually mutated convex sets to a full task set. The non-triviality of that mapping raised serious doubts that the resulting full task-set optimization algorithm could be proven to discover globally optimal task sets given an unbounded number of iterations.
Due to the then-limited amount of time to complete this thesis' research, a decision was made to refocus our efforts on proving the above-mentioned probability quality for only our single convex-set optimization algorithm. Our hope, and a central assumption of this work, is that by proving that quality for our single convex-set optimization algorithm, we provide a stepping-stone for future attempts to develop a full task-set optimization algorithm with that probability property.
The proof of concept in Chapter 8 demonstrates that, regardless of whether or not our full task-set optimization algorithm would provably discover a globally optimal task set given enough iterations, the algorithm can in practice discover useful task-parallel optimizations.

Single Convex-Set Optimization Algorithm
This algorithm (Algorithm 3 in Chapter 7) discovers a single convex set which approximately minimizes a specified cost function. As the algorithm runs for an unbounded number of iterations, the probability of it discovering a global minimum asymptotically approaches one.
Note that the full task-set optimization algorithm described above does not use the entirety of this single-convex-set optimization algorithm. Instead, it uses this algorithm's technique for mutating individual convex sets, which is arguably the most interesting aspect of this algorithm. This thesis establishes the following qualities for the presented single-convex-set optimization algorithm:
Asymptotically Complete Search/Optimization: As discussed above, this is the property that if run for an unbounded number of iterations, the probability of this algorithm examining every convex set of the specified DFG approaches one. This quality is proven in Subsection 7.5.3. When this algorithm is used in its full form, examining every convex set of the DFG implies finding a global minimum.
As noted earlier, this thesis' implementation of the full-task-set optimization algorithm uses just part of this single-convex-set optimization algorithm. However, even the excerpted portions of this algorithm visit all convex sets of the DFG with a provable asymptotic probability of one.
Per-iteration Efficiency: We show that each individual iteration of this algorithm has a running time that is polynomial with respect to the size of the DFG. We establish this (Subsection 7.5.2) to demonstrate the practical usefulness of the algorithm.
Usefulness to the Full Task-set Optimization Problem: Because the single-convex-set optimization problem is not identical to the full-task-set optimization problem which motivates this work, it remains necessary to show that this algorithm is at least a step towards solving the motivating problem. This thesis' proof-of-concept (Chapter 8) provides empirical evidence of this.
This algorithm draws upon a third group of algorithms, basic convexity algorithms, described below.
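The general shape of a stochastic search over convex sets can be illustrated with a small sketch. This is not Algorithm 3: it is a deliberately naive random walk, written under the assumptions that a DAG is given as a vertex collection plus a set of (source, destination) arcs, that a move toggles the membership of one randomly chosen vertex, and that moves producing an empty or non-convex set are simply rejected. The names `is_convex`, `random_convex_search`, and `cost` are hypothetical, and no claim is made that this sketch has the coverage or efficiency properties established for Algorithm 3.

```python
import random


def is_convex(arcs, subset):
    """Return True if no path leaves `subset` and later re-enters it."""
    subset = set(subset)
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
    # Start from vertices just outside the set, then walk only outside it.
    stack = [v for u in subset for v in succ.get(u, ()) if v not in subset]
    seen = set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for w in succ.get(v, ()):
            if w in subset:
                return False  # found a path that re-enters the set
            stack.append(w)
    return True


def random_convex_search(vertices, arcs, cost, iterations, seed=0):
    """Naive random walk over convex sets, remembering the best set seen."""
    rng = random.Random(seed)
    vertices = sorted(vertices)
    current = {rng.choice(vertices)}  # a single vertex of a DAG is convex
    best, best_cost = set(current), cost(current)
    for _ in range(iterations):
        candidate = current ^ {rng.choice(vertices)}  # toggle one vertex
        if candidate and is_convex(arcs, candidate):
            current = candidate
            c = cost(current)
            if c < best_cost:
                best, best_cost = set(current), c
    return best, best_cost


if __name__ == "__main__":
    chain = {("a", "b"), ("b", "c"), ("c", "d")}
    # Hypothetical cost: prefer two-vertex sets that contain "b".
    toy_cost = lambda s: abs(len(s) - 2) + (0 if "b" in s else 1)
    print(random_convex_search("abcd", chain, toy_cost, iterations=200))
```

Retaining the best set seen so far means the reported result never gets worse as iterations increase; how moves are proposed is where a real algorithm would differ most from this sketch.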

Basic Convexity Algorithms
Appendix C presents a collection of algorithms which create a new convex set, or grow or shrink an existing convex set. These algorithms are required by the single-convex-set optimization algorithm, and by implication the full-task-set optimization algorithm, both described above.
Two alternative groups of these algorithms are developed and studied. One group of algorithms is based on predecessor and successor sets (Chapter 5), and the other group is based on topological sorts (Chapter 4). The comparative merits of these two approaches are discussed in Section 7.4.

Structure of Thesis
The remainder of this work is structured as follows. This chapter presents this work's contribution (Section 1.4) and related work (Section 1.5).
Chapter 2 presents the notation and basic graph theory concepts used in this work.
Chapter 3 describes in detail the parallel-program optimization problems which motivate this present work.
Chapters 4 and 5 develop two distinct approaches to creating, growing, and shrinking some arbitrary convex set of a DAG. Chapter 4 develops these basic convexity operations based upon a DAG's topological sorts. Chapter 5 presents the same basic convexity operations, computed in terms of the predecessor and successor sets.
Chapter 6 establishes that given some DAG D and any two convex sets of D, X and Y, one can always evolve X into Y using a sequence of single-vertex additions, deletions, or replacements, such that the set resulting from each modification is itself a convex set of D.
Chapter 7 discusses the design of this work's single-convex-set stochastic optimization algorithm, Algorithm 3. Algorithm 3 is implemented in terms of topological sort-based basic convexity algorithms (see Chapter 4 and Appendix D). However, the optimization algorithm may be trivially modified to instead use operations based on predecessor/successor sets (see Chapter 5 and Appendix D). The relative merits of these two approaches are discussed. This chapter also discusses a modification to the optimization algorithm in order to achieve local refinement.
Chapter 8 presents experimental results obtained in the empirical validation of this work's algorithms.
Chapter 9 provides the thesis' conclusions and suggestions for future work.
Appendix A details the primitive operations used by this work's algorithms. For each primitive operation we state an asymptotic running time and, when appropriate, the randomness properties exhibited by the operation.
Appendix B presents utility algorithms which are called by this work's higher-level algorithms, but which are not the focus of this work.
Appendix C presents algorithms for initializing, growing, and shrinking an arbitrary convex set of a DAG. This appendix discusses both the basic algorithms based upon topological sorts and those based on predecessor and successor sets. A running-time analysis is given for each algorithm in this appendix, to support the running-time analysis of the single-convex-set algorithm, Algorithm 3. Analyses are also given for the randomness properties of the algorithms which grow or shrink convex sets, to support related proofs in Section 7.5.
Appendix D presents algorithms which are not used by the optimization algorithm which is the focus of this work (Algorithm 3), but are instead used in the constructive proofs of Algorithm 3's correctness.
Appendix E presents an algorithm for the deconfliction of task sets, an issue raised in Subsection 3.3.1.3.
Appendix F presents one of the task-parallel C++ codes produced during the proof-of-concept experiment (Chapter 8).

Contribution
In Chapter 4, Theorem 4.2.2 establishes that for any convex set X of some DAG D, there exists a topological sort of D in which the vertices of X appear as a contiguous subsequence. Theorems 4.2.6 and 4.2.7 show that for any convex set X of some DAG D, one can always generate a superset or subset of X, having an arbitrary order (i.e., number of vertices), that is also a convex set of D.
In Chapter 5, Lemma 5.2.11 shows that the formula used in [3], and possibly referenced in [2], called in this present work an internal path closure, is sufficient to ensure the convexity of a set. Corollary 5.3.5 shows that the internal path closure of any set X is the uniquely smallest improper superset of X that is also convex.
Theorem 5.4.7 provides a precise formula for identifying which vertices in some convex set X have the quality that deleting just that one vertex from X yields another convex set.
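To make the idea of a smallest convex superset concrete, the following sketch computes one for a small DAG. The formula of Chapter 5 is not reproduced in this chapter, so the sketch rests on an assumption: that the internal path closure of a seed set adds exactly those vertices lying on some path between two seeds, i.e. the vertices that are both descendants and ancestors of seeds. The function names and the arc-set representation are hypothetical.

```python
from collections import defaultdict


def reachable(starts, adjacency):
    """Vertices reachable from any vertex in `starts` via one or more arcs."""
    seen = set()
    stack = [w for s in starts for w in adjacency[s]]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adjacency[v])
    return seen


def internal_path_closure(arcs, seeds):
    """Assumed closure: seeds plus every vertex on a seed-to-seed path."""
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in arcs:
        succ[u].add(v)
        pred[v].add(u)
    seeds = set(seeds)
    descendants = reachable(seeds, succ)  # successors of the seeds
    ancestors = reachable(seeds, pred)    # predecessors of the seeds
    return seeds | (descendants & ancestors)


if __name__ == "__main__":
    arcs = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}
    # "b" lies on the path a -> b -> c, so it must join {a, c}.
    print(sorted(internal_path_closure(arcs, {"a", "c"})))
```

A set that is already convex is returned unchanged, which is consistent with the closure being an improper superset.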
Section 7.5 presents an efficient stochastic optimization algorithm (Algorithm 3) whose domain is all convex sets of a DAG. As the algorithm's iterations increase without bound, its probability of exploring the entire search space approaches 1. To the author's knowledge this is the first presentation of an optimization algorithm over this domain having these qualities.
In Appendix D, Algorithm 19 demonstrates that given any DAG D and any two sets X and Y that are convex sets of D, one can always evolve X into Y using a sequence of single-vertex additions, deletions, and replacements, such that each intermediate set produced by the single-vertex changes is itself a convex set of D.

Iterative / Optimal Compiling
Numerous projects have used machine learning and/or iterative compilation to improve the optimization heuristics employed by compilers [4,5]. Fursin et al [6] developed a system that repeatedly compiles a target program, searching for the optimal optimization parameters for each individual construct in the target. Cooper et al [7] seek to reduce the time required for iterative compilation by using models rather than actual executions to determine the running times of various optimized compilations of the target program.

Automated Algorithm Selection
The FFTW project [8,9] provides a library for calculating discrete Fourier transforms (DFTs). It benchmarks the running times of algorithm fragments, and then composes them to handle arbitrarily large inputs. The Spiral [10] system searches through the space of valid employments of interchangeable algorithms in order to minimize program running time. As in FFTW, Olszewski and Voss [11] use benchmark-informed search for optimal algorithm choices, but do so for parallel sorting algorithms. The PetaBricks project [12] goes further by allowing application programmers to specify a set of functionally similar algorithms, which the PetaBricks system seeks to optimally compose.

Task Partitioning and Forking
Extensive research has occurred regarding finding the optimal partitioning of applicative programs into tasks for parallel execution. This research typically involves partitioning a dataflow-graph representation of the target program [13,14,3]. Smyk et al [15] describe a genetic algorithm for partitioning dataflow graphs, but with the goal of defining a computational mesh for a distributed system, rather than a set of tasks to be activated as needed on a shared-memory system.
The Mul-T system [16] requires programs to explicitly indicate which function calls may be safely executed in parallel, unlike my proposed work and [17] whose functions are parallel-safe by construction. Our work also can reorganize the original program to have parallel functions that never existed in the original source code, whereas Mul-T cannot.

Conditional Parallelism
The Mul-T system [16] decides at runtime, based on issues such as current system load, whether to spawn a function call as a parallel task or to instead directly call the function from within the thread of the caller. Rus et al [18] use runtime analysis to decide the safety, not the profitability, of executing a program fragment asynchronously. Huelsbergen et al [19] use static analysis to estimate the cost of each function call based on the size of its parameters, and at runtime use that information to decide whether or not the overhead of an asynchronous call is worth incurring.
Duran et al [20] and Prechelt et al [21] use machine learning to decide whether to make a recursive function call synchronously or asynchronously based on the level of recursive nesting and on current system load. Both of these systems use decision functions whose form is fixed by their learning systems.

Parallelization of Sisal Programs
Various projects have parallelized Sisal or IF1 codes. Sarkar [3] uses modeling to estimate the execution cost of each task created by a particular partitioning of a target program's IF1 representation, ultimately producing tasks whose estimated running time always exceeds some minimum threshold. The OSC compiler [22] offers automatic data parallelism of some loops (via loop slicing), based on factors including the estimated cost of certain units of work, and on thresholds specified during compilation. Beard [23] developed a Sisal compiler back-end to target distributed (e.g., message-passing) computers.

POSC
This work is most directly a continuation of the work done by Sarkar and Cann for the POSC [17] extension to the OSC compiler. In POSC, each function is potentially a different child task; compile-time-specified dofork flags determine whether a given function call is performed inline or asynchronously via a fork operation. My work goes beyond this by actually formulating new functions to be parallelized.

Convex Sets
Sarkar and Hennessy [2] describe an approach to parallelization in which the vertices of a dataflow graph may be grouped together into one or more parallel tasks. Their paper recognizes the need for convexity to ensure freedom from deadlock at runtime.
They present an algorithm which potentially generates non-convex tasks, which are later converted into similar, convex sets using what the authors call the convex hull of the non-convex set. The paper provides no obvious definition for that term, but later work ([3]) suggests that their convex hull is the same as this present work's internal path closure concept (see Section 5.3).
The algorithm in [2,3] uses a deterministic optimization technique to efficiently obtain a collection of convex sets representing parallel tasks. However, the problem which they seek to solve has NP time complexity ([3, section 4.2-4.3]), and so their approach cannot guarantee optimality.
Sanchez and Trystram [24] use a genetic algorithm to obtain a collection of tasks from a dataflow graph. As with [2,3], all convex sets in a collection are mutually disjoint.
However, [24] assigns every vertex to some convex set, whereas [2,3] permits some vertices in the dataflow graph to remain unassigned. It is not clear from this paper whether or not the genetic algorithm has a non-zero probability of searching the entire problem space when starting with an arbitrary seed and allowed to run for a finite but unbounded number of iterations.
Pecero and Bouvry [25] provide a different genetic algorithm to obtain a collection of convex sets in a dataflow graph. As with [2], a local heuristic is employed to convert a non-convex task into a convex one. However, whereas [2] obtains convexity by adding vertices, Pecero and Bouvry's algorithm obtains convexity by deleting vertices.
The mutation operator of [25] splits some convex set X into two convex sets: for some u ∈ X , the sets {u} and (X \ {u}). [25] states that u is drawn from the "top" or "bottom" of X , suggesting a recognition of the validity of Theorem 5.4.5 in Section 5.4 of this present work.
Andersen et al [26] consider a problem domain in which groups of operations are permitted to overlap. This is motivated by the fact that in distributed systems, the performance impact of moving data between processors may outweigh the cost of redundantly computing the data at whichever additional processors require it.
Balister et al [27] and Bang-Jensen and Gutin [1, section 17.2.3] present an efficient algorithm for enumerating all convex sets of any directed acyclic graph.

Coverage of Search Space by Stochastic Algorithms
Rudolph and others have studied the problem of proving that certain classes of stochastic optimization algorithms cover the entire search space of a specified problem, given enough iterations. Rudolph uses Markov chains to demonstrate that for a particular conception of genetic algorithms, convergence to a globally optimal solution is entirely contingent on whether or not elitism is used to retain the best result [28]. [29] extends this analysis to a broader class of evolutionary algorithms.

CHAPTER 2 Terminology, Notation, and Preliminaries
The circumflex diacritic appearing in P̂ and Ŝ (see Chapter 5) is used exclusively to denote those two sets and has no broader meaning. Other notational conventions are presented in the appropriate subsections below.

Sets
This work considers sets of several kinds of objects. Items surrounded by curly braces ({ }) always indicate the specification of a set's content, either using set-builder notation, {x | P(x)}, or by explicit enumeration of a set's values, e.g. {a, b, c}.

Sequences
This work uses several different kinds of sequences: sequences of vertices, which sometimes represent paths (see Section 2.3), and sequences of vertex sets. In this subsection we give the sequence-related notation that applies regardless of the kind of element contained within the sequence.
A sequence may be identified by its elements, enclosed in square brackets and with individual elements separated by commas, as in [a, b, c]. Ellipses indicate a region of the sequence having zero or more unspecified elements, for example [a, b, . . . , z].
Names of paths and other vertex sequences are given as upper-case non-script letters with an overhead arrow, e.g., $\vec{Q}$. An upper-case script letter with an overbar denotes a sequence of sets. For example, $\bar{\mathcal{T}} = [\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_n]$ denotes a sequence of n sets.
The elements of each sequence are numbered 1, 2, etc. As with sets, enclosing a sequence's name within vertical bars (e.g., $|\vec{X}|$) indicates the number of elements in the sequence, called its length.
Suppose $\vec{X}$ is some sequence. Then $\vec{X}_i$ denotes the $i$th element of the sequence.
$\vec{X}_{i \ldots j}$ denotes the subsequence of $\vec{X}$ from $\vec{X}_i$ to $\vec{X}_j$, inclusive.
Suppose $\vec{X}$ and $\vec{Y}$ are two sequences. Then the notation $[\vec{X}\ \vec{Y}]$ indicates the concatenation of the elements of $\vec{X}$ and $\vec{Y}$; it does not indicate a two-element-long sequence of sequences.
When the same letter is used to name both a path and an unordered vertex set, e.g. $\vec{Q}$ and $\mathcal{Q}$, the two objects implicitly contain the same vertex set.

Directed Graphs, Paths, Cycles
A directed graph D = (V, A), also called a digraph, is defined as a set V of vertices, and a set A of arcs.
Lower-case letters indicate individual vertices within a graph. Unordered sets of vertices are named with upper-case script letters, e.g. X . The letter D always names an entire digraph, A always indicates the complete set of arcs in a digraph, and V always indicates the complete vertex set in a digraph.
The number of vertices in a graph is called the graph's order, and the number of arcs in a graph is called its size. Each arc a ∈ A is denoted (u, v), where u and v are called the arc's source and destination, respectively.
In some cases an unnamed digraph may be identified by its vertex set and arc set. For example, (V, A) rather than D. We treat these two notations as interchangeable.
This work uses a shorthand notation for arcs, in that an arc's source and/or destination may be specified as a set, for example (X , y). This notation indicates that the arc is an unspecified member of the set of arcs whose source and/or destination vertex is a member of the indicated set. For example, "(X , y)" is shorthand for "(u, y) for some u ∈ X ".
Suppose D = (V, A) is a digraph, and $\vec{P}$ is a vertex sequence in V. We say that $\vec{P}$ is a path of D if and only if $(\vec{P}_1, \vec{P}_2) \in A$, $(\vec{P}_2, \vec{P}_3) \in A$, $\ldots$, $(\vec{P}_{|\vec{P}|-1}, \vec{P}_{|\vec{P}|}) \in A$. $\vec{P}_1$ is called the initial vertex of $\vec{P}$, $\vec{P}_{|\vec{P}|}$ is called the terminal vertex of $\vec{P}$, and all other vertices in $\vec{P}$ are called internal vertices.
Let $\vec{P}$ be any path in some digraph D. Using the notation of [1], we say that $\vec{P}$ is an (x, y)-path if $\vec{P}_1 = x$ and $\vec{P}_{|\vec{P}|} = y$. We say that $\vec{P}$ is an $(\mathcal{X}, \mathcal{Y})$-path if $\vec{P}_1 \in \mathcal{X}$ and $\vec{P}_{|\vec{P}|} \in \mathcal{Y}$. This notation also entails $(x, \mathcal{Y})$ and $(\mathcal{X}, y)$ constructions with the obvious meaning.
A subpath is any contiguous subsequence within a given path. For example, $[\vec{P}_1, \vec{P}_2, \vec{P}_3]$ is a subpath of any path $\vec{P}$ with $|\vec{P}| \geq 3$. Note that if $\vec{P}$ is a path in some digraph D, then every subpath of $\vec{P}$ is also a path in D.
A path is direct if it has a length of exactly two, and is indirect if it has length greater than two.
A digraph is acyclic, i.e., is a directed acyclic graph (DAG), precisely if there exists at least one topological sort of its vertices. Let $\vec{T}$ be any total ordering of V. We now consider what set of arcs A would permit $\vec{T}$ to be a topological sort of D while maximizing |A|.
$\vec{T}$ is a topological sort of D if and only if for every arc $(x, y) \in A$, x appears before y in $\vec{T}$. We assert but do not prove that the maximally large arc set $A_{max}$ meeting this requirement is constructed as follows. For each vertex $\vec{T}_i$ in $\vec{T}$, create an arc from $\vec{T}_i$ to every higher-indexed vertex in $\vec{T}$. That is,
$A_{max} = \{ (\vec{T}_i, \vec{T}_j) \mid 1 \leq i < j \leq |\vec{T}| \}$.
Then for any $i \in [1, |\vec{T}|]$, $A_{max}$ contains $|\vec{T}| - i$ arcs whose source vertex is $\vec{T}_i$. From this we have:
$|A_{max}| = \sum_{i=1}^{|\vec{T}|} (|\vec{T}| - i) = |\vec{T}|(|\vec{T}| - 1)/2$.
Because $\vec{T}$ is an ordering of V, $|\vec{T}| = |V|$, and so $|A_{max}| = |V|(|V| - 1)/2$.
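The counting argument above can be checked on a small example. The sketch below assumes an ordering is given as a Python list and arcs as (source, destination) tuples; the function names are illustrative only.

```python
def is_topological_sort(order, arcs):
    """True if every arc's source appears before its destination in `order`."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[x] < position[y] for x, y in arcs)


def maximal_arc_set(order):
    """Arcs from each vertex to every higher-indexed vertex in `order`."""
    n = len(order)
    return {(order[i], order[j]) for i in range(n) for j in range(i + 1, n)}


if __name__ == "__main__":
    order = ["a", "b", "c", "d"]
    a_max = maximal_arc_set(order)
    n = len(order)
    print(len(a_max), n * (n - 1) // 2)       # both counts agree
    print(is_topological_sort(order, a_max))  # the ordering remains valid
    # Any further arc must point backwards in the ordering and break it.
    print(is_topological_sort(order, a_max | {("d", "a")}))
```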

Contractions of Digraphs
Let D = (V, A) be a directed graph, and let X be any non-empty subset of V. Informally, the contraction of X in D (see also [1, sect. 1.3]) is based on D, but replaces all of X with a single placeholder vertex x ′ .
More precisely, let $D_C = (V_C, A'_C)$ be the contraction of the vertex set $\mathcal{X}$ to $x'$ in D. Then $D_C$ is constructed as follows:
$V_C = (V \setminus \mathcal{X}) \cup \{x'\}$
$A'_C = \{ (c(u), c(v)) \mid (u, v) \in A \}$, where $c(w) = x'$ if $w \in \mathcal{X}$ and $c(w) = w$ otherwise.
Note that the contraction of a DAG might contain a cycle. Suppose D = (V, A) is a DAG, and $\mathcal{X}$ is the subset of V to be replaced by the new vertex $x'$ in the contracted directed graph. If A contains at least one arc having both endpoints in $\mathcal{X}$, then the contracted graph contains the arc $(x', x')$.
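A contraction can be sketched in a few lines. The sketch assumes the usual construction in which every arc endpoint belonging to the contracted set is redirected to the placeholder vertex, and it keeps the resulting self-loop, consistent with the remark above. The function name and the string used for the placeholder are hypothetical.

```python
def contract(vertices, arcs, subset, placeholder):
    """Contract `subset` to a single `placeholder` vertex (assumed construction)."""
    vertices, subset = set(vertices), set(subset)
    if not subset or not subset <= vertices:
        raise ValueError("subset must be a non-empty subset of the vertices")

    def image(v):
        # Every member of the contracted set maps to the placeholder.
        return placeholder if v in subset else v

    new_vertices = (vertices - subset) | {placeholder}
    new_arcs = {(image(u), image(v)) for u, v in arcs}
    return new_vertices, new_arcs


if __name__ == "__main__":
    vertices = {"a", "b", "c"}
    arcs = {("a", "b"), ("b", "c")}
    # Contracting the non-convex set {a, c} yields a two-vertex cycle.
    print(contract(vertices, arcs, {"a", "c"}, "x'"))
    # Contracting {a, b}, which contains an arc, yields the loop (x', x').
    print(contract(vertices, arcs, {"a", "b"}, "x'"))
```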

Induced Subdigraphs
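Let D = (V, A) be a digraph, and let $\mathcal{X}$ be any non-empty subset of V. The subdigraph $D\langle\mathcal{X}\rangle = (V_S, A_S)$, read "the subdigraph of D induced by $\mathcal{X}$", is defined as follows (see also [1, sect. 1.2]):
$V_S = \mathcal{X}$
$A_S = \{ (u, v) \in A \mid u \in \mathcal{X} \wedge v \in \mathcal{X} \}$
Algorithm 6 (Appendix B.2) computes induced subdigraphs.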
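As a small illustration (not a transcription of Algorithm 6), an induced subdigraph can be computed by keeping exactly those arcs whose endpoints both lie in the chosen vertex set. The function name and arc-set representation are assumptions made for this sketch.

```python
def induced_subdigraph(arcs, subset):
    """Return (vertices, arcs) of the subdigraph induced by `subset`."""
    subset = set(subset)
    if not subset:
        raise ValueError("subset must be non-empty")
    kept = {(u, v) for u, v in arcs if u in subset and v in subset}
    return subset, kept


if __name__ == "__main__":
    arcs = {("a", "b"), ("b", "c"), ("a", "c")}
    # Only the arc (a, c) has both endpoints inside {a, c}.
    print(induced_subdigraph(arcs, {"a", "c"}))
```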

Transitive Closure
Every DAG D has a transitive closure, denoted TC(D).
Informally, TC(D) has the same vertices as D, but a superset of the arcs in D. It has been shown that TC(D) can be computed for a given directed graph of order n in time $O(n^{2.376})$ [1, Prop. 2.3.5].
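For intuition, the sketch below computes a transitive closure by a plain reachability search from each vertex, assuming the standard meaning that the closure contains an arc (u, v) whenever D contains a path from u to v. It is not the $O(n^{2.376})$ method cited above, which relies on fast matrix multiplication; this simpler version runs in O(|V|(|V| + |A|)) time. The function name is illustrative.

```python
def transitive_closure(vertices, arcs):
    """Arc set containing (u, v) whenever a path leads from u to v."""
    succ = {v: set() for v in vertices}
    for u, v in arcs:
        succ[u].add(v)
    closure = set()
    for start in vertices:
        seen = set()
        stack = list(succ[start])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])
        closure |= {(start, v) for v in seen}
    return closure


if __name__ == "__main__":
    # The path a -> b -> c adds the arc (a, c) to the closure.
    print(sorted(transitive_closure({"a", "b", "c"}, {("a", "b"), ("b", "c")})))
```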

In/Out-neighbors
For a DAG D = (V, A), we define the in-neighbors of a vertex x as the set of vertices $\{ w \mid (w, x) \in A \}$. Similarly, the out-neighbors of a vertex x are the vertices $\{ y \mid (x, y) \in A \}$. The concept of in-neighbors and out-neighbors may be extended to the neighbors of a set rather than of an individual vertex, as follows. The in-neighbors of a vertex set $\mathcal{X}$ is the set of vertices $\{ w \mid ((w, \mathcal{X}) \in A) \wedge (w \notin \mathcal{X}) \}$. Similarly, the out-neighbors of a vertex set $\mathcal{X}$ is the set of vertices $\{ y \mid ((\mathcal{X}, y) \in A) \wedge (y \notin \mathcal{X}) \}$. Note that our definitions ensure that a vertex is not both a member and an in/out-neighbor of the same set.
A vertex w is a strictly direct in-neighbor of a vertex set $\mathcal{B}$ if and only if w is an in-neighbor of $\mathcal{B}$ and D contains no path of the form $[w, \ldots, x, \ldots, \mathcal{B}]$, with $x \notin \mathcal{B}$.
That is, there must be an arc from w to $\mathcal{B}$, but there must not be any indirect paths that begin in w, and then pass through other vertices not in $\mathcal{B}$, and then enter $\mathcal{B}$.
Similarly, a vertex y is a strictly direct out-neighbor of a vertex set $\mathcal{B}$ if and only if y is an out-neighbor of $\mathcal{B}$ and D contains no path of the form $[\mathcal{B}, \ldots, x, \ldots, y]$, with $x \notin \mathcal{B}$.
The set of strictly direct in-neighbors of a set $\mathcal{X}$ is denoted $N^{\ominus}(\mathcal{X})$, and its set of strictly direct out-neighbors is denoted $N^{\oplus}(\mathcal{X})$. Algorithms 7 and 8 (Appendix B.3) compute $N^{\ominus}$ and $N^{\oplus}$, respectively.
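The definitions of strictly direct neighbors can be illustrated with a direct, unoptimized sketch. It is not a transcription of Algorithms 7 and 8; it reads the definition literally, treating w as disqualified whenever some vertex outside the set is reachable from w and can itself reach the set. The function names and the arc-set representation are assumptions.

```python
def _reach(starts, adjacency):
    """Vertices reachable from `starts` via one or more arcs."""
    seen = set()
    stack = [w for s in starts for w in adjacency.get(s, ())]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(adjacency.get(v, ()))
    return seen


def strictly_direct_in_neighbors(arcs, block):
    """In-neighbors of `block` with no indirect route into it."""
    block = set(block)
    succ, pred = {}, {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
        pred.setdefault(v, set()).add(u)
    can_reach_block = _reach(block, pred)  # every ancestor of the set
    in_neighbors = {u for u, v in arcs if v in block and u not in block}
    result = set()
    for w in in_neighbors:
        # Outside vertices below w that can still enter the set.
        detours = {x for x in _reach({w}, succ)
                   if x not in block and x in can_reach_block}
        if not detours:
            result.add(w)
    return result


def strictly_direct_out_neighbors(arcs, block):
    """Symmetric case, obtained by reversing every arc."""
    return strictly_direct_in_neighbors({(v, u) for u, v in arcs}, block)


if __name__ == "__main__":
    arcs = {("w", "b"), ("w", "x"), ("x", "b"), ("v", "b"),
            ("b", "y"), ("b", "z"), ("z", "y")}
    # w also reaches b through x, so only x and v are strictly direct.
    print(sorted(strictly_direct_in_neighbors(arcs, {"b"})))
    # b also reaches y through z, so only z is strictly direct.
    print(sorted(strictly_direct_out_neighbors(arcs, {"b"})))
```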

Convexity and C-paths
Let D = (V, A) be a DAG, and $\emptyset \subset \mathcal{X} \subseteq V$. A directed acyclic path $\vec{Q}$ in D is a C-path of $\mathcal{X}$ if $\vec{Q}$'s initial and terminal vertices are members of $\mathcal{X}$, $\vec{Q}$ has at least one internal vertex, and all of $\vec{Q}$'s internal vertices are members of $D \setminus \mathcal{X}$.
Let D = (V, A) be a DAG, and $\emptyset \subset \mathcal{X} \subseteq V$. We say $\mathcal{X}$ is a convex set of D if and only if D contains no C-path of $\mathcal{X}$. If D does contain at least one C-path of $\mathcal{X}$ then $\mathcal{X}$ is a concave set of D.
Examples of convex and concave sets of a DAG are given in Figure 1a and Figure 1b, respectively. For example, the gray vertex set {+, /, f} in Figure 1b is not a convex set of the DFG, because there exists a path, [+, *, f], which originates within the set, then exits the set, and then re-enters the set.
A DAG D = (V, A) may have as many as $2^{|V|} - 1$ convex sets. This occurs when $A = \emptyset$, in which case every non-empty subset of V is also a convex set of D.
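The definition of convexity translates into a short search for a C-path. The sketch below is a hedged illustration rather than one of this work's algorithms: it assumes arcs are (source, destination) tuples, and the example arc set is a guess at a graph containing the path [+, *, f], not a reproduction of Figure 1. The brute-force counter is only practical for very small DAGs.

```python
from itertools import combinations


def is_convex(arcs, subset):
    """True if D contains no C-path of `subset`."""
    subset = set(subset)
    if not subset:
        raise ValueError("subset must be non-empty")
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
    # Step out of the set, then walk only through outside vertices.
    stack = [v for u in subset for v in succ.get(u, ()) if v not in subset]
    seen = set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        for w in succ.get(v, ()):
            if w in subset:
                return False  # re-entered the set: a C-path exists
            stack.append(w)
    return True


def count_convex_sets(vertices, arcs):
    """Brute-force count over all non-empty subsets (exponential time)."""
    vertices = list(vertices)
    return sum(
        1
        for r in range(1, len(vertices) + 1)
        for s in combinations(vertices, r)
        if is_convex(arcs, s)
    )


if __name__ == "__main__":
    arcs = {("+", "*"), ("*", "f"), ("/", "f")}   # hypothetical DFG
    print(is_convex(arcs, {"+", "/", "f"}))  # [+, *, f] leaves and re-enters
    print(is_convex(arcs, {"+", "*", "f"}))
    print(count_convex_sets([1, 2, 3], set()))    # equals 2**3 - 1
```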

Threads vs. Tasks
For this work we assume that the term thread has the usual meaning of a mechanism for executing a sequence of program instructions. Examples of thread implementations include POSIX pthreads, the std::thread class provided by the C++11 standard library, and low-level portions of the Threading Building Blocks library. This present work does not strongly distinguish between threads and tasks, as both are units of execution which can be created, used to execute a specified subroutine, and then deleted.
The only difference in this work between threads and tasks is one of connotation, reinforced by the terminology of the Threading Building Blocks library used in this work's proof-of-concept implementation (Chapter 8). In some parallel programming environments (including the Threading Building Blocks), a thread is a long-lived object which executes zero or more tasks over its lifetime. Within that framework, this work's optimization technique concerns the assignment of program instructions to tasks, not threads.

Algorithm Pseudocode
This work contains numerous algorithms presented as pseudocode. For readability, common mathematical notation is used when possible. For example, X ← ∅ denotes assigning the empty set to the vertex-set identifier X. To clarify the semantics and running time of a given line of pseudocode, supplemental text is often provided on the right-hand side of the algorithm's text.  . . .

3: end for
time required for the loop iterator to obtain all |X | members of X for use in the loop.
For pseudocode statements appearing within the body of a loop, a specified running time denotes the execution of that statement during a single iteration of the loop.
In some cases, one line of algorithm pseudocode may imply multiple operations, each of which has a running time that must be considered. In such cases, one of the underlying operations is presented on the same line as the pseudocode, and the remaining underlying operations are listed on subsequent lines, as in this example: . . .

3: end if
When a single line of pseudocode indicates multiple operations, in some cases for brevity only the operation with the dominant asymptotic running time is provided.

CHAPTER 3 Motivation and Problem Statement
The primary motivation for this work is the problem of efficiently finding an adaptation of a program for task-parallel execution which approximately minimizes its running time. Section 3.1 provides a brief overview of the automatic task parallelization problem. Section 3.2 describes several execution models which motivate this work. Section 3.3 provides a complete description of the process used by this present research to convert one or more convex sets to a task-parallel computer program with a measured average running-time. Of particular interest is Subsection 3.3.1.2, which discusses the need for parallel tasks to be convex sets. This provides the fundamental connection between this thesis' focus on convex sets and the motivating problem of parallel tasks. Section 3.4 presents the formalization of an optimization problem for solving automatic task parallelization. In this problem, the solution space is all non-overlapping convex sets of a dataflow graph. This present research is a step towards solving this problem, but does not fully solve this optimization problem. Section 3.5 presents a special case of the full optimization problem discussed in Section 3.4. In the special case, solutions are restricted to containing just one convex set.
A solution to this problem is the focus of this present work. Section 3.6 discusses one approach that may be used to solve the full task-set optimization problem using an algorithm which solves only the single-task optimization problem.

Automatic Task Parallelization
In task-parallel computing, fragments of a computer program's code are grouped together into units of work called tasks, which at certain points in the program's execution may be executed in parallel with other running parts of the program.
A parallel program's running time can be significantly affected by the particular mapping of program instructions to tasks. Automatic task parallelization is the problem of using machine learning to identify a grouping of program instructions into tasks such that the running time of the resulting program is approximately minimized.
When automating task parallelization, it is convenient to represent the logic of the source program as a dataflow graph (DFG) (see Figure 4). A DFG is a directed acyclic graph (Section 2.3) in which each vertex represents either a program input source, a program output destination, or a basic program operation. Each arc (u, v) in the DFG indicates that the data provided by vertex u is required by vertex v. Each task is defined as a particular subset of that DFG's vertices. An example of mapping each vertex in a DFG to some task or to no task at all can be seen in Figure 5 (page 88). In that figure, all vertices contained within the same dash-outlined region belong to the same task.
A DFG, paired with a collection of tasks defined over that DFG, may then be translated into a standard parallel executable program.

Execution Models
For this work we consider two execution models. The first is an intermediate execution model for task-parallel DFG's. This first execution model is used to reason about task-parallel optimization in graph-theoretical terms. The second is a more concrete target execution model, in which DFG's are translated into compiled C++ code for actual execution. This second execution model is used in this work's proof of concept (see Chapter 8). We assume a strict execution model, in which a vertex may begin execution any time after all of its input data are available. An input datum is available immediately if it is a program input or a constant value. Otherwise, an input datum is available as soon as the vertex producing it has completed execution. Parallel execution of a DFG is possible precisely when all input data are available for two or more vertices that have not yet completed execution.
When a DFG is executed within a thread, the DFG's vertices are executed in some particular total order, called a thread-local schedule. For this work, we consider a thread-local schedule to be valid if and only if it orders the vertices in a sequence compatible with the assumptions of a strict execution model. In other words, a thread-local schedule is valid if and only if it is a topological sort of the DFG (see Chapter 4).
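The validity condition for a thread-local schedule can be expressed in a few lines. The sketch below is illustrative only; the function name `is_valid_schedule` and the diamond-shaped example DFG are assumptions, not part of this work's implementation.

```python
def is_valid_schedule(vertices, arcs, schedule):
    """Return True if schedule is a topological sort of the DFG (vertices, arcs)."""
    # The schedule must list every vertex exactly once.
    if len(schedule) != len(vertices) or set(schedule) != set(vertices):
        return False
    position = {v: i for i, v in enumerate(schedule)}
    # Strict execution: each producer u must run before its consumer v.
    return all(position[u] < position[v] for u, v in arcs)

# Hypothetical diamond-shaped DFG: a feeds b and c, which both feed d.
V = ["a", "b", "c", "d"]
A = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
first = is_valid_schedule(V, A, ["a", "b", "c", "d"])
second = is_valid_schedule(V, A, ["a", "c", "b", "d"])
broken = is_valid_schedule(V, A, ["b", "a", "c", "d"])
```

The example DFG admits two valid thread-local schedules, which differ only in the relative order of b and c.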
A multi-threaded schedule is a set of thread-local schedules for all threads in the program. A two-threaded program may use DFG's which admit only one thread-local schedule per thread, and yet have multiple multi-threaded schedules. This arises because we make no assumption about the relative paces at which two threads execute the vertices of their respective DFG's. In the execution model assumed by this work, a task-parallel DFG executes to completion if and only if each individual task executes to completion.

Computational Vertices
This model does not limit the kinds of operations that may be performed by a given vertex. However, some assumptions are made about all vertices.
• A vertex's output is assumed to be purely a function of its inputs.
• A vertex can only influence the output of the program through the particular values it produces on its output arcs.
• Once a vertex begins execution, it runs to completion within a finite amount of time.

Threading
For this work we assume a two-level hierarchical threading model, with one parent thread and zero or more child threads. Each thread may execute up to one DFG at a time.
The parent thread's DFG may contain SPAWN(D) and WAIT(D) vertices, where D is the name of some other DFG to be executed within a child thread. Child DFG's may not contain SPAWN and WAIT vertices. This limitation is imposed only to constrain the size of the search space treated by this thesis.
The SPAWN and WAIT vertices provide both synchronization and data transfer in the multi-threaded environment.
A SPAWN(D) vertex creates a new task in which the DFG D executes to completion.
The SPAWN(D) vertex does not necessarily create a new thread; instead it creates a task which is to be executed at some time in the future by an otherwise idle thread. For each arc in the parent DFG whose destination is a SPAWN vertex, the data carried by that arc are treated as external inputs to DFG D.
A WAIT(D) vertex can begin execution at any time, but will not complete its execution until the entire DFG D has completed its execution. For each arc in the parent DFG whose source is a WAIT vertex, the data carried by that arc are treated as external outputs of the DFG D.
For each child DFG D, we assume a one-to-one pairing of SPAWN(D) and WAIT(D) vertices in the parent DFG.

Task-parallel DFG Specification
We define a task as any non-empty subset of vertices in a DFG. For a given DFG D, we define a task set P = {T 1 . . . T n } as a set of tasks of D.
For this work's purposes, we assume that within a given task set P, all tasks are pairwise disjoint. That is, ∀ i ≠ j, T i ∩ T j = ∅. One example of a task set may be seen in Figure 5a (page 88). In that figure, the DFG has eleven vertices, and the dashed regions collectively indicate a task set containing six tasks: {{randomize1, quicksort2}, . . .
Let D be a DFG, and let P be a task set of D. As described below (Section 3.3), the translation from (D, P) to a system-native executable program is a deterministic function of only D and P. This present work therefore considers the pair (D, P) to fully specify a task-parallel program in its optimization problems.

C++ Program Execution Model
For this work, the target is C++ source code, using the Threading Building Blocks library, compiled and executed on a computer running the Linux operating system.
Each DFG (the parent and each child-task DFG) is represented as a C++ function.
Each input arc for a DFG is represented by an input parameter to the function, and each output arc is represented by an output parameter to the function.
Each vertex in a DFG is translated to a pre-defined group of one or more C++ statements. The C++ renditions of a DFG's vertices are ordered within the C++ function in a manner consistent with some topological sort of the DFG.
The C++ program has a predefined main function, which calls the C++ function representing the top-level DFG. The top-level DFG uses Threading Building Blocks library calls to spawn each child DFG's function as a parallel task, and to later block on that task until it completes.

Translation Chain: From Multiple Convex Sets to Task-Parallel Executables
Translating the pair (D, P = {T 1 , T 2 , . . . , T n }) into a measured average running time involves several steps, described in detail below.

From Task-set to DFG Set
The approach studied in this thesis to translate a pair (D, P = {T 1 , T 2 , . . . , T n }) into an equivalent collection of DFG's {D parent , D child 1 , D child 2 , . . . , D child m } is given below.
We begin, however, with describing one obvious approach to such translation (Subsec-

Naïve Translation Algorithm
One obvious translation of (D, P = {T 1 , T 2 , . . . , T n }) into an equivalent collection of DFG's {D parent , D child 1 , D child 2 , . . . , D child m } is as follows:
• Each child task T i is formed into a new DFG by simply computing the subgraph of D induced by T i (see Section 2.6). That is, D child i = D<T i >.
• The parent DFG D parent is formed by replacing each task D child i with a pair of vertices SPAWN(D child i ) and WAIT(D child i ), and an implicit arc (SPAWN(D child i ), WAIT(D child i )), as described in Subsection 3.2.1.3.
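For a single task, the two steps above can be sketched in Python as follows. This is an illustrative rendering under stated assumptions rather than this work's implementation: the names `naive_translate` and `has_cycle`, the tuple encoding of SPAWN and WAIT vertices, and the three-vertex example are all hypothetical.

```python
def naive_translate(vertices, arcs, task, name="T"):
    """Split a DFG into a child DFG (induced by task) and a parent DFG."""
    task = set(task)
    spawn, wait = ("SPAWN", name), ("WAIT", name)
    child_arcs = {(u, v) for u, v in arcs if u in task and v in task}
    parent_vertices = (set(vertices) - task) | {spawn, wait}
    parent_arcs = {(spawn, wait)}  # the implicit SPAWN -> WAIT arc
    for u, v in arcs:
        if u in task and v in task:
            continue                      # internal to the child DFG
        if v in task:
            parent_arcs.add((u, spawn))   # external input to the child
        elif u in task:
            parent_arcs.add((wait, v))    # external output of the child
        else:
            parent_arcs.add((u, v))
    return (task, child_arcs), (parent_vertices, parent_arcs)

def has_cycle(vertices, arcs):
    """Kahn-style check: a cycle exists if some vertex can never be removed."""
    indegree = {v: 0 for v in vertices}
    for _, v in arcs:
        indegree[v] += 1
    ready = [v for v in vertices if indegree[v] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b in arcs:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed != len(indegree)

# Hypothetical DFG a -> b -> c.
V, A = ["a", "b", "c"], [("a", "b"), ("b", "c")]
_, parent_ok = naive_translate(V, A, {"a", "b"})   # a convex task
_, parent_bad = naive_translate(V, A, {"a", "c"})  # a concave task
convex_task_cycle = has_cycle(*parent_ok)
concave_task_cycle = has_cycle(*parent_bad)
```

In this small example the concave task {a, c} produces the parent arcs SPAWN → WAIT → b → SPAWN, a cycle, while the convex task {a, b} yields an acyclic parent DFG.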

Necessity of Child-task Convexity
Consider the DFG and task set pair (D, P = {T 1 , T 2 , . . . , T n }). If using the naïve translation technique described above, every task T i must be a convex set (see Section 2.9) of D. The reason is as follows.
Recall that the execution of a DFG in the task-parallel DFG execution model terminates if and only if the DFG is acyclic (Subsection 3.2.1.1).
Suppose D i parent is the graph which results from replacing, in D, T i with the vertices SPAWN(T i ) and WAIT(T i ). Because D is a DFG, it is by definition acyclic. However, D i parent will contain a cycle if and only if T i is not a convex set of D, for reasons explained below. If D i parent contains a cycle, it cannot be executed to completion in our execution model. We therefore require that every task T i is a convex set of D.
Because v ∉ T i , v resides in D i parent rather than in some child-task DFG. However, either directly or indirectly, T i requires the output of v as an input to the DFG for the child task T i . Therefore D i parent contains a (v, SPAWN(D child i )) path.
However, due to the concavity of T i , D also contains a (T i , v) path. That is, either directly or indirectly, v requires the output of T i as an input to v. Therefore

Insufficiency of Child-task Convexity with Multiple-Task Sets
Consider again the DFG and task-set pair (D, P = {T 1 , T 2 , . . . , T n }).
When P contains precisely one task (i.e., n = 1), and T 1 is a convex set of D, the naïve translation algorithm described above (Subsection 3.3.1.1) always yields a parent DFG D parent which is a convex set of D.
However, when n > 1, there are some DFG's D and task sets P such that the naïve algorithm yields a parent graph D parent that contains a cycle. This is possible even when every task in P is itself a convex set of D. An example of this is shown in Figure 2 on page 31.
This issue arises because, while all the tasks in P are convex sets within the original DFG D, they are not necessarily convex sets within the DFG's formed as the individual tasks within P are successively replaced with SPAWN and WAIT vertices.
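A small Python sketch can reproduce this effect. It assumes a hypothetical four-vertex DFG whose only arcs are (v1, v2) and (v3, v4), with tasks {v1, v4} and {v2, v3}; the helper names `contract` and `is_acyclic` are likewise illustrative. Each task is collapsed to a single node, which is a simplification of the SPAWN/WAIT pair that suffices to expose the induced cycle.

```python
from graphlib import TopologicalSorter, CycleError

def contract(vertices, arcs, tasks):
    """Collapse each task to one node and return the resulting nodes and arcs."""
    label = {v: v for v in vertices}
    for i, task in enumerate(tasks, start=1):
        for v in task:
            label[v] = ("task", i)
    nodes = set(label.values())
    new_arcs = {(label[u], label[v]) for u, v in arcs if label[u] != label[v]}
    return nodes, new_arcs

def is_acyclic(nodes, arcs):
    """Use the standard-library topological sorter to detect cycles."""
    preds = {n: set() for n in nodes}
    for u, v in arcs:
        preds[v].add(u)
    try:
        TopologicalSorter(preds).prepare()
    except CycleError:
        return False
    return True

V = ["v1", "v2", "v3", "v4"]
A = [("v1", "v2"), ("v3", "v4")]
# D has no path with an internal vertex, so every non-empty vertex set,
# including both tasks below, is a convex set of D.
tasks = [{"v1", "v4"}, {"v2", "v3"}]
original_acyclic = is_acyclic(set(V), A)
contracted_acyclic = is_acyclic(*contract(V, A, tasks))
```

Here the arc (v1, v2) forces the first task before the second, and the arc (v3, v4) forces the second before the first, so the contracted graph is cyclic although D itself is not.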

Corrected Translation Algorithm
Here we consider improvements to the naïve translation algorithm (Subsection 3.3.1.1) such that the problem of induced cycles described above (Subsection 3. has not yet been contracted. Note that task {v2, v3} is a convex set of D, but is not a convex set of D 1 parent . The arc (v3, SPAWN(v1v4)) is present because the output of v3 is a required input to v4. The arc (WAIT(v1v4), v2) is present because the output of v1 is a required input to v2.

Algorithm 1 Corrected Task-Set to DFG-Set Translation
Function task_set_to_DFG_set_corrected : (D, P) → S
Require: . . .
. . .
Add vertex SPAWN(D child i ) to D parent .
10: Update each (u, dummy i ) arc in D parent to be (u, SPAWN(D child i )).
11: Update each (dummy i , v) arc in D parent to be (WAIT(D child i ), v).
12: Delete dummy i from D parent .
13: . . .

For each DFG D i , an equivalent C++ function is defined. Each arc in the DFG which enters T i is an input parameter of the C++ function, and each arc leaving T i is an output parameter of the C++ function. If a given task T i = ∅, then the translation proceeds as though T i does not exist as a task.
The DFG D parent is translated into a C++ function which directly executes within the C++ program's main thread. SPAWN and WAIT vertices are directly mapped to equivalent Threading Building Blocks constructs.
Within this present work, each DFG vertex maps to an equivalent block of C++ code.
Some topological sort of the DFG's vertices is computed, and the C++ renditions of the DFG vertices appear in the C++ function according to that sort.
DFG's sometimes have multiple topological sorts, and it is possible that different orderings of the translated DFG vertices within the C++ function lead to different running times for that function. The freedom to choose among different topological sorts could be a reasonable component of any optimization algorithm which seeks to minimize running time. The careful selection of this topological sort lies outside the scope of this present research.

Obtaining Program Running Time
For this research, we assume that a program's average running time is obtained by executing the program some number of times and averaging the measured running times. More sophisticated approaches may be used to optimally estimate the program's mean running time, assuming some particular underlying distribution and a desired confidence interval for the mean. However, these more sophisticated approaches were not employed, as the focus of this thesis' research lies elsewhere.
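The simple averaging approach can be sketched as below. This is an assumption-laden illustration: the function name `average_running_time` is hypothetical, and a Python callable stands in for launching the compiled executable (in practice the callable might wrap a `subprocess.run` call).

```python
import time

def average_running_time(run, repetitions=3):
    """Call run() the given number of times and return the mean wall-clock time."""
    if repetitions < 1:
        raise ValueError("at least one repetition is required")
    total = 0.0
    for _ in range(repetitions):
        start = time.perf_counter()
        run()
        total += time.perf_counter() - start
    return total / repetitions

# Stand-in workload; a real measurement would execute the translated program.
calls = []
mean_seconds = average_running_time(lambda: calls.append(sum(range(1000))))
```

The default of three repetitions mirrors the number of executions used in Algorithm 2; no confidence interval is computed.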

Full Task-set Optimization Problem
One approach to parallelization is to identify one or more groups of tasks in a DFG such that (1) each task can be safely run in parallel with the rest of the program, and (2) program running time is approximately minimized [2,3,24,25,30]. We define the full task-set optimization problem as follows.
Let D = (V, A) be a dataflow graph representing part or all of a program's logic. Let S be the collection of all valid task sets for D. A task set P = {T 1 . . . T n } is considered a valid task set of D if and only if it has the following qualities:
1. Every task T ∈ P is a convex set of D. See Subsection 3.3.1.2 for the origin of this requirement.
2. For any two tasks . . . for the origin of this requirement.
3. Each of the DFG's in {D . . . , f Tn }, the contraction of P in D, is acyclic. See Subsection 3.3.1.3 for the origin of this requirement.
Note that the corrected translation algorithm (Algorithm 1) accepts inputs in which only the first of those three requirements, convexity, is guaranteed. The output of Algorithm 1 is a task set which ensures all three of the above-listed requirements are satisfied. By providing this mapping, we allow a greater variety of solution-generating kernels to be used for this optimization problem, as we have a strategy for using output from kernels whose outputs do not necessarily meet the second or third requirements listed above.
Let cost : S → R be a cost function. A typical definition of cost might be the average running time of the program obtained by translating a task set S into an executable program and then averaging its running time from several runs, as described in Subsec-
The task-set optimization problem is to find a task set P min ∈ S which approximately minimizes cost.
The author is unaware of an efficient deterministic solution for this optimization problem.
Solution by full evaluation of the solution space is impractical for non-trivial dataflow graphs, as the size of the solution space can be at least exponential in the number of vertices |V| depending on the arc set A. Convex-function optimization techniques are problematic as well, as we make no assumption that the cost function is globally convex for any obvious presentation of the search space.

Single-task Optimization Problem
We define the single-task optimization problem as the special form of the task-set optimization in which we impose the additional limitation that the solution space is limited to task sets containing precisely one task. By imposing this constraint on the solution space of the more general full task-set optimization problem, several other constraints in the task-set optimization problem become moot, as they're trivially satisfied for task sets containing just one task.
By permitting the task set to contain only one task, the search space described for the full task-set optimization problem (Section 3.4) is greatly simplified, leaving only the first requirement that the task in the task set is a convex set of the DFG.
As with the task-set optimization problem, the single-task optimization problem cannot be practically solved via full evaluation of the search space for any but the smallest DFG's, and our lack of assumptions about the structure of the cost function prevents the use of optimization algorithms which assume a globally convex cost function.

Extending Single-task to Full Task-set Optimization
The research presented in this thesis was motivated by the full task-parallelization problem (Section 3.4), and yet the research focuses on a significantly restricted form of the problem (Section 3.5). We demonstrate here that an algorithm which solves the restricted optimization problem does in fact help to solve the full optimization problem.
Algorithm 3 is a per-iteration-efficient stochastic optimization algorithm for the single-task optimization problem. Consider a variation of Algorithm 3, which we'll call Z. Each iteration of Z yields some convex set of the specified DFG, but may also emit the empty set.
From this we may construct a stochastic algorithm which seeks to solve the full task-set optimization problem. A sketch of such an algorithm is given by Algorithm 2. It works as follows.
The algorithm Z evolves each convex set, but can also yield the empty set. Any empty sets which do occur are eliminated, in line 8 of the algorithm, from the current candidate-solution task set P. In this manner Algorithm 2 potentially searches task sets containing anywhere between zero and |V| tasks. The capacity to examine task sets of any size between 1 and |V| is clearly a necessity for any algorithm which seeks to solve the full task-set optimization problem.
This thesis does not attempt to show that Algorithm 2 is constructed in a manner that, given enough iterations, has a non-zero probability of searching the entire search space of all valid task sets.
Algorithm 2 is presented merely to demonstrate that Algorithm 3 is a step toward solving the full task-set optimization problem, in that it can be a building block of algorithms such as Algorithm 2 which may in fact solve the full task-set optimization problem.
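The main loop of Algorithm 2 can be loosely rendered in Python as follows. This is only a sketch under strong assumptions: `step` is a hypothetical stand-in for one iteration of the i-th instance of Z, and `cost` is a hypothetical stand-in for the translate, compile, and time steps; neither is defined by this work in this form.

```python
import math

def optimize_full_task_set(num_vertices, step, cost, max_iter):
    """Keep the cheapest task set seen over max_iter stochastic iterations."""
    best_tasks, best_time = None, math.inf
    for _ in range(max_iter):
        # One iteration of every instance of Z; each yields a set, maybe empty.
        tasks = [step(i) for i in range(num_vertices)]
        # Empty sets are not tasks, so they are dropped from the candidate.
        tasks = [t for t in tasks if t]
        elapsed = cost(tasks)
        if elapsed < best_time:
            best_tasks, best_time = tasks, elapsed
    return best_tasks, best_time

# Deterministic stubs, purely for demonstration.
proposals = iter([[{1}, set()], [{1}, {2}], [set(), set()]])
current = []

def scripted_step(i):
    if i == 0:                      # fetch a new scripted proposal per iteration
        current[:] = next(proposals)
    return current[i]

def toy_cost(tasks):
    return 10 - len(tasks)          # pretend more tasks means a faster program

best, best_time = optimize_full_task_set(2, scripted_step, toy_cost, 3)
```

The sketch records the best time alongside the best task set so that later iterations are compared against the best result found so far.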

Algorithm 2 Algorithm which Solves the Full Task-Set Optimization Problem
Function optimize_full_task_set : (D, P) → P best
Require: . . .
(R2) max_iter is the maximum number of iterations to perform.
(R3) Z is a single-task optimization algorithm like that presented in Algorithm 3. Z is also permitted to return the empty set, which is by definition not a convex set.
Ensure: (E1) P best is the fastest-performing task set discovered in max_iter iterations.
1: Instantiate |V| copies of the algorithm Z. Each algorithm instance Z i evolves the task (or empty set) T i , for i ∈ [1, |V|].
2: time_best ← ∞
3: for iter = 1, 2, . . . , max_iter do
4: for all i ∈ [1, |V|] do
5: T i ← set yielded by one iteration of Z i
6: end for
7: Delete from P any members which are the empty set.
9: S ← call task_set_to_DFG_set_corrected(D, P) (Algorithm 1)
10: Translate S to equivalent C++ code and compile it to a native executable, E.
11: time ← average running time from 3 executions of E.
12: if time < time_best then
13: P best ← P; time_best ← time
14: end if
15: end for
16: return P best

CHAPTER 4 Topological Sort Theory

Introduction
Let D = (V, A) be a directed graph, and let Q be a total ordering of the vertices in V.
Every directed acyclic graph has at least one topological sort of its vertices [1,Prop. 2.1.3]. As shown in [1,Thm. 2.1.4] and elsewhere, a topological sort for a DAG Some directed graphs have multiple topological sorts. For example, consider the di-

Results
Theorem 4.2.1. Let D = (V, A) be a DAG, and let Q be any topological sort of D. Let X be any contiguous subsequence of Q, so that Q has the structure [. . . , X, . . .]. Then X, the set of vertices comprising X, is a convex set of D.
Proof. Suppose for contradiction that X is a contiguous subsequence of Q, but X is not a convex set of D.
Because X is not convex, D must contain at least one C-path of X. Let C be such a C-path. By the definition of C-path, C 1 and C |C| are members of X, and all internal vertices of C are not members of X. Note that every C-path must have at least three elements.
By the definition of a path of D, each adjacent pair of vertices in C is an arc in A. That . . .
From the definition of topological sorts, (u, v) ∈ A implies that every topological sort of D must have the structure [. . . , u, . . . , v, . . .]. Applying this to Q and C, it follows that the structure of Q is: . . .
Recall that C 2 ∉ X and C 1 , C |C| ∈ X. It follows then that X is not a contiguous subsequence of Q, violating our assumption to the contrary.
Proof. The existence of Algorithm 9 (see Appendix B.4) demonstrates that this is true.
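Theorem 4.2.1 can be spot-checked numerically on a small example. The sketch below is illustrative only and proves nothing: the five-vertex DAG and the helper `is_convex` are assumptions, and the topological sort is obtained from Python's standard `graphlib` module rather than from any algorithm in this work.

```python
from graphlib import TopologicalSorter

def is_convex(arcs, x):
    """Return True if no path leaves the set x and later re-enters it."""
    x = set(x)
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, set()).add(v)
    stack = [w for u in x for w in succ.get(u, ()) if w not in x]
    seen = set()
    while stack:
        w = stack.pop()
        if w in seen:
            continue
        seen.add(w)
        for y in succ.get(w, ()):
            if y in x:
                return False
            stack.append(y)
    return True

# Hypothetical DAG: a diamond a -> {b, c} -> d followed by d -> e.
arcs = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
preds = {}
for u, v in arcs:
    preds.setdefault(v, set()).add(u)
    preds.setdefault(u, set())
order = list(TopologicalSorter(preds).static_order())

# Every contiguous subsequence (window) of the topological sort.
windows = [order[i:j] for i in range(len(order))
           for j in range(i + 1, len(order) + 1)]
all_windows_convex = all(is_convex(arcs, w) for w in windows)
```

By contrast, the set {a, d} is not contiguous in any topological sort of this DAG and is not convex, since the path [a, b, d] leaves and re-enters it.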
be any topological sort of D such that X 1 is a total ordering of X .
Then for any topological sort X 2 of D<X>, Proof. Suppose for contradiction that T 2 were not a topological sort of D. Then there must be some vertices u, v ∈ V such that u appears before v in We now consider where u and v might reside within T 2 . Because we assume that u precedes v in T 2 , we have six possible cases. We show that each of these six cases yields a contradiction.
Case 1 implies that u precedes v in Y . Therefore no total ordering of D containing Y could be a topological sort of D. This violates our assumption that T 1 is a topological sort of D.
Like T 2 , T 1 has the structure [ W , X, . . .]. Therefore we have u preceding v not only in T 2 , but also in T 1 . This violates our assumption that T 1 is a topological sort of D.
The same logic applies here as for Case 2 above.
By the definition of D<X>, every (X , X ) arc in D is also an arc in D<X>.
By our original assumptions the arc (v, u) is an arc in D. By Case 4's assumption, u and v are both members of X . Therefore the arc (v, u) is also an arc in D<X>.
However, if (v, u) is an arc in D<X>, and u precedes v in X 2 , it cannot be true that X 2 is a topological sort of D<X>. This violates our assumption to the contrary.
The same logic applies here as for Case 2 above.
The same logic applies here as for Case 1 above. T is a subsequence of S. Therefore because u precedes v in T , u precedes v in S.
Similarly, every arc in D<T > is also an arc in D. Therefore we have (v, u) ∈ A.
Because u precedes v in S, and (v, u) ∈ A, S is not a topological sort of D. This contradicts our assumption to the contrary. Clearly from the definition of induced subgraphs (Section 2.6) we have A X ⊆ A X ∪{v} .
Any arc present in D<X ∪ {v}> but not D<X> must have the form (v, X ) or (X , v).
It is impossible that D<X ∪ {v}> has both (v, X ) arcs and (X , v) arcs. If both kinds of arcs were present in D<X ∪ {v}>, there would exist a vertex v ∈ P̂(X ) ∩ Ŝ(X ). This would be a contradiction, as it would indicate that X was not in fact convex (Theorem 5.3.3).
We therefore have three possibilities for the structure of the arcs found in A X ∪{v} but not in A X . For each of the three cases we demonstrate how to construct a topological sort R of D<X ∪ {v}> such that v appears as either the first or last element of R.
After showing this for each of the three cases below, we proceed to provide the rest of the proof.
Case 1: A X = A X ∪{v} : Every DAG has at least one topological sort [1,Prop. 2.1.3]. Let Q be any topological R is a topological sort of D<X ∪ {v}> if and only if R is a total ordering of X ∪ {v}, and for each arc (p, q) ∈ A X ∪{v} , p appears before q in R.
R is clearly a total ordering of X ∪ {v}, because its structure is R = [v, Q] and Q is a total ordering of X .
For every (u, v) arc of the form (X , X ), u appears before v in Q. Because R preserves the relative ordering of each arc in Q, we have that the ordering of R is consistent with every arc of the form (X , X ).
In Case 1 we have A X = A X ∪{v} , and therefore we've established that the ordering R is consistent with every arc in A X ∪{v} .
Note that in Case 1, placing v anywhere within R would have preserved R's status as a topological sort of D<X ∪ {v}>. However, our overall goal is to show that we can always place v either immediately before or after X in some topological sort of D, and so we place it at the beginning of R.
Case 2: A X ∪{v} \ A X is a set of arcs of the form (v, X ): The reasoning for Case 2 is essentially the same as for Case 1, except that there are additional arcs in A X ∪{v} that must be respected in the ordering of R.
These additional arcs are all of the form (v, X ), and are clearly respected by the order . . .
Case 3: A X ∪{v} \ A X is a set of arcs of the form (X , v): The reasoning for Case 3 is similar to that of Case 2, except the additional arcs to be respected are all of the form (X , v), not (v, X ).
These additional arcs are clearly satisfied by the ordering . . .
We now complete our proof. Because X ∪ {v} is a convex set of D, there exists some topological sort S in which the vertices X ∪ {v} appear as a contiguous subsequence . . .
We now have that both F and R are topological sorts of D<X ∪{v}>, and . . .
Theorem 4.2.6. Let D = (V, A) be a DAG, and let X ⊂ V be any convex set of D.
Then for every 1 ≤ δ ≤ |V \ X |, D contains a convex superset of X having order |X | + δ.
Proof. The existence of Algorithm 15 (see Appendix C.6) proves this.
Proof. The existence of Algorithm 17 (see Appendix C.8) proves this.

Predecessor and Successor Theory
This section develops a number of results regarding predecessor and successor sets, and their relationships to convex sets.
Let D = (V, A) be a DAG, and let X be any set with ∅ ⊂ X ⊆ V. The predecessor set P̂(X ) is the set of all vertices q ∈ V such that D contains a (q, X )-path. Similarly, the successor set Ŝ(X ) is the set of all vertices q ∈ V such that D contains a (X , q)-path.
P̂ and Ŝ are useful concepts because they can be used to efficiently describe which vertices lie on paths contributing to a set's concavity (i.e., C-paths of that set), and provide an indication of which vertices must be added to a concave vertex set in order to achieve convexity (see Theorem 5.3.4).

Relationship between P̂/Ŝ and N⊖/N⊕
Here we briefly discuss the relationship between P̂/Ŝ and N⊖/N⊕. N⊖(X ) can be calculated using P̂ and Ŝ:
A sketch of the validity of Equation (1) is as follows. The terms limit N⊖(X ) to only those vertices which are in-neighbors (strictly direct or otherwise) of X . For an in-neighbor v of X to be a strictly direct in-neighbor, there must not exist any path [v, . . . , u, . . . , X ] in D, with u ∉ X . Any such vertex u is a member of (Ŝ(v) ∩ P̂(X )) but not a member of X . This gives rise to the final clause in our formula:
By similar reasoning we may obtain:

Basic P̂/Ŝ and N⊖/N⊕ Results
In this section we present a number of basic results pertaining to P̂ and Ŝ.
Proof. Proof by contradiction. Suppose D contains no C-path of X , but ¬((P̂(X ) ∩ Ŝ(X )) ⊆ X ). By the second premise, there must exist a vertex x such that x ∈ (P̂(X ) ∩ Ŝ(X )) and x ∉ X . We'll show that this allows us to construct a C-path of X , thus contradicting our first premise.
Let x be an arbitrary vertex with x ∈ (P (X ) ∩Ŝ(X )) and which is by definition a C-path in X .
Proof. The proof flows directly from the definition of P̂. First prove that P̂(X ∪ Y) ⊆ . . . By the first premise, there must be a directed path in D that passes through v and terminates at some vertex in X and/or Y. But this contradicts the . . .
By the first premise, v must lie on some directed path in D that terminates in either X or Y. But the existence of such a path contradicts the second premise.
Proof. This proof is identical in structure to the proof of Lemma 5.2.2.
Although P̂ and Ŝ distribute over union (Lemmas 5.2.2 and 5.2.3), they do not distribute over intersection. For this reason the following two lemmas are weaker than the previous two.
From the premise that v ∈ Ŝ(X ∩ Y), there must be a vertex u ∈ (X ∩ Y) such that . . . which contradicts the second premise, v ∉ (Ŝ(X ) ∩ Ŝ(Y)).
Proof. By the definition of P̂, every vertex e ∈ P̂(P̂(X )) lies on a path of the form . . . Also by the definition of P̂, every vertex f ∈ P̂(X ) lies on a path of the form R = . . . Therefore, for every e ∈ P̂(P̂(X )), there exists a path in D of the form S = [. . . , e, . . . , f, . . . , x], with f ∈ P̂(X ) and x ∈ X . Therefore e is also a non-terminal vertex of some path (S) terminating in X , and thus e ∈ P̂(X ). Because every e ∈ P̂(P̂(X )) is also a member of P̂(X ), we have P̂(P̂(X )) ⊆ P̂(X ).
Proof. This proof is similar in structure to the proof for Lemma 5.2.5.
Proof. This proof has the same structure as that for Lemma 5.2.10.
Proof. We derive the right-hand side of the equation from the left-hand side, as follows.
First distribute using Lemma 5.2.2: Using Lemma 5.2.1: . This gives us: which trivially reduces to: Proof. This proof has the same structure as the proof of Lemma 5.2.8.
Proof. Because X ⊆ Y, every path ending in X is also a path ending in Y. Therefore the non-terminal vertices of paths ending in X constitute a subset of the non-terminal vertices of paths ending in Y.
Lemma 5.2.11. Let D = (V, A) be a DAG, and let X be any set with ∅ ⊂ X ⊆ V. If . . .
Proof. Proof by contradiction. Suppose that (P̂(X ) ∩ Ŝ(X )) ⊆ X , but D does contain a path Q which is a C-path of X . Let x be an arbitrary internal vertex of Q. From Lemma 5.3.1, x ∈ (P̂(X ) ∩ Ŝ(X )). Because of our premise that (P̂(X ) ∩ Ŝ(X )) ⊆ X , we have x ∈ X . However, the definition of a C-path of X indicates that each internal vertex is not in X . Because x cannot be both in X and not in X , we have a contradiction.
Lemma 5.2.12. Let D = (V, A) be a DAG, and let ∅ ⊆ X ⊂ V be a convex set of D.
Proof. N⊖(X ) ⊆ P̂(X ) is trivially true, because every vertex that has a strictly direct path to X also has a path to X .
From Lemma 5.2.1, we have that a vertex v which is external to a convex set X cannot be in both P̂(X ) and Ŝ(X ).
Lemma 5.2.13. Let D = (V, A) be a DAG, and let ∅ ⊆ X ⊂ V be a convex set of D.
Proof. This proof is similar to that of Lemma 5.2.12.

Obtaining Convexity via Internal Path Closure
Let D = (V, A) be a DAG, and let X be an arbitrary non-empty subset of V. We define the internal path closure of X to be the set X ∪ (P (X ) ∩ Ŝ(X )). Finally, Corollary 5.3.5 establishes that the internal path closure of some set X is the smallest convex set containing X . This result is useful for search algorithms that establish convexity by adding vertices to a concave set, but strive to modify the original set as little as possible.
Let D = (V, A) be a DAG, let X be any set with ∅ ⊂ X ⊆ V, and let Q be any C-path of X . Then every internal vertex of Q is a member of (P (X ) ∩ Ŝ(X )).
Proof. By the definition of C-path of X , Q's initial and terminal vertices are in X .
Because Q's initial vertex is in X , there exists a directed path from X to each of Q's internal vertices. Therefore each of Q's internal vertices is a member ofŜ(X ). Similarly, because Q's terminal vertex is in X , there exists a directed path from each internal vertex of Q to X . Therefore each of Q's internal vertices is a member ofP (X ).
Proof. Suppose for contradiction that Y is not a convex set of D. Then there must be some C-path of Y in D, C = [a, . . . , w, . . . , z], with a, z ∈ Y and w ∉ Y.
Because Y = X ∪ {v}, we consider four possible variations of C: a ∈ X or a = v, and z ∈ X or z = v. We show below that each of the four possible variations leads to a contradiction. Note that these four cases are not necessarily mutually exclusive; however, the following proofs do not depend on their mutual exclusion.
From Lemma 5.2.1, we have (P (X) ∩Ŝ(X)) ⊆ X . From the transitivity of the subset relation, this gives us w ∈ X , which violates our assumption to the contrary.
Case 2: a = v, z = v: Case 2 implies that [v, . . . , w, . . . , v] is a path in D, which violates the assumption that D is acyclic.
Proof. By the definition of convexity, X is a convex set of D if and only if D contains no C-path of X . For convenience we'll use that fact to restate the theorem to be proven: D contains no C-path of X if and only if ((P (X ) ∩Ŝ(X )) ⊆ X ).
Let D = (V, A) be a DAG, let X be any set with ∅ ⊂ X ⊆ V, and let T = P (X ) ∩ Ŝ(X ). Then X ∪ T is a convex set of D.
Proof. Suppose for contradiction that X ∪ T is concave in D. Then there exists in D a C-path of X ∪ T . Let C = [x, . . . , y, . . . , z] be such a path, with x, z ∈ X ∪ T and y ∉ X ∪ T .
We'll use case analysis to show that four possible forms exist for C, and none of those forms can exist.
This case is false for reasoning similar to that used in Case 2. However, this case's logic requires Lemma 5.2.6, not Lemma 5.2.5.
. Then Y is the uniquely smallest improper superset of X that is also a convex set of D.
Proof. Suppose for contradiction that there exists another set Z ≠ Y that is also a convex improper superset of X , and yet is no larger than Y. We'll show that Z cannot exist.
Because Z ≠ Y and |Z| ≤ |Y|, there must be some vertex v ∈ Y that is not present in Z.
Because v ∈ Y, and by the definition of Y, we have v ∈ X and/or v ∈ (P (X ) ∩Ŝ(X )).
We'll consider those two cases separately, and show that both lead to contradictions.
Suppose for contradiction that v ∈ X . Then v is a vertex in X that is absent from Z.
This contradicts our assumption that Z is an improper superset of X .
Because v ∈ (P (Z) ∩ Ŝ(Z)), and v ∉ Z (by our definition of v), it is not the case that (P (Z) ∩ Ŝ(Z)) ⊆ Z. According to Theorem 5.3.3, this implies that Z is not a convex set of D, which contradicts our definition of Z.
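The internal path closure and the convexity criterion discussed in this section can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in this work: the DAG is assumed to be given as an adjacency dictionary, and the helper names (`reach`, `reverse`, `internal_vertices`, `is_convex`, `internal_path_closure`) and the example graph are hypothetical.

```python
def reach(adj, start):
    """Return every vertex reachable from `start` by a path of length >= 1."""
    seen = set()
    stack = [w for s in start for w in adj.get(s, ())]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return seen


def reverse(adj):
    """Return the adjacency map of the same DAG with every arc reversed."""
    rev = {}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, []).append(u)
    return rev


def internal_vertices(adj, xs):
    """P(X) ∩ Ŝ(X): vertices that both reach X and are reached from X."""
    return reach(reverse(adj), xs) & reach(adj, xs)


def is_convex(adj, xs):
    """Convexity test in the style of Theorem 5.3.3 (non-empty sets only)."""
    return bool(xs) and internal_vertices(adj, xs) <= set(xs)


def internal_path_closure(adj, xs):
    """X ∪ (P(X) ∩ Ŝ(X)), the closure defined in this section."""
    return set(xs) | internal_vertices(adj, xs)


# Hypothetical DAG: a -> b -> c, a -> d -> c, c -> e.
dag = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "c": ["e"]}
seeds = {"a", "c"}  # not convex: b and d lie between a and c
closure = internal_path_closure(dag, seeds)
print(is_convex(dag, seeds), sorted(closure), is_convex(dag, closure))
```

The sketch recomputes reachability on every call for brevity; an implementation that precomputes the transitive closure once would avoid that repeated work.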

Modifying Convex Sets with P /Ŝ and N ⊖ /N ⊕
In this subsection we develop a theoretical basis for algorithms which seek to grow or shrink an existing convex set by adding or deleting vertices.
Note: By assumption v / ∈ X , and v ∈P (X ). It follows then from Lemma 5.
Informally, this theorem states the following. Suppose X is a convex set of D, and v ∈ P (X ) is not in X . Then a necessary and sufficient criterion for the convexity in D of Y = X ∪ {v} is that every path from v to X is direct. A path is indirect if and only if it contains one or more intermediate vertices.
Proof. We begin by showing that if an indirect path from v to X exists, then Y is concave.
Suppose without loss of generality that P = [v, . . . , w, . . . , X] is a path in D. By the definition of convexity, a set is a convex set of D if and only if no C-path of that set exists in D. Then P is a C-path of the set Y, and therefore Y is not a convex set of D.
We now show that when the only paths from v to X are direct, Y is convex. We do this using Lemma 5.2.11, which states that a set Y is a convex set of D if (and only if) (P (Y) ∩ Ŝ(Y)) ⊆ Y. We proceed by developing the left-hand side of the relation, to obtain a result that is trivially a subset of Y.
We begin with the left-hand-side expression P (Y) ∩ Ŝ(Y), and replace Y with its definition:
P (X ∪ {v}) ∩ Ŝ(X ∪ {v})
Distribute P over union (Lemma 5.2.2):
(P (X ) ∪ P ({v})) ∩ Ŝ(X ∪ {v})
Distribute Ŝ over union (Lemma 5.2.3):
(P (X ) ∪ P ({v})) ∩ (Ŝ(X ) ∪ Ŝ({v}))
Apply the distributive law of sets:
(Ŝ(X ) ∩ (P (X ) ∪ P ({v}))) ∪ (Ŝ({v}) ∩ (P (X ) ∪ P ({v}))) (7)
Again apply the distributive law of sets, this time to the left side of (7):
(Ŝ(X ) ∩ P (X )) ∪ (Ŝ(X ) ∩ P ({v}))
Now apply the distributive law of sets A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) to the right side of (7), with A = Ŝ({v}), B = P (X ), and C = P ({v}):
(Ŝ({v}) ∩ P (X )) ∪ (Ŝ({v}) ∩ P ({v}))
We now have an expression of the form (a ∪ b) ∪ (c ∪ d). By the associativity rules of set union, we may eliminate some grouping:
(P (X ) ∩ Ŝ(X )) ∪ (P ({v}) ∩ Ŝ(X )) ∪ (P (X ) ∩ Ŝ({v})) ∪ (P ({v}) ∩ Ŝ({v})) (9)
We now show that each of the four major expressions (9a) to (9d) in Eq. (9) is, individually, a subset of Y. When that has been established for each of the four subexpressions, we can conclude that the union of those four expressions is itself a subset of Y, which completes our proof.
Case 9a: P (X ) ∩ Ŝ(X ): Because X is a convex set of D, Lemma 5.2.1 gives (P (X ) ∩ Ŝ(X )) ⊆ X . We also have Y = X ∪ {v}, and so X ⊆ Y. Therefore P (X ) ∩ Ŝ(X ) ⊆ Y.
Combining these paths at q, we have that D contains a path of the form X . . . v. This implies that v ∈ Ŝ(X ). However, by this theorem's assumptions we also have v ∈ P (X ).
Therefore we have v ∈ (P (X ) ∩ Ŝ(X )), and yet v ∉ X . This implies that v lies on a C-path of X . This violates our assumption that X is a convex set of D.
Note: By assumption v ∉ X , and v ∈ Ŝ(X ). It follows then from Lemma 5.2.1 that v ∉ P (X ).
Proof. The proof for this is similar to that of Theorem 5.4.1.
Then the collection of convex supersets of X having order |X | + 1 is given precisely by {X ∪ {y}|y ∈ Y}.
Proof. We divide the vertices which might be added to X into several groups, each of which we consider separately. Note that we limit our consideration to vertices in V \ X , because we're only interested in vertices that are not yet members of X .
We distinguish the vertices in V \ X based on their memberships in P (X ) and in Ŝ(X ). Note that the vertices in P (X ) ∩ Ŝ(X ) are already members of X , as proven in Lemma 5.2.1. This leaves us with just three groups of vertices to examine:
Group 1: the vertices of V \ X in P (X ) \ Ŝ(X ).
Group 2: the vertices of V \ X in Ŝ(X ) \ P (X ).
Group 3: the vertices in V \ (X ∪ P (X ) ∪ Ŝ(X )).
Group 1: By Theorem 5.4.1, we have that for any vertex w ∈ P (X ) \ Ŝ(X ), X ∪ {w} is a convex set of D if and only if w ∈ N ⊖ (X ).
Furthermore, every vertex in N ⊖ (X ) is covered by Group 1 (Lemma 5.2.12).
It follows then that each vertex w ∈ N ⊖ (X ) has the quality that X ∪ {w} is a convex set of D.
Furthermore, we conclude that no other vertex covered by Group 1 (that is, no other single vertex in P (X ) \ Ŝ(X )) may be added to X to yield a convex set.
Group 2: By Theorem 5.4.2, we have that for any vertex w ∈ Ŝ(X ) \ P (X ), X ∪ {w} is a convex set of D if and only if w ∈ N ⊕ (X ).
Furthermore, every vertex in N ⊕ (X ) is covered by Group 2 (Lemma 5.2.13).
It follows then that each vertex w ∈ N ⊕ (X ) has the quality that X ∪ {w} is a convex set of D.
Furthermore, we conclude that no other vertex covered by Group 2 (that is, no other single vertex in Ŝ(X ) \ P (X )) may be added to X to yield a convex set.
Group 3: Theorem 5.3.2 indicates that any one vertex w in this group has the quality that X ∪ {w} is a convex set of D.
We have shown that Groups 1, 2, and 3 collectively cover all of the vertices that are candidates for single-vertex additions to X .
The vertices from Group 1 that may be added to X are precisely the set N ⊖ (X ). The vertices from Group 2 that may be added to X are precisely the set N ⊕ (X ).
The vertices from Group 3 that may be added to X are precisely all vertices in Group 3. That is, V \ (X ∪P (X ) ∪Ŝ(X )), or R.
Therefore, the vertex set of which any one may be added to X to yield a convex set is N ⊖ (X ) ∪ N ⊕ (X ) ∪ R.
Note that a result similar to Theorem 5.4.5 is presented in [1, Lemma 17.2.4]: Let D be an acyclic graph, let X be a convex set of D and let s ∈ X be a source or sink of D<X>. Then X \ {s} is a convex set of D.
The sources and sinks of D < X > are precisely those vertices in Y. Theorem 5.4.5 strengthens [1,Lemma 17.2.4] by establishing that deleting from X any subset of Y, rather than just a single vertex s ∈ Y, yields another convex set of D.
Proof. Suppose for contradiction that X \ Z is not a convex set of D. Then D must contain a C-path of X \ Z, having the structure C = [(X \ Z), . . . , w, . . . , (X \ Z)], with w / ∈ (X \ Z).
We consider the three possible regions of V in which w could reside, and show that w's presence in each of those regions leads to a contradiction. The three possibilities are w ∈ V \ X , w ∈ X \ Z, and w ∈ Z.
Case 1: w ∈ V \ X : Recall that the presumed structure of C is C = [(X \ Z), . . . , w, . . . , (X \ Z)]. Because (X \ Z) ⊆ X , it is also true that the structure of C is C = [X , . . . , w, . . . , X ]. In Case 1, we have w ∈ V \ X , and therefore w ∉ X . Therefore the structure of C is C = [X , . . . , (w ∉ X ), . . . , X ]. However, such a path is a C-path of X in D. The existence of such a path in D indicates that X is not a convex set of D, in contradiction of our assumptions.
Case 2: w ∈ X \ Z: Case 2 is not possible by the definition of w.
Case 3: w ∈ Z: Because w ∈ Z, and Z ⊆ Y, we have w ∈ Y. By the definition of Y, no vertex in Y is a member ofP (X ) ∩Ŝ(X ). Therefore, w / ∈P (X ) ∩Ŝ(X ).
By the assumptions of Theorem 5.4.5 we have that X ≠ ∅. Therefore, in order to have Y = ∅, it must be the case that X ⊆ (P (X ) ∩ Ŝ(X )). That is, ∀x ∈ X , (x ∈ P (X ) ∧ x ∈ Ŝ(X )).
However, if every vertex in X is a predecessor of some other vertex in X , then X contains a cycle. This contradicts the assumption of Theorem 5.4.5 that D, the digraph containing X , is acyclic.
This theorem complements Theorem 5.4.5, which states that given a convex set X , one can remove from X any combination of vertices in (X \ (P (X ) ∩Ŝ(X ))) to yield another convex set. However, Theorem 5.4.5 only establishes the sufficiency of that condition.
This present theorem (5.4.7) establishes that when removing just a single vertex v from X , it is not merely sufficient, but also necessary, that v ∈ X \ (P (X ) ∩Ŝ(X )) in order for X \ {v} to be convex. That is, the set X \ (P (X ) ∩Ŝ(X )) is the precise formula for which individual vertices may be removed from X to yield a convex subset of X having order |X | − 1.
Note that one of this theorem's assumptions is that |X | > 1. This is because the empty set is, by definition, not a convex set of D. Therefore removing a single vertex from X when |X | = 1 would not yield a convex set of D.
Proof. From Theorem 5.4.5 we have that if v ∈ X \ (P (X ) ∩ Ŝ(X )) then X \ {v} is a convex set of D.
What remains is to show the opposite implication: if X \ {v} is a convex set of D, then v ∈ X \ (P (X ) ∩Ŝ(X )).
Suppose for contradiction that X \{v} is a convex set of D, and yet v / ∈ X \(P (X )∩Ŝ(X )).
v / ∈ X \ (P (X ) ∩Ŝ(X )) is satisfied by v / ∈ X or v ∈ (P (X ) ∩Ŝ(X )). However, v ∈ X is a requirement of the theorem, and so the only remaining possibility is that v ∈ (P (X ) ∩ S(X )).
Because D is acyclic, a vertex cannot appear more than once in any path of D. We may therefore refine the structure of C to be C = [(X \ {v}), . . . , v, . . . , (X \ {v})].
However, we see from the structure of C that it is a C-path of the set (X \ {v}). Therefore, we have that (X \ {v}) is not a convex set of D. This violates our assumption to the contrary.
This corollary demonstrates that when δ > 1, there are some DAGs D with particular convex sets X , such that the formula {X \ Z | Z ⊆ X \ (P (X ) ∩ Ŝ(X )), |Z| = δ} generates only some of the convex sets of D that are subsets of X and have order |X | − δ.
Proof. We prove this with a counterexample.
Proof. Theorem 4.2.7 establishes that any convex set X may always be shrunk by the deletion of exactly one vertex, yielding another convex set.
Theorem 5.4.7 identifies { X ∪ {y} | y ∈ Y } as the exact set of such vertices. This set cannot be empty if at least one such subset of X exists.
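The single-vertex results of this section can be checked numerically on a small example. The sketch below is illustrative only: it assumes an adjacency-dictionary DAG, computes the removable vertices with the formula X \ (P(X) ∩ Ŝ(X)), and compares them against a brute-force convexity test. The addable vertices are found by brute force rather than through N ⊖ and N ⊕. All function names and the example graph are hypothetical.

```python
def reach(adj, start):
    """Return every vertex reachable from `start` by a path of length >= 1."""
    seen = set()
    stack = [w for s in start for w in adj.get(s, ())]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return seen


def reverse(adj):
    """Return the adjacency map of the same DAG with every arc reversed."""
    rev = {}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, []).append(u)
    return rev


def is_convex(adj, xs):
    """A non-empty set X is convex iff (P(X) ∩ Ŝ(X)) ⊆ X."""
    xs = set(xs)
    return bool(xs) and (reach(reverse(adj), xs) & reach(adj, xs)) <= xs


def removable_vertices(adj, xs):
    """X \\ (P(X) ∩ Ŝ(X)): candidates for single-vertex deletion (|X| > 1)."""
    xs = set(xs)
    return xs - (reach(reverse(adj), xs) & reach(adj, xs))


def addable_vertices(adj, vertices, xs):
    """Brute force: vertices outside X whose addition keeps X convex."""
    xs = set(xs)
    return {v for v in set(vertices) - xs if is_convex(adj, xs | {v})}


# Hypothetical DAG: a -> b -> c -> d, a -> c, a -> e -> d.
dag = {"a": ["b", "c", "e"], "b": ["c"], "c": ["d"], "e": ["d"]}
vertices = {"a", "b", "c", "d", "e"}
x = {"a", "b", "c"}

by_formula = removable_vertices(dag, x)
by_brute_force = {v for v in x if is_convex(dag, x - {v})}
can_add = addable_vertices(dag, vertices, x)
print(sorted(by_formula), sorted(by_brute_force), sorted(can_add))
```

In this example d is a successor of X but cannot be added, because the path a, e, d reaches it through a vertex outside X; e can be added because it is reached from X only by a direct arc.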

CHAPTER 6 Arbitrary Transformation of Convex Sets
In this section we show that given any two convex sets X and Y of the same DAG D, it is always possible to transform X into Y using a sequence of single-vertex changes, such that This section is structured as follows. In Section 6.1 we give an overview of the approach used by Algorithm 19. In Section 6.2 we walk through an example execution of Algorithm 19 to clarify its approach.
In keeping with the structure of this work, a proof of Algorithm 19's correctness is presented alongside the algorithm itself (see Appendix D.2.1).
The focus of this chapter is to demonstrate that particular evolutionary chains always exist for convex sets. This thesis presents Algorithm 19 solely for the purpose of establishing that fact. Algorithm 19 is not intended for use in any actual optimization system, and for this reason we do not study its asymptotic running time.

General Approach
The general approach taken by Algorithm 19 is as follows. Variable and parameter names used below have the meanings given in Algorithm 19 on page 134.
We first identify a sequence S of |X | − 1 single-vertex deletions that transforms X into a one-vertex convex set {u}.
Next, we identify a sequence T of |Y| − 1 single-vertex deletions that transforms Y into a one-vertex convex set {v}. Importantly, the reversal of this sequence T is a sequence of single-vertex additions which transforms {v} into Y.
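From this we obtain the overall transformation sequence of X to Y as follows. We begin with X . Apply the deletions in S to obtain {u}. If u ≠ v, replace u with v in a single swap step. Then apply the reversal of T to obtain Y. A concrete example of this follows.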

Example of Algorithm 19
Here we step through an instance of evolve between convex sets (Algorithm 19) in order to provide an intuitive understanding of its operation. As shown in Figure 3 On line 2 of evolve convex set to vertex we set x low ← 4 and x high ← 7. The progression of modifications made to X is therefore as follows: delete c, delete e,

CHAPTER 7 Stochastic Optimization of a Single Convex Set
In this chapter, we consider in detail the optimization problem of finding a global minimum over all convex sets of a DAG, as motivated by the program parallelization problem described in Section 3.1.
Section 7.1 formally states the optimization problem. In Section 7.2 we discuss how simulated annealing may be useful in this kind of optimization algorithm. Section 7.3 discusses options regarding seeding such an optimization algorithm, and Section 7.4 discusses the choice of using predecessor/successor sets versus topological sorts when evolving convex sets. Finally, Section 7.5 presents an algorithm which optimizes a single convex set of a DFG.

Problem Statement
Our optimization problem is as follows. Given a DAG D and a cost function cost : (any convex set of D) → R, find some convex set M of D that minimizes cost( M ).
This optimization problem is addressed by Algorithm 3 in Section 7.5.
Note that Algorithm 3 only attempts to evolve one convex set at a time, rather than a collection of convex sets as discussed in Section 3.1.
Algorithm 3 is presented primarily to demonstrate the existence of an evolutionary algorithm which is efficient on a per-iteration basis, and which always retains the possibility to cover the entire search space in a finite number of future iterations. This is in contrast to some genetic algorithms which may become trapped in local minima due to premature convergence [31], or whose designs a priori may preclude parts of the search space for certain inputs.
We make no assumption that, as presented, Algorithm 3 is likely to identify an approximately optimal solution within a usefully small number of iterations. Below we discuss several adaptations to Algorithm 3 which may be useful in pursuing such an algorithm.
These may be fruitful areas of future investigation.
Note that when proving that Algorithm 3 potentially covers the entire search space, for generality we make no assumption about the structure of the function cost. It follows then that in order to find a global minimum, Algorithm 3 must potentially visit every convex set in a DAG.
In contrast, the algorithm modifications described below are only effective when the cost function contains at least some structure to guide the search.
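For very small DAGs the problem statement above can be solved exhaustively, which gives a useful baseline against which a stochastic search can be checked. The following sketch is an illustration under stated assumptions, not part of this work's algorithms: the DAG is an adjacency dictionary, convexity is tested with the P /Ŝ criterion, and the names `convex_sets` and `brute_force_minimum`, the example graph, and the example cost function are hypothetical. Its running time is exponential in |V|.

```python
from itertools import combinations


def reach(adj, start):
    """Return every vertex reachable from `start` by a path of length >= 1."""
    seen = set()
    stack = [w for s in start for w in adj.get(s, ())]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return seen


def reverse(adj):
    """Return the adjacency map of the same DAG with every arc reversed."""
    rev = {}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, []).append(u)
    return rev


def is_convex(adj, xs):
    """A non-empty set X is convex iff (P(X) ∩ Ŝ(X)) ⊆ X."""
    xs = set(xs)
    return bool(xs) and (reach(reverse(adj), xs) & reach(adj, xs)) <= xs


def convex_sets(adj, vertices):
    """Yield every convex set of the DAG by testing all non-empty subsets."""
    ordered = sorted(vertices)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            if is_convex(adj, combo):
                yield set(combo)


def brute_force_minimum(adj, vertices, cost):
    """Return some convex set M that minimizes cost(M)."""
    return min(convex_sets(adj, vertices), key=cost)


# Hypothetical DAG and cost: distance from the non-convex target {a, c}.
dag = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "c": ["e"]}
vertices = {"a", "b", "c", "d", "e"}
cost = lambda xs: len(set(xs) ^ {"a", "c"})
best = brute_force_minimum(dag, vertices, cost)
print(sorted(best), cost(best))
```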

Simulated Annealing
Simulated annealing (SA) [32,33] is an evolutionary optimization method in which initial iterations permit large changes to the trial solutions, but as the algorithm continues, successive iterations permit progressively smaller changes to the trial solutions.
SA requires a distance function δ : (X , Y) → R to be defined over members of the search space. SA further requires a random perturbation operator π which, given a trial solution X and an upper-limit ∆, returns a new trial solution Y such that δ( X , Y ) ≤ ∆.
The basic convexity algorithms presented in this work may be well-suited to simulated annealing-type optimizers. If we define δ to be the number of vertices added or deleted from an existing convex set, then π may be easily defined using the algorithms grow convex set PS, grow convex set TS, shrink convex set PS, and shrink convex set TS (Appendix C).
Another potentially useful approach is to let δ be the number of single-vertex changes applied to some trial solution X , and for π to simply be the random application of up to ∆ single-vertex changes. Algorithm 3 implicitly provides an example of such a perturbation operator, when ∆ iterations of its main loop are executed.

Initialization by Seed Set vs. Order
In Appendix C we present two separate variations of algorithms for initializing a convex set.
init convex set by seeds PS and init convex set by seeds TS take a potentially non-convex seed set of vertices and return some convex set which is a superset of the seed set. In contrast, init convex set by order PS and init convex set by order TS return a convex set having some specified number of vertices, without regard to which particular vertices constitute that set.
There may exist applications in which a seed set carries special significance, such that the optimization algorithm benefits from beginning its search in the neighborhood of the seed set(s). For such applications, it may be preferable to use init convex set by seeds PS rather than init convex set by order TS.
init convex set by seeds PS is guaranteed to return the smallest superset of the seed set that is also a convex set, whereas init convex set by order TS makes no provision for limiting the size of the resulting set. Therefore init convex set by seeds PS may initialize the search algorithm with a trial solution that is closer to an acceptable solution, leading to faster convergence than would be obtained if using init convex set by order TS.
However, if any of the P /Ŝ-based basic convexity algorithms is employed by the optimization algorithm, it must first compute the transitive closure of the dataflow graph, which for a DAG of order |V| requires time O(|V|^2.376) (see Section 2.7). Algorithm 3 uses init convex set by order TS and so limits the running time of its initialization phase to just O(|V|^2).

P /Ŝ vs. Topological Sort
Appendix C presents two essentially interchangeable collections of basic convexity algorithms, one based on predecessor and successor sets, and one based on topological sorts.
Algorithm 3 could be trivially modified to use either group of basic convexity algorithms, or any combination thereof.
In general the basic convexity algorithms in Appendix C which are based upon P and Ŝ have longer asymptotic running times than those using topological sort.
Note that this work assumes the existence of a random algorithm which has a positive probability of producing each topological sort of a given DAG (Appendix A.3).

Convex Set Stochastic Optimization Algorithm
In this section we present and discuss a stochastic optimization algorithm, Algorithm 3, whose domain is all convex sets of a given DAG. This section is structured as follows.
The pseudocode for Algorithm 3 is presented on page 73. Subsection 7.5.1 provides a brief overview of the algorithm's approach.
Subsection 7.5.2 demonstrates that each individual iteration of Algorithm 3 is fairly efficient (time O(|V|^2)), and Subsection 7.5.3 presents a proof that, when permitted to run indefinitely, the probability of Algorithm 3 searching all convex sets of a DAG approaches 1.

Overview of Algorithm 3
Algorithm 3 operates as follows.
A convex set, X trial , is initialized to some arbitrary convex set (line 2). Although line 2 selects a random convex set of order 1, any convex set of D would be acceptable as an initial value of X trial .
Each iteration of the loop evaluates the cost function score of the current value of X trial (line 4), and if the score is the minimum value so far discovered, both the score and the convex set which yielded it are recorded (lines 5-10).
Line 11 calls is done to provide the user an opportunity to terminate the search for any reason, such as time constraints. As stated in the algorithm's preconditions, we permit is done to retain state between calls so that it may detect convergence over multiple iterations of the optimization algorithm.
Lines 12-29 choose and apply any valid single-vertex change to the current value of X trial .

Running Time of Algorithm 3
We easily see that the algorithm lines other than 4 and 11 have a maximum running time of O(|V|^2). We make no assumption about the running times of the functions cost and is done. Therefore the overall running time of each iteration of this algorithm is max( O(|V|^2), (running time of cost), (running time of is done) ).

Algorithm 3 Stochastically Optimize over all Convex Sets of a DAG
Function stochastic search TO : (D, cost, is done) → X best
Require: is done : (vertex set, R) → Boolean is a subroutine which may retain internal state between invocations, and whose internal state is assumed to be properly initialized prior to the execution of this algorithm.
Ensure: (E1) Of all sets evaluated by cost before is done returns true, X best is one of the sets for which cost returned the lowest value.
This search algorithm is efficient insofar as, aside from the unconstrained running times of cost and is done, its per-iteration running time is polynomial with respect to the order of the DAG. The relevance of such efficiency is clearly dictated by the running times of cost and is done in any particular application of this algorithm.
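The overall shape of such a search loop can be sketched in Python. This is a simplified illustration and not Algorithm 3 as specified: it replaces the topological-sort-based grow and shrink routines with a brute-force P /Ŝ convexity test, it chooses uniformly among all valid single-vertex changes, and it assumes that is done receives the best set and best cost found so far. The names `single_vertex_moves`, `stochastic_search_sketch`, and `make_is_done`, the example graph, and the example cost function are hypothetical.

```python
import random


def reach(adj, start):
    """Return every vertex reachable from `start` by a path of length >= 1."""
    seen = set()
    stack = [w for s in start for w in adj.get(s, ())]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj.get(u, ()))
    return seen


def reverse(adj):
    """Return the adjacency map of the same DAG with every arc reversed."""
    rev = {}
    for u, outs in adj.items():
        for w in outs:
            rev.setdefault(w, []).append(u)
    return rev


def is_convex(adj, xs):
    """A non-empty set X is convex iff (P(X) ∩ Ŝ(X)) ⊆ X."""
    xs = set(xs)
    return bool(xs) and (reach(reverse(adj), xs) & reach(adj, xs)) <= xs


def single_vertex_moves(adj, vertices, x):
    """All convex sets reachable from x by one delete, add, or swap."""
    moves = []
    for v in sorted(vertices):
        if v in x:
            if len(x) > 1 and is_convex(adj, x - {v}):
                moves.append(x - {v})   # delete one vertex
        else:
            if is_convex(adj, x | {v}):
                moves.append(x | {v})   # add one vertex
            if len(x) == 1:
                moves.append({v})       # swap between one-vertex sets
    return moves


def stochastic_search_sketch(adj, vertices, cost, is_done, rng):
    """Random walk over convex sets, remembering the cheapest set seen."""
    x_trial = {rng.choice(sorted(vertices))}
    x_best, cost_best = None, float("inf")
    while True:
        score = cost(x_trial)
        if score < cost_best:
            x_best, cost_best = set(x_trial), score
        if is_done(x_best, cost_best):
            return x_best
        moves = single_vertex_moves(adj, vertices, x_trial)
        if moves:
            x_trial = rng.choice(moves)


def make_is_done(max_calls):
    """Stateful stopping rule: stop at cost 0 or after max_calls calls."""
    state = {"calls": 0}

    def is_done(x_best, cost_best):
        state["calls"] += 1
        return cost_best == 0 or state["calls"] >= max_calls

    return is_done


# Hypothetical DAG and cost: distance from the convex target {b, c}.
dag = {"a": ["b", "d"], "b": ["c"], "d": ["c"], "c": ["e"]}
vertices = {"a", "b", "c", "d", "e"}
target = {"b", "c"}
best = stochastic_search_sketch(
    dag, vertices, lambda xs: len(xs ^ target), make_is_done(5000),
    random.Random(0))
print(sorted(best))
```

Because the walk can move between any two convex sets through single-vertex changes, a long enough run on this small example reaches the target set; the stopping rule here is only one possible choice.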

Complete Search Coverage of Algorithm 3
In this subsection we show that, given enough iterations, the probability that stochastic search TO discovers some vertex set M which minimizes cost approaches 1.
Then there is a finite, positive probability that the algorithm sets X trial = M during loop iteration s, with s ∈ [1, (i + 2 × |V| − 1)].
Informally, Lemma 7.5.1 states that regardless of where Algorithm 3 is in the search process, as encoded by its current values for X trial , X best , first iter and cost best, there is a finite, positive probability that during its next (2 × |V| − 1) iterations the algorithm discovers some global minimum M. However, because the function is done could terminate the algorithm after any loop iteration, we require the assumption that is done doesn't terminate the algorithm until the next (2 × |V| − 1) iterations have completed.
Proof. Let T = evolve between convex sets ( D, X (i) trial , M ). We proceed by showing that there is a finite, positive probability that the next |T | − 1 iterations of the algorithm's loop evolve X trial into M in precisely the sequence indicated by T . That is, for all j ∈ [1, |T |], the algorithm sets X (i+j−1) trial = T j .
We proceed inductively. Assuming that X (i+j−1) trial = T j , we demonstrate that there is a finite, positive probability that the next iteration of the algorithm's loop (the (i + j) th iteration overall) evolves the convex set X (i+j−1) trial into T j + 1 .
For our inductive base case, we must show that when j = 1, there is a finite, positive probability that X (i+j−1) trial = T j . This is trivially established by the definition of evolve between convex sets, which deterministically ensures that T 1 = X (i) trial .
Our inductive step covers 1 < j < |T |, that is, the (i + 1) th through the (i + |T | − 1) th loop iterations. By our inductive assumption, prior to each of these loop iterations we have X (i+j−1) trial = T j . Our goal is to demonstrate that each of these loop iterations has a finite, positive probability of setting X (i+j) trial = T j + 1 .
We consider three cases reflecting the different relationships that may exist between T j and T j + 1 . We show that for each of these cases, the next (i.e., the (i+j −1) th overall) loop iteration has a finite, positive probability of setting X (i+j) trial = T j + 1 .

Case 1: Delete one vertex:
In Case 1, there is some vertex u ∈ T j , such that T j + 1 = T j \ {u}.
By the definition of evolve between convex sets, every element of T is a non-empty set. Therefore |T j + 1 | ≥ 1. Because T j + 1 is the result of deleting a vertex from

Case 2: Add one vertex:
In Case 2, there is some vertex u / ∈ T j , such that T j + 1 = T j ∪ {u}.
The proof for Case 2 is similar in structure to that of Case 1, and for brevity we provide only a sketch.
|T j + 1 | ≤ |V|, and so |T j | < |V|. Therefore there is a finite, positive probability that line 23 of the algorithm executes during this loop iteration.
Similarly to shrink convex set TS, there is a finite, positive probability that grow convex set TS produces any one of the convex sets of D which is a superset of T j and has order |T j | + 1. T j + 1 is one such set, and so there is a finite, positive probability that line 23 of the algorithm sets X (i+j) trial = T j + 1 .

Case 3: Swap vertex:
From the definition of evolve between convex sets, Case 3 only arises when we have |T j + 1 | = |T j | = 1.
Both X trial and M are constrained to be improper subsets of V, giving us |X trial | ≤ |V| and |M| ≤ |V|. From this we have |T | ≤ 2 × |V|. Note however that immediately prior to the i th iteration of the algorithm's loop, we already have X trial = T 1 . Therefore we need only |T | − 1, or at most 2 × |V| − 1, loop iterations to obtain X trial = T |T | using the evolutionary sequence indicated by T .
Lemma 7.5.2. Let D = (V, A) be a DAG and let cost be a cost function as required by Algorithm 3. For any particular activation of Algorithm 3, let P i be the probability that Algorithm 3 finds a global minimum within the first i iterations, with P i < 1. Let P j be the probability that Algorithm 3 finds a global minimum within the first i + (2 × |V| − 1) loop iterations. Then P i < P j ≤ 1.
Informally, Lemma 7.5.2 states that the more iterations executed by the loop in Algorithm 3, the greater the cumulative probability that a global minimum is found and returned.
Note that the theorem does not claim that every single iteration following the i th increases the probability of finding a global minimum. We avoid such a claim because there is no assurance that the convex set indicated by X trial just prior to the i th iteration of the algorithm's loop can be evolved into some global minimum of cost by adding, deleting, or swapping precisely one vertex. However, Lemma 7.5.1 establishes that regardless of the graph topology of D and regardless of the state of the algorithm just prior to the i th loop iteration, there is a finite, positive probability of reaching a global minimum within the subsequent 2 × |V| − 1 iterations.
Proof. From Lemma 7.5.1 we have that there is a finite, positive probability that, during iterations i to i + (2 × |V| − 1) of the algorithm, X trial is set to some global minimum of cost. Let P ij be the probability of some particular global minimum M being found during these 2 × |V| − 1 iterations.
Then the cumulative probability P j of the algorithm finding M after i + (2 × |V| − 1) loop iterations have been performed is: P j = P i + (1 − P i ) × P ij . From Lemma 7.5.1 we have that P ij > 0, and because P i < 1, we have P j > P i .
Proof. Lemma 7.5.1 shows that for every 2 × |V| − 1 consecutive iterations of Algorithm 3, there is a finite, non-zero probability that the algorithm discovers some global minimum.
Without loss of generality, let p be the lowest-valued probability with which Algorithm 3 discovers a global minimum of cost during any 2 × |V| − 1 consecutive iterations.
Then the probability P (i) of Algorithm 3 discovering some particular global minimum M during its first i × (2 × |V| − 1) loop iterations is given by:
Equation (10) gives our basic formula. For any j, the j th term in the series provides the probability that loop iterations 1 . . . (j − 1) did not discover a global minimum but the j th loop iteration does.
Equation (11) restructures the series to clarify that it's a geometric series, and Equation (12) simplifies the upper bound of the series so that a standard formula may be applied later in Equation (16).
We now consider the limit of Equation (12) as i → ∞: Equation (13) simplifies by factoring p out of the limit expression.
Equation (14) separates terms in the limit expression to obtain two separate limits, and Equation (15) solves the second limit.
Equation (16) applies the standard formula for the limit of an infinite series of a geometric sequence, and Equation (17) simplifies, completing our proof.
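The kind of limit argument described above can be illustrated with a short worked derivation. This is a hedged reconstruction rather than a reproduction of Equations (10) to (17): it assumes that each block of 2 × |V| − 1 consecutive iterations succeeds independently with probability exactly p, where 0 < p ≤ 1, so that the result is a lower bound when p is the smallest such probability. The summation index k is introduced here for illustration.

```latex
\begin{align*}
P(i) &= \sum_{j=1}^{i} (1-p)^{j-1}\, p
      = p \sum_{k=0}^{i-1} (1-p)^{k}
      = p \cdot \frac{1-(1-p)^{i}}{1-(1-p)}
      = 1-(1-p)^{i}, \\
\lim_{i \to \infty} P(i) &= 1 - \lim_{i \to \infty} (1-p)^{i} = 1
      \qquad \text{since } 0 \le 1-p < 1 .
\end{align*}
```

Under these assumptions the probability of having found a global minimum grows geometrically toward 1 with the number of blocks of iterations performed.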

Proof of Concept
The optimization algorithms presented in this thesis were implemented to validate the correctness of the theoretical results, and to clarify the potential for this approach to optimizing the performance of task-parallel codes. The methodology and results are discussed below.
A model program, represented as an abstract DFG, was developed to generate and then sort a collection of pseudo-random floating-point numbers (see Section 8.1). Several alternative approaches were used to optimize the DFG's task set:
• A program named evolver1 , an implementation of the full-task-set optimization algorithm outlined in Algorithm 2 (page 37). See Section 8.2 for a more complete description of evolver1 .
• A task set was selected based on the author's experience in developing multithreaded software (see Subsection 8.3.3). This is to provide a baseline against which the learning algorithm's results are compared. The manual creation of a task set was tractable because the program being optimized was represented with a DFG having only 11 vertices (see Section 8.3).
• Two task sets representing extremes of the search space were manually chosen: the empty task set, and the task set in which each of the DFG's 11 vertices is assigned to its own separate task. Their performance was examined to confirm that the optimization problem treated by this thesis was not simply solved by trivial selection of one of these extrema.

Model Problem : Sorting an Array of Numbers
The model problem chosen is a program which sorts an array of floating-point numbers, as follows.
A DFG representation of this program is shown in Figure 4. For this work, we generate the sort program's DFG in terms of a single pre-runtime parameter, l. l gives the number of levels of mergesort vertices in the DFG. For a given value of l, this work produces a DFG with 2^l randomize vertices, 2^l quicksort vertices, and 2^l − 1 mergesort vertices.
The DFG is represented as a simple text file, and is produced by a program named generate-DFG .
Note that the value of l is only loosely related to the number of data to be sorted by the program at runtime. The number of data to be sorted is specified as a runtime parameter, whereas l is a pre-runtime parameter. Our only assumption is that the specified number of data to sort is an integer multiple of 2^l , to simplify the runtime logic governing the number of data produced by each randomize vertex.
The vertex types behave as follows.
• Each randomize vertex produces an array of pseudo-randomly generated floating-point numbers. The size of the array is provided as a runtime parameter.
• Each quicksort performs an in-place sorting of the array provided by its in-neighbor. The vertex passes a reference to that same array on to the vertex's out-neighbor.
• Each mergesort allocates a new array capable of storing the combined content of the vertex's two input arrays. It then uses a simple mergesort algorithm to merge the two (pre-sorted) input arrays, and provides a reference to that array to its out-neighbor.
This model problem was chosen for several reasons. Its simple, recursive structure per-
With this approach, several parameters govern the static and runtime qualities of the sorting program:
• l. This is the depth of the mergesort portion of the DFG, as described above.
• Number of data to be sorted. This is the number of floating-point numbers to be randomly generated and sorted.
• Task set. The collection of convex sets which forms the basis for Thread Building Block tasks in the resulting executable program.

Task-set Optimization Algorithm
evolver1 implements a simple evolutionary algorithm whose structure is consistent with that of Algorithm 2 (page 37).
Note that Algorithm 2 is parameterized by which algorithm it uses to optimize a single convex set (the Z parameter). For this work, Z = Algorithm 3 (see page 73). Additional details of evolver1 are as follows.

Initialization
The initial generation is created by pseudo-randomly generating a user-specified number of task sets, population size. A user-specified probability of inclusion parameter influences the content of each task in each task set of the initial population, as indicated in Algorithm 4.

Reproduction and Mutation
Each subsequent generation of task sets is formed as follows. population size copies of the fastest-running task set from the previous generation are created. An additional population size / 3 copies of the fastest-running task set from any previous generation are created. These two groups of task sets are the basis for the subsequent generation. All of these task sets are then mutated as follows.
As an invocation of evolver1 iterates from its first to its last evolved generation, a "temperature" value decreases linearly from 1.0 to 0.0. The temperature indicates the fractional number of tasks within each task set which are to be mutated during that iteration of the algorithm. This mechanism is provided to allow broad search of the entire solution space in early iterations of the algorithm, followed by more precise local refinement in the later stages so that locally optimum solutions are more likely to be discovered.
Any task which is to be mutated is mutated using the convex-set-mutation approach described in Section 7.5.
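The reproduction sizes and the temperature schedule described above can be sketched as follows. This is a minimal illustration, not evolver1's actual code: the function names are hypothetical, and both the integer division and the rounding of the number of tasks to mutate are assumptions.

```python
def next_generation_size(population_size):
    """Copies of the previous generation's best task set, plus
    population_size / 3 copies of the best task set seen so far
    (integer division is an assumption of this sketch)."""
    return population_size + population_size // 3


def temperature(generation, num_generations):
    """Linear schedule: 1.0 at the first evolved generation (index 0),
    0.0 at the last one (index num_generations - 1)."""
    if num_generations < 2:
        return 0.0
    return 1.0 - generation / (num_generations - 1)


def tasks_to_mutate(num_tasks, temp):
    """Number of tasks in a task set to mutate at the given temperature.
    Rounding to the nearest integer is an assumption of this sketch."""
    return round(temp * num_tasks)


print(next_generation_size(100))
print(temperature(0, 50), temperature(49, 50))
```

With population size = 100, the first function gives the effective population size of 133 reported for all generations after the first.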

Termination
The proof-of-concept evolutionary algorithm terminates after running for a user-specified number of iterations. A more sophisticated evolutionary algorithm might use convergence analysis, rather than a fixed number of iterations, to decide after which iteration to terminate execution.

Methodology
For this work, the computer used was a laptop computer with an Intel Core i7-Q820
For each parallelization of the program, the resulting executable program was run three times to obtain the average running time. During these executions, the computer was running in the typical fashion, with one user logged into a desktop environment, and several desktop applications open but untouched by the user.

Problem Details
This experiment used a DFG with two mergesort levels (l = 2), as shown in Figure 4.
This was based on several considerations, most notably that early prototyping showed no significant performance benefit to the optimized program by using values of l greater than 2.
All runs of the sort program generated and sorted 10 million floating-point numbers.
This number was chosen to minimize the impact of other system activity on the sort program's measured running time, while keeping the sort program's running times brief enough to allow research to proceed at a reasonable pace.

Evolutionary Algorithm Parameters
Based on common practices for the evaluation of evolutionary algorithms, each run of this work's evolutionary algorithm ran for 50 generations. Each generation had population size = 100. As described in Subsection 8.2.2, this led to an effective population size of 133 in all generations after the first generation.

Manually Chosen Task Set
The manually designed task set for the DFG is visually depicted in Figure 5f (page 89).
The following reasoning led to the task set in Subsection 8.3.3. The sort program was to run with 1 × 10^7 input data, and so each of the randomize vertices would likely require a quantity of CPU time which significantly exceeded the time required to launch a parallel task. Therefore it seemed profitable to run all four randomize vertices concurrently. This is achieved by having three of those vertices run in separate child tasks, and the fourth randomize vertex run in the parent thread.
By similar reasoning, all four of the quicksort vertices appeared suitable for running in parallel on separate CPU cores. Furthermore, each quicksort could be run in the same task as the randomize vertex which supplied its input data. This grouping appeared sensible because it might reduce the overhead of launching the quicksort vertices in new tasks, and there appeared to be no way in which it could cause additional delay in the start of execution of a quicksort vertex.
Each of the two non-DFG-sink mergesort vertices merges together two arrays of 2.5 × 10^6 floating-point numbers. Intuitively, this large CPU burden for each mergesort vertex appears to make a strong case for running the mergesort vertices in parallel if at all possible. For this reason the set {5} was specified to run in a child task, and vertex 6 was left to run separately in the parent thread.
Table 1 (caption): … Figure 4). The program generated for each of the listed task sets was run three times, and each run generated and sorted 1 × 10^7 floating-point numbers. The non-degenerate task sets from this table are graphically depicted in Figure 5.

Proof-of-Concept Results
Each run of the stochastic optimization algorithms took approximately 12 hours. Table 1 presents the running times obtained from each of the task sets considered by this proof of concept. Six of the task sets are shown in Figure 5. (The two degenerate task sets are not shown.) From Table 1 we see that all of the stochastically optimized task sets outperformed the manually designed one. The fastest stochastically optimized task set (Run 1) completed its work 32.2% faster than did the manually developed task set (1.384 seconds ÷ 1.047 seconds ≈ 1.322). The best-performing stochastic optimization runs (Run 1 and Run 3) produced remarkably similar task sets. The worse-performing stochastic optimization runs (Runs 2, 4, and 5) also produced task sets that were significantly similar to each other, although with more apparent variation.
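The 32.2% figure can be reproduced from the two running times quoted above. The sketch below assumes that 1.384 seconds is the manually designed task set's average running time and 1.047 seconds is that of Run 1; the variable names are illustrative only.

```python
# Average running times in seconds, as quoted in the text (assumed mapping).
manual_seconds = 1.384
evolved_seconds = 1.047

# Speed ratio: how many times faster the evolved task set runs.
ratio = manual_seconds / evolved_seconds
percent_faster = (ratio - 1.0) * 100.0

# The same comparison expressed as a reduction in running time.
percent_time_saved = (1.0 - evolved_seconds / manual_seconds) * 100.0

print(f"{percent_faster:.1f}% faster, {percent_time_saved:.1f}% less time")
```

Note that "32.2% faster" is a ratio of speeds; expressed as a reduction in running time, the same measurements correspond to a smaller percentage.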

Limitations of Proof-of-Concept
This proof-of-concept is provided to help validate the algorithms developed in this thesis.
Care must be taken to not draw overly broad conclusions from this exercise.
In particular, this exercise has not established that either the evolutionary algorithm or the human-crafted parallelization offers the best possible performance for that kind of endeavor. A programmer with more skill or luck than this author could perhaps produce a parallelization of the test program which outperformed any that the evolutionary algorithm is likely to discover. Conversely, a better-tuned evolutionary algorithm could perhaps outperform a human in nearly all circumstances.
Another limitation of this exercise is that the selected DFG was quite small, with only 11 vertices, and each of those vertices is computationally intensive. Given the good efficiency of the Thread Building Blocks library, this means that even a full parallelization of this DFG, in which each vertex is assigned to a separate task, will have very low book-keeping overhead for the tasks. Therefore this proof-of-concept does little to explore how this thesis' algorithms would perform on a very large DFG with shorter-running vertices.
Figure 6: Performance Comparison of Five Auto-parallelization Runs. (a) Comparative absolute performances of the five runs of the stochastic optimization algorithm. (b) Relative pace, in generations, at which each run of the stochastic optimization algorithm discovered its final result.
This search algorithm initializes, grows, and shrinks convex sets using only algorithms based on topological sort, not the alternative algorithms based on predecessor and successor sets. This limitation was deemed acceptable because the primary goal of this proof-of-concept was to examine both the quality of the solutions produced by the task set optimization algorithm, and the speed with which it obtained those solutions. Any difference in performance between these two groups of support algorithms was expected to be negligible for this purpose, in particular because the target program's DFG contained only 11 vertices.

Discussion
The machine-learning algorithm developed in this work outperformed what appears to be a sensibly chosen manual tuning of the same problem by 32%. This result was obtained from two of the five stochastic optimization runs, which collectively required approximately 60 hours of computer time.
This appears to validate several assumptions of this present research. The first is that on modern computing hardware, even a seasoned programmer's expectation about the ideal parallelization of a simple program isn't necessarily accurate.
The second validated assumption is that, at least in this one test case, the evolutionary algorithm can uncover a good parallelization of the subject program in a relatively small amount of time.
The solution space over which this proof of concept's evolutionary algorithm ranges is all task sets containing six or fewer convex sets. However, the deconfliction process (see

CHAPTER 9
Conclusions and Future Work

Central Conjecture Proven
This thesis proves the central conjecture that a single convex set can be evolved in a per-iteration efficient manner, such that as the number of iterations approaches infinity, the probability of all convex sets being explored approaches one.
The primary motivation for this work was to advance the state of the art in machine-learned task parallelism, so that parallel programs can run faster and be developed with less human labor. The proof-of-concept developed in this work shows that for the one subject program studied, the resulting parallel program was substantially faster (32%) than the version produced by a human programmer. This suggests that automatic task parallelization may continue to be a fruitful area for future research.

Alternative Search Spaces
Under the execution model assumptions made in this present work, the true optimization space is all valid task sets, not merely all valid convex sets. Either of two definitions of task-set validity is appropriate, depending on the execution model assumed. In the weak definition of task-set validity, a task set is valid merely if each contained task is a convex set, and no two tasks contain the same vertex. A more demanding definition of task-set validity, based on this work's assumed execution models, is that it must also be possible to replace each task in the task set with a pair of SPAWN and WAIT vertices without inducing cycles in the parent DFG.
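The weak definition of task-set validity can be sketched directly from the definition of a convex set. The following is an illustrative check only: the DAG is assumed to be given as a dictionary mapping each vertex to its set of out-neighbors, the function names are hypothetical, and the stronger SPAWN/WAIT-based definition is not covered.

```python
def reachable_from(dag, sources):
    """Vertices reachable from `sources` by a path of length >= 1."""
    seen, stack = set(), list(sources)
    while stack:
        u = stack.pop()
        for v in dag.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_convex(dag, X):
    """True if no path leaves X, visits a vertex outside X, and re-enters X."""
    X = set(X)
    if not X:
        return False  # convex sets are non-empty by definition
    outside = reachable_from(dag, X) - X
    return not (reachable_from(dag, outside) & X)


def is_weakly_valid(dag, tasks):
    """Weak validity: every task is convex and no vertex is shared."""
    seen = set()
    for task in tasks:
        task = set(task)
        if not is_convex(dag, task) or (seen & task):
            return False
        seen |= task
    return True


# A small diamond-shaped DAG: 1 -> 2 -> 4 and 1 -> 3 -> 4.
diamond = {1: {2, 3}, 2: {4}, 3: {4}, 4: set()}
print(is_weakly_valid(diamond, [{1, 2}, {3}]))
print(is_weakly_valid(diamond, [{1, 4}]))
```

In the diamond, {1, 4} is not convex because the path 1 → 2 → 4 leaves the set and re-enters it, so any task set containing it fails even the weak definition.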
While this thesis shows that a single convex set can be efficiently evolved to cover the space of all convex sets, it does not establish the existence of a search algorithm which covers precisely the space of all valid task sets, using either of the stated definitions of validity. The development of algorithms covering either of those valid task-set spaces, in a manner that is per-iteration efficient and which will cover the entire search space given enough iterations, may lead to better-performing machine learning systems than the one demonstrated in this work's proof-of-concept. To this author's knowledge, no similar algorithm has been discovered for enumerating either of the valid task-set search spaces described above. The development of such algorithms may be a useful area of future endeavor.

APPENDIX A Algorithm Assumptions and Primitive Operations
In this appendix we provide asymptotic running times, and when appropriate the randomness properties, of primitive operations and algorithms used within this work. The operations and algorithms described below are not the ones of primary interest in this work. They are called by this work's more interesting algorithms, and are presented here to facilitate the analysis of the running times and randomness properties of the algorithms which call them.
Some primitive operations on sequences backed by linked lists have running times that are proportional to the current size of the sequence, as opposed to the sequence's maximum possible size. However, for the sake of simplified worst-case running-time analyses, for each primitive operation we indicate a worst-case asymptotic running time, typically in terms of |V|, the order of the overall DAG being considered.
The remainder of this appendix is structured as follows.

A.1 Primitive Random Operations
Primitive random operations particular to some primitive data structure are presented in the appropriate subsection below.
A critical property of all primitive random operations used in this work is that each valid outcome of the operation has a finite, positive probability each time the operation is performed. From this foundation we establish certain randomness properties in the algorithms presented in Appendix C and Section 7.5.
We present here our one primitive random operation on integers. All random operations on primitive data structures are described in the appropriate subsection below.

A.2 Primitive Operations on Literal Sets
A literal set is a set of zero-or-more pseudocode literal values. In particular, the set {GROW, SHRINK, SWAP} in Algorithm 3 is a literal set.
We assume each instance of this kind of set to be a random-access array of Boolean values, having the following operations. This modifies S to be the set S ∪ {t}. This operation requires time O(1).

A.3 Primitive Operations on a DAG
We assume for this work that the order (i.e., number of vertices) of the DAG is known a priori, and that all data structures are either pre-allocated or require a negligible amount of running time to allocate.
Let D = (V, A) be the DAG. Each vertex in the DAG is named by a distinct number in the range [1, |V|]. Unless stated otherwise, any mention of D, V or A in this appendix refers to the DAG over which the primitive operations work.
This returns TC(D), the transitive closure of D, as described in Section 2.7.

A.4 Primitive Operations on a Set of Vertices
A vertex set is represented as a random-access vector of O(|V|) Boolean values, with the following operations.

A.5 Primitive Operations on a Set of Arcs
For simplicity we assume that a set of arcs has the same representation as a DAG: A set of arcs potentially connecting any two vertices in some vertex set V is represented as a |V| × |V| array of Boolean values.

A.6 Primitive Operations on a Sequence of Vertices
We assume that each vertex in V is unambiguously identified by an integer, and that a vertex sequence is represented as a linked list of these integers. We further assume that allocation of storage for linked list elements requires negligible running time.
No vertex sequence manipulated by this work's algorithms contains multiple occurrences of the same vertex. (This ultimately stems from this work's focus on acyclic graphs.) Therefore, for the sake of asymptotic running-time analyses, we assume that the length of a given vertex sequence is O(|V|).
The primitive operations on this type are as follows.

A.7 Primitive Operations on a Sequence of Vertex Sets
We assume that a sequence of vertex sets is represented as a linked list of references to existing vertex set objects, and that allocating nodes in the linked list requires negligible time. The primitive operations on this type are as follows. A sequence of vertex sets is implicitly empty until modified.
vss reverse : [in] T : sequence of vertex sets, → sequence of vertex sets
This returns a copy of T in which the order of the elements has been reversed. This operation requires time O(|T|).

A.8 Primitive Operations on the Transitive Closure of a DAG
We assume the following representation for the transitive closure of a DAG. Suppose The transitive closure has two components: a vertex set and a vertex adjacency matrix.
D and TC(D) have the same vertex set, and we assume that TC(D) merely holds a reference to the vertex set of D.
Both A_TC and A are |V| × |V| adjacency matrices, in which the indices along both axes are in correspondence with the elements of V. The difference between A and A_TC is the meaning of a given cell in the matrix. A(i, j) = true indicates that the arc (v_i, v_j) ∈ A.
A_TC(i, j) = true indicates that a path of the form v_i . . . v_j exists in D.
We first provide the operation for computing the transitive closure of a DAG. We then provide operations which use the DAG's transitive closure to compute P̂ and Ŝ (see Chapter 5).
The predecessor set (P̂) and successor set (Ŝ) for a vertex or vertex set can be computed in various ways. The approach used in this work's algorithms is to pre-compute the predecessor and successor set for each individual vertex in the DAG. Note that this is directly given by a DAG's transitive closure, which is given as a primitive operation in Appendix A.3.
Once we have P̂ and Ŝ for each individual vertex in the DAG, computing P̂ and Ŝ for a vertex set X is simply a matter of obtaining and merging the predecessor and successor sets, respectively, of each vertex in X. These operations are as follows.
The operation returns the vertices P̂(X).
Note however that 1 ≤ |X| ≤ |V|. Therefore the running time of this algorithm in terms of |V| is O(|V|^2).
The operation returns the vertices Ŝ(X).
Note however that 1 ≤ |X| ≤ |V|. Therefore the running time of this algorithm in terms of |V| is O(|V|^2).
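The relationship between the transitive closure and the predecessor and successor sets can be illustrated with a minimal sketch. The assumptions here are loud ones: vertices are numbered from 0 rather than 1, the closure is computed with Warshall's algorithm (this work's own primitive may differ), a path is taken to have at least one arc, and all function names are hypothetical.

```python
def transitive_closure(n, arcs):
    """Return A_TC as an n x n Boolean matrix (Warshall's algorithm).

    tc[i][j] is True when a path with at least one arc leads from i to j.
    """
    tc = [[False] * n for _ in range(n)]
    for u, v in arcs:
        tc[u][v] = True
    for k in range(n):
        for i in range(n):
            if tc[i][k]:
                for j in range(n):
                    if tc[k][j]:
                        tc[i][j] = True
    return tc


def pred_set(tc, X):
    """Merge the per-vertex predecessor sets of every vertex in X."""
    return {u for u in range(len(tc)) if any(tc[u][x] for x in X)}


def succ_set(tc, X):
    """Merge the per-vertex successor sets of every vertex in X."""
    return {v for v in range(len(tc)) if any(tc[x][v] for x in X)}


# Chain 0 -> 1 -> 2.
tc = transitive_closure(3, [(0, 1), (1, 2)])
print(pred_set(tc, {2}), succ_set(tc, {0}))
```

Each merge inspects at most |X| · |V| matrix cells, which is consistent with the O(|V|^2) bound stated above.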

A.9 Other Assumptions
Below are the remaining assumptions made in our analyses of this work's algorithms.
We assume that assignments (denoted x ← y) have negligible running times, and are treated as having time O(1). For simple objects such as integers, this is clearly an appropriate assumption for modern computers. For complex objects such as vertex sets, we assume that the assignment is the binding of an identifier to a pre-existing object, and not the creation of a new object or a new copy of the object.
Simple scalar operations on numbers (addition, comparison, etc.) or Boolean values (negation, conjunction, etc.) are assumed to have time O(1).

APPENDIX B Utility Algorithms
In this appendix we present algorithms which are not the main focus of this work, but which are called by the more interesting algorithms elsewhere in the work.
The algorithms presented here differ from the operations in Appendix A in that their function, asymptotic running time, and/or randomness properties merit more explanation than do the operations presented in Appendix A.
We present below an algorithm for computing the contraction of a digraph (see Section 2.5). This algorithm is a trivial implementation of the definition given in Section 2.5, and we consider its correctness to be easily verified by inspection. The algorithm's asymptotic running time is as follows.
The loop spanning lines 2-6 may have up to |V| iterations, and each iteration of the loop body is O(1). Therefore the running time of lines 1-6 is clearly O(|V|).

Algorithm 5 Contract a Digraph
Function contract :
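The contraction operation can be sketched as follows, based on the description given in Appendix B.4: arcs entering X are redirected to the new vertex x′, arcs leaving X are redirected to originate at x′, and an arc with both ends in X becomes the self-loop (x′, x′). This sketch uses an arc-set representation and hypothetical names, so its loop structure and running time need not match Algorithm 5.

```python
def contract(vertices, arcs, X, new_vertex):
    """Contract the vertex set X of a digraph into the single vertex
    `new_vertex` (illustrative sketch; arcs are (tail, head) pairs)."""
    X = set(X)
    new_vertices = (set(vertices) - X) | {new_vertex}
    new_arcs = set()
    for u, v in arcs:
        # Replace any endpoint lying in X by the new vertex.
        tail = new_vertex if u in X else u
        head = new_vertex if v in X else v
        new_arcs.add((tail, head))
    return new_vertices, new_arcs


V = {1, 2, 3, 4}
A = {(1, 2), (2, 3), (3, 4), (1, 4)}
cv, ca = contract(V, A, {2, 3}, "x'")
print(sorted(map(str, cv)), sorted(map(str, ca)))
```

The arc (2, 3) lies entirely inside X, so the contracted digraph contains the self-loop ("x'", "x'"), which a caller needing a DAG must remove.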

B.2 Inducing a Subgraph
Algorithm 6 computes an induced subdigraph, as described in Section 2.6.

Algorithm 6 Induce Subgraph
Function induce subdigraph : (D, X) → D<X>
Require:
This algorithm is a trivial implementation of the definition given in Section 2.6, and we consider its correctness to be easily verified by inspection. The algorithm's asymptotic running time is as follows.
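As an illustration of the induced-subdigraph operation itself (not of its running-time analysis), a minimal sketch follows. It assumes the usual definition, in which D<X> keeps the vertices of X and exactly those arcs of D having both endpoints in X; the function name and arc-set representation are choices made for this example.

```python
def induce_subdigraph(vertices, arcs, X):
    """Return (X, arcs of D with both endpoints in X)."""
    X = set(X) & set(vertices)
    return X, {(u, v) for (u, v) in arcs if u in X and v in X}


V = {1, 2, 3, 4}
A = {(1, 2), (2, 3), (3, 4), (1, 4)}
sub_v, sub_a = induce_subdigraph(V, A, {2, 3, 4})
print(sorted(sub_v), sorted(sub_a))
```

Because the result only removes vertices and arcs from D, it cannot contain a cycle when D is a DAG.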

Algorithm 7 Compute Strictly Direct In-Neighbors
Function get strict dir innbrs : (TC(D), X) → N⊖(X)
Require:
end if
9: end for
10: return N⊖(X)
Algorithms 7 and 8 are trivial implementations of the formulae given in Equation (1) and Equation (2), respectively. Using the assumptions stated in Appendix A, these algorithms' running times are clearly O(|V|^2).

B.4 Topological Sorting with Embedded Convex Set
Given X, a convex set of D, Algorithm 9 returns some topological sort of D in which the vertices of X appear as a contiguous subsequence.

B.4.1 Running Time of Algorithm 9
From inspection, the algorithm clearly has running time O(|V|^2). We proceed below with an overview of the algorithm's approach, followed by more formal arguments for its correctness.

B.4.2 Overview of Algorithm 9
This algorithm ensures that Q is a topological sort of D by establishing that for every arc (u, v) ∈ A, u appears before v in Q.
The arcs of D are implicitly divided into four groups, differentiated on whether or not a given arc originates in X, and whether or not the arc terminates in X.

Algorithm 9 Topologically Sort with an Embedded Contiguous Convex Set
Function topsort with embedded cvx set : (D, X) → Q
Require:
(E2) Q is a topological sort of D.
(E4) Let Z be any topological sort of D with structure [. . . , u, X, . . .]. Then there is a non-zero probability that each invocation of this algorithm returns Q such that Q = Z.
(E5) Let Z be any topological sort of D with structure [. . . , X, u, . . .]. Then there is a non-zero probability that each invocation of this algorithm returns Q such that Q = Z.
O(1)
10: return Q
Although T is a topological sort of D_C rather than D, D_C may have some arcs in common with D. The algorithm ensures that for the vertices ordered by these common arcs, the relative ordering of these vertices as provided by T is preserved during the construction of Q.
Similarly, although X is a topological sort of D<X> rather than of D, any (X, X)-arc in D is also an arc in D<X>. By ensuring that Q has the same relative ordering of all vertices in X as does X, the algorithm guarantees that Q is consistent with all (X, X) arcs of D.
Line 1. Recall that in the contraction X → x′ of D, every arc in D that terminates in X is replaced by an arc terminating at the vertex x′. Similarly, any arc originating in X in D is replaced by an arc originating at x′. In all other regards, the directed graph
Line 2. Although D is by assumption acyclic, the possibility exists that (V′, A′) contains a cycle. If A has any arc of the form (X, X), then the arc (x′, x′) is present in (V′, A′).
The presence of such an arc in (V′, A′) is not problematic for our overall purposes, but if present, (V′, A′) is not acyclic and therefore has no topological sort. Therefore on line
Lines 3-6. Here we create a topological sort of (V′, A′). A crucial correspondence exists between T and every topological sort of D. The correspondence is discussed in terms of the four categories of arcs in A identified above. All four categories are covered in our analyses of the remaining lines of this algorithm. Lines 4-6 simply decompose T into the components W, x′, and Y.
Lines 7-9. The induced digraph D<X> contains all vertices in X, and all of the (X, X) arcs in A. Because D<X> has a subset of the arcs in a DAG (D), it is not possible that D<X> contains any cycle. Therefore D<X> is also a DAG. All DAGs have at least one topological sort¹, which here we call X.
Line 8 obtains an arbitrary topological sort of D < X >, and line 9 constructs a total ordering of V. In the following subsections we show that Q satisfies all of this algorithm's postconditions.
Q is a topological sort of D if and only if for every arc (u, v) ∈ A, u appears before v in Q. We divide the arcs of A into four groups, and for each group show that this ordering requirement is satisfied by Q.
Every arc of this form is also an arc in the DAG D<X >. X is a topological sort of D<X>. Therefore for every arc (u, v) ∈ A, u appears before v in X.
Q contains X, and therefore if u appears before v in X, u also appears before v in Q.
Thus every arc in A covered by Case 1 is respected by Q.
T is a topological sort of the DAG (V ′ , A ′ ). Therefore for every arc of the form (u, x ′ ) ∈ A ′ , u appears before x ′ in T .
From lines 4-5 of the algorithm, we have u ∈ W. From lines 7-8 of the algorithm, we have that if v ∈ X , then v ∈ X. Therefore every arc covered by Case 2 is an arc of the form (W, X ).
Line 9 of the algorithm constructs Q such that all vertices in W occur before all vertices of X. Therefore every arc in A covered by Case 2 is respected by Q.
¹ See [1, Prop. 2.1.3].
The proof for Case 3 has the same structure as the proof for Case 2, above.
Because neither u nor v is a member of X , the arc (u, v) is an arc in both D and in the contracted digraph (V ′ , A ′ ). T is a topological sort of (V ′ , A ′ ), therefore for every arc (u, v) covered by Case 4, u appears before v in T .
We now show that for the arcs (u, v) covered by Case 4, u also appears before v in Q.
There are three possible locations for u and v: u, v ∈ W, or u, v ∈ Y, or u ∈ W ∧ v ∈ Y.
If u, v ∈ W, then any topological sort with W as an embedded subpath respects the (u, v) arc. Q is one such topological sort, and therefore u appears before v in Q.
By similar reasoning, if u, v ∈ Y, then u appears before v in Q.
Finally, if u ∈ W ∧ v ∈ Y, then u appears before v in Q because line 9 of the algorithm constructs Q such that all vertices of W appear before all vertices of Y.
This concludes our proof of postcondition (E3).

B.4.4 Correctness of Algorithm 9 postconditions (E4)-(E5)
We now demonstrate that postcondition (E4) holds: If Z = [. . . , u, X, . . .] is a topological sort of D for any u ∉ X, then there is a non-zero probability that this algorithm returns Z.
This follows directly from our assumptions about the randomness of the dag rand topo sort subroutine as detailed in Appendix A.3.
Postcondition (E5) holds by the same logic.
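The overall approach of Algorithm 9, as described in the overview above, can be sketched as follows: contract X to a placeholder vertex, drop the self-loop, randomly sort the contracted DAG, and splice a topological sort of D<X> in place of the placeholder. This is an illustrative reconstruction under stated assumptions, not the thesis's pseudocode: the random sort is a randomized variant of Kahn's algorithm standing in for dag rand topo sort, and all names are hypothetical.

```python
import random


def random_topological_sort(vertices, arcs, rng):
    """Randomized Kahn's algorithm; every topological sort has
    non-zero probability. Raises ValueError if a cycle exists."""
    indeg = {v: 0 for v in vertices}
    out = {v: [] for v in vertices}
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
    ready = [v for v in vertices if indeg[v] == 0]
    order = []
    while ready:
        i = rng.randrange(len(ready))
        ready[i], ready[-1] = ready[-1], ready[i]
        u = ready.pop()
        order.append(u)
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if len(order) != len(indeg):
        raise ValueError("graph contains a cycle")
    return order


def topsort_with_embedded_cvx_set(vertices, arcs, X, rng):
    """Return a topological sort of D in which X is contiguous.
    X is assumed to be a convex set of the DAG (vertices, arcs)."""
    X = set(X)
    x_new = object()  # placeholder for the contracted vertex x'
    c_vertices = [v for v in vertices if v not in X] + [x_new]
    c_arcs = {(x_new if u in X else u, x_new if v in X else v)
              for u, v in arcs}
    c_arcs.discard((x_new, x_new))  # remove the self-loop, if any
    T = random_topological_sort(c_vertices, c_arcs, rng)
    i = T.index(x_new)
    W, Y = T[:i], T[i + 1:]
    # Topologically sort the induced subdigraph D<X>.
    x_arcs = {(u, v) for u, v in arcs if u in X and v in X}
    x_seq = random_topological_sort(
        [v for v in vertices if v in X], x_arcs, rng)
    return W + x_seq + Y


V = [1, 2, 3, 4, 5]
A = {(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)}
print(topsort_with_embedded_cvx_set(V, A, {2, 4}, random.Random(0)))
```

In this example the convex set {2, 4} admits only one topological sort in which it is contiguous, so the result does not depend on the random choices.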

APPENDIX C Basic Convexity Algorithms
In this appendix we present algorithms for initializing a convex set according to either of two criteria, and for growing or shrinking an existing convex set by some specified number of vertices.
For each of these four kinds of algorithms, we provide two variations based on their underlying approach. One group of algorithms is implemented using the transitive closure of the DAG and the P̂ and Ŝ sets, and the other group is implemented in terms of topological sorts of the DAG. Section 7.3 discusses the relative merits of using these different implementation approaches.

C.1 Initializing Convex Set by Seeds using P̂ and Ŝ
Algorithm 10 computes the smallest convex superset of the specified set of seed vertices.
If the seed set X is a convex set of D, this simply returns X.
The correctness of Algorithm 10 is clearly established by Theorem 5.3.4 and Corollary 5.3.5. From inspection the running time of this algorithm is clearly O(|V|^2).

Algorithm 10 Initialize a Convex Set by Seeds using P̂ and Ŝ
Function init convex set by seeds PS : (TC(D), X) → W
Require:
TC(D) is the transitive closure of D.
(R3) ∅ ⊂ X ⊂ V
Ensure:
(E1) W is the smallest superset of X that is a convex set of D.
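One way to obtain the smallest convex superset of a seed set is to add every vertex that lies on some path between two seeds, that is, every vertex that is both a successor and a predecessor of X. The sketch below follows that idea; it is offered as an illustration consistent with postcondition (E1), not as a transcription of Algorithm 10 or of Theorem 5.3.4. It uses graph search on an adjacency dictionary instead of a precomputed transitive closure, and its names are hypothetical.

```python
def descendants(graph, sources):
    """Vertices reachable from `sources` by a path of length >= 1."""
    seen, stack = set(), list(sources)
    while stack:
        u = stack.pop()
        for v in graph.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def init_convex_set_by_seeds(succ, X):
    """Smallest convex superset of the seeds X in the DAG `succ`,
    where succ maps each vertex to its set of out-neighbors."""
    pred = {}
    for u, vs in succ.items():
        for v in vs:
            pred.setdefault(v, set()).add(u)
    after = descendants(succ, X)    # successors of X
    before = descendants(pred, X)   # predecessors of X
    return set(X) | (after & before)


# Chain 1 -> 2 -> 3 -> 4 with an extra arc 1 -> 5.
dag = {1: {2, 5}, 2: {3}, 3: {4}, 4: set(), 5: set()}
print(sorted(init_convex_set_by_seeds(dag, {1, 3})))
```

Seeding with {1, 3} pulls in vertex 2, the only vertex on a path between the seeds; a seed set that is already convex is returned unchanged.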

C.2 Initializing Convex Set by Seeds using Topological Sort
Algorithm 11 computes some convex superset of the specified seed set. Note that unlike Appendix C.1, this algorithm makes no assurance that it returns the smallest convex superset of the seed set. This stems from this algorithm's use of a random topological sort, which may not arrange the vertices of X into the smallest possible subsequence of the topological sort.

Algorithm 11 Initialize a Convex Set by Seeds using Topological Sort
Function init convex set by seeds TS : (D, X) → W
Require:
We establish the algorithm's correctness as follows. Clearly X ⊆ Z, because the subsequence of T from which Z is formed contains all members of X.
What remains is to show that Z is a convex set of D. This follows from Theorem 4.2.1, which states that any contiguous subsequence of any topological sort of D is itself a convex set of D.
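The idea behind the topological-sort variant can be sketched briefly: given any topological sort T of D, take the shortest contiguous subsequence of T that contains every seed. In this illustration T is passed in explicitly rather than generated at random, and the function name is hypothetical.

```python
def init_convex_set_by_seeds_ts(T, X):
    """Return the vertices of the shortest contiguous subsequence of the
    topological sort T that contains every seed in X."""
    positions = [i for i, v in enumerate(T) if v in X]
    if not positions:
        raise ValueError("no seed vertex appears in T")
    return set(T[positions[0]:positions[-1] + 1])


# Diamond DAG 1 -> 2 -> 4, 1 -> 3 -> 4 has two topological sorts.
print(sorted(init_convex_set_by_seeds_ts([1, 2, 3, 4], {2, 4})))
print(sorted(init_convex_set_by_seeds_ts([1, 3, 2, 4], {2, 4})))
```

The two topological sorts of the same DAG give different results for the seeds {2, 4}: one yields {2, 3, 4} and the other yields {2, 4}, which illustrates why this variant does not guarantee the smallest convex superset.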

C.3 Initializing Convex Set by Order using P̂ and Ŝ
Algorithm 12 creates a convex set of the specified order σ using the predecessor and successor set operations. The correctness of this algorithm is trivially given by the correctness of the subroutine grow convex set PS (Algorithm 14), which is established in Appendix C.5.

C.4 Initializing Convex Set by Order using Topological Sort
Algorithm 13 creates a convex set with the specified number of vertices, without regard to which particular vertices appear in the set.

Algorithm 14 Grow Convex Set Using Predecessor and Successor Sets
Function grow convex set PS : (TC(D), X, δ) → Z
Require:
Let u ∉ X be any vertex such that X ∪ {u} is a convex set of D. Then if δ = 1, there is a non-zero probability that this algorithm returns with Z = X ∪ {u}.
When δ = 1, this algorithm clearly ensures that whichever vertex was added to X is a vertex from Y. What remains is to show that this algorithm has a non-zero probability of selecting each vertex y ∈ Y. This comes from our definition of the vset random subset operation, which in Appendix A.4 is defined to have a non-zero probability of returning any subset of the specified order.

C.6 Growing Convex Sets Using Topological Sort
Given a convex set X , this algorithm returns some superset of X having |X | + δ vertices.
From inspection, the running time of this algorithm is clearly O(|V|^2). We now consider the algorithm's correctness.

Algorithm 15 Grow Convex Set Using Topological Sort
Function grow convex set TS : (D, X, δ) → Z
Require:
(E4) Let v ∉ X be any vertex such that X ∪ {v} is a convex set of D. Then if δ = 1, there is a non-zero probability that this algorithm returns with Z = X ∪ {v}.
O(|V|)
9: return Z
Lines 2-8 randomly select a contiguous subsequence Z of T. Z has |X| + δ elements, and contains X. Because the sequence X contains precisely the vertices of the set X, Z is a superset of X.
Because Z is a contiguous subsequence of a topological sort of D, we have from Theo- show that given such a topological sort, there's a non-zero probability of v subsequently being selected to appear in Z.
In Appendix B.4 it is shown that if a topological sort exists having either of those two forms, there is a non-zero probability that an invocation of topsort with embedded cvx set returns a topological sort having that form. And so in line 1 of this algorithm we have a non-zero probability that for any v ∈ V \ X such that X ∪ {v} is a convex set of D, T has either the structure T = [. . . , v, X, . . .] or the structure T = [. . . , X, v, . . .].
Line 3 determines which neighbors in T of X appear in Z. Because the random integer is defined to have a non-zero probability of returning any value in the closed interval [0, (x start − 1)], there is a non-zero probability that line 3 causes Z to contain the vertex in T that immediately precedes X, the vertex that immediately follows X, or both.
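The growth step can be illustrated with a short sketch that widens the contiguous block occupied by X inside a topological sort T by δ positions, choosing at random how many of the new positions come from the left. The exact bookkeeping of Algorithm 15 (for instance, the interval used on its line 3) is not reproduced here; the bounds below are an assumption of this sketch, as is the function name. T is assumed to already contain X as a contiguous subsequence.

```python
import random


def grow_convex_set_ts(T, X, delta, rng):
    """Grow the convex set X by `delta` vertices, using a topological
    sort T in which the vertices of X are contiguous."""
    X = set(X)
    idx = [i for i, v in enumerate(T) if v in X]
    start, end = idx[0], idx[-1]
    if end - start + 1 != len(X):
        raise ValueError("X is not contiguous in T")
    left_avail = start
    right_avail = len(T) - 1 - end
    if not 0 <= delta <= left_avail + right_avail:
        raise ValueError("delta is out of range")
    # k = number of new vertices taken from the left of X.
    lo = max(0, delta - right_avail)
    hi = min(delta, left_avail)
    k = rng.randint(lo, hi)
    return set(T[start - k:end + 1 + (delta - k)])


T = [1, 3, 2, 4, 5]
print(sorted(grow_convex_set_ts(T, {2, 4}, 1, random.Random(0))))
```

Because the result is a contiguous subsequence of a topological sort, it is a convex set; with δ = 1 the added vertex is either the immediate left or the immediate right neighbor of X in T.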

C.7 Shrinking Convex Sets Using P̂ and Ŝ
Given a convex set X , this algorithm returns some subset of X having |X | − δ vertices.
The basis for our approach to shrinking existing convex sets is provided in Theorem 5.4.5.

Algorithm 16 Shrink Convex Set Using Predecessor and Successor Sets
Function shrink convex set PS : (T C(D), X , δ) → Z Require: Let u / ∈ X be any vertex such that X ∪ {v} is a convex set of D. Then if δ = 1, there is a non-zero probability that this algorithm returns with Z = X \ {v}. The correctness of this algorithm comes from Theorem 5.4.5, which indicates that for a given convex set X of D, any subset of X \ (P (X ) ∩Ŝ(X )) may be removed from X to obtain a new convex set of D.
The loop body spanning lines 5-9 deletes as many vertices as possible from Z using the formula given by Theorem 5.4.5. The algorithm's preconditions ensure that Z is a convex set of D before the first activation of the loop body, and Theorem 5.4.5 ensures that the new version of Z is a convex set of D after each completion of the loop body.
Corollary 5.4.6 ensures that Y ≠ ∅, and therefore each activation of the loop body always reduces the order of Z by at least one. Therefore this algorithm always terminates.
Because there is no assurance that |Y| ≥ δ, repeated applications of the formula from Theorem 5.4.5 may be necessary; these are provided by the loop spanning lines 5-10.

C.7.3 Correctness of Algorithm 16 postcondition (E4)
We begin by showing that when δ = 1, the loop body spanning lines 6-9 is executed precisely once. When δ = 1, to remove > 0 is clearly true, and so at least one loop iteration occurs. From Corollary 5.4.6, we have that for every activation of line 6, |Y| > 0. From this it follows that lines 7-9 always cause to remove to be decremented by at least 1. When δ = 1, this ensures that after the first iteration of the loop, to remove is set to 0.
We now show that when the loop body is executed precisely once, this algorithm returns Z = X \ {y}, where there is a non-zero probability that y is any given vertex of X such that X \ {y} is a convex set of D.
Observe that in the first (and in this case, only) iteration of the loop, we have Z = X . Therefore line 6 computes the formula Y = X \(P (X )∩Ŝ(X )). Theorem 5.4.7 establishes that this is the precise vertex set such that the deletion of any one of them from X yields another convex set of D.
Our goal now is to show that for any y ∈ Y, there is a non-zero probability that Z = X \ {y}. This comes from our definition of the vset random subset operation, which in Appendix A.4 is defined to have a non-zero probability of returning any subset of the specified order.
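The shrinking loop can be illustrated with a short sketch. It does not reproduce Algorithm 16: in place of the P and Ŝ operators of Chapter 5 it assumes a characterization that agrees with the reasoning in Section C.8.2, namely that a vertex may be removed from a convex set Z when it has no in-neighbor in Z or no out-neighbor in Z. The names removable and shrink_convex_set, and the representation of the DAG as a list of arcs, are hypothetical.

```python
import random

def removable(arcs, Z):
    """Vertices of Z with no in-neighbor in Z or no out-neighbor in Z
    (assumed here to be the vertices whose removal keeps Z convex)."""
    has_pred = {v for (u, v) in arcs if u in Z and v in Z}
    has_succ = {u for (u, v) in arcs if u in Z and v in Z}
    return Z - (has_pred & has_succ)

def shrink_convex_set(arcs, X, delta, rng=random):
    """Remove delta vertices from the convex set X, a batch at a time."""
    if not 0 < delta < len(X):
        raise ValueError("delta must satisfy 0 < delta < |X|")
    Z, to_remove = set(X), delta
    while to_remove > 0:
        Y = removable(arcs, Z)          # never empty for a non-empty Z in a DAG
        k = min(len(Y), to_remove)      # Y may hold fewer than to_remove vertices
        Z -= set(rng.sample(sorted(Y), k))
        to_remove -= k
    return Z
```

Each pass removes at least one vertex, so the loop terminates, and when δ = 1 a single pass removes one randomly chosen member of Y.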

C.8 Shrinking Convex Sets Using Topological Sort
Given a convex set X , this algorithm returns some subset of X having |X | − δ vertices.

Algorithm 17 Shrink Convex Set Using Topological Sort
Function shrink convex set TS : (D, X , δ) → Z
Require: Let v ∈ X be any vertex such that X \ {v} is a convex set of D. Then if δ = 1, there is a non-zero probability that this algorithm returns with Z = X \ {v}.

U is a total ordering of X , and so any proper subsequence of U is an ordering of a proper subset of X . Because δ > 0, line 3 establishes Z as one such proper subsequence of U , and therefore Z ⊂ X , satisfying postcondition (E2).
Postcondition (E3) is obtained directly by the definition of vseq random slice (see Appendix A.6).

C.8.2 Correctness of Algorithm 17 postcondition (E4)
From Theorem 5.4.7 we have that X \ {y} is a convex set of D if and only if y ∈ X \ (P(X ) ∩ Ŝ(X )). It follows then that y ∉ P(X ), y ∉ Ŝ(X ), or both.
If y ∉ P(X ), then there must be some topological sort of D⟨X⟩ in which y is the last element of the sequence. If no such topological sort existed, it would indicate that D⟨X⟩ contains a (y, X \ {y}) arc, contradicting our assumption that y ∉ P(X ).
For similar reasons, if y ∉ Ŝ(X ), then there must exist a topological sort of D⟨X⟩ in which y appears as the first element of the topological sort.
The function dag rand topo sort is defined to have a non-zero probability of returning any topological sort of the supplied DAG, and therefore has a non-zero probability of returning a topological sort with y as the first or last element.
When δ = 1, vseq random slice is constrained to return the subsequence of U containing all elements of U except either the first or last. Furthermore, vseq random slice is defined to have a non-zero probability of yielding either subsequence. Therefore regardless of whether y appears as the first or last element of U , there is a non-zero probability of Z = X \ {y}.
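The topological-sort approach to shrinking can be sketched as below. The helper names random_topo_sort and shrink_by_slice are hypothetical stand-ins for dag rand topo sort and vseq random slice, whose definitions live in other appendices; the randomized Kahn-style sort used here is one way, assumed for illustration, to give every topological sort of the induced subgraph a non-zero probability.

```python
import random

def random_topo_sort(arcs, X, rng=random):
    """Randomized Kahn-style topological sort of the subgraph induced by X."""
    indeg = {v: 0 for v in X}
    succ = {v: [] for v in X}
    for u, v in arcs:
        if u in X and v in X:
            succ[u].append(v)
            indeg[v] += 1
    ready = sorted(v for v in X if indeg[v] == 0)
    order = []
    while ready:
        u = ready.pop(rng.randrange(len(ready)))   # random choice among ready vertices
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order

def shrink_by_slice(arcs, X, delta, rng=random):
    """Return a random contiguous slice, of order |X| - delta, of a random sort of X."""
    if not 0 < delta < len(X):
        raise ValueError("delta must satisfy 0 < delta < |X|")
    U = random_topo_sort(arcs, X, rng)
    start = rng.randint(0, delta)                  # inclusive on both ends
    return set(U[start : start + len(X) - delta])
```

When δ = 1 the slice drops either the first or the last element of U, matching the case analysis above.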

APPENDIX D Arbitrary Transformation Algorithms
Below we present the convex set transformation algorithms supporting the arbitrary transformation of convex sets as discussed in Chapter 6.

D.1 Transforming Convex Set into a Single Vertex with Convex Intermediate Sets
Given a convex set X of some DAG D, Algorithm 18 discovers a sequence R of single-vertex removals that may be progressively applied to X , such that the set resulting from each single-vertex removal is itself a convex set of D. That is, for any n ∈ [1, |R|], removing from X the first n vertices of R yields a convex set.

Algorithm 18 Transform Convex Set into a Single Vertex with Convex Intermediate Sets
Function evolve convex set to vertex : (D, X ) → R Require: call vss append( R, Y ) 8: end for 9: return R

D.1.1 Correctness of Algorithm 18
The correctness of this algorithm is rooted in the correctness of topsort with embedded cvx set and Theorem 4.2.1.
For any given convex set such as X , we can use topsort with embedded cvx set to produce a topological sort Q of D in which the elements of X form a contiguous subsequence of Q. This is accomplished by line 1. where i > 1, Y is the same set as was produced by the previous iteration, except that the vertex Q_{i−1} has been deleted. This ensures postcondition (E3).
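The progressive removal can be sketched in a few lines. This is an illustration under stated assumptions rather than the listing of Algorithm 18: the name evolve_to_vertex is hypothetical, Q is assumed to be a topological sort in which X occupies positions x_start through x_start + x_len - 1, and the result is returned as a list of vertex sets.

```python
def evolve_to_vertex(Q, x_start, x_len):
    """Return the vertex sets X, X minus its first vertex in Q, and so on,
    ending with a single-vertex set."""
    run = Q[x_start : x_start + x_len]      # the vertices of X, in the order of Q
    # Each suffix of the run is still a contiguous subsequence of Q.
    return [set(run[i:]) for i in range(len(run))]
```

Because every set in the result is a contiguous subsequence of a topological sort, each intermediate set inherits convexity from the same property that the correctness argument relies on.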

D.2 Transforming between Arbitrary Convex Sets
Algorithm 19 provides a constructive proof that given any two convex sets X and Y of some digraph D, one can always evolve X into Y using a sequence of single-vertex additions, removals, and/or replacements, such that the set produced by each of those single-vertex changes is also itself convex.

D.2.1 Correctness of Algorithm 19
We consider each of the algorithm's postconditions in turn.
Every vertex set in T is one of the vertex sets produced by evolve convex set to vertex (Algorithm 18), and is therefore a convex set of D. Thus postcondition (E1) holds.
The first vertex set of T is also the first vertex set of Q, which evolve convex set to vertex ensures has the value X . This satisfies postcondition (E2).
The last element of T is the last element of S, which in turn is the first element of R. Algorithm 18 (evolve convex set to vertex) ensures that this vertex set is Y. Therefore postcondition (E3) is satisfied.
Recall that evolve convex set to vertex returns a sequence of vertex sets such that for elements i and (i+1) of some returned sequence Z, we have |Z_{i+1}| = |Z_i| − 1. That is, as one proceeds through the sequence, each vertex set in the sequence is one vertex smaller than the previous. This corresponds to the antecedent predicate in postcondition (E4).
Because S is the reversal of a sequence produced by evolve convex set to vertex, in general for the elements of S we have |S_{i+1}| = |S_i| + 1. This corresponds to the antecedent predicate in postcondition (E6).
We consider two possibilities, corresponding to the test on line 4 of the algorithm. Postcondition (E6) is satisfied by similar reasoning, with S rather than Q.
Postcondition (E5) does not pertain to Case 1, because there is no pair of adjacent

Case 2: Q_{|Q|} ≠ S_1: In Case 2, from line 7 of the algorithm we have:

As discussed in Case 1 above, the first |Q| elements of T comprise the only portion of T to which postcondition (E4) applies, and it is satisfied by that portion's equality with Q.
Similarly, the remaining portion of T is the only portion to which postcondition (E6) applies, and it is satisfied by that portion's equality with S.
Unlike Case 1 above, postcondition (E5) does pertain to Case 2. Both the final element of Q (i.e., T_{|Q|}) and the first element of S (i.e., T_{|Q|+1}) contain just one vertex.
It is this pair of vertex sets to which postcondition (E5) applies. However, the if-then test on line 4 ensures that in Case 2, we have T_{|Q|} ≠ T_{|Q|+1}.
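The construction of T from Q and S can be sketched as follows. The listing of Algorithm 19 is not reproduced here, so this is only a plausible reading: the names suffix_chain and transform are hypothetical, each input is assumed to list the vertices of a convex set in the order of a topological sort in which that set is contiguous, and the handling of the test on line 4 (dropping a duplicated single-vertex set when the two chains meet at the same vertex) is an assumption.

```python
def suffix_chain(order):
    """Vertex sets obtained by repeatedly deleting the first vertex of 'order'."""
    return [set(order[i:]) for i in range(len(order))]

def transform(x_order, y_order):
    """Sequence of vertex sets leading from X to Y by single-vertex changes."""
    Q = suffix_chain(x_order)           # X shrinking down to one vertex
    S = suffix_chain(y_order)[::-1]     # one vertex growing up to Y
    if Q[-1] == S[0]:
        return Q + S[1:]                # assumed: chains meet at the same vertex
    return Q + S                        # assumed: one single-vertex replacement
```

In either branch the first element of the result is X, the last is Y, and adjacent elements differ in order by at most one.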

APPENDIX E Task Set Deconfliction
This appendix presents a pair of algorithms to modify a task set P such that no two tasks overlap, and that the cycle-induction problem described in Subsection 3.3.1.3 is avoided.
Suppose D = (V, A) is a DFG, and P = {T_1, T_2, . . . , T_n}. For each set T_i ∈ P , let ∅ ⊂ T_i ⊆ V and let T_i be a convex set of D.
Here we make no assumption that the sets in P are mutually disjoint (that is, ∀ i ≠ j, T_i ∩ T_j = ∅). We also make no assumption that, if P were translated to a collection of graphs {D_parent, D_{child 1}, D_{child 2}, . . . , D_{child m}}, D_parent would be free of cycles.

E.1 Obtaining Disjoint Tasks
One approach to ensuring that independently evolved convex sets do not overlap is as follows.
Let X and Y be two convex sets of some DFG, in which X ∩ Y ≠ ∅. Either X or Y is chosen as a "victim" set. Let us assume that Y is chosen as the victim. Vertices shall be deleted from Y to yield a set Y′, such that Y′ is a convex set and X ∩ Y′ = ∅.
Note that, by the definition of convex sets, the empty set is not a convex set. However, it may be necessary for the victim set to become the empty set in order to eliminate overlap. This is most easily seen in the example X = Y = {v} for some vertex v.
We assert but do not prove that either of the two formulae in Equation (E.18) ensures that Y′ is a subset of Y, and is either a convex set of the DFG or is the empty set.

Y′ = Y \ ((X ∩ Y) ∪ P(X ∩ Y))
Y′ = Y \ ((X ∩ Y) ∪ Ŝ(X ∩ Y))    (E.18)

Informally, the formulae in Equation (E.18) may be understood as follows. By removing X ∩ Y from Y, we potentially introduce concavity in the resulting set, Y \ (X ∩ Y). This is because there may exist a path in the DFG which originates within Y \ (X ∩ Y), then passes through X ∩ Y, and then re-enters Y \ (X ∩ Y). Any such path is eliminated by deleting from Y not just the vertices X ∩ Y, but also any vertices which complete a cycle between X ∩ Y and any vertices which are to remain in Y′. This cycle may be broken by further removing either P(X ∩ Y) or Ŝ(X ∩ Y), which is why two variations of this approach are presented in Equation (E.18).
This approach for eliminating overlap within a pair of convex sets may be generalized to eliminating any such pairwise overlap of convex sets in a task set of arbitrary order.
Algorithm 20 presents one such algorithm. Each vertex set in P ′ is a convex set of D or is the empty set.
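The two formulae of Equation (E.18) can be sketched for a single pair of sets as follows. This is not Algorithm 20; the function names are hypothetical, the DFG is assumed to be given as a list of arcs, and the predecessor (successor) set of W is taken to be every vertex outside W with a path into (out of) W.

```python
def predecessors(arcs, W):
    """All vertices outside W from which some path leads into W."""
    preds = {}
    for u, v in arcs:
        preds.setdefault(v, set()).add(u)
    seen, stack = set(), list(W)
    while stack:                          # backward depth-first search from W
        v = stack.pop()
        for u in preds.get(v, ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen - set(W)

def successors(arcs, W):
    """All vertices outside W reachable by some path leaving W."""
    return predecessors([(v, u) for (u, v) in arcs], W)

def deconflict(arcs, X, Y, use_predecessors=True):
    """Shrink the victim set Y so that it no longer overlaps X."""
    overlap = X & Y
    if use_predecessors:
        extra = predecessors(arcs, overlap)
    else:
        extra = successors(arcs, overlap)
    return Y - (overlap | extra)          # may be the empty set
```

On the chain a → b → c → d with X = {b} and Y = {a, b, c}, the predecessor variant leaves {c} and the successor variant leaves {a}; with X = Y = {b} both variants return the empty set.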

E.2 Preventing Cycle-Induction during Translation
Algorithm 21 assumes that the input sets are individually convex sets of D, and are mutually disjoint. It produces a new collection of sets which are guaranteed to be free of the cycle-induction problem described in Subsection 3.3.1.3.

E.2.1 Hyper-predecessor Sets and Hyper-successor Sets
Algorithm 21 makes use of the concepts of hyper-P and hyper-Ŝ sets, described below.
An illustration of hyper-P is provided in Figure E

Let D = (V, A) be a DFG. Recall that the predecessor set of some vertex set X , denoted P(X ), is the collection of all vertices u ∉ X such that a (u, X ) path exists within D (see Chapter 5).
Suppose that P′ = {T′_1, . . . , T′_n} is a set of convex, mutually disjoint tasks of D, such as produced by Algorithm 20.
We define hyper-P(X ) as follows:
• If D contains an arc (u, X ), then every member of hyper-P(u) is also a member of hyper-P(X ).
• If there exists a task T′_i ∈ P′ such that u ∈ T′_i, then for all v ∈ T′_i, every vertex in hyper-P(v) is also a member of hyper-P(X ).
Hyper-P/Ŝ are defined so as to anticipate the (non-hyper) P/Ŝ sets that would arise from replacing, in D, every task in P′ with the corresponding pair of SPAWN and WAIT vertices.
Note that these definitions depend upon D being acyclic, but remain well-defined even if the graph resulting from replacing each task T′_i ∈ P′ with a pair of WAIT and SPAWN vertices contains a cycle.
The crucial difference between the supplied task set P′ and P′′ is as follows. If the DFG D is modified by replacing each task T′_i ∈ P′ with the two vertices SPAWN(T′_i) and WAIT(T′_i), the resulting graph D_parent may contain a cycle, as described in Subsection 3.3.1.3.
In contrast, if D is modified by replacing each task in P′′, rather than each task in P′, with SPAWN(T′_i) and WAIT(T′_i) vertices, the resulting graph D_parent is guaranteed to be acyclic, solving the cycle-induction problem described in Subsection 3.3.1.3.
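How two individually convex, disjoint tasks can still induce a cycle may be seen in a small sketch. The translation rule used here is an assumption, since Subsection 3.3.1.3 is not reproduced in this appendix: each task becomes a SPAWN vertex followed by a WAIT vertex, arcs entering the task are redirected to its SPAWN vertex, arcs leaving it originate from its WAIT vertex, and arcs internal to a task are dropped from the parent graph. The names parent_graph and has_cycle are hypothetical.

```python
def parent_graph(arcs, tasks):
    """Arc set of the parent graph under the assumed SPAWN/WAIT translation."""
    owner = {v: i for i, task in enumerate(tasks) for v in task}
    parent = set()
    for i, task in enumerate(tasks):
        if task:
            parent.add((("SPAWN", i), ("WAIT", i)))
    for u, v in arcs:
        if u in owner and v in owner and owner[u] == owner[v]:
            continue                                  # arc internal to one task
        src = ("WAIT", owner[u]) if u in owner else u
        dst = ("SPAWN", owner[v]) if v in owner else v
        parent.add((src, dst))
    return parent

def has_cycle(arcs):
    """True if the digraph given by 'arcs' contains a directed cycle."""
    verts = {x for arc in arcs for x in arc}
    indeg = {v: 0 for v in verts}
    succ = {v: [] for v in verts}
    for u, v in arcs:
        succ[u].append(v)
        indeg[v] += 1
    ready = [v for v in verts if indeg[v] == 0]
    removed = 0
    while ready:                                      # peel off in-degree-0 vertices
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed < len(verts)
```

For the DFG with arcs a → x and y → b, the tasks {a, b} and {x, y} are each convex and are disjoint, yet under this translation the parent graph contains the cycle SPAWN(0) → WAIT(0) → SPAWN(1) → WAIT(1) → SPAWN(0).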
The insight which drives this algorithm's design is the following: Suppose P is a task set for some DFG D. We assert but only informally prove the following: If P contains two tasks T_i and T_j such that

hyper-P(T_i) ∩ hyper-Ŝ(T_i) ∩ T_j ≠ ∅    (E.19)

then producing D_parent as described above will cause D_parent to contain a cycle.
That proposition is based on the observation that hyper-P(T_i), as computed in the context of the DFG D, is identical to the set P(SPAWN(T_i)) as computed in the graph D_parent.
We may similarly observe that hyper-Ŝ(T_i), as computed in the context of the DFG D, is identical to the set Ŝ(WAIT(T_i)) as computed in the graph D_parent.
Let D^i_parent be the graph produced by replacing T_i with SPAWN(T′_i) and WAIT(T′_i) in D. Further assume that D^i_parent still contains the vertices T_j (i.e., T_j has not yet been replaced by SPAWN(T′_j) and WAIT(T′_j) vertices in D^i_parent).
Let P_D and Ŝ_D be the functions P and Ŝ, respectively, evaluated in the context of the DFG D. Let P_{D^i_parent} and Ŝ_{D^i_parent} be the functions P and Ŝ, respectively, evaluated in the context of the graph D^i_parent. Then we have the following:

Our conclusion then is the following.
• Let D be a DFG, and let P′ be a task set with the qualities ensured by Algorithm 20.
• Let D^i_parent be the DFG produced by replacing some T′_i ∈ P′ with the vertices SPAWN(T_i) and WAIT(T_i).
• Let T′_j ∈ P′ be a task which has not yet been replaced by the vertices SPAWN(T_j) and WAIT(T_j) in D^i_parent.
• Let D^j_parent be the graph produced by replacing T′_j with the vertices SPAWN(T_j) and WAIT(T_j) in D^i_parent.
• Then D^j_parent contains a cycle if and only if hyper-P_D(T_i) ∩ hyper-Ŝ_D(T_i) ∩ T_j ≠ ∅.
In summary, one can use the hyper-P and hyper-Ŝ operators to anticipate whether or not two tasks are destined to participate in a parent-graph cycle should all of the task set's tasks be replaced with SPAWN and WAIT vertices.

E.2.3 Algorithm Overview
Algorithm 21 visits every pair of tasks in the supplied task set P ′ . For each pair of tasks, Equation (E.19) is used to predict whether or not those two tasks will ultimately participate in the creation of a parent-graph cycle.
If a cycle is indicated, then vertices are deleted from one of the two tasks (the "victim" task) such that Equation (E.19) no longer indicates that a cycle would be formed.
Line 9 of the algorithm is certain to produce a subset of Y which not only has no cycle with X , but is also a convex set of D. This is because HSX ∩ Y = P(Y), and removing all predecessors or all successors of a set ensures that what remains is acyclic.

E.2.4 Algorithm Observations
Algorithm 21 has several noteworthy qualities, in light of its intended use to ensure that machine-learned task sets can be translated into task-parallel programs.
The first observation is that this algorithm may be systematically biased toward modifying high-numbered tasks within the supplied task set P ′ , because when two tasks are found to be in conflict, the higher-numbered task is chosen as the victim. The impact of this bias on the performance of the overall machine-learning system is unclear.
Secondly, without further study it's not obvious that Algorithm 21 makes the smallest possible modifications to the tasks in P′ in order to obtain P′′. It may in fact be the case that the algorithm's statement P′′_j ← Y \ HSX is equivalent to simply deleting the victim task, P′′_j. This possibly over-aggressive modification of the victim task may lead to small changes in the task set P′ causing unnecessarily large and unintuitive modifications in the modified task set P′′. It may be the case that this causes avoidable performance problems in the machine-learning system which evolves the task sets.

Algorithm 21 Eliminating Cycle-Induction by Multiple Convex Sets
Function eliminate cycle induction : (D, P′) → P′′
Require: Each task T_i ∈ P′ is a convex set of D, or is the empty set.