Towards Faster Execution of Ensemble ML Bootstrap Based Techniques

Document Type

Conference Proceeding

Date of Original Version



Algorithms for ensemble methods (EM) based on bootstrap aggregation often perform copious amount of redundant computations (RC) thus limiting their practicality. Given this constraint, we propose a framework that views these algorithms as a collection of computational units (cu), a tightly coupled set of both mathematical operations and data. This view facilitates a reduction in RC (RRC), thereby allowing for faster execution plans. Inspired by the floor tiling approach in VLSI, we look to engineer solutions for RRC while possibly reconfiguring the underlying computing system's compiler technology stack. We start by showing that under the assumption that the computational system has unbounded but finite memory (i.e., the memory is large enough to hold all intermediate values) and that each has a uniform cost, our approach reduces to a well-studied directed bandwidth problem for the directed acyclic graphs (DAGs). Next, we consider a more realistic scenario where the computing system has limited memory and concurrent execution while still assuming a uniform cost. Using a new notion of (r, s) set cover of a DAG (nodes representing computational units and edges representing their interdependencies) we formulate the problem of reducing redundant computational steps in EM as a variation of a directed bandwidth problem. We show that the graph's minimum bandwidth is closely related to memory requirements for studying RRC. Finally, our preliminary experimental results are supportive of the proposed approach for RRC and promising that it can be applied to a broader set of algorithms in decision sciences.

Publication Title, e.g., Journal

ACM International Conference Proceeding Series