Training and Source Code Generation for Artificial Neural Networks

The ideas and technology behind artificial neural networks have advanced considerably since their introduction in 1943 by Warren McCulloch and Walter Pitts. However, the complexity of large networks means that it may not be computationally feasible to retrain a network during the execution of another program, or to store a network in such a form that it can be traversed node by node. The purpose of this project is to design and implement a program that trains an artificial neural network and exports source code for it, so that the network may be used in other projects. After discussing some of the history of neural networks, I explain the mathematical principles behind them. Two related training algorithms are discussed: backpropagation and RPROP. I also go into detail about some of the more useful activation functions. The training portion of the project was not implemented from scratch; instead, a third-party external library was used: Encog, developed by Heaton Research. After analyzing how Encog stores the weights of the network, and how the network is trained, I discuss how I used several of the more important classes. There are also details of the slight modifications I needed to make to one of the classes in the library. The implementation of the project consists of five classes, all of which are discussed in the fourth chapter. The program takes two inputs from the user (a config file and a training data set) and returns two outputs (a training error report and the source code). The paper concludes with a discussion of additional features that may be implemented in the future. Finally, an example is given, demonstrating that the program works as intended.


Predecessors to ANNs
The history of most neural network research can be traced back to the efforts of Warren McCulloch and Walter Pitts. In their 1943 paper 'A Logical Calculus of the Ideas Immanent in Nervous Activity' [1], McCulloch and Pitts introduced the foundation of a neuron: a single piece of the nervous system which would respond once a certain threshold had been reached. This model of a neuron is still used today. In 1949, neuropsychologist Donald Hebb published his book 'The Organization of Behavior'. Hebb postulated that "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased." [2]. Hebbian learning influenced research in the field of machine learning, especially in the area of unsupervised learning.
In 1958, Frank Rosenblatt developed the Perceptron. More information on this will be presented in the following subsection.
In 1959, Bernard Widrow and Marcian Hoff developed a working model they called ADALINE (ADAptive LINear NEuron), as well as a more advanced version known as MADALINE (Multiple ADALINE) [3]. These models were some of the first to be applied to real-world problems (such as eliminating echoes on phone lines), and may still be in use today. [4] Breakthroughs in neural network research declined starting in 1969, when Marvin Minsky and Seymour Papert published their book 'Perceptrons: an Introduction to Computational Geometry'. In this book, Minsky and Papert claimed Rosenblatt's perceptron wasn't as promising as it was originally believed to be.
For example, it was unable to correctly classify an XOR function. While this book did introduce some new ideas about neural networks, it also contributed to what was known as 'the dark age of connectionism' or an AI winter, as there was a lack of major research for over a decade.
Interest in artificial neural networks declined, and the focus of the community switched to other models such as support vector machines. [5]

Perceptron
In his 1958 paper, Frank Rosenblatt considered three questions [6]:
1. How is information about the physical world sensed, or detected, by the biological system?
2. In what form is information stored, or remembered?
3. How does information contained in storage, or in memory, influence recognition and behavior?
The first question wasn't addressed as he believed it "is in the province of sensory physiology, and is the only one for which appreciable understanding has been achieved." The other two questions became the basis of his concept of a perceptron (which he compared to the retina in an eye).
A perceptron functions as a single neuron, accepting weighted inputs and an unweighted bias, the output of which is passed to a transfer function. This step function evaluates to 1 if the value is positive, and either 0 or -1 if the value is negative (the exact value may vary depending on the model). After iterations with a learning algorithm, the perceptron calculates a decision surface to classify a data set into two categories.
The perceptron learning algorithm is as follows [7]:
1. Initialize the weights and threshold to small random numbers.
2. Present a pattern vector (x_1, x_2, ..., x_n)^t and evaluate the output of the neuron.
3. Update the weights according to w_j(t + 1) = w_j(t) + η(d − y)x_j, where d is the desired output, t is the iteration number, and η (0.0 < η < 1.0) is the gain (step size).
Steps two and three are repeated until the data set has been properly classified.
Unfortunately, due to the nature of the perceptron, it will only work with data that is linearly separable.
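As a concrete illustration, the learning algorithm above can be sketched in a few lines of Java. The class and variable names are my own, and the AND function is used because it is linearly separable:

```java
// Minimal perceptron trained on the AND function (illustrative sketch).
public class PerceptronDemo {
    // Step function: 1 if the value is positive, 0 otherwise
    // (one of the threshold variants described above).
    static int step(double v) { return v > 0 ? 1 : 0; }

    // Evaluate the neuron: weighted inputs plus an unweighted bias term.
    static int output(double[] w, double bias, double[] x) {
        double sum = bias;
        for (int i = 0; i < x.length; i++) sum += w[i] * x[i];
        return step(sum);
    }

    public static void main(String[] args) {
        double[][] patterns = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] desired = {0, 0, 0, 1};   // AND is linearly separable
        double[] w = {0.1, -0.1};       // step 1: small starting weights
        double bias = 0.0;
        double eta = 0.25;              // gain, 0.0 < eta < 1.0

        // Steps 2 and 3 are repeated until every pattern is classified correctly.
        for (int epoch = 0; epoch < 100; epoch++) {
            int errors = 0;
            for (int p = 0; p < patterns.length; p++) {
                int y = output(w, bias, patterns[p]);
                int d = desired[p];
                if (d != y) errors++;
                // w_j(t+1) = w_j(t) + eta * (d - y) * x_j
                for (int j = 0; j < w.length; j++)
                    w[j] += eta * (d - y) * patterns[p][j];
                bias += eta * (d - y);  // bias treated as a weight with input 1
            }
            if (errors == 0) break;     // data set properly classified
        }
        for (int p = 0; p < patterns.length; p++)
            System.out.println(output(w, bias, patterns[p]));  // prints 0, 0, 0, 1
    }
}
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates; an attempt to learn XOR with the same code would never reach zero errors.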

What is an Artificial Neural Network?
An artificial neural network (sometimes referred to as an ANN, or just a neural network) is a machine learning model inspired by biological neural networks (such as the central nervous system).
Neural networks fall under the supervised learning paradigm. In supervised learning, the network is presented with pairs of data, input and output. The goal is to be able to map the input to the output, training in a way that minimizes the error between the actual output and the desired output. More information may be found in section 2.1.
Each piece of the network is known as a neuron. Neural networks still use the McCulloch-Pitts model of a neuron (see figure 1). The inputs into a node are the values from the previous layer (or the input values, if the layer in question is the input layer). Each value is multiplied by the associated weight, and then those products are summed together (Σ w_i x_i). Rather than being fed into a threshold function, an activation function is used. This allows for a wider range of potential output values, instead of just 0 or 1. The output value of the neuron can be used as the input into the next layer, or as the output for the network.
Neural networks consist of at least three layers: an input layer, at least one hidden layer, and an output layer.

Justification for Study
My main interest in machine learning pertains to the realm of video games.
Artificial intelligence is an important aspect of every game (except those which are exclusively multiplayer, with no computer-controlled agents). Typically 5-60% of the CPU is utilized by AI-related processes, and this number has been known to climb as high as 100% for turn-based strategy games [8]. While some modern games utilize machine learning, most of this is done before the game is published (rather than the training occurring during runtime). According to Charles and McGlinchey, "online learning means that the AI learns (or continues to learn) whilst the end product is being used, and the AI in games is able to adapt to the style of play of the user. Online learning is a much more difficult prospect because it is a realtime process and many of the commonly used algorithms for learning are therefore not suitable." [9] The project that I have completed focuses on generating the source code for an artificial neural network, which is directly applicable to the field of gaming. With the actual training occurring during the development phase, it makes sense to have a program that can create the network, separate from the rest of the project. The source code that it outputs then allows the network to be used within the context of a game. The other benefit of such a program is that it allows the neural network to be used without having to maintain the structure of the network. Reducing the results of the network down to mathematical formulas results in faster computation times than having to walk through the nodes of a network (as stored in multiple classes or data structures). The results of this project have been tested in a Quake II environment.

Backpropagation
The backpropagation algorithm works by using the method of gradient descent.
In order to do this, it needs to use an activation function which is differentiable.
This is a change from the perceptron, which used a step function. One of the more popular activations functions is the sigmoid function. This and other alternatives will be explored in section 2.2.
In order to make the calculations easier, each node is considered in two separate parts. Rojas calls this a B-diagram (or backpropagation diagram). As seen in figure 4, the right side calculates the output from the activation function, while the left side computes the derivative.
Figure 4: The two sides of a computing unit [1]
Rather than calculating the error function separately, the neural network is extended with an additional layer used to calculate the error internally (as seen in figure 5).
Figure 5: Extended network for the computation of the error function [1]
The equation for the error function is E = (1/2) Σ (o_i − t_i)², where o_i is the output value from node i, and t_i is the target value. Keeping in mind the separation of the nodes as previously mentioned, the derivative calculated in the left portion will be (o_i − t_i).
The backpropagation algorithm consists of four steps:
1. Feed-forward computation
2. Backpropagation to output layer
3. Backpropagation to hidden layer(s)
4. Weight updating
In the first step, the algorithm is processed in a straightforward manner, with the output from one node being used as the input to the next node, as seen in figure 6.
Generally speaking, backpropagation retraces through the network in reverse.
Since the network is being run backwards, we evaluate using the left side of the node (the derivative). Instead of outputs being used as the inputs to the next node, the outputs from a node are multiplied by the outputs of the previous nodes.
Figure 6: Result of the feed-forward step [1]
Figure 7: Backpropagation path up to output unit j [1]
We extended the network to calculate the error function, so for the output layer we use that derivative as an input, as seen in figure 7.
Backpropagation for the hidden layer(s) acts in the same way, using the values from the output layer as its input.
The final step is weight updating. The formula for updating the weight w_ij (the weight between node i and node j) is ∆w_ij = −γ o_i δ_j, where γ is the learning rate, o_i is the output from node i, and δ_j is the error from node j.
A possible variation is the inclusion of a momentum variable η. This can help make the learning rate more stable: ∆w_ij(t) = −γ o_i δ_j + η ∆w_ij(t − 1)
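The update formula with momentum can be sketched as a small helper method. The method and variable names are my own, following the symbols above:

```java
// Sketch of the weight-updating step (step 4), including the optional momentum term.
public class WeightUpdate {
    // Delta w_ij(t) = -gamma * o_i * delta_j + eta * Delta w_ij(t-1)
    static double weightDelta(double gamma, double outputI, double deltaJ,
                              double eta, double previousDelta) {
        return -gamma * outputI * deltaJ + eta * previousDelta;
    }

    public static void main(String[] args) {
        double w = 0.4;                 // current weight w_ij
        double gamma = 0.5, eta = 0.9;  // learning rate and momentum
        // One update with o_i = 1.0 and delta_j = 0.2 (no previous change):
        double d1 = weightDelta(gamma, 1.0, 0.2, eta, 0.0);  // -0.1
        w += d1;
        // The next update reuses d1 through the momentum term:
        double d2 = weightDelta(gamma, 1.0, 0.2, eta, d1);   // about -0.19
        w += d2;
        System.out.println(w);          // roughly 0.11
    }
}
```

The momentum term keeps successive updates moving in a consistent direction, which damps oscillation when the gradient changes sign frequently.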

Resilient Propagation
A promising alternative to backpropagation is resilient propagation (often referred to as RPROP), originally proposed by Martin Riedmiller and Heinrich Braun in 1992. Instead of updating the weights based on how large the partial derivative of the error function is, the weights are updated based on whether the partial derivative is positive or negative.
First, the change for each weight is updated based on whether the derivative has changed signs. If such a change has occurred, that means the last update was too large, and the algorithm has passed over a local minimum. To counter this, the update value will be decreased. If the sign stays the same, then the update value is increased.
Typically, η+ is assigned a value of 1.2, and η− is assigned a value of 0.5.
Once the update value is determined, the sign of the current partial derivative is considered. In order to bring the error closer to 0, the weight is decreased if the partial derivative is positive, and increased if it is negative.
At the end of each epoch, all of the weights are updated according to the sign of the current partial derivative: ∆w_ij(t) = −∆_ij(t) if ∂E/∂w_ij(t) > 0; +∆_ij(t) if ∂E/∂w_ij(t) < 0; and 0 otherwise. The exception to this rule is if the partial derivative has changed signs; then the previous weight change is reversed. According to Riedmiller and Braun, "due to that 'backtracking' weight-step, the derivative is supposed to change its sign once again in the following step. In order to avoid a double punishment of the update-value, there should be no adaptation of the update-value in the succeeding step." In most cases, the update value is limited to a specific range, with an upper limit of ∆_max = 50.0 and a lower limit of ∆_min = 1e−6.
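These update rules can be sketched for a single weight as follows. The names are illustrative, and the 'backtracking' step for sign changes is omitted for brevity:

```java
// Sketch of the RPROP update rules for one weight (illustrative, not Encog code).
public class RpropStep {
    static final double ETA_PLUS = 1.2, ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0, DELTA_MIN = 1e-6;

    // Adapt the update value based on the signs of the previous and current gradient.
    static double adaptUpdateValue(double prevGrad, double grad, double updateValue) {
        double product = prevGrad * grad;
        if (product > 0) return Math.min(updateValue * ETA_PLUS, DELTA_MAX);  // same sign: grow
        if (product < 0) return Math.max(updateValue * ETA_MINUS, DELTA_MIN); // sign change: shrink
        return updateValue;                                                   // one gradient is zero
    }

    // The weight moves opposite to the sign of the gradient, by the update value.
    static double weightDelta(double grad, double updateValue) {
        if (grad > 0) return -updateValue;
        if (grad < 0) return updateValue;
        return 0.0;
    }

    public static void main(String[] args) {
        double update = 0.1;
        update = adaptUpdateValue(0.3, 0.2, update);   // same sign: grows to ~0.12
        double dw = weightDelta(0.2, update);          // positive gradient: ~-0.12
        System.out.println(update + " " + dw);
    }
}
```

Note that only the sign of the gradient is consulted when moving the weight; its magnitude influences nothing but the sign comparison, which is the key difference from backpropagation.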
Riedmiller and Braun tested RPROP against several other popular algorithms: backpropagation (BP), SuperSAB (SSAB), and Quickprop (QP), on benchmark problems including 10-5-10, 12-2-12, and 9 Men's Morris [2]. (The table of results is not reproduced here.)

Linear
One of the simpler activation functions is the linear function: f(x) = x. This activation is very simple and isn't used very often. The input is transferred directly to the output without being modified at all. Therefore, the output range is R. The derivative of this activation function is f'(x) = 1.
A variation on this is the ramp activation function. This function has an upper and lower threshold, where all values below the lower threshold are assigned a certain value and all values above the upper threshold are assigned a different value (0 and 1 are common). The result is something similar to the step function used in the perceptron, but with a linear portion in the middle instead of a disjuncture.

Sigmoid
One of the more common activation functions is the sigmoid function, f(x) = 1/(1 + e^−x). A sigmoid function maintains a shape similar to the step function used in perceptrons (with horizontal asymptotes at 0 and 1). However, the smooth curve of the sigmoid means that it is a differentiable function, so it can be used in backpropagation (which requires an activation function to have a derivative). The output range of this activation function is 0 to 1. The derivative of this activation function is f'(x) = f(x)(1 − f(x)).
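A minimal sketch of the sigmoid and its derivative follows; note how conveniently the derivative is computed from the function's own output:

```java
// The sigmoid function and its derivative, f'(x) = f(x)(1 - f(x)).
public class Sigmoid {
    static double f(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
    static double df(double x) { double y = f(x); return y * (1.0 - y); }

    public static void main(String[] args) {
        System.out.println(f(0.0));   // 0.5, the midpoint between the asymptotes
        System.out.println(df(0.0));  // 0.25, the maximum slope
    }
}
```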

Hyperbolic Tangent
The hyperbolic tangent function has a similar shape to the sigmoid function.
However, its lower horizontal asymptote is at -1 instead of 0. This may be more useful with some data sets, where use of a sigmoid activation function does not produce any negative numbers. The output range of this activation function is -1 to 1. The derivative of this activation function is f'(x) = 1 − f(x)².
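A corresponding sketch for the hyperbolic tangent, whose derivative is 1 − f(x)²:

```java
// The hyperbolic tangent and its derivative, f'(x) = 1 - f(x)^2.
public class TanhDemo {
    static double f(double x) { return Math.tanh(x); }
    static double df(double x) { double y = Math.tanh(x); return 1.0 - y * y; }

    public static void main(String[] args) {
        System.out.println(f(0.0));    // 0.0
        System.out.println(f(-5.0));   // close to -1: negative outputs are possible
        System.out.println(df(0.0));   // 1.0
    }
}
```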

Elliott
Elliott activation functions were originally proposed by David L. Elliott in 1993 as more computationally efficient alternatives to the sigmoid and hyperbolic tangent activation functions. [3] Encog provides two such activation functions: Elliott and Symmetric Elliott.
In all of the cases below, s is the slope, which has a default value of 1 (although this can be changed).
The Elliott activation function serves as an alternative to the sigmoid activation function: f(x) = 0.5(x·s)/(1 + |x·s|) + 0.5. Just as with the sigmoid activation function, this produces an output range of 0 to 1. The derivative of this activation function is f'(x) = s/(2(1 + |x·s|)²). Heaton Research (the company that makes the Encog library) provided some interesting statistics on the efficiency of this activation function [4]:
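A small sketch comparing the two functions (my own illustration, not Encog code); the Elliott function involves no call to Math.exp, which is where its speed advantage comes from:

```java
// Comparing the Elliott activation function with the sigmoid.
public class ElliottDemo {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // f(x) = 0.5*(x*s) / (1 + |x*s|) + 0.5, with slope s defaulting to 1
    static double elliott(double x, double s) {
        return 0.5 * (x * s) / (1.0 + Math.abs(x * s)) + 0.5;
    }

    public static void main(String[] args) {
        // Both map the real line onto (0, 1) and cross 0.5 at the origin.
        System.out.println(elliott(0.0, 1.0));   // 0.5
        System.out.println(elliott(5.0, 1.0));   // ~0.917: approaches 1 more slowly
        System.out.println(sigmoid(5.0));        // ~0.993
    }
}
```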

Activation Function   Total Training Time   Avg Iterations Needed
TANH                  6,168ms               474
ElliottSymmetric      2,928ms               557

According to the javadoc comments for the two classes, these activation functions approach their horizontal asymptotes more slowly than their traditional counterparts, so they "might be more suitable to classification tasks than predictions tasks".

How Encog Stores Weights
In its simplest form, Encog stores the weights for a neural network in an array of doubles inside a FlatNetwork object. As seen in Figure 13, the order of the weights is determined by a combination of the reversed order of the layers and the regular order of the nodes (with the biases being the last node in a layer, if applicable). For example, the network in Figure 13 consists of an input layer of 2 nodes and a bias, a hidden layer of 3 nodes and a bias, and an output layer of 1 node. The first 4 weights in the array are the weights going from the hidden layer to the output layer (weights[0] connects h1n0 to o0, weights[1] connects h1n1 to o0...). The next 3 weights connect the input layer to the first hidden node (weights[4] connects i0 to h1n0, weights[5] connects i1 to h1n0...). This continues in this fashion until the final weight in the array, weights[12], which connects the input bias node to the last regular hidden node (i2 to h1n2).
To access all of the weights at once, the BasicNetwork class provides a dumpWeights() method. It may also be useful to use the weightIndex array from the FlatNetwork, which indicates where in the weights array each layer starts.
Alternatively, the BasicNetwork class has a getWeight() method, which allows a user to access the weight from one specific node to another. This is the method that I utilized in my implementation.
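The weight ordering described above can be sketched as a small indexing function. This is an illustrative reconstruction of the layout in Figure 13, not code from Encog:

```java
// Reconstruction of the flat weight-array layout described above (illustrative).
public class WeightLayout {
    // layerCounts includes bias nodes, listed from input to output,
    // e.g. {3, 4, 1} for the network in Figure 13 (2+bias, 3+bias, 1).
    // Returns the flat index of the weight from node 'from' in layer 'layer'
    // to node 'to' in layer 'layer'+1, matching the reversed-layer ordering.
    static int weightIndex(int[] layerCounts, int layer, int from, int to) {
        int index = 0;
        // Skip the weight blocks of all layer pairs closer to the output.
        for (int l = layerCounts.length - 2; l > layer; l--) {
            int targets = (l + 1 == layerCounts.length - 1)
                    ? layerCounts[l + 1]        // output layer: every node is a target
                    : layerCounts[l + 1] - 1;   // bias nodes receive no incoming weights
            index += layerCounts[l] * targets;
        }
        return index + to * layerCounts[layer] + from;
    }

    public static void main(String[] args) {
        int[] counts = {3, 4, 1};
        System.out.println(weightIndex(counts, 1, 0, 0)); // h1n0 -> o0   : 0
        System.out.println(weightIndex(counts, 0, 0, 0)); // i0   -> h1n0 : 4
        System.out.println(weightIndex(counts, 0, 2, 2)); // bias -> h1n2 : 12
    }
}
```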

Training
Encog has several different ways to train networks. For the purpose of this project, we will focus on propagation training.

MLTrain
The class hierarchy is: MLTrain → BasicTraining → Propagation → Backpropagation / ResilientPropagation. Most of the training is done through the Propagation.iteration() method, which calls several helper methods. There are two different versions of this method: a default version and a version that accepts the number of iterations as a parameter.
In order to do a single iteration, the default form of the method calls the alternate version and passes 1 as a parameter.
The first method to be invoked is BasicTraining.preIteration(). This method increments a counter called iteration, which keeps track of the current iteration.
It also calls upon the preIteration() method for any strategies that may be in use.
Strategies are additional methods of training that may be used to enhance the performance of a training algorithm. The ResilientPropagation class doesn't use any, but the Backpropagation class allows for the use of two strategies: SmartLearningRate and SmartMomentum. These strategies will be used to attempt to calculate the learning rate and momentum if they have not been specified upon creation of the network. However, since both of these variables are assigned values by the user (with a default momentum of 0 if the use of that variable is not desired), training strategies are not used in the implementation of this project.
The next method to be invoked is Propagation.rollIteration(). However, the use of this method is superfluous. While the BasicTraining class has a variable which keeps track of the current iteration, the Propagation class has its own copy of that same variable (rather than inheriting the value from its parent class). The rollIteration() method increments this duplicate variable. Unfortunately, whereas the variable in the BasicTraining class is utilized in accessor and mutator methods, the same variable in the Propagation class is not used anywhere outside of the rollIteration() method.
Following this is the Propagation.processPureBatch() method (large data sets may want to make use of the processBatches() method, which uses a portion of the training set rather than the entire thing). This in turn calls upon Propagation.calculateGradients() and Propagation.learn().
Propagation.calculateGradients() iterates through the network and calculates the gradient at each portion of the network (for more information, see section 2.1.1). This is done through the GradientWorker class. The advantage of this is that it allows for multithreaded calculations. Different portions of the network that don't rely on each other (for example, nodes in the same layer do not have any weights connecting them) can be calculated in parallel using an array of GradientWorkers.
This project only uses single-threaded calculations, so the array has a size of 1. The weight updates performed by Propagation.learn() differ depending on the training algorithm (see section 2.1 for more information), so this is an abstract method, with each child class having its own implementation.
The last method to be used is BasicTraining.postIteration(). This method calls upon the postIteration() method for any strategies if applicable. The ResilientPropagation class has its own postIteration() method, which stores the error in the lastError variable, because RPROP uses this to check for sign changes in the error during subsequent iterations.

Some Individual Classes
The following sections will go into detail about how I used some of the classes from the Encog library. It wasn't feasible to describe all of the classes used in the program, but these seven were the most important.

TrainingSetUtil
The main method that I used was loadCSVTOMemory(). This method takes a CSV file and loads it into a ReadCSV object. Then, that object is converted into something that I could use for training: an MLDataSet. There were two problems I encountered when importing CSV files: incomplete data entries were giving undesired results when training, and there was no way to preserve the column headers.
It is not uncommon to have data sets with entries that don't have values for all columns (especially when dealing with data obtained through real world observations). These values can lead to undesired results if used for training, so I wanted to discard those entries in their entirety. Thankfully, attempting to load empty data values throws a CSVError exception (Error:Unparseable number), so I was able to surround that part of the code with a try-catch statement. Inside the catch portion, I decided not to print out the stack trace because that information wasn't very useful. However, I did increment a counter I had created called ignored, which would then be printed to the console at the conclusion of the importing process.
For the column headers, I needed to create a new data member to store them. The information from the .CSV file is loaded into a ReadCSV object. If the .CSV file has headers (as specified through a boolean), these are stored in an ArrayList of Strings, which can be accessed through a getColumnNames() method in that class.
However, there is no way to access that ReadCSV object after the initial importing process is completed. Thus, I needed to add some additional functionality to the TrainingSetUtil class.
Inside the loadCSVTOMemory() method, I added a simple statement to store the headers in the data member that I had defined above. I also converted the ArrayList into a standard array, because I am more comfortable accessing information in that format.

BasicNetwork
BasicNetwork (org.encog.neural.networks.BasicNetwork) serves as the main source of interaction between my implementation and the network itself. However, this doesn't necessarily mean that this class does most of the work. Much of the information is stored in related classes (for example, once the format of the network is set up, the majority of information about the network is stored in a FlatNetwork object).
Before a network can be used, its structure must be defined. For this purpose, the BasicNetwork class uses a NeuralStructure data member named structure. To set up this structure, each layer must be added through the use of the addLayer() method. Each layer passed through the parameters will be added to an ArrayList of Layer objects. The first layer added will be considered the input layer.
Once all of the layers are added, the network must be finalized by invoking structure.finalizeStructure(). Finalizing a neural structure eliminates the intermediate representation of the layers, temporarily storing that information in FlatLayer objects, and then creating the FlatNetwork object which will be used in the remaining network operations.
Once the network is finalized, the reset() method is invoked, which assigns random starting values to the weights.
The actual network training is done through the Propagation class (an abstract class which serves as a parent for classes such as Backpropagation and ResilientPropagation). The BasicNetwork object is passed as a parameter, as well as the training data set and any other necessary variables (such as the learning rate and momentum if applicable).
Once the network is fully trained, its effectiveness can be measured by use of the compute() method. This is used to compare each ideal output value with the output value the network produces when given the same input.

FlatNetwork
The FlatNetwork class (org.encog.neural.flat.FlatNetwork) is a more computationally efficient form of a neural network, designed to store everything in single arrays instead of keeping track of everything in multiple layers. Layers are maintained through the use of index arrays, which indicate where each layer starts in the main arrays. According to the javadoc comments, "this is meant to be a very highly efficient feedforward, or simple recurrent, neural network. It uses a minimum of objects and is designed with one principal in mind--SPEED. Readability, code reuse, object oriented programming are all secondary in consideration".
In concept, FlatNetwork objects act similarly to BasicNetwork objects from the standpoint of the user, for they share many of the same methods. However, most of the calculations (such as training) are actually done in this class (the BasicNetwork class invokes methods from here). The speed increase comes from the use of single-dimensional arrays of doubles and ints, which have faster access times than using accessor and mutator methods across multiple classes.

BasicLayer
The BasicLayer class (org.encog.neural.networks.layers.BasicLayer) is an implementation of the Layer interface. Its job is to store information about the specific layer it is assigned (input, hidden, or output) during the creation of the network. Once the network has been finalized, specific layers are no longer used.
The class has two constructors: one which has user defined parameters (activation function, bias, and neuron count), and one which just receives the number of neurons in the layer. If the second constructor is used, the default option is to create a layer which has a bias and uses a hyperbolic tangent activation function.
Hidden layers utilize all three variables when being initialized. Input layers do not have activation functions. Bias nodes are stored in the layer prior to where they will have an impact (a bias node which affects the nodes in the hidden layer will be declared as part of the input layer), so output layers should not have a bias.
Each layer also has a data member which indicates which network the layer is a part of.

BasicMLDataSet
The BasicMLDataSet class (org.encog.ml.data.basic.BasicMLDataSet) isn't a very complicated class, but it is very important. An implementation of the more general MLDataSet interface, the main purpose of this class is to maintain an ArrayList of BasicMLDataPair objects. This is what the training data set will be stored in.
The class contains several constructors, able to create an object from multidimensional double arrays, an MLDataSet object, or an ArrayList of MLDataPair objects.
The rest of the class contains several add methods, as well as methods to retrieve data entries or information about the data set (such as its size).

BasicMLDataPair
The BasicMLDataPair class (org.encog.ml.data.basic.BasicMLDataPair) implements the MLDataPair interface. Its purpose is to hold the information of a single data entry. Each BasicMLDataPair contains two MLData objects, arrays of doubles designed to store the input data and the ideal data respectively. Both values are necessary for supervised learning, but only the input value is required for unsupervised learning (the ideal value should be left null).

ActivationFunction
ActivationFunction (org.encog.engine.network.activation.ActivationFunction) is an interface that serves as a parent class for any activation function that would be used with a neural network. The library comes with sixteen activation functions already implemented, but users are free to implement their own as long as they include all of the methods in the interface.
The two most important methods are activationFunction() and derivativeFunction(). The activationFunction() method is the main math portion of the activation function. The input values are stored in the double array d, with the range of values specified by the variables start and size. After some mathematical calculations, the output value from the activation function is stored in the same double array. For example, the ActivationSigmoid class replaces each value x in the range with 1/(1 + e^−x). The ActivationLinear class actually leaves this method blank. The linear activation function has outputs identical to its inputs, so there is no need to do anything with the array of doubles.
This method calculates the derivative of the activation function at a certain point. Not all activation functions have derivatives (there is another method called hasDerivative(), which will return true if the specific activation function has a derivative and false otherwise). However, there must be a derivative for an activation function to be used with backpropagation.
The method receives two doubles as parameters. The first double, b, is the original input number (in the activationFunction method, this number would have been in the d array). The second double, a, is the original output value; this is the value the activation function produces if it is given b as an input. Depending on the equation for each specific activation function, the derivative will be calculated with whichever value is more computationally efficient. For example, the ActivationSigmoid class uses the output value: return a * (1.0 - a); In contrast, the ActivationElliott class uses the input value: return (s*1.0)/(d*d); As of v3.3, all activation functions in the Encog library have derivatives, with the exception of ActivationCompetitive. Attempting to use this activation function in a backpropagation network will throw an EncogError exception ("Can't use the competitive activation function where a derivative is required").
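The equivalence of the two conventions can be checked numerically for the sigmoid; this is my own illustration (not Encog code), with the names a and b following the parameter names described above:

```java
// Checking that the sigmoid derivative computed from the output value a
// equals the derivative computed directly from the input value b.
public class DerivativeCheck {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Derivative from the output value, as in ActivationSigmoid: a * (1 - a)
    static double fromOutput(double a) { return a * (1.0 - a); }

    // Derivative from the input value, by differentiating 1/(1 + e^-x) directly
    static double fromInput(double b) {
        double e = Math.exp(-b);
        return e / ((1.0 + e) * (1.0 + e));
    }

    public static void main(String[] args) {
        double b = 0.7;          // original input
        double a = sigmoid(b);   // original output
        System.out.println(Math.abs(fromOutput(a) - fromInput(b)) < 1e-12);  // true
    }
}
```

Using whichever of a or b yields the cheaper expression is exactly the efficiency trade-off the interface is designed around.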

Overview
The purpose of this program is to train an artificial neural network and export source code for it. This will allow the results of the network to be used in other projects without needing to store it in a data structure.
All information is controlled through user input via a config file and a training data set. The program will output two files: a training error report, and the code for the network. The exact format of these outputs will be designated by the user.

Assumptions/Design Choices
Early in the design process, I decided that I was going to use a third-party external library to handle the actual training of the neural network. The purpose of this project was focused more on the source code generation for a neural network than on the training itself. Implementing the network training myself would have added considerable development time to this project. In addition, unless it were made the main focus of the project, a personal implementation would not be as effective as a third-party alternative, as the designers of said software have spent years optimizing their code. More information about the Java library Encog may be found in the previous chapter.
The only other major design decision was to restrict the training data set to numerical information. The program is only designed to be used with numbers for all data entries; using strings will result in rows being ignored when the .csv file is imported. For more information on this decision, see section 5.1.
The program also assumes that all inputs from the user are valid. As of now, there are very few debugging tools built into the program, so invalid data will result in the program not running.

Inputs
The program requires two inputs from the user: a config file containing all of the information required by the neural network, and a .csv file containing the training data set.

Config File
The only command line argument is the file path for a config file. This file can have any name, but it must have a .txt file extension. The config file contains the following information: • The complete file path for the training data set. This file will be described in detail in the next subsection.
• A boolean for whether or not the training data set file has a header row (true for yes, false for no).
• The number of input variables (how many columns in the training data set are independent variables).
• The number of output variables (how many columns in the training data set are dependent variables).
• The number of hidden layers the artificial neural network will be constructed with. There is no theoretical upper limit on the number of hidden layers this program can accommodate, although studies have shown that almost any problem can be solved with at most two hidden layers. [1]
• Attributes for each hidden layer:
-An integer for the type of activation function.
-An integer for the number of normal neurons in the layer.
• Attributes for the input layer (only bias information is needed).
• Attributes for the output layer (bias and activation function is needed).
• The file type for the first output file (the training error):
0. text (.txt)
1. just numbers (.csv)
• The name of the first output file (not including the file extension; the program will add that information internally).
• The file type for the second output file (the code for the artificial neural network).
• The name of the second output file (not including the file extension; the program will add that information internally).
• The desired training error. The network will continue to train until the error is less than or equal to this number.
• The maximum number of epochs. If the desired training error has not yet been reached, the network will stop training after this many iterations.
• An integer for the network type:
0. ResilientPropagation (see section 2.1.2)
1. Backpropagation (see section 2.1.1)
• The learning rate. This is only applicable for backpropagation networks.
• The momentum. The program will not use momentum if this value is set to 0. This is only applicable for backpropagation networks.
Comments can be made by beginning a line with a percent symbol (%). The methods related to importing the config file will ignore any such lines.
Rather than prompting the user for this information within the program, using a file allows all of the required information to be stored in one place. This also makes multiple uses of the program easier, because the user is able to change the value of a single variable without going through the tedious process of re-inputting all of the other data as well.
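Since the config file is read line by line and % lines are skipped, a config file presumably holds one value per line in the order listed above. The following is a purely hypothetical sketch; the file path, names, and exact value layout are assumptions for illustration, not output of the actual program:

```
% Hypothetical config file sketch (one value per line, % lines are comments).
% training data set path
C:\data\iris.csv
% data set has a header row
true
% number of input variables
4
% number of output variables
1
% number of hidden layers
1
% hidden layer 1: activation function type, then neuron count
1
5
% input layer: bias
true
% output layer: bias, then activation function type
true
1
% first output file: type (0 = .txt, 1 = .csv), then name
1
errorReport
% second output file: type, then name
1
IrisNet
% desired training error
0.01
% maximum number of epochs
5000
% network type (0 = ResilientPropagation, 1 = Backpropagation)
0
% learning rate (backpropagation only)
0.7
% momentum (backpropagation only; 0 = no momentum)
0.0
```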

Training Data Set
The other primary input is the training data set. As mentioned in the previous subsection, the file path for this file is given as part of the config file rather than as a command line argument.
The training data set must conform to the following specifications:
• It must be in comma-separated values format (a .csv file).
• Headers are optional. If they are included, the code that the program exports will use the column names as identifiers for variables.
• If possible, do not include any rows with blank entries in them. These rows will be discarded when the .csv file is imported, and therefore not used for training purposes.
• The .csv file shall be organized so that the independent variables (input) are on the left, while the dependent variables (output) are on the right.
• All of the data entries must be numerical. At this time the program does not support categorical classification.
Currently, the program uses the same data set for both training and testing.
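As an illustration, a small iris-style training data set meeting these specifications might look like the following; the column names, values, and numerical species coding are hypothetical:

```
sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,1
7.0,3.2,4.7,1.4,2
6.3,3.3,6.0,2.5,3
```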

Outputs
The program has two separate output files: one file containing the training error report, and one file containing the code of the neural network.

Training Error Report
The first output file contains information about the training error. This is the overall error (how far the actual results are from the ideal results) after each iteration of training.
The exact format of this file can be specified by the user in the config file.
Currently, there are two possible formats.
If the user selects option 0, the output will be in a .txt file:
Figure 17: Sample from first output file (.txt)
If the user selects option 1, the output will be in a .csv file. This will have a header row, and can be loaded into other programs for analysis (such as graphing):
Figure 18: Sample from first output file (.csv)

Source Code
The second output file contains the source code for the trained neural network.
Regardless of what file format this output is in, there will be two main sections to it: variable declaration, and network calculation.
The variable declaration section is a list of all the variables that will be used in the network calculation section, as well as any default values (such as 1.0 for biases). I decided upon the following naming conventions for variables:
• i -Input layer (starts at 0)
• h -Hidden layer (starts at 1)
• o -Output layer (starts at 0)
• n -Number of the node within a layer (starts at 0)
• f -Indicates which node from the previous layer the link originates from (starts at 0)
• t -Total (the sum of all f values leading to a specified node), before being fed to the activation function.
• Lack of an f or a t indicates that this value is the output from an activation function (or a bias node).
If there are headers present in the input file, these will be included as input and output variable names.
The network calculation section is where the specific weight values are utilized in order to write equations that map the specified input values to output values.
This allows the function of the trained network to be maintained without needing to store the information in a data structure.
The exact format of this file can be specified by the user in the config file.
Currently, there are two possible formats.
If the user selects option 0, the output will be in a .txt file. Variable declarations will just consist of names, and the network calculation section will just be mathematical formulas.
If the user selects option 1 or 2, the output will be in a .java file. Variables will all be declared as doubles (original input and final output variables will be public, and all others will be private). The network calculation section will be inside a method. Everything will also be inside of a class (which shares the same name as the file itself, as specified by the user in the config file).
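To make the conventions concrete, here is a rough sketch of what a generated .java file might look like for a tiny hypothetical network (one input, one tanh hidden node, one linear output). The class name, the exact concatenation of the i/h/o/n/f/t name parts, the bias node name, and all weight values are assumptions for illustration, not actual program output:

```java
import java.lang.Math;

// Hypothetical generated class; all names and weight values are illustrative.
class IrisNet {
    public static double i0n0 = 0.0;      // input layer, node 0
    private static double i0b = 1.0;      // input layer bias node (assumed name)
    private static double h1n0f0 = 0.5;   // weight: input node 0 -> hidden node 0
    private static double h1n0f1 = 0.3;   // weight: input bias node -> hidden node 0
    private static double h1n0t = 0.0;    // total fed to the activation function
    private static double h1n0 = 0.0;     // activation output of hidden node 0
    private static double o0n0f0 = -1.2;  // weight: hidden node 0 -> output node 0
    public static double o0n0 = 0.0;      // output layer, node 0

    public static void calcNet() {
        h1n0t = i0n0 * h1n0f0 + i0b * h1n0f1;
        h1n0 = Math.tanh(h1n0t);
        o0n0 = h1n0 * o0n0f0;
    }
}
```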

Individual classes
The program itself (not counting the modified Encog library) currently consists of five classes. This number may grow in the future if more output source code types are implemented.

NeuralGenerator
The NeuralGenerator class is the largest class in the program. Most of the work happens here.
The variable declarations are mostly self-explanatory, so they will not be discussed here. The comments for each variable can be viewed in Appendix A.1.
After the initial setup, the first thing the program does is import data from the config file, through the validateConfig() method. This method goes through the config file line by line (through the use of a helper method, nextValidLine(), which ignores any lines that are commented out, as designated by a '%' at the beginning of a line). All information from the config file is stored into data members so it can be accessed by other methods, and is then printed out to the console.
The initializeOutput1() method is called, which creates the first output file.
This file will contain the training error report. For more information, see section 4.4.1.
The next method to be invoked is createNetwork(). This method creates a BasicNetwork, and populates it with an input layer, hidden layers, and an output layer. The information for each layer (activation function, bias, and number of nodes) is specified by LayerInfo objects, which in turn are defined by the information in the config file. Once all of the layers are added, the network is finalized, and the weights are reset to random values.
Next, the training data set is created from the .csv file. If there are headers, these are stored in an ArrayList (the information is then stored in a String array, because I prefer working with that format).
Then, the network is trained. The two current options for training utilize either the Backpropagation class or the ResilientPropagation class (for more information, see sections 2.1.1 and 2.1.2 respectively). After each iteration of training, the training error is calculated, and written to a file through the writeOne() method. This helper method also prints the information to the console. Training will continue until the training error is below the desired amount, or until the maximum number of epochs has been reached.
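With Encog 3, the network construction and training loop described above might be sketched as follows. This requires the Encog library on the classpath; the layer sizes, activation choices, and stopping values here are illustrative placeholders, whereas the real program builds them from the config file:

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// Build the network layer by layer, then finalize and randomize weights.
BasicNetwork network = new BasicNetwork();
network.addLayer(new BasicLayer(null, true, 4));                      // input layer, biased
network.addLayer(new BasicLayer(new ActivationSigmoid(), true, 5));   // hidden layer
network.addLayer(new BasicLayer(new ActivationSigmoid(), false, 1));  // output layer
network.getStructure().finalizeStructure();
network.reset();

// Train with RPROP until the error target or the epoch limit is reached.
MLDataSet trainingSet = new BasicMLDataSet(inputRows, idealRows);
ResilientPropagation train = new ResilientPropagation(network, trainingSet);
int epoch = 0;
do {
    train.iteration();
    epoch++;
} while (train.getError() > 0.01 && epoch < 5000);
train.finishTraining();
```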
Once the network is trained, the first file is closed. The program prints the results of the network to the console, comparing the actual value of each input to its ideal value.
Finally, the initializeOutput2() method is invoked. This method creates the code output file (see section 4.4.2), and stores the necessary values in variables through accessor methods in the OutputWriter class. The program flow then proceeds to the writeFile() method of the desired OutputWriter child class, and then the program terminates.

LayerInfo
LayerInfo is a small class created to store the information needed to create a layer in an artificial neural network. I had originally planned on using a struct, but Java does not support structs, so I decided to make a separate class to hold the same information.
The class has 3 main variables:
• private int activationFunction -An integer for the type of activation function for that layer.
• private boolean isBiased -A boolean for whether or not the layer has a bias node.
• private int neurons -An integer for the number of normal (non-bias) neurons in the layer.
All of these variables are set through parameters passed to the constructor.
There should not be a need to change these values once they have been set, so there are no mutator methods. Each variable has an accessor method so that its value can be used by the rest of the program.
The only other method is the toString() method. This method is used for returning the information from the layer in an easy-to-read format, so that it can be printed. While not essential to the flow of the program, it may be useful for the user to see this information displayed in the console (especially for debugging purposes).
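Based on the description above, a minimal sketch of LayerInfo might look like this; the accessor method names and the toString() format are assumptions, since only the fields are specified in the text:

```java
// Minimal sketch of the LayerInfo class described above.
class LayerInfo {
    private int activationFunction; // activation function type for the layer
    private boolean isBiased;       // whether the layer has a bias node
    private int neurons;            // number of normal (non-bias) neurons

    // All values are set once through the constructor; no mutators exist.
    public LayerInfo(int activationFunction, boolean isBiased, int neurons) {
        this.activationFunction = activationFunction;
        this.isBiased = isBiased;
        this.neurons = neurons;
    }

    public int getActivationFunction() { return activationFunction; }
    public boolean getIsBiased() { return isBiased; }
    public int getNeurons() { return neurons; }

    // Easy-to-read summary for console output and debugging.
    @Override
    public String toString() {
        return "LayerInfo[activation=" + activationFunction
             + ", biased=" + isBiased + ", neurons=" + neurons + "]";
    }
}
```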

OutputWriter
The OutputWriter class serves as a parent class for other OutputWriters. This class holds all of the shared methods required to create a file and output the code/formula for a trained artificial neural network.
The createFile() method creates the file used for output. While the majority of the code in this method is the same in all child classes, I found that it was easier to have each child class add its own file extension to the file name (.txt or .java).
The writeFile() method is rather lengthy. This is where the actual program writes the code/formula for the neural network to a file. While similar in terms of basic structure, the actual details of this will vary with each child class.
The parseActivationFunction() method parses the equation of the activation function and returns it in String form. A series of if-else statements allows 14 of the 16 currently implemented activation functions to be used (Softmax is rather complicated and would require the code to be reworked, and Competitive is nondifferentiable, so I did not see a need to include it).
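The kind of if-else mapping parseActivationFunction() performs might be sketched as follows; the integer codes, the method signature, and the covered functions here are illustrative assumptions, not the program's actual 14-branch implementation:

```java
class ActivationParser {
    // Returns the String form of an activation function applied to an
    // input expression (e.g., a "t" variable name from the generated code).
    // The integer codes below are hypothetical examples.
    static String parseActivationFunction(int type, String input) {
        if (type == 0) {
            return "1.0 / (1.0 + Math.exp(-(" + input + ")))"; // sigmoid
        } else if (type == 1) {
            return "Math.tanh(" + input + ")";                 // hyperbolic tangent
        } else if (type == 2) {
            return input;                                      // linear
        }
        throw new IllegalArgumentException("Unsupported activation type: " + type);
    }
}
```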

OutputWriterTxt
The OutputWriterTxt class is a child of the OutputWriter class. This class will be used if the user selects option 0 for the second output file.
The createFile() method creates the second output file, using the filename as specified in the config file and appending a .txt file extension.
The variable declarations section gives the names of all the variables to be used in the network calculation section, in the following order:
• Header names for input (if applicable).
• Header names for output (if applicable).
If there are any bias nodes present, they are assigned the value of the bias as defined in the network (the default is 1.0, but this value is customizable).

OutputWriterJava
The OutputWriterJava class is a child of the OutputWriter class. This class will be used if the user selects option 1 or 2 for the second output file.
The createFile() method creates the second output file, using the filename as specified in the config file and appending a .java file extension.
The format of the .java file was inspired by the output of the program Tiberius.
One of the original intents of this program was to be used in a course that currently uses Tiberius, so it made sense to model the output file in a way that it would be compatible.
The first thing written to the file is an import statement for java.lang.Math, followed by the declaration of the class (with the same name as the file).
The variable declarations section declares all of the variables to be used in the network calculation section, in the same order as specified in the previous subsection. All methods are static so that they can be accessed from the main method without creating a specific object of this class type, so all variables are declared as static doubles. Most variables are private, but the input and output variables (as well as any variable names defined by headers in the training data set) are declared as public so that they can be accessed by the user in other classes. If there are any bias nodes present, they are assigned the value of the bias as defined in the network (the default is 1.0, but this value is customizable). Bias nodes are also always declared as private, even if they are in the input layer.
If the user has chosen to make a standalone java file, there will be two additional methods: main() and initData(). The main() method will call the other two methods (initData() and calcNet()), and then print the output values to the console. The initData() method will provide default values for the input variables (using the header names if applicable). The default values are currently set to 1, although these can be modified by the user. If the user has selected to make an integrated java file, neither of these two methods will be present.
The calcNet() method contains the network calculation section of the code. The code generation of this section is almost identical to the equivalent in the OutputWriterTxt class, except that every line ends with a semicolon.

Categorical classification
As of right now, the program only works with data sets with entirely numerical entries. This means that any data sets with categorical entries will need to be changed into a numerical representation before they can be used. For example, with the iris data set, instead of species of setosa, versicolor, and virginica, it would use numbers such as 1, 2, and 3.
My original concept was to start out with numerical classification first, because it is easier to work with, and then expand to include categorical if I had time.
However, I discovered late in the implementation period that in order to use categorical data sets, I would have to completely change how the network itself was implemented. Within Encog, categorical classification uses different classes than numerical classification does.
As of the writing of this paper, I do not know if I can use those classes to work with numerical data sets, or if I would have to make a separate main class for the different types.

Additional output formats
Originally, this project was going to be written in C++, because of the applicability of that language to the gaming industry (where I want to work) [1].
However, it was changed to Java because of that language's portability (there is no need to compile the code for different systems, because it always runs within the Java Virtual Machine).
Currently, the second output file from the program can be in either basic equational format (in a .txt file) or in Java code. Given more time, I would have preferred to also allow C++ code to be exported. Given the nature of the program, it would be feasible to implement other target languages as well.

Non-command line inputs
Currently, the only input into the program is through a config file, which contains all of the necessary data that the program needs to run. While this can be easier for multiple runs (because the user does not need to repeatedly input the information each time), I recognize that it can be hard to set up the config file for the first time. Some users may also prefer to enter the information on a step by step basis.
The basic implementation of an alternate input method is not very complicated. Related to this additional input method would be improved config file debugging. Currently the program assumes that all of the data the user has inputted is valid. There is no checking to see whether a number is within a valid range (for example, a value of 17 for the activation function type). These numbers are checked in other places in the code, but it would be more useful for the user to have the numbers validated the first time they are encountered. If there is a major problem (for example, the program is expecting a number and the user has a string of text instead), a generic exception will be thrown, and the stack trace will be printed to the console.
Although these implementations are not very difficult, they were omitted because of time constraints.

Normalization
Depending on the data set being used, normalization can be an important feature in machine learning. For example, data sets with values for a certain variable that are much larger than other values may not converge as well during training as with a normalized data set. Encog supports several different types of normalization, but I would most likely be using range normalization due to the ability to normalize the data to a specific range (for example, -1 to 1 when using a hyperbolic tangent activation function, or 0 to 1 when using a sigmoid activation function).
The range normalization equation is:
x' = ((x - d_L)(n_H - n_L)) / (d_H - d_L) + n_L
In this equation, d_L is the minimum value (low) in the data set, d_H is the maximum value (high) in the data set, n_L is the desired normalized minimum, and n_H is the desired normalized maximum.
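Range normalization, which linearly maps a value x from the data range [d_L, d_H] to the desired range [n_L, n_H], can be sketched as a small helper method. This is not part of the current program (normalization is listed here as future work); the class and method names are assumptions:

```java
class RangeNormalizer {
    // Maps x from the data range [dL, dH] to the normalized range [nL, nH].
    static double normalize(double x, double dL, double dH, double nL, double nH) {
        return ((x - dL) * (nH - nL)) / (dH - dL) + nL;
    }
}
```

For example, normalizing 5.0 from the range [0, 10] into [-1, 1] for a hyperbolic tangent activation function yields 0.0.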

Conclusions
Overall, the program works. The best way to illustrate this is to walk through an example.