On Design and Implementation of an Embedded System for Neural-Machine Interface for Artificial Legs

According to limb loss statistics, there are over one million leg amputees in the US whose lives are severely impacted by their condition. To improve the quality of life of patients with leg amputations, many researchers have studied neural activities for intuitive prosthesis control. The neural signals collected from muscles are electromyographic (EMG) signals, which represent neuromuscular activities and are effective bioelectrical signals for expressing movement intent. EMG pattern recognition (PR) is a widely used method for characterizing EMG signals and classifying movement intent. The key to the success of neural-controlled artificial limbs is the neural-machine interface (NMI), which collects neural signals, interprets them, and makes accurate decisions for prosthesis control. This dissertation presents the design and implementation of a real-time NMI that recognizes user intent for control of artificial legs. To realize an NMI that leg amputees can carry in daily life, a unique integration of the hardware and software of the NMI on an embedded system has been proposed that is real-time, accurate, memory efficient, and reliable. The embedded NMI contains two major parts: a data collection module for sensing and buffering input signals, and a computing engine for fast execution of the user intent recognition (UIR) algorithm. The designed NMI has been completely built and tested as a working prototype. The system performance in real-time experiments on both able-bodied and amputee subjects recognizing multiple locomotion tasks has demonstrated the feasibility of a self-contained real-time NMI for artificial legs. One of the challenges for applying the designed PR-based NMI in clinical practice is the lack of practical system training methods. The traditional training procedure for the locomotion mode recognition (LMR) system is time consuming and manually conducted by experts.
To address this challenge, an automatic and user-driven training method for the LMR system is presented in this dissertation. In this method, a wearable terrain detection interface based on a portable laser distance sensor and an inertial measurement unit is applied to detect terrain changes in front of the prosthesis user. The identified terrain alterations, together with information about the current gait phase, can be used to automatically identify the transitions among various locomotion modes and to label the training data with movement classes in real time. Pilot experimental results on an able-bodied subject have demonstrated that this new method can significantly simplify the LMR training system and the training procedure without sacrificing system performance. Environmental uncertainty is another challenge for the design of an NMI for artificial limbs. EMG signals can be easily contaminated by noise and disturbances, which may degrade classification performance. The last part of the dissertation presents a real-time implementation of a self-recovery EMG PR interface. A novel self-recovery module, consisting of multiple sensor fault detectors and a fast linear discriminant analysis (LDA) based classifier retraining strategy, has been developed to immediately recover the classification performance from signal disturbances. The self-recovery EMG PR system has been implemented on a real-time embedded system. A preliminary experimental evaluation on an able-bodied subject has shown that the system can maintain high accuracy in classifying multiple movement tasks while motion artifacts are manually introduced. The results may propel the clinical use of EMG PR for multifunctional prosthesis control.


Abstract
The quality of life of leg amputees can be improved dramatically by using a cyber physical system (CPS) that controls artificial legs based on neural signals representing amputees' intended movements. The key to the CPS is the neural-machine interface (NMI), which senses electromyographic (EMG) signals to make control decisions. This paper presents the design and implementation of a novel NMI using an embedded computer system to collect neural signals from a physical system (a leg amputee), provide adequate computational capability to interpret such signals, and make decisions to identify the user's intent for prosthesis control in real time. A new deciphering algorithm, composed of an EMG pattern classifier and a post-processing scheme, was developed to identify the user's intended lower limb movements. To deal with environmental uncertainty, a trust management mechanism was designed to handle unexpected sensor failures and signal disturbances. Integrating the neural deciphering algorithm with the trust management mechanism resulted in a highly accurate and reliable software system for neural control of artificial legs. The software was then embedded in a newly designed hardware platform based on an embedded microcontroller and a graphics processing unit (GPU) to form a complete NMI for real time testing. Real time experiments on a leg amputee subject and an able-bodied subject were carried out to test the control accuracy of the new NMI. Our extensive experiments have shown promising results on both subjects, paving the way for clinical feasibility of neural controlled artificial legs.

Introduction
There are over 32 million amputees worldwide whose lives are severely impacted by their condition. This number is growing as the population ages and as the incidence of dysvascular disease increases. Over 75% of major amputations were lower-limb, with nearly 17% of lower-limb amputees suffering bilateral amputations [1].
Therefore, there is a continued need to provide this large and growing population of amputees with the best care and return of function possible.
With the rapid advances of cyber system technologies, it has in recent years become possible for high speed, low cost, and real time embedded computers to be widely applied in biomedical systems. The computerized prosthetic leg is one prominent example, in which motion and force sensors and a microcontroller embedded in the prosthesis form closed-loop control and allow the user to produce natural gait patterns [2][3]. However, the function of such a computerized prosthesis is still limited. This primitive prosthesis control is based entirely on mechanical sensing, without knowledge of user intent. Users have to "tell" the prostheses their intended activities manually or by using body motion, which is cumbersome and does not allow smooth task transitions. The fundamental limitation of all existing prosthetic legs is the lack of neural control that would allow the artificial legs to move naturally as if they were the patient's own limb.
This paper presents a novel neural-machine interface (NMI) that makes neural controlled artificial legs possible. The new NMI is a cyber physical system (CPS), in which a complex physical system (i.e., the neuromuscular control system of a leg amputee) is monitored and deciphered in real time by a cyber system. It senses neural control signals from leg amputees, interprets such signals, and makes accurate decisions for prosthesis control. The neural signals that our NMI senses and collects from leg amputees are electromyographic (EMG) signals, which represent neuromuscular activity and are effective biological signals for expressing movement intent [4]. EMG signals have been used in many engineering applications, such as an EMG-based power-assisted wheelchair [5], a biofeedback therapeutic manipulator for lower limb rehabilitation [6], a neuro-fuzzy inference system for identifying hand motion commands [7], and neural controlled artificial arms [8][9]. Previous research has shown that EMG was effective and clinically successful for artificial upper limbs [8][9]. However, no EMG-controlled lower limb prosthesis is currently available, and published studies in this area are very limited because of the following technical challenges.
First of all, in human physiological systems, EMG signals recorded from leg muscles during dynamic movements are highly non-stationary. Dynamic signal processing strategies [10] are required for accurate decoding of user intent from such signals. In addition, patients with leg amputations may not have enough EMG recording sites available for neuromuscular information extraction due to the muscle loss [10]. Maximally extracting neural information from such limited signal sources is necessary and challenging.
The second important challenge is that accuracy in identifying the user's intent is more critical for artificial legs than for upper limb prostheses. A 90% accuracy rate might be acceptable for control of artificial arms, but it may result in one stumble out of every ten steps, which is clearly inadequate for safe use of artificial legs.
Achieving high accuracy is further complicated by environmental uncertainty: perspiration, temperature changes, and movement between the residual limb and the prosthetic socket may cause unexpected sensor failures, influence the recorded EMG signals, and reduce the trustworthiness of the NMI [11]. It is therefore critical to develop a reliable and trustworthy NMI for safe use of prosthetic legs.
The third challenge is the compact and efficient integration of software and hardware in an embedded computer system in order to make EMG-based NMIs practical and available to patients with leg amputations. Such an embedded system must provide high speed, real time computation of the neural deciphering algorithm, because delayed decision-making from the NMI introduces instability and unsafe use of the prosthesis. Streaming and storing multiple sensor data, deciphering user intent, and running sensor monitoring algorithms at the same time pose a great challenge to the design of an embedded system for the NMI of artificial legs.
To tackle these challenges, a neural interfacing algorithm has been developed that takes inputs from multiple EMG electrodes mounted on a user's lower limb, decodes the user's intended lower limb movements, and monitors sensor behaviors based on trust models. Our EMG pattern recognition (PR) algorithm, together with a post-processing scheme, effectively processes the non-stationary EMG signals of leg muscles so as to accurately decipher the user's intent. The neural deciphering algorithm consists of two phases: offline training and online testing. To ensure the trustworthiness of the NMI in an uncertain environment, a real time trust management (TM) module was designed and implemented to examine changes in the EMG signals and estimate the trust level of individual sensors. The trust information can be used to reduce the impact of untrustworthy sensors on system performance. The new deciphering algorithm was implemented on an embedded hardware architecture as an integrated NMI to be carried by leg amputees. The two key requirements for the hardware architecture were high speed processing of the training computation and real time processing of the interfacing algorithm. To meet these requirements, the newly designed embedded architecture consists of an embedded microcontroller, a flash memory, and a graphics processing unit (GPU). The embedded microcontroller provides the necessary interfaces for AD/DA signal conversion and processing, as well as the computation power needed for real time control. The control algorithm was implemented on the bare machine with our own memory and IO management, without using an existing OS, to avoid unpredictability and variable delays. The flash memory was used to store training data. The EMG PR training process involves intensive signal processing and numerical computations, which need to be done periodically.

Identification of User Intent
A dynamic EMG pattern classification strategy and post-processing methods were developed in this study for high decision accuracy.
EMG Signals: EMG signals recorded from the gluteal and thigh muscles of the residual limb were considered.
EMG Features: Four time-domain (TD) features [12] (the mean absolute value, the number of zero crossings, the waveform length, and the number of slope sign changes) were selected for real-time operation because of their low computational complexity [9] compared to frequency-domain or time-frequency-domain features. The detailed equations and descriptions of these four TD features can be found in [12].
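As an illustration, the four TD features can be computed per channel and per analysis window as sketched below (Python/NumPy; the `dead_zone` threshold used to suppress noise-induced zero crossings and slope sign changes is an illustrative assumption, not a value from this work):

```python
import numpy as np

def td_features(x, dead_zone=0.01):
    """Compute the four time-domain features of [12] for one
    analysis window of a single EMG channel.

    x: 1-D array of EMG samples; dead_zone: illustrative amplitude
    threshold that suppresses noise-induced crossings.
    """
    x = np.asarray(x, dtype=float)
    diff = np.diff(x)

    # 1) Mean absolute value
    mav = np.mean(np.abs(x))

    # 2) Number of zero crossings (with dead-zone threshold)
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(diff) >= dead_zone))

    # 3) Waveform length: cumulative length of the signal path
    wl = np.sum(np.abs(diff))

    # 4) Number of slope sign changes (with dead-zone threshold)
    d1, d2 = diff[:-1], diff[1:]
    ssc = np.sum((d1 * d2 < 0) &
                 ((np.abs(d1) >= dead_zone) | (np.abs(d2) >= dead_zone)))

    return np.array([mav, zc, wl, ssc])
```

With 7 channels, concatenating the 4 features of each channel yields the 28-element feature vector used later in the implementation.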
Linear Discriminant Analysis: The idea of discriminant analysis is to assign the observed data to the movement class with the highest posterior probability. Let $C_g$ ($g = 1, \ldots, G$) denote the movement classes and $\mathbf{f}$ the feature vector in one analysis window. The posterior probability is the probability of class $C_g$ given the observed feature vector $\mathbf{f}$ and can be expressed as $P(C_g \mid \mathbf{f}) = P(\mathbf{f} \mid C_g) P(C_g) / P(\mathbf{f})$, where $P(\mathbf{f} \mid C_g)$ is the likelihood and $P(\mathbf{f})$ is the probability of the observed feature vector $\mathbf{f}$. Given the movement class $C_g$, the observed feature vectors have a multivariate normal (MVN) distribution. In addition, the prior probability is assumed to be equal for each movement class, and every class shares a common covariance. Hence, maximizing the posterior probability is equivalent to maximizing the linear discriminant function $d_g(\mathbf{f}) = \boldsymbol{\mu}_g^{T} \Sigma^{-1} \mathbf{f} - \tfrac{1}{2} \boldsymbol{\mu}_g^{T} \Sigma^{-1} \boldsymbol{\mu}_g$, where $\boldsymbol{\mu}_g$ is the mean vector of class $C_g$ and $\Sigma$ is the common covariance matrix.
During the offline training, $\boldsymbol{\mu}_g$ and $\Sigma$ were estimated from feature vectors calculated from a large amount of training data and were stored in the flash memory.
Dynamic Pattern Classification Strategy: When EMG signals are non-stationary, the EMG features across time show large variation within the same task mode, which results in overlaps of features among classes and therefore low accuracy for PR [10].
By assuming that the pattern of non-stationary EMG signals has small variation within a short-time window and that EMG patterns are repeatable for each defined short-time phase, a phase-dependent EMG classifier was designed, which was successfully applied to accurately and responsively recognize the user's locomotion modes [10]. For non-locomotion modes such as sitting and standing, the classifier can be built on the movement initiation phase by the same design concept. The structure of such a dynamic classifier design can be found elsewhere [10].
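The LDA training and classification steps described above can be sketched as follows (a minimal Python/NumPy illustration; the function names are ours, and the regularization and numerical safeguards a production classifier would need are omitted):

```python
import numpy as np

def train_lda(features, labels):
    """Estimate per-class mean vectors and the pooled (common)
    covariance matrix from training feature vectors.

    features: (N, D) array of feature vectors;
    labels: length-N array of integer class indices.
    """
    classes = np.unique(labels)
    means = {g: features[labels == g].mean(axis=0) for g in classes}
    # Pooled within-class covariance, shared by all classes
    centered = np.vstack([features[labels == g] - means[g] for g in classes])
    cov = centered.T @ centered / (len(features) - len(classes))
    return means, cov

def classify_lda(f, means, cov):
    """Assign f to the class maximizing the linear discriminant
    d_g(f) = mu_g^T S^-1 f - 0.5 * mu_g^T S^-1 mu_g (equal priors)."""
    inv = np.linalg.inv(cov)
    scores = {g: m @ inv @ f - 0.5 * m @ inv @ m for g, m in means.items()}
    return max(scores, key=scores.get)
```

In a phase-dependent design, one such classifier (one set of means and covariance) would be trained per gait phase and selected at run time by the current phase.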
Post-processing of Decision Stream: Majority vote was used to eliminate erroneous decisions from the classifier. Majority vote [9] removes decision errors by smoothing the decision output. Note that this method can further increase the accuracy of the NMI, but may sacrifice system response time.
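A minimal sketch of majority-vote smoothing over a sliding decision window (the 5-window size matches the real-time tests reported later; the implementation itself is illustrative):

```python
from collections import Counter, deque

def majority_vote_stream(decisions, window=5):
    """Smooth a stream of classifier decisions: each output is the
    most frequent class among the last `window` raw decisions."""
    history = deque(maxlen=window)
    smoothed = []
    for d in decisions:
        history.append(d)
        smoothed.append(Counter(history).most_common(1)[0][0])
    return smoothed
```

An isolated misclassification in the raw stream is outvoted by its neighbors, at the cost of delaying legitimate class switches by up to a few window increments.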

Trustworthy Sensor Interface
The NMI for artificial legs must be reliable and trusted by the prosthesis users.
The design goals of the trustworthy sensor interface are (1) prompt and accurate detection of disturbances in real time applications, and (2) assessment of the reliability of a sensor/system with potential disturbances. To achieve these goals, a trust management module was designed that contains three parts: an abnormal detector, a trust manager, and decision support. Moreover, since disturbances can change the monitored signal in two directions, a two-sided change detector, which can detect both positive and negative changes, is required.
Many statistical methods can be used to build the change detector. In this work, the cumulative sum (CUSUM) algorithm was chosen because it is reliable for detecting small changes, insensitive to the probabilistic distribution of the underlying signal, and optimal in terms of reducing the detection delay [19]. In particular, the two-sided CUSUM detector was adopted [20].
The two-sided CUSUM detector maintains two cumulative sums, $g_i^{+} = \max(0,\, g_{i-1}^{+} + (x_i - \mu_0) - k)$ and $g_i^{-} = \max(0,\, g_{i-1}^{-} - (x_i - \mu_0) - k)$, and raises an alarm when either sum exceeds a threshold, where $x_i$ represents the $i$th data sample, $\mu_0$ is the mean value of the data without changes, and $k$ is the CUSUM sensitivity parameter. The smaller $k$ is, the more sensitive the CUSUM detector is to small changes. Notice that, to promptly respond to disturbances, the CUSUM detector restarts for the next round of disturbance detection right after it detects a disturbance. However, a disturbance may last for some time, in which case the CUSUM detector would detect it more than once. This may lead to an inaccurate trust calculation. To avoid this problem, a post-processing scheme is proposed to stabilize the detection result.
Disturbances that are very close to each other (i.e., within L consecutive windows) are combined into one disturbance. In our real time testing, L is set to 3, which represents 240 ms. That is, if the detector is triggered repeatedly within 240 ms, we consider it one disturbance.
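The two-sided CUSUM detection with immediate restart and the 3-window merging rule can be sketched as follows (Python; the parameter values in the usage example are illustrative, not those used in the experiments):

```python
def cusum_two_sided(samples, mu0, k, h, merge_gap=3):
    """Two-sided CUSUM change detector with post-processing that
    merges alarms closer than `merge_gap` windows into one
    disturbance (3 windows = 240 ms at an 80 ms increment).

    mu0: mean of the disturbance-free signal; k: sensitivity
    parameter; h: alarm threshold.  Returns a list of disturbances,
    each a list of the alarm indices merged into it.
    """
    g_pos = g_neg = 0.0
    raw_alarms = []
    for i, x in enumerate(samples):
        # Accumulate evidence for positive and negative mean shifts
        g_pos = max(0.0, g_pos + (x - mu0) - k)
        g_neg = max(0.0, g_neg - (x - mu0) - k)
        if g_pos > h or g_neg > h:
            raw_alarms.append(i)
            g_pos = g_neg = 0.0  # restart detector immediately

    # Merge alarms within merge_gap of the previous alarm
    disturbances = []
    for i in raw_alarms:
        if disturbances and i - disturbances[-1][-1] < merge_gap:
            disturbances[-1].append(i)
        else:
            disturbances.append([i])
    return disturbances
```

For example, a sustained shift that repeatedly retriggers the restarted detector in adjacent windows is reported as a single disturbance rather than several.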
Trust manager: After the abnormal detector detects a disturbance in an EMG signal, the EMG sensor is either permanently damaged or perfectly recoverable. To evaluate the trust level of the sensor, let $p_1$ denote the probability that a sensor behaves normally after one disturbance is detected.
Assume all disturbances are independent. The probability that a sensor still behaves normally after $n$ detected disturbances is then $p_1^{n}$. The trust value is computed from this probability by the entropy-based trust quantification method [21], where the entropy of a probability $p$ is calculated as $H(p) = -p \log_2 p - (1-p)\log_2(1-p)$.
Decision Making and Report: The trust information is provided to the user intent identification (UII) module to assist trust-based decisions. There are two levels of decisions.
1) Sensor level: When a sensor's trust value drops below a threshold, the sensor is considered damaged, and its readings are removed from the UII module. The classifier needs to be re-trained without the damaged sensor.
2) System level: After removing the damaged sensors, the system trust can be calculated as the summation of the trust values of the remaining sensors. If the system trust is lower than a threshold, the entire UII module is not trustworthy, and actions for system recovery must be taken. One possible action is to re-train the classifier.
Another possible action is to instruct the patient to manually examine the artificial leg system.
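The entropy-based trust computation described above can be sketched as follows (Python; the mapping from probability to trust is one common form of the entropy-based method in [21], assumed here for illustration):

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def sensor_trust(p1, n):
    """Trust of a sensor after n detected disturbances.

    Assuming independent disturbances, the sensor is still normal
    with probability p = p1**n.  One common entropy-based mapping
    (assumed from [21]) gives trust 1 - H(p) for p >= 0.5 and
    H(p) - 1 for p < 0.5, so trust lies in [-1, 1].
    """
    p = p1 ** n
    return 1 - entropy(p) if p >= 0.5 else entropy(p) - 1
```

Under this mapping, trust decreases monotonically with the number of accumulated disturbances, and the sensor-level and system-level thresholds above can be applied directly to the resulting values.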

Hardware Design
Technical challenges in hardware design are twofold. First of all, in order to increase decision accuracy, frequent training computations may often be required, especially in uncertain environments, where disturbances can appear unpredictably and frequently. A training computation needs to be done not only whenever the user puts on the prosthesis but also whenever the system trust level drops below the predetermined threshold. Training data need to be recollected in these two cases. In addition, when a sensor's trust value drops below a threshold, the classifier also needs to be re-trained using the existing training data in the flash memory so that it can make decisions based on the remaining undisturbed sensors. The training algorithms require intensive numerical computations that take a long time, in the range of a few minutes to hours on a general purpose computer system [22]. It is very important to substantially speed up this training computation to make the training time of our NMI tolerable and practical. The second challenge is the real time processing of decision making in order to have smooth control of the artificial legs. Such real time processing includes signal sampling, AD/DA conversion, storing digital information in memory, executing PR algorithms, periodic trust management, and decision outputs. To meet these technical challenges, a new hardware design incorporating a multi-core GPU and an embedded system with built-in flash memory is presented.
High performance and low cost multi-core GPUs [23] were adopted to accelerate the training computation.

Evaluation of Designed Algorithm
Assigned Tasks: To prove the design concept, the NMI system was designed to decipher the task transitions between sitting and standing. These tasks are the basic activities of daily living but difficult for patients with transfemoral amputations due to the lack of knee power. During the transition phase, EMG signals are non-stationary.
The classifier was designed for the short transition phase. Although it is possible to activate the knee joint directly based on the magnitude of one EMG signal or on force data recorded from the prosthetic pylon, unintentional movements of the residual limb in the sitting or standing position may accidentally activate the knee, which in turn may cause a fall in leg amputees. Hence, intuitive activation of a powered artificial knee joint for mode transitions requires accurate decoding of EMG signals for identifying the user's intent from the brain.
EMG electrodes (Motion Lab System Inc., Baton Rouge, LA) were used to record signals from the gluteal and thigh muscles on one side of both subjects. The EMG electrodes contained a pre-amplifier which band-pass filtered the EMG signals between 10 Hz and 3,500 Hz with a passband gain of 20. For the able-bodied subject, the gluteal and thigh muscles on the dominant leg were monitored. After the skin was shaved and cleaned with alcohol pads, the EMG electrodes were placed on the anatomical locations described in [24].
For the amputee subject, the muscles surrounding the residual limb and the ipsilateral gluteal muscles were monitored. The subject was instructed to perform hip movements and to imagine and execute knee flexion and extension. We placed EMG electrodes at the locations where strong EMG signals could be recorded. The EMG electrodes were embedded into a customized gel-liner system (Ohio Willow Wood, US) for reliable electrode-skin contact. A ground electrode was placed near the anterior iliac spine for both the able-bodied and amputee subjects. An MA-300 system (Motion Lab System Inc., Baton Rouge, LA) collected 7 channels of EMG data. The cut-off frequency of the anti-aliasing filter was 500 Hz for the EMG channels. All the signals were digitally sampled at a rate of 1000 Hz and synchronized.
The states of sitting and standing were indicated by a pressure measuring mat.
The sensors were attached to the gluteal region of the subject. During weight-bearing standing, the readings of the pressure sensors were zero; during non-weight-bearing sitting, the sensors gave non-zero readings.

Experiment Protocol: To evaluate the pattern recognition algorithm, a training session was required before the real-time system testing in order to collect the training data for the classifier. During the training session, the subject was instructed to perform four tasks (sitting, sit-to-stand, standing, and stand-to-sit) on a chair (50 cm high). For the sitting and standing tasks, the subject was required to hold the position for at least 10 sec. In the sitting or standing position, the subject was allowed to move the legs and shift the body weight. For the two types of transitions, the subject performed the transitions without any assistance at least 5 times. During the real-time system evaluation testing, the subject was asked to sit and stand continuously. A total of 5 trials were conducted. In each trial, the subject was required to sit and to stand at least five times each. Rest periods were allowed between trials in order to avoid fatigue.
To evaluate the sensor trust algorithm, 13 trials of real-time disturbance detection testing were performed on the able-bodied subject. In each trial, motion artifacts were introduced randomly on one EMG electrode four times in each task phase. To add motion artifacts, the experimenter tapped an EMG electrode with roughly the same strength each time. Motion artifacts were introduced 159 times over the entire experiment.

Real-time Evaluation of EMG Pattern Recognition:
Four classes during the movement initiation phase were considered: sitting, sit-to-stand transition, standing, and stand-to-sit transition. Note that the classes of sitting and standing were not stationary, because the subject was instructed to move the legs and shift the body weight in these positions. The outputs of the classifier were further combined into two classes (class 1: sitting and stand-to-sit transition; class 2: standing and sit-to-stand transition). The four TD features defined in [12] and the LDA-based classifier were used.
Overlapped analysis windows were used in order to achieve prompt system response.
For the real-time algorithm evaluation, a 140 ms window length and an 80 ms window increment were chosen. Two indicators were used to evaluate the real-time performance of the EMG pattern classifier: classification accuracy and classification response time. Two types of classification response time were defined: RT1, the time delay between the moment the classification decision switched from sitting (0) to standing (1) and the moment the gluteal region pressure changed from a non-zero value (non-weight-bearing sitting) to zero (weight-bearing standing); and RT2, the time delay between the moment the classification decision switched from standing (1) to sitting (0) and the moment the gluteal region pressure changed from zero (weight-bearing standing) to a non-zero value (non-weight-bearing sitting). The trust values of the sensors are also reported.
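The two response-time indicators can be computed from synchronized per-window streams along these lines (an illustrative Python sketch working on one decision sample per window increment; the actual analysis used the raw pressure mat data):

```python
def response_times(decisions, pressure, dt_ms=80):
    """Compute RT1/RT2-style response times in ms from a synchronized
    per-window decision stream (0 = sitting, 1 = standing) and a
    gluteal pressure stream (0 = weight-bearing standing, non-zero =
    non-weight-bearing sitting).  Positive values mean the decision
    switched after the pressure event; negative, before.
    """
    def edges(seq, rising):
        # Indices where the sequence steps up (rising) or down
        return [i for i in range(1, len(seq))
                if seq[i] != seq[i - 1] and (seq[i] > seq[i - 1]) == rising]

    sitting = [x > 0 for x in pressure]
    # RT1: decision 0->1 vs pressure non-zero->zero (falling edge)
    rt1 = [(d - p) * dt_ms
           for d, p in zip(edges(decisions, True), edges(sitting, False))]
    # RT2: decision 1->0 vs pressure zero->non-zero (rising edge)
    rt2 = [(d - p) * dt_ms
           for d, p in zip(edges(decisions, False), edges(sitting, True))]
    return rt1, rt2
```

Each matched pair of edges yields one response-time sample, which can then be averaged per transition type as in Table 1.1.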

Algorithm Implementation on NMI Hardware System
The offline PR training algorithm, the real time PR testing algorithm, and the real time TM algorithm were all implemented on the NMI hardware described in the previous section. The window length and the window increment were set to 140 ms and 80 ms, respectively, because the computation speed of the MPC5566 is limited.
It takes approximately 80 ms to compute the EMG PR algorithm and to run the abnormal detection/trust evaluation algorithm on the data collected in a 140 ms window.
Therefore, the window increment should be no less than 80 ms. It was observed in our experiments that enlarging the window length beyond 120 ms does not affect the classification performance [10] but increases the decision-making time, which causes delayed system response.
A parallel algorithm specially tailored to the GPU architecture for the computation-intensive part of the PR training algorithm was designed using CUDA (Compute Unified Device Architecture), a parallel computing engine developed by NVIDIA. At the time of this experiment, our GPU was not directly connected to the embedded MCU. Instead, an NVIDIA 9500GT graphics card plugged into the PCI-Express slot of the PC server was used to perform the training computation.
The training results were then manually loaded into the flash memory of the embedded system board for real time testing. The GPU took inputs from 7 EMG channels, each of which had about 10,000 data points. The EMG data were segmented into analysis windows with 140ms in length. As a result, each window contained a 140×7 matrix.
The training algorithm first extracted the 4 TD features from each channel, producing a 28×1 feature vector for each window. Our parallel algorithm on CUDA spawned 7 threads for each window, 2,800 threads in total for 400 windows. All these threads were executed in parallel on the GPU to speed up the process. The resultant features were stored in a 28×W matrix, where W is the number of windows. The algorithm then set up K thread blocks, where K is the number of observed motions of the user. Each of the K thread blocks had 28×14 threads, so a total of K×28×14 threads could execute simultaneously in parallel on the GPU architecture.
To demonstrate the speedup provided by our parallel implementation on the GPU, an experiment was conducted comparing the computation times of our training algorithm on the GPU system and on a fully equipped 3 GHz Pentium 4 PC server. (The results will be shown in Section 14.3.) The real time testing algorithm was implemented on Freescale's MPC5566 evaluation board, integrating both the PR algorithm for user intent identification and the TM algorithm for sensor trust evaluation. The parameters of the trained PR classifier, a 28×4 matrix and a 1×4 matrix, calculated during the training phase by the GPU, were stored in advance in the built-in flash memory on the MPC5566 EVB. The ADCs sampled raw EMG data of 7 channels continuously at a sampling rate of 1000 Hz. As in the training phase, the EMG data were divided into windows of length 140 ms with an increment of 80 ms. In every analysis window, the 4 TD features were extracted for each individual channel. During the user intent identification process, a 28×1 feature vector was derived from each window and then fed to the trained classifier. After the EMG pattern classification, one movement class out of four was identified. The result was post-processed by the majority vote algorithm to produce a final decision: sitting or standing. During the sensor trust evaluation process, each EMG sensor was monitored by an individual abnormal detector. Only two of the four TD features (the mean absolute value and the number of slope sign changes) were used to detect motion artifacts (algorithm details in Section 1.2.3). Each abnormal detector monitored the changes of these two TD features to produce a status output for its corresponding sensor: normal or disturbed. A trust level manager then evaluated the trust level of each individual sensor based on the accumulated disturbance information.
In the real time embedded system design, to ensure smooth control of the artificial legs, precise timing control and efficient memory management are two challenges due to the speed and memory limitations of the embedded controller. We developed our own hardware management mechanism on the bare machine of the MPC5566 EVB without depending on any real time OS to avoid unpredictability and delay variations.
A circular buffer was designed to allow simultaneous data sampling and decision making. The circular buffer consisted of three memory blocks, B1, B2, and B3, that were used to store the ADC sampling data. Each block stored the data sampled in one window increment (80 samples in this experiment). An additional memory block, B4, was used as temporary storage during the computation of the PR algorithm and the TM algorithm.
At t0, the ADCs begin to sample EMG signals continuously, and the digital data are stored in B1. At t1, B1 is filled up and the incoming data are stored in B2. At t2, the data for the first window W1 are available (stored in B1 and B2), and an interrupt request is generated to notify the CPU that the data are ready for computation and to trigger the computation to start. At the same time, new data keep coming in and are stored in B3.
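The triple-buffer behavior described above can be sketched in software as follows (Python; the block sizes follow the text, while the class and callback names are illustrative, and the hardware interrupt is modeled as a callback):

```python
class CircularSampleBuffer:
    """Triple-buffer sketch: blocks B1-B3 each hold one 80-sample
    window increment, and each 140-sample analysis window is
    assembled from the two most recent blocks while sampling
    continues into the third."""

    def __init__(self, increment=80, window_len=140, on_window_ready=None):
        self.increment = increment
        self.window_len = window_len
        self.blocks = [[], [], []]          # B1, B2, B3
        self.current = 0
        self.on_window_ready = on_window_ready

    def push(self, sample):
        self.blocks[self.current].append(sample)
        if len(self.blocks[self.current]) == self.increment:
            prev = (self.current - 1) % 3
            if len(self.blocks[prev]) == self.increment:
                # Data for one analysis window are ready: raise the
                # "interrupt" while new samples fill the next block
                window = (self.blocks[prev]
                          + self.blocks[self.current])[-self.window_len:]
                if self.on_window_ready:
                    self.on_window_ready(window)
            # Advance to the next block, overwriting its old contents
            self.current = (self.current + 1) % 3
            self.blocks[self.current] = []
```

Because the computation takes at most one window increment (about 80 ms), the block being overwritten is always the oldest one, so sampling and decision making never contend for the same memory.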

Real-Time Testing of the NMI Prototype
Using the NMI prototype described above, real-time tests were carried out as described in Section 1.3.1. At the time of this experiment, our trust model focused on abnormal detection, and trust was evaluated at the sensor level; the communication between the trust manager and the classifier was not fully considered. Therefore, to better evaluate our system performance, a two-phase experiment was set up to evaluate the performance of pattern recognition and that of sensor trust management separately. In both phases, the subjects performed transitions between sitting and standing continuously. During the PR evaluation phase, no motion artifacts were manually introduced; however, the subject's unintentional movements and the movement between the residual limb and the prosthetic socket were still a factor. The movement decisions made by the classification system were displayed on an LED light and a computer monitor in real time. In our experiment, a 5-window majority vote was applied to the decision stream to further eliminate classification errors. During the sensor trust evaluation phase, motion artifacts were manually introduced by randomly tapping an EMG electrode with roughly equal strength. The sensor status and the sensor trust value were monitored and displayed on a computer monitor. The user intent classification results were ignored during this phase.

Real-Time Performance of Pattern Recognition
During the continuous real-time testing (more than 30 sit-to-stand transitions and 30 stand-to-sit transitions), all of the transitions between sitting and standing were accurately recognized. Although the subject moved the legs while sitting and shifted body weight while standing, no classification error was observed.
The system classification response time (RT1 and RT2) was calculated using the pressure data under the gluteal region and is shown in Table 1.1.
The real-time performance of the designed NMI prototype in one representative trial is shown in Figure 1.5. Because of the 5-window majority vote, a decision delay of around 400 ms was observed for the sit-to-stand transitions in Figure 1.5, compared to the falling edges of the pressure data. (Note: '+' indicates that the classification decision was made after the event, i.e., non-weight-bearing sitting to weight-bearing standing; '-' indicates that the decision was made before the event, i.e., weight-bearing standing to non-weight-bearing sitting.) It can be clearly seen that the majority vote post-processing method significantly improved system accuracy but sacrificed system response time. A video of the real-time system performance can be found at http://www.youtube.com/watch?v=H3VrdqXfcm8.
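The 5-window majority vote that smooths the raw decision stream (and causes the roughly 400 ms delay noted above) can be sketched as follows; the function name is ours, not from the study:

```python
from collections import Counter, deque

def majority_vote(decisions, k=5):
    """Smooth a raw per-window decision stream: each output is the
    most frequent class among the last k raw decisions."""
    recent = deque(maxlen=k)
    out = []
    for d in decisions:
        recent.append(d)
        out.append(Counter(recent).most_common(1)[0][0])
    return out
```

A single spurious decision is therefore absorbed, at the cost of the final decision lagging a genuine transition by a few window increments.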
Compared to the real-time testing results on one able-bodied subject (upper two photos in Figure 1.6) in our experiments [25], a similarly high classification accuracy and a reasonable system response time were achieved on the patient with transfemoral amputation (lower two photos in Figure 1.6). The promising real-time performance of our designed NMI prototype demonstrates a great potential to allow amputee patients to intuitively and efficiently control their prosthetic legs.

Real-Time Performance of Sensor Trust Algorithm
The CUSUM detector was sensitive to motion artifacts, but insensitive to the muscle activity caused by normal leg movements. Additionally, the CUSUM detector had very small detection delay: the alarm bars were always present immediately after a motion artifact. The lower figure shows the corresponding trust value, which gradually decreased when consistent disturbances were detected. In future work, other methods for trust value calculation will be explored. For instance, for sensors with imperfect trust values, it can be checked whether their future readings are consistent with other sensors that have high trust values; in this way, sensors that experienced an occasional disturbance but were not damaged can gradually regain trust. Furthermore, the performance of the CUSUM detector was evaluated by calculating its detection rate and false alarm rate: during the real-time testing experiments, the CUSUM detector achieved a 99.37% detection rate and a 0% false alarm rate. The undetected disturbances had either small amplitude or short duration, so a small number of them were not expected to affect the NMI decisions significantly. A video of our experiment can be found at http://www.youtube.com/watch?v=6NwtMOw0YS0.

The designed CUSUM detector is accurate and prompt. The limitations of the current study are that only one electrode was disturbed and that the trust manager evaluated trust only at the sensor level. In the next design phase, we will (1) consider situations with multiple sensor failures, (2) enable communication between the trust manager and the classifier, and (3) evaluate the system-level trust of the entire NMI.

The training algorithm that previously took about half an hour on a PC [22] takes less than a minute using our new parallel algorithm on the GPU. From an amputee user's point of view, training for less than a minute for the purpose of accurate and smooth neural control of the artificial leg is fairly manageable, as compared to training for half an hour every time training is necessary. Furthermore, the speedup increases as the number of windows increases (Table 2). As a result, parallel computation of the training algorithm on the GPU helps greatly in the NMI design, since the larger the number of windows, the higher the decision accuracy will be [25].
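A one-sided CUSUM change detector of the kind described above can be sketched as follows. This is a generic textbook formulation, not the study's exact detector; the `target`, `drift`, and `threshold` parameters are illustrative:

```python
def cusum_alarm(xs, target, drift, threshold):
    """One-sided CUSUM: accumulate deviations of the monitored feature
    above (target + drift); raise an alarm when the cumulative sum
    crosses the threshold, then reset."""
    s, alarms = 0.0, []
    for i, x in enumerate(xs):
        s = max(0.0, s + (x - target - drift))
        if s > threshold:
            alarms.append(i)   # disturbance detected at sample i
            s = 0.0
    return alarms
```

In the NMI, one such detector per sensor monitors the mean absolute value and the number of slope sign changes; a sustained jump in either feature triggers a "disturbed" status.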

Discussions
While our experiments on two subjects, one able-bodied subject and one transfemoral amputee subject, have demonstrated promising results for the new NMI prototype, system performance may vary among subjects due to inter-subject variations. One of our future research tasks is to recruit more amputee subjects across diverse age and gender groups to evaluate the performance of the new NMI. In the presented study, only two simple tasks (sitting and standing) were tested.
To allow the prosthesis to perform more functions, more locomotion modes during the gait cycle need to be recognized. A dynamic classification strategy, such as the phase-dependent classification proposed in our previous study [10], may be required for the designed neural-machine interface structure to handle the walking dynamics.
Among all types of disturbances, motion artifact occurs most frequently in practice and is therefore the main disturbance studied in this paper. Besides motion artifact, there are other disturbances that should be considered in our future work, such as baseline noise amplitude change, signal saturation, and sensor loss of contact. To handle these diverse disturbances, we may need to extend the work by (1) applying the abnormal detection to more EMG features and (2) modifying the trust manager so that the trust value is determined not only by how many times disturbances occur but also by more complex factors, such as disturbance type, duration, and severity.

Related Work
Real-time EMG pattern recognition has been designed to increase the information extracted from EMG signals and improve the dexterity of myoelectric control for upper-limb prostheses [9,26]. However, no EMG-controlled lower-limb prostheses are currently available. Recently, the need for neural control of prosthetic legs has brought the idea of EMG-based control back to attention. Two previous studies have attempted to use EMG signals to identify locomotion modes for prosthetic leg control [10,27]. Jin et al. [27] used features extracted from EMG signals over a complete stride cycle. Using such features, the algorithm incurs a time delay of one stride cycle in real-time operation. In practical applications, this is inadequate for safe prosthesis use.
Our previous study designed a phase-dependent EMG pattern recognition method [10], which is a classifier that varies dynamically over time. The results indicated over 90% classification accuracy, suitable for a real-time NMI. While both studies demonstrated that EMG information recorded from transfemoral amputees is sufficient for accurate identification of user intent, there has been no experimental study on the design and implementation of an embedded system realizing the NMI for reliable, real-time control of a prosthesis.
A reliable EMG pattern recognition system for artificial legs was developed in our previous study [11]. It can maintain system performance when sudden disturbances are applied to multiple sensors. In that work, however, the disturbances were generated through simulations and the algorithms were only tested offline [25]. The algorithms proposed in this paper, which differ substantially from the previous approaches, focus on a real-time design with low detection latency, and were implemented and tested in a real-time embedded system.
There has been extensive research in using GPUs for general-purpose computing (GPGPU) to obtain exceptional computation performance for many data-parallel applications [23, 28-30]. A good summary of GPGPU can be found in [23, 31, 32].
Our prior study made the first attempt to use a GPU in EMG-controlled artificial legs and other medical applications [22]. Our results on individual computation components of EMG signal pattern recognition showed good speedups of the GPU over the CPU for various window sizes. The focus of the work reported in [22] was on parallel implementations of individual algorithms on the GPU, whereas this paper makes the first attempt to integrate the entire system for neural-machine interfacing (i.e., a CPS) for real-time control of artificial legs. Our prior work [22] reported offline analysis, while the work presented in this paper implements an online decoding method for real-time testing. To the best of the authors' knowledge, there has been no existing study on implementing the entire training algorithm on a GPU for different numbers of windows and integrating the training algorithm together with real-time testing on the same subject.

Conclusions
A new EMG-based neural-machine interface (NMI) for artificial legs was developed and implemented on an embedded system for real-time operation. The NMI represents a typical cyber-physical system that tightly integrates cyber and physical components to achieve high accuracy, reliability, and real-time operation. This cyber-physical system consists of (1) an EMG pattern classifier for decoding the user's intended lower limb movements and (2) a trust management mechanism for handling unexpected sensor failures and signal disturbances. The software was then embedded in a newly designed hardware platform based on an embedded microcontroller and a GPU to form a complete NMI for real-time testing. To prove our design concepts, a working prototype was built to conduct experiments on a human subject with transfemoral leg amputation and an able-bodied subject to identify their intent for sitting and standing. We also tested our trust management model on an able-bodied human subject by adding motion artifacts. The results showed high system accuracy, reliability, and a reasonable response time for real-time operation. Our NMI design has a great potential to allow leg amputees to intuitively and efficiently control prosthetic legs, which in turn will improve the function of prosthetic legs and the quality of life for patients with leg amputations. Our future work includes the consideration of other movement tasks such as walking on different terrains, communications between trust models and user intent identification models, and exploring online training algorithms.

List of References
[1] B. Potter and C. Scoville, "Amputation is not isolated: an overview of the US Army amputee patient care program and associated amputee injuries," Journal of the American Academy of Orthopaedic Surgeons, vol. 14, p. S188, 2006.

Introduction
The technology of integrating the human neuromuscular system and computer systems to control artificial limbs has advanced rapidly in recent years. The key to the success of this advancement is the neural-machine interface (NMI) that senses neuromuscular control signals, interprets those signals, and makes decisions to identify the user's movement intent for prosthesis control. Electromyographic (EMG) signals are effective muscle electrical signals for expressing movement intent for neural control of artificial limbs and can be acquired from electrodes mounted directly on the skin [1].
While EMG-based NMIs have been proposed and clinically tested for artificial arms [2][3], no EMG-controlled prosthetic legs are available because of two main reasons.
(1) Due to the highly non-stationary characteristics of leg EMG signals, accurate decoding of user intent from such signals requires dynamic signal processing strategies [4]. Furthermore, accuracy in identifying the user's intended lower limb movement is more critical than for upper limb movement: a 90% accuracy rate may be acceptable for artificial arm control, but it is inadequate for safe use of artificial legs because any error may cause the user to fall.
(2) To realize an NMI that can be carried by leg amputees, compact integration of software and hardware on an embedded system is required. Our previous study has shown the potential of using a computer system to accurately decode EMG signals for neural-controlled artificial legs [5]. A specially designed embedded system is needed for two reasons: (1) The accuracy of the neural deciphering algorithm increases as the window length increases, which in turn increases the computational complexity.
(2) A commodity embedded system can barely meet the requirement of real-time testing even for a very small window length, and increasing the window length for higher accuracy overburdens the MCU. The main contributions of this work include:
- Design of a parallel EMG pattern recognition algorithm specifically tailored to an FPGA device to decode user intent in real time;
- Demonstration of the feasibility of a self-contained, high-performance, and low-power real-time NMI for artificial legs.

Embedded System Architecture
The overall structure of our embedded system architecture for the neural-machine interface is shown in Figure 2.

Pattern Recognition Algorithm
In order to decipher the intended lower limb movements from leg EMG signals, we first need to extract EMG features from the raw EMG signals and then employ an EMG pattern classifier to identify the movement modes. In this study, EMG time-domain (TD) features and a linear discriminant analysis (LDA) classifier [6] are adopted for our design.

EMG Feature Extraction
Several types of EMG features have been studied for EMG PR, such as time-domain features [6], frequency-domain features [7], and time-frequency features [8].
In our study, four time-domain (TD) features (the mean absolute value, the number of zero crossings, the waveform length, and the number of slope sign changes) are used.
In both the training and testing procedures, EMG signals are segmented into a series of analysis windows with a fixed window length and window increment. In each analysis window, four TD features are extracted from the EMG signals of each channel. A feature vector is then formed by combining the features from all channels.
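The four TD features named above have standard definitions; a minimal sketch for one channel of one analysis window (the function name is ours, and practical implementations usually add small amplitude thresholds to suppress noise-induced crossings):

```python
def td_features(x):
    """Four time-domain features of one analysis window, one channel:
    mean absolute value (MAV), number of zero crossings (ZC),
    waveform length (WL), and number of slope sign changes (SSC)."""
    n = len(x)
    mav = sum(abs(v) for v in x) / n
    zc  = sum(1 for i in range(1, n) if x[i - 1] * x[i] < 0)
    wl  = sum(abs(x[i] - x[i - 1]) for i in range(1, n))
    ssc = sum(1 for i in range(1, n - 1)
              if (x[i] - x[i - 1]) * (x[i] - x[i + 1]) > 0)
    return [mav, zc, wl, ssc]
```

With 7 channels, concatenating the four features per channel yields the 28-dimensional feature vector used by the classifier.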

EMG Pattern Classification
Various classification methods have been applied to EMG PR, such as LDA [6], the multilayer perceptron [9], fuzzy logic [10], and artificial neural networks [4]. In our study, an LDA classifier is adopted to decode the movement class. The movement classes are expressed as C_g (g = 1, ..., G), where G denotes the total number of classes.
Using linear discriminant analysis, an observed feature vector f is classified to the movement class C_g that maximizes the posterior probability P(C_g | f) [11]. During the training procedure, the parameters of the linear discriminant functions are estimated from the labeled training data.
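With trained LDA parameters, maximizing the posterior reduces to picking the class with the largest linear discriminant score. A minimal sketch (our own function name; W holds one weight column per class, matching the 28×4 weight and 1×4 bias matrices mentioned in Section 1):

```python
def lda_classify(f, W, b):
    """Return the index of the class whose linear discriminant
    score w_g . f + b_g is largest; under LDA this is equivalent to
    maximizing the posterior P(C_g | f)."""
    scores = [sum(wi * fi for wi, fi in zip(col, f)) + bg
              for col, bg in zip(W, b)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because only dot products and comparisons are needed at test time, this step is cheap enough for per-window execution on an embedded device.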

System Implementation
Our system implementation is based on Altera Stratix II GX EP2SGX90 FPGA.
The FPGA device is integrated on a PCI Express development board that has 64 MB of flash memory. The parallelism of FPGAs allows for high computational throughput even at low clock rates. The flexibility of FPGAs allows for even higher performance by trading off precision and range in the data format. In this study, we have designed a special PR algorithm to make the best use of the FPGA. The PR algorithm is implemented with the help of the Impulse C C-to-HDL CoDeveloper software. To optimize system performance, large tasks have been divided into small function processes. The processes can work in parallel unless there are data dependencies. Communication between different processes can be easily achieved using streams and signals. A stream is a dual-port first-in-first-out (FIFO) RAM buffer, into which a producer process stores data and from which a consumer process loads data.
Each process can read data from and write data to multiple streams. Signals are used to communicate status information from one process to another.
The hardware design and program implementation details are discussed in the following subsections.

Hardware Design
The hardware design of the embedded system is shown in Figure 2.

Task Parallelism and Pipelining
The experimental results of our previous software implementation have shown that the processing of extracting features from multiple channels of EMG signals was the most computation-intensive task for both training and testing procedures. The feature extraction task accounted for more than 90% of the total execution time.
Fortunately, we have found that in the feature extraction process, each channel follows an identical and independent procedure: sample and store EMG data, load the data of one analysis window from memory, calculate the mean of the data, subtract the mean from the raw data, calculate the four TD features, and send these features to the LDA classifier. The parallelism of FPGAs enables all the EMG channels to be processed in parallel, rather than in sequence as in the software implementation. Therefore, if there are N EMG channels, N feature extraction threads will be generated. The continuous EMG data stream of each channel is segmented into a series of sliding windows. In addition to the task parallelism among different channels, the feature extraction of subsequent windows can be efficiently pipelined: when the former analysis window has completed the data-fetching stage and moved on to process the data, the following window can begin to fetch its data from memory.
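The per-channel task parallelism can be illustrated in software with one worker per channel; this is only an analogy for the N hardware threads (names are ours), since on the FPGA the channels are truly concurrent circuits rather than scheduled threads:

```python
from concurrent.futures import ThreadPoolExecutor

def extract_all(channels, feature_fn):
    """Run the identical, independent feature-extraction procedure on
    every EMG channel concurrently, mirroring the one-thread-per-channel
    structure of the FPGA design. Results keep channel order."""
    with ThreadPoolExecutor(max_workers=len(channels)) as pool:
        return list(pool.map(feature_fn, channels))
```

`feature_fn` would be the per-channel TD feature routine; `pool.map` preserves channel ordering, so the outputs can be concatenated directly into the fused feature vector.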

Memory Utilization and Timing Control
In a real-time embedded system design, precise timing control and efficient memory management are critical. Because all the EMG channels are sampled and processed in parallel, each channel has its own buffer. The circular buffer of one channel consists of Q + 1 memory blocks, where Q is the quotient of the window length divided by the window increment. Here we choose a proper window length and window increment to make sure Q is an integer.
Each memory block stores the data sampled in one window increment. The buffer is divided into 8 blocks, and it takes 20 ms to fill each block. To make the first decision, the system needs to wait 140 ms for the data of the first window to be fetched. In this way, the PR algorithm computation and the data sampling operate in parallel. Since the execution time of the PR algorithm is much shorter than 20 ms, the system can generate a decision every 20 ms.

Mixed-Precision Operation
For high accuracy, most computations in the LDA algorithm are based on floating-point data formats. Compared to floating-point operations, fixed-point operations yield much higher performance for computation-intensive applications on FPGAs because of their reduced resource cost and lower latency [12]. However, lower-precision operations may lose computation accuracy. To address this problem, mixed-precision operation is employed in our implementation. Since EMG signals are sampled by 16-bit ADCs, they can be represented by 32-bit fixed-point data types without loss of information. We analyzed the arithmetic operation types in the PR algorithm and found that most operations involve only EMG data summation, subtraction, and multiplication/division by integers, all of which can be handled by fixed-point data formats with careful scaling, except for the computation of the covariance matrix and the inverse matrix. Therefore, in our design, the floating-point data type is used only during the computations of the covariance matrix and its inverse in the training procedure.
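The mixed-precision idea can be sketched with a simple fixed-point format; the helper names and the 8 fractional bits are illustrative choices of ours, not the design's actual format:

```python
def to_fixed(x, frac_bits=8):
    """Scale a real-valued sample into a fixed-point integer
    (value * 2^frac_bits), as would fit in a 32-bit register."""
    return int(round(x * (1 << frac_bits)))

def fixed_mean(samples, frac_bits=8):
    """Mean absolute value accumulated entirely in integer arithmetic;
    sums and absolute values stay exact for 16-bit ADC samples, and
    only the final rescaling produces a real-valued result."""
    acc = sum(abs(to_fixed(s, frac_bits)) for s in samples)
    return acc / (len(samples) << frac_bits)
```

Sums, differences, and integer scalings like these remain exact in fixed point, which is why only the covariance and matrix-inverse steps need floating point.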

NMI Prototype
Based on the design described in the previous sections, we implemented a prototype board as shown in Figure 2. Overlapped analysis windows are used in order to obtain a timely system response. The window length and the window increment are set to 140 ms and 20 ms, respectively, with an EMG data sampling frequency of 1000 Hz. The system clock of the FPGA NMI prototype is 100 MHz. The resource utilization summary is presented in Table 2.1.

Experimental Settings and Results
Our NMI prototype contains a serial peripheral interface (SPI) module that is designed to interface with off-board ADCs for data sampling. Since our current development board does not have enough I/Os to interface with multiple EMG channels, we set up a two-phase experiment to evaluate the performance of our real-time control algorithm.

1) Evaluations of computation accuracy and speed:
To evaluate the computation accuracy of our fixed-point implementation on the FPGA, the experiment was based on the same testing dataset as the software floating-point implementation.
The testing dataset was preloaded into memory. The experimental results show that, given a series of 400 analysis windows, the decision sequence of the FPGA system was exactly the same as that of the software implementation.
To generate one decision, the average execution time of our parallel implementation running on the EP2SGX90 was 0.283 ms. Comparing this value with the performance of our real-time software implementation (80 ms) based on Freescale's MPC5566, a 132-MHz 32-bit microcontroller with Power Architecture, the new FPGA implementation gives a 280X speedup.

2) Evaluation of real time control:
Although there were no real EMG data sampled in, we assumed the circular buffer kept receiving new data. The PR module was triggered by a countdown counter every 20 ms. Once triggered, the PR module began to fetch data from the circular buffer and process the data. A debugging software program was running on the Nios II processor. The result shows that a decision-out message was displayed on the Nios II software console every 20 ms, which means that the real-time decision algorithm sent out a decision every 20 ms continuously. The result indicates smooth and prompt real-time control. Once we move the design to a new board with enough I/Os, the PR algorithm will be triggered by a signal as soon as one memory block of the circular buffer has been filled up (details in Section III). The decision-making time interval is still 20 ms. For comparison, a general-purpose PC processor capable of running the same algorithm has a reported thermal design power of 35 W, which is about 10 times higher than that of our implementation.

Conclusions
This chapter presented an FPGA-based embedded system that realizes a self-contained, high-performance, and low-power real-time NMI for artificial legs.

Introduction
A neural-machine interface (NMI) is a typical example of a biomedical cyber-physical system (CPS) that utilizes neural activities to control machines. The neural signals collected from nerves, central neurons, and muscles contain rich information that can represent human states such as emotion, intention, and motion. In such a CPS, a computer senses bioelectric signals from a physical system (i.e., the human neural control system), interprets these signals, and then controls an external device, such as a power-assisted wheelchair [1], a telepresence robot [2], or a prosthesis [3][4][5], which is also a physical system. Our previous study designed a phase-dependent pattern recognition (PR) strategy for identifying user locomotion intent [8]. This PR algorithm extracted neural information from limited signal sources and showed accurate classification (90% or higher accuracy) of seven locomotion modes when 7-9 channels of EMG signals were collected from able-bodied subjects and leg amputees. The performance of the phase-dependent PR strategy was further improved by incorporating EMG signals with mechanical signals resulting from forces/moments acting on prosthetic legs [9]. The experimental results showed that the classification accuracies of the neuromuscular-mechanical fusion based PR algorithm were 2%-27% higher than the accuracies derived from strategies using EMG signals alone or mechanical signals alone [9].
The challenges on computational resources include the tight integration of software and hardware on an embedded computer system specifically tailored to this CPS, which integrates all the necessary interfaces and control algorithms for interacting with the physical system in real time.
The newly designed NMI was completely built and tested as a working prototype.
The prototype was then used to carry out real-time testing experiments on an able-bodied subject for classifying three movement tasks (level-ground walking, stair ascent, and standing) in real time. The system performance was evaluated to demonstrate the feasibility of a self-contained and high-performance real-time NMI for artificial legs.
Videos of our experiments on the human subject can be found at http://www.youtube.com/watch?v=KNhihjXProU.
This paper is organized as follows. The next section presents the overall system design. Section 3.3 describes the detailed implementation of the UIR algorithm. The experimental results are demonstrated in Section 3.4. We conclude our paper in Section 3.5.

The architecture of the designed CPS is shown in Figure 3.1. The embedded NMI samples input signals from two physical systems: a human neuromuscular system and a mechanical prosthetic leg. The sampled signals are then processed to decipher the user's intent to control the prosthesis. The NMI consists of two modules: a data collection module built on an MCU with multiple on-chip ADCs for sensing and buffering input signals, and an FPGA device as the computing engine for fast data decoding and pattern recognition. A serial peripheral interface (SPI) between the two devices transfers digitized input data from the MCU to the FPGA device. Together, the two modules continuously identify the user's intended movements.

Architecture of the UIR Strategy
The architecture of the UIR strategy based on neuromuscular-mechanical information fusion and phase-dependent pattern recognition (PR) is shown in Figure 3.2. It is a self-contained architecture that integrates the functions of training and phase-dependent pattern recognition in one embedded system. For every analysis window, features of EMG signals and mechanical signals are extracted from each input channel.
A feature vector is formed and normalized by fusing the features from all the input channels. The feature vector is then fed to the classifier for pattern recognition. The phase-dependent classifier consists of a gait phase detector and multiple classifiers.
Each classifier is associated with a specific gait phase. During the process of pattern recognition, the gait phase of the current analysis window is first determined by the phase detector, and then the corresponding classifier performs the classification. In this study, four gait phases are defined: initial double limb stance (phase 1), single limb stance (phase 2), terminal double limb stance (phase 3), and swing (phase 4) [9]. The real-time gait phase detection is based on measurements of the vertical ground reaction force (GRF) sampled from the 6-DOF load cell.
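A much-simplified sketch of such a GRF-driven phase detector is shown below. The function name, the load threshold `thr`, and the use of both limbs' GRF are our illustrative assumptions; the actual detector in the study is based on the load-cell measurements described in [9]:

```python
def next_phase(prev, grf_pros, grf_sound, thr=20.0):
    """Advance a four-phase gait model from vertical GRF samples.
    Phases: 1 initial double stance, 2 single limb stance,
    3 terminal double stance, 4 swing."""
    pros_loaded = grf_pros > thr
    sound_loaded = grf_sound > thr
    if not pros_loaded:
        return 4                      # prosthetic limb in swing
    if not sound_loaded:
        return 2                      # single limb stance
    # Both limbs loaded: double stance. It is *initial* if entered
    # from swing, *terminal* if entered from single limb stance.
    if prev in (4, 1):
        return 1
    return 3
```

The detected phase then selects which of the four phase-specific LDA classifiers processes the current analysis window.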
In the real-time embedded system design, precise timing control is necessary to ensure smooth control of artificial legs. Figure 3.3 shows the timing diagram of the control algorithm during the real-time UIR process. In the designed system, the MCU and the FPGA device collaborate to produce a decision at every window increment.
While the MCU is sampling data for window i+1, the user intent recognition for window i, including the tasks of SPI data transfer, feature extraction, gait phase detection, feature vector formation and normalization, and pattern recognition, must be completed within the window increment. In other words, the execution time of the UIR algorithm determines the minimum window increment. Larger window increments introduce longer delays into the NMI decision, which may be unsafe for controlling the prosthesis in real time. Therefore, fast processing speed is critical to the embedded system design.
2) Pattern Recognition: In this study, linear discriminant analysis (LDA) is adopted for user intent classification because of its computational efficiency for real-time prosthesis control and its accuracy, which is comparable to more complex classifiers [7].

Parallel Implementations on FPGA
Four gait phases are defined for recognizing the user's locomotion mode, giving rise to four LDA-based classifiers. Each classifier is trained for a specific phase. The details of the LDA algorithm can be found in Appendix 3A.
Most of the computations involved in the training algorithm are matrix operations.
Because a large amount of data needs to be processed in the training procedure, the dimensions of the matrices can be very large. On-chip memory alone is not enough to handle all the computations; external memory with large capacity is required to store the processing data. In our implementation, several external memory buffers are defined to store either large matrices produced during the training computations or data that might be reused in the online PR phase or the re-training phase. Each process is designed to perform a simple task on a small block of data, such as a matrix row/column, and to store partial results in external memory. In this way, the operations on subsequent matrix rows/columns can be efficiently pipelined.
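The row-at-a-time style of processing can be sketched as follows; this is our own illustration of the streaming pattern (accumulating a Gram matrix X^T X one feature-vector row at a time), not the study's exact training computation:

```python
def accumulate_outer(rows, dim):
    """Accumulate X^T X one row at a time, so only a small block of
    the large training matrix is resident at once -- mimicking rows
    being streamed in from external memory."""
    acc = [[0.0] * dim for _ in range(dim)]
    for row in rows:               # 'rows' may be a generator over storage
        for i in range(dim):
            for j in range(dim):
                acc[i][j] += row[i] * row[j]
    return acc
```

Because each row contributes independently to the running sums, fetching the next row can overlap with processing the current one, which is the pipelining the paragraph describes.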
During online pattern recognition, based on the gait phase of current analysis window, the parameters of the corresponding classifier are loaded from memory. The observed feature vector derived from each analysis window is provided to the classifier for intent recognition.

Prototyping & Experimental Results
This study was conducted with Institutional Review Board (IRB) approval at our university and informed consent of subjects. To evaluate the performance of the designed NMI, two experiments with different purposes were conducted. First, to evaluate the classification accuracy and the computation speed of the FPGA-based PR algorithm, the performance of the FPGA implementation was compared with our previous software implementation by processing the same dataset offline. Secondly, to evaluate the performance of the entire CPS, a real-time test was carried out on a male able-bodied subject for identifying three movement tasks (level-ground walking, stair ascent, and standing).

Performance of FPGA vs. CPU
In order to verify the correctness of the FPGA-based PR algorithm and compare the performance of the FPGA design with our previous Matlab implementation, we processed the same dataset on both platforms. The testing dataset was previously collected from a male patient with transfemoral amputation (TF). Seven EMG channels recording signals from the gluteal and thigh muscles and six channels of mechanical forces/moments measured by a 6-DOF load cell were collected in this dataset for identifying three locomotion modes: level-ground walking, stair ascent, and stair descent. The dataset was segmented by overlapped analysis windows.
The window length and the window increment were set to 160 data points and 20 data points, respectively. The dataset contained 936 analysis windows in total, of which 596 were used as training data and the remaining 340 as testing data. The Matlab implementation was based on a PC with an Intel Core i3 processor. From Table 3.1 we can see that the FPGA implementation of the testing algorithm shows better performance than the training algorithm. This is because the testing algorithm uses only fast on-chip memory, while the computational complexity of the training algorithm requires the FPGA to interact with external memory. In our experiments, it was observed that loading training data from external memory to the FPGA took more than half of the total execution time of the training algorithm. A summary of FPGA resource utilization is listed in Table 3.2. The designed NMI prototype was then tested on one male able-bodied subject, and real-time experiments were conducted. To evaluate the system performance of real-time intent recognition, we adopted the evaluation criteria described in our previous study [13]. The testing data were separated into static states and transitional periods. A static state was defined as the subject continuously walking on the same type of terrain (level ground or stairs) or performing the same task (standing). A transitional period was the period when the subject switched locomotion modes. The purpose of the UIR system is to predict mode transitions before a critical gait event, for a safe and smooth switch of the prosthesis control mode. In this study, the critical timing was defined for each type of transition.

System Performance in Real-Time
For the transitions from standing to locomotion modes (level-ground walking and stair ascent), the critical timing was defined at the beginning of the swing phase (i.e. toe-off). For the transitions from locomotion modes to standing, the critical timing was the beginning of the double stance phase (i.e. heel contact). The real-time performance of our embedded system was evaluated by the following parameters. The overall classification accuracy in the static states across 10 testing trials for classifying level-ground walking, stair ascent, and standing was 99.31%. For all 10 trials, no missed mode transitions were observed within the defined transition period. The only exception was the transitions from stair ascent to standing (SA→ST). This is because the subject could not perform foot-over-foot alternating stair climbing with a passive knee joint. In our experiments, the subject climbed stairs by lifting the sound leg onto one step and then pulling up the prosthetic leg onto the same step, which produced the same pattern as the mode transition from stair ascent to standing. Therefore the transition SA→ST could only be recognized after the subject was standing still. This problem will be eliminated by replacing the passive device with a powered knee in the near future.

Classification Accuracy in the Static States
Wearing the powered knee, the prosthesis user is able to climb stairs foot-over-foot, which provides a very different pattern from the transition SA→ST. For the other three types of transitions (ST→W, W→ST, and ST→SA), the user intent for mode transitions can be accurately predicted 104-549 ms before the critical timing for switching the control of the prosthesis. Figure 3.6 shows the real-time system performance.

Conclusions
This paper presented the design and implementation of the first complete cyber-physical system (CPS) of a neural-machine interface for artificial legs. The new CPS implemented both the training and testing modules on a single chip, and integrated all the necessary interfaces and control algorithms for identifying the user's intended locomotion modes in real-time. The designed NMI incorporated an MCU for sensing and buffering input EMG signals and mechanical signals, and an FPGA device as the computing engine for fast decoding and pattern recognition. A parallel processing algorithm for UIR was designed and implemented that realized the neuromuscular-mechanical-fusion-based PR algorithm coupled with the real-time control algorithm on the FPGA. The FPGA implementation of the PR algorithm achieved a speedup of 7X over the Matlab implementation for the training phase, and a speedup of more than 30X for the testing phase, with no sacrifice of computational accuracy. The designed NMI prototype was tested on an able-bodied subject for accurately classifying multiple movement tasks (level-ground walking, stair ascent, and standing) in real-time. The results demonstrated the feasibility of a self-contained, high-performance, real-time NMI for artificial legs. Our future work includes real-time testing of the designed NMI system on amputee subjects, using the NMI system to control powered prosthetic legs, managing power consumption, and increasing system reliability.

Appendix 3A - Pattern Recognition Using Linear Discriminant Analysis
The principle of the LDA-based PR strategy is to find a linear combination of features that separates multiple locomotion classes.

Abstract
Our previously developed locomotion-mode-recognition (LMR) system has shown great promise for intuitive control of powered artificial legs. However, the lack of fast, practical training methods is a barrier to clinical use of our LMR system for prosthetic legs. This paper aims to design a new, automatic, and user-driven training method for practical use of the LMR system. In this method, a wearable terrain detection interface based on a portable laser distance sensor and an inertial measurement unit (IMU) is applied to detect the terrain change in front of the prosthesis user. The mechanical measurement from the prosthetic pylon is used to detect gait phase. These two streams of information are used to automatically identify the transitions among various locomotion modes, switch the prosthesis control mode, and label the training data with movement class and gait phase in real-time. No external device is required in this training system. In addition, the prosthesis user can complete the whole training procedure without assistance from experts. The pilot experimental results on an able-bodied subject have demonstrated that our new method is accurate and user-friendly, and can significantly simplify the LMR training system and training procedure without sacrificing system performance. The novel design paves the way for clinical use of our LMR system for powered lower limb prosthesis control.

Introduction
Myoelectric (EMG) pattern recognition (PR) has been widely used for identifying human movement intent to control prostheses [1][2][3][4]. The PR strategy usually consists of two phases: a training phase for constructing the parameters of a classifier and a testing phase for identifying the user intent using the trained classifier. Our previous study developed a locomotion-mode-recognition (LMR) system for artificial legs based on a phase-dependent PR strategy and neuromuscular-mechanical information fusion [3,5].
The LMR system has been tested in real-time on both able-bodied subjects and lower limb amputees. The results have shown high accuracies (>98%) in identifying three tested locomotion modes (level-ground walking, stair ascent, and stair descent) and tasks such as sitting and standing [5][6].
One of the challenges for applying the designed LMR system in clinical practice is the lack of practical system training methods. A few PR training methods have been developed for control of upper limb prostheses, such as screen-guided training (SGT) [7][8], where users perform muscle contractions by following a sequence of visual/audible cues, and prosthesis-guided training (PGT) [9], where the prosthesis itself provides the cues by performing a sequence of preprogrammed motions. However, neither SGT nor PGT can be directly adopted for training the LMR system for lower limb prostheses, because the computer and prosthesis must coordinate with the walking environment to cue the user to perform locomotion mode transitions during training data collection. Currently the training procedure for the LMR system is time consuming and manually conducted by experts. During the training procedure, experts cue the user's actions according to the user's movement status and the walking terrain in front of the user, switch the prosthesis control mode before the user steps onto another type of terrain, and manually label the collected training data with movement class using an external computer. Such a manual approach significantly limits the clinical value of LMR because experts are usually not available at home.
To address this challenge, this paper aims to design an automatic and user-driven training method for the LMR system. The basic idea is to replace the expert with a smart system that collects and automatically labels the training data for PR training.
Our design significantly simplifies the training procedure, and can be applied anytime and anywhere, which paves the way for clinical use of the LMR system for powered lower limb prosthesis control.

Automatic Training Method
The LMR system for artificial legs is based on phase-dependent pattern classification [4][5], which consists of a gait phase detector and multiple sub-classifiers, one for each phase. In this study, four gait phases are defined: initial double limb stance (phase 1), single limb stance (phase 2), terminal double limb stance (phase 3), and swing (phase 4) [5]. In the LMR system training, the EMG signals and mechanical forces/moments are the inputs of the LMR system [4] and are segmented by overlapped analysis windows. For every analysis window, features of the EMG signals and mechanical measurements are extracted from each input channel and concatenated into one feature vector. The feature vector must be labeled with the correct movement class and gait phase to train the individual sub-classifiers.
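The phase-dependent structure above can be sketched as a dispatcher that routes each feature vector to the sub-classifier of the detected phase. The class and method names here are hypothetical illustrations, not the names used in the actual LMR implementation.

```python
# Hypothetical sketch of phase-dependent classification: one trained
# sub-classifier per gait phase, selected by the gait phase detector.
class PhaseDependentClassifier:
    # 1: initial double stance, 2: single stance,
    # 3: terminal double stance, 4: swing
    PHASES = (1, 2, 3, 4)

    def __init__(self, sub_classifiers):
        # sub_classifiers: dict mapping phase index -> callable that
        # takes a feature vector and returns a class label
        self.sub = dict(sub_classifiers)

    def classify(self, feature_vector, phase):
        if phase not in self.sub:
            raise ValueError("no sub-classifier for phase %d" % phase)
        return self.sub[phase](feature_vector)

# Usage with trivial stand-in sub-classifiers:
clf = PhaseDependentClassifier(
    {p: (lambda fv, p=p: ("walk", p)) for p in (1, 2, 3, 4)})
label = clf.classify([0.1, 0.2], phase=2)
```

In the real system each callable would be an LDA sub-classifier trained only on windows from its own phase.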
The previous approach labels the training data with locomotion mode (class) and gait phase by an experimenter manually. In this design, we replace the experimenter by a smart system that (1) automatically identifies the transition between locomotion modes based on terrain detection sensors and algorithms, (2) switches the control mode of powered artificial legs, (3) labels the analysis windows with the locomotion mode (class index) and gait phase, and (4) trains individual sub-classifiers. Our designed system consists of three parts: a gait phase detector for identifying the gait phase of the current analysis window, a terrain detection interface for detecting the terrain in front of the user, and a labeling algorithm to label the mode and gait phase of current data.
To label every analysis window with locomotion mode (class index), it is important to define the timing of mode transition. The purpose of the LMR system is to predict mode transitions before a critical gait event for safe and smooth switch of prosthesis control mode. Our previous study has defined this critical timing for each type of mode transition [4,6]. In order to allow the LMR system to predict mode transitions before the critical timings, the transition between locomotion modes is defined to be the beginning of the single stance phase (phase 2) immediately prior to the critical timing during the transition [4].

1) Gait Phase Detection:
The real-time gait phase detection is implemented by monitoring the vertical ground reaction force (GRF) measured from the 6 DOF load cell. The detailed algorithm can be found in [5].
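A minimal event-detection sketch of this idea is shown below. The 20 N loading threshold and the event names are assumptions for illustration; the published algorithm in [5] distinguishes all four phases and may differ in detail.

```python
def detect_events(grf_series, threshold=20.0):
    """Detect heel-contact and toe-off events from the vertical ground
    reaction force (GRF) measured on the prosthetic pylon.

    threshold (newtons) decides whether the foot is loaded; the value
    is an illustrative assumption. Returns (sample_index, event) pairs.
    """
    events = []
    loaded = grf_series[0] > threshold
    for i, f in enumerate(grf_series[1:], start=1):
        now_loaded = f > threshold
        if now_loaded and not loaded:
            events.append((i, "heel_contact"))
        elif loaded and not now_loaded:
            events.append((i, "toe_off"))
        loaded = now_loaded
    return events

# Example: a swing (unloaded) period between two stance periods.
grf = [400, 350, 5, 3, 4, 300, 450]
events = detect_events(grf)
```

The detected heel-contact and toe-off events delimit the stance and swing periods from which the four gait phases are derived.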
2) Terrain Detection Interface: Because the sequence of the user's locomotion modes in the training trials is predefined, the goal of the terrain detection interface is not to predict the unknown terrain in front of the subject, but to detect the upcoming terrain change within an appropriate range of distances to help identify the transition between locomotion modes. The laser distance sensor and the IMU are placed on the prosthesis user's waist as suggested in [10], because this sensor configuration has been demonstrated to provide stable signals with very small noise and good performance for recognizing terrain types. Before the training procedure starts, a calibration session is conducted to measure a few parameters for later use in the training process.
During calibration, the user walks at a comfortable speed on level ground for about 30 seconds. The average vertical distance from the laser sensor to the level ground (H) and the average step length (S_L) of the user are measured.
Three types of terrain have been investigated in this study: terrain above the currently negotiated terrain (upper terrain), terrain with the same height as the current terrain (level terrain), and terrain below the current terrain (lower terrain). The terrain types can be discriminated by a simple decision tree.
3) Labeling Algorithm: The gait phase of every analysis window is directly labeled with the output of the gait phase detector. On the premise that the terrain detection interface detects all terrain alterations within the required ranges, the transition point between locomotion modes is identified as the first analysis window in phase 2 immediately after the expected terrain change is detected. The transition point from standing to locomotion modes can be automatically identified without terrain type information: it is the first analysis window immediately after toe-off (the beginning of phase 2). For two consecutive movement modes, the analysis windows before the transition point are labeled with the former movement class, and the windows after the transition point are labeled with the latter movement class.
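The labeling rule above can be sketched as follows. The function and argument names are hypothetical; the sketch only captures the stated rule that the transition point is the first phase-2 window at or after the detected terrain change.

```python
def label_windows(phases, change_idx, mode_before, mode_after):
    """Label every analysis window with a movement class.

    phases      : gait phase (1-4) of each analysis window, in order
    change_idx  : index of the window at which the terrain change was
                  detected by the laser/IMU interface
    Windows before the transition point keep the former mode; windows
    at and after it receive the latter mode.
    """
    transition = None
    for i in range(change_idx, len(phases)):
        if phases[i] == 2:          # first phase-2 window after detection
            transition = i
            break
    labels = []
    for i in range(len(phases)):
        if transition is not None and i >= transition:
            labels.append(mode_after)
        else:
            labels.append(mode_before)
    return labels

# Terrain change detected at window 1; first phase-2 window is index 3.
phases = [3, 4, 1, 2, 2, 3, 4]
labels = label_windows(phases, change_idx=1,
                       mode_before="W", mode_after="SA")
```

Each labeled window then trains the sub-classifier of its gait phase.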

Participant and Measurements
This study was conducted with Institutional Review Board (IRB) approval at the University of Rhode Island and with informed consent of the subject. One male able-bodied subject was recruited. A plastic adaptor was made so that the subject could wear a prosthetic leg on the right side.
Seven surface EMG signals were collected from the thigh muscles on the subject's right leg including adductor magnus (AM), biceps femoris long head (BFL), biceps femoris short head (BFS), rectus femoris (RF), sartorius (SAR), semitendinosus (SEM), and vastus lateralis (VL). The EMG signals were filtered between 20 Hz and 450 Hz with a pass-band gain of 1000. Mechanical ground reaction forces and moments were measured by a 6 degree-of-freedom (DOF) load cell mounted on the prosthetic pylon. A portable optical laser distance sensor and an inertial measurement unit (IMU) were placed on the right waist of the subject. The laser distance sensor could measure a distance ranging from 300 mm to 10000 mm with the resolution of 3 mm. The EMG signals and the mechanical measurements were sampled at 1000 Hz.
The signals from the laser sensor and the IMU were sampled at 100 Hz. The input data were synchronized and segmented into a series of 160 ms analysis windows with a 20 ms window increment. For each analysis window, four time-domain (TD) features (mean absolute value, number of zero crossings, waveform length, and number of slope sign changes) were extracted from each EMG channel [11]. For the mechanical signals, the maximum, minimum, and mean values calculated from each individual DOF were used as features. Linear discriminant analysis (LDA) [12] was used as the classification method for pattern recognition. The system was implemented in Matlab on a PC with a 1.6 GHz Xeon CPU and 2 GB RAM.
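The four TD features listed above can be computed per window as sketched below (in Python rather than Matlab). For brevity this sketch omits the amplitude thresholds usually applied to the zero-crossing and slope-sign-change counts to reject noise.

```python
def td_features(window):
    """Four time-domain EMG features from one analysis window:
    mean absolute value (MAV), number of zero crossings (ZC),
    waveform length (WL), and number of slope sign changes (SSC).
    Noise-rejection thresholds for ZC and SSC are omitted."""
    n = len(window)
    mav = sum(abs(x) for x in window) / n
    zc = sum(1 for a, b in zip(window, window[1:]) if a * b < 0)
    wl = sum(abs(b - a) for a, b in zip(window, window[1:]))
    ssc = sum(1 for a, b, c in zip(window, window[1:], window[2:])
              if (b - a) * (b - c) > 0)
    return [mav, zc, wl, ssc]

# The per-channel features are concatenated into one feature vector.
window = [1, -1, 2, -2, 1]
feats = td_features(window)
```

With seven EMG channels and four features each, plus the mechanical features, these concatenated vectors form the LDA inputs.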

Experimental Protocol
In this study, four movement tasks (level-ground walking (W), stair ascent (SA), stair descent (SD), and standing (ST)) and five mode transitions (ST→W, W→SA, SA→W, W→SD, and SD→W) were investigated. An obstacle course was built in the laboratory, consisting of a level-ground walkway, 5-step stairs with a height of 160 mm for each step, a small flat platform, and an obstacle block (300 mm high and 250 mm wide). Movements that did not belong to the investigated tasks, such as walking and turning on the platform, were labeled as a "not included" (NI) mode.
After training, ten real-time testing trials were conducted to evaluate the performance of the LMR system. Each trial lasted about one minute. All the investigated movement tasks and mode transitions were evaluated in the testing session.

Results & Discussions
In the training trial, the subject took about 225 seconds to complete all the movement tasks. After the subject finished all the tasks, only 0.11 seconds was further spent to train the classifiers. All the terrain alterations were accurately identified and all analysis windows were correctly labeled. All mode transitions were accurately identified at the beginning of phase 2 during the transition cycle, and all movement tasks were labeled with the correct class modes.
The overall classification accuracy across 10 real-time testing trials was 97.64%.
For all 10 trials, no missed mode transitions were observed. The user intent for mode transitions was accurately predicted 103-653 ms before the critical timing for switching the control of the prosthesis. The results indicate that the LMR system using the automatic training strategy provides performance comparable to the system using the previous training method.

Conclusions
In this paper, an automatic, user-driven training strategy has been designed and implemented for classifying locomotion modes for control of powered artificial legs.
The smart system can automatically identify the locomotion mode transitions based on a terrain detection interface, switch the prosthesis control mode, label the training data with the correct mode (i.e. class index) and gait phase in real-time, and quickly train the pattern classifiers in the LMR system. The preliminary experimental results on an able-bodied subject show that all the analysis windows in the training trial were correctly labeled.
The two training approaches can be compared as follows.
Automatic, user-driven: The user only needs to perform all the movement tasks, and the training is done immediately. The prosthesis control mode is switched automatically, driven by the user's motion.
Experimenter-driven: The user needs to follow the guidance from the experimenter, and must pause and wait while the experimenter is processing the data. The prosthesis control mode is switched manually by the experimenter.

Abstract
EMG pattern classification has been widely studied for decoding user intent for intuitive prosthesis control. However, EMG signals can be easily contaminated by noise and disturbances, which may degrade classification performance. This study aims to design a real-time self-recovery EMG pattern classification interface to provide reliable user intent recognition for multifunctional prosthetic arm control. A novel self-recovery module consisting of multiple sensor fault detectors and a fast LDA classifier retraining strategy has been developed to immediately recover the classification performance from signal disturbances. The self-recovery EMG pattern recognition (PR) system has been implemented on an embedded system as a working prototype. Experimental evaluation was performed on an able-bodied subject in real-time to classify three arm movements while signal disturbances were manually introduced. The results of this study may propel the clinical use of EMG PR for multifunctional prosthetic arm control.

Introduction
Electromyographic signal (EMG) pattern recognition (PR) is a widely used method for classifying user intent for neural control of artificial limbs [1][2][3]. However, unreliability of surface EMG recordings over time is a challenge for applying the EMG pattern recognition controlled prostheses for clinical practice. Motion artifacts, environmental noises, sensor location shifts, user fatigue, and other conditions may all cause changes in the EMG characteristics and thus lead to inaccurate identification of user intent and threaten the prosthesis control reliability and user safety [4][5].
Several strategies have been developed to address this challenge in order to make artificial limb control based on EMG PR clinically viable. Sensinger et al. [5] employed adaptive pattern classifiers to cope with variations in EMG signals for reliable EMG PR. Tkach et al. [6] investigated different EMG features and suggested several time-domain features that were resilient to EMG signal changes caused by muscle fatigue and exerted force levels. Hargrove et al. [7] suggested a new EMG PR training procedure to accommodate EMG electrode shift during prosthesis use.
Our research group developed a unique, reliable EMG pattern recognition interface consisting of sensor fault detectors and a self-recovery mechanism. The sensor fault detectors monitor the recordings from individual EMG electrodes; the self-recovery mechanism removes the faulty EMG signals from the PR algorithm to recover the classification accuracy [8][9][10]. It was observed that the EMG classification performance was not significantly affected by the removal of one or two EMG signals from redundant EMG recordings [2,8]. Our new EMG-PR interface could salvage system performance, increasing classification accuracy by up to 20% when one or more EMG signals were disturbed [8].
Despite the promise of our design concept shown in our previous study, the algorithm development and validation were performed offline. In order to implement this concept in real-time, especially in a wearable embedded system, several challenges still exist. First, the recovery strategy involves retraining of the pattern classifier.
Currently this procedure involves reorganization of the training feature matrix, computation of the parameters of the pattern classifiers, and reorganization of the testing feature vectors. Whether the embedded system can handle this procedure quickly enough for each decision is unknown. Secondly, since more components are included in the EMG PR algorithm, communication among components and precise timing control are crucial. Finally, a compact integration of all the components in an embedded computer is required. The system needs to provide the necessary interfaces for data collection, adequate computing power for real-time decision making, efficient memory management, and low power consumption. None of these challenges had been explored before. This paper presents the first real-time self-recovery EMG pattern recognition interface for artificial arms. A novel self-recovery scheme with a fast and efficient retraining algorithm based on linear discriminant analysis (LDA) has been developed.
The self-recovery EMG pattern recognition system was implemented on an embedded computer system as a working prototype. The prototype was preliminarily evaluated on an able-bodied subject in real-time in classifying three arm movements while motion artifacts were manually introduced by randomly tapping the EMG electrodes.
The results of this study may propel the clinical use of EMG PR for multifunctional prosthetic arm control.

System Structure
The overall structure of the self-recovery EMG pattern recognition interface is shown in Figure 5.1. The system seamlessly integrates EMG pattern recognition with the self-recovery module. Multiple channels of EMG signals segmented by overlapped sliding analysis windows are the system inputs. In each window, four time-domain (TD) features (mean absolute value, number of zero crossings, waveform length, and number of slope sign changes [11]) of the EMG signals are extracted from each input channel and fed to the self-recovery module. The sensor fault detectors closely monitor the key features of each EMG signal to detect disturbances. Based on the detection results, the EMG features extracted from "normal" channels are concatenated into a feature vector as the input for pattern classification. If no disturbance is detected, the feature vector is sent directly to the classifier generated from the original training data. If one or more signals are determined to be "abnormal", the fast LDA retraining process is triggered and the reduced feature vector is fed to the new classifier for pattern recognition.
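The channel-masking step described above (concatenating features only from "normal" channels) can be sketched as follows; the function and argument names are illustrative assumptions.

```python
def build_feature_vector(channel_features, normal_flags):
    """Concatenate per-channel TD features into one feature vector,
    skipping channels flagged 'abnormal' by the sensor fault detectors.

    channel_features : list of per-channel feature lists (4 TD features
                       per channel in this system)
    normal_flags     : list of booleans, True where the channel is normal
    """
    vec = []
    for feats, ok in zip(channel_features, normal_flags):
        if ok:
            vec.extend(feats)
    return vec

# Channel 1 is disturbed, so its four features are dropped.
feats = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
vec = build_feature_vector(feats, [True, False, True])
```

The resulting reduced vector must be matched by a correspondingly reduced classifier, which is what the fast retraining step provides.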

Fast LDA-based Retraining Algorithm
Previously, the lack of a fast and efficient retraining algorithm was the most critical challenge to the design of a real-time self-recovery EMG PR interface. In the previous retraining strategy [8], after the initial training process is done, the original EMG feature matrix is stored in memory for later use in the retraining process. During the retraining procedure, for each class g, a new EMG feature matrix F'_g is built by removing the feature rows associated with the disturbed channels, and the reduced class mean vector μ'_g and common covariance matrix Σ' are recomputed from it. Profiling has shown that the calculation of Σ' is the most computationally intensive task in the retraining procedure, accounting for more than 90% of the total processing time. This is because for each class, a large number of analysis windows are collected as training data. The number of columns in F'_g may vary from several hundred to a few thousand, which leads to intensive numerical operations in calculating Σ'.
Fortunately, after closely analyzing the details of the LDA training algorithm, we found that the explicit recomputation of Σ' and μ'_g can be avoided. The trick is that, instead of the large feature matrix F_g, only μ_g and Σ are stored in memory after the initial training process is finished. Σ' and μ'_g can then be retrieved directly from Σ and μ_g. Figure 5.2 shows an example of the retrieval process when a single EMG channel is detected to be "abnormal". Assume there are six EMG channels in total. Each element in the mean vector is calculated by averaging one specific feature row in F_g.
Therefore μ'_g can be obtained by removing the four elements associated with the disturbed EMG channel from μ_g. Σ' is constructed by removing the corresponding rows and columns associated with the disturbed channel from Σ and then merging the remaining four small matrices (B1, B2, B3, and B4 in Figure 5.2).
If multiple EMG signals are disturbed, Σ' and μ'_g can be obtained by applying the retrieval process repeatedly. Compared with the previous retraining algorithm, which requires intensive numerical operations and a large memory space, the new strategy dramatically accelerates retraining and is much more memory efficient.
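The retrieval step amounts to deleting the rows/columns of Σ and the elements of μ_g that belong to the disturbed channels. A minimal sketch, using plain Python lists and hypothetical names (reduce_stats, feats_per_ch), is shown below; for brevity the example uses two features per channel instead of four.

```python
def reduce_stats(mean, cov, bad_channels, feats_per_ch=4):
    """Retrieve the reduced class mean μ'_g and covariance Σ' by
    deleting the entries associated with disturbed EMG channels,
    instead of recomputing them from the training feature matrix.

    mean : length-d list (class mean μ_g)
    cov  : d x d nested list (common covariance Σ)
    """
    drop = set()
    for ch in bad_channels:
        drop.update(range(ch * feats_per_ch, (ch + 1) * feats_per_ch))
    keep = [i for i in range(len(mean)) if i not in drop]
    # Deleting rows and columns and keeping the rest is equivalent to
    # merging the remaining blocks (B1-B4 in Figure 5.2).
    mean_r = [mean[i] for i in keep]
    cov_r = [[cov[i][j] for j in keep] for i in keep]
    return mean_r, cov_r

# Two channels, two features each; channel 0 is disturbed.
mean = [1.0, 2.0, 3.0, 4.0]
cov = [[1, 0, 0, 0],
       [0, 2, 0, 0],
       [0, 0, 3, 0],
       [0, 0, 0, 4]]
m, c = reduce_stats(mean, cov, bad_channels=[0], feats_per_ch=2)
```

Because only index selection is involved, the cost is negligible compared with recomputing Σ' from hundreds of training windows.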

Sensor Fault Detection
To detect individual EMG sensor abnormalities, various signal processing methods have been applied to sensor fault detection [8][9][10]. A detector based on Bayesian decision rule [8] has been proposed for accurately detecting three types of simulated distortions including EMG signal drift and saturation, additional noise in the signal, and variation of EMG magnitude. An abnormality detector using Cumulative Sum (CUSUM) algorithm [9] has been developed to closely monitor the changes of EMG features for detecting sudden changes or gradual changes in EMG signals.
In this study, the CUSUM detector is adopted in our implementation because of its computational efficiency for real-time processing, its high accuracy, and low false alarm rate in detecting motion artifacts [9][10]. Two EMG features including mean absolute value and number of zero crossings are monitored to recognize abnormal changes. Detailed algorithms of the CUSUM detector can be found in [9].
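For illustration, a one-sided CUSUM on a single monitored feature can be sketched as below. The parameter values (in-control mean, slack, threshold) are assumptions for the example; the detector actually used, including its two-sided form and tuning, is specified in [9].

```python
def cusum_detect(samples, mu0, k, h):
    """One-sided CUSUM for detecting an upward shift in a monitored
    EMG feature stream.

    mu0 : in-control (baseline) mean of the feature
    k   : slack value, tolerated deviation per sample
    h   : alarm threshold on the cumulative sum
    Returns the index of the first alarm, or None if no alarm fires.
    """
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - mu0 - k))  # accumulate excess deviation
        if s > h:
            return i
    return None

# The feature stays near 1.0, then jumps after a motion artifact.
stream = [1.0, 1.1, 0.9, 1.0, 3.0, 3.2, 3.1]
alarm = cusum_detect(stream, mu0=1.0, k=0.25, h=2.0)
```

A second mirrored statistic would catch downward shifts; running one pair per monitored feature per channel keeps the per-window cost trivial.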

Real-Time Embedded System Implementation
A preliminary prototype of the self-recovery EMG pattern recognition system was implemented on a Gumstix Overo Air, an ARM Cortex-A8 OMAP3503 based computer-on-module (COM), and a RoboVero, an expansion board with an ARM Cortex-M3 microcontroller and eight 12-bit analog-to-digital converters (Figure 5.3). The Overo COM communicates with the RoboVero expansion board via two 70-pin connectors as shown in Figure 5.3. The system implementation consists of two parts: the microcontroller on the RoboVero expansion board for data sampling and dispatching, and the Cortex-A8 processor on the Overo COM for EMG pattern recognition.

Experimental Protocol
This study was conducted with Institutional Review Board (IRB) approval and with informed consent of the subject. The training session was conducted first to collect the training data and build the original classifier. The subject was instructed to perform one movement for about 4 seconds in one trial. For each movement task, three separate trials were collected.
After the training process was done, the parameters of the generated classifier, as well as the mean vector for each class and the common covariance matrix were saved in the memory for later use in the testing session.
In the real-time testing session, for each movement task, the subject performed the movement for about 4 seconds in four separate trials, for a total of 12 testing trials. In every trial, motion artifacts were manually introduced by randomly tapping the EMG electrodes with roughly equal strength. In this preliminary experiment, we tapped only one electrode at a time. To better evaluate the performance of the self-recovery module, the classification decisions with and without the self-recovery module were compared in every analysis window.
In addition, an offline evaluation was conducted to compare the performance of our fast LDA retraining algorithm against the previous retraining strategy [8,10] by processing the same dataset collected in the real-time testing session. Our fast retraining algorithm was two orders of magnitude (118 times) faster than the previous retraining strategy while consuming less than 1% of the memory used by the old strategy. Furthermore, the fast retraining algorithm took less than 1 ms to generate the new classifier. This result makes it possible for the system to extract EMG features, detect signal disturbances, retrain the classifier, perform pattern recognition, and produce a decision seamlessly in sequence within the duration of one analysis window increment.

System Performance in Real-Time
In the 12 real-time testing trials, a total of 48 motion artifacts were introduced, of which 43 were recognized by the CUSUM detector and 20 caused misclassifications when the self-recovery module was not used. All the disturbances that led to classification errors were successfully detected. The undetected disturbances were those with either small amplitude or short duration, which did not affect the classification performance. Without the self-recovery module, 277 misclassifications were observed among 5993 decisions. All these errors were caused by motion artifacts. Our self-recovery module eliminated 259 of them, resulting in a 93.5% recovery rate. The results of the experiment show the promise of a robust, reliable, and efficient real-time EMG pattern recognition interface for artificial arms.

Conclusion
This paper presented a real-time self-recovery EMG pattern recognition interface for artificial arms. The system seamlessly integrated EMG pattern recognition with a self-recovery module that could detect signal disturbances, retrain the classifier, and perform reliable pattern classification in real-time. A novel, fast, and efficient LDA-based retraining algorithm was developed and demonstrated the ability to immediately recover the classification performance from motion artifacts. The self-recovery EMG