EXPLORATORY ANALYSIS OF THE ACCURACY OF BIOSENSOR IN WEARABLE DEVICE

The new wave of wireless technologies, fitness trackers, and body sensors has had a great impact on personal biometric tracking and monitoring. These technologies contribute substantially to personal health care and can even be used in clinical settings. Among these devices, smartwatches are one of the most popular and are becoming increasingly common among the general public. Commercially available smartwatches incorporate sophisticated algorithms and multi-sensor technologies that provide users with real-time biometrics. These sensors include a photoplethysmography (PPG) sensor that detects the wearer’s heart rate, galvanic skin response sensors that provide skin surface information, and an accelerometer that provides activity and movement information. For clinical applications, researchers find the smartwatch’s PPG sensor to be of most interest, as heart rate is one of the most important vital signs monitored for clinical purposes. Heart rate can be used to detect and prevent serious conditions, such as cardiovascular diseases and seizures. However, the accuracy of PPG sensors still needs thorough investigation. Although previous research has demonstrated that wearable PPG sensors can reliably measure heart rate during regular movement (e.g. walking or jogging), no prior research has focused on the accuracy of a PPG sensor during daily activities such as brushing one’s teeth, cooking, or vacuuming. These activities are of interest because they involve short periods of high frequency vibration or intense wrist movement, which could affect the smartwatch’s heart rate calculation. To validate the relative accuracy of a smartwatch’s PPG sensor in these activities, a Microsoft Band (MB) and a Huawei Android smartwatch (HW) were used to conduct a series of experiments from which heart rate signals were gathered and evaluated.
Six participants were recruited to collect data from these two smartwatches by completing a set of three daily activities under a specific protocol. The participants completed the set of activities twice, giving us enough data to compare the heart rate collected by the two watches. Each activity was further divided into stages: the Rest Stage, the Dominant Hand Active Stage (D-Active Stage), and the Non Dominant Hand Active Stage (N-Active Stage). The heart rate differences between the two watches were evaluated both across the stages of the same activity and across the same stage of all activities. We also investigated how relative heart rate accuracy was affected by skin tone, and whether we could tell if the watch was being worn on the user’s dominant or non dominant hand. During the experiment, each subject wore an MB and an HW on the wrist of their dominant hand. Care was taken to follow the wear guidelines suggested for each device in order to collect the most reliable data possible. Each participant performed a series of timed activities: cutting vegetables, electric tooth brushing, and walking along a given route. Participants followed timed instructions from the experiment instructor. The heart rate measurements of the two devices were stored in separate CSV files on their Bluetooth-connected smartphones for further analysis. A close examination of the results showed that the vegetable cutting activity had the largest heart rate differences between the two devices, and that the Dominant Hand Active Stage of cutting vegetables had the largest heart rate difference of all. Among the three test cases, electric tooth brushing showed the smallest heart rate difference in both the rest and active stages, which indicates that high frequency vibration has less influence than the magnitude of movement.
Statistical results show that relative heart rate accuracy is affected by daily activities even when the smartwatch is worn on the user’s non dominant hand. However, the influence is much smaller than when the watch is worn on the wrist of the user’s dominant hand. Furthermore, the skin tone of the participant also shows some effect on the relative accuracy of the optical heart rate sensor. Based on the findings of these experiments, we determined that further exploration of a heart rate anomaly detection algorithm was required. This algorithm was used to identify anomalies in the smartwatch’s heart rate measurement while the user was completing an activity. The heart rate from the MB was compared with that from a pulse oximeter in order to tune the parameters of the anomaly detection algorithm. Data from a separate test stage showed that the anomaly detection algorithm with tuned parameters can detect most of the heart rate anomalies identified by examining the heart rate signals.

LIST OF TABLES

Table 4. Results of heart rate data of subject 3 for same stage of different activities.
show good validity, with the criterion device and wearable device having the potential to overcome the limitations of the traditional chest strap. (Fukushima, H., 2012) provided a heart rate estimation using a wrist-type photoplethysmography (PPG) sensor while their subject was running. The study proposed an algorithm that estimated heart rate from the PPG sensor. The algorithm used the built-in accelerometer to gain knowledge of the subject's body motion and arm position and thereby improve heart rate accuracy. Their method had two components. The first rejected artifacts using the difference between the power spectra of the PPG and acceleration signals, obtained by frequency analysis. The second was a reliability measure for the heart rate estimate, defined by the acceleration. The results showed that the heart rate from a PPG sensor had a higher degree of usability than existing methods using ECG.
All three studies share the same limitations. None of them analyzed the factors that influence heart rate accuracy, and all of them involved only exercise movement. Daily activities were never analyzed in any of these works.
In addition to previous studies, (Kroll R.R., Boyd JG, 2016 and Kroll, R.R.; McKenzie, E.D., 2017) and (Pelizzo,G., Guddo,A. Aurora P, 2018)  proposed a new framework for anomaly detection in medical wireless sensor networks which is based on the Mahalanobis distance for spatial analysis, and a kernel density estimator for the identification of abnormal temporal patterns.
One problem with this technique is its high dependency on the predefined threshold of the Mahalanobis distance (MD). An appropriate threshold is difficult to determine, and a single threshold may not be suitable for outlier detection in multidimensional data.
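To illustrate the threshold dependency described above, here is a minimal sketch of Mahalanobis distance thresholding for two-dimensional readings. It is not the cited framework's implementation. The function names, the sample values, and the choice of two dimensions are all assumptions made for illustration.

```python
from statistics import mean


def mahalanobis_2d(point, samples):
    """Mahalanobis distance of a 2-D point from the distribution of `samples`.

    `samples` is a list of (x, y) pairs, e.g. hypothetical
    (heart rate, second vital sign) readings recorded in a normal state.
    """
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = v^T * inverse(cov) * v, written out for the 2x2 case.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5


def flag_outliers(points, samples, threshold):
    """Return the points whose distance exceeds the (hypothetical) threshold."""
    return [p for p in points if mahalanobis_2d(p, samples) > threshold]
```

With the same reference samples, raising or lowering `threshold` changes which points are flagged, which is the sensitivity the paragraph above refers to.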
According to a statement in the work, the proposed framework can update the statistical parameters and obtain a more precise evaluation of the normal state of the patient. According to the experiment, the proposed approach can achieve good detection accuracy with a low false alarm rate (lower than 5.5%) on both real systems. (Wu, Y. S. Syu, 2012) introduced an anomaly detection algorithm that took advantage of the regularity of ECG to detect ECG anomalies. The proposed method could explore the intrinsic signal structure and represent the ECG segments in a low dimensional space. The normal ECG segments constitute a manifold, so anomalies can be detected automatically. However, this method focuses on the regularity of ECG signals rather than the heart rate measured in a given period.
Furthermore, (M. Haescher, D. J. C. Matthies, 2015) conducted a study using a smartwatch as a wearable device to detect anomalous activities in three different scenarios: the detection of sleep apnea, the detection of epileptic seizures, and the detection of accidents such as falls or car crashes. This study presents how to use a smartwatch as a base device to detect abnormal activities rather than to identify abnormal heart rate measurements from smartwatches.

ANALYSIS OF LITERATURE
An important observation is how heart rate accuracy is evaluated throughout the various publications. In previous research, the accuracy of an optical heart rate sensor was validated either during designed exercises or by analyzing a hospital patient in stable and calm conditions. No published research has examined how a wearable device's optical sensor performs in an individual's daily life, or how the accuracy of heart rate calculations is affected by daily activities. Table 1 summarizes current research and its lack of data on the heart rate accuracy of wearable devices in daily activities.
To further investigate how wearable devices can be used in health monitoring or early discovery of certain diseases, the performance of wearable devices in daily activities needs to be further evaluated.
For further exploration, we also reviewed various anomaly detection algorithms used in the biosensors of wearable devices. In this thesis, we use CUSUM to explore the possibility of detecting the heart rate abnormalities measured in daily activities.

Experiment procedure and participants
This prospective experiment recruited six healthy adults between the ages of 22 and 50. Each subject was first briefed on the procedures through direct verbal communication, and all subjects then underwent the same procedure. Participants gave verbal informed consent to three different activities: cutting vegetables, tooth brushing with an electric toothbrush, and walking. These activities were chosen to represent common everyday activities, each with a different amount of movement. Cutting vegetables is a normal daily activity, but the movement is quite intense compared to tooth brushing and walking. For tooth brushing, we had the subjects use an electric toothbrush because of its high frequency vibration. This vibration can represent a set of daily activities that use electric appliances, such as vacuuming or shaving with an electric razor. Walking is the most common and moderate daily activity, and it can represent almost every activity that involves no intense movement. An instructor timed each activity and gave corresponding instructions to the participant throughout the whole experiment. Biosensor data was streamed to a phone application and stored in separate files automatically once the experiment started.
Each participant wore both smartwatches on the wrist of their dominant hand so that both devices experienced the same intensity of movement in each activity. Care was taken to follow the user guidelines suggested by the manufacturer of each device. This was necessary to make sure that the smartwatch was tight enough to hold the optical heart rate sensor steadily against the subject's wrist. For the vegetable cutting and tooth brushing activities, data was collected twice, from both the dominant hand and the non dominant hand. These activities started with a one minute rest period, followed by a one minute active period with the given activity performed with the subject's dominant hand.
This is followed by a half minute rest period followed by a

Pre-process and metrics analysis
Time synced sensor data from each device was concurrently and continuously acquired by two separate smartphones and their corresponding applications. The reporting rates of the two smartwatches differed. For the Microsoft Band, heart rate was reported at one data point per second, the 3-axis accelerometer data was captured eight times per second, and the galvanic skin response data was captured five times per second. For the Huawei smartwatch, the heart rate was reported only when the value changed. The data was stored as spreadsheets on each smartwatch's corresponding smartphone, and further processing was required to align the data and resample it to the same reporting rate.
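As an illustration of the resampling step, the sketch below converts a change-only heart rate stream (like the one described for the Huawei smartwatch) to a regular one sample per second grid by holding the last reported value. The source does not state which resampling method was used. Forward filling, the function name, and the sample values are assumptions.

```python
def resample_forward_fill(events, start, end, step=1.0):
    """Resample change-only readings onto a regular time grid.

    events: list of (timestamp_seconds, heart_rate), sorted by timestamp,
            with a new entry only when the reported value changed.
    Returns a list of (t, heart_rate) for t = start, start + step, ..., end.
    Grid points before the first event get None, since no value is known yet.
    """
    count = int((end - start) / step) + 1
    resampled = []
    idx = 0
    current = None
    for i in range(count):
        t = start + i * step
        # Consume every event that happened at or before this grid point.
        while idx < len(events) and events[idx][0] <= t:
            current = events[idx][1]
            idx += 1
        resampled.append((t, current))
    return resampled


# Hypothetical change-only stream: 70 bpm at 0 s, 72 bpm at 2.5 s, 75 bpm at 4 s.
grid = resample_forward_fill([(0, 70), (2.5, 72), (4, 75)], start=0, end=5)
```

Once both devices are on the same grid, their readings can be compared sample by sample.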

To identify the heart rate difference between the active and rest stages, we divided the data of each test case into three segments: the rest segment, the dominant hand active segment, and the non dominant hand active segment. To evaluate the heart rate difference between the two devices, four levels of metric analysis were implemented to give a quantitative analysis of various metrics.
1. The Student's T test compares the two averages and tells whether they are different from each other. The T test also indicates how significant the differences are.
The larger the t-score, the greater the difference between the groups, while a small t-score indicates that the groups are similar. We explored the t-scores of the heart rate measurements from the Microsoft Band and the Huawei smartwatch to examine the similarity of relative accuracy between the two optical sensors.
2. Root Mean Square Error (RMSE) is the standard deviation of the residuals, which measure how far data points are from the regression line.
RMSE is a measure of how spread out these residuals are. In other words, it tells how concentrated the data is around the line of best fit.
3. The Mean Absolute Difference (MAD) between the heart rates of the MB and the HW represents the average difference regardless of the direction of the difference.
4. The Bland-Altman method was used to further assess the agreement between the two devices for heart rate measurements, and whether the difference varied in a systematic or random way over the range of measurements. The Bland-Altman method calculates the mean difference between two methods of measurement (the 'bias') and the 95% limits of agreement as the mean difference ± 1.96 standard deviations of the differences. It is expected that the 95% limits include 95% of the differences between the two measurement methods (Bland & Altman, 1986).
All four levels of analysis were implemented on the heart rate data from the three different activity stages. They were also used on the task-specific HR data and accelerometer data, which can be used to indicate the intensity of movement.
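The four metrics above can be sketched for two time-aligned heart rate series as follows. This is an illustrative sketch, not the thesis code. It assumes a paired t statistic (the source does not say which variant of the T test was used) and computes RMSE directly on the device differences rather than on regression residuals. The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def agreement_metrics(hr_a, hr_b):
    """Compare two equally long, time-aligned heart rate series (in bpm)."""
    if len(hr_a) != len(hr_b) or len(hr_a) < 2:
        raise ValueError("need two series of equal length with >= 2 samples")
    diffs = [a - b for a, b in zip(hr_a, hr_b)]
    n = len(diffs)
    bias = mean(diffs)                       # Bland-Altman mean difference
    sd = stdev(diffs)                        # sample SD of the differences
    mad = mean([abs(d) for d in diffs])      # mean absolute difference
    rmse = math.sqrt(mean([d * d for d in diffs]))
    # Paired t statistic (assumption): mean difference over its standard error.
    if sd > 0:
        t_score = bias / (sd / math.sqrt(n))
    else:
        t_score = math.inf if bias else 0.0
    return {
        "t_score": t_score,
        "rmse": rmse,
        "mad": mad,
        "bias": bias,
        "limits_of_agreement": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Hypothetical readings from two devices over four aligned seconds.
metrics = agreement_metrics([80, 82, 85, 90], [78, 83, 80, 84])
```

Running the same function separately on the Rest, D-Active, and N-Active segments would give per-stage values comparable in form to those reported in the results.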

Results and discussion
After collecting the heart rate data from both smartwatches, data analysis was required to understand the differences between the two. First, the experiment results for each subject were examined. This examination included the heart rate waveforms for all three test cases and the absolute difference of heart rate between the two devices for each test case. The Student's T test, Mean Absolute Difference (MAD), and Root Mean Square Error (RMSE) described in the previous chapter were applied to the heart rate measurements of each test case.
Among all six subjects, we picked the results of subject one as an example to show the heart rate differences between the two devices in each of the three test cases. In Figure 2, cutting vegetables clearly had the largest heart rate difference during the active periods, reaching 33 bpm at 90 seconds. The small difference observed for electric tooth brushing matches our expectation that it is the activity with the smallest movement among the three.
From Figure 4, the heart rate measurement of HW is significantly higher than the measurement of MB. The mean bias is 2.9 ± 15 bpm over the heart rate measurement of MB.

Heart rate difference of different stages in same activity
When examining time synced Microsoft Band and Huawei Android watch heart rate data in different stages of the same activity, the heart rate differences of the Active stages are larger than those of the Rest stages. Take the heart rate data of subject four as an example. For the cutting vegetables activity, the Mean absolute difference of the dominant hand Active stage is 4.94 times that of the Rest stage (23.03 bpm vs 4.66 bpm), the RMSE is 2.32 times that of the Rest stage (28.36 bpm vs 12.21 bpm), and the T-Score is 2.5 times that of the Rest stage (9.74 vs 3.89). The non dominant hand Active stage also shows a larger heart rate difference than the Rest stage, as its Mean absolute difference is 1.31 times that of the Rest stage. However, the RMSE and T-Score of the N-Active stage are smaller than those of the Rest stage. Meanwhile, the electric tooth brushing test shows similar heart rate differences for both its Active stages and its Rest stage.
The Mean absolute difference is only 1.29 times that of the Rest stage, the RMSE is the same, and the T-Score is 2.2 times higher. In the walking test, the heart rate difference between the Rest and Active stages is larger than in the other two tests, with the Mean absolute difference being 4.69 times and the RMSE 3.15 times that of the Rest stage.
As for the Bland-Altman plot, the mean bias of the cutting vegetables activity is 0.91 ± 5 bpm for the Rest stage and 20 ± 35 bpm for the D-Active stage, while that of the N-Active stage is 2.2 ± 6 bpm. The Bland-Altman mean bias values show that the agreement between the two devices in the Rest and N-Active stages is higher than in the D-Active stage. Overall, across all three test cases, we found that the heart rate difference between the devices in the Active stage is larger than in the Rest stage, which meets our expectation.
Figure 5. MAD, RMSE and T-Score of heart rate difference in different stages of each test case.
Figure 6. Results for Rest stage, D-Active stage and N-Active stage heart rate differences of two devices. Correlation between two stages of each test case and case average of Bland-Altman Plots indicating mean bias scores and 95% limits of agreement.

Heart rate difference of same stage in different activities
When examining time synced Microsoft Band and Huawei Android smartwatch heart rate data of the same stages in different activities, we calculated the three metrics described in the previous chapter and explored the results for each subject. In this chapter, we use the data from subject four as an example. The heart rate differences of the Rest stages in the three test cases are almost the same.
The heart rate difference varies by less than 1 bpm for both MAD and RMSE. However, the T-Scores show that even though the MAD and RMSE are almost the same, the actual signals differ from each other, as the cutting test case has a T-Score as high as 30.98 and the walking test case has a T-Score as low as 1.43. As for the Bland-Altman plot, the mean bias is -3.8 ± 7 bpm, which indicates that the heart rate differences of the Rest stage in all three test cases have very high agreement.
However, when comparing the Active stages, the heart rate differences show some interesting findings. From Figure 7 we can infer that the vegetable cutting test has the largest heart rate differences among all three tests. Its Mean absolute difference is 58.82 bpm, which is 8.5 times higher than that of the electric toothbrush test and 4.26 times higher than that of the walking test. This indicates that the intensity and magnitude of movement, rather than vibration frequency, are the main factors affecting optical sensor accuracy. Also, the T-Score of the cutting vegetables case is much higher than those of the other two cases. In the Bland-Altman plot, the agreement of heart rate differences across the different active stages is very limited. The heart rate differences in the Rest stages can be regarded as the baseline device difference between the Microsoft Band and the Huawei Android watch. On that basis, for the cutting vegetables test case, the heart rate difference of the active stage is 8 times higher than that of the rest stage, while for the electric tooth brushing test case the heart rate differences of the rest and active stages are almost the same.
Results for heart rate differences of two devices in Rest stage between case 1 and case 2. Correlation between two test cases of Rest stage and Bland-Altman Plots indicating mean bias scores and 95% limits of agreement.
Figure 9. Results for heart rate differences of two devices in D-Active stage between case 1 and case 2. Correlation between two test cases of D-Active stage and Bland-Altman Plots indicating mean bias scores and 95% limits of agreement.

Dominant hand activity vs non dominant hand activity
When considering the relative heart rate accuracy of these two devices, the difference between dominant hand activities and non dominant hand activities is another factor that draws our interest. Some people like to wear their watch on the wrist of their dominant hand, while others prefer the wrist of their non dominant hand, but does this have any effect on the relative accuracy of optical heart rate sensors? From Figure 10, it is obvious that drastic movement affects optical sensor accuracy, while high frequency vibration with small magnitude has much less influence, which matches our previous findings. Wearing a smartwatch on one's non dominant wrist can reduce the influence of daily activities on its optical sensor's accuracy. However, the heart rate difference of the N-Active stage is still two times higher than that of the Rest stage.
Comparing Figure 11 and Figure 12 shows that the heart rate differences of the D-Active stage and the N-Active stage agree better in the tooth brushing test case than in the vegetable cutting test case, with the mean bias being -8 ± 14 bpm in the tooth brushing test case versus -20 ± 23 bpm in the cutting vegetables test case.
Figure 10. Mean absolute difference and RMSE for Dominant hand active stage and Non dominant hand active stage.
Figure 11. Results for heart rate differences of two devices in D-Active and N-Active stage of case 1. Correlation between two stages and Bland-Altman Plots indicating mean bias scores and 95% limits of agreement.
Figure 12. Results for heart rate differences of two devices in D-Active and N-Active stage of case 2. Correlation between two stages and Bland-Altman Plots indicating mean bias scores and 95% limits of agreement.

Relative heart rate accuracy for different skin tones
A further observation from the time synced heart rate data of the Microsoft Band and the Huawei Android smartwatch is that the relative accuracy of their heart rate readings is also related to the skin tone of the participants. We divided the six participants into three groups based on their skin tone. The first group (Group Blue) contains subjects one and two, both of whom are Indian. The second group (Group Orange) contains subjects three through five, all of whom are Chinese. The sixth subject is Caucasian and forms a separate group. The data of the third group was not included in this comparison since it contains only one participant. From Figure 13, we found that for both the vegetable cutting test and the electric tooth brushing test, Group Blue has a larger heart rate difference in both the Rest and Active stages. For the vegetable cutting test, the heart rate difference of Group Blue is 1.6 times larger than that of Group Orange in the Rest stage, and 1.2 times larger in the Active stage.
In the electric tooth brushing test, the heart rate difference of Group Blue is 1.6 times and 1.4 times larger than that of Group Orange in the Rest stage and the Active stage respectively. One explanation of this skin tone observation is that optical sensors use the amount of reflected LED light to determine the blood volume in a vessel. It is likely that darker skin absorbs more LED light than lighter skin. Thus, even with the same blood volume, the amount of LED light reflected by darker skin will be much less than by lighter skin. Because of this, the heart rate measurement for darker skin may not be as accurate as for lighter skin.
This chapter only provides some initial findings based on the experiments and participants we have. For more accurate conclusions, more participants should be recruited, and a quantitative analysis of skin tone should also be conducted.
influences on different devices. Therefore, it is confirmed that there always exists some sort of disturbance in heart rate data read from any wearable device. In order to use heart rate data either in personal health monitoring or for clinical usage, it is extremely important to detect anomalies in heart rate data and mark that data as unreliable before applying it to any application or diagnosis program.
To further explore the heart rate accuracy of wearable devices, we conducted more experiments on more daily activities and collected heart rate data from the Microsoft Band and a pulse oximeter device. The heart rate read from the pulse oximeter has been shown to be very accurate when no activity is involved during the measurement. We used this heart rate data as the criterion and the heart rate data read from the Microsoft Band as test data to detect heart rate anomalies. A CUSUM algorithm was used in this exploration to detect heart rate anomalies. The parameters of the CUSUM algorithm were tuned based on the heart rate pairs from the Microsoft Band and the pulse oximeter, and more tests on the heart rate of daily activities were performed to determine the accuracy of the CUSUM algorithm.

Experiment design and procedure
For this experiment, two healthy participants were recruited. Each participant was first briefed on the procedures through direct verbal communication, and both then underwent the same procedures. Participants gave verbal informed consent to five different activities: hand tooth brushing, electric tooth brushing, chopping, vacuuming, and washing dishes. As in the previous experiment, these activities were picked to represent common everyday activities with different activity characteristics. An instructor timed each activity and gave corresponding instructions to the participant throughout the whole experiment. Biosensor data from the Microsoft Band was streamed to a phone application and stored in separate files automatically once the experiment started.
Each participant wore the Microsoft Band on the wrist of their dominant hand, and a pulse oximeter was clipped to one of the participant's toes. The pulse oximeter was kept stable and was not moved during any activity in order to get the most accurate heart rate readings possible. As in the previous experiment, each activity was performed the same way with the subject's dominant hand and with their non dominant hand. Starting with their dominant hand, the subject began the experiment by resting for 30 seconds. The resting period was followed by a one minute period of performing the specified activity. This cycle was repeated two more times and ended with a final resting period of 30 seconds. The same procedure was then performed with the subject's non dominant hand, totaling 9 minutes for the entire procedure for both hands. The whole procedure is shown in Figure 14. The data from all of the sensors was streamed to separate folders on a smartphone to which the Microsoft Band was connected via a reliable Bluetooth connection.
Figure 14. Procedure for anomaly detection experiment
Although we had a reliable method to record and process biosensor data from the Microsoft Band, there was no reliable method to stream the heart rate data from our pulse oximeter to a file. Because of this, we used a camera to record the display of the pulse oximeter throughout the entire experiment and saved the video file. We then read the heart rate from the video at an approximate reporting rate of one data point per second. The readings from the oximeter video were then time synced with the heart rate data from the Microsoft Band and resampled so that both series had the same length.
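One way to bring manually transcribed oximeter readings onto the Microsoft Band's timestamps is linear interpolation, sketched below. The source does not state which resampling method was used. Linear interpolation, the function name, and the sample values are assumptions made for illustration.

```python
from bisect import bisect_right


def interpolate_at(times, ref_times, ref_values):
    """Linearly interpolate a reference series at the given timestamps.

    ref_times must be strictly increasing. Timestamps outside the range
    covered by ref_times are dropped, so the result only contains
    (t, value) pairs for moments where both devices have data.
    """
    aligned = []
    last = len(ref_times) - 1
    for t in times:
        if t < ref_times[0] or t > ref_times[last]:
            continue
        i = bisect_right(ref_times, t) - 1   # index of last ref_time <= t
        if i == last:
            aligned.append((t, float(ref_values[i])))
            continue
        t0, t1 = ref_times[i], ref_times[i + 1]
        v0, v1 = ref_values[i], ref_values[i + 1]
        aligned.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
    return aligned


# Hypothetical oximeter readings taken from video at irregular instants,
# aligned to the band's one-per-second timestamps.
oximeter = interpolate_at([0, 1, 2, 3, 4], [0.0, 1.2, 2.1, 3.0], [70, 72, 71, 74])
```

After this step both series share the same timestamps and length, so they can be compared point by point.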
Figure 15. Heart rate of MB and Pulse Oximeter of hand tooth brushing test case
Figure 15 depicts the heart rate readings from both the MB and the pulse oximeter in the hand tooth brushing test case of this experiment. Comparing the pulse oximeter readings with the smartwatch readings shows that the heart rate from the Microsoft Band has anomalous readings at around 250 seconds and 300 seconds: a sudden increase of about 15 bpm at 250 seconds and a sudden increase of about 30 bpm around 300 seconds. This is an example where the anomaly is obvious and can be easily recognized by a consumer. However, there are many cases where the anomaly is not as obvious as in this example, and a dedicated algorithm is needed to detect and mark the unreliable heart rate data.

CUSUM anomaly detection
To detect anomalies in a wearable device's heart rate data, we used a recursive Cumulative Sum (CUSUM) algorithm as our first preferred method. The CUSUM detector has two advantages over other change detectors. First, CUSUM is not sensitive to the probability distribution of the underlying signal, which makes it suitable for heart rate data. Second, it has been proven optimal in the sense that it detects changes faster than other methods. We first introduce the basics of the algorithm. CUSUM calculates a cumulative sum over samples from a process $x_n$ and determines whether the values of $x_n$ have changed. To simplify the algorithm, we assume that the distributions of $x_n$ before and after the change are Gaussian, with mean values $\mu_0$ and $\mu_1$ respectively. Let $x_j$ denote the $j$-th sample of the data sequence. The basic CUSUM decision function is defined recursively: its value $G_j$ at sample $j$ is computed from the previous value $G_{j-1}$ and the new sample $x_j$. There are several parameters that the user has to set correctly in order to get optimal CUSUM performance:
1. The detection threshold h.
2. The change magnitude d.
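For reference, a common textbook form of the recursive CUSUM for a shift in the mean of a Gaussian signal is sketched below, writing the means before and after the change as $\mu_0$ and $\mu_1$. It is offered as an assumption about the general shape of the decision function, not as the exact expression used in this study. The known variance $\sigma^2$ and the initial value $G_0 = 0$ are not stated in the text and are introduced here only for illustration.

```latex
\begin{align}
s_j &= \frac{\mu_1 - \mu_0}{\sigma^2}\left(x_j - \frac{\mu_0 + \mu_1}{2}\right), \\
G_j &= \max\left(0,\; G_{j-1} + s_j\right), \qquad G_0 = 0, \\
\text{alarm at sample } j &\iff G_j > h .
\end{align}
```

Writing $d = \mu_1 - \mu_0$, the increment becomes $s_j = \frac{d}{\sigma^2}\left(x_j - \mu_0 - \frac{d}{2}\right)$, which shows where the change magnitude d and the detection threshold h enter the detector.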

The detection threshold h: The classical way to set this parameter is to use the average run length function, which is the expected number of samples before an action is taken. More specifically, two values of this function are of interest: the mean time between false alarms, which is the average run length when the change magnitude is zero, and the mean detection delay. Both values depend on the detection threshold h, and can thus be used to set the performance of the CUSUM algorithm to a desired value for a particular application.
The change magnitude d: The user must have prior knowledge about the signal to set this parameter correctly. An efficient setting for the change magnitude is the a priori most likely change magnitude that should appear in the signal. If several jump magnitudes are possible, the best choice is the smallest one. In any case, the resulting change detection algorithm is only optimal for sequentially detecting the chosen change magnitude.
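A minimal Python sketch of a one-sided recursive CUSUM detector may help show how the threshold h and the change magnitude d interact. It assumes the common Gaussian mean-shift form of the algorithm, which may differ from the exact decision function used in this study. The function name `cusum_detect`, the baseline mean `mu0`, the noise level `sigma`, and the sample data are hypothetical and are not taken from the experiments.

```python
from typing import List, Sequence


def cusum_detect(x: Sequence[float], mu0: float, d: float, h: float,
                 sigma: float = 1.0) -> List[int]:
    """Return the sample indices at which a one-sided CUSUM raises an alarm.

    mu0   : assumed mean of the signal before the change (hypothetical input)
    d     : change magnitude the detector is tuned for (mu1 - mu0)
    h     : detection threshold applied to the decision function
    sigma : assumed standard deviation of the signal (hypothetical input)
    """
    alarms: List[int] = []
    g = 0.0  # decision function, restarted from zero after every alarm
    for j, xj in enumerate(x):
        # Log-likelihood ratio increment for a Gaussian mean shift of size d.
        s = (d / sigma ** 2) * (xj - mu0 - d / 2.0)
        # Recursive update: the sum is clipped at zero so it cannot go negative.
        g = max(0.0, g + s)
        if g > h:
            alarms.append(j)
            g = 0.0
    return alarms


if __name__ == "__main__":
    # Ten samples at a 70 bps baseline followed by a jump to 95 bps.
    hr = [70.0] * 10 + [95.0] * 3
    print(cusum_detect(hr, mu0=70.0, d=25.0, h=10.0, sigma=5.0))  # [10, 11, 12]
```

Because the sketch keeps `mu0` fixed, it raises an alarm at every sample for as long as the signal stays at the elevated level. A practical detector would typically re-estimate the baseline after each alarm.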

CUSUM parameter tuning
As we initially explored CUSUM anomaly detection, we first tuned the CUSUM parameters based on a comparison of heart rate data from the Microsoft Band and the pulse oximeter. Figure 17 compares normal heart rate readings with heart rate readings containing anomalies in the hand tooth brushing test case. The normal heart rate ranges from 60bps to 80bps for hand tooth brushing, while the anomalous heart rate shows a sudden increase from 75bps to 90bps at around 250 seconds, and another sudden increase from 88bps to 125bps at around 300 seconds. Based on the comparison of the MB heart rate with the pulse oximeter, the sudden increases at 250 seconds and 300 seconds are two anomalies that should be marked as unreliable heart rate readings.

Figure 17. MB and pulse oximeter heart rate comparison for the hand tooth brushing test case of two participants, one with normal heart rate readings and the other with anomalies

Figure 18. MB and pulse oximeter heart rate comparison for the water flosser test case of two participants, one with normal heart rate readings and the other with an anomaly

In Figure 18, the comparison of the two heart rate readings from the water flosser test shows that the normal heart rate ranges between 60bps and 85bps, while the anomalous reading shows a sudden increase from 70bps to 120bps at around 250 seconds. The goal of the CUSUM algorithm is to detect these anomalies and mark the corresponding heart rate readings as unreliable.
To tune the CUSUM parameters based on our experiment results, we first consider the change magnitude d. We start by setting the sliding window size to 5, which means we consider the heart rate readings in any 5 second period. To mark an anomaly in the heart rate readings over various activities, the first step is to get the range of normal heart rate changes. Collecting and comparing all training data sets shows that for all normal heart rate readings, the change range is within 15bps in any 5 second period, while for a heart rate with an anomaly, the change range is over 20bps, and sometimes as high as 50bps, in a 5 second period. Therefore, the change magnitude is set to 25. The detection threshold should be chosen from the mean time between false alarms and the mean detection delay given by the average run length function. As we set the sliding window to 5, it is reasonable to set the threshold to twice the window size, which is 10 for our training data. To evaluate the chosen parameters, Figure 19 shows the CUSUM results with a detection threshold of 10 and a change magnitude of 25, based on the heart rate readings from the biosensors.
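The window-based inspection described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: it assumes one heart rate sample per second and interprets the "change range" as the maximum minus the minimum reading within each window. The function name `window_ranges` and the sample readings are hypothetical.

```python
from typing import List, Sequence


def window_ranges(hr: Sequence[float], window: int = 5) -> List[float]:
    """Max-minus-min heart rate for every run of `window` consecutive samples.

    Assumes one sample per second, so window=5 covers a 5 second period.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [max(hr[i:i + window]) - min(hr[i:i + window])
            for i in range(len(hr) - window + 1)]


if __name__ == "__main__":
    normal = [70, 72, 71, 74, 73, 75, 72]         # hypothetical clean readings
    anomalous = [70, 72, 71, 90, 118, 120, 119]   # hypothetical sudden jump
    print(max(window_ranges(normal)))      # 4
    print(max(window_ranges(anomalous)))   # 49
```

Under this interpretation, a candidate change magnitude is one that lies above the largest range seen in the normal recordings and within the ranges seen in the anomalous ones.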
Initial test results on the training set show that our CUSUM algorithm with tuned parameters can detect heart rate anomalies without false alarms.

Figure 19. CUSUM anomaly detection results for the heart rate data shown in Figures 17 and 18. The results show that the tuned CUSUM parameters can detect anomalies without false alarms.

CUSUM anomaly detection performance measurement
To test the performance of our CUSUM algorithm with tuned parameters, we applied the algorithm to all the data collected from our experiments, and all the results are pictured in Figure 20. There are two participants and five daily activities (Chopping (CH), Electric Tooth Brushing (ET), Manual Tooth Brushing (MT), Vacuuming (VA) and Washing Dishes (WD)) tested in this experiment. To examine the performance of the CUSUM algorithm, the data from the same activities was concatenated as one data stream, and the accelerometer data was also included to show the relationship between movement and heart rate anomaly.
For the chopping activity, an anomaly was detected at 840 seconds, where the heart rate suddenly increased from 70 bps to 90 bps. At the moment of the anomaly, the participant was stopping the activity and settling into a resting position. No other anomaly was found in the heart rate signal, so CUSUM detected the only heart rate anomaly in the chopping test case without any false alarm.
For the electric tooth brushing activity, three anomalies were detected by the CUSUM algorithm. However, the first detection at 260 seconds appears to be a false alarm, since the heart rate only increased by 13bps. The other two detections correctly identified anomalies. Furthermore, the third anomaly occurred while the participant was resting, which is very abnormal.
For the manual tooth brushing activity, the only anomaly detected was at 840 seconds. However, there seems to be another anomaly at 280 seconds which was missed by the anomaly detection algorithm. The detected anomaly occurred while the participant transitioned from an active state to a resting state, which matches the detection in the chopping activity.
For the vacuuming activity, the anomaly detected at 300 seconds appears to be a false alarm, since the heart rate only increased by around 12bps, while two anomalies at 680 seconds and 800 seconds were missed by the algorithm. The heart rate increases at these two moments were much larger than the one detected by the CUSUM algorithm. Also, both of these anomalies happened when the participant was changing from an active state to a resting state.
For the washing dishes activity, the anomaly detected at 860 seconds is a valid heart rate anomaly, which also occurred at the moment when the participant was transitioning from active to resting. However, another obvious anomaly at 700 seconds, where the heart rate reading is around 40 bps, was missed.
Overall, the performance of CUSUM with tuned parameters is not as good as expected. Over the course of the experiment, five anomalies were successfully detected, with two false alarms and three anomalies missed by the algorithm. A dedicated algorithm should be designed to tune the CUSUM parameters, and more training data is needed for the algorithm.

The main objective of this study was to analyze the accuracy of PPG heart rate sensors in commercially available smartwatches, such as the Microsoft Band and the Huawei Android smartwatch. PPG technology is relatively new, and has been applied in wearable devices to provide consumer level heart rate monitoring. Inherent variability in accuracy likely exists among devices. Previous research shows that conditions of low physical exertion elicited the least variability in error values among various trackers. In this study, we focused on the performance of PPG sensors in daily activities, and discussed various factors that influence the accuracy of PPG heart rate sensors. Previous studies show that movement plays a very important role in the accuracy of a wearable device's optical sensor, and in this thesis, the focus was on the influence of both the magnitude and the frequency of the movement. The three test cases included in the first experiment mimicked everyday activities, and represented three different types of movement.
Among them, the vegetable cutting test has the largest magnitude and a moderate frequency. The electric tooth brushing test has the highest movement frequency but the smallest magnitude, while the walking test represents movements that have a large magnitude but a low frequency.
When considering the factors that affect the accuracy of a wearable device's optical sensor, we evaluated the influence of movement magnitude, movement frequency, whether the user wears the device on the dominant or non-dominant hand, and the skin tone of the participants. In our experiment, the Microsoft Band and the Huawei Android smartwatch showed very large heart rate differences across all three test cases, with the largest differences in the vegetable cutting test. The movement in this activity had the largest magnitude and a relatively high frequency. The maximum heart rate difference is about 40bps, with a mean absolute difference of 9bps and an RMSE of 13.69bps. Meanwhile, the electric tooth brushing test case has the smallest heart rate difference, with a mean absolute difference of 3.86bps and an RMSE of 5.47bps, indicating that frequency has a smaller influence on heart rate accuracy than magnitude. We found that among all these factors, movement plays the most important role, and has the largest impact on the accuracy of optical heart rate data. Both the magnitude and the frequency of the movement affect heart rate accuracy, with magnitude having a significant impact and frequency having barely any impact. We found that when the device was worn on the wrist of the user's dominant hand, the data showed larger heart rate differences, which matched our expectations. However, when the device was worn on the wrist of the user's non-dominant hand, the readings were also affected by the movement of the dominant hand, although to a lesser degree. Furthermore, we found that skin tone may also have some impact on the accuracy of optical heart rate sensors.
Experiment results show that the heart rate differences of the participant group with a darker skin tone are larger than those of the participant group with a lighter skin tone in both the vegetable cutting test and the electric tooth brushing test.
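The mean absolute difference and RMSE quoted above for the comparison between devices can be computed as in the following Python sketch. It assumes that the two heart rate series have already been aligned sample by sample, which the text does not describe. The function name `mae_rmse` and the sample readings are hypothetical and are not data from the experiments.

```python
import math
from typing import Sequence, Tuple


def mae_rmse(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Mean absolute difference and root-mean-square error of two aligned series."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("series must be non-empty and of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    mae = sum(abs(e) for e in diffs) / len(diffs)
    rmse = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    device_a = [70.0, 80.0, 90.0, 100.0]   # hypothetical readings, in bps
    device_b = [72.0, 77.0, 90.0, 105.0]   # hypothetical readings, in bps
    mae, rmse = mae_rmse(device_a, device_b)
    print(mae)             # 2.5
    print(round(rmse, 2))  # 3.08
```

RMSE squares each difference before averaging, so it penalizes large individual differences more heavily than the mean absolute difference does. This is consistent with the RMSE values above being larger than the corresponding mean absolute differences.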
Based on the findings from the first experiment, we further explored how to use the CUSUM algorithm to detect heart rate anomalies. More experiments with a variety of daily activities were included to explore this algorithm, and we first tuned the CUSUM parameters with training data. We then examined the performance of CUSUM on more experimental data. The CUSUM algorithm showed enough accuracy to detect basic heart rate anomalies. However, as part of our future work, more sophisticated algorithms need to be developed to achieve better detection performance.

LIMITATIONS AND FUTURE WORK
Though our study was successful as a preliminary experiment on the optical sensor in wearable devices, and we were able to identify multiple factors that affect the accuracy of optical heart rate sensors, there are some obvious limitations of this study, and more improvements would need to be included in future work. The first limitation is the sample size: at the time of this study, only a small number of participants were available. This impairs the statistical significance of our research. With more participants, longer experiment times, and refined experiment protocols, it will be possible to have a valid method to record and calculate the skin tone of participants, and to reach a more convincing conclusion on the effect of skin tone on optical sensor variability. Also, more participants will make it possible to group subjects by age, race, health condition, etc., so that we could discover more underlying factors, and all the conclusions of our study will be more convincing.
In addition to the sample size, more devices and more test cases can be added to this research to gain a more thorough understanding of the differences between devices, and of how these devices perform in a much larger range of daily activities. More potential factors that may influence optical sensor accuracy should be tested and evaluated, such as the surface condition of the skin and how the tightness of the smartwatch affects its heart rate accuracy.
Another possible improvement is to add a criterion device to provide gold-standard heart rate data. Not only would we know the accuracy of each wearable device, but we would also be able to identify the time frame and movement in which a heart rate anomaly occurs. In this situation, more mathematical algorithms could be used to detect heart rate anomalies, and with the latest machine learning algorithms, it may even be possible to classify each anomaly into different categories.