The Diagnosis of Lyme Disease Using an Expert System

Diagnosing Lyme disease has been problematic since its first recognition in 1975. An assortment of problems, including clinical symptoms that mimic several other diseases and the lack of an accurate laboratory test, have hindered diagnosis, and overdiagnosis and misdiagnosis may result. This thesis seeks to improve the accuracy of diagnosing Lyme disease by creating an expert system. The expert system developed in this thesis is a probabilistic Bayesian belief network. The network consists of nodes, which represent diagnostic variables, and links between nodes, which represent the probabilistic influence one node has on another. Much is known about Lyme disease, its transmission, and the diagnostic symptoms associated with it. This information is incorporated into the network through a literature search, which was used to determine initial estimates of these variables so that the system starts with a priori values. To test the system, data were collected on a number of patients who presented symptoms consistent with Lyme disease. The system's classification is compared to the patients' classification based on serological results, and methods for improving the system's accuracy are discussed.

Diagnosing Lyme disease has been problematic since the disease's identification in 1975. An assortment of problems, including clinical symptoms that mimic a number of other diseases and the lack of an accurate laboratory test, have hindered diagnosis. Despite these problems, much is known about the spatial distribution of the disease, the mechanisms of acquiring it, and its progression.
Lyme disease is a serious illness that continues to challenge scientists and clinicians. It is a complex and enigmatic spirochetal illness involving the skin, nervous system, heart, and joints. It is the most common tick-borne disease reported by the Centers for Disease Control (CDC) (Magnarelli, 1988). Incidence in the state of Rhode Island has been among the highest in the country: the CDC reported an incidence of 0.000272 (about 27 cases per 100,000 population) in 1992 and 1993 (MMWR). Effective treatment of Lyme disease requires early diagnosis. If the disease is allowed to progress, serious complications may develop, such as meningitis, encephalitis, Bell's palsy, radicular pain, atrioventricular nodal block, and arthritis. With early diagnosis, oral antibiotics such as amoxicillin or doxycycline are usually curative (Barbour, 1993).
Unfortunately, Lyme disease is difficult to diagnose. Like syphilis, it has been called a "great imitator," and it is often difficult to diagnose clinically when the characteristic skin lesion, erythema chronicum migrans (ECM), is absent. ECM may be absent in 10-40% of patients (Dennis, 1991). The presence of ECM definitely implies that the patient has Lyme disease; however, many physicians may not recognize ECM accurately. Blaauw (1987) tested the diagnostic ability of general practitioners and dermatologists using photographs of ECM and found their recognition lacking.
Laboratory tests are not yet standardized, and scientists have highlighted the poor agreement among results from different serological methods. On a national level, the false-negative rate is estimated to be 4% to 21% and the false-positive rate 2% to 7% (Kahneman, 1982). False-negative tests may lead to serious problems if the disease is allowed to progress untreated; false-positive tests may lead to complications from overtreatment with antibiotics.
The rate of false positives and the inability of most serological tests to distinguish between active and inactive disease have led to the overdiagnosis of Lyme disease in some areas.
Lyme disease is increasingly reported from the southeastern and south-central regions of the United States, even though the causative agent, Borrelia burgdorferi, has not been found in most states in these regions (Dennis, 1991). Fibromyalgia and Chronic Fatigue Syndrome (CFS) may be incorrectly diagnosed as advanced Lyme disease because their clinical symptoms are similar. The problems in distinguishing these diseases are compounded by the fact that some patients develop Fibromyalgia or CFS in association with, or soon after, Lyme disease. For example, Steere (1993) found that 45% of the patients diagnosed with a disease other than Lyme had previously tested positive by serological assay for Borrelia burgdorferi, a fact that may reflect the positive association of these diseases.
Diagnosing Lyme disease can be difficult even for a physician who is well-informed about the clinical manifestations of the disease. Diagnosis requires an assessment of a number of clinical symptoms, possible risk factors, and the timing of the symptoms. Cognitive psychologists have shown that people are generally poor at making these kinds of assessments under uncertainty in complicated situations (Kahneman, 1982). This thesis seeks to improve the diagnosis of Lyme disease by developing a computerized expert system.

Bayesian Belief Expert System:
An expert system is a computer program that can make reasonable judgments in a complex area. The intent is that the program can perform a task ordinarily performed by an expert. With this tool, even physicians unfamiliar with Lyme disease could diagnose it with the accuracy of an expert. Such an expert system could improve diagnostic accuracy and also save time and money for health care providers by reducing avoidable costs such as laboratory tests, physicians' services, hospital care, and unnecessary medications. Thus, an expert system that diagnoses Lyme disease with high certainty would represent a significant advancement.
In diseases such as Lyme disease a probabilistic expert system is more appropriate than a deterministic system. Because many of the clinical symptoms of Lyme disease are also symptoms of other diseases, the presence of these symptoms increases the probability of having Lyme disease, without implying that the patient definitely has Lyme disease. In a deterministic system, a diagnostic variable directly implies that a particular state of the disease variable has occurred.
The paradigm used to build the expert system in this thesis is a Bayesian belief network. A Bayesian belief network is a tree-like structure containing a number of diagnostic variables, referred to as nodes. Links between nodes represent the probabilistic influence of one node on another, and the nodes are connected by these links in a parent/child arrangement according to causality. Each link carries probabilities that measure the influence the parent has on its connected child. One objective of this thesis was to obtain estimates of these probabilities by searching the existing literature and by obtaining the opinions of expert Lyme diagnosticians. Once these estimates have been obtained, information on a given patient's symptoms can be entered into the system, and the system will calculate the probability that the patient has Lyme disease.
The Bayesian belief network has three attractive features: it easily incorporates information from a variety of sources, it can be updated with new data, and it handles missing data well.

1.2 Other classification systems:
Three other probabilistic classification methods were considered for use in this project: discriminant analysis, a log-linear model, and a classification and regression tree.
These methods were ultimately rejected but are briefly described here for completeness. Discriminant analysis is a classification method that uses continuous, normally distributed explanatory variables. Since most explanatory variables in this study are categorical, discriminant analysis was not chosen. Log-linear models can be used for classification problems with both categorical and continuous explanatory variables. A classification and regression tree splits patients into disease categories (e.g. early, late, or no Lyme disease) based on the values of the diagnostic variables. However, estimation in these models is difficult or inefficient with missing data, and interaction terms do not fully represent a cause-and-effect relationship between explanatory variables. In addition, these classical statistical methods can neither combine information from a variety of sources nor learn from new data.
This thesis develops an expert system that will aid physicians in the diagnosis of Lyme disease. Chapter Two describes in detail how the Bayesian belief network works.
Chapter Three discusses how the initial estimates were obtained through a literature search and with the aid of experts' opinions. Chapter Four describes the use of this system on a set of patients and compares each patient's diagnosis as determined by the expert system to the diagnosis determined by serology. Finally, Chapter Five discusses the results and directions for further research.
The major goal of this thesis is to determine the probability that a patient has Lyme disease given various clinical variables. A Bayesian belief network that represents the causal relationships between clinical variables and Lyme disease facilitates calculation of this probability. The first step in creating a Bayesian belief network is to construct a graph of nodes that represents the causal relationships among all variables. Because probability calculations become difficult in complex networks, a different representation of this information, a tree of cliques, is used instead.
In this chapter, the four steps to create and use a Bayesian belief network are described in detail: (1) creating the graph of nodes, (2) probability propagation in a graph of nodes, (3) creating the permanent tree of cliques, and (4) probability propagation in the cliques.

Creating the Graph of Nodes:
A Bayesian belief network consists of a number of diagnostic variables represented as nodes. Causal relationships between variables are represented by links; one node points to another if it directly affects the probabilities of that node's states. This establishes parent/child relationships in which a parent node points to its child node. Figure 1 depicts the cause-and-effect relationships for the variables used in this study. For example, the arrow pointing from Age to Oligoarticular Arthritis indicates that Oligoarticular Arthritis could be caused by a person's age. The link from Disease to Serum Test indicates that Disease causes a positive or negative serum test, that is, Disease influences the probability of the serum test result. This causation is probabilistic, not deterministic. In these examples, the nodes Age and Disease are the parents of the child nodes Oligoarticular Arthritis and Serum Test, respectively. A node may have more than one parent; in Figure 1, Oligoarticular Arthritis has two parents, Disease and Age. This means that Oligoarticular Arthritis could be caused by Age, by Disease, or by both.
Each link in the graph of nodes has an associated probability matrix, $P$, that specifies the amount of influence that the parent node has on the child. The matrix has dimensions $N \times M$, where $N$ is the number of states in the parent node and $M$ is the number of states in the child node. Its elements are

$$P_{ij} = P(\text{child} = C_j \mid \text{parent} = S_i), \qquad i = 1, 2, \ldots, N, \quad j = 1, 2, \ldots, M,$$

where $C_j$ and $S_i$ are the possible child states and parent states, respectively. Each row of $P$ is a conditional distribution over the child states and therefore sums to one. For example, in Figure 1, $P_9$ links the parent node Disease to the child node Headache. Suppose the states of Lyme disease are defined as $L_0$ = no Lyme disease, $L_1$ = early Lyme disease, $L_2$ = late Lyme disease, and $L_3$ = Fibromyalgia or Chronic Fatigue Syndrome, and the child states are defined as $H_1$ = has a headache and $H_2$ = doesn't have a headache. Then the probability matrix for this example is

$$P_9 = \begin{pmatrix} P(H_1 \mid L_0) & P(H_2 \mid L_0) \\ P(H_1 \mid L_1) & P(H_2 \mid L_1) \\ P(H_1 \mid L_2) & P(H_2 \mid L_2) \\ P(H_1 \mid L_3) & P(H_2 \mid L_3) \end{pmatrix}.$$

The first column of the matrix contains the probabilities of having a headache given each state of the parent node (Lyme disease). The second column contains the probabilities of not having a headache given each state of the parent node.

Each node also has an associated belief vector whose elements are the marginal probabilities of each state of that node. For example, the belief vector for the Lyme disease node is $4 \times 1$ and has elements equal to the incidences of each state of Lyme disease. We define the belief vector of the Lyme disease node as follows:

$$B(L) = \big(P(L_0),\, P(L_1),\, P(L_2),\, P(L_3)\big)^T.$$

Probability propagation in a Graph of nodes:
Calculating the probability of Lyme disease given a set of diagnostic variables is accomplished by sending lambda and pi messages. Suppose that $E$ represents the set of observed diagnostic variables. $E_A^+$ is the set of variables observed in the tree above node $A$, and $E_A^-$ is the set of variables observed at or below node $A$. Lambda messages send information about $E_A^-$ up to node $A$; pi messages send information about $E_A^+$ down to node $A$.
Each node $A$ contains a lambda value defined as

$$\lambda(A) = P(E_A^- \mid A).$$

If $A = a_i$ is observed, then the lambda value is a vector of zeroes with a one in position $i$. The vector has dimension $K \times 1$, where $K$ is the number of states in node $A$.
Otherwise the lambda value must be computed recursively via lambda messages. For example, if a patient had a headache, the state "yes" in the node Headache would be instantiated, and the lambda value for this node is $\lambda(\text{Headache}) = (1, 0)$. Likewise, instantiating the state of not having a headache results in the lambda value $(0, 1)$. All lambda values initially start at $(1, 1, \ldots, 1)$ for each node, which indicates an absence of any diagnostic evidence. After a node has been instantiated, the lambda values change. Note that $\sum_i \lambda(a_i)$ is not necessarily one.
Information is passed upward in the tree through lambda messages in the following way. The lambda message from child node $B$ to parent node $A$ is defined as

$$\lambda_B(A) = P(E_B^- \mid A).$$

The lambda message is a $K \times 1$ vector, where $K$ is the number of states in the parent node $A$. A lambda message from $B$ to $A$ thus sends information to $A$ from the variables at or below $B$. Lambda messages can be calculated as

$$\lambda_B(a_i) = \sum_j P(E_B^- \mid b_j)\,P(b_j \mid a_i) = \sum_j P(b_j \mid a_i)\,\lambda(b_j), \qquad \text{i.e.} \quad \lambda_B(A) = P\,\lambda(B).$$

The first equality follows from the law of total probability and the conditional independence implied by the graph structure: $E_B^-$ is independent of $A$ given $B$.
As an example, suppose a patient reported experiencing a headache. Then the lambda message from the node Headache to the Lyme disease node is computed as

$$\lambda_H(L) = P_9\,\lambda(H) = P_9 \begin{pmatrix}1\\0\end{pmatrix} = \big(P(H_1 \mid L_0),\, P(H_1 \mid L_1),\, P(H_1 \mid L_2),\, P(H_1 \mid L_3)\big)^T.$$

That is, the lambda message is the probability matrix for the link times the lambda value of the child. After all lambda messages are received, the parent's lambda value is updated by multiplying the lambda messages of all its children. This is valid because the evidence below each of $A$'s children is conditionally independent of the evidence below the other children, given $A$. If $s(A)$ is the set of $A$'s children, then

$$\lambda(A) = \prod_{B \in s(A)} \lambda_B(A),$$

where the product indicates elementwise multiplication.
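The lambda-message step can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the thesis (which used the Dxpress package). The numbers in the matrix `P9` are invented placeholders rather than estimates from the literature search. The function names `lambda_message` and `combine_lambda_messages` are hypothetical.

```python
# Hypothetical link matrix P9: rows are the parent states L0..L3,
# columns are the child states (H1 = headache, H2 = no headache).
# These numbers are placeholders, not the thesis's estimates.
P9 = [
    [0.20, 0.80],  # L0: no Lyme disease
    [0.50, 0.50],  # L1: early Lyme disease
    [0.40, 0.60],  # L2: late Lyme disease
    [0.70, 0.30],  # L3: Fibromyalgia / CFS
]


def lambda_message(link, child_lambda):
    """lambda_B(a_i) = sum_j P(b_j | a_i) * lambda(b_j): link matrix times child lambda."""
    return [sum(p * lam for p, lam in zip(row, child_lambda)) for row in link]


def combine_lambda_messages(messages):
    """lambda(A) = elementwise product of the lambda messages from all of A's children."""
    result = [1.0] * len(messages[0])
    for msg in messages:
        result = [r * m for r, m in zip(result, msg)]
    return result


# The patient reports a headache, so Headache is instantiated to H1.
msg_headache = lambda_message(P9, [1.0, 0.0])    # equals the first column of P9
# For illustration, P9 is reused for a second, unobserved child (lambda = (1, 1)).
# Because each row of a link matrix sums to one, that child sends all ones.
msg_unobserved = lambda_message(P9, [1.0, 1.0])
lambda_lyme = combine_lambda_messages([msg_headache, msg_unobserved])
print(lambda_lyme)
```

The unobserved child leaves the parent's lambda value unchanged. This is why all lambda values can start at one before any evidence is entered.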
Pi messages are sent down the tree, relaying information from the parents to the children. Each node has a pi value defined as

$$\pi(A) = P(A \mid E_A^+).$$

When $A$ has no parents, $A$ is a root node, and its pi value is $P(A \mid \emptyset) = P(A)$.
The pi message from node $A$ to its child $B$ is defined as

$$\pi_B(A) = \frac{B'(A)}{\lambda_B(A)},$$

where $B'(A)$ is the current belief in node $A$ and the division is elementwise. The pi message is a $K \times 1$ vector, where $K$ is the number of states in the parent node $A$. The belief vector is divided by the lambda message from node $B$ to $A$ so that the pi message from $A$ to $B$ does not send back the information that $B$ itself supplied. The message carries information only about the evidence that is not at or below $B$, that is, about $E_B^+$. After the pi message is computed, it is normalized so that its elements sum to one. Information from a parent is incorporated into the child node by updating the child's pi value:

$$\pi(b_j) = \sum_i P(b_j \mid a_i)\,\pi_B(a_i), \qquad \text{i.e.} \quad \pi(B) = P^T \pi_B(A).$$

This equation is again a consequence of the law of total probability.
For example, suppose the Lyme disease node is instantiated for $L_0$, and Lyme disease is the parent of Headache. The pi value of the Lyme disease node, and its belief $B'(L)$, are then equal to $(1, 0, 0, 0)$ by definition. The pi message sent from Lyme disease to Headache is

$$\pi_H(L) = \frac{B'(L)}{\lambda_H(L)} = \left(\frac{1}{\lambda_H(L_0)},\, 0,\, 0,\, 0\right)^T.$$

After normalization the pi message is $(1, 0, 0, 0)$. Finally, the pi value for the Headache node is computed by multiplying the transpose of the link matrix by the pi message that was just sent:

$$\pi(H) = P_9^T \begin{pmatrix}1\\0\\0\\0\end{pmatrix} = \begin{pmatrix} P(H_1 \mid L_0) \\ P(H_2 \mid L_0) \end{pmatrix}.$$

Thus, the pi value for the child is the probability of the node given $E_H^+$, which here is the instantiated Lyme disease node.
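The pi-message example can be reproduced with a short sketch. It assumes a hypothetical link matrix with placeholder numbers and that Headache has been observed. The helper names are illustrative only. The code divides the parent's belief by the child's lambda message, normalizes, and multiplies by the transposed link matrix.

```python
# Hypothetical link matrix P9 (rows L0..L3, columns H1, H2); placeholder numbers only.
P9 = [
    [0.20, 0.80],  # L0
    [0.50, 0.50],  # L1
    [0.40, 0.60],  # L2
    [0.70, 0.30],  # L3
]


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def pi_message(parent_belief, lambda_msg_from_child):
    """pi_B(A) = B'(A) / lambda_B(A), elementwise, then normalized to sum to one.

    B'(A) is proportional to lambda(A) * pi(A), and lambda(A) contains the factor
    lambda_B(A), so a state with nonzero belief always has a nonzero lambda_B entry.
    States with zero belief are left at zero.
    """
    raw = [b / l if b > 0 else 0.0 for b, l in zip(parent_belief, lambda_msg_from_child)]
    return normalize(raw)


def child_pi_value(link, pi_msg):
    """pi(b_j) = sum_i P(b_j | a_i) * pi_B(a_i): transposed link matrix times the pi message."""
    return [sum(link[i][j] * pi_msg[i] for i in range(len(link))) for j in range(len(link[0]))]


# Lyme disease instantiated for L0, so its belief is (1, 0, 0, 0).
belief_lyme = [1.0, 0.0, 0.0, 0.0]
# Assume Headache was observed (H1): the lambda message it sent up is column 1 of P9.
lambda_from_headache = [row[0] for row in P9]
msg = pi_message(belief_lyme, lambda_from_headache)   # (1/0.2, 0, 0, 0), normalized
pi_headache = child_pi_value(P9, msg)                  # (P(H1 | L0), P(H2 | L0))
print(msg, pi_headache)
```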
After the lambda and pi messages are sent, each node's belief is updated using Bayes' theorem:

$$B'(L) = \alpha\,\lambda(L)\,\pi(L),$$

where $B'(L)$ is the new belief of the Lyme disease node due to the new diagnostic evidence, $\alpha$ is a normalizing constant, and the multiplication is elementwise.
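For a root node with a single observed child, the update $B'(L) = \alpha\,\lambda(L)\,\pi(L)$ is exactly Bayes' theorem, with $\alpha = 1/P(\text{evidence})$. The sketch below checks this numerically. The prior and the lambda value are hypothetical placeholders, not the incidences or link estimates from the thesis.

```python
def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def update_belief(lam, pi):
    """B'(A) = alpha * lambda(A) * pi(A), elementwise; alpha makes the result sum to one."""
    return normalize([l * p for l, p in zip(lam, pi)])


# Hypothetical prior (the pi value of the root Lyme disease node) and the lambda
# value produced by an observed headache; both are placeholders.
prior = [0.97, 0.01, 0.01, 0.01]     # P(L0), P(L1), P(L2), P(L3)
lam = [0.20, 0.50, 0.40, 0.70]       # P(H1 | L_i)

posterior = update_belief(lam, prior)

# The same quantities from Bayes' theorem written out directly.
p_h1 = sum(l * p for l, p in zip(lam, prior))   # P(H1) by the law of total probability
direct = [l * p / p_h1 for l, p in zip(lam, prior)]
print(posterior)
```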
The network is first initialized to compute the a priori probabilities of all nodes (i.e. the probabilities when no node is instantiated). The network is initialized through probability propagation. All lambda messages and lambda values are initially set to one. Pi messages are sent from the top of the graph down to its children, which starts the propagation. Each child then sends pi messages to each of its children, and this process is repeated until the leaves of the tree are reached. At this point, each leaf sends lambda messages up to its parents, which in turn send lambda messages to their parents. This process is repeated until the root node is reached.
After initialization, the system is ready to calculate the probability of Lyme disease given a set of diagnostic variables. First, each observed diagnostic variable is instantiated; this changes the pi and lambda values of a number of nodes. Then this information is propagated throughout the network via lambda and pi messages: each instantiated node sends a lambda message to its parent and pi messages to its children. If the parent or child node doesn't exist, no message is sent. Once every node has sent both its pi and lambda messages, the propagation is complete.
This method of probability propagation is adequate if no node has more than one parent, and there is no more than one path connecting each pair of nodes. However, this may not be true in more complex networks. If a node has more than one parent, a modification of the probability link matrix may be made that conditions on all parental nodes. With this modification, a single pi message is sent from all parents to the child.
However, if there is more than one path between a pair of nodes, a lambda message from A to B and a pi message from B to A could be sent down both paths, causing errors in the probability propagation method described earlier. A different representation of the graph is necessary to correct these problems.
The Bayesian belief network used here propagates probabilities in a tree of cliques. Probability propagation could have been done in either a graph of nodes or a tree of cliques, because there is at most one path between any two nodes. The package Dxpress computed the probabilities in a tree of cliques. The next section describes how cliques are determined and how probabilities are propagated in a tree of cliques.

Creating the permanent tree of cliques:
Once the graph of nodes has been constructed, we re-represent it as a tree of cliques. As discussed earlier, the tree of cliques allows easier and more accurate calculations in more complex graphs. A clique is a subset of nodes in the network that is complete and maximal. A set is complete if every pair of distinct nodes in the set is connected by a link. A set is maximal if it is not a proper subset of any other complete set; this prevents counting a clique as two cliques when, in fact, it should be counted as one. The tree of cliques is constructed in three steps: (1) the graph is triangulated, (2) the cliques are determined, and (3) the tree of cliques is determined. The tree of cliques contains all the information stored in the original graph.
In order to determine the cliques of the graph, the graph must be triangulated. A graph is triangulated if every simple cycle of length strictly greater than three possesses a chord. A simple cycle is a path from a node back to itself on which no other node is repeated. A simple cycle possesses a chord if there is a link between two nonconsecutive nodes of the cycle (Neapolitan, 1990). Triangulation is accomplished in three steps: moralizing the graph, numbering the nodes, and determining the fill-ins. First, the graph is moralized by marrying the parents of each node, that is, joining them with a link, and by treating all links as undirected. Second, the nodes are numbered using an algorithm called Maximum Cardinality Search (MCS).
Maximum Cardinality Search begins by assigning the number one to an arbitrary node.
Each successive number is assigned to the unnumbered node adjacent to the largest number of previously numbered nodes, breaking ties arbitrarily. The final step before determining the cliques is to determine the fill-ins, the links that must be added to triangulate the graph. A fill-in occurs when two nonadjacent nodes $v$ and $w$ are joined by a path containing only $v$, $w$, and nodes numbered after both $v$ and $w$. In that case $v$ and $w$ must be connected, and the added link is called a fill-in. After completion of these steps, the graph is triangulated (see Figure 2).
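The three triangulation steps (moralization, Maximum Cardinality Search, and fill-in detection) can be sketched as follows. This is an illustrative sketch, not the Dxpress implementation. Ties in MCS are broken by sorted node name, one arbitrary choice among many. The demonstration graph, a chordless four-cycle, is hypothetical and not the thesis network. The fill-in test follows the path rule stated in the text.

```python
from itertools import combinations


def moralize(parents):
    """parents maps every node to a list of its parents (roots map to []).
    Returns an undirected adjacency dict in which the parents of each node are married."""
    adj = {v: set() for v in parents}
    for child, pars in parents.items():
        for p in pars:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(pars, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def maximum_cardinality_search(adj, start):
    """Number `start` first, then repeatedly pick the unnumbered node adjacent to the
    most numbered nodes; ties go to the first node in sorted order."""
    order, numbered = [start], {start}
    while len(order) < len(adj):
        candidates = sorted(v for v in adj if v not in numbered)
        best = max(candidates, key=lambda v: len(adj[v] & numbered))
        order.append(best)
        numbered.add(best)
    return order


def fill_ins(adj, order):
    """Links v-w needed to triangulate: v and w are nonadjacent and joined by a path
    whose interior nodes are all numbered after both v and w."""
    num = {v: i for i, v in enumerate(order)}
    added = []
    for v, w in combinations(order, 2):
        if w in adj[v]:
            continue
        limit = max(num[v], num[w])
        stack, seen, found = [v], {v}, False
        while stack and not found:
            u = stack.pop()
            for x in adj[u]:
                if x == w:
                    found = True
                    break
                if x not in seen and num[x] > limit:
                    seen.add(x)
                    stack.append(x)
        if found:
            added.append((v, w))
    return added


# Hypothetical undirected graph: the chordless 4-cycle A-B-C-D-A.
cycle = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"A", "C"}}
order = maximum_cardinality_search(cycle, "A")
print(order, fill_ins(cycle, order))
```

In this example MCS numbers the nodes A, B, C, D, and the only fill-in is the chord A-C. The node D, numbered after both A and C, lies on the path A-D-C, so A and C must be joined.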
Next the cliques are determined by identifying the complete and maximal sets of nodes in the triangulated graph. For example, referring to Figure 1, the nodes Disease, Age, and ECM form a clique. Each node is linked to every other node in the set, and no additional node can be added while retaining the complete property. The total number of cliques for this network is eighteen.
The cliques contain all the information of the original graph. All possible combinations of states for the nodes in a clique, called the configurations, are stored in the clique. In Figures 1 and 3, clique 3 consists of nodes C and E. Node C has four possible states ($c_1, c_2, c_3, c_4$) and node E has two possible states ($e_1, e_2$), so the configurations are $(c_1, e_1)$, $(c_1, e_2)$, $(c_2, e_1)$, $(c_2, e_2)$, $(c_3, e_1)$, $(c_3, e_2)$, $(c_4, e_1)$, $(c_4, e_2)$. Information about the causal relationships among the nodes is contained in the sets $R_i$ and $S_i$ defined for each clique $i$. $S_i$ is the intersection of clique $i$ with the union of all previously numbered cliques, and $R_i = \mathrm{Clq}_i - S_i$ contains all other nodes in that clique. For clique 3, $R_3$ contains the node E and $S_3$ contains the node C. Psi values and posterior probabilities are also stored in each clique. They represent the information previously stored in the link probability matrices and in the belief vectors of the graph of nodes, respectively, and are explained in detail below. Figure 4 gives a partial table of the configurations and the sets that must be stored at each clique.
Cliques one and two have a large number of configurations, so they are only partially listed, but cliques three and four are fully listed.
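Enumerating configurations and computing $S_i$ and $R_i$ is mechanical, as the short sketch below shows. The first two cliques in the list are hypothetical; only the third matches clique 3 of the text (nodes C and E). `itertools.product` produces the configurations in the same order as they are listed above, with the first node varying slowest.

```python
from itertools import product


def configurations(clique, states):
    """All combinations of states of the nodes in `clique`, first node varying slowest."""
    return list(product(*(states[v] for v in clique)))


def separators_and_residuals(cliques):
    """For cliques in their numbered order, S_i = Clq_i intersected with the union of
    all earlier cliques, and R_i = Clq_i - S_i."""
    seen, result = set(), []
    for clq in cliques:
        s = set(clq) & seen
        result.append((s, set(clq) - s))
        seen |= set(clq)
    return result


states = {"C": ["c1", "c2", "c3", "c4"], "E": ["e1", "e2"]}
print(configurations(("C", "E"), states))

# Hypothetical numbering of three cliques; only the third corresponds to clique 3 of the text.
cliques = [("A", "B", "C"), ("B", "C", "D"), ("C", "E")]
print(separators_and_residuals(cliques))
```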
The tree of cliques is determined in the following way. Since a clique is a set of nodes, we number the cliques in order of the highest-numbered node each contains, breaking ties arbitrarily. Because the nodes were numbered with the MCS algorithm, the cliques possess the running intersection property. That is, each clique except the first, when intersected with the union of all previous cliques, yields a set of nodes that is contained entirely in some previous clique. The cliques can therefore be placed into a tree (Figure 3). Each clique $i > 1$ is made a child of a previous clique that contains $S_i$, which establishes the parent/child relationships among the cliques. After the cliques are placed into a tree, the original graph can be disregarded. The tree becomes a permanent part of the expert system and is changed only when the causal network changes.
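Ordering the cliques and linking each clique to a parent can be sketched as below. This is a sketch under stated assumptions. Each clique's parent is taken to be the first earlier clique that contains $S_i$, which is one valid choice among possibly several. The MCS numbering and cliques in the demonstration are hypothetical. The function raises an error if the running intersection property fails.

```python
def order_cliques(cliques, mcs_order):
    """Number cliques by their highest-numbered node (stable sort keeps ties in input order)."""
    num = {v: i for i, v in enumerate(mcs_order)}
    return sorted(cliques, key=lambda c: max(num[v] for v in c))


def clique_tree_parents(cliques):
    """For each clique i > 0, pick the first earlier clique that contains
    S_i = Clq_i & (union of earlier cliques); fail if no such clique exists."""
    parents = {0: None}
    union = set(cliques[0])
    for i in range(1, len(cliques)):
        s = set(cliques[i]) & union
        parent = next((j for j in range(i) if s <= set(cliques[j])), None)
        if parent is None:
            raise ValueError(f"running intersection property fails at clique {i}")
        parents[i] = parent
        union |= set(cliques[i])
    return parents


mcs_order = ["A", "B", "C", "D", "E"]    # hypothetical MCS numbering
cliques = order_cliques([("C", "E"), ("A", "B", "C"), ("B", "C", "D")], mcs_order)
print(cliques, clique_tree_parents(cliques))
```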

Probability Propagation in a tree of cliques:
The Bayesian belief network makes diagnoses by calculating the probability of Lyme disease given observations of certain diagnostic variables. As discussed earlier, the graph of nodes propagated probabilities via lambda and pi messages, and propagation in a tree of cliques is similar. The information that was stored in the link probability matrices is stored instead in vectors of psi values. Belief vectors are defined for configurations rather than for the states of individual nodes. Probability propagation again takes place by sending messages up through the tree of cliques and back down.
The psi value for $\mathrm{Clq}_i$ is an $N \times 1$ vector defined as

$$\psi(\mathrm{Clq}_i) = P(R_i \mid S_i),$$

where $R_i$ and $S_i$ are the sets defined earlier and $N$ is the number of configurations in clique $i$.
The psi values for $\mathrm{Clq}_i$ are calculated as

$$\psi(\mathrm{Clq}_i) = \prod_{v \in f(i)} P(v \mid c(v)),$$

where $P(v \mid c(v))$ is the conditional probability of a node $v$ given its parents $c(v)$, and $f(i)$ is the set of all nodes assigned to clique $i$. If no node is assigned to $\mathrm{Clq}_i$, the product equals 1.
Probability propagation in the tree of cliques is done in two steps. First, lambda messages from the leaves (cliques with no children) at the bottom of the tree propagate information up the tree. The choice of which leaf starts the propagation, and the order in which the leaves send their messages, are arbitrary. A clique that has received lambda messages from all of its children sends a lambda message to its parent, and the lambda messages propagate upward until the root is reached. Second, after all the lambda messages have reached the root of the tree, pi messages are propagated down the tree. Once all the leaves have received pi messages, the propagation is complete.
Instantiation of a variable in a tree of cliques is similar to instantiation in a graph of nodes. Every clique containing an instantiated variable multiplies its psi values by a vector of zeroes with ones in the positions corresponding to the configurations consistent with the observed state.
A lambda message can now be sent from a leaf clique to its parent clique, and the propagation may begin. The lambda message from $\mathrm{Clq}_i$ to its parent clique, $\lambda_{\mathrm{Clq}_i}(S_i)$, is defined similarly to before:

$$\lambda_{\mathrm{Clq}_i}(S_i) = \sum_{R_i} \psi(\mathrm{Clq}_i),$$

where the summation is over all configurations of the nodes in $R_i$, the nodes that are not contained in the parent clique. The lambda message contains information solely about the nodes shared by the parent and child cliques, the nodes in $S_i$, hence the summation over $R_i$. If $R_i$ is empty, there is no summation. The lambda message is a $K \times 1$ vector, where $K$ is the product of the numbers of states of the nodes in $S_i$. If $S_i$ is empty, no message is sent. After computing the lambda message and before leaving the clique, the psi values are divided elementwise by this lambda message:

$$\psi(\mathrm{Clq}_i) \leftarrow \frac{\psi(\mathrm{Clq}_i)}{\lambda_{\mathrm{Clq}_i}(S_i)}.$$

This normalizes the conditional probabilities so that the marginal probabilities can be calculated. The lambda message received by the parent is multiplied elementwise by the parent's current psi values.
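The second phase of probability propagation begins at the root of the tree. Once all lambda messages have been sent, the unconditional probabilities $P(\mathrm{Clq}_i)$ can be determined. For the root clique, $P(\mathrm{Clq}_1)$ is its normalized psi value. To determine these probabilities for the other cliques, pi messages propagate the information down the tree. A pi message from a parent clique $\mathrm{Clq}_j$ to its child $\mathrm{Clq}_i$ is computed as

$$\pi_{\mathrm{Clq}_i}(S_i) = \sum_{\mathrm{Clq}_j - S_i} P(\mathrm{Clq}_j),$$

and then

$$P(\mathrm{Clq}_i) = \psi(\mathrm{Clq}_i)\,\pi_{\mathrm{Clq}_i}(S_i),$$

where the multiplication is elementwise.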
Finally, the probabilities of each state of each node can be determined using the law of total probability:

$$P(v) = \sum_{\mathrm{Clq}_i - \{v\}} P(\mathrm{Clq}_i).$$

This equation sums over all the variables in clique $i$ except $v$. For example, in Figure 3, clique 3 consists of the nodes C and E. The probability of the first state of node C is the sum over all the states of node E with $c_1$ fixed:

$$P(c_1) = \sum_i P(c_1, e_i) = P(c_1, e_1) + P(c_1, e_2).$$
The probabilities of other states are found in a similar manner.
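The lambda phase, the pi phase, and the final marginalization can be sketched on the smallest nontrivial tree of cliques and checked against brute-force enumeration of the joint distribution. The sketch assumes a hypothetical chain A → C → E with two states per node and placeholder probabilities. The root clique is {A, C}, with P(A) and P(C | A) assigned to it. The child clique is {C, E}, with $S_2 = \{C\}$ and $R_2 = \{E\}$, and P(E | C) assigned to it. This illustrates the scheme described above, not the Dxpress code.

```python
from itertools import product

# Hypothetical chain A -> C -> E (two states each); numbers are placeholders.
states = {"A": ["a1", "a2"], "C": ["c1", "c2"], "E": ["e1", "e2"]}
P_A = {"a1": 0.3, "a2": 0.7}
P_C_given_A = {("a1", "c1"): 0.9, ("a1", "c2"): 0.1, ("a2", "c1"): 0.2, ("a2", "c2"): 0.8}
P_E_given_C = {("c1", "e1"): 0.6, ("c1", "e2"): 0.4, ("c2", "e1"): 0.1, ("c2", "e2"): 0.9}

CLQ1, CLQ2, S2 = ("A", "C"), ("C", "E"), ("C",)   # root clique, leaf clique, separator


def configs(clique):
    return list(product(*(states[v] for v in clique)))


def marginalize(table, clique, keep):
    """Sum a clique table over every node of `clique` that is not in `keep`."""
    idx = [clique.index(v) for v in keep]
    out = {}
    for cfg, val in table.items():
        key = tuple(cfg[i] for i in idx)
        out[key] = out.get(key, 0.0) + val
    return out


def propagate(observed_e=None):
    # Psi values: product of the conditional probabilities assigned to each clique.
    psi1 = {(a, c): P_A[a] * P_C_given_A[(a, c)] for a, c in configs(CLQ1)}
    psi2 = {(c, e): P_E_given_C[(c, e)] for c, e in configs(CLQ2)}
    if observed_e is not None:   # instantiation: zero the configurations inconsistent with E
        for c, e in psi2:
            if e != observed_e:
                psi2[(c, e)] = 0.0
    # Lambda phase (leaf Clq2 -> root Clq1): sum over R2, divide psi2, multiply into psi1.
    lam2 = marginalize(psi2, CLQ2, S2)
    for c, e in psi2:
        psi2[(c, e)] = psi2[(c, e)] / lam2[(c,)] if lam2[(c,)] > 0 else 0.0
    for a, c in psi1:
        psi1[(a, c)] *= lam2[(c,)]
    # Root: P(Clq1) is psi1 normalized.
    total = sum(psi1.values())
    p1 = {k: v / total for k, v in psi1.items()}
    # Pi phase (root -> Clq2): sum P(Clq1) over Clq1 - S2, multiply into psi2.
    pi2 = marginalize(p1, CLQ1, S2)
    p2 = {(c, e): psi2[(c, e)] * pi2[(c,)] for c, e in psi2}
    return p1, p2


p1, p2 = propagate(observed_e="e1")
p_a = marginalize(p1, CLQ1, ("A",))       # P(A | E = e1) by the law of total probability

# Brute-force check from the full joint distribution.
joint = {(a, c, e): P_A[a] * P_C_given_A[(a, c)] * P_E_given_C[(c, e)]
         for a in states["A"] for c in states["C"] for e in states["E"]}
p_e1 = sum(v for (a, c, e), v in joint.items() if e == "e1")
brute_a1 = sum(v for (a, c, e), v in joint.items() if a == "a1" and e == "e1") / p_e1
print(p_a, brute_a1)
```

With no evidence, the same function returns the prior joint probabilities of each clique. Every lambda message is then a vector of ones, because each row of P(E | C) sums to one.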
In this chapter, we derive estimates of the link probability matrices for the Bayesian belief network through a meta-analysis of research papers and collected data.
The literature search was conducted using the keywords Lyme disease, ticks, Ixodes, Fibromyalgia, and CFS. Journal articles were obtained from the University of Rhode Island and the Brown University libraries. The Centers for Disease Control and Prevention's Lyme disease database was also analyzed with the help of Dr. Kathy Orloski.
Additionally, a survey was administered to physicians to obtain information on symptoms related to patients with diseases other than Lyme disease. This chapter explains how the estimates were obtained, and describes the assumptions made.

Section 3.1 Bayesian Updating:
The elements of the link probability matrices are the conditional probabilities $P(c_i \mid p_k)$, where $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, m$, with $n$ the number of states in the child and $m$ the number of states in the parent. Thus it was necessary to estimate the proportion of people reporting each of the symptoms in the network for each disease category. We searched for articles with data on the numbers of patients with various symptoms. Proportions estimated from these data serve as the initial estimates of the link probability matrices and are incorporated into the system.
The information from the literature search, the CDC database, and the physician surveys is integrated using a Bayesian approach. Bayesians view probabilities subjectively. Suppose nothing is known about a particular link element except that its value must be between zero and one. Then this element is viewed as a random variable with a uniform distribution on the interval (0, 1); this is the prior distribution. New information about the link changes the belief in the link element, and the posterior distribution describing the random variable is updated to reflect the new belief. Updating distributions with new information is accomplished using Bayes' theorem:

$$\text{posterior} \propto \text{prior} \times \text{likelihood},$$

where the likelihood is the probability of the data given the value of the link element.
We assume that the conditional probabilities in each row of a link matrix have a Dirichlet distribution. For nodes with two states, the Dirichlet distribution reduces to a beta distribution. The Dirichlet distribution is desirable for two reasons. First, the Dirichlet and beta distributions model random variables with possible values on the interval (0, 1), so they are appropriate for modeling probabilities. Second, both are conjugate priors for count data (multinomial and binomial data, respectively), so they lead to a posterior distribution that is also Dirichlet or beta, respectively. As a result, updating is easy for both distributions.
Initially, each row of the link matrix, $(P_{i1}, P_{i2}, \ldots, P_{ik})$, is modeled with an uninformative joint distribution, a Dirichlet distribution with equal parameters:

$$f(P_{i1}, \ldots, P_{ik}) = h \prod_{j=1}^{k} P_{ij}^{\,a_j - 1}, \qquad a_1 = a_2 = \cdots = a_k,$$

where $k$ is the number of child states and $h$ is a normalizing constant. Then the expected value of the link element $P_{ij}$ (the conditional probability that the child state is $c_j$ given that its parent state is $s_i$) is

$$E(P_{ij}) = \frac{a_j}{\sum_{l=1}^{k} a_l} = \frac{1}{k}, \qquad j = 1, 2, \ldots, k.$$

Thus the child states (given the parent state $s_i$) are equally likely, reflecting the fact that nothing is known about the link.
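The Dirichlet updating method follows from Bayes' theorem. Suppose that, among patients in parent state $s_i$, $x_j$ are observed in child state $c_j$ ($j = 1, \ldots, k$), so the likelihood is proportional to $\prod_j P_{ij}^{\,x_j}$. Bayes' theorem says

$$f(P_{i1}, \ldots, P_{ik} \mid x_1, \ldots, x_k) \propto f(P_{i1}, \ldots, P_{ik})\; P(x_1, \ldots, x_k \mid P_{i1}, \ldots, P_{ik}).$$

Multiplying the Dirichlet prior by the multinomial likelihood gives a quantity proportional to the posterior distribution,

$$h^{*}\,(P_{i1})^{a_1 + x_1 - 1} (P_{i2})^{a_2 + x_2 - 1} \cdots (P_{ik})^{a_k + x_k - 1},$$

where $h^{*}$ is a constant. The resulting posterior distribution is Dirichlet with parameters $(a_1 + x_1, a_2 + x_2, \ldots, a_k + x_k)$. Thus the Dirichlet is a conjugate prior, and updating is accomplished in the very simple manner of adding the observed counts to the prior parameter values.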
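Because the update only adds counts to parameters, it is a one-liner in code. In the sketch below, the vague prior `[1, 1]` (a beta(1, 1), the uniform distribution from the text) is an assumption, since the exact prior parameters used in the thesis are not recoverable here. The two studies' counts are hypothetical. Updating on the studies one after the other gives the same posterior as pooling their counts.

```python
def dirichlet_update(prior_params, counts):
    """Conjugate update: a Dirichlet(a_1, ..., a_k) prior combined with observed counts
    x_1, ..., x_k gives a Dirichlet(a_1 + x_1, ..., a_k + x_k) posterior."""
    if len(prior_params) != len(counts):
        raise ValueError("need one count per child state")
    return [a + x for a, x in zip(prior_params, counts)]


def dirichlet_mean(params):
    """E(P_ij) = a_j / sum_l a_l."""
    total = sum(params)
    return [a / total for a in params]


prior = [1, 1]          # assumed vague prior for a two-state child: beta(1, 1)
study_1 = [30, 70]      # hypothetical counts: recalled / did not recall a tick bite
study_2 = [12, 18]      # a second hypothetical study

posterior = dirichlet_update(dirichlet_update(prior, study_1), study_2)
print(posterior, dirichlet_mean(posterior))
```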
This updating procedure is illustrated with an example. Referring to Table 2, link 2 is the probability of recalling a tick bite given Lyme disease. Suppose we are updating for the parental state of early Lyme disease, and the child states are $R_1$ = recalls a tick bite and $R_2$ = doesn't recall a tick bite. Initially the link starts as a vague distribution: No other journal articles were found in which patients recalled being bitten by a tick. The initial estimate can now be calculated. After all the information has been incorporated into the distribution the initial

To do the calculations for late Lyme disease, it is assumed that the probability of being male given late Lyme disease equals the probability of being male given early Lyme disease. The probability of being male or female given no Lyme disease is calculated by noting that

$$\sum_i P(\text{male} \mid L_i)\,P(L_i) = P(\text{male}) = \tfrac{1}{2},$$

where all quantities are known except $P(\text{male} \mid L_0)$.
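Solving that constraint for its one unknown term gives an explicit expression. This is only algebra on the equation above; the sum runs over the four disease states $L_0, \ldots, L_3$, with $P(\text{male} \mid L_2) = P(\text{male} \mid L_1)$ by the stated assumption.

```latex
\sum_{i=0}^{3} P(\text{male} \mid L_i)\,P(L_i) = \tfrac{1}{2}
\;\Longrightarrow\;
P(\text{male} \mid L_0) = \frac{\tfrac{1}{2} - \sum_{i=1}^{3} P(\text{male} \mid L_i)\,P(L_i)}{P(L_0)},
\qquad
P(\text{female} \mid L_0) = 1 - P(\text{male} \mid L_0).
```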

Section 3.2 Assumptions made:
Most journal articles used patient inclusion criteria of present or previous ECM, a late sign or symptom of the disease, and/or high titer levels from blood tests. The expert system yields a probability of Lyme disease given the clinical symptoms. Even with strong evidence, this probability is often small because incidence rates are low. For example, the posterior probability of having early Lyme disease given a positive serum test is approximately 0.0003, more than twice the incidence of 0.000127. conducted a cost-benefit analysis of treating Lyme disease and determined that treatment is only warranted if the probability of Lyme disease is at least 0.01. Thus, we used a cutoff level of 0.01 to classify a patient as having Lyme disease. If more than one state of Lyme disease or CFS was greater than 0.01, the patient was classified as the state with the largest posterior probability. (Imugen pers. comm.). Imugen categorized patients into five states based on the serological results: negative, positive early, positive late, remote, and XR (cross-reactivity). A classification of cross-reactivity occurs when antigen levels are present but not significantly high.
This could indicate a very early response to Lyme disease or cross reactivity. Remote indicates a past infection rather than a current infection.
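The 0.01 classification rule described above can be expressed as a short function. This is a sketch under assumptions: the state names and posterior values are hypothetical, `>=` is used to match "at least 0.01", and only the Lyme disease and CFS states are passed in, since the no-disease state would otherwise almost always have the largest posterior.

```python
THRESHOLD = 0.01  # treatment threshold from the cost-benefit analysis cited in the text


def classify(posteriors, threshold=THRESHOLD, default="no Lyme disease"):
    """Return the candidate state with the largest posterior at or above `threshold`.

    posteriors: maps each candidate state (Lyme disease stages and CFS only)
    to its posterior probability. Returns `default` if no state qualifies.
    """
    above = {state: p for state, p in posteriors.items() if p >= threshold}
    if not above:
        return default
    return max(above, key=above.get)


# Hypothetical posterior probabilities for one patient.
print(classify({"early Lyme": 0.02, "late Lyme": 0.05, "CFS": 0.004}))
```

With these inputs both Lyme states clear the threshold, so the patient is assigned to late Lyme disease, the larger of the two.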
The study protocol called for each patient to have two serum tests. The first serum test was to be administered immediately after the office visit, and the second during a follow-up visit approximately two weeks later. Unfortunately, not all patients returned for the follow-up visit; all patients had at least one blood test.
Two different analyses of the expert system were done. First, the expert system's diagnosis based upon clinical symptoms was compared to Imugen's classification. Second, the expert system's diagnosis based upon clinical symptoms and the results of the first serum test was compared to Imugen's classification; this second comparison was conducted only on the patients who had two serum tests. An overall misclassification rate for the expert system's diagnosis was calculated, along with specificity and sensitivity rates. For this study, patients were classified using the results from both serum tests: at least one positive test resulted in a classification of Lyme disease.
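The comparison statistics come from a 2x2 table of the system's diagnoses against the reference classification. A minimal sketch, assuming each patient is reduced to the system's yes/no diagnosis and a list of serum results; the helper names and the toy data are hypothetical.

```python
def diagnostic_rates(pairs):
    """pairs: (system_says_lyme, reference_says_lyme) booleans, one per patient."""
    tp = fp = fn = tn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    n = tp + fp + fn + tn
    return {
        "misclassification": (fp + fn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # P(system + | reference +)
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # P(system - | reference -)
    }


def reference_is_lyme(serum_results):
    """Reference rule from the text: Lyme disease if at least one serum test is positive."""
    return any(serum_results)


# Hypothetical patients: (system diagnosis, serum test results).
patients = [(True, [True]), (True, [False, False]), (False, [False, True]),
            (False, [False]), (False, [False, False])]
rates = diagnostic_rates((dx, reference_is_lyme(tests)) for dx, tests in patients)
print(rates)
```

Sensitivity and specificity are undefined when the reference contains no positive or no negative patients, so the sketch returns NaN rather than dividing by zero.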

Results:
Of 124  Patients diagnosed by the expert system with CFS or Fibromyalgia were assumed to not have Lyme disease when calculating the misclassification rate.
The diagnostic accuracy of this first version of the expert system is disappointing. To improve the system, various assumptions of the model were tested on the data; the next section describes this analysis in detail.

Analysis:
Misclassified patients were separated into two groups in order to detect possible trends. Table 24 lists the patients who tested positive for Lyme disease but were classified by the expert system as not having it, as well as the patients who tested negative but were classified by the expert system as having it. An examination of the first half of the The highlighted symptoms differ significantly between the initial estimate and the proportions observed in the data. Biased estimates may lead to misdiagnoses.
One assumption of the Bayesian belief network is that symptoms are conditionally independent given the disease state. To test whether this assumption is valid, Fisher's exact test for independence was run on each pair of clinical symptoms, conditioning on the disease state. The symptoms Bell's palsy, radiculoneuropathy, meningitis, encephalitis, and nodal block were not analyzed because too few patients had them.
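A stratified pairwise test of this kind might look like the following sketch. It implements the two-sided Fisher's exact test directly from hypergeometric probabilities so that it needs only the standard library (SciPy's `scipy.stats.fisher_exact` should give the same two-sided p-value for a 2x2 table). The patient record layout, symptom names, and toy data are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # P(top-left cell = x | fixed margins)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def dependent_pairs(patients, symptoms, alpha=0.05):
    """Test every symptom pair within each disease state; return significant pairs."""
    found = []
    for state in sorted({p["state"] for p in patients}):
        group = [p for p in patients if p["state"] == state]
        for s1, s2 in combinations(symptoms, 2):
            a = sum(1 for p in group if p[s1] and p[s2])
            b = sum(1 for p in group if p[s1] and not p[s2])
            c = sum(1 for p in group if not p[s1] and p[s2])
            d = sum(1 for p in group if not p[s1] and not p[s2])
            p_value = fisher_exact_two_sided(a, b, c, d)
            if p_value < alpha:
                found.append((state, s1, s2, p_value))
    return found


# Hypothetical data: fever and myalgia always co-occur among early Lyme patients.
patients = ([{"state": "early", "fever": True, "myalgia": True, "headache": False}] * 6
            + [{"state": "early", "fever": False, "myalgia": False, "headache": False}] * 6)
print(dependent_pairs(patients, ["fever", "myalgia", "headache"]))
```

Conditioning is done by splitting patients by disease state before building each 2x2 table, which matches the conditional-independence assumption being tested. With many pairs tested at the 0.05 level, a few significant results are expected by chance alone.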
Results from these tests indicated some dependencies among the clinical symptoms. Table 23 lists the significant pairs of symptoms for Lyme-positive patients at the 0.05 significance level. There were four dependent pairs of symptoms for Lyme disease patients: myalgia and fever, arthralgia and fever, fatigue and fever, and headache and stiff neck. It is interesting to note that three of the pairs contain the symptom fever.

Assessment of Physicians' Diagnostic Ability:
The case management forms asked each physician to estimate, on a scale from 0 to 10, the likelihood that the given patient had Lyme disease. This assessment was based upon clinical symptoms alone. In this section we analyze the diagnostic abilities of our participating physicians by comparing the likelihoods given to patients classified as having In fact, physicians at Wood River Health Services and Dr. England's office had exactly the same misclassification rate as the expert system; only physicians at South County did slightly better. The low sensitivity and specificity values highlight the difficulties in clinical diagnosis of Lyme disease.
A second comparison of the physicians' and the expert system's clinical diagnoses was made with a regression analysis using the physician's likelihood as the dependent variable, the expert system's probability as the independent variable, and no intercept; the R-squared value was 0.1497. This low value indicates that the expert system's diagnosis gives little information about the physician's likelihood. Figure 7 shows the data and the fitted regression line.

Chapter 5: Discussion and further research.
The goal of this thesis is to develop an expert system to aid physicians in the diagnosis of Lyme disease. Toward that end, a literature search was done to calculate initial estimates for the expert system, and the system was tested on a sample of 124 patients. Misclassification, specificity, and sensitivity rates were calculated and proved to be surprisingly poor. An analysis of the system was conducted to find ways to improve its diagnostic ability. We believe the system's poor performance was due to a number of factors. First, and foremost, clinical diagnosis of Lyme disease is very difficult, as evidenced by the physicians' misclassification rates. Second, the system's diagnosis was compared to the serological diagnosis. Imugen's serum test is not a gold standard, and the system's performance may have been better had the patients' true disease status been known. Third, a few of the initial estimates appeared to be incorrect. Finally, some clinical symptoms appeared to be conditionally dependent, violating the assumptions of the Bayesian belief network.

We believe the system's accuracy can be substantially improved with a few simple changes. The biased initial estimates should be re-estimated. Some initial estimates for early Lyme patients appeared to be overestimated, whereas some values for late Lyme were underestimated. The reason for these biased estimates probably lies in the definitions of early versus late Lyme: our definition did not always correspond to the definition used in the articles. Some articles did not separate the stages; in these cases, we attempted to divide the patients ourselves. To obtain better estimates, some of these articles should be excluded.
The Fisher's exact tests for independence revealed dependence between some symptoms. Four pairs of symptoms were significantly dependent at the 0.05 significance level. Interestingly, three of the pairs (myalgia and fever, arthralgia and fever, and fatigue and fever) contain the symptom fever; the fourth pair is headache and stiff neck. Combining myalgia, arthralgia, fatigue, and fever into one node, and headache and stiff neck into another, will alleviate the problems of conditional dependence.
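Merging symptoms into one node replaces several binary children with a single child whose states are all combinations of the merged symptoms. The sketch below enumerates those joint states and estimates the combined link row with the same count-adding Dirichlet update described earlier. The names and data are hypothetical, and the one-pseudo-count prior per joint state is an assumption.

```python
from itertools import product


def joint_states(symptoms):
    """All joint states of a node formed by merging binary symptoms."""
    return list(product((False, True), repeat=len(symptoms)))


def combined_link_row(patients, symptoms, state, pseudo_count=1):
    """Expected P(joint symptom pattern | disease state) after a Dirichlet update.

    Starts from `pseudo_count` per joint state (an assumed vague prior) and adds
    one count per patient in `state` with that symptom pattern.
    """
    counts = {pattern: pseudo_count for pattern in joint_states(symptoms)}
    for p in patients:
        if p["state"] == state:
            counts[tuple(bool(p[s]) for s in symptoms)] += 1
    total = sum(counts.values())
    return {pattern: c / total for pattern, c in counts.items()}


# Hypothetical data for a merged headache/stiff-neck node.
patients = [{"state": "early", "headache": True, "stiff neck": True}] * 2
row = combined_link_row(patients, ["headache", "stiff neck"], "early")
print(row[(True, True)])
```

The cost of merging is data: four binary symptoms become one node with 2^4 = 16 states, so each row of its link matrix has 15 free probabilities instead of the 4 needed for separate nodes.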
The system could be further improved with the addition of new variables.
Including exposure variables, such as whether the patient has been bitten by a tick, whether the tick was infected, where the patient resides, and whether the patient has pets, should more accurately assess a patient's probability of having Lyme disease. A node that records the results of both the initial and the follow-up serum tests would be beneficial, since early Lyme patients often have not yet seroconverted when the first blood sample is taken. A node recording the number of unmeasurable symptoms a patient experiences could help distinguish between Lyme disease and CFS. Finally, the system's estimates can be refined by learning from the patient data.
Diagnosis of Lyme disease has proven to be extremely challenging. The first version of our expert system performed slightly worse than physicians familiar with Lyme disease.
We believe implementing the aforementioned changes will make the system surpass the physicians in diagnostic ability.