Attack-Resistant Digital Reputation and Privacy Assessment in Social Media

Web 2.0 has grown rapidly over the past decade, leading to the surging popularity of online social media. More than 2.1 billion people, about 28% of the global population, use social media, which has become one of the most complex computing and communication systems on the planet. Social media attracts large numbers of people to create, share, and exchange information, interests, ideas, pictures, videos, etc. in virtual communities. Because people in social media interact with both acquaintances and strangers, privacy and security must be considered seriously. From the privacy perspective, one of the most severe types of privacy breach is related to online social networks, such as Facebook, LinkedIn, Google+, and Twitter. Online social network users are often unaware of the size and nature of the audience viewing their profiles, and therefore they may reveal more information than is appropriate for public viewing. Due to this lack of privacy awareness, online social network users can suffer a number of privacy-related threats. In this dissertation, a quantitative online social network privacy risk analysis framework, TAPE, is proposed. Inspired by the reliability analysis of wireless sensor networks, the binary decision diagram tool is employed to calculate the privacy level in online social networks. Privacy awareness and privacy trust metrics are proposed to evaluate online social network users' intention to protect privacy. To the best of our knowledge, TAPE is the first framework that takes both privacy awareness and privacy trust into consideration. Based on the TAPE framework, we also propose an unfriending strategy for privacy protection, which outperforms other existing unfriending strategies. The details of this framework are presented in Chapter 2. From the security perspective, the online product/service review system is one of the most vulnerable systems in social media.
Because online markets are highly profitable and customers' purchasing decisions rely on product/service reviews, it is highly likely that some firms and retailers in the online marketplace create fake reviews to mislead customers. In this dissertation, a novel angle of fake review detection is introduced, called the Equal Rating Opportunity (ERO) principle. Based on the ERO principle, ERO analysis is proposed. ERO analysis can be implemented at limited cost, and it opens a new direction for fake review detection. Tests on real data show that ERO analysis can detect aspects of fake reviews that other approaches cannot, while keeping the false alarm rate relatively low. The ERO principle and ERO analysis are presented in Chapter 3.

From the privacy perspective, one of the most severe types of privacy breach is related to online social networks (OSNs). Facebook, LinkedIn, Google+, and Twitter are some of the most visited OSNs. According to a report by Kemp, Facebook has 1.4 billion users and 4.5 billion daily likes, Twitter has 284 million users and 500 million daily tweets, and Google+ has 363 million users and 5 billion daily clicks of the +1 button [1]. OSN users are often unaware of the size and nature of the audience viewing their profiles, and therefore they may reveal more information than is appropriate for public viewing. For example, 72% of teens have a social networking profile, and nearly half (47%) of them have a public profile accessible by anyone [2]. In addition, 15% of Americans have never checked the privacy and security settings of their social networking accounts [3]. As a result of this lack of privacy awareness, OSNs often give rise to a number of privacy-related threats for their users. It is reported that the top two social media sites used by stalkers are Facebook (16%) and Twitter (3%) [4]. In this dissertation, a quantitative privacy risk analysis framework is proposed. This framework can be used to educate OSN users and raise their privacy protection awareness. It can also provide recommendations to improve OSN users' privacy level, thereby reducing privacy risks.
From the security perspective, the online product/service review system is one of the most vulnerable systems in social media. First, the online product/service market is huge and still growing. More than 80% of Internet users have used the Internet to make a purchase, and more than 50% of online shoppers have shopped online more than once [5]. E-commerce sales in the US are predicted to grow from $263 billion in 2013 to $414 billion in 2018 [6]. Second, online product/service reviews play a critical role in the online market: 74% of consumers rely on social media to guide their purchases [7], and a one-star rating increase can bring a 5-9% increase in revenue to online sellers [8]. Because online markets are highly profitable and purchasing decisions depend heavily on product/service reviews, it is highly likely that some firms and retailers in the online marketplace create fake reviews to mislead online customers. In this dissertation, a new angle of fake review detection, the Equal Rating Opportunity (ERO) principle, is introduced. The detection approach based on this angle, ERO analysis, can capture dishonest signals that existing approaches would miss. The success of ERO analysis can improve the trustworthiness of product reviews, help customers make better decisions, suppress the fake review market, and therefore help build a healthy, competitive online market.

Online Social Network Privacy
Privacy Threats
As OSNs emerge, people are facing new critical privacy issues. People participate in OSN activities by sharing personal data, such as photos, videos, travel plans, and comments. The casual posting of personal information on an OSN often creates a permanent record of the user's personal data and makes it possible for the information to propagate through social network connections. Some information is expected to be accessible only to certain people, usually intimates or acquaintances. However, an OSN is a scale-free network, and digital data are easy to copy and store. Therefore, unexpected viewers may obtain our personal data, and sometimes even private information. This can lead to abuse of personal information and affect our daily lives.
Real life stories of sensitive information leakage via OSN happen frequently.
For example, UK Ministry of Defence staff leaked confidential information onto social network sites 6 times in 18 months [9]. The Israeli military cancelled a planned raid on a Palestinian village after one of its soldiers posted details of the operation on Facebook [10]. More and more employers collect information about potential employees through social networks. According to a survey released on EU Data Protection Day [11], information leakage has put people's careers at risk.

Privacy Protection
OSN privacy issues have been attracting public attention. Hasib categorized OSN privacy threats into several types [12], such as digital dossiers and search engine indexing that make sensitive information searchable, image retrieval and interpretation, and profile association. Krishnamurthy et al. studied the leakage of personally identifiable information and how it can be misused by third parties [13]. Livingstone [14] discussed the risks when young people make friends and share personal information to express themselves online.
Solutions have been proposed from different perspectives to protect privacy. Online social network service providers, including Facebook and Google+, let users manage who can access a certain type of information, and LinkedIn hides sensitive information from users who are not connected to the user. Researchers have also studied privacy protection from several perspectives. The first type proposes new online social network structures. Felt et al. [15] studied and discussed the privacy concerns of social network APIs (application programming interfaces) for third parties. Baden et al. [16] proposed a new type of online social network that uses encryption to hide user data and allows users to define privacy policies. Guha et al. [17] proposed an approach to hide user data by mapping real data to fake data.
Another type develops tools or methods to examine and improve the privacy of current online social networks. Fang et al. [18] developed privacy wizards that recommend privacy settings to users. Gundecha et al. [19] proposed an approach to identify a user's vulnerable friends. In this dissertation, I propose to help users improve their privacy from another perspective: measuring privacy risk by considering both whether related users are aware of privacy protection and whether they are trusted to protect others' personal information.

Quantitative Privacy Risk Analysis
According to [20], general security threats can be divided into an avoidable category and an unavoidable category. In the current design of OSNs, privacy risk is unavoidable due to the public nature of OSNs [21]. When risk is unavoidable, we accept that the risk exists and attempt to reduce the likelihood of harmful events.
Under the assumption of unavoidable risk, risk analysis becomes extremely important. With an effective risk analysis method, we are able to 1) design secure privacy management, 2) monitor critical data and protect them effectively, 3) make effective privacy policies, and 4) provide valuable analysis data for future estimation [22].
Risk analysis can be performed either quantitatively or qualitatively. According to the National Institute of Standards and Technology (NIST), risk analysis is defined as "the process of identifying risk, assessing risk, and taking steps to reduce risk to an acceptable level" [23]. Under this guideline, quantitative risk analysis plays a critical role, because the reduction of risk needs to be measured. The advantages of quantitative risk analysis include [24]:
• Many metrics can be used to represent and evaluate the risk parameters. This allows a more detailed analysis of risky events.
• The risk parameters can be expressed numerically, so that people can understand and compare them (e.g. human-defined units for data importance, threat impact, and reputation loss).
• Sophisticated decision-making techniques can be used, as the quantitative assessment provides a credible set of parameters.
• The results of the risk analysis process can be expressed in management's language, which makes it more efficient to help an entity achieve its risk-reduction objectives.
In summary, quantitative risk analysis can assist and strengthen privacy protection from both the technical and the educational perspective.
However, quantitative privacy risk analysis is still immature in the OSN privacy literature. First, a privacy risk analysis method should consider the key factors of social networks, some of which are ignored in the current literature.
Second, data availability, OSN structure, and privacy policies may change, and to the best of our knowledge, current methods are not sufficiently flexible to cope with changes in the OSN environment. Finally, the measurement of privacy level is not clearly defined: some methods use profile visibility as the privacy level [19], while others use the number of friends [25]. Therefore, in manuscript 1, I introduce a framework called TAPE (Trust-Aware Privacy Evaluation) to quantitatively evaluate privacy risk in OSNs. In particular, reliability analysis tools are utilized. The factors that impact information propagation are divided into two categories and evaluated by two metrics: privacy awareness and privacy trust. The proposed TAPE framework can also make recommendations to individual OSN users to meet their privacy improvement needs. The contributions of this work include:
1. the TAPE framework, which considers privacy leakage through nodes and links separately, utilizes reliability analysis tools, and defines privacy risk quantitatively;
2. the privacy awareness and privacy trust metrics;
3. a privacy awareness algorithm, which shows a clear advantage over the known algorithm IRT [26] in the current literature;
4. the sensitivity analysis metric, from which we propose an efficient unfriending strategy.
Our work can help people understand their privacy situations, raise their privacy awareness, and thus reduce privacy risks.

Security of Online Review System

Online Review System
Online product/service reviews give customers who have used a product or service a way to share their experiences. A review usually contains a rating value and some text describing aspects of the product. Examples of online reviews include Amazon product reviews, Yelp restaurant reviews, and TripAdvisor hotel reviews. An online review system, also referred to as an online reputation system, allows customers to post reviews for products/services and aggregates these reviews into an overall reputation score. This score reflects the average shopping experience, such as product quality and customer service. Online review systems help people make purchasing decisions and hence greatly affect the profits of online retailers. It is reported that 74% of consumers rely on product/service reviews to guide their purchases [7]. A one-star rating increase can bring a 5-9% increase in revenue to online sellers [8]. People increasingly rely on online reviews when evaluating the quality of products, hotels, restaurants, and vacation packages before placing an order. "If you do build a great experience, customers tell each other about that. Word of mouth is very powerful," said Jeff Bezos, CEO of Amazon [27].

Fake Review
Online product reviews are becoming increasingly important. However, with the growing popularity of online review systems such as Amazon, TripAdvisor, and Yelp, and the growing profits of online markets [28], malicious users have started to abuse the convenience of publishing online reviews and intentionally post low-quality, untrustworthy, or even fraudulent reviews. It is reported that sellers in online marketplaces boost their reputation by trading with collaborators [29], and that firms post biased reviews to praise their own products or bad-mouth competitors' products [30]. According to the official Yelp blog [31,32], Yelp has been using a review filter to hide suspicious reviews since 2005, which shows that Yelp was aware of review fraud at a very early stage. Recently, both Yelp and Amazon announced that they have sued several companies in order to block alleged fake reviews on their websites [33,34].
Fake reviews, also referred to as review manipulation or review spam, can inflate or deflate products' reputation scores, shake users' confidence in online reputation systems, and eventually undermine reputation-centric online businesses, leading to economic loss. Furthermore, in some situations review manipulation is even more damaging. For example, Black Friday and Cyber Monday shoppers rely heavily on online reviews, because they have to make rushed decisions about unfamiliar products to take advantage of unusual discounts that expire quickly. Another example is the online reputation of hotels and restaurants: consumers misled by manipulated hotel ratings cannot easily obtain refunds after purchasing these services.

Online Review System Protection
In the literature, researchers have proposed methods to protect reputation systems from several angles, such as 1) increasing the cost of acquiring multiple user IDs [35], 2) endogenous discounting of dishonest reviews by analyzing the statistical features of the reviews [36], 3) exogenous discounting of dishonest ratings by introducing reputation evaluation of users [36][37][38], and 4) studying the correlation between users and reviews to detect dishonest reviews [39,40].
There is great demand for thorough fake review detection in reputation systems. Fake review detection follows three directions. In the first direction, fake reviews are detected primarily based on review features, such as standard word and part-of-speech n-gram features [41] and duplicated or near-duplicated review text [42,43]. In the second direction, dishonest reviewers are detected based on reviewer features, such as the reviewer graph [44], frequency patterns [45], and user correlations [39]. In the third direction, victim products are detected based on unusual changes in review statistics, such as a jump/drop in average rating [39] and changes in the arrival rate [46].
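As a generic illustration of the third direction, the sketch below flags positions in a product's chronologically ordered rating stream where the mean of the next `window` ratings differs sharply from the mean of the previous `window` ratings. It is not the method of [39]. The function name `rating_shifts`, the window size, and the threshold are hypothetical choices made only for this example.

```python
from statistics import mean


def rating_shifts(ratings, window=5, threshold=1.5):
    """Flag jumps/drops in average rating (hypothetical illustration).

    `ratings` must be in chronological order. For each split point i, the
    mean of the `window` ratings before i is compared with the mean of the
    `window` ratings from i onward; the split is flagged when the absolute
    difference is at least `threshold` stars.
    """
    flagged = []
    for i in range(window, len(ratings) - window + 1):
        before = mean(ratings[i - window:i])
        after = mean(ratings[i:i + window])
        if abs(after - before) >= threshold:
            flagged.append((i, before, after))
    return flagged
```

For the stream `[5, 5, 4, 5, 5, 1, 1, 2, 1, 1]`, the only split point with a full window on each side is index 5, where the mean drops from 4.8 to 1.2, so that position is flagged. A real detector would also need to handle overlapping flags and choose thresholds that fit the platform's rating variance.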
We argue that victim product detection is critically important yet under-investigated. First, the proportion of fake reviews is relatively small compared with the huge number of online reviews [47], so fake review detection can be more efficient if we focus only on victim products. Second, some reviews have limited impact on a product's reputation while others have a major impact, and victim product detection helps us focus on the reviews with major impact. Third, online customers often care more about the average rating than about specific reviews.
According to statistics from Social Barrel, a consumer consults 11 online reviews on average before making a purchasing decision [48]. Knowing whether a product is a victim can help customers make better decisions.
In manuscript 2, the Equal Rating Opportunity (ERO) principle is introduced, which assumes that the distributions of certain review features should not be related to the rating value. ERO analysis is also proposed to detect victim products. The contributions of this work include:
1. The ERO principle is introduced, which can reveal the fraudulence signal of fake reviews. Importantly, ERO analysis provides a new angle from which to consider product reviews, and it can capture fraudulence signals missed by current work.
2. The criterion for ERO feature selection is introduced.
3. ERO analysis is implemented, and its performance is evaluated by comparison with two other common methods and by expert reviews.
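ERO analysis itself is presented in Chapter 3 and is not reproduced here. One plausible way to check the ERO assumption for a single review feature is Pearson's chi-square test of independence between the rating value and that feature. This is a generic test swapped in for illustration and is not necessarily the procedure ERO analysis uses. The sketch assumes a binary feature, recorded as counts of reviews with and without the feature at each star rating from 1 to 5, and a 5% significance level. The function names `chi_square_independence` and `ero_violation` are hypothetical.

```python
# Upper 5% critical values of the chi-square distribution, keyed by degrees of freedom.
CHI2_CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


def ero_violation(counts_by_rating):
    """Hypothetical check: does a review feature depend on the rating value?

    counts_by_rating[k] = [reviews with the feature, reviews without it]
    for star rating k + 1. Returns (violates, statistic, dof).
    """
    stat, dof = chi_square_independence(counts_by_rating)
    if dof not in CHI2_CRITICAL_05:
        raise ValueError(f"no tabulated critical value for {dof} degrees of freedom")
    return stat > CHI2_CRITICAL_05[dof], stat, dof
```

If every rating level shows the feature in 10% of its reviews, the statistic is 0 and the feature respects the equal-opportunity assumption. If the feature is concentrated in 1-star and 5-star reviews, the statistic far exceeds 9.488 for 4 degrees of freedom, which signals the kind of rating-dependent imbalance the ERO principle treats as suspicious.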

Summary
In summary, comprehensively investigating the multiple privacy and security aspects of social media is of great importance. To address the challenges, this dissertation focuses on two critical aspects, online social network privacy and online review system security. In this dissertation, I provide reasonable answers to the following questions.
• Q: How can we evaluate OSN users' information-spreading behavior, given the currently available data?
A: Information propagates mainly through social connections and through individual users. Traditional sociology research can evaluate the impact of social connections, for example based on the strength of social ties. In this dissertation, information propagation through individual users is evaluated by two metrics. One is privacy awareness, which indicates whether a user pays attention to her/his own privacy. The other is privacy trust, which indicates how much a user's friends trust her/him not to gossip their information to others.
• Q: How can we detect the fraudulence signal of fake reviews?
A: Fake reviews are elusive, and the fake review signal is multidimensional, so fake review detection is nontrivial. Existing approaches work well for some dimensions. We found a new angle that has not received sufficient attention, and we introduce the Equal Rating Opportunity (ERO) principle and ERO analysis to detect fake reviews from this angle.

Introduction
With the emergence of Online Social Networks (OSNs), people are facing critical privacy risks. In OSNs, personal information can be abused, putting users at risk. Researchers have classified OSN privacy issues into two categories: inadvertent disclosure of personal information, and stalking or backtracking [1,2].
Krishnamurthy et al. studied the leakage of personally identifiable information and how such information can be misused by third parties [3]. This kind of information can distinguish an individual's identity, either alone or when combined with other information linked to a specific individual, and its leakage can lead to identity theft. Livingstone [4] demonstrated the risks when young people make friends and share personal information to express themselves online. Real-life stories of sensitive information leakage in OSNs are frequent. For example, most employers have begun to collect information about potential employees through social networks.
According to a survey released on EU Data Protection Day [5], privacy leakage had put people's careers at risk.
In current commercial OSN designs, privacy risk is unavoidable due to the public nature of OSNs [6,7]. To benefit from the convenience of OSNs, people share personal information with friends, which makes privacy leakage possible.
When privacy risk is unavoidable, we accept that the risk exists and attempt to reduce the likelihood of harmful events. Under the assumption of unavoidable risk, risk analysis becomes extremely important. According to the National Institute of Standards and Technology (NIST), risk analysis is defined as "the process of identifying risk, assessing risk, and taking steps to reduce risk to an acceptable level" [8]. In [
In this manuscript, we address the first challenge by proposing a quantitative definition of privacy risk based on privacy hazard and its probabilities. This quantitative measurement makes it possible to use privacy level calculation tools that were originally proposed in the reliability analysis field. To address the second and third challenges, we have to consider the availability of social data. Since nobody can monitor all of a user's communication behaviors (online and offline), researchers have to work with limited data that can be obtained at reasonable cost. In this work, Facebook privacy settings are used as the primary data source. We also focus on 'word-of-mouth', which is the primary driver of OSN information diffusion [16].
Although other privacy leakage scenarios, discussed in Section 2.7.7, are not considered in this work, the proposed concepts, including privacy awareness and privacy trust, can be extended to those scenarios. For the fourth challenge, due to the lack of ground truth for users' privacy levels, we compare the proposed scheme with existing approaches, such as the privacy concern model in [17] and the vulnerability analysis in [11]. In addition, Monte Carlo simulation is employed to verify the results of the privacy risk evaluation.
In this manuscript, we propose a TAPE (Trust-Aware Privacy Evaluation) framework for quantitatively evaluating users' privacy level in OSNs. The TAPE framework contains several novel aspects.
• It identifies the similarity between reliability analysis in wireless sensor networks (WSNs) and privacy risk estimation in OSNs, which sets the stage for using reliability analysis tools for privacy analysis.
• It considers privacy leakage through nodes (i.e. users) and through links (i.e. friend connections) separately. Privacy leakage through nodes mainly depends on users' behavior, and TAPE defines two metrics to estimate it. The first reflects whether a user respects others' privacy and is named Privacy Awareness. The second reflects how much a user's friends trust her/him not to gossip their information to others and is named Privacy Trust. Privacy leakage through a link mainly depends on the relationship between the two users, in terms of whether one pays attention to the other's personal information.
• It proposes the desirable properties of privacy awareness and privacy trust metrics, as well as specific ways to calculate them under the guidance of trust management theory. It is the first time that the privacy trust concept has been used in evaluating privacy level in OSN.
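The desirable properties and the exact formulas of the two metrics are given later in the manuscript and are not reproduced here. The notation introduced for TAPE includes the proportions of users whose privacy setting for $I_j$ is looser or tighter than user $u$, together with a count of positive recommendations for user $u$. As a hypothetical illustration only, the sketch below does three things:

- It computes those proportions from ordinal privacy settings.
- It uses the average "looser" proportion as a stand-in awareness score.
- It uses the beta-reputation expected value (p + 1)/(p + n + 2) from trust management as a stand-in privacy trust score, where p and n count positive and negative recommendations.

Both scoring rules, the setting scale, and every function name here are assumptions, not TAPE's definitions.

```python
def _other_settings(settings, user, item):
    """Settings of every other user who has a setting for `item`."""
    return [s[item] for u, s in settings.items() if u != user and item in s]


def rank_looser(settings, user, item):
    """Fraction of other users whose setting for `item` is looser than `user`'s.

    Settings are ordinal levels where a larger number means tighter
    (assumed scale: 0 = public, 1 = friends of friends, 2 = friends, 3 = only me).
    """
    others = _other_settings(settings, user, item)
    mine = settings[user][item]
    return sum(s < mine for s in others) / len(others) if others else 0.0


def rank_tighter(settings, user, item):
    """Fraction of other users whose setting for `item` is tighter than `user`'s."""
    others = _other_settings(settings, user, item)
    mine = settings[user][item]
    return sum(s > mine for s in others) / len(others) if others else 0.0


def privacy_awareness(settings, user):
    """Hypothetical awareness score: mean of rank_looser over the user's items."""
    items = settings[user]
    return sum(rank_looser(settings, user, j) for j in items) / len(items)


def privacy_trust(positive, negative):
    """Hypothetical trust score: beta-reputation expected value."""
    return (positive + 1) / (positive + negative + 2)
```

Under this stand-in, a user who keeps a photo at "friends" while the two other users share theirs more openly gets a looser-rank of 1.0 for that item. A user with no recommendations gets a neutral trust of 0.5, which moves toward 1 or 0 as positive or negative recommendations accumulate.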
Besides privacy risk estimation, the TAPE framework has the ability to conduct sensitivity analysis for friend links, which is similar to the concept of vulnerability in [11]. Through the sensitivity analysis, an OSN user can understand how much his/her privacy level is affected by a particular friend connection. The sensitivity analysis yields a practical way to improve OSN users' privacy level.
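The sensitivity analysis metric is presented in Section 2.6 and is not reproduced here. As a hypothetical stand-in, the sketch below scores each of the owner's friend links by how much the leakage probability $L_j$ drops when that link is removed. It then recommends unfriending the friend whose removal helps most. It assumes a toy diffusion model:

- The owner always shares.
- Every other listed user forwards with an independent node spreading probability.
- Every directed link transmits with an independent link spreading probability.

$L_j$ is computed by brute-force enumeration of all node and link states, which is only practical for very small graphs. The function names and the example probabilities are illustrative.

```python
from collections import deque
from itertools import product


def leakage(source, node_p, link_q, ug):
    """Exact probability that at least one user in `ug` receives the owner's item."""
    comps = [(v, p, True) for v, p in node_p.items()]
    comps += [(e, q, False) for e, q in link_q.items()]
    total = 0.0
    for states in product((True, False), repeat=len(comps)):
        prob, forwarders, live = 1.0, set(), set()
        for (name, p, is_node), up in zip(comps, states):
            prob *= p if up else 1.0 - p
            if up:
                (forwarders if is_node else live).add(name)
        if prob == 0.0:
            continue
        # Spread from the owner over live links; only forwarders pass it on.
        received, queue = {source}, deque([source])
        while queue:
            v = queue.popleft()
            if v != source and v not in forwarders:
                continue
            for a, b in live:
                if a == v and b not in received:
                    received.add(b)
                    queue.append(b)
        if received & ug:
            total += prob
    return total


def link_sensitivity(source, node_p, link_q, ug):
    """Hypothetical metric: drop in L_j when the owner removes each friend link."""
    base = leakage(source, node_p, link_q, ug)
    scores = {}
    for a, b in link_q:
        if a != source:
            continue
        # Unfriending removes the friendship in both directions.
        reduced = {e: q for e, q in link_q.items() if e not in ((a, b), (b, a))}
        scores[b] = base - leakage(source, node_p, reduced, ug)
    return scores


def recommend_unfriend(source, node_p, link_q, ug):
    """Friend whose removal lowers the leakage probability the most."""
    scores = link_sensitivity(source, node_p, link_q, ug)
    return max(scores, key=scores.get)
```

Consider Alice with friends Bob and Carol, each forwarding with probability 0.5. Bob reaches the undesirable user Dave with probability 0.4 and Carol with 0.9, so $L_j = 1 - 0.8 \times 0.55 = 0.56$. Removing Bob lowers it by 0.11 and removing Carol by 0.36, so this stand-in recommends unfriending Carol.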
In summary, the contributions of this work include:
1. the TAPE framework, which considers privacy leakage through nodes and links separately, utilizes traditional reliability analysis tools, and defines privacy risk quantitatively;
2. the privacy awareness and privacy trust metrics;
3. a privacy awareness algorithm, which shows a clear advantage over the known algorithm IRT [17] in the current literature;
4. the sensitivity analysis metric, from which we propose an efficient unfriending strategy.
This manuscript is organized as follows. Related work is discussed in Section 2.3. TAPE framework is described in Section 2.4, followed by discussion of information spreading probability algorithms and the proposed algorithms in Section 2.5. Privacy assessment and sensitivity analysis metric are presented in Section 2.6. Experiment results and conclusion are presented in Section 2.7 and Section 2.8 respectively.

Related Work
Privacy in OSNs has attracted much attention. OSN service providers allow users to manage who can access which information (e.g. in Facebook and Google+) and to hide sensitive information from non-connected users (e.g. in LinkedIn). Researchers have studied privacy protection along two directions. Along the first direction, fundamental changes to the current OSN design were suggested to enhance users' privacy. Felt et al. [14] studied and discussed the privacy concerns of social network APIs for third parties. Guha et al. [18] proposed an approach to hide user data by mapping real data to fake data. Within the first direction, "Privacy by Design" (PbD) is an important approach. In [19], Wolf  to use homomorphic encryption and multi-party computation techniques to hide privacy-sensitive data from the service provider in a recommender system, without a significant loss of data usability. The second direction develops privacy protection tools based on existing OSNs. For example, Fang et al. [12] developed privacy wizards that recommend privacy settings to users, and Gundecha et al. [11] proposed an approach to identify a user's vulnerable friends. In this manuscript, we propose to assist users' privacy protection by providing a quantitative evaluation of privacy risk and conducting sensitivity analysis for friend links.
Our work belongs to the second direction.
There have been several quantification models for privacy evaluation in OSN.
Alim et al. [21] proposed a vulnerability quantification model consisting of three components: individual, relative, and absolute vulnerability. They examined the visibility of OSN users' profiles and computed the clustering coefficient to compose individual vulnerability, from which relative and absolute vulnerability were calculated. Besides privacy risk evaluation, friend vulnerability analysis, referred to as sensitivity analysis in this manuscript, is considered a good way to improve personal privacy. Abdulrahman et al. proposed a node vulnerability metric [22] and a multi-agent vulnerability analysis [23] based on the MySpace friendship graph. The vulnerability index (V-Index) was proposed to measure how vulnerable an OSN user is based on her/his friends' privacy settings [11]. Privacy settings and their implications were considered a primary factor in the existing models. In this manuscript, we also consider the privacy setting as one of the primary factors, and we represent its implications with two metrics: privacy awareness and privacy trust. Besides the privacy setting, the TAPE framework can adopt social tie analysis approaches when implementing the link information spreading probability module. Network topology and information diffusion patterns are also considered.
The proposed work is also related to information diffusion in OSNs. Gruhl et al. [24] studied the dynamics of information spreading in weblogs. Adar et al. [25] demonstrated a technique for inferring information propagation through a blog network by applying epidemic models of information spreading. Cha et al. [16] studied social cascades in the Flickr social network. Researchers have also attempted to build mathematical models of information diffusion in online social networks, such as [26][27][28]. In addition, some studies discuss the relationship between tie strength and information propagation [29], which is related to the information spreading probability discussed in this manuscript.
Unlike previous information diffusion work, the proposed TAPE framework considers information diffusion in the context of privacy protection, which requires a different set of features and considerations.

Trust-aware Privacy Evaluation Framework
In this section, the TAPE framework is discussed in detail. We first define privacy risk from the perspective of information diffusion. The binary decision diagram (BDD), commonly used for system reliability analysis, is employed to calculate privacy risks. The concepts of node information spreading probability and link information spreading probability are then proposed.
Notation:
… : proportion of users whose privacy setting for $I_j$ is looser than user $u$
$rank^-_{u,j}$: proportion of users whose privacy setting for $I_j$ is tighter than user $u$
$A, B$: friend link between $A$ and $B$
$PT_u$: user $u$'s PT (privacy trust)
… : positive recommendations for user $u$

Figure 1: Online social network of Example 1.

Online Social Network Privacy
Some OSNs (e.g. Facebook and Linkedin) encourage people to use real names and upload personal information onto a page known as 'Profile'. Such personal information is often accessible by friends directly, and can even flow to thousands of other people through retweet (e.g. on Twitter), sharing (e.g. on Facebook) and online communication (e.g. chatting). The privacy concern in OSNs is well known, but how can we define the privacy risk in a quantitative way?
Before discussing the quantification of privacy risk, we first look at two examples. Generally, in some scenarios we want certain personal information to be known only by friends, and in other scenarios we do not want certain personal information to be viewed by specific people [2]. In Example 1, the personal information Alice is concerned about is her comment on Cris, and in Example 2, it is her photo. Clearly, a user has different types of personal information, and privacy concerns depend on the particular type. We introduce the notation $I_j^u$ to denote user $u$'s type-$j$ personal information. Without loss of generality, we present the framework in the context of protecting Alice's privacy, i.e. $u$ = "Alice". Alice is also referred to as the personal information owner (PIO). In the rest of the manuscript, for simplicity, we use $I_j$ to represent $I_j^{\text{Alice}}$. Note that privacy concerns are related to "undesirable viewers".
We define the concept of the Undesirable Group (UG) of $I_j$, denoted by $UG_j$, as follows. If Alice does not want her information $I_j$ to be seen by user $u_i$, then $u_i$ is put into $UG_j$, and $u_i$ is called an Undesirable Destination (UD) of $I_j$.
In Example 1, Alice's UG is {Cris}. In Example 2, Alice's UG contains all users except her friends.
In other words, if I j flows to any UD, Alice considers her privacy of I j being violated and privacy leakage happens. In the rest of the manuscript, for simplicity, we use UG to represent UG Alice j . Information leaking to different persons has different potential risks to the PIO. Without defining UD, this difference cannot be captured. The privacy definition based on UD is a more generalized definition. In this definition, users are classified into 4 types: 1. personal information owner (e.g. Alice), 2. users who are allowed to access personal information according to the privacy setting (e.g. Alice's friends), 3. users to whom the exposure of personal information causes damage (i.e. the undesirable group) 4. users not belonging to the above three types.
Existing work, such as [11], usually assumes that there are no type 4 users. Our definition of privacy leakage reduces to this traditional definition when the UG is defined as the complement of the set of type 1 and type 2 users. Our definition can also handle cases in which Alice is concerned only about leakage to a specific set of users, as in Example 1. In other words, it captures the fact that privacy leaking to different people causes different damage to the PIO, a difference that the privacy setting alone usually does not capture.

Privacy Risk and Related Concepts
With the proposed TAPE framework, we aim to answer two questions: 1) Can we measure the probability of personal information leakage as a measure of privacy risk in OSNs? 2) How is personal information leakage related to privacy risk? In this subsection, we first introduce the key concepts of the TAPE framework.
In [15], privacy is considered as keeping a piece of information within its intended scope. In TAPE, leakage of personal information $I_j$ occurs when any user in the undesirable group $UG_j$ views $I_j$. The undesirable group plays the role of the unintended scope in [15]. We assume that $I_j$ can only be obtained through online information diffusion, which occurs only through friend connections. This assumption results from the limitations of the data, as discussed in Section 3.2. If more data become available in the future, such as cell phone contact data, it can be revised. Under this assumption, the UG in Example 2 simplifies to {all of Alice's 2-hop neighbors}, because information that leaves Alice's friends must first reach one of her 2-hop neighbors. We define the privacy leakage probability of $I_j$, denoted by $L_j$, as the probability that at least one UD views $I_j$ through information diffusion in the OSN:
$L_j = \Pr\{\text{at least one } u_i \in UG_j \text{ views } I_j\}$ (1)
In statistics, risk is often modeled as the expected value of an undesired outcome [30], that is,
Risk = (probability of the accident occurring) × (expected loss in case of the accident).
In the context of OSNs, we argue that the privacy risk of information $I_j$, denoted by $V_j$, can be computed as
$V_j = L_j \times Z_j$,
where $L_j$ is the privacy leakage probability defined in Equation 1 and $Z_j$ describes the expected loss (damage) in case of privacy leakage. In this manuscript, we also use the term "privacy level" to describe an individual's privacy: the lower the privacy risk, the higher the privacy level. In … such as the privacy leakage problem studied in [3]. In this manuscript, we simply assume that $Z_j$ can be provided by the PIO. In the rest of this manuscript, when we compare privacy risks, $Z_j$ is set to the constant 1, so the privacy leakage probability $L_j$ is equivalent to the privacy risk $V_j$. The core task in TAPE is therefore to estimate $L_j$.
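The following is a minimal sketch of the risk definition. The function name and the numbers are hypothetical. It computes $V_j = L_j \times Z_j$ and shows that with $Z_j = 1$, the convention used for comparisons in this chapter, the risk equals the leakage probability.

```python
def privacy_risk(leak_prob: float, expected_loss: float = 1.0) -> float:
    """Privacy risk V_j = L_j * Z_j.

    leak_prob     -- L_j, probability that at least one UD views I_j
    expected_loss -- Z_j, damage if leakage happens (assumed supplied by the PIO)
    """
    if not 0.0 <= leak_prob <= 1.0:
        raise ValueError("L_j must be a probability in [0, 1]")
    if expected_loss < 0.0:
        raise ValueError("Z_j must be non-negative")
    return leak_prob * expected_loss


# With Z_j = 1 (the convention used when comparing risks), V_j equals L_j.
print(privacy_risk(0.25))       # 0.25
print(privacy_risk(0.25, 2.0))  # 0.5
```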

Toward Privacy Leakage Probability Estimation
We argue that the privacy leakage probability estimation problem can be decomposed into two tasks.

Task 1:
The first task is to estimate the information spreading probability (ISP) of each node and each link in the OSN. The ISP of a node is determined by complicated factors, ranging from knowledge to personality, which are extremely difficult to quantify or even understand.

Task 2:
The second task is to compute the probability of privacy leakage (i.e., $L_j$), given the network topology, the information spreading probabilities of nodes and links, the PIO (i.e., Alice), and the UG.
In the rest of this section, we first discuss the solution to the second task (Section 2.4.6), and then present the solution to the first task (Section 2.5). Fig 2 shows the core structure of the TAPE framework.

Privacy Analysis and Reliability Analysis
When investigating information diffusion in OSNs, we found that the reliability graph, which has been used as a reliability analysis tool (e.g., for WSN reliability), can be adapted to solve the problem.
Fig 3: Similarity between WSN and OSN. In 3a, Sensor A detects a fire, and the detection is sent to the server through other sensor nodes. In 3b, Alice (PIO) feels her photos are improper to be viewed by Eve (UD).
In a reliability analysis problem, the system is represented by a reliability graph whose links and nodes are assigned failure probabilities. The system has a source node and a sink node, which is usually a station. If no path from the source to the sink can be established, the system fails. For example, in a WSN, the nodes are sensors and the links are the communication channels. A sensor's failure probability depends on its battery, the environment temperature, work … In the TAPE framework, we have defined the information spreading probability for nodes and links in the previous section. This concept is the complement of the failure probability. For example, if node A fails to forward data to its neighboring nodes with probability x, node A's failure probability is x in the context of WSN reliability analysis, whereas its information spreading probability is 1 − x in the context of privacy analysis. The goal of a WSN is to transmit data successfully, whereas the goal of privacy protection is to prevent personal information from propagating. Consequently, with the PIO as the source and the UD as the sink, the probability that a working path exists (the system reliability in WSN analysis) is exactly the leakage probability $L_j$. Therefore, in the TAPE framework, we can also define the failure probability of a node or link as 1 − ISP. We propose to use the binary decision diagram (BDD) method, which is commonly used in reliability analysis [31][32][33], to solve Task 2 described in Section 2.4.5. Table 1 shows the important concepts in TAPE and their mapping to reliability analysis concepts.
A BDD is a directed acyclic graph constructed by Shannon's decomposition, and it is an efficient tool for manipulating Boolean expressions. For example, in Fig 4, Alice is the PIO and Bob is the UD, and all nodes and links are assigned ISPs. To calculate the information leakage probability $L_j$, we first represent the leakage event by a Boolean expression.
Then a BDD is constructed from this Boolean expression. The BDD is drawn as a binary tree (Fig 5), in which each sub-tree represents a subexpression. The left sub-tree of a BDD node represents the expression when the corresponding component successfully spreads the information, and the right sub-tree represents the expression when it fails to spread the information. When traversing from the root to a leaf, a leaf that is a left child gives an information leakage case.
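Based on the BDD, $L_j$ can be evaluated recursively. Consider a BDD node labeled with component $x$ whose ISP is $p_x$. The probability of its subexpression $f$ is $P(f) = p_x P(f_{x=1}) + (1 - p_x) P(f_{x=0})$, where $f_{x=1}$ and $f_{x=0}$ are the left and right sub-trees. The details of the BDD approach can be found in [31].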
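The sketch below makes the recursion concrete in Python. It assumes independent components and treats the PIO and the UD as always working. It applies Shannon's decomposition $P(f) = p_x P(f_{x=1}) + (1 - p_x) P(f_{x=0})$ directly as an unreduced decision tree. Unlike a real BDD package, it has no variable-ordering heuristic and no sharing of identical sub-graphs, so it is exponential and only suitable for toy graphs. The helper names (`leaks`, `leakage_probability`) and the Alice/Bob/Carol/Eve graph are hypothetical and are not taken from Fig 4 or Fig 5.

```python
def build_adjacency(links):
    """Undirected adjacency sets from a list of frozenset({u, v}) links."""
    adj = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def leaks(adj, pio, ud, node_up, link_up):
    """True if the information can travel from the PIO to the UD.

    The PIO and the UD are treated as always working; every intermediate
    node must spread (node_up) and every link must pass it on (link_up)."""
    stack, seen = [pio], {pio}
    while stack:
        x = stack.pop()
        if x == ud:
            return True
        if x != pio and not node_up(x):
            continue  # x received the information but does not spread it
        for y in adj[x]:
            if y not in seen and link_up(frozenset((x, y))):
                seen.add(y)
                stack.append(y)
    return False


def leakage_probability(adj, pio, ud, nisp, lisp):
    """Exact L_j by recursive Shannon decomposition over nodes and links.

    nisp: node -> NISP; lisp: frozenset({u, v}) -> LISP.
    In reliability terms each component fails with probability 1 - ISP."""
    comps = [("n", v) for v in adj if v not in (pio, ud)]
    comps += [("l", e) for e in lisp]
    prob = {c: nisp[c[1]] if c[0] == "n" else lisp[c[1]] for c in comps}

    def solve(state):
        def check(default):
            # Evaluate the leakage function with undecided components = default.
            return leaks(adj, pio, ud,
                         lambda v: state.get(("n", v), default),
                         lambda e: state.get(("l", e), default))
        if check(False):      # leaks even if every undecided component fails
            return 1.0
        if not check(True):   # cannot leak even if every undecided one works
            return 0.0
        c = next(c for c in comps if c not in state)  # variable to expand on
        p = prob[c]
        return p * solve({**state, c: True}) + (1 - p) * solve({**state, c: False})

    return solve({})


links = [frozenset(p) for p in [("Alice", "Bob"), ("Bob", "Eve"),
                                 ("Alice", "Carol"), ("Carol", "Eve")]]
adj = build_adjacency(links)
lisp = {e: 0.5 for e in links}
nisp = {"Bob": 0.6, "Carol": 0.4}
print(leakage_probability(adj, "Alice", "Eve", nisp, lisp))  # about 0.235
```

For these two disjoint two-hop paths, the result can be checked by hand: $1 - (1 - 0.5 \cdot 0.6 \cdot 0.5)(1 - 0.5 \cdot 0.4 \cdot 0.5) = 1 - 0.85 \times 0.9 = 0.235$.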
When the BDD method is applied to the OSN privacy problem, one of the most challenging issues is computational cost. The size of the BDD graph grows exponentially with the size of the network, and OSNs are too large for an efficient BDD calculation. In this work, BDD is employed to compute the probability of information diffusion after modification. Because of the large size of the social network and the high computational cost of BDD, we adopt a reduced BDD algorithm. In particular, we set the maximum traversing depth to k times the number of hops between the PIO and the UD. For example, when k = 2 and the UD is 3 hops away from the PIO, branches longer than 6 (3 × 2) are discarded from the BDD graph. In Section 2.7, we set k = 2.
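The following sketch shows one possible reading of the depth limit, under our own assumptions. It enumerates only the simple PIO-to-UD paths with at most k·d links, where d is the hop distance. It then computes exactly the probability that at least one retained path has all of its nodes and links spreading. Dropping paths can only lower the estimate, so the result is a lower bound on the full $L_j$. This is an illustration, not necessarily the reduced BDD used in the experiments, and the function names are hypothetical.

```python
from collections import deque


def hop_distance(adj, src, dst):
    """Number of hops between src and dst (breadth-first search)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            return dist[x]
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return None  # dst is unreachable


def bounded_paths(adj, src, dst, max_links):
    """All simple src -> dst paths that use at most max_links links."""
    paths, path = [], [src]

    def dfs(x):
        if x == dst:
            paths.append(list(path))
            return
        if len(path) - 1 == max_links:
            return  # depth limit reached: discard this branch
        for y in adj[x]:
            if y not in path:
                path.append(y)
                dfs(y)
                path.pop()

    dfs(src)
    return paths


def union_probability(path_sets, prob):
    """P(at least one path has every component up), by Shannon expansion."""
    def solve(sets):
        if any(not s for s in sets):
            return 1.0              # some path is already fully up
        if not sets:
            return 0.0              # every path has a failed component
        c = next(iter(sets[0]))     # expand on one remaining component
        up = [s - {c} for s in sets]
        down = [s for s in sets if c not in s]
        return prob[c] * solve(up) + (1 - prob[c]) * solve(down)
    return solve([frozenset(s) for s in path_sets])


def depth_limited_leakage(adj, pio, ud, nisp, lisp, k=2):
    """Leakage using only paths of at most k * hop_distance links."""
    d = hop_distance(adj, pio, ud)
    if d is None:
        return 0.0
    path_sets = []
    for p in bounded_paths(adj, pio, ud, k * d):
        comps = {("n", v) for v in p[1:-1]}                    # intermediate nodes
        comps |= {("l", frozenset(e)) for e in zip(p, p[1:])}  # links on the path
        path_sets.append(comps)
    prob = {("n", v): q for v, q in nisp.items()}
    prob.update({("l", e): q for e, q in lisp.items()})
    return union_probability(path_sets, prob)


# Direct link Alice-Eve plus a 4-link detour through X1, X2, X3.
edges = [("Alice", "Eve"), ("Alice", "X1"), ("X1", "X2"), ("X2", "X3"), ("X3", "Eve")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
lisp = {frozenset(e): 0.5 for e in edges}
nisp = {"X1": 1.0, "X2": 1.0, "X3": 1.0}
print(depth_limited_leakage(adj, "Alice", "Eve", nisp, lisp, k=2))  # 0.5 (detour dropped)
print(depth_limited_leakage(adj, "Alice", "Eve", nisp, lisp, k=4))  # 0.53125 (detour kept)
```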

Summary
By studying the similarity between reliability analysis in WSNs and privacy risk estimation, we modify the BDD method to evaluate the information leakage probability, and we develop the concepts of node ISP and link ISP. The core structure of TAPE is shown in Fig 2. In summary, TAPE is presented as a framework to solve Task 2 described in Section 2.4.5. In Section 2.5, we discuss the details of ISP calculation. In particular, the metrics of privacy awareness and privacy trust are proposed for node ISP calculation.

Information Spreading Probability Algorithms
While most social network information diffusion models consider the impact of nodes and links together [25], we argue that information propagation through nodes and through links should be considered separately. We therefore define the information spreading probability of a node (NISP), also referred to as node ISP, and the information spreading probability of a link (LISP), also referred to as link ISP. Together they better describe the information diffusion process. NISP is the probability that a node will spread others' information, and LISP is the probability that a link will be on the path of information diffusion. NISP and LISP imitate the natural human communication process in the real world (i.e., the offline social network):
• NISP describes the probability of speaking, i.e. talking about others.
• LISP describes the probability of listening, i.e. hearing what is said.
In this section, we focus on the NISP algorithm, followed by a brief introduction to the LISP algorithms proposed in the literature.

Node Information Spreading Probability (NISP)
Evaluating the NISP of a person is very challenging, because it is related to one's knowledge and personality. In an offline social network, we can probably estimate a person's NISP from experience if we know this person well. However, such estimation can be biased and limited. Most importantly, it cannot be applied in OSNs due to data limitations. Instead of resolving a challenging problem in social science, we propose to examine NISP based on the quantitative data available in OSNs.
In particular, we propose two metrics that should be used to estimate NISP: privacy awareness and privacy trust.

Privacy Awareness
The first metric is privacy awareness (PA), which depends on a user's privacy setting. We argue that privacy setting reflects a user's privacy protection awareness, describing whether a user is paying attention to his/her own privacy.
There are many different ways to compute a user's PA. In TAPE, PA evaluation is a module whose input is the set of the user's privacy settings, which is represented … Without loss of generality, in the rest of this manuscript we use $S$ and $s_j$ to represent Alice's privacy setting set and her privacy setting for $I_j$, respectively.
Let $PA_u$ represent the PA of user $u$ and $G_{PA}$ represent the adopted PA calculation, so that $PA_u$ is obtained by applying $G_{PA}$ to $u$'s privacy setting set. The TAPE framework can accommodate many PA algorithms. But what are the design criteria for PA algorithms? Based on the possible distributions of privacy settings and Alice's possible adoptions, we identified seven special cases and the desirable PA values in these cases (Table 2), which serve as guidance for PA algorithm design. To better understand Table 2, we define $rank^+_{u,j}$ as the proportion of users whose privacy setting for $I_j$ is looser than $u$'s. We define $rank^-_{u,j}$ as the proportion of users whose privacy setting for $I_j$ is tighter than $u$'s. As long as we know the statistics of users' privacy settings for $I_j$ and the setting adopted by $u$, we can compute $rank^+_{u,j}$ and $rank^-_{u,j}$.
Example 3. Table 3 shows the statistics of the birthday ($I_1$) privacy setting as an example. Alice allows only her friends to see her birthday.
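The rank statistics can be computed directly from the setting distribution. The sketch below assumes that the options of one information type can be ordered from loosest to tightest. The option names and counts are hypothetical (they are not the Table 3 statistics), and `privacy_ranks` is an illustrative helper.

```python
def privacy_ranks(counts, levels, choice):
    """Return (rank_plus, rank_minus) for a user whose setting is `choice`.

    counts -- number of users per option for one information type I_j
    levels -- options ordered from loosest to tightest (assumed known)
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no users in the statistics")
    idx = levels.index(choice)
    looser = sum(counts.get(opt, 0) for opt in levels[:idx])       # rank+ numerator
    tighter = sum(counts.get(opt, 0) for opt in levels[idx + 1:])  # rank- numerator
    return looser / total, tighter / total


# Hypothetical statistics for 100 users (not Table 3).
levels = ["public", "friends of friends", "friends", "only me"]
counts = {"public": 40, "friends of friends": 20, "friends": 30, "only me": 10}
print(privacy_ranks(counts, levels, "friends"))  # (0.6, 0.1)
```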
In Table 2, we assume that people apply a privacy setting to one type of information. It is easy to extend this to multiple types of information.
We have to point out that Table 2 may not include all possible cases. For example, suppose Alice adopts a tight setting for birthday and a loose setting for phone number, while Bob adopts a loose setting for birthday and a tight setting for phone number. It is then difficult to compare Alice's PA with Bob's. In such cases, more data are needed to make the PA evaluation more accurate. At the current stage, we argue that the desirable properties in Table 2 provide satisfactory guidance for PA algorithm design.

PA Algorithm
In our proposed PA algorithm, the individual information privacy awareness (IPA) is calculated first. IPA is the PA value calculated from the privacy setting of one type of information. Let $IPA_{u,j}$ denote the IPA of $u$ for $I_j$.
where $rank^+_{u,j}$ and $rank^-_{u,j}$ are defined in Section 2.5.1. It is easy to verify that Equation 6 satisfies the desirable properties in Table 2. As an example, $IPA_{Alice,1} = 0.725$ in Example 3. Clearly, $0 \le IPA_{u,j} \le 1$. More sophisticated calculations can replace Equation 6, depending on the implementation environment and data availability. After the IPAs for all types of information are calculated, the PA of $u$ is calculated from them. In the literature, some approaches have been proposed to evaluate similar metrics.
For example, in [17], Item Response Theory (IRT) was used to model "privacy concern". In Section 2.7.1, we compare the proposed PA algorithm with the IRT privacy concern model.
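Equation 6 and the PA aggregation formula are not reproduced in this text. The sketch below therefore substitutes a hypothetical formula of our own, $IPA_{u,j} = (1 + rank^+_{u,j} - rank^-_{u,j})/2$, and takes PA as the mean of the IPAs. The stand-in was chosen only because it stays in [0, 1], gives the neutral value 0.5 when every user picks the same setting, and grows as more users are looser than $u$. It should not be read as Equation 6, and it is not claimed to reproduce the value 0.725 of Example 3.

```python
def ipa_standin(rank_plus, rank_minus):
    """HYPOTHETICAL stand-in for Equation 6 (not the authors' formula).

    0.5 when u's setting matches everyone else's, toward 1 when most users
    are looser than u, toward 0 when most users are tighter than u."""
    if rank_plus < 0 or rank_minus < 0 or rank_plus + rank_minus > 1 + 1e-12:
        raise ValueError("ranks must be proportions that sum to at most 1")
    return (1.0 + rank_plus - rank_minus) / 2.0


def pa_standin(rank_pairs):
    """HYPOTHETICAL PA aggregation: plain mean of the per-type IPAs."""
    ipas = [ipa_standin(rp, rm) for rp, rm in rank_pairs]
    return sum(ipas) / len(ipas)


print(ipa_standin(0.0, 0.0))  # 0.5 (everyone uses the same setting)
print(ipa_standin(1.0, 0.0))  # 1.0 (tightest user, all others looser)
print(pa_standin([(0.6, 0.1), (0.0, 0.0)]))
```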

Privacy Trust
We propose another metric to evaluate how much a person should be trusted to protect privacy. This metric reflects how much one's friends trust her/him not to gossip their information to others, so it is named privacy trust (PT). This type of trust is very difficult to evaluate from direct evidence. First, direct evidence is rarely available, because we cannot wait for someone to commit bad behavior (e.g., gossiping about others) before estimating PT.
Second, the clues that people use to determine whether a person is trustworthy in offline social networks are usually not available in OSNs. Instead, indirect evidence is used to predict OSN users' PT. Such indirect evidence can be established based on recommendations [34]. For example, in Fig 6a, A's trust in C is derived from B's recommendation. The arrow from A to B represents the recommendation accuracy, and the arrow from B to C represents B's trust in C. In TAPE, we propose to evaluate an individual's PT based on implicit recommendations. An implicit recommendation for Alice from her friend Bob is established when Bob allows her to access his personal information. Moreover, if Bob has a high PA value, this implicitly tells us that Alice may be trusted not to propagate others' personal information (Fig 6b). PT calculation in TAPE is a module whose inputs are the PAs of the user's friends and evaluations of how much the user is trusted by those friends. Let $PT_u$ represent the PT of $u$ and $G_{PT}$ represent the PT calculation. Then $PT_u$ is obtained by applying $G_{PT}$ to these inputs, where $T_{friends,u}$ indicates how much $u$ is trusted by friends. Similar to PA calculation, we argue that the PT calculation should follow 3 rules.

PT Algorithm
There are many trust models in the literature. We adopt a trust model that uses the Beta function to handle concatenation propagation and multi-path propagation of trust [34].
In the context of PT, the recommendation accuracy (arrow from A to B in Fig 6a) is replaced by the PA of node B. The trust value (arrow from B to C in Fig 6a) is the implicit trust of B towards C, represented by $T_{B,C}$ in TAPE. For simplicity, in this manuscript we set $T_{B,C}$ to a constant value. This assumes that when two users are connected in an OSN, they have a certain chance to see each other's personal information, which does not necessarily mean they have already read or will read that information. In the future, when more OSN data are available, such as nuanced privacy settings, $T_{B,C}$ can be calculated more accurately.
We use $R^+_u$ to denote the set of positive recommendations for $u$, i.e., $u$'s friends whose PA values are higher than a threshold $\epsilon^+$. The PT calculation we adopt is described as follows.
First, we estimate PT through a single recommendation path, obtaining an estimate $PT_{u,f_i}$ for each $f_i \in R^+_u$, the $i$th high-PA friend of $u$. Then the variance of the estimate $PT_{u,f_i}$ is calculated, and PT is calculated by combining the path estimates with the Beta trust model. Finally, NISP is calculated as a weighted average of PA and PT, where $w$ is a weight factor between 0 and 1. A weighted average is one of the simplest ways to combine PA and PT. More complicated calculations can be developed depending on the implementation environment and data availability.
In the experiments in Section 2.7, we choose w = 0.5. Fig 7 shows the diagram of the NISP calculation. In the future, real human users must be involved (e.g., through questionnaires) to understand the relationship among NISP, PA and PT.
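The PT and NISP steps can be sketched as follows. The chapter's exact Beta-model equations are not reproduced in this text, so every formula here is an assumption:

- The per-path estimate uses the probability-style concatenation $PA_{f_i} T + (1 - PA_{f_i})(1 - T)$.
- Each path estimate is given a fixed hypothetical variance.
- Each path is converted to a Beta distribution by moment matching.
- The paths' evidence is added to a uniform Beta(1, 1) prior.

The constant $T = 0.8$ and threshold $\epsilon^+ = 0.6$ are illustrative. Which of PA and PT carries the weight $w$ is also our choice, though it does not matter at $w = 0.5$.

```python
def path_trust(pa_friend, t_friend_user):
    """Trust in u through one recommendation path f_i -> u.

    Assumed probability-style concatenation: the friend's PA plays the role
    of recommendation accuracy and T_{f_i,u} is the implicit trust."""
    return pa_friend * t_friend_user + (1 - pa_friend) * (1 - t_friend_user)


def beta_from_moments(mean, var):
    """Moment-match Beta(alpha, beta) to a given mean and variance."""
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance is incompatible with a Beta distribution")
    s = mean * (1 - mean) / var - 1          # alpha + beta
    return mean * s, (1 - mean) * s


def privacy_trust(friend_pas, t_const=0.8, eps_plus=0.6, path_var=0.01):
    """PT_u from friends' PA values (all parameters are illustrative)."""
    alpha, beta = 1.0, 1.0                    # uniform Beta(1, 1) prior
    for pa in friend_pas:
        if pa <= eps_plus:
            continue                          # friend is not in R+_u
        a_i, b_i = beta_from_moments(path_trust(pa, t_const), path_var)
        alpha += max(a_i - 1.0, 0.0)          # clamp: no negative evidence (assumption)
        beta += max(b_i - 1.0, 0.0)
    return alpha / (alpha + beta)             # 0.5 when R+_u is empty


def node_isp(pa_u, pt_u, w=0.5):
    """NISP as a weighted average of PA and PT (w = 0.5 as in Section 2.7)."""
    return w * pa_u + (1 - w) * pt_u


pt = privacy_trust([0.9, 0.8, 0.3])              # the 0.3 friend is filtered out
print(round(pt, 4), round(node_isp(0.6, pt), 4))  # 0.7193 0.6597
```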

Link Information Spreading Probability (LISP)
As discussed earlier, the LISP of the link between Alice and Bob depends on whether Bob hears what Alice says. Furthermore, it depends on whether Alice has a strong tie with Bob and whether the information is interesting enough to catch Bob's attention. Many works in the current literature have investigated social ties [29,35]. Note that the TAPE framework can accommodate any LISP algorithm, as long as its outcome is a value between 0 and 1 indicating the probability of information spreading. In this manuscript, we do not propose a specific algorithm for calculating LISP. In the experiments, we adopt a constant LISP value and focus on demonstrating the impacts of PA and PT.

Privacy Assessment and Privacy Improvement through TAPE
By evaluating NISP and LISP and utilizing the reliability analysis method, TAPE can assess one's OSN privacy level. More importantly, based on the privacy assessment process, TAPE can suggest strategies for improving the privacy level.

Privacy Assessment
As discussed in Section 2.4, by utilizing the BDD method and adopting proper NISP and LISP algorithms, TAPE can evaluate the privacy leakage probability from the PIO to a UD. In real life, people usually want to prevent certain personal information from being viewed by multiple people, which is why we define the undesirable group. TAPE can handle the multiple-UD case without further modification. Given an undesirable group $UG = \{UD_1, UD_2, \ldots, UD_K\}$, $K > 1$, the information leakage probability to UG is
$L_{UG} = 1 - \prod_{i=1}^{K} (1 - L_i)$ (15)
where $L_i$ is the privacy leakage probability to $UD_i$. Here, privacy leakage happens if any one UD gets the information, and the expression treats the leakage events to different UDs as independent.
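Equation 15 is a one-liner. The sketch below (with a hypothetical function name) makes the independence assumption explicit.

```python
import math


def group_leakage(per_ud_leakage):
    """L_UG = 1 - prod_i (1 - L_i): leakage if any UD gets the information.

    Treats the leakage events to different UDs as independent."""
    for l in per_ud_leakage:
        if not 0.0 <= l <= 1.0:
            raise ValueError("each L_i must be a probability")
    return 1.0 - math.prod(1.0 - l for l in per_ud_leakage)


print(group_leakage([0.5, 0.5]))  # 0.75
print(group_leakage([0.2, 0.1, 0.05]))
```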

Privacy Improvement Strategies
The goal of privacy protection in TAPE is to reduce privacy risk. From a user's perspective, the most practical strategy is to block a friend from accessing certain personal information, also referred to as unfriending. In TAPE, we develop a method that identifies the friend link that contributes most to privacy leakage. We adopt Birnbaum's measure (BM) [36] to find such a link.
Originally, Birnbaum's measure was used to examine the sensitivity, or importance, of a component in a reliability graph. In TAPE, we use it to evaluate the sensitivity of a friend link. Birnbaum's measure is the partial derivative of the leakage probability with respect to the LISP of link c:
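$BM(c) = \frac{\partial L}{\partial ISP(c)}$ (16)
Because L is linear in the ISP of any single component, BM(c) equals the leakage probability when c always spreads minus the leakage probability when c never spreads. For the single-UD case, the detailed calculation of BM using the BDD graph can be found in [36].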
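We derive Birnbaum's measure for the multiple-UD case. Rewriting (15), we have
$1 - L_{UG} = \prod_{i=1}^{K} (1 - L_i)$.
Differentiating both sides with respect to $ISP(c)$, the right-hand side gives
$\frac{\partial}{\partial ISP(c)} \prod_{i=1}^{K} (1 - L_i) = -\sum_{i=1}^{K} BM_i(c) \prod_{k \neq i} (1 - L_k) = -(1 - L_{UG}) \sum_{i=1}^{K} \frac{BM_i(c)}{1 - L_i}$,
and the left-hand side gives
$\frac{\partial (1 - L_{UG})}{\partial ISP(c)} = -BM_{UG}(c)$.
Putting the two sides together, the Birnbaum's measure of c for UG is
$BM_{UG}(c) = \sum_{i=1}^{K} \alpha_i BM_i(c)$,
where $\alpha_i = \frac{1 - L_{UG}}{1 - L_i}$ is the contribution weight of the $i$th UD, and $BM_i(c) = \partial L_i / \partial ISP(c)$ is the Birnbaum's measure of c when computing the privacy leakage probability to $UD_i$. Finally, the unfriending strategy proposed in TAPE is to find, among Alice's friend links, the link
$c^* = \arg\max_{c} BM_{UG}(c)$,
and TAPE suggests blocking $c^*$ to improve the privacy level.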
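The following sketch ties the pieces together on a toy graph. It computes BM(c) as the leakage probability with c up minus the leakage probability with c down. This equals the partial derivative because L is linear in each component's ISP. The sketch then combines the per-UD measures with the weights $\alpha_i$ and returns $c^*$. An exact Shannon-decomposition routine stands in for a BDD package and is only practical for small graphs. All function names are hypothetical.

```python
import math


def leaks(adj, pio, ud, node_up, link_up):
    """True if information travels from the PIO to the UD (PIO/UD always work)."""
    stack, seen = [pio], {pio}
    while stack:
        x = stack.pop()
        if x == ud:
            return True
        if x != pio and not node_up(x):
            continue  # x does not spread the information further
        for y in adj[x]:
            if y not in seen and link_up(frozenset((x, y))):
                seen.add(y)
                stack.append(y)
    return False


def leakage(adj, pio, ud, nisp, lisp):
    """Exact leakage probability by Shannon decomposition (toy graphs only)."""
    comps = [("n", v) for v in adj if v not in (pio, ud)] + [("l", e) for e in lisp]
    prob = {c: nisp[c[1]] if c[0] == "n" else lisp[c[1]] for c in comps}

    def solve(state):
        def check(default):
            return leaks(adj, pio, ud,
                         lambda v: state.get(("n", v), default),
                         lambda e: state.get(("l", e), default))
        if check(False):
            return 1.0
        if not check(True):
            return 0.0
        c = next(c for c in comps if c not in state)
        return prob[c] * solve({**state, c: True}) + (1 - prob[c]) * solve({**state, c: False})

    return solve({})


def birnbaum(adj, pio, ud, nisp, lisp, link):
    """BM(c): L is linear in LISP(c), so dL/dLISP(c) = L(c up) - L(c down)."""
    return (leakage(adj, pio, ud, nisp, {**lisp, link: 1.0})
            - leakage(adj, pio, ud, nisp, {**lisp, link: 0.0}))


def group_birnbaum(adj, pio, uds, nisp, lisp, link):
    """BM_UG(c) = sum_i alpha_i * BM_i(c), alpha_i = (1 - L_UG) / (1 - L_i).

    Assumes every L_i < 1 so that alpha_i is defined."""
    per_ud = [leakage(adj, pio, ud, nisp, lisp) for ud in uds]
    l_ug = 1.0 - math.prod(1.0 - l for l in per_ud)
    return sum((1.0 - l_ug) / (1.0 - l_i) * birnbaum(adj, pio, ud, nisp, lisp, link)
               for ud, l_i in zip(uds, per_ud))


def unfriend_suggestion(adj, pio, uds, nisp, lisp):
    """c*: the PIO's friend link with the largest BM_UG."""
    friend_links = [frozenset((pio, f)) for f in adj[pio]]
    return max(friend_links,
               key=lambda c: group_birnbaum(adj, pio, uds, nisp, lisp, c))


links = [frozenset(p) for p in [("Alice", "Bob"), ("Bob", "Eve"),
                                 ("Alice", "Carol"), ("Carol", "Eve")]]
adj = {}
for e in links:
    a, b = tuple(e)
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
lisp = {e: 0.5 for e in links}
nisp = {"Bob": 0.6, "Carol": 0.4}
print(sorted(unfriend_suggestion(adj, "Alice", ["Eve"], nisp, lisp)))  # ['Alice', 'Bob']
```

In this toy graph, BM is 0.27 for the Alice-Bob link and 0.17 for the Alice-Carol link, so blocking Bob is suggested.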

Case Study
We compare two PA algorithms. The proposed PA algorithm, referred to as Rank PA, is described in Section 2.5.1. The comparison algorithm, referred to as IRT, is described in [17]. Briefly, this scheme calculates a metric called "privacy concern" from privacy settings by utilizing Item Response Theory.
Its goal is to estimate OSN users' privacy concerns toward information sharing.
Since there is no ground truth on what the most "correct" value of PA should be, we compare the two schemes in special situations to demonstrate their major features. Assume Alice has 3 types of information, $I_1$, $I_2$ and $I_3$, with privacy settings $s_1$, $s_2$ and $s_3$. Each privacy setting is binary: either open (represented by 0) or hidden (represented by 1). The column index in Table 4 is the possible privacy setting configuration. We randomly generate privacy setting data as follows. For each special situation, we first specify the proportion of each privacy configuration (e.g., '000', '001', etc.), and then generate privacy configuration realizations according to this distribution. 10,000 realizations are generated for each special situation. The special situations we investigate correspond to the rows of Table 4. The PA calculations are shown in Table 4, in which the proposed PA algorithm is referred to as "Rank PA" and the comparison scheme as "IRT z".
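The data-generation step can be sketched as below. The proportions shown are hypothetical (they are not one of the special situations in Table 4), and the fixed seed is only for reproducibility.

```python
import random
from collections import Counter

# All binary configurations s1 s2 s3 for I_1, I_2, I_3 (0 = open, 1 = hidden).
CONFIGS = [format(i, "03b") for i in range(8)]


def generate_settings(proportions, n=10_000, seed=0):
    """Draw n privacy-configuration realizations from the given proportions."""
    unknown = set(proportions) - set(CONFIGS)
    if unknown:
        raise ValueError(f"unknown configurations: {sorted(unknown)}")
    configs = sorted(proportions)
    weights = [proportions[c] for c in configs]
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.choices(configs, weights=weights, k=n)


# Hypothetical special situation: most users leave everything open.
sample = generate_settings({"000": 0.7, "001": 0.1, "011": 0.1, "111": 0.1})
print(Counter(sample).most_common(2))
```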
We first investigate the range of each scheme. IRT z has narrow ranges in the studied situations, although its theoretical range is (−∞, ∞). To adopt IRT z in TAPE as a PA algorithm, non-trivial normalization is needed. In contrast, the proposed Rank PA ranges from 0 to 1 as expected. In addition, the neutral value is 0.5 for Rank PA and 0 for IRT z. We then examine both schemes against the desirable properties of PA in Table 2 and obtain the following observations.
1. For Rank PA, the majority always gets PA values close to neutral, i.e., "000" in special situation 1, "011" in special situation 2, "110" in special situation 3 and "111" in special situation 4. When IRT z is used, such majority behavior cannot be captured.
2. Special situation 1 corresponds to case 4 and case 5 in Table 2. Both Rank PA and IRT z satisfy the desirable properties of case 4 and case 5.
3. Special situation 4 corresponds to PA case 6 and PA case 7 in Table 2. Rank PA satisfies the desirable properties in PA case 6 and PA case 7. For "000" in special situation 4, however, IRT z gives a higher value than for the same privacy setting in special situation 1, which violates the desirable properties.
4. In column "000" of Table 4, the PA values are expected to change from neutral to small from top to bottom. More people adopting tight privacy settings may indicate that the information is more sensitive, in which case opening it should yield lower PA values. Rank PA shows this trend, while IRT z does not.
In addition, IRT z is designed for binary privacy settings, whereas real privacy settings, such as Facebook's, usually have more than two options. Moreover, compared to the proposed Rank PA, IRT z has a higher computational cost.

Datasets
We use two datasets to conduct experiments. Information about the two datasets is listed in Table 5.

Privacy Risk
It is well known that the reliability of data transmission can drop significantly as the distance (i.e. the number of hops) increases. In the context of privacy protection, does the privacy risk heavily depend on this distance? We study the relationship between the privacy risk and the distance from the PIO to UD.
We first randomly pick 100 nodes from dataset I and put them in the PIO set.
In each round of simulation, we pick one node (without replacement) from the PIO set as the PIO. We then pick another node from the network that is no more than 6 hops away from the PIO as the corresponding UD. If the picked PIO is an isolated node (i.e., its degree is 0), we skip it. For each PIO-UD pair, we measure the distance, compute the privacy risk using TAPE, and plot the privacy risk in Fig 8. Each point represents one PIO-UD pair, the x-axis indicates the distance between the PIO and the UD, and the y-axis is the privacy risk. In this experiment, LISP is chosen from 0.5, 0.8, 0.9 and 0.95. We have the following observations:
• As expected, privacy risk has a decreasing trend as the distance increases.
• The privacy risk to 1-hop UDs (i.e., friends) can be greater than the LISP, because the information can also reach a friend indirectly through mutual friends.
• When the distance is small, the privacy risk varies over a large range, so distance is not the dominating factor: PA, PT and network topology jointly determine users' privacy risk. A user who is 3 hops away may be more likely to obtain Alice's personal information than a user who is 2 hops away.
• As the LISP decreases, the privacy risk decreases. In future work, incorporating LISP estimation will yield an even larger variation in privacy risk values.
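The PIO-UD pair selection used in this experiment can be sketched under the following assumptions:

- The PIO set is sampled without replacement.
- Isolated PIOs are skipped.
- The UD is drawn uniformly among nodes 1 to 6 hops away. The text does not say how the UD is drawn, so uniform sampling is our assumption.

Node ids must be sortable, and the function names are hypothetical.

```python
import random
from collections import deque


def hops_within(adj, src, max_hops):
    """BFS: hop distance from src to every node at most max_hops away."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if dist[x] == max_hops:
            continue  # do not expand beyond the hop limit
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def sample_pio_ud_pairs(adj, n_pio=100, max_hops=6, seed=0):
    """Pick PIOs without replacement and pair each with a UD within max_hops."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    pio_set = rng.sample(nodes, min(n_pio, len(nodes)))
    pairs = []
    for pio in pio_set:
        if not adj[pio]:
            continue  # isolated PIO (degree 0) is skipped
        dist = hops_within(adj, pio, max_hops)
        candidates = sorted(v for v, d in dist.items() if d > 0)
        ud = rng.choice(candidates)  # uniform choice: our assumption
        pairs.append((pio, ud, dist[ud]))
    return pairs
```

Each returned triple (PIO, UD, distance) is what one point in Fig 8 needs, apart from the TAPE risk itself.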

The impact of PA and PT
Due to the lack of "ground truth" about users' real privacy risk, it is hard to compare TAPE with other privacy evaluation methods that consider different user features. Instead of comparing TAPE with a specific method, we note that a prevalent type of privacy study in OSNs focuses only on network topology.
We construct a comparison method, referred to as the topology-based method, which uses the BDD to compute the privacy risk with fixed LISP and NISP. By comparing TAPE with the topology-based method, we can see whether considering the PA and PT metrics reveals information that is not captured by the topology alone. In the experiment, we set the LISP to 0.5 and set the NISP of the topology-based method to the average of the NISP values obtained when considering PA and PT.
The experiment setup is similar to that in Section 2.7.3. We construct PIO sets for both dataset I and dataset II, each with 100 nodes. In each round of simulation, one node is picked (without replacement) from the PIO set as the PIO, and another node 3 hops away from the PIO is picked as the UD.
We calculate the privacy risk using TAPE and using the topology-based method.
We define the proportional difference as
$D = \frac{V_{Topology} - V_{TAPE}}{V_{Topology}}$,
where $V_{Topology}$ is the privacy risk calculated based on topology and $V_{TAPE}$ is the privacy risk calculated by TAPE. The histograms of D for both datasets are shown in Fig 9. The proportional difference ranges from -25% to 5%. Hence PA and PT do provide additional, useful information beyond the topology. In addition, dataset II shows a distribution more concentrated around 0, while dataset I has a wider range. Note that dataset II has 4 types of personal information with 2 privacy setting options each, while dataset I has 16 types of personal information with 5 privacy setting options each. We …

In the topology-based method, we set the LISP to 0.5 and the NISP to the average NISP in TAPE. We argue that this choice of NISP favors the topology-based method. When PA and PT are not available, it is very difficult to choose a proper NISP value for the topology-based method, and choosing the average NISP value from TAPE provides a reasonable NISP estimate. In the rest of this section, we conduct an experiment to study how much the LISP value affects the comparison between TAPE and the topology-based method.
The experiment setup is the same as the one using dataset I earlier in this section, … Table 6. The proportional difference does not change when different LISP values are selected. However, smaller LISP values give smaller privacy risk estimates, as already observed in Fig 8.
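For reference, the following is a small sketch of the proportional difference D and a histogram of it with 5%-wide bins. The risk pairs are hypothetical values, not data from Fig 9, and the bin width is our choice.

```python
import math
from collections import Counter


def proportional_difference(v_topology, v_tape):
    """D = (V_Topology - V_TAPE) / V_Topology."""
    if v_topology <= 0:
        raise ValueError("V_Topology must be positive")
    return (v_topology - v_tape) / v_topology


def histogram(values, width=0.05):
    """Count values per bin [k*width, (k+1)*width), keyed by the bin's lower edge."""
    counts = Counter(math.floor(v / width) for v in values)
    return {round(k * width, 10): n for k, n in sorted(counts.items())}


# Hypothetical risk pairs (V_Topology, V_TAPE), not data from Fig 9.
pairs = [(0.20, 0.244), (0.40, 0.39), (0.10, 0.112)]
ds = [proportional_difference(vt, vp) for vt, vp in pairs]
print([round(d, 3) for d in ds])  # [-0.22, 0.025, -0.12]
print(histogram(ds))
```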

Verification of TAPE Calculation
In the previous experiments, privacy risks are calculated from LISP and NISP using BDD as described in Section 2.4.6. To verify this calculation, Monte Carlo simulations are used, and the results are compared with the outputs of TAPE.
The simulation is conducted as follows. At the initial stage, a node is selected … If the UD obtains a token duplicate, this simulation run is marked as 'information leakage observed'. By repeating the simulation N times, we obtain $N_1$ runs marked 'information leakage observed', and the simulated privacy risk is $N_1/N$. In the experiment, we randomly select 1,000 PIOs from dataset II, skipping those whose degrees are less than 2.
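The description of the token-seeding step is incomplete in this text, so the sketch below assumes the usual reading:

- The PIO starts with the token.
- Each intermediate node forwards it with probability NISP.
- Each link passes it with probability LISP.
- All of these are sampled independently in each run.
- A run counts as 'information leakage observed' when the UD receives a copy.

The graph, the probabilities and the function names are hypothetical. With enough runs, the estimate should approach the exact value 0.235 for this graph.

```python
import random


def leaks(adj, pio, ud, node_up, link_up):
    """True if the token spreads from the PIO to the UD in one realization."""
    stack, seen = [pio], {pio}
    while stack:
        x = stack.pop()
        if x == ud:
            return True
        if x != pio and not node_up(x):
            continue  # x holds a copy but does not forward it
        for y in adj[x]:
            if y not in seen and link_up(frozenset((x, y))):
                seen.add(y)
                stack.append(y)
    return False


def monte_carlo_leakage(adj, pio, ud, nisp, lisp, n_runs=50_000, seed=1):
    """Estimate L_j as N_1 / N from independent token-spreading runs."""
    rng = random.Random(seed)
    n_leak = 0
    for _ in range(n_runs):
        node_up = {v: rng.random() < p for v, p in nisp.items()}
        link_up = {e: rng.random() < p for e, p in lisp.items()}
        if leaks(adj, pio, ud, node_up.__getitem__, link_up.__getitem__):
            n_leak += 1  # 'information leakage observed'
    return n_leak / n_runs


links = [frozenset(p) for p in [("Alice", "Bob"), ("Bob", "Eve"),
                                 ("Alice", "Carol"), ("Carol", "Eve")]]
adj = {}
for e in links:
    a, b = tuple(e)
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
lisp = {e: 0.5 for e in links}
nisp = {"Bob": 0.6, "Carol": 0.4}
print(monte_carlo_leakage(adj, "Alice", "Eve", nisp, lisp))  # close to 0.235
```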

Sensitivity Analysis and Unfriending Strategy
Unfriending is suggested in [22,38]. We propose an unfriending strategy based on Birnbaum's measure, referred to as TAPE unfriending, which evaluates the partial derivative of the leakage probability with respect to the LISP of a given friend connection. In this section, we conduct experiments to compare TAPE unfriending with three other unfriending approaches.
1. TAPE: In this approach, the friend link with the largest Birnbaum's measure is blocked, and the privacy improvement is calculated.
2. Degree: Friends with large degrees may be the critical points in information diffusion. Therefore, we examine the privacy improvement by blocking the friend with the largest degree.
3. V-Index: The vulnerability index, proposed by Gundecha et al. [11], is based on friends' privacy settings. We use it for unfriending by blocking the friend with the largest V-Index.

4. Random: We also calculate the privacy improvement by randomly removing a friend link. This approach helps us understand the average case when no friend sensitivity indicators are available.
The experiment setup is the same as that in Section 2.7.5. For each PIO-UD pair, we use the above approaches to remove one friend link and calculate the resulting privacy risk reduction. The results are shown in Fig 11, in which the x-axis is the index of the PIO-UD pair and the y-axis is the privacy risk reduction. The statistical summary is shown in Table 7. TAPE gives the best performance.
It is important to point out that the privacy risk reductions are calculated using the TAPE framework, so it is not surprising that Birnbaum's measure, which is based on TAPE, performs the best. On the other hand, the results show that the other unfriending strategies, which consider less information, are not as effective as the one derived from the proposed TAPE framework.

Discussion
We propose a probability-based definition of privacy risk (level). Starting from this quantitative definition, the reliability evaluation method is used to calculate one's privacy risk in terms of information diffusion. In TAPE, PA and PT capture OSN users' privacy protection behaviors. TAPE can also compute the sensitivity of one's friend links, which can assist users in adopting unfriending strategies. TAPE can serve as a starting point for enhancing OSN users' privacy level.
Since TAPE depends highly on data sufficiency, OSN service providers, who control the most user data, could be the best candidates to implement it, and their users could benefit directly. In addition, social links (LISP) are expected to affect the calculation of the privacy level, so real applications should adopt an LISP algorithm when implementing TAPE.

Privacy Leakage beyond one OSN
In TAPE, we assume that information can only be obtained through information diffusion within OSN. In practice, information diffusion is a much more complex process. There are several scenarios of information diffusion in social networks.
1. Cross-OSN diffusion: People can be active on multiple OSN platforms. For example, Alice is a friend of Bob on Twitter. She sees news about Bob on Twitter and then posts some words about the news on Facebook.
2. Offline diffusion: This is the traditional way we spread information, through face-to-face conversations, phone calls, etc.
3. Online-offline diffusion: Information is propagated through both online and offline channels. This is the most common way we spread information in the information era.
Whereas scenario 2 is well studied in social science, scenarios 1 and 3 remain challenging. In all three scenarios, the concepts of privacy awareness and privacy trust are still valid. They have great potential to be adopted in these scenarios and to contribute to a broader study of personal privacy leakage in a hybrid online-offline world.

Conclusion and Future Work
In this manuscript, we present the TAPE framework for the quantitative evaluation of users' privacy risk in OSNs. Mathematical tools (e.g., statistics and modeling techniques) are used to process online social network data, and signal processing tools are utilized in this work. The concepts of privacy awareness and privacy trust are introduced. Simulations are performed to illustrate the computation of privacy leakage probability and to demonstrate that TAPE captures useful information that was not captured previously. Several unfriending strategies are compared with the Birnbaum's measure method of TAPE, and TAPE gives the best performance. More importantly, TAPE sets the stage for utilizing reliability analysis, a well-developed field, to solve privacy risk analysis problems.
Besides BDD, other reliability analysis tools, such as sensitivity analysis, could also benefit privacy research.
Future work includes developing better PA and PT algorithms, implementing TAPE as a Facebook application, and performing real user testing.

Abstract
Online markets have been growing for many years and generate huge profits. Online product/service reviews play an important role when customers shop on the Internet, and purchasing decisions depend heavily on them. Sellers can benefit from creating fake reviews to boost their products' reputation or bad-mouth their competitors' products. Fake reviews can seriously affect both buyers and sellers. In this manuscript, we introduce a novel angle for detecting fake reviews, called the Equal Rating Opportunity (ERO) principle. Based on the ERO principle, we propose a fake review detection method, ERO analysis, which is able to detect fraudulence signals missed by existing approaches. Experiments on Amazon product reviews are conducted, in which the applicability of review features for ERO analysis is studied, two common fake review detection methods are compared with the proposed ERO analysis, and expert reviews are employed to evaluate the performance of ERO analysis. ERO analysis provides a new angle on fake review detection, which is important when fake reviews are diverse.

Introduction
Online product/service reviews are created by customers who have experience using the product/service, such as Amazon product reviews, Yelp restaurant reviews and TripAdvisor travel reviews. A review often consists of a rating value that represents the user's overall satisfaction and a text that describes the experience of purchasing and using the product/service. An online review system, also referred to as an online reputation system, allows users to post reviews for products/services. The reviews of a single product/service are often combined into an overall reputation score, such as the average number of stars on Amazon. Product/service reviews have a huge impact on online sales. It is reported that a one-star rating increase can bring a 5-9% increase in revenue to online sellers [1].
On the one hand, the average rating can affect products' display order on the site.
Products with higher average ratings are often placed at the top of the list and thus get more visibility. On the other hand, customers rely on the average rating when screening products. It is reported that 74% of consumers rely on product reviews to guide their purchases [2].
However, online reviews may be manipulated. It is reported that sellers in online marketplaces boost their reputation by trading with collaborators [4], and that firms post fake reviews to praise their own products or bad-mouth their competitors' [5]. Review manipulation can inflate or deflate products' reputation scores, undermine users' confidence in online reputation systems, and eventually damage reputation-centric online businesses, leading to economic loss. Furthermore, in some situations review manipulation is even more damaging.
For example, Black Friday shoppers rely heavily on online reviews, because they have to make rushed decisions about products they are not familiar with in order to take advantage of quickly expiring, unusual discounts. Another example is the online reputation of hotels and restaurants: consumers misled by manipulated hotel ratings cannot easily obtain refunds after purchasing these services.
In the literature, researchers have proposed methods to protect reputation systems from several angles, such as 1) increasing the cost of registering multiple user accounts [6], 2) endogenous discounting of fake reviews by analyzing the statistical features of reviews [7], 3) exogenous discounting of dishonest ratings by introducing metrics of users' reputation [7][8][9], and 4) studying the correlation between users and reviews to detect fake reviews [10,11]. In this manuscript, we propose a new angle for detecting products with fake reviews, called Equal Rating Opportunity (ERO) analysis. This method roughly belongs to categories 2 and 4.
Fake review detection, also referred to as spam review detection in some literature, is an urgent yet under-investigated task. Unlike other types of spam, such as spam email, fake reviews are much harder to detect. The main reason is that fake reviewers can easily pretend to be honest, and it is hard for a human reader to recognize them. Research on fake review detection started many years ago.
For example, opinion spam detection was proposed by Jindal and Liu in [12], in which the authors built a classifier to detect review duplications. There is also an example in the commercial field: Yelp has been using a review filter to hide certain suspicious reviews since 2005 [13,14]. We argue that fake review detection faces the following challenges, which are the major obstacles addressed in this work.
1. Fake reviewers' behaviors may be hard to capture, and the fake reviews' structures are diverse and usually unknown. For example, in order to successfully mislead review readers, fake reviewers can make their writing styles and review habits look very similar to honest reviewers. Without knowing the structures of fake reviews, it is difficult to distinguish fake ones from genuine reviews.
2. Fake reviewers can learn the detection strategy and adapt to avoid detection. Therefore, fake review detection approaches have to be robust. For example, some existing work evaluates and assigns trust scores to reviewers; fake reviewers can post genuine reviews to boost their trust scores before creating fake reviews.
3. There is no ground truth about whether a review is fake. By reading the review text alone, we usually do not have enough clues to tell dishonest reviews from honest ones.
These are the major challenges that we think make simple behavioral heuristics insufficient. To detect such elusive fake reviews, we need to consider more clues.
In this work, we introduce the ERO principle. The ERO principle tells us that the sentiment of the review, such as the rating value, should not depend on certain review features. If dependency or correlation is observed, it is highly suspicious that the product contains fake reviews. Based on the ERO principle, ERO analysis is proposed. ERO analysis has two major advantages.
• It does not require the cooperation of online reputation system owners. In particular, many existing algorithms need to use a large amount of data, which makes them impractical unless the reputation system owners (e.g. Amazon and Yelp) implement these algorithms. Our approach, however, can be implemented by a third party, who only needs to crawl a very small amount of data to train the detection parameters and perform the ERO analysis.
Therefore, the proposed method is a low-cost solution, yields independent opinions, and leads to practical implementations.
• ERO analysis is a new direction for fake review detection. It is compatible with most existing algorithms, which examine ratings and reviewers from more traditional angles. ERO analysis has the potential to find fraudulence signals that were previously missed by existing approaches.
In summary, the contributions of this work are:
1. The ERO principle is introduced. It offers a new perspective on how product/service reviews can be investigated;
2. A new categorization of review features is introduced, and the criteria for ERO feature selection are studied;
3. The ERO analysis is proposed and evaluated on real data. In particular, expert reviews are employed to evaluate the detection accuracy.
This manuscript is organized as follows. Related work is discussed in Section 3.3. The ERO principle and ERO analysis are presented in Section 3.4, followed by the discussion of ERO feature selection in Section 3.5. Experiments and results are shown in Section 3.6.

Related Work
In order to protect online reputation systems, researchers have proposed many protection schemes, which can be roughly put into 4 categories. The first category increases the cost of obtaining multiple user accounts by binding user IDs to IP addresses [6]. The second category is endogenous discounting of fake reviews [7].
Dishonest ratings are directly differentiated from normal ratings based on the statistical features of the rating values. In a Beta-function based approach [15], a user is determined to be malicious if the estimated reputation of the product rated by him/her lies outside the q and 1 − q quantiles of his/her underlying rating distribution. An entropy based approach is proposed in [16]. The third category is exogenous discounting of dishonest ratings: users are assigned trust scores based on their review history, and their reviews are discounted according to their trust scores. In [9], a user's trust is obtained by aggregating his/her neighbors' beliefs through belief theory. The fourth category studies the correlation between users and reviews to detect dishonest ratings [10,11]. The proposed scheme has features of both category 2 and category 4, and its detection algorithm takes a new angle.
Fake reviews undoubtedly reduce the quality of reviews. They may even mislead users into making wrong purchase decisions. Therefore, there is a great demand to detect fake reviews thoroughly in reputation systems. There are three ways to detect fake reviews. The first way is to detect fake reviews directly, primarily based on review features. For example, standard word and part-of-speech n-gram features are used in [17] to identify fake reviews. Review text is another major source of features: some approaches rely on identifying duplicated or near-duplicated text occurring in multiple reviews [18,19]. The second way is dishonest reviewer detection. For example, a graph-based method is used to find fake store reviewers [20], frequent pattern mining is employed to find groups of reviewers who frequently write reviews together [21], and user correlations are analyzed to identify fake reviewer groups [10]. The third way is victim product detection. A product is considered to be a victim if positive or negative fake reviews are detected for it. This type of detection often assumes norms of review statistics. For example, in [10], the CUSUM approach is used, and the average rating is assumed not to jump or drop dramatically in a short period of time. In [22], changes in the mean value and in the arrival rate are considered signals of fake reviews. The proposed ERO analysis belongs to victim product detection.
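To make the idea of duplicate-text detection concrete, here is a minimal Python sketch of one common way to find near-duplicated reviews. It uses word-shingle sets compared with Jaccard similarity. The shingle size `k=2`, the similarity threshold `0.8`, and all function names are hypothetical choices for illustration. The sketch is not necessarily the procedure used in [18,19].

```python
from itertools import combinations


def shingles(text, k=2):
    """Return the set of k-word shingles (word k-grams) of a lowercased review text."""
    words = text.lower().split()
    if len(words) < k:
        # Very short texts such as "excellent" become a single shingle.
        return {tuple(words)}
    return {tuple(words[j:j + k]) for j in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_pairs(reviews, k=2, threshold=0.8):
    """Return index pairs (i, j), i < j, of reviews whose shingle sets are at least `threshold` similar."""
    sets = [shingles(r, k) for r in reviews]
    return [(i, j) for i, j in combinations(range(len(reviews)), 2)
            if jaccard(sets[i], sets[j]) >= threshold]


reviews = ["Great toy, my kids love it", "great toy, my kids love it", "Broke after two days"]
print(near_duplicate_pairs(reviews))  # [(0, 1)]
```

The pairwise comparison is quadratic in the number of reviews, which is acceptable for a single product. Very short generic texts such as "excellent" always match each other under this scheme. That is one reason a pure duplication detector can raise many false alarms.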
Many research results have not turned into practical systems. This is probably because of the potential liability concerns of major e-commerce companies, as well as the gap between research and practical constraints. Without support from e-commerce companies (i.e., reputation system owners), the algorithms can only rely on a limited amount of data. This is one of the major hurdles. Currently, only a few online systems provide review analysis services.
For example, a website called "ReviewPro" [23] mainly provides professional suggestions to hotel owners. By analyzing the customers' reviews of a hotel, ReviewPro can provide analytical reports with "strategies" to climb TripAdvisor rankings and earn 5-star reviews. Another practical system is "TrustYou" [24], which provides review analysis services for hotels. For hotel owners, it provides services to market their reputation and increase business. For individual users, it analyzes a hotel's quality by summarizing online reviews and generating a trust score for the hotel. What we propose in this work is fundamentally different from these existing services. First, our work focuses on detecting review manipulation, instead of finding patterns for reputation promotion. Second, our work can provide an on-demand, real-time service, whereas ReviewPro and TrustYou only offer analysis of a predetermined list of hotels. That is, our algorithm can detect fraudulence signals from the small amount of data crawled in real time.

ERO Principle and ERO Analysis
A product is considered to be a victim if it has fake reviews, either positive, e.g. fake 5 star reviews, or negative, e.g. fake 1 star reviews. Victim product detection aims to determine if a product contains fake reviews. In this section, ERO principle and ERO analysis are introduced. To help the readers understand the underlying thoughts of ERO principle, we first give a brief overview of a consistency analysis method, which is introduced in [25].

Consistency Analysis
Consistency analysis is a victim product detection technique. An unusual jump or drop in review statistics is considered inconsistent. Consistency detectors are based on the fact that, to perform effective review manipulation, the fake reviews must cause a large enough change in review statistics, such as the average rating. In the literature, there are several approaches to detect the inconsistency of reviews [22,25]. In this subsection, we give a brief overview of the CUSUM-based approach proposed in [25].
First of all, the notations used in this manuscript are defined as follows.
• $p_i$ is the product with product id $i$.
• $r_{i,n}$ is the $n$th review of $p_i$, where the reviews are sorted by posting time from old to new, $n = 1, 2, \ldots, N$.
• $x_{i,n}$ is the rating value of $r_{i,n}$, $f_{i,n}$ is the value of a review feature of $r_{i,n}$, and $\bar{x}_i$ is the average rating of $p_i$.
Let $\mu_i$ be the true rating of $p_i$ and $\nu$ be the trigger threshold. Ratings with $x_{i,n} > \mu_i + \nu$ increase the positive detection function, and ratings with $x_{i,n} < \mu_i - \nu$ increase the negative one. The detection functions are defined as follows.
$$g^+_{i,n} = \max\left(0,\; g^+_{i,n-1} + x_{i,n} - \mu_i - \nu\right),$$
$$g^-_{i,n} = \max\left(0,\; g^-_{i,n-1} - x_{i,n} + \mu_i - \nu\right),$$
where $g^+_{i,n}$ tracks positive changes, $g^-_{i,n}$ tracks negative changes, and $g^+_{i,0} = 0$, $g^-_{i,0} = 0$ for initialization. Rating inconsistency is observed when $g^+_{i,n}$ or $g^-_{i,n}$ exceeds the threshold $h$.
In order to measure the degree of inconsistency, another metric, Percentage of Change Interval (PCI), is defined as
$$PCI(h) = \frac{N_D}{N},$$
where $N_D$ is the number of points at which $g^+_{i,n}$ or $g^-_{i,n}$ exceeds $h$. Obviously, the PCI value depends on the selection of the threshold.
It has been pointed out that a uniform threshold for all products is not applicable, and heterogeneous thresholds should be used [25]. Briefly speaking, the threshold depends on $PCI(h_0)$, where $h_0$ is a predefined minimum threshold. A smaller $PCI(h_0)$ gives a smaller threshold, and vice versa. In this manuscript, we use the PCI value $PCI(h)$ as the measure of rating inconsistency: $PCI(h) = 0$ means the ratings are consistent, and $PCI(h) = 1$ means strong inconsistency is observed. More detailed discussion of PCI can be found in [25].
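The detection functions and PCI can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard two-sided CUSUM form, with the true rating µ_i supplied by the caller and a single fixed threshold h. The heterogeneous thresholding of [25] is not reproduced. The function names are hypothetical. In the example, ν = 0.6 matches the trigger threshold used later in the experiments, while h = 3 is only illustrative.

```python
def cusum(ratings, mu, nu):
    """Two-sided CUSUM over one product's ratings, sorted from oldest to newest.

    Returns (g_pos, g_neg), where g_pos[n - 1] is g+_{i,n} and g_neg[n - 1] is g-_{i,n};
    both statistics start from g+_{i,0} = g-_{i,0} = 0.
    """
    g_pos, g_neg = [], []
    gp = gn = 0.0
    for x in ratings:
        gp = max(0.0, gp + x - mu - nu)  # grows only while ratings exceed mu + nu
        gn = max(0.0, gn - x + mu - nu)  # grows only while ratings fall below mu - nu
        g_pos.append(gp)
        g_neg.append(gn)
    return g_pos, g_neg


def pci(ratings, mu, nu, h):
    """Percentage of Change Interval: share of reviews n at which g+ or g- exceeds h."""
    if not ratings:
        return 0.0
    g_pos, g_neg = cusum(ratings, mu, nu)
    n_d = sum(1 for gp, gn in zip(g_pos, g_neg) if gp > h or gn > h)
    return n_d / len(ratings)


# Hypothetical product: four 4-star reviews followed by a burst of 1-star reviews.
print(pci([4, 4, 4, 4, 1, 1, 1, 1], mu=4.0, nu=0.6, h=3.0))  # 0.375
```

In this example g⁻ reaches about 2.4, 4.8, 7.2 and 9.6 over the 1-star burst. Three of the eight points exceed h = 3, so PCI = 3/8.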

ERO Principle
The consistency detection, which is based on review statistics, can only be used to find products suspected of review manipulation; it lacks the capability to accurately confirm such manipulation. This is because normal ratings can change without any manipulation. For example, when a restaurant changes its chef, a seller changes his/her attitude toward consumer complaints, or the manufacturer fixes a defect of the product, the ratings for the restaurant/seller/product could change. The rating is also related to price.
Consumers tend to be more tolerant if they purchase deeply discounted products.
If the price changes dramatically, the ratings may change. Therefore, after the consistency analysis gives us a set of suspicious products, we must apply a more informative analysis to confirm the review manipulation.
We are inspired by the Equal Employment Opportunity Policy, adopted by many employers. One example of such a policy statement is as follows.
"All employment decisions at the company are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, national, …" We introduce the Equal Rating Opportunity (ERO) Principle as follows.
ERO Principle. "The normal ratings should be primarily based on the quality of the product or service, without regard to whether the review is posted on weekdays or weekends, during daytime or at night, …" This principle is based on the idea that fake reviewers may maintain many user accounts and review templates, and if they perform review manipulation on a product, they can change the distribution of some features. It is also possible that some fake reviewers do not work full time and focus on writing fake reviews in their spare time, such as evenings and weekends. This type of unusual correlation is called a fraudulence signal. For example, if fake positive reviews are posted on specific days, such as weekends, the correlation between the 'day of week' feature and the rating value will increase. If the fake reviewers randomly select user accounts from a large account pool and post fake reviews on random days, the fraudulence signal can still be detected as long as an appropriate feature is used, such as whether the reviewer purchased the product or the reviewer's city.

ERO Feature
Review features to which the ERO principle applies are called ERO features. A review often has multiple features, such as rating value, review text, review date, reviewer's name and reviewer's reviewing history. Obviously, some review features may not be applicable for the ERO principle. For example, people who give 5-star reviews on Amazon tend to write short texts, and those who give 1-star and 2-star reviews are more likely to write long texts to express their complaints. Therefore, the correlation between text length and rating value cannot be considered a violation of the ERO principle. This correlation is also observed in Section 3.5.
If a review feature statistically should not be correlated with the rating value, then the feature is called an ERO feature. The ERO principle holds for ERO features. One example is the 'day of week' feature: the percentages of negative and positive reviews should not vary significantly with the day of the week on which the reviews were posted. If the percentage of positive reviews on weekends is found to be much higher than on weekdays, it is highly suspicious that the product has fake reviews. Other examples of ERO features are 'time in a day' and 'verified'. ERO features and ERO feature selection are discussed in detail in Section 3.5.
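One way to inspect the 'day of week' example above is to tabulate the share of positive reviews for each weekday. The Python sketch below makes two assumptions that are not part of the method: reviews arrive as (weekday, rating) pairs, and 4 or 5 stars count as positive. The function name is hypothetical.

```python
from collections import defaultdict


def positive_share_by_day(reviews, positive_min=4):
    """Share of positive reviews for each weekday that has at least one review.

    `reviews` is an iterable of (weekday, rating) pairs with weekday 0 = Monday ... 6 = Sunday,
    and a review counts as positive when rating >= positive_min (an assumed cut-off).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for day, rating in reviews:
        totals[day] += 1
        if rating >= positive_min:
            positives[day] += 1
    return {day: positives[day] / totals[day] for day in sorted(totals)}


# Hypothetical reviews: mixed ratings on Monday, only 4-star and 5-star reviews at the weekend.
sample = [(0, 5), (0, 2), (0, 1), (0, 4), (5, 5), (5, 5), (6, 4)]
print(positive_share_by_day(sample))  # {0: 0.5, 5: 1.0, 6: 1.0}
```

A large gap between the weekend and weekday shares is the kind of pattern the ERO principle treats as suspicious. The table only describes that gap; the ERO value in the next subsection quantifies it as a correlation.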

ERO Analysis
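Based on the ERO principle, we propose ERO analysis for fake review detection. Let $e_i$ denote the ERO value of $p_i$, i.e., the correlation between the rating values $x_{i,n}$ and the ERO feature values $f_{i,n}$ of its reviews $r_{i,n}$, $n = 1, \ldots, N$. We employ the Pearson correlation coefficient as the ERO calculation:
$$e_i = \frac{\sum_{n=1}^{N} (x_{i,n} - \bar{x}_i)(f_{i,n} - \bar{f}_i)}{\sqrt{\sum_{n=1}^{N} (x_{i,n} - \bar{x}_i)^2}\,\sqrt{\sum_{n=1}^{N} (f_{i,n} - \bar{f}_i)^2}},$$
where $\bar{x}_i$ and $\bar{f}_i$ are the mean rating and the mean feature value over the reviews of $p_i$. An ERO value of 0 means no correlation, and an ERO value of 1 or −1 means perfect correlation. Generally, the larger the absolute ERO value, the stronger the correlation.
A nonzero ERO value alone is not sufficient to make a decision, because the ERO value can be noisy. First, the ERO value is an estimate of the Pearson correlation coefficient, and the estimation error depends on the number of samples. Second, honest reviewers can be biased, for example by their preferences. Some reviewers may be hypercritical and give low ratings because of tiny problems; others may tolerate tiny problems and still speak highly of the product. In this work, the boxplot [26] is used to set the detection thresholds: $p_i$ is flagged as suspicious if $e_i > Q_3 + w(Q_3 - Q_1)$ or $e_i < Q_1 - w(Q_3 - Q_1)$, where $Q_1$ and $Q_3$ are the first and third quartiles of the ERO values and $w$ is the sensitivity factor. A smaller $w$ makes the detector alarm more easily and thus achieves a higher detection rate as well as a higher false alarm rate.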
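The following is a minimal Python sketch of the ERO value and a boxplot alarm rule. It makes several assumptions:
- The fences take the usual Q1 − w·IQR / Q3 + w·IQR form.
- The default w = 1.5 is Tukey's conventional value, not a value taken from this work.
- The quartiles come from a reference set of ERO values, for example from a small training crawl.

Binary features such as 'verified' are encoded as 1/0. In that case the Pearson coefficient is the point-biserial correlation. Function names are hypothetical.

```python
import statistics


def ero_value(ratings, features):
    """Pearson correlation between a product's rating values x_{i,n} and feature values f_{i,n}."""
    n = len(ratings)
    if n != len(features) or n < 2:
        raise ValueError("need two equal-length sequences with at least two reviews")
    mx = sum(ratings) / n
    mf = sum(features) / n
    cov = sum((x - mx) * (f - mf) for x, f in zip(ratings, features))
    sx = sum((x - mx) ** 2 for x in ratings) ** 0.5
    sf = sum((f - mf) ** 2 for f in features) ** 0.5
    if sx == 0 or sf == 0:
        # Correlation is undefined for a constant sequence; this sketch treats it as 0 (assumption).
        return 0.0
    return cov / (sx * sf)


def boxplot_fences(reference_ero_values, w=1.5):
    """Alarm thresholds Q1 - w*IQR and Q3 + w*IQR computed over a reference set of ERO values."""
    q1, _, q3 = statistics.quantiles(reference_ero_values, n=4)
    iqr = q3 - q1
    return q1 - w * iqr, q3 + w * iqr


def is_suspicious(e, fences):
    """Flag a product whose ERO value falls outside the fences."""
    low, high = fences
    return e < low or e > high


# 'verified' encoded as 1/0 against star ratings for one hypothetical product.
e = ero_value([5, 5, 4, 1, 2, 1], [1, 1, 1, 0, 0, 0])
fences = boxplot_fences([0.0, 0.01, -0.01, 0.02, -0.02, 0.0, 0.01, 0.03])
print(round(e, 3), is_suspicious(e, fences))
```

Lowering `w` narrows the fences, which reproduces the trade-off described above: the detector alarms more often, which raises both the detection rate and the false alarm rate.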

Case Study
In this section, we use 4 real products to demonstrate the proposed ERO principle and ERO analysis. The review data are collected from Amazon (see Section 3.6.2 for more details).
We use two ERO features, 'verified' and 'day of week'. Briefly, a review is 'verified' if the reviewer purchased the product. The 'day of week' feature indicates the day of the week on which the review was posted. We select four typical products.
According to ERO analysis, two of them are considered normal and the other two are determined suspicious. The review details are shown in Fig 12. In Fig 12a, we observed that the percentage of non-verified reviews is not  Therefore, those two products are determined to be normal by ERO analysis.
Conversely, in Fig 12b, the 'verified' feature shows certain correlation with rating value, for which the ERO value is 0.42. In Fig 12d, the percentage of negative reviews on Sundays is much different from on other days, for which the ERO value is 0.33. Since relatively strong correlations are observed, those two products are determined to be suspicious.

ERO Feature Selection
As discussed earlier in Section 3.4.3, a review often has multiple features.
Some of them are applicable for the ERO principle, and others are not. In this section, we discuss strategies for selecting ERO features.

Review Feature Categorization
In the literature, review features are categorized into three types based on their origins: product specific features, review specific features and reviewer specific features [17,27]. Product specific features, also referred to as product meta-information, describe the product's properties, such as price, discount rate and category. Review specific features describe the review's characteristics, such as review text, number of votes and review date. Reviewer specific features describe the reviewer's profile and history, such as the number of reviews posted by the reviewer, reviewer ranking and number of votes.
In the context of ERO analysis, since the analysis is conducted for a specific product, product specific features are not used. ERO features are selected from the review specific and reviewer specific feature categories.
When considering the features' applicability to ERO analysis, we categorize review features into three groups that differ from the ones mentioned above.
• Persistent Features are features that are determined when the review is posted and never change. For example, the review date is determined when a review is posted and cannot change afterwards. Another example is the 'verified' feature: whether a review is verified is determined at the time the review is posted, based on the reviewer's purchase history.
• Acquired Features are features acquired after the review is posted. For example, when people find a review useful, they vote it as helpful, thereby increasing the number of helpful votes and total votes. The number of votes may keep changing as long as the review is accessible.
• Accumulative Features are features that already exist before the review is posted and keep changing. For example, a reviewer's number of lifetime reviews keeps increasing as long as the reviewer keeps posting reviews.
We argue that ERO features should be selected from the Persistent category. The Acquired and Accumulative categories are usually not suitable for ERO analysis. We summarize two criteria that can guide ERO feature selection.
The first criterion is causality. Recall that the ERO principle states that the rating value should depend on the product quality and purchase experience but not on other factors, i.e., ERO features. In other words, it should be impossible to predict the rating value from ERO features. For example, when considering whether the user purchased the product, reviews can be divided into two groups, verified and non-verified. The 'verified' information should not help us predict the rating value. In contrast, we observed that negative reviews tend to have longer review text (see Section 3.5.2). Therefore, given the text length of a review, we are able to predict the rating value more accurately than by guessing. Theoretically, causation implies correlation, but correlation does not necessarily imply causation. Therefore, an ERO feature should not have a causal relationship with the rating value, either directly or indirectly.
The second criterion is stability. If a review feature's observations change after reviews are posted, the feature is said to be unstable. An unstable feature is not suitable for ERO analysis, because its observations depend on other factors. For example, there are two similar reviews, $R_1$ and $R_2$, and they are … In summary, a review feature is not applicable for ERO analysis if 1) it is not Persistent, 2) causality is observed, or 3) it is unstable.
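The three exclusion rules can be expressed as a simple filter. In this Python sketch the causality and stability judgments are boolean flags supplied by the analyst; the code does not infer them. The class names, field names and the example classifications are hypothetical illustrations.

```python
from dataclasses import dataclass
from enum import Enum


class FeatureCategory(Enum):
    PERSISTENT = "persistent"      # fixed when the review is posted, e.g. review date, 'verified'
    ACQUIRED = "acquired"          # accrues after posting, e.g. helpful votes
    ACCUMULATIVE = "accumulative"  # exists before posting and keeps changing, e.g. lifetime reviews


@dataclass
class ReviewFeature:
    name: str
    category: FeatureCategory
    causal_to_rating: bool  # analyst's judgment: feature influences or predicts the rating
    stable: bool            # observations do not change after the review is posted


def is_ero_candidate(feature: ReviewFeature) -> bool:
    """Reject a feature that is not Persistent, shows causality, or is unstable."""
    return (feature.category is FeatureCategory.PERSISTENT
            and not feature.causal_to_rating
            and feature.stable)


features = [
    ReviewFeature("verified", FeatureCategory.PERSISTENT, causal_to_rating=False, stable=True),
    ReviewFeature("content length", FeatureCategory.PERSISTENT, causal_to_rating=True, stable=True),
    ReviewFeature("total votes", FeatureCategory.ACQUIRED, causal_to_rating=False, stable=False),
]
print([f.name for f in features if is_ero_candidate(f)])  # ['verified']
```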

Feature Comparison
In this subsection, we use real data to demonstrate the feature selection process. The data were collected from Amazon (refer to Section 3.6.2). A total of 416 products are randomly selected from the dataset. The review features investigated in this section are listed in Table 9. We calculate the ERO values of all

Total Votes
The 'total votes' feature is the number of votes a review received. From Fig 13, we observe that it is negatively correlated with the rating value. In other words, negative reviews (1 star and 2 star) tend to get more votes than positive ones.
When looking at the reviews, we find that negative reviews often have longer and more informative texts, while positive reviews often have shorter texts. A possible reason is that readers tend to vote for reviews with longer texts.

Day of Week
From Fig 13, we can see that the 'day of week' feature is unrelated to the rating value. Statistically, the numbers of reviews on each day of the week may not be identical, but the ratio between negative and positive review numbers does not vary significantly. Therefore, small ERO values are observed.

Content Length
The 'content length', also called text length, has negative correlation with the rating value. It is likely that customers with negative experience have more to complain, and therefore their review texts are longer. Note that this observation also confirms the one we get for the 'total votes' feature.

Reviewer's Number of Lifetime Reviews
When investigating the number of lifetime reviews that the reviewer has posted, we find that reviewers with more reviews tend to post positive reviews.
In Fig 13, a positive correlation is observed. It is likely that experienced reviewers are more tolerant of negative experiences.

Reviewer Ranking
The reviewer ranking has negative correlation with the rating value, which means higher ranked reviewers tend to give positive reviews. It also partially confirms the observation of 'reviewer's number of lifetime reviews', since in Amazon system, the reviewer ranking has certain relationship to the number of lifetime reviews.

Verified
A review is verified if the reviewer purchased the product before posting the review. In Fig 13, no strong correlation is observed for the 'verified' feature.

Summary
The average correlations are calculated and shown in … The ERO features are selected by comparing each feature's average ERO value with empirical thresholds, which are 0.01 and −0.01 in this work.
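The threshold rule above can be sketched in Python. The sketch keeps a feature when its average ERO value across products lies between −0.01 and 0.01; treating the bounds as inclusive is an assumption. The function name and the toy values are hypothetical. The toy values are not the measured results.

```python
def select_ero_features(ero_by_feature, low=-0.01, high=0.01):
    """Keep features whose average ERO value across products lies within [low, high].

    `ero_by_feature` maps a feature name to its list of per-product ERO values.
    """
    selected = []
    for name, values in ero_by_feature.items():
        average = sum(values) / len(values)
        if low <= average <= high:
            selected.append(name)
    return selected


# Toy per-product ERO values for illustration only; they are not the measured results.
toy = {
    "day of week": [0.02, -0.02, 0.0],
    "total votes": [-0.2, -0.1, -0.15],
    "verified": [0.01, 0.0, -0.005],
}
print(select_ero_features(toy))  # ['day of week', 'verified']
```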

Experiment and Results
The proposed ERO analysis is implemented and experiments are conducted in this section. We first build a web crawling system and crawl review data from Amazon. ERO analysis is then applied to pick out suspicious products. Next, expert reviews are employed to evaluate the detection performance. The ERO analysis is implemented in Matlab, and the crawling system is built in Python.

Performance Evaluation
Unlike fraudulence detection in traditional systems, such as email spam filtering, it is often hard to obtain ground truth for fake reviews in a product/service review system.
Lack of ground truth is one of the challenges in studying online review fraudulence. To the best of our knowledge, no research work perfectly addresses this issue.
However, researchers have made several attempts to evaluate the performance of fake review detection.
Previous research employed different approaches to obtain ground truth data.
There are mainly three types of approaches. First, some early work manually inspected reviews and extracted simple features such as duplicated and near-duplicated reviews or unexpected rating patterns [19]. This type of approach is limited, since it largely depends on heuristics and on the assumption that fake review structures are known. Second, a few researchers create ground truth data by hiring people to write fake reviews [25,28]. They then develop detectors that compare the features of genuine and fake reviews. Although these classifiers perform well on such artificial data, it is questionable whether datasets generated by hired people are representative of actual fake reviews in practice. The third type employs expert reviews. For example, Mukherjee et al. generated ground truth by hiring experts to manually detect fake reviews given some intuitive features and hints [27].
In this work, we adopt the third approach, in which we employ expert reviewers to evaluate the detection accuracy.
In addition, we argue that the detection accuracy should be presented as the False Alarm Rate (FAR). Although the other metric, Detection Rate (DR), might be more intuitive, it is hard to estimate due to the lack of ground truth. Fake reviews are complex and multidimensional, and it is extremely difficult for a single method to achieve high accuracy. Multiple detection methods have to be used together to capture different aspects of fake reviews.

Dataset
We develop web crawlers to collect real data from Amazon. The crawlers are implemented in Python [29]. The dataset used in the experiments is collected according to the following criteria:
1. The number of reviews is greater than 50.
2. The average rating is between 2.5 and 4.8.
3. The product category is either 'Toys and Games' or 'Electronics'.
4. Only the first 40 pages of each category are collected.
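Criteria 1 to 3 can be checked per product with a small predicate. This Python sketch assumes the crawler has already parsed the review count, average rating and category. The bounds in criterion 2 are assumed to be inclusive. The `Product` class is a hypothetical container, and the ASINs in the example are made up.

```python
from dataclasses import dataclass

CATEGORIES = {"Toys and Games", "Electronics"}


@dataclass
class Product:
    asin: str
    num_reviews: int
    avg_rating: float
    category: str


def meets_criteria(p: Product) -> bool:
    """Criteria 1-3 above; criterion 4 (first 40 pages per category) is assumed to be enforced by the crawler."""
    return (p.num_reviews > 50
            and 2.5 <= p.avg_rating <= 4.8  # inclusive bounds are an assumption
            and p.category in CATEGORIES)


# Hypothetical products (the ASINs are made up).
candidates = [
    Product("B000000001", 120, 4.2, "Electronics"),
    Product("B000000002", 40, 4.2, "Electronics"),      # too few reviews
    Product("B000000003", 300, 4.9, "Toys and Games"),  # average rating too high
]
print([p.asin for p in candidates if meets_criteria(p)])  # ['B000000001']
```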
For each product, there are 3 types of information crawled.
1. Product meta-information. The product ASIN (Amazon Standard Identification Number), average rating and category are obtained.
2. Review information, such as rating value, date, text, number of votes, etc., is crawled from Amazon.
3. Reviewer information, such as customer ID, reviewer ranking, number of reviews, etc., is crawled from Amazon. Table 10 shows the detailed information we obtained. There are 916 products collected. A summary of the dataset is shown in Table 11.

Victim Product Detection Comparison
The 'verified' and 'day of week' features are used in the experiment. In this section, we compare the ERO analysis results with two common approaches.
One is the consistency detection described in Section 3.4.1. The other is the duplication detection described in [19]. All 916 products are analyzed by the three detection methods. For consistency detection, we use the trigger threshold ν = 0.6 and the minimum threshold $h_0 = 3$. More details of consistency detection can be found in Sec- The ERO analysis detects 16 suspicious products, including 7 products detected by the 'day of week' feature and 9 products detected by the 'verified' feature. The consistency method detects 28 products, while the duplication method detects 123 products. We make the following observations.
• There are small overlaps among the three methods. One product is detected by both ERO analysis and the duplication method, and two products are detected by both the consistency and duplication methods. Fake reviews have very diverse structures, and a single method is not able to detect all of them.
ERO analysis is important if its false alarm rate is relatively low, as it provides a new angle that is missed by existing methods, such as the duplication and consistency methods.
• The duplication method detects 123 products. Inspecting them, we find that some duplicated texts are just "5 star", "excellent", "I like it", or other common phrases. Without further information, it is hard to tell whether such reviews are fake, so the duplication method likely has a high false alarm rate. In contrast, the ERO analysis detects only 16 products, so its detection rate may be low.
However, considering the diversity of fake review structures and the specific angle captured by ERO analysis, a low detection rate is acceptable as long as the false alarm rate is relatively low.
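The false alarm concern about the duplication method can be illustrated with a deliberately naive exact-duplicate detector. This is a minimal sketch, not the method of [19]: the normalization rule, the function names `normalize` and `find_duplicate_texts`, and the example reviews are all hypothetical.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case the text and collapse punctuation/whitespace so trivial variants match."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())


def find_duplicate_texts(reviews):
    """Group reviews by normalized text; keep texts shared by two or more reviewers.

    `reviews` is a list of (reviewer_id, text) pairs for a single product.
    """
    groups = defaultdict(set)
    for reviewer_id, text in reviews:
        groups[normalize(text)].add(reviewer_id)
    return {text: ids for text, ids in groups.items() if len(ids) >= 2}


# Hypothetical reviews for one product.
reviews = [
    ("r1", "Excellent!"),
    ("r2", "excellent"),
    ("r3", "5 star"),
    ("r4", "5 Star."),
    ("r5", "The battery lasts two days and the screen is sharp."),
]
print(sorted(find_duplicate_texts(reviews)))  # ['5 star', 'excellent']
```

Both flagged groups are generic phrases, so the product would be flagged even though the texts themselves carry little evidence of manipulation.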
Note that the thresholds of these detection methods affect both the detection rates and the overlaps. In this experiment, the consistency method uses the thresholds proposed in [25]. In practice, detection thresholds should be carefully tuned when multiple methods are used to capture different aspects of fake reviews.
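The overlap counts discussed above are set intersections over the products each method flags. A minimal sketch, assuming each method's output is available as a set of product IDs; the function name `pairwise_overlaps` and the product IDs are hypothetical toy values, not the experimental data.

```python
from itertools import combinations


def pairwise_overlaps(detections):
    """Return how many products each pair of methods flags in common.

    `detections` maps a method name to the set of product IDs it flags.
    """
    return {
        (a, b): len(detections[a] & detections[b])
        for a, b in combinations(sorted(detections), 2)
    }


# Hypothetical toy detection sets.
detections = {
    "ERO": {"p01", "p02", "p03"},
    "consistency": {"p04", "p05", "p06"},
    "duplication": {"p01", "p04", "p05", "p07", "p08"},
}
for pair, count in pairwise_overlaps(detections).items():
    print(pair, count)
```

Because each set depends on its method's thresholds, rerunning this computation after retuning a threshold shows how the overlaps shift.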

Expert Reviews
In the experiment of the previous subsection, ERO analysis detects 16 products. As discussed in Section 3.6.1, no ground truth is available for performance evaluation. Therefore, to evaluate the results of ERO analysis, we conduct expert reviews in this section.
The detection results are summarized in one or two sentences and provided to the expert reviewers as hints. The expert reviewers answer two questions: "Do you think the product contains fake reviews?" and "How confident do you feel when you answer the first question?". Sixteen expert reviewers participate in the evaluation, and the results of the expert review are determined by majority vote.
Specifically, if 8 or more reviewers vote the product as a non-victim, the detection is considered a false alarm.
The results are shown in Table 13. According to the expert reviews, 6.3% of the detected products are voted as false alarms. Specifically, 11.1% of the products detected with the 'verified' feature are voted as false alarms, while none of the products detected with the 'day of week' feature are. This indicates that the 'day of week' feature is more convincing than the 'verified' feature. We also investigate the change in confidence.
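The false alarm rule and the resulting rates can be written down directly. In this sketch, the threshold of 8 non-victim votes comes from the text, while the function names and the per-product vote counts are hypothetical, chosen only so that the number of false alarms matches the reported percentages (1/16 = 6.25%, reported as 6.3%; 1/9 ≈ 11.1% for 'verified'; 0/7 for 'day of week').

```python
from fractions import Fraction


def is_false_alarm(non_victim_votes: int, threshold: int = 8) -> bool:
    """A detection is a false alarm if at least `threshold` experts vote non-victim."""
    return non_victim_votes >= threshold


def false_alarm_rate(votes_per_product):
    """Exact fraction of detected products whose detection is voted a false alarm."""
    flags = [is_false_alarm(v) for v in votes_per_product]
    return Fraction(sum(flags), len(flags))


# Hypothetical non-victim vote counts (out of 16 experts), one per detected product.
verified_votes = [2, 5, 9, 3, 1, 0, 4, 6, 7]  # 9 products, one with >= 8 votes
day_of_week_votes = [1, 3, 0, 2, 7, 4, 5]     # 7 products, none with >= 8 votes

print(false_alarm_rate(verified_votes))                      # 1/9
print(false_alarm_rate(day_of_week_votes))                   # 0
print(false_alarm_rate(verified_votes + day_of_week_votes))  # 1/16
```

Using `Fraction` keeps the rates exact, which avoids rounding ambiguity when converting 1/16 (6.25%) to a one-decimal percentage.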
The products are first provided to the expert reviewers without any hints. After the reviewers make their decisions, the hints are provided and they make their decisions again. For each decision, they choose one of 5 confidence levels. We calculate the confidence change after the reviewers receive the hints, and the results are shown in Table 13. The average confidence increases by 10.5%, which suggests that reviewers feel more confident when they can consult the ERO analysis.
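The section does not state exactly how the confidence change is computed. One plausible reading, assumed here, maps the 5 confidence levels to 1 (lowest) through 5 (highest) and averages the relative change per decision; the function name and the confidence values below are hypothetical.

```python
from statistics import mean


def average_confidence_change(before, after):
    """Mean relative change in confidence after the hints are shown.

    `before` and `after` hold one confidence level (assumed 1..5) per
    (reviewer, product) decision, in the same order.
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    return mean((a - b) / b for b, a in zip(before, after))


# Hypothetical confidence levels for four decisions.
before = [3, 4, 2, 5]
after = [4, 4, 3, 5]
print(f"{average_confidence_change(before, after):+.1%}")  # +20.8%
```

A different definition, for example the change in mean confidence divided by the width of the scale, would give a different percentage for the same data, so the definition should be fixed before comparing against the reported 10.5%.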

Discussion
When a smart attacker learns the ERO principle and ERO analysis, the attacker may be able to take certain actions to evade ERO analysis. For example, some fake reviewers work only on specific days, and this pattern can be detected by ERO analysis with the 'day of week' feature. However, a smart attacker can change the attack strategy to evade detection, for example by posting fake reviews at random times during the week. In this case, ERO analysis is not attack-resistant.
To the best of our knowledge, almost none of the existing approaches can guarantee attack resistance. However, the proposed ERO analysis can increase the cost of attack and thus reduce the impact of fake reviews. For example, if the reviewers' IP addresses are available, we can estimate their geographic locations from those addresses, and ERO analysis can then detect correlations between the rating value and the reviewers' city or state. Although this is still not attack-resistant, since attackers can technically change their IP addresses, it does increase the attack cost.
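Detecting a correlation between the rating value and a feature such as day of week or city amounts to testing whether the two are independent. The statistic ERO analysis actually uses is not restated in this section; as a stand-in, the sketch below uses Pearson's chi-square test of independence. The function name `chi_square_independence` and the toy reviews are hypothetical.

```python
from collections import Counter


def chi_square_independence(pairs):
    """Pearson chi-square statistic for independence of rating and a feature.

    `pairs` is a list of (rating, feature_value) tuples, e.g. (5, "Mon").
    Returns (statistic, degrees_of_freedom). The chi-square approximation
    needs reasonably large expected counts; the toy data below is too small
    for a reliable test and only illustrates the computation.
    """
    n = len(pairs)
    rating_counts = Counter(r for r, _ in pairs)
    feature_counts = Counter(f for _, f in pairs)
    joint_counts = Counter(pairs)
    stat = 0.0
    for r, nr in rating_counts.items():
        for f, nf in feature_counts.items():
            expected = nr * nf / n  # count expected if rating and feature were independent
            observed = joint_counts.get((r, f), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(rating_counts) - 1) * (len(feature_counts) - 1)
    return stat, dof


# Hypothetical toy reviews: (rating, day of week).
balanced = [(5, "Mon"), (1, "Mon"), (5, "Tue"), (1, "Tue")]
clustered = [(5, "Mon"), (5, "Mon"), (1, "Tue"), (1, "Tue")]

CRITICAL_1DOF_5PCT = 3.841  # chi-square critical value for 1 degree of freedom, alpha = 0.05
for name, pairs in [("balanced", balanced), ("clustered", clustered)]:
    stat, dof = chi_square_independence(pairs)
    print(name, stat, dof, stat > CRITICAL_1DOF_5PCT)
```

In the clustered case, every 5-star rating falls on Monday, so the statistic exceeds the critical value; with more feature values (for example 7 days or many cities), the degrees of freedom and the critical value change accordingly.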
In practice, a more effective way to improve attack resistance is to use multiple detection methods that capture the diverse structures of fake reviews.

Conclusion
The ERO principle, a new angle for detecting fake reviews, has been introduced in this manuscript. The fraudulence signal detected by ERO analysis is very different from those used by traditional approaches. Furthermore, ERO analysis needs very limited data to set up detection thresholds and only requires the reviews of a particular product to determine whether that product is under review manipulation.
Therefore, it can be performed in real time. Experiments based on real data are conducted. Several review features are examined for the applicability of ERO analysis, and two of them are found applicable. The detection results are compared with those of two common existing methods, showing that ERO analysis offers a complementary angle on fake review detection. We also conduct expert reviews to evaluate the ERO analysis results, which show a low false alarm rate. In future work, we plan to select more ERO features, implement a real application, and involve more users in testing.