INTRODUCING SERENDIPITY IN RECOMMENDER SYSTEMS THROUGH COLLABORATIVE METHODS

Widely used recommendation systems are mainly accuracy-oriented, since they are based on item ratings and user- or item-based similarity measures. Such accuracy-based engines do not consider factors such as the proliferation of varied user interests and the desire for change. This results in a muted user experience generated from a constrained and narrow feature set. Recommender systems should therefore consider other important metrics beyond accuracy, such as coverage, novelty, serendipity, unexpectedness, and usefulness. The main focus of this thesis is to incorporate serendipity into a recommendation engine and to improve its quality using the widely used collaborative filtering method. Serendipity is defined as finding something good or useful while not specifically searching for it. The design of recommendation engines that consider serendipity is a relatively new and open research problem, largely due to a degree of ambiguity in balancing the unexpectedness and usefulness of items. In this thesis, a new hybrid algorithm that combines a standard user-based collaborative filtering method with item attributes is proposed to improve the quality of serendipity over methods that use item ratings alone. The algorithm was implemented in Python in conjunction with the scientific computing package NumPy. Furthermore, the code has been validated against a well-accepted and widely used open-source package, Apache Mahout, which provides support for recommender system application development. The new method has been tested on the 100K MovieLens dataset from the GroupLens Research Center, which consists of 100,000 ratings of 1,682 movies by 943 users. The new algorithm is shown to be capable of identifying a significant fraction of movies that are less serendipitous but which might not have been identified otherwise, thereby improving the quality of predictions.

Recommender systems adopt information filtering techniques to provide customized information for a targeted audience. The development and deployment of recommender systems have gained significant attention in recent years. Recommender systems are popular web-search mechanisms that are used to address information overload and provide personalized results to users.
The aim of a recommender system is to automatically find the most useful products (for example, movies, books, etc.) for a user, those that best suit his/her needs and taste.
Such recommendations are made possible by profiling and analyzing the relationships between users and products. Some of the most popular recommender systems include content-based methods (e.g., the Music Genome Project), collaborative filters (Google, Amazon, Yahoo!), social network analysis (Facebook, LinkedIn, Twitter, Zynga), and combinations of the above (hybrid recommenders). Collaborative filtering is the most commonly used method in recommender engines and is based on user-to-user similarity [1]. This method maps the user to a set of users with similar tastes, and items are recommended based on how like-minded users rated those items. A content-based filtering system recommends items that are similar to those that a user liked in the past. It has its roots in information retrieval and information filtering and employs many of the same principles. The preferences of the users are collected both explicitly and implicitly in these systems.
Explicit ratings are obtained when a user rates an item on a scale of 1-10, gives 1-5 stars, or answers questionnaires. Implicit ratings are obtained from the buying patterns or click-stream behavior (read, click) of the users. Both content-based and collaborative filters suffer from cold-start issues. A cold-start problem is one for which the ratings for a particular item are not known (for example, a new item) and hence recommendations are impossible or hard to predict [2]. A knowledge-based system is a case-based recommender system that uses knowledge about users and products to pursue a knowledge-based approach to giving recommendations.

There are significant disadvantages to such accuracy-based recommender engines [3]. Accuracy-based engines do not consider factors such as the proliferation of varied user interests and the desire for change. This results in a muted user experience generated from a constrained and narrow feature set, leaving no room for the user's personal growth and experience. Thus, recommender systems should also consider other important metrics beyond accuracy, such as coverage, novelty, serendipity, unexpectedness, and usefulness. Briefly, serendipity is defined as the accident of finding something good or useful while not specifically searching for it. Serendipity is thus closely related to unexpectedness and involves a positive emotional response of the user to a previously unknown item. It measures how surprising the unexpected recommendations are [4]. In other words, serendipity is concerned with the novelty of recommendations and the extent to which they may positively surprise users [5]. Adamopoulos [6] has proposed a method to improve user satisfaction by generating unexpected recommendations based on the utility theory of economics. A discovery-oriented collaborative filtering algorithm for deriving novel recommendations has been proposed by Hijikata et al. [7]. Andre et al. [8] examine the potential for serendipity in web search and suggest that information about personal interests and behavior may be used to support serendipity.
Their algorithms build, in addition to a user preference profile, a second profile of the user's known and unknown items; novel recommendations are then given based on both. Chhavi Rana [9] has proposed a methodology based on temporal parameters to include novelty and serendipity in recommender systems. Ziegler et al. [10] assume that diversifying recommendation lists improves user satisfaction.
They proposed topic diversification, which diversifies recommendation lists based on an intra-list similarity metric. In [11], it is proposed to recommend items whose descriptions are semantically far from users' profiles. Kawamae [12] suggests a recommendation algorithm based on the assumption that users follow earlier adopters who have demonstrated similar preferences.
Serendipity is becoming a popular research topic in current recommender systems as a means to enhance user experience. This thesis deals specifically with developing a recommender system that incorporates serendipity as a factor for enriching predictions. Furthermore, the quality of the predicted serendipitous items is improved using the contents, or attributes, of the items.

CHAPTER 2 COLLABORATIVE FILTERING METHODS
Most recommender systems take one of two basic approaches: collaborative filtering or content-based filtering. Collaborative filtering (CF) is one of the most successful approaches to building recommender systems [1]. In order to establish recommendations, CF systems need to relate two fundamentally different entities: items and users. There are two primary approaches to facilitate such a comparison, which constitute the two main techniques of CF: the neighborhood approach and the model-based approach.

NEIGHBORHOOD APPROACH
Neighborhood methods focus on relationships between users (user-user CF) or between items (item-item CF). Neighborhood-based methods are also commonly referred to as memory-based approaches.

USER-USER CF METHOD
A user-user neighborhood approach models the preference of a user for an item based on the ratings of similar users for the same item. User-user CF is a straightforward algorithmic interpretation of the core premise of CF: find other users whose past rating behavior is similar to that of the current user and use their ratings on other items to predict what the current user will like [2]. The fundamental ingredients for CF are: (i) a ratings matrix $R$ that specifies the (item, user, rating/preference) tuples, (ii) a similarity function $\mathrm{sim}(u, v)$ between users $u$ and $v$, and (iii) a method for using similarities and ratings to generate predictions. The ratings matrix $R$ is a user input; an example is shown in Table 1 (a ratings matrix of $N_{users} \times N_{items}$ entries, with blank entries denoting unrated items). Here, $r_{u,i}$ is the rating of item $i$ by user $u$, $r_{v,i}$ is the rating of the same item $i$ by another user $v$, and $\bar{r}_u$, $\bar{r}_v$ are the average ratings of users $u$ and $v$.
The two most widely used similarity measures are cosine-based and Pearson-correlation [3] based measures. Other similarity measures used in the literature include the Spearman rank correlation, Kendall's $\tau$ correlation, mean squared differences, entropy, and adjusted cosine similarity [4]. The Pearson-correlation based similarity function is given by:

$$\mathrm{sim}(u, v) = \frac{\sum_i (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_i (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_i (r_{v,i} - \bar{r}_v)^2}} \qquad (1)$$

The cosine-based similarity function is given by:

$$\mathrm{sim}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\|\,\|\mathbf{r}_v\|} \qquad (2)$$

In Eqns. 1 and 2, $\mathbf{r}_u$ is the vector of items rated by user $u$ that are also rated by user $v$, and $\mathbf{r}_v$ is similarly defined for user $v$. The scalars $\bar{r}_u$ and $\bar{r}_v$ denote the average ratings of users $u$ and $v$.
Pearson-correlation and cosine-based similarity functions are both invariant to scaling: multiplying a user's ratings by a (positive) constant does not change the similarities between users. Pearson correlation, unlike cosine-based similarity, is also invariant to adding a constant to a user's ratings. For example, if $r^{mod}_u = a\,r_u + b$, where $a$ and $b$ are constants, the Pearson-correlation value remains unchanged:

$$\mathrm{sim}(a\,\mathbf{r}_u + b, \mathbf{r}_v) = \mathrm{sim}(\mathbf{r}_u, \mathbf{r}_v) \qquad (3)$$
This is an important property: the Pearson-correlation based similarities between users do not depend on the absolute values of their ratings but only on the way they vary. This is one of the main reasons for the wide popularity of the Pearson correlation coefficient.
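As an illustration, a minimal NumPy sketch of the two similarity measures, assuming the rating vectors have already been restricted to the co-rated items, is given below; the final assertion checks the shift-and-scale invariance of the Pearson measure.

import numpy as np

def pearson_sim(r_u, r_v):
    # deviations from each user's average rating (Eqn. 1)
    du = r_u - r_u.mean()
    dv = r_v - r_v.mean()
    return (du @ dv) / (np.linalg.norm(du) * np.linalg.norm(dv))

def cosine_sim(r_u, r_v):
    # raw cosine of the two rating vectors (Eqn. 2)
    return (r_u @ r_v) / (np.linalg.norm(r_u) * np.linalg.norm(r_v))

r_u = np.array([4.0, 3.0, 5.0, 1.0])
r_v = np.array([5.0, 2.0, 4.0, 2.0])
# Pearson is invariant to the affine change a*r_u + b (for a > 0), Eqn. 3
assert np.isclose(pearson_sim(2.0 * r_u + 1.0, r_v), pearson_sim(r_u, r_v))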
Finally, to generate predictions or recommendations for a user $u$, the user-user CF method first uses $\mathrm{sim}(u, v)$ to compute a neighborhood $N_u$ of neighbors of $u$. $N_u$ is usually taken as the top-$N$ users ranked by similarity score (i.e., $\mathrm{sim}(u, v)$ sorted from highest to lowest). Alternatively, $N_u$ can be based on a prescribed threshold for the similarity score $\mathrm{sim}(u, v)$: for example, if the threshold is specified as 0.8, $N_u$ consists of only those users for whom $\mathrm{sim}(u, v) \geq 0.8$. Once $N_u$ has been computed, the ratings of the users in $N_u$ are combined to generate a prediction $p_{u,i}$ of user $u$'s preference for an item $i$. This is typically done by computing the weighted average of the neighboring users' ratings of $i$, using the similarities as the weights:

$$p_{u,i} = \frac{\sum_{v \in N_u} \mathrm{sim}(u, v)\, r_{v,i}}{\sum_{v \in N_u} |\mathrm{sim}(u, v)|} \qquad (4)$$
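A minimal sketch of this prediction step (names such as R, sims, and neighbors are illustrative) is:

def predict_rating(R, sims, u, i, neighbors):
    # R[v, i]: rating of item i by user v (0 means unrated)
    # sims[v]: sim(u, v); neighbors: users in N_u that have rated i
    num = sum(sims[v] * R[v, i] for v in neighbors)
    den = sum(abs(sims[v]) for v in neighbors)
    return num / den if den > 0 else 0.0   # weighted average of Eqn. 4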

ITEM-ITEM CF METHOD
Although user-user CF techniques are popular, they suffer from scalability issues associated with the frequent computation of similarity between users.
When a user changes item ratings frequently, the rating vector of that user changes, which modifies the similarity with other users. Hence, the user neighborhood for a given user cannot be pre-computed but has to be evaluated whenever recommendations are needed. This can be a major computational bottleneck for large datasets. The effect is amplified when there are more users than items, as is typical of many e-commerce websites.
To eliminate this scalability issue, an item-item based CF technique was proposed by Linden et al. [5] (see also [6], [7]). These algorithms analyze the similarity between items instead of predicting ratings based on the similarity between users. If two items tend to receive similar preferences from the same users, then the items are similar, and users are expected to have similar preferences for similar items.
In systems with a high user-to-item ratio, a user's frequent changing of an item's rating is unlikely to change the similarities between items, since each item has far more ratings, from many users, that do not change. Hence, a change of ratings by a very small set of users will only slightly alter the similarity between items [2], and users will still get good recommendations. The item-item CF method is similar to user-user CF except that the item similarity is deduced from user preference patterns rather than extracted from item data. The Pearson correlation similarity and the adjusted cosine similarity are examples of common similarity metrics used to predict the similarity between items in such systems.
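As a sketch, the adjusted cosine similarity between two items, assuming a dense users x items matrix R with 0 for missing ratings, could be computed as:

import numpy as np

def adjusted_cosine(R, i, j):
    # users who rated both items i and j
    both = (R[:, i] > 0) & (R[:, j] > 0)
    if not both.any():
        return 0.0
    counts = (R > 0).sum(axis=1)
    means = np.divide(R.sum(axis=1), counts,
                      out=np.zeros(R.shape[0]), where=counts > 0)
    di = R[both, i] - means[both]        # center by each user's mean rating
    dj = R[both, j] - means[both]
    denom = np.linalg.norm(di) * np.linalg.norm(dj)
    return (di @ dj) / denom if denom > 0 else 0.0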

MODEL-BASED APPROACH
Model-based methods fit a parametric model to the training data, which can later be used to predict unseen ratings and issue recommendations. Latent factor and matrix factorization models have emerged as the state-of-the-art methodology in this class of techniques. In its basic form, matrix factorization characterizes both items and users by vectors of factors inferred from item rating patterns; high correspondence between item and user factors leads to a recommendation. These methods have become popular in recent years by combining good scalability with predictive accuracy. In addition, they offer much flexibility for modeling various real-life situations [8]. Other methods include cluster-based CF [9], Bayesian classifiers [10], and regression-based methods [11]. The slope-one method [12] fits a linear model to the rating matrix, achieving fast computation and reasonable accuracy.
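As an illustration of the matrix-factorization idea, a minimal sketch using plain stochastic gradient descent on the observed ratings is given below; the hyperparameters are illustrative, and this is not the method used in the present work.

import numpy as np

def factorize(ratings, n_users, n_items, k=10, lr=0.01, reg=0.05, epochs=20):
    # ratings: list of observed (u, i, r) tuples
    rng = np.random.default_rng(0)
    P = rng.normal(scale=0.1, size=(n_users, k))   # user factor vectors
    Q = rng.normal(scale=0.1, size=(n_items, k))   # item factor vectors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]                  # prediction error
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q   # predicted rating of (u, i) is P[u] @ Q[i]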

List of References

[5] G. Linden, B. Smith, and J. York, "Amazon.com recommendations: Item-to-item collaborative filtering," IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, 2003.

[6] G. Karypis, "Evaluation of item-based top-N recommendation algorithms," in Proceedings of the Tenth International Conference on Information and Knowledge Management (CIKM), 2001.

[11] S. Vucetic and Z. Obradovic, "Collaborative filtering using a regression-based approach," Knowledge and Information Systems, vol. 7, no. 1, pp. 1-22, 2005.

INTRODUCING SERENDIPITY TO RECOMMENDER SYSTEMS
Serendipity is defined as the accident of finding something good or useful while not specifically searching for it. In other words, serendipity is concerned with the novelty of recommendations and the extent to which they may positively surprise users [1]. In recommender systems, it is defined as a measure of how well the system can find unexpected and useful items for users. In this chapter, we propose and implement a new algorithm for generating a list of serendipitous items using collaborative filtering techniques. The novelty of the current work is the proposed improvement of the quality of the serendipity list by incorporating the items' contents. It is hypothesized that the degree of ambiguity in defining serendipity is reduced by incorporating the items' contents together with the ratings.
The algorithm and its implementation are explained in detail in the following sections.

SERENDIPITY RECOMMENDER SYSTEM
The proposed serendipity algorithm is given in Algorithm 1.
Algorithm 1 Proposed algorithm for generating serendipitous items
1: Build a recommendation list of items using a primitive method
2: Build a set of recommendation lists using collaborative filtering techniques
3: Predict a list of "unexpected" items using the results of the first two steps
4: Generate a list of serendipitous items from the unexpected items
5: Improve the quality of the serendipity list by incorporating the items' contents

PRIMITIVE RECOMMENDER METHOD
The first step in the generation of serendipitous items is predicting a list of items using a primitive recommendation method based on the highest average ratings and the highest number of ratings (popularity). Consequently, the level of expectedness for the user is generally high for all items predicted by this method. Let the predicted list be denoted by PPM. The PPM model used in this study is based on the top-N items with the highest average rating and the top-N items with the largest number of ratings. These two top-N lists are combined as a union to produce the list of top-K items (where K is a user-specified number) of the PPM recommendation list [2].
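A minimal sketch of this PPM construction (function and variable names are illustrative) is:

import numpy as np

def ppm_list(R, n=100, k=100):
    # R: users x items ratings matrix, 0 = unrated
    counts = (R > 0).sum(axis=0)               # number of ratings per item
    avg = np.divide(R.sum(axis=0), counts,
                    out=np.zeros(R.shape[1]), where=counts > 0)
    top_avg = np.argsort(avg)[::-1][:n]        # top-N by average rating
    top_pop = np.argsort(counts)[::-1][:n]     # top-N by popularity
    # union of the two lists, order-preserving and de-duplicated
    union = list(dict.fromkeys(np.concatenate([top_avg, top_pop]).tolist()))
    return union[:k]                           # top-K items of the PPM list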

EXPECTED LIST OF RECOMMENDATIONS
In this study, an expected set of recommendation items is generated using collaborative filtering methods, particularly the user-user method described in Section 2.1.1. The basic algorithm for the user-based collaborative method is given below:

Algorithm 2 User-based recommendation method
1: procedure UserBasedRecommender(User u)
2:   for each item i not rated by user u do
3:     for each user v that has rated i do
4:       Compute similarity sim(u, v)
5:       Compute weighted moving average using v's rating of i and sim(u, v)
6:     end for
7:   end for
8:   Return recommended list of items RS and their scores for u
9: end procedure

In Algorithm 2, "similarity" refers to a user-based similarity measure. Specifically, the measures employed in the present study include distance-based, cosine-based, and Pearson-correlation based metrics.
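A compact NumPy sketch of Algorithm 2, with a threshold-based neighborhood and Pearson similarity, is given below; it is a simplified rendering of the approach, not a verbatim copy of the in-house code.

import numpy as np

def pearson(R, u, v):
    # Pearson similarity over items co-rated by users u and v
    co = (R[u] > 0) & (R[v] > 0)
    if co.sum() < 2:
        return 0.0
    du = R[u, co] - R[u, co].mean()
    dv = R[v, co] - R[v, co].mean()
    denom = np.linalg.norm(du) * np.linalg.norm(dv)
    return (du @ dv) / denom if denom > 0 else 0.0

def user_based_recommender(R, u, threshold=0.8, top=20):
    sims = np.array([pearson(R, u, v) if v != u else 0.0
                     for v in range(R.shape[0])])
    scores = {}
    for i in np.where(R[u] == 0)[0]:                  # items u has not rated
        raters = np.where((R[:, i] > 0) & (sims >= threshold))[0]
        if raters.size > 0:
            w = sims[raters]
            scores[i] = (w @ R[raters, i]) / w.sum()  # weighted average
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

The 0.8 similarity threshold mirrors the setting used in the validation against Mahout described later.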

UNEXPECTED LIST OF RECOMMENDATIONS
One of the key ingredients of serendipitous items is a high degree of unexpectedness. As pointed out earlier, the PPM list consists of items with a high degree of expectedness. The RS list, on the other hand, contains items with a varied spectrum of expectedness. The unexpected list of items is generated using the method specified in [3, 1, 2]:
UNEXP is the list of items that are in the RS list but not in PPM, i.e., UNEXP = RS \ PPM.

PREDICTED SERENDIPITY LIST
Given the UNEXP list of items, a predicted list of serendipitous items, SERENDIP_P, is generated by filtering the items in terms of a "usefulness" metric: all items whose ratings are greater than or equal to a chosen value are considered "useful" items to be recommended as serendipitous. The items in SERENDIP_P are sorted from highest to lowest rating.
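The unexpected-list and usefulness-filtering steps reduce to a set difference followed by a rating filter; a minimal sketch (the usefulness threshold of 3.5 is illustrative) is:

def unexpected_items(rs, ppm):
    # rs: {item: predicted rating} from the CF recommender (the RS list)
    # ppm: items in the primitive PPM list
    return {i: r for i, r in rs.items() if i not in set(ppm)}

def serendipity_list(unexp, usefulness=3.5):
    # keep "useful" items (rating >= threshold), sorted high to low
    return sorted(((i, r) for i, r in unexp.items() if r >= usefulness),
                  key=lambda kv: kv[1], reverse=True)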

IMPROVED SERENDIPITY LIST
The quality of the list is further improved by re-rating the items in SERENDIP_P based on the items' contents or attributes. For example, in the MovieLens database, an important content attribute is the movie genre. The items in SERENDIP_P are rated once again based on their genre. Let this list be denoted by SERENDIP_P^t, where t represents the attribute of interest.
Central hypothesis: Those movies that are in the predicted serendipity list but are ranked lower in the genre-based ratings are considered to better satisfy serendipity. For example, assume that an item in the predicted serendipity list is rated 4.5 (on a 5-point scale) by a user-based recommender system that does not take the genre into account. If the same item is rated lower, but still above the usefulness threshold, by a genre-based recommender, it is considered more serendipitous because it is both useful and more unexpected.
The genre-based recommendation list is generated using a modified version of the procedure outlined in [4]. The key steps in the genre-based evaluation are the following:

Algorithm 3 Algorithm for evaluating recommendations based on genre
1: Compute the average rating of each genre for a given user (Eqn. 6)
2: Compute the average rating of each item based on the average rating of each genre obtained from Step 1 (Eqn. 7)
3: Compute the genre-based recommendation based on user similarity and the ratings computed in Step 2 (Eqn. 8)

Following [4], let the attribute (genre) vector of a given item $item$ be a binary vector whose entries indicate the genres to which the item belongs. The average rating of a genre $gnr$ for a given user $u$ is

$$\bar{r}^{gnr}_u = \frac{1}{M_{u,gnr}} \sum_{j \in I_{u,gnr}} r_{u,j} \qquad (6)$$

where $I_{u,gnr}$ is the set of valid items rated by user $u$ that belong to genre $gnr$, and $M_{u,gnr}$ is the number of such items. If no movies belong to $gnr$, that genre is not taken into account. The genre-based rating of user $u$ for an item is then the average of the genre averages over the item's genres:

$$r_{u,item} = \frac{1}{N_{u,item}} \sum_{gnr \in item} \bar{r}^{gnr}_u \qquad (7)$$

where $N_{u,item}$ is the number of non-zero attributes of the item. Finally, the genre-based predicted rating $p_{u,item}$ of user $u$ for item $item$ is the similarity-weighted average of the genre-based ratings $r_{k,item}$ of the other users $k$ ($\neq u$):

$$p_{u,item} = \frac{\sum_{k \neq u} \mathrm{sim}(u, k)\, r_{k,item}}{\sum_{k \neq u} |\mathrm{sim}(u, k)|} \qquad (8)$$
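A sketch of Eqns. 6 and 7, assuming a binary items x genres attribute matrix G (all names illustrative), is given below; Eqn. 8 then applies the usual similarity-weighted average to the resulting genre-based ratings.

import numpy as np

def genre_based_ratings(R, G, u):
    # R: users x items ratings (0 = unrated); G: items x genres (binary)
    rated = np.where(R[u] > 0)[0]                 # items rated by user u
    sums = G[rated].T @ R[u, rated]               # per-genre rating sums
    counts = G[rated].sum(axis=0)                 # user u's movies per genre
    genre_avg = np.divide(sums, counts,           # Eqn. 6
                          out=np.zeros(G.shape[1]), where=counts > 0)
    valid = counts > 0                            # skip genres with no movies
    n_attr = G[:, valid].sum(axis=1)              # N_{u,item}
    return np.divide(G[:, valid] @ genre_avg[valid], n_attr,   # Eqn. 7
                     out=np.zeros(G.shape[0]), where=n_attr > 0)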

PYTHON AND NUMPY

REASONS FOR CHOOSING PYTHON
Python is an interpreted, high-level language that has easy-to-read syntax.
The native ability to interact with data structures and objects, with a wide range of built-in functionality, makes it easier to write scientific programs. Moreover, Python abstracts most of the memory management away from the end user. This allows researchers and scientists to spend more time exploring ideas than working out how to code them. The fast development time of Python scripts makes it much easier to test new ideas with prototypes.
An example provided in [5] for a "Hello World" program illustrates the difference in readability between Python and C++.
/* A C++ program to print "Hello World" */
#include <iostream>

int main() {
    std::cout << "Hello World" << std::endl;
    return 0;
}

In Python, the same program reads as:

print("Hello World")

For the above reasons, Python was chosen as the language in which to implement the serendipity recommender system for the present work. Python is also free and open source.
NumPy is a Python extension module that provides efficient operations on arrays of homogeneous data. It allows Python to serve as a high-level language for manipulating numerical data.

EXAMPLE SCRIPT TO ILLUSTRATE READABILITY
An example of computing the sample-based Pearson correlation coefficient, representative of the code implemented in the present study, is given in the following snippet. Lines beginning with # are comments in a Python script, and "np" denotes the NumPy module. Recall that the sample-based Pearson similarity between two users is given by Eqn. 1, repeated here for convenience:

$$\mathrm{sim}(u, v) = \frac{\sum_i (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_i (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_i (r_{v,i} - \bar{r}_v)^2}} \qquad (1)$$

where $\mathbf{r}_u$ is the vector of items rated by user $u$ that are also rated by user $v$, $\mathbf{r}_v$ is similarly defined for user $v$, and the scalars $\bar{r}_u$ and $\bar{r}_v$ denote the average ratings of users $u$ and $v$.
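# Sample-based Pearson correlation between two users (Eqn. 1).
# A representative snippet; r_u and r_v are NumPy arrays holding the
# ratings of users u and v over their co-rated items.
import numpy as np

def pearson_sample(r_u, r_v):
    du = r_u - np.mean(r_u)       # deviations from user u's average rating
    dv = r_v - np.mean(r_v)       # deviations from user v's average rating
    return np.sum(du * dv) / np.sqrt(np.sum(du**2) * np.sum(dv**2))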

RESULTS AND DISCUSSION
The serendipity recommender algorithm developed in the previous chapter is tested on the widely used MovieLens dataset from the GroupLens Research Center [1]. Representative results are presented and discussed in this chapter.

VALIDATION
The in-house code is first validated against Apache Mahout [2]. We selected Mahout because it is a workbench platform that provides many of the characteristics desired for recommender engine development. Mahout is also production-level, open-source software and includes a wide range of commonly used collaborative filtering algorithms that are easy to use for validation purposes.
Some recent studies that have used Mahout as their preferred platform include [3, 4, 5, 6, 7], which indicates its popularity.

DESCRIPTION OF TEST CASES
As a first test case, we consider the recommendation of the top 20 movies for a random set of users, using a user-based recommendation method with the Pearson correlation coefficient. The recommended set of movies (IDs and scores) is then compared against the in-house code. The user IDs chosen are 5, 121, 269, and 842.
While a random number generator could have been used to pick the set of users, we have not done so for the present study; these users were, however, selected without any bias. The following methods and conditions have been used to validate the code.
1. A user-based recommendation is used because of its popularity.
2. A similarity metric based on Pearson correlation is used.

3. A similarity threshold value of 0.8 is used to select a small set of neighbors for a given user. Such a threshold is used to reduce the computation time and is a required input in Mahout.
In Fig. 2, the recommended item IDs for the chosen set of users are plotted.
Excellent agreement is seen between the present results and Mahout's. There is, however, a small discrepancy in one of the items recommended for user 842, as can be seen from Table 4.2.1: one of the movies differs between the two implementations. However, the movie ratings are identical in both, and hence different movies with the same rating are considered acceptable.
Comparison plots of the ratings of the recommended movies for different users are shown in Fig. 3. Very good agreement is once again seen between the present results and Apache Mahout. The difference pointed out in Table 4.2.1 is apparent here as well; as can be seen, the ratings of these differing movies are identical.

During the validation phase, two important observations were made about Apache Mahout that were not apparent from the Java documentation of its methods.
First, the Pearson correlation metric is applied to a population and not to a sample. Mathematically, the sample-based Pearson correlation for two vectors $\mathbf{r}_i$ and $\mathbf{r}_j$ is expressed by Eqn. 9. The population-based Pearson correlation for the same vectors is expressed as:

$$\mathrm{sim}(\mathbf{r}_i, \mathbf{r}_j) = \frac{\mathrm{Cov}(\mathbf{r}_i, \mathbf{r}_j)}{\sigma_{\mathbf{r}_i}\, \sigma_{\mathbf{r}_j}} \qquad (10)$$

Here, $\mathrm{Cov}(\mathbf{r}_i, \mathbf{r}_j)$ is the covariance and $\sigma$ is the population standard deviation (the square root of the variance). The second important observation is that, in computing the ratings using a similarity-weighted moving average, the total sum of the similarities must exceed a threshold, found to be 1.0 by trial and error (this is not documented in Apache Mahout). This condition may exist to avoid rating a movie that has been rated by only very few users. A Python function to evaluate Eqn. 10 (a representative implementation) is provided below.
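# Population-based Pearson correlation (Eqn. 10); a representative
# implementation. np.std computes the population standard deviation
# (ddof = 0) by default.
import numpy as np

def pearson_population(r_i, r_j):
    cov = np.mean((r_i - np.mean(r_i)) * (r_j - np.mean(r_j)))  # Cov(r_i, r_j)
    return cov / (np.std(r_i) * np.std(r_j))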

RESULTS FOR SERENDIPITY
Based on the results and observations from the previous section, the serendipitous recommendation engine developed for the present study uses the following methods:

1. A user-based recommendation method.
2. Pearson correlation based on population.
3. A similarity threshold of 0.8 to identify the nearest neighbors.
4. The PPM list is generated from the top 100 movies.
5. The recommendation list RS consists of 100 movies.
Although many of the above assumptions can be relaxed (such as using k-nearest-neighbor selection or choosing larger sizes for RS and PPM), this set has been chosen for demonstration purposes. It is straightforward to implement an item-based similarity metric as well, but this has not been attempted in the present study. Incorporating various other methods for building recommendation engines will form part of the future work.
A recommender system built using a standard collaborative filtering method predicts preferences based only on the ratings of the items. While this is useful and has proven largely successful, it may not consistently give high-quality recommendations to customers, because such recommendation systems do not consider other attributes of the items being recommended [8]. Another interpretation of this phenomenon is that recommendation systems built on ratings alone have a higher probability of predicting items that are unexpected.
Thus, there is an implicit unexpectedness built into recommendation systems that do not consider item attributes. Hence, in the present study, a list of unexpected items is first generated using the standard collaborative filtering method, as has been done in the works of [9, 10]. Then a serendipity list is generated based on a "usefulness" metric, as described in detail in Section 3.1.4. Let the ratings of the predicted serendipity list be denoted by SRDP. In order to improve the level of serendipity, the predicted serendipity list is further rated using a genre-based collaborative technique, as explained in Section 3.1.5. Let the newly obtained ratings using the genres be denoted by SRDP_genre. The difference between these ratings is then normalized and converted to a percentage, denoted by $\Delta_n$:

$$\Delta_n = \frac{SRDP - SRDP_{genre}}{5.0} \times 100 \qquad (11)$$

For the movies, the normalizing factor is taken as 5.0 because of the 1-5 rating scale.
If $\Delta_n$ is negative (SRDP_genre > SRDP), it implies that the movie is more expected than unexpected, because SRDP_genre is generally considered more accurate than SRDP [8]. Movies whose $\Delta_n$ falls below a certain threshold can then be removed from the serendipity list. For example, with a threshold of $\Delta_n = -5\%$, plots of the top 100 serendipity movies against their $\Delta_n$ values are shown for representative users in Figs. 6 and 7. A significant number of movies are seen to be less serendipitous based on this threshold.
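A minimal sketch of Eqn. 11 and the subsequent filtering (the -5% threshold as above; names illustrative) is:

import numpy as np

def delta_n(srdp, srdp_genre, scale=5.0):
    # Eqn. 11: normalized rating difference, in percent
    return (np.asarray(srdp) - np.asarray(srdp_genre)) / scale * 100.0

def filter_serendipity(items, srdp, srdp_genre, threshold=-5.0):
    # keep movies whose delta_n is at or above the threshold
    dn = delta_n(srdp, srdp_genre)
    return [item for item, d in zip(items, dn) if d >= threshold]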
Table 4. The number of movies considered less serendipitous for different users in the predicted serendipity list generated by a standard collaborative filtering method, using Eqn. 11 and a threshold of -5%.

User ID    Number of less serendipitous movies
5          3
15         16
47         0
121        11
269        23
300        0
912        2

We see that for some users the number is 0; for such users, the standard method of predicting the serendipity list is sufficient. However, for other users, such as 15, 121, and 269, a significant fraction is less serendipitous. Filtering out this fraction of movies will lead to a better list of predicted serendipitous movies.
Thus, the newly suggested metric is shown to be very useful in increasing the quality of serendipity by filtering out those movies that are rated higher by the genre-based recommender than by the standard one. Therefore, our methodology can aid in much better prediction of serendipitous movies and thereby improve user experience. A main advantage of the proposed algorithm is that for those items whose attributes or contents cannot be explicitly defined, the predicted serendipity list (see Section 3.1.4) can be used as the final list.

Figure 6. Plot of $\Delta_n$ from Eqn. 11 for the serendipity list of movies generated by a standard collaborative filtering method for user 15. Movies with $\Delta_n$ below -5% are considered less serendipitous. A significant fraction of movies is identified as less serendipitous by the present algorithm.

Figure 7. Plot of $\Delta_n$ from Eqn. 11 for the serendipity list of movies generated by a standard collaborative filtering method for user 121. Movies with $\Delta_n$ below -5% are considered less serendipitous. A significant fraction of movies is identified as less serendipitous by the present algorithm.

LIMITATIONS
Since our recommender engine is primarily based on collaborative filtering techniques, it suffers from the following problems:

1. New user problem: When a new user is introduced, recommendations cannot be produced, since the person may not have rated any movies. This limitation could, however, be addressed using content-based filtering with parameters such as age, occupation, and gender.

2. New item problem: If a new item has not been rated, it will not participate in the generation of the recommendation list.

3. Very large datasets: The ratings matrix is a dense $N_{users} \times N_{items}$ matrix, which does not scale to massively large (big) datasets. This can be overcome by using sparse matrix representations of the ratings; the SciPy module for Python supports a variety of sparse data representations (a sketch is given after this list). Such representations will be incorporated in future work.

4. Homogeneity of contents: The in-house code is based on a single-content analysis, which for the MovieLens dataset is the genre. Because of the homogeneous nature of the contents, the present code is not amenable to using multiple contents.

5. Lack of experimental validation: A rigorous validation of the proposed algorithm is possible only through real-user experiments and the quality of the ratings obtained from them. Such a validation would further improve the quality of the serendipitous items by identifying bounds on the rating differences to be used in our computational algorithm (see Eqn. 11).

FUTURE WORK

The future work on this project will address the following issues. The newly developed code can easily handle small datasets such as the 100K MovieLens dataset. With the advent of Big Data analytics, however, the current code is inefficient at handling very large data, owing both to restrictions of the language and to the type of data structures employed. Both of these restrictions have to be addressed. In terms of the algorithms used, the present work has been built on collaborative filtering methods because of the popularity they have enjoyed thus far. Incorporating new and scalable methods, such as clustering-based techniques for handling Big Data, needs to be addressed as well.

List of References

[3] R. M. Esteves and C. Rong, "Using Mahout for clustering Wikipedia's latest articles: A comparison between k-means and fuzzy c-means in the cloud," in Proceedings of the IEEE Third International Conference on Cloud Computing Technology and Science (CloudCom), 2011.