Recommender API -- Developer's Companion

(The latest version of this document can be found at http://drupal.org/project/recommender)

Basic idea

Suppose we have three mice: Alex, Becky and Carol. They all love cheese, but different kinds. Alex likes Spanish cheese and Italian cheese. Becky also likes Spanish and Italian cheese, and she likes Swiss too. Carol, on the other hand, hates Spanish and Italian cheese, but loves Cheddar. Now suppose we also learn that Alex hates French cheese. Can we guess what Becky and Carol would say about French cheese?

First, since both Alex and Becky like Spanish and Italian cheese, we can infer that Alex and Becky have similar tastes. For the same reason, we can infer that Alex and Carol have opposite tastes. Then, since we know Alex hates French cheese, we can reasonably guess that Becky hates French cheese too, because her tastes are similar to Alex's, whereas Carol probably likes French cheese, because her tastes are opposite to his.
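The reasoning above can be made concrete with a small Python sketch. It encodes "likes" as +1 and "hates" as -1, measures how similar two mice are with cosine similarity over the cheese both have rated, and predicts a rating as a similarity-weighted average of the other mice's ratings. The encoding, the choice of cosine similarity and the function names are illustrative assumptions, not the module's API.

```python
import math

# Ratings from the story: +1 = likes, -1 = hates (an assumed encoding).
ratings = {
    "Alex":  {"Spanish": 1, "Italian": 1, "French": -1},
    "Becky": {"Spanish": 1, "Italian": 1, "Swiss": 1},
    "Carol": {"Spanish": -1, "Italian": -1, "Cheddar": 1},
}

def similarity(a, b):
    """Cosine similarity between two mice over the cheese both have rated."""
    common = ratings[a].keys() & ratings[b].keys()
    if not common:
        return 0.0
    dot = sum(ratings[a][c] * ratings[b][c] for c in common)
    sq_a = sum(ratings[a][c] ** 2 for c in common)
    sq_b = sum(ratings[b][c] ** 2 for c in common)
    return dot / math.sqrt(sq_a * sq_b)

def predict(mouse, cheese):
    """Similarity-weighted average of the other mice's ratings for `cheese`."""
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == mouse or cheese not in prefs:
            continue
        s = similarity(mouse, other)
        num += s * prefs[cheese]
        den += abs(s)
    return num / den if den else 0.0

print(similarity("Alex", "Becky"))  # 1.0  (similar tastes)
print(similarity("Alex", "Carol"))  # -1.0 (opposite tastes)
print(predict("Becky", "French"))   # -1.0 -> Becky probably hates it
print(predict("Carol", "French"))   # 1.0  -> Carol probably likes it
```

A negative similarity flips the sign of the neighbor's opinion, which is exactly how Carol ends up with a positive prediction for the cheese Alex hates.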

That is the basic idea of recommender systems, and it applies to many kinds of applications. For an online store, we can think of the customers as the mice in the mouse-cheese analogy and the products as the cheese; then we can generate product recommendations for customers. For a taxonomy system, we can think of the nodes to be tagged as the mice and the tags as the cheese; then we can compute similarity scores among the nodes from the node-tag relationships, and also recommend terms for the nodes.

The Recommender API module provides a set of APIs that calculate similarities among the mice, predict how each mouse would rate each cheese based on the ratings from other mice, and finally generate a list of recommended cheese for each mouse. To use the APIs, you just need to have the mouse-cheese relationships stored in a table and select the right algorithm to do the calculation. The available algorithms are explained below.
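The last step, turning predicted scores into a recommendation list, might look like the following minimal sketch. The `recommend` function, its input layout and the scores are hypothetical; it simply ranks the cheese a mouse has not rated yet by predicted score and keeps the top entries.

```python
def recommend(predictions, rated, top_n=3):
    """Return the top_n unrated cheese per mouse, highest predicted score first.

    predictions: {mouse: {cheese: predicted_score}}  (hypothetical layout)
    rated:       {mouse: set of cheese the mouse has already rated}
    """
    result = {}
    for mouse, scores in predictions.items():
        already = rated.get(mouse, set())
        candidates = [(c, s) for c, s in scores.items() if c not in already]
        # Sort by score descending; break ties alphabetically for stable output.
        candidates.sort(key=lambda cs: (-cs[1], cs[0]))
        result[mouse] = [c for c, _ in candidates[:top_n]]
    return result

# Made-up predicted scores for illustration.
predictions = {"Becky": {"French": -1.0, "Cheddar": 0.4, "Gouda": 0.9, "Swiss": 1.0}}
rated = {"Becky": {"Swiss"}}
print(recommend(predictions, rated, top_n=2))  # {'Becky': ['Gouda', 'Cheddar']}
```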

Algorithms

Classical

This is the classical family of collaborative filtering algorithms based on correlation coefficients. The best-known examples are the User-User algorithm and the Item-Item algorithm. If you are not sure which algorithm to use, use this one: it gives accurate results in most cases and is widely used in many applications.
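As an illustration of the User-User variant, here is a minimal sketch that uses Pearson correlation as the similarity and a mean-centered, correlation-weighted prediction (a common textbook formulation). The function names and ratings are hypothetical, and the module's own Classical implementation may differ in details such as neighborhood selection. Item-Item works the same way with the roles of mice and cheese swapped.

```python
import math

def pearson(ra, rb):
    """Pearson correlation between two mice over the cheese both have rated."""
    common = ra.keys() & rb.keys()
    n = len(common)
    if n < 2:
        return 0.0
    ma = sum(ra[c] for c in common) / n
    mb = sum(rb[c] for c in common) / n
    cov = sum((ra[c] - ma) * (rb[c] - mb) for c in common)
    va = sum((ra[c] - ma) ** 2 for c in common)
    vb = sum((rb[c] - mb) ** 2 for c in common)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)

def predict_user_user(ratings, mouse, cheese):
    """Mouse's mean rating plus the correlation-weighted deviations of other mice."""
    mean_m = sum(ratings[mouse].values()) / len(ratings[mouse])
    num = den = 0.0
    for other, prefs in ratings.items():
        if other == mouse or cheese not in prefs:
            continue
        w = pearson(ratings[mouse], prefs)
        mean_o = sum(prefs.values()) / len(prefs)
        num += w * (prefs[cheese] - mean_o)
        den += abs(w)
    return mean_m if den == 0 else mean_m + num / den

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "Alex":  {"Spanish": 5, "Italian": 4, "Swiss": 2, "French": 1},
    "Becky": {"Spanish": 5, "Italian": 5, "Swiss": 2},
    "Carol": {"Spanish": 1, "Italian": 2, "Swiss": 5, "French": 5},
}
print(predict_user_user(ratings, "Becky", "French"))  # about 2.1, well below Becky's mean of 4
```

Every pair of mice needs a correlation, which is why this family is accurate but comparatively slow on large data sets.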

Further reading:

Slope-One

This algorithm runs much faster than the Classical algorithm. In some cases it also produces better results (see the reading below). However, it has not been widely studied in academia or widely applied in real-world practice. Another drawback is that it cannot compute similarities among the mice; it can only predict mouse-cheese scores.
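To show why Slope-One is cheap and why it yields no mouse-to-mouse similarity, here is a minimal sketch of the weighted Slope-One scheme. It precomputes, for each pair of cheese, the average rating difference among mice that rated both, then predicts by adding those differences to the mouse's own ratings, weighting each term by how many mice support it. The names and data are illustrative assumptions, not taken from recommender.module.

```python
from collections import defaultdict

def slope_one_deviations(ratings):
    """dev[j][i] = average of (r_j - r_i) over mice that rated both j and i."""
    diff = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for prefs in ratings.values():
        for j, rj in prefs.items():
            for i, ri in prefs.items():
                if i != j:
                    diff[j][i] += rj - ri
                    count[j][i] += 1
    dev = {j: {i: diff[j][i] / count[j][i] for i in diff[j]} for j in diff}
    return dev, count

def predict_slope_one(ratings, dev, count, mouse, cheese):
    """Weighted Slope-One: average of (r_i + dev[cheese][i]), weighted by support counts."""
    num = den = 0.0
    for i, ri in ratings[mouse].items():
        if i != cheese and i in dev.get(cheese, {}):
            c = count[cheese][i]
            num += (ri + dev[cheese][i]) * c
            den += c
    return num / den if den else None

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "Alex":  {"Spanish": 5, "Italian": 3, "French": 2},
    "Becky": {"Spanish": 3, "Italian": 4},
    "Carol": {"Italian": 2, "French": 5},
}
dev, count = slope_one_deviations(ratings)
print(predict_slope_one(ratings, dev, count, "Becky", "French"))  # 3.3333333333333335
```

Only cheese-to-cheese differences are stored, so predicting is a few lookups per rated cheese, but nothing in the model says how alike two mice are.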

Further reading:

Co-occurrences

This is a very simple and fast algorithm. It calculates only similarities among the mice, by counting how many kinds of cheese each pair of mice have in common. For example, if mouse A and mouse B both like the same 4 types of cheese, the similarity score between A and B is 4. This is the algorithm used in the "Similar By Terms" module. However, it has one major drawback: if a mouse simply loves all cheese, that mouse will have the highest similarity score with every other mouse, which is clearly wrong.
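The following sketch, with made-up data and a hypothetical `cooccurrence` function, counts shared cheese for every pair of mice and reproduces the drawback described above: a mouse that likes everything becomes every other mouse's best match.

```python
# Hypothetical "likes" data: each mouse maps to the set of cheese it likes.
likes = {
    "A": {"Spanish", "Italian", "Swiss", "Cheddar"},
    "B": {"Spanish", "Italian", "Swiss", "French"},
    "C": {"Brie"},
    "Omnivore": {"Spanish", "Italian", "Swiss", "Cheddar", "French", "Brie", "Gouda"},
}

def cooccurrence(likes):
    """Similarity of every ordered pair of mice = number of cheese both like."""
    return {(a, b): len(likes[a] & likes[b])
            for a in likes for b in likes if a != b}

sims = cooccurrence(likes)
print(sims[("A", "B")])  # 3
for mouse in ("A", "B", "C"):
    others = [m for m in likes if m != mouse]
    best = max(others, key=lambda m: sims[(mouse, m)])
    print(mouse, "->", best)  # every line ends in "Omnivore"
```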

PageRank

To be developed. Further reading: http://en.wikipedia.org/wiki/PageRank

SVD

To be developed. Further reading: http://en.wikipedia.org/wiki/Singular_value_decomposition

PCA

To be developed. Further reading: http://en.wikipedia.org/wiki/Principal_components_analysis

Influence Limiter

To be developed. This algorithm is intended to prevent manipulation of the recommender system. Further reading:

Comparison of algorithms

Similarity Prediction Incremental update In-memory calculation Missing data auto append 0-1 Weight field Performance Accuracy
Classical X X Partial X X poor high
Slope-One X medium medium
Co-occurrence X X X high poor
PageRank (TBD)
SVD (TBD)
PCA (TBD)
Influence Limiter (TBD)

How to use the APIs

The APIs usually require these parameters:

To use the APIs, you need a mouse-cheese table that stores records like "Mouse A rated Cheese X as 5" or "Mouse B dislikes Cheese Y". The data might already be part of an existing table. In that case, you can either create a view on the existing table, or create a new table and insert the mouse-cheese records into it. Then pass the view name or table name as $table_name to the APIs.
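As a sketch of the view approach, the snippet below uses Python's built-in sqlite3 module in place of your Drupal database. The `votes` table, its columns and the `mouse_cheese` view are hypothetical; check recommender.module for the column names the APIs actually expect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical existing table that mixes mouse-cheese data with unrelated rows.
cur.execute("CREATE TABLE votes (uid INTEGER, content_type TEXT, "
            "content_id INTEGER, value REAL, ts INTEGER)")
cur.executemany("INSERT INTO votes VALUES (?, ?, ?, ?, ?)", [
    (1, "node", 10, 5, 0),
    (1, "comment", 99, 3, 0),   # not a mouse-cheese record; filtered out below
    (2, "node", 10, 4, 0),
    (2, "node", 11, 1, 0),
])

# Expose only the mouse-cheese triples; "mouse_cheese" would then be $table_name.
cur.execute("""
    CREATE VIEW mouse_cheese AS
    SELECT uid AS mouse_id, content_id AS cheese_id, value AS weight
    FROM votes
    WHERE content_type = 'node'
""")
rows = cur.execute("SELECT mouse_id, cheese_id, weight FROM mouse_cheese "
                   "ORDER BY mouse_id, cheese_id").fetchall()
print(rows)  # [(1, 10, 5.0), (2, 10, 4.0), (2, 11, 1.0)]
conn.close()
```

A view avoids copying data, so it stays in sync with the source table; a separate table is the better choice when the calculation should run against a fixed snapshot.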

Note that the calculation can take a long time to finish because of the computational complexity of the task. The calling function should provide a user-friendly interface such as a progress bar. You may also want to provide a drush or drupal.sh interface so the calculation can run offline.

In the recommender.module file, functions named recommender_similarity_*() calculate similarity scores among the mice, and functions named recommender_prediction_*() calculate prediction scores for mouse-cheese pairs. Functions whose names start with an underscore are private functions, not public APIs, and are subject to change. The remaining functions are helper functions.

Please refer to the comments in recommender.module for more details. You can also look at the code of the User-to-user Recommendation module as an example of how to use the APIs.

For support or other questions, please submit your request to the issue queue. Thanks.