Matrix factorization (recommender systems)

Matrix factorization is a class of collaborative filtering algorithms used in recommender systems.[1]

This family of methods became widely known during the Netflix prize challenge due to its effectiveness, as reported by Simon Funk in his 2006 blog post,[2] in which he shared his findings with the research community.

The prediction results can be improved by assigning different regularization weights to the latent factors based on items' popularity and users' activeness.[3]

The idea behind matrix factorization is to represent users and items in a lower-dimensional latent space.

Since the initial work by Funk in 2006 a multitude of matrix factorization approaches have been proposed for recommender systems.

The original algorithm proposed by Simon Funk in his blog post[2] factorized the user-item rating matrix as the product of two lower-dimensional matrices, the first of which has a row for each user, while the second has a column for each item.

The row or column associated with a specific user or item is referred to as its latent factors.[4]

Note that in Funk MF no singular value decomposition is applied; it is an SVD-like machine learning model.

Specifically, the predicted rating user $u$ will give to item $i$ is computed as:

$$\tilde{r}_{ui} = \sum_{f} H_{u,f} W_{f,i}$$

It is possible to tune the expressive power of the model by changing the number of latent factors.
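In matrix form, this prediction is simply a dot product of the user's row and the item's column. A minimal NumPy sketch (the matrix names `H` and `W` for the user and item factor matrices are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, n_factors = 4, 5, 3

H = rng.normal(size=(n_users, n_factors))  # row u: latent factors of user u
W = rng.normal(size=(n_factors, n_items))  # column i: latent factors of item i

u, i = 2, 3
r_hat = H[u] @ W[:, i]  # predicted rating of user u for item i
print(r_hat)
```

The predicted rating for every user-item pair is then simply entry (u, i) of the full product H @ W.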

Increasing the number of latent factors improves personalization, and therefore recommendation quality, until the number of factors becomes too high, at which point the model starts to overfit and the recommendation quality decreases.

A common strategy to avoid overfitting is to add regularization terms to the objective function.

All things considered, Funk MF minimizes the following objective function:

$$\underset{H, W}{\operatorname{arg\,min}} \; \|R - \tilde{R}\|_F + \alpha \|H\| + \beta \|W\|$$

where $\|\cdot\|_F$ is the Frobenius norm, $\tilde{R} = HW$ is the matrix of predicted ratings, and $\alpha$ and $\beta$ are regularization weights.[8]

While Funk MF is able to provide very good recommendation quality, the fact that it can use only explicit numerical ratings as user-item interactions constitutes a limitation.
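The regularized objective is commonly minimized with stochastic gradient descent over the observed ratings. A minimal sketch, with hypothetical variable names and plain SGD with an L2 penalty (one common way to fit Funk MF, not necessarily Funk's exact procedure):

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_factors = 30, 20, 5
lr, reg, epochs = 0.05, 0.05, 100

# Sparse set of observed ratings: (user, item) -> rating in 1..5
ratings = {(int(rng.integers(n_users)), int(rng.integers(n_items))): float(rng.integers(1, 6))
           for _ in range(200)}

H = rng.normal(scale=0.1, size=(n_users, n_factors))
W = rng.normal(scale=0.1, size=(n_factors, n_items))

for _ in range(epochs):
    for (u, i), r in ratings.items():
        err = r - H[u] @ W[:, i]                   # error on this observed rating
        H_u = H[u].copy()                          # keep old value for W's update
        H[u] += lr * (err * W[:, i] - reg * H[u])  # gradient step with L2 penalty
        W[:, i] += lr * (err * H_u - reg * W[:, i])

rmse = np.sqrt(np.mean([(r - H[u] @ W[:, i]) ** 2 for (u, i), r in ratings.items()]))
print(f"training RMSE: {rmse:.3f}")
```

Only the observed entries of the rating matrix enter the loss; the factorization then generalizes to the missing entries.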

Modern-day recommender systems should exploit all available interactions, both explicit (e.g. numerical ratings) and implicit (e.g. likes, purchases, skips, bookmarks).[9][10]

Compared to Funk MF, SVD++ also takes user and item bias into account.
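A sketch of what the bias terms look like (hypothetical names; this shows only the biased baseline on top of the latent-factor interaction, omitting SVD++'s implicit-feedback term):

```python
import numpy as np

rng = np.random.default_rng(1)
n_users, n_items, n_factors = 4, 5, 3

mu = 3.5                                    # global average rating
b_u = rng.normal(scale=0.2, size=n_users)   # user bias: how generous each rater is
b_i = rng.normal(scale=0.2, size=n_items)   # item bias: how well-liked each item is
H = rng.normal(scale=0.1, size=(n_users, n_factors))
W = rng.normal(scale=0.1, size=(n_factors, n_items))

def predict(u, i):
    """Prediction with bias terms added to the latent-factor interaction."""
    return mu + b_u[u] + b_i[i] + H[u] @ W[:, i]

print(predict(2, 3))
```

Separating out the global, user, and item biases lets the latent factors model only the residual user-item interaction.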

SVD++ is, however, not a model-based method: even though the system might have gathered some interactions for a new user, its latent factors are not available and therefore no recommendations can be computed.

This is an example of a cold-start problem: the recommender cannot deal efficiently with new users or items, and specific strategies should be put in place to handle this disadvantage.[12]

A possible way to address this cold-start problem is to modify SVD++ so that it becomes a model-based algorithm, making it possible to manage new items and new users easily.

If the system is able to gather some interactions for the new user it is possible to estimate its latent factors.

Note that this does not entirely solve the cold-start problem, since the recommender still requires some reliable interactions for new users, but at least there is no need to recompute the whole model every time.
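One way to estimate a new user's latent factors without retraining is to hold the item factors fixed and solve a regularized least-squares problem ("fold-in") on the user's few known ratings. A hypothetical sketch; the text does not prescribe this exact method:

```python
import numpy as np

rng = np.random.default_rng(7)
n_items, n_factors, reg = 20, 5, 0.1

W = rng.normal(scale=0.5, size=(n_factors, n_items))  # item factors from the trained model

# A handful of ratings gathered for the new user
rated_items = [2, 7, 11]
r = np.array([4.0, 2.0, 5.0])

# Ridge regression of the known ratings on the fixed item factors:
# solve (W_s W_s^T + reg*I) h = W_s r for the new user's factors h.
W_s = W[:, rated_items]
h_new = np.linalg.solve(W_s @ W_s.T + reg * np.eye(n_factors), W_s @ r)

scores = h_new @ W                # predicted scores for every item
top5 = np.argsort(-scores)[:5]    # items to recommend to the new user
print(top5)
```

Only a small linear system in the number of factors has to be solved per new user, rather than refitting the whole factorization.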

One related approach clusters users and items based on dependency information and similarities in characteristics.[6]

In recent years many other matrix factorization models have been developed to exploit the ever-increasing amount and variety of available interaction data and use cases.

Hybrid matrix factorization algorithms are capable of merging explicit and implicit interactions[15] or both content and collaborative data.[16][17][18] In recent years a number of neural and deep-learning techniques have been proposed, some of which generalize traditional matrix factorization algorithms via a non-linear neural architecture.[19]

While deep learning has been applied to many different scenarios (context-aware, sequence-aware, social tagging, etc.), its real effectiveness when used in a simple collaborative filtering scenario has been called into question.
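As an illustration of that non-linear generalization, an untrained NumPy sketch in the spirit of neural collaborative filtering (all layer sizes and names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n_users, n_items, n_factors, hidden = 4, 5, 3, 8

P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user embeddings
Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item embeddings

# An MLP on the concatenated embeddings replaces the plain dot product.
W1 = rng.normal(scale=0.1, size=(2 * n_factors, hidden))
b1 = np.zeros(hidden)
w2 = rng.normal(scale=0.1, size=hidden)

def score(u, i):
    x = np.concatenate([P[u], Q[i]])  # joint user-item representation
    h = np.maximum(0.0, x @ W1 + b1)  # ReLU hidden layer (the non-linearity)
    return float(h @ w2)              # scalar preference score

print(score(1, 2))
```

With an identity hidden layer and suitable weights this reduces to the ordinary dot-product model, which is the sense in which such architectures generalize matrix factorization.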

Systematic analysis of publications applying deep learning or neural methods to the top-k recommendation problem, published in top conferences (SIGIR, KDD, WWW, RecSys, IJCAI), has shown that on average less than 40% of articles are reproducible, with as little as 14% in some conferences.

Overall, the studies identified 26 relevant articles; only 12 of them could be reproduced, and 11 of those could be outperformed by much older and simpler, properly tuned baselines.

The articles also highlight a number of potential problems in today's research scholarship and call for improved scientific practices in the area.