What is Collaborative Filtering

Introduction

Imagine you’re at a bookstore looking for your next read, and your friend recommends an action novel, promising it will be just as fun as the romance novels you usually read. Although it’s a completely different genre, your friend knows you enjoy dramatic, emotional stories. Surprisingly, you end up loving it and go looking for more action novels with similar emotional depth. Amazing, right?

These real-life moments of product discovery and recommendation are what collaborative filtering aims to mimic, and it happens on all of your favorite platforms: YouTube, Amazon, Netflix, Spotify, LinkedIn, etc.

Overview

Recommendation systems provide personalized recommendations by learning about users’ interests through traces of interactions. Based on past behaviors, these systems aim to predict future purchases.

It’s like the typical recommendations that pop up on your YouTube and Amazon homepages: machine learning algorithms work hard to recommend items based on your past activity. Collaborative filtering takes this to the next level by using data from other users, not just you. It’s like a group of similar people collaborating to help each other find products they might like.

You can read a general overview of product recommendation engines here to familiarize yourself with recommender systems before diving into collaborative filtering.

Collaborative filtering considers user-item interactions to make predictions. We don’t examine characteristic information such as categories, keywords, user profiles, and preferences. Rather, the focus is on ratings, purchases, saves, likes, and dislikes.

These are the basics of how collaborative filtering works:

Organize a large customer base to extract data from.

Find a smaller set of users from the initial group based on similar preferences and activity.

Determine a new set of products to recommend based on combining what the group likes.
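The three steps above can be sketched in a few lines of Python. This is only a minimal illustration under simple assumptions: the `likes` data, the user names, and the `min_similarity` cutoff are all made up for the example, and Jaccard overlap is just one of many ways to decide which users count as "similar".

```python
from collections import Counter

# Step 1: a (tiny, hypothetical) customer base. Each user maps to the
# set of items they liked.
likes = {
    "ana": {"book_a", "book_b", "book_c"},
    "ben": {"book_a", "book_b", "book_d"},
    "cho": {"book_a", "book_c", "book_e"},
    "dev": {"book_x", "book_y"},
}


def jaccard(a, b):
    """Overlap between two sets: size of intersection / size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, min_similarity=0.3, top_n=3):
    # Step 2: keep only the users whose tastes overlap with the target's.
    neighbors = [
        user
        for user, items in likes.items()
        if user != target and jaccard(likes[target], items) >= min_similarity
    ]
    # Step 3: count what the neighbors like that the target has not seen.
    counts = Counter(
        item for user in neighbors for item in likes[user] - likes[target]
    )
    return [item for item, _ in counts.most_common(top_n)]


print(recommend("ana", likes))
```

Here "ben" and "cho" each share two of ana's three books, so they form her group, and the items they liked that she hasn't seen (book_d and book_e) become her recommendations. "dev" has nothing in common with anyone, so no group forms and the function returns an empty list for that user.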

To make the recommendations accurate, collaborative filtering draws on an extensive set of data about how users interact with products:

Implicit feedback: search history, order history, clicks, and views. Any interaction that demonstrates some interest in a specific product is recorded.

Explicit feedback: ratings, likes, dislikes, favorites, and reviews. This direct feedback is also integral to predicting whether a product will actually be desirable to a set of customers.
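Explicit ratings can go into a model more or less as they are, but implicit events first need to be turned into numbers. One simple way to do that, sketched below, is to give each event type a weight and add the weights up per user and item. The weights in `EVENT_WEIGHTS` and the sample events are purely illustrative assumptions, not values from any real system.

```python
# Hypothetical weights: how much interest each kind of event signals.
# These numbers are illustrative assumptions only.
EVENT_WEIGHTS = {"view": 1.0, "click": 2.0, "search": 2.0, "order": 5.0}


def interaction_scores(events):
    """Sum the weighted implicit events for every (user, item) pair."""
    scores = {}
    for user, item, event_type in events:
        key = (user, item)
        # Unknown event types contribute nothing.
        scores[key] = scores.get(key, 0.0) + EVENT_WEIGHTS.get(event_type, 0.0)
    return scores


# Made-up event log: (user, item, event type).
events = [
    ("ana", "lamp", "view"),
    ("ana", "lamp", "click"),
    ("ana", "lamp", "order"),
    ("ben", "lamp", "view"),
]

scores = interaction_scores(events)
print(scores)
```

With these weights, a user who viewed, clicked, and ordered the lamp ends up with a much higher score for it than a user who only viewed it once, which matches the intuition that an order shows more interest than a glance.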

User-based vs. Item-based Approaches

There are two approaches to collaborative filtering: user-based and item-based.

User-based filtering identifies similar users based on the implicit and explicit feedback gathered from them. The goal is to recommend items that other users in the group have enjoyed, in the hope that the target user will like them too. For example, suppose two users, A and B, both rated a movie 4.5 stars. Even better, they have watched many of the same movies in the past and given them similar ratings. Based on this similarity, movies that user A enjoyed could be recommended to user B if B hasn't watched them yet.
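Here is a rough sketch of the user A and user B example using NumPy. The ratings are invented, 0 is assumed to mean "not rated yet", and cosine similarity over co-rated movies is just one common choice of similarity function (many systems subtract each user's average rating first). The helper names `cosine_on_shared` and `predict` are made up for this example.

```python
import numpy as np

# Hypothetical ratings. Rows are users A, B, C; columns are movies 0..3.
# In this sketch, 0 means "not rated yet".
ratings = np.array([
    [4.5, 4.0, 5.0, 4.0],  # user A
    [4.5, 4.0, 5.0, 0.0],  # user B has not watched movie 3
    [1.0, 5.0, 1.0, 2.0],  # user C
])


def cosine_on_shared(u, v):
    """Cosine similarity computed only over movies both users rated."""
    shared = (u > 0) & (v > 0)
    if not shared.any():
        return 0.0
    a, b = u[shared], v[shared]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict(ratings, user, item):
    """Similarity-weighted average of the other users' ratings for `item`."""
    numerator, denominator = 0.0, 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, item] == 0:
            continue  # skip the user themselves and anyone without a rating
        sim = cosine_on_shared(ratings[user], ratings[other])
        numerator += sim * ratings[other, item]
        denominator += abs(sim)
    return numerator / denominator if denominator else 0.0


sim_ab = cosine_on_shared(ratings[0], ratings[1])
sim_bc = cosine_on_shared(ratings[1], ratings[2])
predicted = predict(ratings, user=1, item=3)
print(sim_ab, sim_bc, predicted)
```

User B's past ratings match user A's exactly, so A's opinion of movie 3 gets the most weight. The predicted rating for B falls between user C's 2 and user A's 4, pulled toward A.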

Item-based filtering identifies similarities between items to predict whether a user would like them. For example, take two items, A and B. How many of the users who purchased A also purchased B? If the two items are bought by largely the same users, then A would be recommended to users who only purchased B, and B would be recommended to users who only purchased A.
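The item A and item B example can be sketched the same way, this time comparing items by who bought them. The purchase history below is invented, and the 0.5 `threshold` for "similar enough" is an arbitrary assumption chosen for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical purchase history: user -> set of purchased items.
purchases = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"A", "B", "C"},
    "u4": {"B"},
    "u5": {"C"},
}


def item_similarity(purchases):
    """Jaccard similarity between items, based on which users bought them."""
    buyers = defaultdict(set)
    for user, items in purchases.items():
        for item in items:
            buyers[item].add(user)
    sims = {}
    for a, b in combinations(sorted(buyers), 2):
        bought_both = len(buyers[a] & buyers[b])
        bought_either = len(buyers[a] | buyers[b])
        sims[(a, b)] = bought_both / bought_either
    return sims


def recommend_for(user, purchases, sims, threshold=0.5):
    """Suggest items similar to ones the user owns but has not bought."""
    owned = purchases[user]
    recs = set()
    for (a, b), sim in sims.items():
        if sim < threshold:
            continue
        if a in owned and b not in owned:
            recs.add(b)
        if b in owned and a not in owned:
            recs.add(a)
    return recs


sims = item_similarity(purchases)
print(sims)
print(recommend_for("u4", purchases, sims))
```

Three of the four people who bought A or B bought both, so the pair scores 0.75 and user u4, who only bought B, gets A recommended. Item C shares just one buyer with each of the others, so it stays below the threshold and u5 gets no recommendation.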

The Math Involved

Collaborative filtering leverages user-item matrices and similarity functions.

This means that familiarity with linear algebra concepts and matrices is vital for visualizing the past interactions between users and items in order to predict new interactions.

This is what a user-item matrix generally looks like:

That chart looks a bit daunting, so here’s another one:


We call this a user-item matrix, but it’s basically just a table that contains user ratings for each item.

Each row represents a user, and each column represents a specific item. Each entry records the interaction between that user and that item, which in the example above is the rating.

Although these tables can easily be made in Google Sheets or Excel, familiarity with linear algebra is important because we treat each row or column of entries as a vector.

For example, the bottom-leftmost user’s ratings can be turned into a vector, (6, -1, -1, -1, 8, 10). These mathematical approaches allow us to manipulate and analyze user-item interaction data to make accurate predictions.
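To make the vector idea concrete, here is a small NumPy sketch using the example vector from the text. It assumes that -1 marks an item the user has not rated, and the second user (`other`) is made up purely for comparison.

```python
import numpy as np

# The example user's ratings from the text. We assume -1 means "no rating".
user = np.array([6, -1, -1, -1, 8, 10])
# A second, hypothetical user to compare against.
other = np.array([7, 9, -1, -1, 8, -1])

# Boolean mask of the items this user actually rated.
rated = user != -1
num_rated = int(rated.sum())
mean_rating = float(user[rated].mean())

# Compare the two users only on items that both of them rated.
shared = (user != -1) & (other != -1)
distance = float(np.linalg.norm(user[shared] - other[shared]))

print(num_rated, mean_rating, distance)
```

Once the ratings are vectors, questions like "how many items did this user rate?" or "how far apart are these two users?" become one-line array operations. Here the two users overlap on two items and differ by a single point on one of them, so the distance between them is small.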

This is a very simple introduction to the math involved in collaborative filtering, but there are many methods involving linear algebra such as matrix factorization, eigenvalues, eigenvectors, and other matrix operations.
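As a taste of what matrix factorization looks like, the sketch below uses NumPy's singular value decomposition to approximate a small, fully filled ratings matrix with only two hidden "taste" dimensions. The matrix `R` and the choice of `k = 2` are assumptions made for illustration. Real recommender systems usually factorize only the observed entries of a mostly empty matrix, which plain SVD does not do.

```python
import numpy as np

# Hypothetical, fully filled ratings matrix (rows: users, columns: items).
# The first two users like items 0 and 1; the last two like items 2 and 3.
R = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Factor R into user factors (U), strengths (s), and item factors (Vt).
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Keep only the k strongest dimensions and rebuild an approximation of R.
k = 2
R_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

# How much of the original matrix the rank-k version fails to capture.
error = np.linalg.norm(R - R_k)

print(np.round(R_k, 2))
print(error)
```

The rebuilt matrix `R_k` keeps the broad pattern of who likes what while describing every user and every item with just two numbers, which is the core idea behind factorization-based recommenders.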

I gathered a list of resources that dive deeper into the math:

Introduction to Recommender Systems

Maths Behind Collaborative Recommendation System

Recommender Systems with Python (Focuses on the user-item matrix)

Memory Based Collaborative Filtering– User Based

Limitations to Collaborative Filtering

Collaborative filtering is an excellent way to incorporate personalization, expose customers to products that would otherwise go undiscovered, and encourage engagement between customers and businesses. However, there are some limitations to this technique.

Cold Start Problem

As one might expect, collaborative filtering is difficult to use on new users and products. Because the technique relies on past interactions, it struggles with new users and products that have little or no interaction history. How could a recommendation system provide new suggestions if there are no past purchases, browsing history, or ratings to learn from?
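A quick way to see the cold start problem in data is to look for users and items with no interactions at all. The sketch below does this for a small, made-up matrix where 0 is assumed to mean "no interaction"; the `cold_start` helper and its `min_interactions` parameter are hypothetical names for this example.

```python
import numpy as np

# Hypothetical interaction matrix; 0 means "no interaction".
# The last row is a brand-new user, the last column a brand-new item.
R = np.array([
    [5, 3, 0, 0],
    [4, 0, 2, 0],
    [0, 0, 0, 0],
])


def cold_start(R, min_interactions=1):
    """Return indices of users and items with too little history."""
    user_counts = (R > 0).sum(axis=1)  # interactions per user (row)
    item_counts = (R > 0).sum(axis=0)  # interactions per item (column)
    cold_users = np.flatnonzero(user_counts < min_interactions).tolist()
    cold_items = np.flatnonzero(item_counts < min_interactions).tolist()
    return cold_users, cold_items


cold_users, cold_items = cold_start(R)
print(cold_users, cold_items)
```

The new user's row and the new item's column are completely empty, so there is nothing to compare them with: no similar users can be found for the former, and no purchase pattern exists for the latter.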

Data Sparsity Problem

The example user-item matrix above is an ideal case, but real data does not always look like that.

Sometimes, the matrix can look something like this:

Yes, it’s still a user-item matrix with data, but there is too little information to draw accurate conclusions about what users might like or dislike. Although all the users have rated some items, they didn’t rate the same items.
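Sparsity is easy to measure. The sketch below builds a deliberately extreme, made-up matrix in which every user rated exactly one item and no two users rated the same one, then computes what fraction of the matrix is empty and how many items each pair of users has in common. Missing ratings are assumed to be stored as `np.nan`.

```python
import numpy as np

# Hypothetical sparse matrix; np.nan means "not rated".
# Every user rated something, but no two users rated the same item.
R = np.array([
    [5.0, np.nan, np.nan, np.nan],
    [np.nan, 3.0, np.nan, np.nan],
    [np.nan, np.nan, 4.0, np.nan],
])

observed = ~np.isnan(R)

# Fraction of the matrix that is empty.
sparsity = 1.0 - observed.sum() / R.size

# overlap[i, j] = number of items rated by both user i and user j.
mask = observed.astype(int)
overlap = mask @ mask.T

# Total co-rated items across all pairs of different users.
shared_between_users = int(overlap.sum() - np.trace(overlap))

print(sparsity, shared_between_users)
```

Three quarters of this matrix is empty, and the number of items shared between different users is zero, so there is no common ground on which to compare any two users.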

Shilling Attack Problem

Have you ever made multiple accounts, whether it was because you forgot about your past accounts or just to save membership costs? A shilling attack happens when multiple accounts are created maliciously, usually to promote or demote certain content and undercut competitors. This manipulates existing users into consuming specific content that wouldn’t have been recommended without the shilling attack. But fear not: your accidental second account most likely did not make a big difference to the recommendation system.

Conclusion

Collaborative filtering is undeniably a great way for users to find new products that they may never have come across if it weren’t for the recommendations that popped up on their homepage.

With the right data, it is a reliable way to provide serendipitous recommendations and enhance the customer experience. However, it is hard to tell which data can be trusted: some could be malicious (as in the shilling attack problem), data sets can be sparse, and user preferences change in ways that past interactions cannot capture.

As you now understand, collaborative filtering is just one type of recommendation system. There are numerous other techniques that your business can utilize, such as content-based filtering and hybrid filtering.