14_recommender

Recommender System

Table of Content

1. Recommender Systems 1.1. Collaborative Filtering 1.1.1. User-based Collaborative Filtering 1.1.2. Item-based Collaborative Filtering 1.1.3. Considerations on Memory-based Filtering 1.2. Matrix Factorization 1.2.1. Implicit Ratings 1.2.2. Alternating Least Squares (ALS) 1.2.3. Predicting with ALS 1.3. Deep Learning Extension 1.4. Challenges of Collaborative Filtering 1.5. Content-based Filtering 1.6. Deep Learning Recommender Systems

1. Recommender Systems1.1. Collaborative Filtering• The goal of collaborative filtering is to use the user-item matrix to get an idea of how a user would respond to an unseen item.1.1.1. User-based Collaborative Filtering• In user-based collaborative filtering, the goal is to find similar users and recommend to a a user a particular item that is used by other similar users. • Note: If we're using binary indicators in the user-item matrix, we can use Jaccard/Cosine/Hamming metrics to measure similarity.– – Jaccard Similarity →

\frac{no. matching 1 s}{no. matching 1 s + no. u_{i} = 1 & u_{j} = 0 + no. u_{i} = 0 & u_{j} = 1}

* * Note: We don't count the number of matching

0

s here.– Cosine Similarity →

\frac{\sum_{n}^{} u_{i} u_{j}}{\sqrt{\sum_{n}^{} u_{i}^{2}} . \sqrt{\sum_{n}^{} u_{j}^{2}}}

– – Hamming Distance → sum of the differences →

\sum_{i = 1}^{n} 1 (x_{i} \neq y_{i})

– • How do we predict for a particular user (given the user similarities?) →

r e s \hat{p o} n s e_{u_{p}} = \frac{\sum_{u}^{} s i m (u_{p}, u_{i}) . r e s p o n s e_{u_{i}}}{\sum_{u}^{} s i m (u_{p}, u_{i})}

– • Note: We can use the KNN to find

k

number of neighbor users for prediction.• Note: We can use MSE to evaluate our predictions.• Note: Generally, we'd like to recommend the item that generates the highest predicted value.• Note: If the user-item is not binary, we can use the following similarity measures:– Euclidean– Manhattan →

\sum_{i = 1}^{n} | x_{i} - y_{i} |

– Pearson Correlation →

\frac{\sum_{n}^{} (u_{1} - {\bar{u}}_{1}) (u_{2} - {\bar{u}}_{2})}{\sqrt{\sum_{n}^{} (u_{1} - {\bar{u}}_{1})^{2}} \sqrt{\sum_{n}^{} (u_{2} - {\bar{u}}_{2})^{2}}}

– Cosine• Steps of user-based collaborative filtering:– Find KNN of

u_{i}

to form prediction for.– Calculate the similarity score between the neighbor users– Using the KNN, predict the response for

u_{i}

. 1.1.2. Item-based Collaborative Filtering• Here, we use the similarity between the items to make recommendations. • Here, we calculate the item-item similarity.• To predict user response, we use a similar formula as above →

r e s \hat{p o} n s e_{p_{l}} = \frac{\sum_{p}^{} s i m (p_{l}, p_{i}) . r e s p o n s e_{p_{i}}}{\sum_{p}^{} s i m (p_{l}, p_{i})}

• Steps of item-based collaborative filtering:– Calculate item-item similarity matrix– Predict response for

u_{i}

– Recommend the item with the highest prediction 1.1.3. Considerations on Memory-based Filtering• Time complexity of user-based →

O (u . p) \to O (u + p)

– Greater diversity

↑

– More Expensive

↓

(because of KNN calculations)• Time complexity of item-based →

O (u, p^{2}) \to O (u . p)

– Less re-calculations

↑

(items change less than users)– Lack of diversity

↓

• The user-item matrices are sparse, so the → denotes the effective time complexity.• Both user-based and item-based are collectively part of memory-based filtering.• Time Decay: We can apply time decay to the predictions as follows →

r e s \hat{p o} n s e_{u_{p}} = \frac{\sum_{u}^{} s i m (u_{p}, u_{i}) . d_{t} . r e s p o n s e_{u_{i}}}{\sum_{u}^{} s i m (u_{p}, u_{i})}

where

d_{t} = {0.5}^{\frac{t}{half-life}}

• – Time decay means that as time goes on, less influence will be given to a particular rating.• Inverse User Frequency: Another thing that we can do is to apply more weights to less frequent items →

r e s \hat{p o} n s e_{p_{l}} = \frac{\sum_{p}^{} s i m (p_{l}, p_{i}) . f_{i} . r e s p o n s e_{p_{i}}}{\sum_{p}^{} s i m (p_{l}, p_{i})}

where

f_{i} = \log (\frac{N}{n + 1})

– where *

N

→ total number of users and*

n

→ no. of users who interacted with item

i

.–

f_{i}

is called Inverse User Frequency (IUF). 1.2. Matrix Factorization• One problem withe memory-based approaches is that the user-item matrix (or item-item similarity) matrix is extremely large.• One approach we can use is Matrix Factorization → it takes the same user-item matrix and factorizes it into some terms as follows

U . P = {\tilde{u}}^{T} \tilde{p} + b_{p} + b_{u}

• Here, both

{\tilde{u}}^{T}

and

\tilde{p}

need to be learned.• Also, note that we have two bias terms, one for the users,

b_{p}

, and one for the items,

b_{u}

.• The loss function for matrix factorization is,

L = \frac{1}{N} \sum_{N}^{} (U . P - {\tilde{u}}^{T} \tilde{p} - b_{p} - b_{u})^{2} + 𝜆 (\sum_{u}^{} | | u_{i} | |^{2} + \sum_{p}^{} | | p_{i} | |^{2} + \sum_{u}^{} | | b_{u, i} | |^{2} + \sum_{p}^{} | | b_{p, i} | |^{2})

• The second term is the regularization term. 1.2.1. Implicit Ratings• If we have some binary user item matrix, we can transform that into a non-binary matrix of implicit ratings. – Example: Let's say the implicit rating (

r_{i m p}

) of an item is like this,*

1 \to v i e w

2 \to l i k e

3 \to c o m m e n t

– We can multiply

(1 + 𝛼 r_{i m p})

to all the binary values where

𝛼

is an adjusting parameter which can be tuned through cross validation.– With the implicit ratings, the loss function looks like this,

L = \frac{1}{N} \sum_{N}^{} (U . P . C_{p u} - {\tilde{u}}^{T} \tilde{p} - b_{p} - b_{u})^{2} + r e g u l a r i z a t i o n

1.2.2. Alternating Least Squares (ALS)• Since we need to optimize with respect to both

{\tilde{u}}^{T}

and

\tilde{p}

, we can't use ordinary least squares (like in linear regression).• We have to use something called Alternating Least Squares (found in libraries such as Spark ML).• ALS keeps constant one of

{\tilde{u}}^{T}

\tilde{p}

and performs OLS on the other one. Then, it would fix the other one, and runs an OLS on the one that kept fixed in the previous round.• ALS performs well in practice and we can also tune the dimensions of

U

and

P

. 1.2.3. Predicting with ALS• Here's the initial equation →

U . P = {\tilde{u}}^{T} \tilde{p} + b_{p} + b_{u}

• By estimating this equation with ALS, we can get prediction for a particular user and an item as follows →

p r e d (u_{i}, p_{j}) =

{\tilde{u}}_{r o w_{i}}^{T} {\tilde{p}}_{c o l u m n_{j}} + b_{p_{j}} + b_{u_{i}}

• The problem with the above approach is that, we'd have to retrain for every new customer that came in.• Note: To evaluate the performance of the predictions, we can use MSE on the validation set.• How can we avoid having to retrain these models for every new customer? 1.3. Deep Learning Extension• To avoid the prediction limitation of ALS, we can use the deep learning extension of matrix factorization.– In this case, the row and column of the user-item matrix are the inputs to the NN.– These inputs would have their own separate fully connected layers which would act as an embedding layer.– Then, the results of these embedding layers are combined into a single fully connected layer to finally be sent through a linear activation function.– The NN output would be the prediction for a particular user and item.• This typically requires less retraining. 1.4. Challenges of Collaborative Filtering• Cold-start Problem – We can't generate predictions for brand new users with no information (or items with no purchase history).– There are some ways to mitigate this such as:* Recommending new users popular items* Presenting new items to random subgroups• Echo Chamber– Let's say we recommend an item to a user so that increments some of their values in the user-item matrix.– Now that value has been incremented, it has more weights for other users and it just spins into this positive feedback loop of recommending potentially the same or very similar items over and over.– This doesn't make for a very good customer experience because these users won't see the variety that they need to see (they just see the same thing).– One way to avoid that is to recommend some random new items into the mix. • Shilling Attacks– This happens in systems where everyone can provide a rating.– For example, someone can provide a bunch of fake accounts to promote its own products in a platform (similarly when disliking a product).– We can limit that by allowing one user per phone number for example. 1.5. Content-based Filtering• Content-based filtering represents users with respect to items that they've interacted with.• Now, if we get the dot product of user vector with respect to every other items, and the item with the highest dot product will be recommended.• Content-based filtering doesn't require any user data to make a prediction about a particular user.– So, if you have tons of users, you can use content-based filtering to avoid running KNN.• The downside of content-based filtering is that it requires context about the items, i.e. products are need to evaluated in terms of some of their characteristics. 1.6. Deep Learning Recommender Systems• We can use deep learning to combine collaborative and content-based filtering.• It's going to be a similar NN as described in section ?, but with addition of input (and embedding layers) of user and item features. Back to Top