Recommender System
Table of Content
1. Recommender Systems 1.1. Collaborative Filtering 1.1.1. User-based Collaborative Filtering 1.1.2. Item-based Collaborative Filtering 1.1.3. Considerations on Memory-based Filtering 1.2. Matrix Factorization 1.2.1. Implicit Ratings 1.2.2. Alternating Least Squares (ALS) 1.2.3. Predicting with ALS 1.3. Deep Learning Extension 1.4. Challenges of Collaborative Filtering 1.5. Content-based Filtering 1.6. Deep Learning Recommender Systems
1. Recommender Systems1.1. Collaborative FilteringThe goal of collaborative filtering is to use the user-item matrix to get an idea of how a user would respond to an unseen item.1.1.1. User-based Collaborative FilteringIn user-based collaborative filtering, the goal is to find similar users and recommend to a a user a particular item that is used by other similar users. Note: If we're using binary indicators in the user-item matrix, we can use Jaccard/Cosine/Hamming metrics to measure similarity. Jaccard Similarityno. matching1sno. matching1s+no.ui=1&uj=0+no.ui=0&uj=1* * Note: We don't count the number of matching 0s here.Cosine Similaritynuiujnui2.nuj2 Hamming Distance → sum of the differences → i=1n1(xiyi) How do we predict for a particular user (given the user similarities?)respo^nseup=usim(up,ui).responseuiusim(up,ui) Note: We can use the KNN to find k number of neighbor users for prediction.Note: We can use MSE to evaluate our predictions.Note: Generally, we'd like to recommend the item that generates the highest predicted value.Note: If the user-item is not binary, we can use the following similarity measures:EuclideanManhattan i=1n|xi-yi|Pearson Correlationn(u1-u¯1)(u2-u¯2)n(u1-u¯1)2n(u2-u¯2)2CosineSteps of user-based collaborative filtering:Find KNN of ui to form prediction for.Calculate the similarity score between the neighbor usersUsing the KNN, predict the response for ui. 1.1.2. Item-based Collaborative FilteringHere, we use the similarity between the items to make recommendations. Here, we calculate the item-item similarity.To predict user response, we use a similar formula as above → respo^nsepl=psim(pl,pi).responsepipsim(pl,pi) Steps of item-based collaborative filtering:Calculate item-item similarity matrixPredict response for uiRecommend the item with the highest prediction 1.1.3. Considerations on Memory-based FilteringTime complexity of user-basedO(u.p)O(u+p)Greater diversity More Expensive (because of KNN calculations)Time complexity of item-basedO(u,p2)O(u.p)Less re-calculations (items change less than users)Lack of diversity The user-item matrices are sparse, so the → denotes the effective time complexity.Both user-based and item-based are collectively part of memory-based filtering.Time Decay: We can apply time decay to the predictions as follows → respo^nseup=usim(up,ui).dt.responseuiusim(up,ui) where dt=0.5thalf-life Time decay means that as time goes on, less influence will be given to a particular rating.Inverse User Frequency: Another thing that we can do is to apply more weights to less frequent items → respo^nsepl=psim(pl,pi).fi.responsepipsim(pl,pi) where fi=log(Nn+1) where * N → total number of users and* n → no. of users who interacted with item i.fi is called Inverse User Frequency (IUF). 1.2. Matrix FactorizationOne problem withe memory-based approaches is that the user-item matrix (or item-item similarity) matrix is extremely large.One approach we can use is Matrix Factorization → it takes the same user-item matrix and factorizes it into some terms as followsU.P=u~Tp~+bp+buHere, both u~T and p~ need to be learned.Also, note that we have two bias terms, one for the users, bp, and one for the items, bu.The loss function for matrix factorization is,L=1NN(U.P-u~Tp~-bp-bu)2+𝜆(u||ui||2+p||pi||2+u||bu,i||2+p||bp,i||2)The second term is the regularization term. 1.2.1. Implicit RatingsIf we have some binary user item matrix, we can transform that into a non-binary matrix of implicit ratings. Example: Let's say the implicit rating (rimp) of an item is like this,* 1view* 2like* 3commentWe can multiply (1+𝛼rimp) to all the binary values where 𝛼 is an adjusting parameter which can be tuned through cross validation.With the implicit ratings, the loss function looks like this, L=1NN(U.P.Cpu-u~Tp~-bp-bu)2+regularization1.2.2. Alternating Least Squares (ALS)Since we need to optimize with respect to both u~T and p~ , we can't use ordinary least squares (like in linear regression).We have to use something called Alternating Least Squares (found in libraries such as Spark ML).ALS keeps constant one of u~T or p~ and performs OLS on the other one. Then, it would fix the other one, and runs an OLS on the one that kept fixed in the previous round.ALS performs well in practice and we can also tune the dimensions of U and P. 1.2.3. Predicting with ALSHere's the initial equation → U.P=u~Tp~+bp+buBy estimating this equation with ALS, we can get prediction for a particular user and an item as follows → pred(ui,pj)=u~rowiTp~columnj+bpj+bui The problem with the above approach is that, we'd have to retrain for every new customer that came in.Note: To evaluate the performance of the predictions, we can use MSE on the validation set.How can we avoid having to retrain these models for every new customer? 1.3. Deep Learning ExtensionTo avoid the prediction limitation of ALS, we can use the deep learning extension of matrix factorization.In this case, the row and column of the user-item matrix are the inputs to the NN.These inputs would have their own separate fully connected layers which would act as an embedding layer.Then, the results of these embedding layers are combined into a single fully connected layer to finally be sent through a linear activation function.The NN output would be the prediction for a particular user and item.This typically requires less retraining. 1.4. Challenges of Collaborative FilteringCold-start Problem We can't generate predictions for brand new users with no information (or items with no purchase history).There are some ways to mitigate this such as:* Recommending new users popular items* Presenting new items to random subgroupsEcho ChamberLet's say we recommend an item to a user so that increments some of their values in the user-item matrix.Now that value has been incremented, it has more weights for other users and it just spins into this positive feedback loop of recommending potentially the same or very similar items over and over.This doesn't make for a very good customer experience because these users won't see the variety that they need to see (they just see the same thing).One way to avoid that is to recommend some random new items into the mix. Shilling AttacksThis happens in systems where everyone can provide a rating.For example, someone can provide a bunch of fake accounts to promote its own products in a platform (similarly when disliking a product).We can limit that by allowing one user per phone number for example. 1.5. Content-based FilteringContent-based filtering represents users with respect to items that they've interacted with.Now, if we get the dot product of user vector with respect to every other items, and the item with the highest dot product will be recommended.Content-based filtering doesn't require any user data to make a prediction about a particular user.So, if you have tons of users, you can use content-based filtering to avoid running KNN.The downside of content-based filtering is that it requires context about the items, i.e. products are need to evaluated in terms of some of their characteristics. 1.6. Deep Learning Recommender SystemsWe can use deep learning to combine collaborative and content-based filtering.It's going to be a similar NN as described in section ?, but with addition of input (and embedding layers) of user and item features. Back to Top