Recommender systems across all domains (media, e-commerce, online travel), suggesting all kinds of content (videos, text articles, retail product SKUs, etc.), face a standard set of issues:
Cold start
For a new user, the recommender system struggles to suggest the right content: it has no idea what interests them, and until it gathers more information through their behaviour it has no strong signal of what might engage them. The same issue applies to new content: the system has no information on user affinities for the item and hence no idea of which users to suggest it to.
- The information already known about the user can be used to infer their interests, such as their location or the referring source that brought them to the website (search engine, advertisement, etc.)
- Content-based recommendations such as ‘Many users who viewed this also viewed’ and ‘Similar content to the viewed item’ work off the items the user interacts with, so they can provide meaningful suggestions even when little is known about the user themselves
- While behavioural data on the user is still sparse, smaller signals of intent such as dwell time on certain pages or product views can be used to drive suggestions
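The ‘similar content’ fallback above can be sketched as item-to-item cosine similarity over whatever item features are available; the feature matrix and values here are hypothetical, purely for illustration:

```python
import numpy as np

def similar_items(features: np.ndarray, item: int, k: int = 2) -> list[int]:
    """Indices of the k items most similar to `item` by cosine similarity."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit @ unit[item]
    sims[item] = -np.inf              # never recommend the item itself
    return [int(i) for i in np.argsort(sims)[::-1][:k]]

# Toy item features (e.g. one-hot genre flags plus a normalized price) --
# hypothetical data.
items = np.array([
    [1.0, 0.0, 0.2],   # item 0
    [1.0, 0.0, 0.3],   # item 1: near-duplicate of item 0
    [0.0, 1.0, 0.9],   # item 2: different genre entirely
])
print(similar_items(items, item=0, k=1))   # → [1]
```

Because this needs only item features, it works on day one for a brand-new item with zero interactions.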
Scalability
A recommender system holds every user's interactions with every piece of content in memory to infer context. For m users and n pieces of content this means an m × n matrix, which becomes massive and slows down every computation that touches it.
- This matrix is bound to contain mostly zeroes, since users won't have interacted with most of the content on most websites. Matrix factorization can be used here to reduce the dimensionality.
- The users in the matrix can be replaced with segments of users instead, which brings the row count down drastically.
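A minimal sketch of matrix factorization, assuming explicit ratings with 0 marking unobserved cells; the toy matrix, rank k, and learning-rate values are illustrative, not tuned:

```python
import numpy as np

def factorize(R, k=2, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Factor ratings matrix R (0 = unobserved) into user factors P (m x k)
    and item factors Q (n x k) via SGD over the observed entries only."""
    rng = np.random.default_rng(seed)
    m, n = R.shape
    P = rng.normal(scale=0.1, size=(m, k))
    Q = rng.normal(scale=0.1, size=(n, k))
    obs = np.argwhere(R > 0)              # indices of observed ratings
    for _ in range(steps):
        for u, i in obs:
            err = R[u, i] - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Hypothetical 4x4 ratings matrix; zeros are missing entries.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
P, Q = factorize(R)
pred = P @ Q.T   # dense predictions: observed cells tracked, gaps filled in
```

Instead of storing and scanning the full m × n matrix, the system keeps only the two thin m × k and n × k factor matrices; the zeros are treated as unobserved rather than as dislike, which is what makes the sparsity workable.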
Serendipity
Serendipity gives users the chance to explore and find new content that delights or interests them. Many algorithms rigidly serve only the content their computations point to and never surprise the user with unexpectedly interesting items.
- Websites should have a section dedicated to content that could be considered non-mainstream yet is still likely to interest the user
- For any algorithm, an item should be thrown into the mix that is chosen independently of the algorithm and would appeal to everyone
- For any algorithm, items should be thrown into the mix that are related to the original items but one step removed in some regard, such as the parent of the item, a different brand, or the next price bracket
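One simple way to operationalize the ‘throw an item into the mix’ ideas above is a probabilistic slot swap in the final ranked list; the function name, slot position, and probability are hypothetical choices:

```python
import random

def inject_serendipity(ranked, candidates, slot=2, p=0.3, rng=None):
    """With probability p, replace the item at `slot` in the ranked list
    with a random pick from a serendipity pool (items the main
    algorithm would not have surfaced on its own)."""
    rng = rng or random.Random()
    result = list(ranked)
    pool = [c for c in candidates if c not in result]
    if pool and rng.random() < p:
        result[slot] = rng.choice(pool)
    return result

ranked = ["top_hit", "runner_up", "also_ran", "filler"]
serendipity_pool = ["cult_classic", "hidden_gem"]
print(inject_serendipity(ranked, serendipity_pool, slot=2, p=1.0))
```

Keeping the swap probabilistic means the list stays mostly algorithm-driven, while every impression carries a small chance of surfacing something unexpected.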
Handling the long tail
For any website there will always be content that isn't mainstream and appeals only to a certain set of users; such content tends to be passion- or hobby-centric. Discovery of these items is difficult because most algorithms push the more popular, better-performing content.
- User behaviour on the website should be analysed: if it turns out that a user generally interacts with more long-tail content, the user is more experimental and thus more likely to try more of it
- The user's category preferences can be used to push more long-tail content within the same categories