Recommender systems
2024-12-18
Just a couple of seconds at a time, compounding into many hours. I was engaged, though far from intentionally. Yes, I was actively destroying my brain. Every few seconds my mind would be cast in a different direction. I couldn't help it. It just felt so natural.
Oh the algorithm
Having watched The Social Dilemma on Netflix about two years ago in high school, I became aware of just how powerful data is. There's a reason why social media apps account for almost all of the typical teenager's screen time. Embedded within these apps are algorithms that somehow seem to know more about us than we do. The documentary explained that tracking every minute detail – from which person you follow to how long you hover over a single comment – leads to better predictions, be it a prediction of what song you might like or what reel you should watch next. Being sucked into an endless wormhole of short-form content – what we now know as doomscrolling – seemed a scary concept to me. Indeed, some research suggests that the underlying reinforcement-learning-based approaches curate content in a way that makes the human more predictable, so that engagement can be maximized for as long as possible.
No, I refuse. I didn’t like the idea of this black-box algorithm taking over my brain one bit. So I took every preventive measure I could: I set my phone to grayscale, put website blockers on my laptop, and installed Chrome extensions to turn off YouTube Shorts and recommendations.
In hindsight that might seem excessive, but it somewhat worked for me. Without Snapchat or Instagram on my phone I didn't feel any real need to look at it; my mind no longer had a default to go to every time I was mildly bored. But complete abstinence wasn't a solution for very long. You can only stay away from the platforms where everyone you know is 'interacting' with each other for so long. Plus there's some truly valuable content sprinkled here and there. And so my engagement with short-form content has continued, and will for a while.
Tryna build
And so I set out to make a new algorithm. But before that, an MVP. I had only one goal in mind: give the user some sort of control. So I interviewed every consumer of short-form content within a one-floor radius of my college dorm building to better analyze their behavior. There were two common trends. Firstly, no matter what app they used, users only ever had one option: to scroll down. Secondly, after a timed one-minute scrolling session, no one seemed to remember the category of the first few videos they’d watched, or what they’d gotten out of them. Now I’d identified the two features I needed in my MVP.
To test, I scraped 24,000+ TikToks, along with the relevant post and description metadata, from different creators. I then used ViViT – a video vision transformer – to generate embeddings of the short-form content I had just scraped, so I would have an algorithmic way of selecting which videos to show the user. Generating these embeddings quickly is tough (24k videos take time!), so I split each video into a fixed number of frames and took a weighted average of the per-frame embeddings, giving more weight to the later portions of the video since their embeddings capture contextual information coming from the previous parts. I stored these embeddings in Pinecone, a vector database, and used its built-in vector and similarity search functions, which formed the core of my basic algorithm.
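If it helps, here's a minimal sketch of that pooling-and-storage step. The linear weighting ramp, the grove-videos index name, and the pool_frame_embeddings helper are illustrative stand-ins, not my exact pipeline:

```python
import numpy as np
from pinecone import Pinecone  # pip install pinecone

def pool_frame_embeddings(frame_embs: np.ndarray) -> np.ndarray:
    """Weighted average of per-frame embeddings: later frames count more,
    since they carry context accumulated from the earlier frames."""
    n = frame_embs.shape[0]
    weights = np.linspace(1.0, 2.0, n)  # linear ramp (my assumption)
    weights /= weights.sum()
    return (frame_embs * weights[:, None]).sum(axis=0)

# Stand-in for real ViViT frame embeddings: shape (n_frames, dim)
frame_embs = np.random.randn(16, 768).astype("float32")
video_emb = pool_frame_embeddings(frame_embs)

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("grove-videos")  # hypothetical index name

index.upsert(vectors=[{
    "id": "tiktok_000123",
    "values": video_emb.tolist(),
    "metadata": {"creator": "...", "description": "..."},
}])

# "Show me something similar" boils down to one nearest-neighbour query:
similar = index.query(vector=video_emb.tolist(), top_k=5, include_metadata=True)
```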
At this point my algorithm was essentially a super mini-version of TikTok: you scroll down and you see similar videos. But that wasn’t the only purpose. With all the embeddings already in a vector database, I ran k-means clustering on the data to segment the videos into a desired number of categories, and built an automated labeling system using OpenAI’s CLIP to assign a standardized category label to each video based on the metadata stored earlier. I thought of this like a group of trees: videos on similar topics followed one another and were interconnected, some branches were shared across topics, and each tree had a labelled root node. This group of trees was the user’s entire scrolling journey, and they could choose how they wanted to navigate it. It made perfect sense to name the application Grove.
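Sketched in code, the clustering and labelling step might look like this. The category count, the candidate label list, and the idea of scoring CLIP text embeddings of labels against the scraped descriptions are all assumptions layered on top of what I described above:

```python
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import CLIPModel, CLIPProcessor

X = np.random.randn(500, 512).astype("float32")  # stand-in for pooled video embeddings
N_CATEGORIES = 12  # "a desired number of categories"; 12 is a placeholder

kmeans = KMeans(n_clusters=N_CATEGORIES, n_init="auto", random_state=0).fit(X)
# kmeans.labels_[i] is the cluster (tree) that video i belongs to

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
CANDIDATE_LABELS = ["cooking", "fitness", "comedy", "tech", "music"]  # placeholder taxonomy

def label_cluster(descriptions: list[str]) -> str:
    """Pick the candidate label whose CLIP text embedding is, on average,
    closest to the scraped descriptions of the cluster's videos."""
    inputs = proc(text=CANDIDATE_LABELS + descriptions,
                  return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        embs = clip.get_text_features(**inputs)
    embs = embs / embs.norm(dim=-1, keepdim=True)
    label_embs = embs[:len(CANDIDATE_LABELS)]
    desc_embs = embs[len(CANDIDATE_LABELS):]
    scores = (desc_embs @ label_embs.T).mean(dim=0)  # avg similarity per label
    return CANDIDATE_LABELS[scores.argmax().item()]

print(label_cluster(["easy 10 minute pasta recipe", "how I meal prep for the week"]))
```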
So what’s the point? Instead of just scrolling down mindlessly, a Grove user can now see which category of video they're viewing and can scroll left or right to a different category if they desire. After verifying that the label generation was decently accurate and that the algorithm gave reasonably interesting recommendations when the user scrolled down, I put together a web app with these two main features. The user logs into Grove and, after a quick onboarding process that involves selecting a couple of categories, is taken to a TikTok-like scrolling page. Here, the category of the current content is clearly displayed, and the user can scroll down to see similar videos (go deeper into the tree), or go left or right to a different category (visit another branch/subtree).
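Under the hood, all three scroll directions can reduce to filtered nearest-neighbour queries. Here's a rough sketch, assuming each stored vector carries a category metadata field from the clustering step and that the categories are arranged in a ring; the function names and field names are illustrative:

```python
CATEGORIES = ["cooking", "fitness", "comedy", "tech", "music"]  # from the labelling step

def adjacent_category(category: str, direction: str) -> str:
    """Treat the category labels as a ring; left/right steps around it."""
    i = CATEGORIES.index(category)
    return CATEGORIES[(i - 1 if direction == "left" else i + 1) % len(CATEGORIES)]

def next_video(index, current_emb: list[float], category: str, direction: str):
    if direction == "down":
        # Deeper into the current tree: nearest in-category neighbour.
        res = index.query(vector=current_emb, top_k=2, include_metadata=True,
                          filter={"category": {"$eq": category}})
        return res["matches"][1]  # matches[0] is the current video itself
    # "left" / "right": hop to a neighbouring branch.
    res = index.query(vector=current_emb, top_k=1, include_metadata=True,
                      filter={"category": {"$eq": adjacent_category(category, direction)}})
    return res["matches"][0]
```

Keeping the query anchored on the current embedding means a left/right hop lands on the corner of the neighbouring category closest to whatever the user was just watching, rather than dropping them somewhere random in the new branch.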
Looking back on the testing I did after building the MVP, users consistently found it an interesting concept. Still, there are many improvements to be made in the video categorization and scrolling-tree creation processes, and in moving away from a static feed of the same 24,000 TikTok videos to something more scalable and dynamic.
I’m still adding features and improving the overall application experience. For example, the settings page now shows a breakdown of what your feed is made up of. I've been seeing a lot of Indian beta content on mine recently.
Looking back
Editing this post on the 31st of July, 2025. There's a lot here that could be improved, but in reality I just needed to ship faster. Incoming.