YouTube-8M Kaggle Challenge: Learnings and Takeaways
By: Chuck Cho, Senior AI Research Scientist at Axon
Axon's AI team participated in a Kaggle challenge for a video classification problem. Kaggle is an online competition platform where machine learning (ML) researchers and practitioners compete for prizes, recognition, learning opportunities, or perhaps one more line on their resumes. We entered "The 2nd YouTube-8M Video Understanding Challenge" and finished in the top 4% (17th place among 394 teams). The goal of the challenge was to develop an ML algorithm that accurately predicts the (possibly multiple) labels associated with each unseen test video. As the name suggests, the dataset consists of millions of YouTube videos covering a diverse set of concepts and topics (about 4,000 classes; see Figure 1).
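The phrase "possibly multiple labels" is what makes this a multi-label problem: instead of picking a single best class, the model scores each of the ~4,000 classes independently and returns every class that clears a threshold. The sketch below is a hypothetical illustration of that output stage (the function name, scores, and threshold are ours, not from the challenge):

```python
# Hypothetical sketch of multi-label prediction: each class gets an
# independent probability (e.g. from a per-class sigmoid output), and
# every class whose score clears the threshold becomes a predicted label.

def predict_labels(class_scores, threshold=0.5):
    """Return all labels whose independent score meets the threshold."""
    return sorted(
        label for label, score in class_scores.items() if score >= threshold
    )

# A single video can legitimately receive several labels at once:
scores = {"Vehicle": 0.91, "Car": 0.84, "Cooking": 0.03, "Music": 0.62}
print(predict_labels(scores))  # ['Car', 'Music', 'Vehicle']
```

Contrast this with single-label classification, where a softmax would force the classes to compete and only one label could survive per video.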
Video classification opens a pathway to video understanding in general. Once we have an efficient and scalable solution for it, plenty of exciting applications follow: video summarization (condensing a long, and mostly uneventful, video into a concise one containing only the meaningful highlights), smart playback (automatically adjusting playback speed based on how interesting each segment is), video retrieval ("find the red Chevy van"), video matching (given a query video, finding similar videos), automatic report generation (describing the content of a video in report form), and so forth.
Our team combined thoughtful project management and planning, cooperation and coordination of action items among teammates, best practices in machine learning, and the tenacity to compete. The Axon team was made up of researchers and engineers from Seattle and Ho Chi Minh City who volunteered to take on the challenge as a side project, mostly in their spare time. We are very proud that we earned a top-4% finish and were invited to present our approach at the European Conference on Computer Vision (ECCV), a prestigious computer vision conference.
As a team, we learned a few things: a clear strategy, teamwork, a methodical design of experiments, and a focused, positive mindset all helped us succeed. We kicked off the competition by identifying and itemizing the most important and time-sensitive tasks, then prioritizing and splitting the work among teammates. This way, every one of us had clear ownership of a well-defined component.
Given the tight deadline (we had only two months, and each of us could spend at most a few hours a week), a systematic and efficient way to design and run experiments was crucial. For example, from the get-go we sub-sampled the training data (using only 1/2) and the validation data (using only 1/10) to speed up training and validation, while keeping the impact of each incremental experiment representative of results on the full dataset.
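The key to this kind of sub-sampling is determinism: every experiment must see the same subset, or incremental comparisons become noise. A minimal sketch of that idea, assuming the data is sharded into files (the file names and fractions here are illustrative, not the actual pipeline):

```python
import random

# Hypothetical sketch of fixed sub-sampling: keep a fraction of the
# data shards, chosen with a seeded shuffle so every experiment sees
# the exact same subset and results stay comparable across runs.

def subsample(files, fraction, seed=42):
    """Deterministically keep `fraction` of the input file list."""
    files = sorted(files)        # canonical order before shuffling
    rng = random.Random(seed)    # fixed seed -> same subset every run
    rng.shuffle(files)
    keep = max(1, int(len(files) * fraction))
    return files[:keep]

# Illustrative shard lists, mirroring the 1/2 train and 1/10 val split:
train_files = [f"train-{i:04d}.tfrecord" for i in range(1000)]
val_files = [f"val-{i:04d}.tfrecord" for i in range(100)]

train_subset = subsample(train_files, 1 / 2)   # 500 of 1000 shards
val_subset = subsample(val_files, 1 / 10)      # 10 of 100 shards
```

Because the seed is fixed, two researchers running separate experiments train and validate on identical subsets, so a metric delta between their runs reflects the model change rather than the data draw.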
Lastly, we stayed unwaveringly focused on winning and kept working as planned even when we fell out of the top spots on the leaderboard from time to time (at one point we ranked 40th).
This was a fun and rewarding side project, and we're happy that the Axon AI team finished so high on the leaderboard.
Figure 1: A tag-cloud representation of the top 200 entities. Font size is proportional to the number of videos labeled with the entity. Credit: Google AI, https://arxiv.org/abs/1609.08675