Latest Posts

  • Kaggle Journey to Competitions Master

    I started entering Kaggle competitions near the start of 2021. I had previously been working on a few machine learning side projects, but since starting to work full-time as an ML engineer I found that I didn’t really have the time or energy to devote to a full machine learning project lifecycle on top of doing the same at my job.
  • Jigsaw Rate Severity of Toxic Comments 14th Place Solution

    The goal of this competition was to build a system which can predict how toxic online comments are. What separated this from other similar sentiment analysis tasks was that the comments were divided randomly into pairs and then the comments in each pair were ranked by annotators.
  • Predicting the CEFR Level of English Texts

    This time I wanted to see if I could predict the CEFR level of a given text. This kind of system is a useful tool for teachers or self-studying students as it helps them find reading material of an appropriate difficulty level.
  • Token Classification With Subword Tokenizers for Bulgarian

    The number of tools for Bulgarian language learners seems pretty limited, so I thought I’d try building my own. I wanted to know what the individual words in the sentences I was Google-translating were doing, so I decided to train a part-of-speech (POS) tagger. While I was at it I also trained a model for named-entity recognition (NER).
  • Classifying Heavy Metal Subgenres with Mel-spectrograms

    Training a neural network to classify music genres is not a new idea, but I thought it would be interesting to see if one could be trained to classify subgenres on a more precise level.
  • Using Spotipy to Collect Track Data

    For a recent project on classifying music genres, I needed to collect a large dataset of labelled tracks. The Spotify API is ideal for this because, along with a variety of tabular track data, you can download 30-second samples for the majority of tracks. An easy way to use the Spotify API in Python is through Spotipy.
  • Generating Questions Using Transformers

    The original goal of this project was to create a system to allow independent learners to test themselves on a set of questions about any text that they choose to read. This means that a learner would be able to pick texts that are about topics they find interesting, which will motivate them to study more. In order to achieve this, I decided to train a neural network to generate questions.