This page contains details about projects that I have worked on my own time and out of my own interest.

If you would like to know about any project in particular, I would love to engage in a conversation.


Playground Projects

All projects in this section are pieces I have worked on prior to getting my first break as a data professional.

TED Talk Content Analysis

A project analysing how the content of TED talks have evolved over the years in terms of the presentation styles of speakers as well as the topics discussed.

The Project Overview: The project involves fresh collection of data from the official TED website with web scraping and quantitiative text analysis of the transcripts. The project is completely documented and can provide a blueprint for a side-project on NLP. It was submitted as the Final Project for MY459 (Quantitative Text Analysis) for the MSc Applied Social Data Science 2020-21 degree at LSE.

The Links to the Project :

Keywords : Web scraping, NLP, Python


Analyzing Kaggle DS and ML Survey 2020

An analysis of the survey focused on Indian respondents who were under the age of 21 years

The Project Overview : In this project, I present an attempt to explore the Kaggle survey responses of young data science aspirants from India and to understand their current state in data science by dissecting my finds across multiple themes.

The Links to the Project :

Keywords : Data analysis, Data storytelling, Tabular data, Python


Making every bit count - Where to invest to combat air pollution in India?

Identifying the cities that require immediate attention to their increasing air pollution

The Project Overview : This project involves a study on India’s growing unrest with respect to air quality issues. It takes into account quantitative data depicting AQI levels in differnet cities across the country and makes a suggestion as to which cities most require external investment to improve the quality of air and thus, reduce the detrimental effects on the population.

The Links to the Project :

Keywords : Data Analysis, Data storytelling, Time series data, Python, Data for Good


Understanding the Extent of Police Abuse in the United States of America

Reviewing both sides of the coin with the help of publicly available data

The Project Overview : Policing is no-doubt a difficult job. But, its also something that provides too much power into the hands of a few individuals. In such a social setup, it becomes difficult to hold people with authority accountable for their actions. The issue of racial bias in the policing system in US has been discussed widely and largely across nations. However, most views are single-dimensional. In this project, I make an attempt to provide different perspectives of the problem.

The Links to the Project :

Keywords : Data analysis, Data storytelling, Data for Good, Python, Tabular data


Amplifying the Impact of the CTDS Shows on YouTube

Dissecting the “success” of the podcast’s YouTube episodes and towards formulating a strategy to reach a wider audience

The Project Overview : This project is my official submission to the CTDS.show contest held on Kaggle. The task was to unearth insights and tell an interesting/actionable story with the dataset of the Chai Time Data Science podcasts. I won the contest and more details on my experience on this contest has been described in this blog post.

The Links to the Project : The following links provide further details on my work in this project.

Keywords : Data analysis, Data storytelling, Text data, Python, Tabular data, Youtube analytics


CouReco - Simple Course Recommendation System for Online Courses

The Project Overview : CouReco is a minimalistic system built on the idea to help learners navigate through the courses on Coursera, aided by a data-driven strategy. Currently, CouReco only performs the task of identifying the most similar and most dissimilar courses to a selected course that is chosen by the learner from a pool of courses relevant to the skills the learner is interested in.

The Links to the Project : The project’s detailed code and related documentation can be found here

Keywords : Web scraping, Data analysis, Recommender systems, Product building, Python, Streamlit


Leveraging Data for Social Good — A Practical Example

The Project Overview : Studied the needs of a rural, backward and socially-isolated village in Northern India in the context of their developmental goals and social aspirations. Established that “Water stress” arising from poor water management practices was a significant hindrance to the village’s development. A human-centered methodology was followed to collect both qualitative as well as quantitative data and the analysis was performed using both computational techniques as well as qualitative methods. The findings were informed to the stakeholders as well as relevant policy-makers.

The Links to the Project : I have already covered this project in detail via an article in Medium and the code is available on my Github account. So, I shall leave those links out here in case you wish to know more.

Keywords : Data for Good, Data analysis, Data collection, Python, Ethnographic study


Criminal Activity Hotspot Analysis

The Project Overview : The project finds relevance in analysing the prevelance of areas of high crime rates(called hotspots) in cities. Through our work, we have gone a little deeper than conventional studies of criminal activity hotspots and have established a relation that If crime can vary across months, across days in the week and even across hours of the very same day; it is definitely a fact that the rate of crime will not remain constant across districts within a city at any point of time. Some may face greater vulnerabilities to criminal activity than others. Based on this idea, we put together a predictive modelling study that helped create a model that would be able to predict the relative susceptibilities of districts in a city based on past crime records in these regions at given time points(a time point is made up of hour, day and month).

Our final work also took shape as a conference paper at CSITSS 2019 held at RVCE, Bengaluru, Karnataka, India. This paper won the “Best Paper” award at the same conference.

The Links to the Project : The project’s detailed code and related documentation can be found here

Keywords : Supervised machine learning, Time series data, Python


Analyzing Chocolate Bar Ratings

The Project Overview : Exploratory Data Analysis(EDA) is one of the most crucial steps in a Data Science project. EDA in basic terms is a way of “Understanding the data with the help of visualizations and descriptive statistics”. A lot of courses I signed up for in the past emphasised on the need to use EDA to understand datasets and use this information to either model a ML/DL problem or tell a convincing story about the dataset to a stakeholder. Therefore, through this project I made an attempt at improving my EDA skills using the Chocolate Bar Ratings Dataset.

The Links to the Project : The project’s detailed code and related documentation can be found here

Keywords : Data analysis, Data storytelling, Tabular data, Python