This page contains my daily bit of data science learning for #66DaysofData from January 5th, 2021.

Day 1 - 5th January, 2021

  1. Read all articles in the #DataDecember Initiative
  2. Completed Module 1 of BCG Open-Access Data Science & Advanced Analytics Virtual Experience Program
    • Designed a data science problem based on a given hypothesis

Day 2 - 6th January, 2021

  1. Binged through Ken Jee’s Sports Analytics videos and content
  2. Scraped a website on bribes paid in India using the rvest package in R (Site: http://ipaidabribe.com) and collected 1000 instances of such bribes in the last year
    • 1000 is too few. However, with my assignments, I don’t currently have time to run the scraper for longer and collect more data
    • Potential project : Analyze bribes in India
      • Which department takes the most bribes?
      • Which state collects the most?
      • What is the sentiment of people writing the posts?
      • Bonus: Is the initiative by this website being used correctly?

Day 3 - 7th January, 2021

  1. Worked with the rtweet package to collect tweets from Twitter

Day 4 - 8th January, 2021

  1. Had a perfunctory read over the winning solutions to the Hateful Memes Challenge hosted on DrivenData
  2. Listed out potential projects that I can work on during the next few weeks as a part of #66DaysofData
    • Why?
      • It would be useless to do #66DaysofData if I don’t put it to practical use
      • Will help me improve my portfolio
    • What types of projects?
      • The plan is to work on projects that involve at least one of the following concepts
        • Intensive data cleaning
        • Storytelling with data
        • Predictive analytics - Tabular data
        • Natural Language Processing
        • Computer Vision
        • SQL
        • End-to-end deployment or Building an ML system

Day 5 - 9th January, 2021

  1. Brushed up SQL Basics
    • SQL Cheatsheet
    • SQL Order of Operations
    • What project can I do to ensure I understand SQL better?
      • A web app that uses SQL? (But, I don’t like web development much)
      • A Python - SQL application? (I do not know what this means, should read more)
    • What hands on experience do I have already?
      • MY470 lectures at my MSc
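
A minimal sketch of what a "Python - SQL application" could look like, assuming Python's built-in sqlite3 module is an acceptable stand-in for a real database. The sales table and its rows are made up for illustration. The comments in the query mark the logical order in which the clauses are evaluated, which differs from the order in which they are written.

```python
import sqlite3

# In-memory database; the table name, columns, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("north", "widget", 500),
        ("north", "widget", 1500),
        ("north", "gadget", 200),
        ("south", "widget", 50),
        ("south", "gadget", 3000),
        ("south", "gadget", 1000),
    ],
)

# Written order:  SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY, LIMIT
# Logical order:  FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY, LIMIT
query = """
    SELECT product, SUM(amount) AS total   -- 5. compute the output columns
    FROM sales                             -- 1. pick the source table
    WHERE amount >= 100                    -- 2. filter individual rows
    GROUP BY product                       -- 3. form groups
    HAVING SUM(amount) > 1000              -- 4. filter whole groups
    ORDER BY total DESC                    -- 6. sort the result
    LIMIT 5                                -- 7. cap the number of rows
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [('gadget', 4200), ('widget', 2000)]
```

The 50-unit row is dropped by WHERE before grouping, so it never contributes to the widget total. This is the kind of detail the order of operations makes easy to reason about.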

Day 6 - 10th January, 2021

  1. Followed through on my SQL journey from yesterday with some hands on usage of the RSQLite package
    • But, the question now is how can I use this new R + SQL skill to make something useful?
  2. Also, watched “The most powerful idea in data science” by Cassie Kozyrkov on YouTube

Day 7 - 11th January, 2021

  1. Performed and compared sentiment analysis outputs with the tidytext package in R and the sentimentr package
  2. Put some thought into using R + SQL for a project - Fixed on creating a dummy ETL system


Day 8 - 12th January, 2021

  1. Spent time going over my dissertation proposal and the data I will need to collect to make it happen

    • What data do I need?
      • Qualitative?
      • Quantitative?
    • How will I collect the data?
      • Use secondary resources?
      • Collect it first hand?
        • With human interaction?
        • Without human interaction?
    • How will I store the data?
      • Local system?
      • Cloud?
      • Documentation?
    • What ethical considerations must I be aware of?

Day 9 - 13th January, 2021

  1. Read the article - https://hackernoon.com/going-from-not-being-able-to-code-to-deep-learning-hero-2ou34fh by Radek Osmulski
  2. Went in-depth into R Markdown to make a decent final report for my winter assignment - R Markdown Cookbook (bookdown.org)

Day 10 - 14th January, 2021

  1. Initiated work on https://www.widsconference.org/datathon.html
  2. Also, I have secretly fallen in love with #R’s pipe operator ;)

Day 11 - 15th January, 2021

  1. Continued work on the WiDS 2021 Datathon
  2. Worked on the draft of an article summarizing my favourite tips on working on a data analytics project

Day 12 - 16th January, 2021

  1. Published an article on @TDataScience summarizing my favorite tips to create a good Data Analytics project - https://towardsdatascience.com/how-to-make-a-data-analytics-project-that-people-want-to-read-47caea306570
  2. Went through a Tableau crash course tutorial - https://www.youtube.com/watch?v=TPMlZxRRaBQ

Day 13 - 17th January, 2021

  1. Read through Jeremy Howard and team’s Drivetrain approach - Designing great data products – O’Reilly
    • Define your objective
    • Understand the inputs that you can control - Levers
    • Identify the data you need to collect
    • Modelling

Day 14 - 18th January, 2021

  1. Made my first WiDS 2021 challenge kernel public - https://kaggle.com/thedatabeast/wids-2021-tutorial…
    • Resources used
      • https://machinelearningmastery.com/what-is-data-preparation-in-machine-learning/
      • https://www.kaggle.com/parulpandey/a-guide-to-handling-missing-values-in-python
      • https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
  2. While doing so, brushed up on my rusty skills in data preparation for modelling tasks
  3. Tried the lazypredict package to compare models, but hit memory issues
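
As a refresher on the scaling step covered in the resources above, here is a plain-Python sketch of what standardization and min-max scaling compute. It does not use scikit-learn itself; the function names and the sample values are my own, and mapping a constant column to zeros is a choice made for this sketch. Standardization here uses the population standard deviation, which matches how StandardScaler computes it.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to zero mean and unit variance (population std)."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no range; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


ages = [20, 30, 40, 50, 60]  # made-up feature column

print(standardize(ages))
print(min_max_scale(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

Standardization suits models that assume roughly centered features, while min-max scaling keeps values in a fixed range but is more sensitive to outliers.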

Day 15 - 19th January, 2021

Spent 15 mins to chalk out a rough plan for my first project under the #66DaysofData initiative. The idea is to build a simple image classifier and deploy it as a web application.

Hope to start work on it soon and complete it by mid-February.


Day 16 - 20th January, 2021

  1. Dabbled with the modin package - It aims to be a drop-in replacement for pandas (same API) that runs faster by parallelizing work across cores
  2. Read the first 2 chapters of An Introduction to Statistical Learning

Day 17 - 21st January, 2021

Took a break from any kind of data science learning today.

Instead, I dedicated the day to my grad school assignment. I will continue to work on it until I submit it, after which I shall return to #66DaysofData.


Day 18 - 25th January, 2021

Resumed things after my little break.

Kicked things off by working through the @fastdotai book’s second chapter. Fired up a kernel with under 10 lines of code - https://www.kaggle.com/thedatabeast/fast-ai-resnet18-2-epochs-99-9-accuracy

Not much, but surely a way to start my fast.ai journey!


Day 19 - 26th January

  1. Listened to the Decision Skills Q&A by Cassie Kozyrkov and Jenny Brown
    • Making a decision takes effort
      • Decision making requires people to be motivated
      • Requires collaboration
      • Important to delegate decision making appropriately
      • Structuring, coordinating and assigning is important to make good decisions
    • It is time society begins to treat “decision-making” as a skill just like engineering or singing
    • Software and decision-making
      • Software can aid in decision making
      • It helps transcend the limitations of the human brain
      • But, software cannot replace the human in the decision-making process
      • A “one-size-fits-all” approach will not work in decision-making software
    • More data is not always better. It has to be more “good” data
    • Decision making in its purest sense is about processing information
    • The most important question data scientists need to be asking stakeholders
      • “What would it take to change your mind?”
      • Forces the stakeholders to think about what exactly they want
  2. Dived into the creation of pipelines in scikit-learn today - 6.1. Pipelines and composite estimators — scikit-learn 0.24.1 documentation (scikit-learn.org)
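
A minimal sketch of a scikit-learn pipeline, assuming a scaling step followed by a logistic regression. The synthetic dataset, the step names ("scaler", "model"), and the choice of estimator are illustrative assumptions, not something taken from the documentation page above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, generated only for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each step is a (name, estimator) pair; every step but the last must be a transformer.
pipe = Pipeline(
    steps=[
        ("scaler", StandardScaler()),
        ("model", LogisticRegression()),
    ]
)

# fit() fits the scaler on the training data only, then trains the model on the scaled data.
pipe.fit(X_train, y_train)

# score() applies the already-fitted scaler to the test data before predicting.
accuracy = pipe.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.2f}")
```

The main benefit is that preprocessing is learned from the training split alone, which avoids leaking test-set statistics into the model.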

Day 20 - 27th January