This page logs my daily data science learning for #66DaysofData, starting 5th January, 2021.

Day 1 - 5th January, 2021

  1. Read all articles in the #DataDecember Initiative
  2. Completed Module 1 of BCG Open-Access Data Science & Advanced Analytics Virtual Experience Program
    • Designed a data science problem based on a given hypothesis

Day 2 - 6th January, 2021

  1. Binged through Ken Jee’s Sports Analytics videos and content
  2. Scraped a website on bribes paid in India using the rvest package in R (Site: http://ipaidabribe.com) and collected 1000 reports of such bribes from the last year
    • 1000 is too few. However, with my assignments I don’t currently have the time to run the scraper for longer and collect more data
    • Potential project : Analyze bribes in India
      • Which department takes the most bribes?
      • Which state collects the most?
      • What is the sentiment of people writing the posts?
      • Bonus: Is the initiative by this website being used correctly?

Day 3 - 7th January, 2021

  1. Worked with the rtweet package to collect tweets from Twitter

Day 4 - 8th January, 2021

  1. Had a perfunctory read over the winning solutions to the Hateful Memes Challenge hosted on DrivenData
  2. Listed out potential projects that I can work on during the next few weeks as a part of #66DaysofData
    • Why?
      • It would be useless to do #66DaysofData if I don’t put it to practical use
      • Will help me improve my portfolio
    • What types of projects?
      • The plan is to work on projects that involve at least one of the following concepts
        • Intensive data cleaning
        • Storytelling with data
        • Predictive analytics - Tabular data
        • Natural Language Processing
        • Computer Vision
        • SQL
        • End-to-end deployment or Building an ML system

Day 5 - 9th January, 2021

  1. Brushed up SQL Basics
    • SQL Cheatsheet
    • SQL Order of Operations
    • What project can I do to ensure I understand SQL better?
      • A web app that uses SQL? (But, I don’t like web development much)
      • A Python - SQL application? (I do not know what this means, should read more)
    • What hands-on experience do I have already?
      • MY470 lectures at my MSc
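
One small way to check the SQL order of operations is to run a query where it matters. The sketch below uses Python's built-in sqlite3 module; the orders table and its rows are made up for illustration. WHERE filters rows before grouping, while HAVING filters the groups after aggregation.

```python
import sqlite3

# Hypothetical table and rows, made up purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("asha", 50), ("asha", 150), ("ben", 20), ("ben", 30), ("chen", 500)],
)

# Written order: SELECT, FROM, WHERE, GROUP BY, HAVING, ORDER BY
# Logical order: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 30          -- filters individual rows before grouping
    GROUP BY customer
    HAVING SUM(amount) > 100    -- filters whole groups after aggregation
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
conn.close()
```

Ben's 30 survives the WHERE clause, but his group total does not pass HAVING, so only two customers come back.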

Day 6 - 10th January, 2021

  1. Followed through on my SQL journey from yesterday with some hands-on usage of the RSQLite package
    • But, the question now is how can I use this new R + SQL skill to make something useful?
  2. Also watched “The most powerful idea in data science” by Cassie Kozyrkov on YouTube

Day 7 - 11th January, 2021

  1. Ran sentiment analysis with the tidytext and sentimentr packages in R and compared their outputs
  2. Put some thought into using R + SQL for a project - Fixed on creating a dummy ETL system
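
A dummy ETL system can be very small. The sketch below is only an illustration of the extract, transform and load steps: it is written in Python with the standard-library sqlite3 module rather than R and RSQLite, and the CSV data, the weather table and the function names are all made up.

```python
import csv
import io
import sqlite3

# Made-up raw data standing in for a real source file.
RAW_CSV = """city,temp_f
London,50
Chennai,95
Oslo,
"""

def extract(text):
    """Extract: read CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop rows with no reading, convert Fahrenheit to Celsius."""
    cleaned = []
    for rec in records:
        if not rec["temp_f"]:
            continue  # skip rows with a missing temperature
        temp_c = round((float(rec["temp_f"]) - 32) * 5 / 9, 1)
        cleaned.append((rec["city"], temp_c))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT city, temp_c FROM weather ORDER BY city").fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to swap the in-memory CSV for a real source later without touching the load step.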


Day 8 - 12th January, 2021

  1. Spent time going over my dissertation proposal and the data I will need to collect to make it happen

    • What data do I need?
      • Qualitative?
      • Quantitative?
    • How will I collect the data?
      • Use secondary resources?
      • Collect it first hand?
        • With human interaction?
        • Without human interaction?
    • How will I store the data?
      • Local system?
      • Cloud?
      • Documentation?
    • What ethical considerations must I be aware of?

Day 9 - 13th January, 2021

  1. Read the article - https://hackernoon.com/going-from-not-being-able-to-code-to-deep-learning-hero-2ou34fh by Radek Osmulski
  2. Went in-depth into R Markdown to make a decent final report for my winter assignment - R Markdown Cookbook (bookdown.org)

Day 10 - 14th January, 2021

  1. Initiated work on https://www.widsconference.org/datathon.html
  2. Also, I have secretly fallen in love with #R’s pipe operator ;)

Day 11 - 15th January, 2021

  1. Continued work on the WiDS 2021 Datathon
  2. Worked on the draft of an article summarizing my favourite tips on working on a data analytics project

Day 12 - 16th January, 2021

  1. Published an article on @TDataScience summarizing my favorite tips to create a good Data Analytics project - https://towardsdatascience.com/how-to-make-a-data-analytics-project-that-people-want-to-read-47caea306570
  2. Went through a Tableau crash course tutorial - https://www.youtube.com/watch?v=TPMlZxRRaBQ

Day 13 - 17th January, 2021

  1. Read through Jeremy Howard’s and team’s Drivetrain approach - Designing great data products – O’Reilly
    • Define your objective
    • Understand the inputs that you can control - Levers
    • Identify the data you need to collect
    • Modelling

Day 14 - 18th January, 2021

  1. Made my first WiDS 2021 challenge kernel public - https://kaggle.com/thedatabeast/wids-2021-tutorial…
    • Resources used
      • https://machinelearningmastery.com/what-is-data-preparation-in-machine-learning/
      • https://www.kaggle.com/parulpandey/a-guide-to-handling-missing-values-in-python
      • https://machinelearningmastery.com/standardscaler-and-minmaxscaler-transforms-in-python/
  2. While doing so, brushed up on my rusty skills in data preparation for modelling tasks
  3. Tried the lazypredict package to compare models, but hit memory issues
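
The scaling article listed in the resources covers standardisation and min-max scaling. As a rough sketch of what those two transforms compute, here they are in plain Python rather than with scikit-learn's StandardScaler and MinMaxScaler; the function names and the sample ages are made up for illustration.

```python
from statistics import mean, pstdev

def standardize(values):
    """Rescale to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)  # assumes the values are not all identical
    return [(v - mu) / sigma for v in values]

def min_max_scale(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)  # assumes hi > lo
    return [(v - lo) / (hi - lo) for v in values]

# Made-up sample column.
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
z = standardize(ages)
scaled = min_max_scale(ages)
print(z)
print(scaled)
```

In a real modelling task the mean, standard deviation, minimum and maximum should be learned from the training split only and then reused on the test split.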

Day 15 - 19th January, 2021

Spent 15 mins to chalk out a rough plan for my first project under the #66DaysofData initiative. The idea is to build a simple image classifier and deploy it as a web application.

Hope to start work on it soon and complete it by mid-February.


Day 16 - 20th January, 2021

  1. Dabbled with the modin package - Promises faster execution than pandas while keeping the pandas API
  2. Read the first 2 chapters of An Introduction to Statistical Learning

Day 17 - 21st January, 2021

Took a break from any kind of data science learning today.

Instead, I went ahead and dedicated the day to my grad school assignment. Will continue to work on this until I submit it, after which I shall return to #66DaysofData


Day 18 - 25th January, 2021

Resumed things after my little break.

Kicked things off by working through the @fastdotai book’s second chapter. Fired up a kernel with under 10 lines of code - https://www.kaggle.com/thedatabeast/fast-ai-resnet18-2-epochs-99-9-accuracy

Not much, but surely a way to start my fast.ai journey!


Day 19 - 26th January, 2021

  1. Listened to the Decision Skills Q&A by Cassie Kozyrkov and Jenny Brown
    • Making a decision takes effort
      • Decision making requires people to be motivated
      • Requires collaboration
      • Important to delegate decision making appropriately
      • Structuring, coordinating and assigning is important to make good decisions
    • It is time society begins to treat “decision-making” as a skill just like engineering or singing
    • Software and decision-making
      • Software can aid in decision making
      • It helps transcend the limitations of the human brain
      • But, software cannot replace the human in the decision-making process
      • A “one-size-fits-all” approach will not work in decision-making software
    • More data is not always better. It has to be more “good” data
    • Decision making in its purest sense is about processing information
    • The most important question data scientists need to be asking stakeholders
      • “What would it take to change your mind?”
      • Forces the stakeholders to think about what exactly they want
  2. Dived into the creation of pipelines in scikit-learn today - 6.1. Pipelines and composite estimators — scikit-learn 0.24.1 documentation (scikit-learn.org)
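
For reference, a minimal scikit-learn pipeline might look like the sketch below. It chains imputation, scaling and a classifier so the same preprocessing is applied at fit and predict time. The tiny dataset, the step names and the choice of estimators are assumptions made for illustration, not taken from any project in this log.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset with one missing value.
X = np.array([
    [1.0, 200.0],
    [2.0, np.nan],
    [3.0, 180.0],
    [8.0, 30.0],
    [9.0, 20.0],
    [10.0, 25.0],
])
y = np.array([0, 0, 0, 1, 1, 1])

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipe = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # fill the missing value
    ("scale", StandardScaler()),                   # zero mean, unit variance
    ("model", LogisticRegression()),               # final estimator
])

pipe.fit(X, y)           # fits every step in order on the training data
preds = pipe.predict(X)  # applies the fitted transforms, then predicts
print(preds)
```

Because the imputer and scaler live inside the pipeline, cross-validation refits them on each training fold, which helps avoid leaking information from the validation fold.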

Break

I have been forced to take a break from 66 Days of Data due to health issues. I will be resuming the campaign soon after revamping my plan for the same.


Day 20 - 17th March, 2021

  1. Completed SQL Basics from Alex Freberg’s YouTube Channel
  2. Obtained the 10 days of statistics badge from HackerRank

Day 21 - 18th March, 2021

  1. Worked on SQL problems on HackerRank
    1. Obtained 2 stars on HackerRank for SQL

Day 22 - 19th March, 2021

  1. Continued working through SQL challenges on HackerRank
  2. Watched some SQL Intermediate videos by Alex Freberg - Highly recommend this to anyone who wants bite-sized lectures to get started with SQL

Day 23 - 20th March, 2021

  1. Continued work on SQL
    1. Focused on joins and unions. Found out that the Venn diagram representation of SQL joins makes it much easier to visualize the output of each type of join
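
The difference between join types and a union is easy to see on two tiny tables. The sketch below uses Python's built-in sqlite3 module; the employees and salaries tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: employee 3 has no salary row, salary row 4 has no employee.
conn.executescript("""
    CREATE TABLE employees (id INTEGER, name TEXT);
    CREATE TABLE salaries (employee_id INTEGER, salary INTEGER);
    INSERT INTO employees VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO salaries VALUES (1, 50000), (2, 60000), (4, 70000);
""")

# INNER JOIN: only ids present in both tables (the overlap of the two circles).
inner = conn.execute("""
    SELECT e.name, s.salary
    FROM employees e
    INNER JOIN salaries s ON e.id = s.employee_id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee, with NULL where no salary row matches.
left = conn.execute("""
    SELECT e.name, s.salary
    FROM employees e
    LEFT JOIN salaries s ON e.id = s.employee_id
    ORDER BY e.id
""").fetchall()

# UNION: stacks the two result sets vertically and removes duplicates.
union = conn.execute("""
    SELECT id FROM employees
    UNION
    SELECT employee_id FROM salaries
    ORDER BY 1
""").fetchall()

print(inner)
print(left)
print(union)
conn.close()
```

Joins combine columns from matching rows side by side, whereas a union appends rows from queries that return the same columns.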

Day 24 - 21st March, 2021

  1. Learnt Google Data Studio
    1. Obtained a certificate from Google Analytics Academy for Introduction to Data Studio
  2. Worked on a data engineering mini-project based on the course by Karolina Sowinska
    1. Link to the repository: ry05/spotify_data_pipeline: A basic data pipeline to learn some data engineering basics (github.com)
    2. Link to Ms Sowinska’s Data Engineering Playlist

Day 25 - 22nd March, 2021 to Day 33 - 30th March, 2021

  1. Dedicated the last full week to the study of An Introduction to Statistical Learning - First Edition
    1. Completed all chapters except chapter 7
    2. Made notes that are available at ry05/ISL_Notes: Draft notes taken while reading An Introduction to Statistical Learning (github.com)