This post is part of the series “Watching My Blood Sugar Levels”. I intend to use short posts in this series to walk readers through my initiative of using data to manage my Type I Diabetes. The content in this series is not expert opinion; it is a record of my personal journey with this condition, told in the way I understand it best - through data.

The previous three posts in this series were:

  1. Watching My Blood Sugar Levels - The Beginning (ry05.github.io)
  2. Watching My Blood Sugar Levels - Problem Framing (ry05.github.io)
  3. Watching My Blood Sugar Levels - Eliciting Requirements (ry05.github.io)

In the previous post, I settled on the MVP that I want to build to help me track my blood sugar levels better. As a recap, here are the five main features I expect my initial design to have:

  1. Visualize blood sugar levels
  2. Compute metrics that would help quantify how well (or badly) I am managing my condition
  3. Identify relationships between food I eat, insulin taken and blood sugar levels
  4. Filter readings and data based on a given time period
  5. Visualize my eating patterns (for example, how much do I eat? what do I eat?)

Now, the next step is to create a set of rules to determine how I will collect data relevant to the task at hand.

What Data To Collect?

Collecting data is, well, boring. But is it a task that you can avoid? Maybe, but only if there is already a well-documented, understandable, relevant dataset that could help with your task. For example, if you were working on a school project to build a movie recommender system, you would most likely not collect data from scratch. Instead, you would use one of the several movie datasets available on the internet, such as the IMDb or MovieLens datasets.

However, using a secondary data source is not a luxury that I can afford in this project. I need to collect data from scratch, and the data has to be about my blood sugar and what I eat. While this is the logical way to think, I want to extend it to highlight an important point: the most important component of data collection is not the data, but rather the questions you expect your data to answer.

This is why it’s so important to define the questions (or requirements, in this case) first. Based on the MVP requirements of my data product, it’s clear that the data I intend to collect must contain a few specific variables:

  • Timestamp : When did I take the reading? (datetime)
  • Sugar levels : Fasting, Pre-meal, Post-meal (integer)
  • Type of meal : Breakfast, Lunch, Evening snack or Dinner (categorical)
  • Contents of meals : What am I eating? (textual)
  • Insulin units : Number of fast-acting insulin units taken (integer)
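
To make the list above concrete, here is a minimal sketch of how one row of this data could be represented in Python. The class and field names are my own placeholders, and I am assuming one row per reading, with the reading kind (fasting, pre-meal or post-meal) stored next to the sugar level. The values in the example row are made up, apart from the 6-unit breakfast dose.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class ReadingKind(Enum):
    FASTING = "Fasting"
    PRE_MEAL = "Pre-meal"
    POST_MEAL = "Post-meal"


class MealType(Enum):
    BREAKFAST = "Breakfast"
    LUNCH = "Lunch"
    EVENING_SNACK = "Evening snack"
    DINNER = "Dinner"


@dataclass
class Reading:
    """One recorded reading (hypothetical layout, one row per reading)."""
    timestamp: datetime        # when the reading was taken
    reading_kind: ReadingKind  # fasting, pre-meal or post-meal
    sugar_level: int           # blood sugar reading as an integer
    meal_type: MealType        # which meal the reading relates to
    meal_contents: str         # free text describing what was eaten
    insulin_units: int         # fast-acting insulin units taken


# Example row with made-up values.
example = Reading(
    timestamp=datetime(2021, 1, 1, 8, 0),
    reading_kind=ReadingKind.PRE_MEAL,
    sugar_level=110,
    meal_type=MealType.BREAKFAST,
    meal_contents="",
    insulin_units=6,
)
```

Using enums for the two categorical variables means a misspelled category fails loudly instead of slipping into the data.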

A type I diabetic needs to inject insulin before a meal in order to keep the blood sugar level under control. This is because, in type I, the pancreas loses the ability to produce insulin, which leads to glucose accumulating in the bloodstream. That can be dangerous if not controlled.

Also, the units of insulin have to vary based on what you are eating. The more carbohydrates a meal contains, the more insulin it needs. Since I am a newly diagnosed type I diabetic, I have been put on fixed insulin doses of 6-8-6 (breakfast-lunch-dinner). Though I must admit, I tweak the units in order to get better data for my analysis!

Meal Contents - A Potential Source of Data Inconsistency

Data inconsistency occurs when data points describing the same kind of information are recorded in different ways. For example, think of a feature holding amounts of money that has the values “$4” and “Rs.500”. Another example comes from my animal shelter analytics work, where misspellings were a source of frustration when trying to extract value from the data.

In the data I am looking to collect, the meal contents are textual. The question is how to enter this data. If I merely enter it in whichever way I please at a given time, it would mean less effort when collecting the data and great pain while analyzing it. I wouldn’t recommend this for myself!

Hence, I set a rule for how I will enter data into the meal contents column. Each entry follows this template:

[name of item] (weight of item gm) + [name of item] (weight of item gm) + …

This simple rule ensures that I can write code to analyze this column easily, without compromising how easy it is to understand what I ate by looking at the raw Excel sheet.
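
As a rough illustration of why the template helps, here is a small sketch of a parser for this column. It assumes the square brackets in the template are placeholders rather than literal characters, that weights are written as plain numbers followed by "gm", and that item names never contain a "+". The function name and the example meal are made up for illustration.

```python
import re

# One item is expected to look like: name (weight gm)
ITEM_PATTERN = re.compile(
    r"^\s*(?P<name>.+?)\s*\(\s*(?P<weight>\d+(?:\.\d+)?)\s*gm\s*\)\s*$"
)


def parse_meal_contents(entry):
    """Split a meal contents entry into (item name, weight in grams) pairs."""
    items = []
    for part in entry.split("+"):
        match = ITEM_PATTERN.match(part)
        if match is None:
            # Fail loudly so inconsistent entries are caught early.
            raise ValueError(
                f"Entry does not follow the template: {part.strip()!r}"
            )
        items.append((match.group("name"), float(match.group("weight"))))
    return items


# Made-up example entry.
parsed = parse_meal_contents("Rice (150 gm) + Dal (100 gm)")
```

Raising an error on anything that does not match the template is a deliberate choice here: a bad entry should be noticed at parse time instead of quietly skewing the analysis later.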

The data will be stored in MS Excel. As of now, the data collection will require manual entry.

Manual entry is not a great idea, but if done with great care it does not cause much trouble!
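
One way to back up that caution is a small check that runs over each manually entered row. The sketch below is only an assumption about how this could look: the column names are hypothetical, and it presumes each row has already been loaded from the sheet into a Python dictionary.

```python
ALLOWED_MEAL_TYPES = {"Breakfast", "Lunch", "Evening snack", "Dinner"}


def find_entry_problems(row):
    """Return a list of human-readable problems found in one row."""
    problems = []

    # The meal type must be spelled exactly like one of the categories.
    if row.get("meal_type") not in ALLOWED_MEAL_TYPES:
        problems.append(f"Unknown meal type: {row.get('meal_type')!r}")

    # Sugar level and insulin units are both expected to be integers.
    for column in ("sugar_level", "insulin_units"):
        value = row.get(column)
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if not is_int or value < 0:
            problems.append(
                f"{column} should be a non-negative integer, got {value!r}"
            )

    return problems
```

An empty list means the row passed these checks; anything else can be fixed in the sheet while the meal is still fresh in memory.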

In the next post, I will try my hand at building a simple design document to aid my product development process.