A lookback at what I was studying as a beginner data scientist

If you search online, you’ll find thousands of listicles like “Your Data Science Roadmap in 2021”, but “data science” in its current sense is a vast field, and no one knows all of it. You’re NOT even expected to know everything. There is no one-size-fits-all checklist or roadmap that everyone needs to follow to become a “data scientist” (the term itself is so vague that I’m pretty sure it’ll disappear in a few years’ time and evolve into specialized roles like ML Engineer, DL Engineer, MLOps, etc). In this post, I just want to talk about the things I tried in my first couple of years as a data scientist: what worked for me and what didn’t.

Since my first data science job was at a startup where I worked with a fairly small team, I had to figure out my learning path on my own. Mostly, I was trying to learn more about the tools I was using at my workplace. A major focus of my projects was delivering insights to our customers, both internal and external. This meant building lots of dashboards and interactive web apps. Now I know “GUI is for wimps”, but if, like me, you come from a non-CS background, tools like Tableau make it very easy to productionize your analysis and share it with your team without much coding effort.

Data Visualization with Tableau - The course was good, but more importantly, they also give you 2 Tableau Desktop licenses for a period of 6 months once you enroll in the course!

I realized very soon that proprietary software like Tableau has many limitations and, most importantly, is very expensive. Although it’s pretty quick to build the MVP of a dashboard in Tableau, maintaining it over a long time frame is costly and inefficient. I had to quickly switch to either Python or R. Unfortunately, my start with Python was not that great.

Introduction to Python for Data Science - It was an okay-ish course but, for some reason, I found learning R much easier than learning Python as a beginner. And so I decided to build web apps in R using the high-level Shiny framework.
JHU’s Data Science specialization - Although I did not complete the entire 10-course specialization, I really liked the instruction style of professors Brian Caffo, Jeff Leek and Roger Peng. I also ended up reading a couple of their e-books, like Executive Data Science and Developing Data Products in R. In hindsight, I’ve found that R for Data Science by Hadley Wickham is the best book for a beginner interested in R but, of course, I discovered this book much later!

For machine learning, Prof Andrew Ng’s Machine Learning and Deep Learning courses were very popular back then, and I did start working through them as well. But I also wanted to learn about deploying ML models on the cloud, especially GCP, and I found the following course to be fairly practical and useful:

ML with TensorFlow on GCP - I started this course back then, but it took me over a year to finish the 5-course specialization.

I also wanted to learn more about databases since I spent a significant amount of time writing ETL queries. I did a couple of courses for this and they were okay, but I wouldn’t recommend them to anyone now.

I also tried some very forgettable courses on LinkedIn Learning (Lynda, back then) and Udemy. I never really liked these platforms because they usually just had a playlist of videos, 2-4 hours long, with few or no exercises. It was also around this time that I started using Kaggle properly and moved from the default Novice tier to the Contributor tier by getting involved in Discussions, publishing Notebooks and entering some competitions.

So as you can see, it was mostly very introductory-level stuff. Some courses were good, some not so much. I wouldn’t recommend the above set of courses to anyone. Everyone is working on different applications, with different technology stacks and over very different time frames. So you don’t need to have FOMO about THAT book / course everyone keeps talking about! Just try to learn what you actually NEED in your (current / future) day-to-day job, the small things that can give you some immediate short-term value. Because these small things always add up:

Play iterated games. All the returns in life, whether in wealth, relationships, or knowledge, come from compound interest.
— Naval (@naval) May 31, 2018

Pritesh Shrivastava
