And why you don’t need another ‘How to become a data scientist in 2021’ listicle
A look back at what I was studying as a beginner data scientist
If you search online, you’ll find thousands of listicles like “Your Data Science Roadmap in 2021”, but “data science” in its current sense is a vast field, and no one knows everything. You’re NOT even expected to know everything. There is no one-size-fits-all checklist or roadmap that everyone needs to follow to become a “data scientist” (the term itself is so vague that I’m pretty sure it’ll disappear in a few years’ time and evolve into specialized roles like ML Engineer, DL Engineer, MLOps, etc.). In this post, I just want to talk about the things I tried in my first couple of years as a data scientist, what worked for me and what didn’t.
Since my first data science job was at a startup where I worked with a fairly small team, I had to figure out my learning path on my own. Mostly, I was trying to learn more about the tools that I was using at my workplace. A major focus of my projects was delivering insights to our customers, both internal and external. This meant building lots of dashboards and interactive web apps. Now I know “GUI is for wimps”, but if, like me, you come from a non-CS background, tools like Tableau make it very easy to productionize your analysis and share it with your team without much coding effort.
- Data Visualization with Tableau - The course was good, but more importantly, they also give you 2 Tableau Desktop licenses for a period of 6 months once you enroll in the course!
I realized very soon that proprietary software like Tableau has many limitations and, most importantly, is very costly. Although it’s pretty quick to build the MVP of a dashboard in Tableau, maintaining it over a long time frame is expensive and inefficient. I had to quickly switch to either Python or R. Unfortunately, my start with Python was not that great.
- Introduction to Python for Data Science - It was an okay-ish course, but for some reason I found learning R much easier than learning Python as a beginner. And so I decided to build web apps in R using the high-level Shiny framework.
- JHU’s Data Science specialization - Although I did not complete the entire 10-course specialization, I really liked the instruction style of professors Brian Caffo, Jeff Leek and Roger Peng. I also ended up reading a couple of their e-books, like Executive Data Science and Developing Data Products in R. In hindsight, I’ve found that R for Data Science by Hadley Wickham is the best book for a beginner interested in R, but of course, I discovered this book much later!
For machine learning, Prof. Andrew Ng’s Machine Learning and Deep Learning courses were very popular back then, and I did start working through them as well. But I also wanted to learn about deploying ML models on the cloud, especially GCP, and I found the following course to be fairly practical and useful:
- ML with TensorFlow on GCP - I started this course back then, but it took me over a year to finish the 5-course specialization.
I also wanted to learn more about databases since I spent a significant amount of time writing ETL queries. I did the following couple of courses and they were okay, but I wouldn’t recommend them to anyone now.
I also tried some very forgettable courses on LinkedIn Learning (Lynda, back then) and Udemy. I never really liked these platforms because they usually just had a playlist of videos, 2-4 hours long, with few or no exercises. It was also around this time that I started using Kaggle properly and moved from the default Novice tier to the Contributor tier by getting involved in Discussions, publishing Notebooks and entering some competitions.
So as you can see, it was mostly very introductory-level stuff. Some courses were good, some not so much. I wouldn’t recommend the above set of courses to anyone. Everyone is working on different applications, with different technology stacks, and over very different time frames. So you don’t need to have FOMO about THAT book / course everyone keeps talking about! Just try to learn what you actually NEED in your (current / future) day-to-day job, the small things that can give you some immediate short-term value. Because these small things always add up:
“Play iterated games. All the returns in life, whether in wealth, relationships, or knowledge, come from compound interest.” - Naval (@naval), May 31, 2018