Mar 21, 2024 | By
INTRODUCTION
Welcome to the Data Science Roadmap for Beginners 2024, a comprehensive guide designed to navigate you through the dynamic field of data science. In today's data-driven world, the demand for skilled data scientists continues to surge across industries, making it an opportune time to embark on your journey in this exciting field. Whether you're a recent graduate, a career changer, or someone simply intrigued by the power of data, this roadmap for data science is tailored to equip you with the fundamental knowledge and skills needed to thrive in the realm of data science.
As we delve into the intricacies of data science, we will explore key concepts such as data analysis, machine learning, statistics, programming languages, and data visualization. Through a structured and progressive approach, you will gradually build a strong foundation while gaining hands-on experience through practical exercises and projects. Additionally, we will highlight emerging trends, tools, and techniques shaping the landscape of data science in 2024, ensuring that you stay abreast of the latest advancements in the field.
Whether your aspirations lie in unlocking insights from big data, developing predictive models, or leveraging data to drive business decisions, this data science roadmap will serve as your compass, guiding you towards a rewarding and fulfilling career in data science. So, buckle up and let's embark on this exhilarating journey together towards becoming a proficient data scientist in 2024 and beyond.
A Data Science Roadmap is a structured guide designed to outline the learning path and essential steps for individuals seeking to enter or advance in the field of data science. It typically includes a curated list of topics, skills, and resources necessary to develop proficiency in areas such as programming languages, statistics, machine learning, and data visualization. A well-structured roadmap provides beginners with a clear direction, helping them navigate through the complexities of data science systematically. It serves as a blueprint for setting goals, acquiring knowledge, and gaining practical experience essential for success in the rapidly evolving field of data science.
Key data science skills encompass a range of technical, analytical, and soft skills essential for success in the field. Some of the primary skills include:
Programming Languages: Proficiency in languages like Python, R, or SQL for data manipulation, analysis, and modeling.
Statistics and Mathematics: Understanding of statistical concepts and mathematical foundations for data analysis and modeling.
Machine Learning: Knowledge of machine learning algorithms, techniques, and libraries for predictive modeling, classification, clustering, and regression.
Data Visualization: Ability to effectively communicate insights through visual representations using tools like Matplotlib, Seaborn, or Tableau.
Data Wrangling: Skills in data cleaning, preprocessing, and transformation to ensure data quality and usability.
Domain Knowledge: Familiarity with the specific industry or domain to contextualize data analysis and derive meaningful insights.
Critical Thinking and Problem-Solving: Capacity to approach complex problems analytically, identify patterns, and develop creative solutions.
Communication Skills: Ability to articulate findings, explain technical concepts to non-technical stakeholders, and collaborate effectively within interdisciplinary teams.
Big Data Technologies: Understanding of big data frameworks like Hadoop, Spark, or NoSQL databases for handling large-scale datasets.
Experimental Design and A/B Testing: Knowledge of experimental methodologies and hypothesis testing for validating insights and making data-driven decisions.
Developing proficiency in these key skills equips data scientists with the tools and capabilities necessary to tackle real-world challenges and drive value from data. This roadmap for data science covers both the underlying statistical concepts as well as practical programming and tools. It balances theory with extensive projects for comprehensive applied data science skill development.
Unfortunately, systematic scams are common in ed tech, especially in the data field, where aspirants are lured with false promises such as a 100% job guarantee or trapped into “Masterclasses” that are nothing but sales pitches to upsell low-grade courses at exorbitant prices. Research the market and the mentors thoroughly before starting your journey. We have published a few posts on this topic that will support your research.
These posts alone are NOT sufficient, so do your own additional research.
1. Topics
Variables, Numbers, Strings
Lists, Dictionaries, Sets, Tuples
If condition, for loop
Functions, Lambda Functions
Modules (pip install)
Read, Write files
Exception handling
Classes, Objects
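The topics above fit together even in a very small program. Here is a minimal sketch that is not taken from the linked tutorials; the names `Student`, `average`, and `safe_average` and the sample data are made up for illustration.

```python
# Illustrative only: Student, average and safe_average are hypothetical names.

class Student:
    """A class with two attributes and one method."""

    def __init__(self, name, scores):
        self.name = name        # string
        self.scores = scores    # list of numbers

    def average(self):
        return sum(self.scores) / len(self.scores)


def safe_average(student):
    """Return the average score, or None when the student has no scores."""
    try:
        return student.average()
    except ZeroDivisionError:   # exception handling: empty score list
        return None


students = [Student("Asha", [80, 90, 100]), Student("Ravi", [])]

# Dictionary: name -> average (None when there are no scores)
averages = {s.name: safe_average(s) for s in students}

# Lambda function as a sort key, then a for loop with an if condition
ranked = sorted(students, key=lambda s: safe_average(s) or 0, reverse=True)
for s in ranked:
    if averages[s.name] is not None:
        print(f"{s.name}: {averages[s.name]:.1f}")
    else:
        print(f"{s.name}: no scores yet")
```

Try extending it yourself: write the averages to a file, then read them back.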
2. Learning Resources
Track A (Free)
Free Python Tutorials on YouTube (first 16 videos)
https://bit.ly/3X6CCC7
Codebasics Python tutorials in Hindi
https://bit.ly/3vmXrgw
Track B (Affordable Fees)
LinkedIn - Core Skill
Create a professional-looking LinkedIn profile.
Have a clear profile picture and banner image.
Add tags such as: Open to work etc.
Use this LinkedIn Checklist to create a profile: Click here.
Motivation
Physics to Data Scientist Transition -> https://bit.ly/47cA8GU
Assignment
(Use the assignment tracker: Click here)
Track A: Finish all these exercises: https://bit.ly/3k1mof5
Track B: Finish exercises and quizzes for relevant topics
Create a professional-looking LinkedIn profile.
Tech Skills
Numpy
numpy YouTube playlist: https://bit.ly/3GTppa8
Pandas, Matplotlib, Seaborn
Go through chapter 3 in this course (entire chapter is free):
https://codebasics.io/courses/math-and-statistics-for-data-science
Core/Soft Skills
LinkedIn
Start following prominent data science influencers.
Daliana Liu: https://www.linkedin.com/in/dalianaliu/
Nitin Aggarwal: https://www.linkedin.com/in/ntnaggarwal/
Steve Nouri: https://www.linkedin.com/in/stevenouri/
Dhaval Patel: https://www.linkedin.com/in/dhavalsays/
Increase engagement
Start commenting meaningfully on data science and career-related posts.
This helps you network with others working in the industry and build connections.
It also creates learning and brainstorming opportunities.
Remember, online presence is a new form of resume.
Business Fundamentals - Soft Skill
Learn business concepts from ThinkSchool and other YT Case Studies
Example: How Amul beat competition: https://youtu.be/nnwqtZiYMxQ
Discord
Start asking questions and get help from the community. This post shows how to ask questions the right way: https://bit.ly/3I70EbI
Join codebasics discord server: https://discord.gg/r42Kbuk
Assignment
Write meaningful comments on at least 10 data science related LinkedIn posts.
Note down your key learnings from 3 case studies on ThinkSchool and share them with your friend.
Math and Statistics for Data Science
Topics to Learn
Basics: Descriptive vs inferential statistics, continuous vs discrete data, nominal vs ordinal data
Basic plots: Histograms, pie charts, bar charts, scatter plot etc.
Measures of central tendency: mean, median, mode
Measures of dispersion: variance, standard deviation
Probability basics
Distributions: Normal distribution
Correlation and covariance
Central limit theorem
Hypothesis testing: p value, confidence interval, type 1 vs type 2 error, Z test, t test, ANOVA
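Several of the topics above can be tried out with nothing more than Python's standard library. The following is a minimal sketch on made-up numbers (hours studied and exam scores for six learners); the `pearson` helper and the hypothesized mean of 60 are assumptions chosen for illustration, not part of any linked course.

```python
import math
import statistics

# Made-up sample: hours studied and exam scores for six learners.
hours = [2, 3, 5, 7, 9, 10]
scores = [50, 55, 65, 70, 85, 95]

# Measures of central tendency
mean_score = statistics.mean(scores)        # 70
median_score = statistics.median(scores)    # 67.5

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(scores)      # 300
std_dev = statistics.stdev(scores)

def pearson(x, y):
    """Pearson correlation: sample covariance divided by both standard deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

r = pearson(hours, scores)

# One-sample t statistic: is the mean score different from a hypothesized 60?
# t = (sample mean - hypothesized mean) / (sample std dev / sqrt(n))
t_stat = (mean_score - 60) / (std_dev / math.sqrt(len(scores)))

print(f"mean={mean_score}, median={median_score}, variance={variance}")
print(f"correlation={r:.3f}, t statistic={t_stat:.3f}")
```

To turn the t statistic into a p value you need the t distribution with n - 1 degrees of freedom. The standard library does not provide it, so in practice you would use a statistics package for that step.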
Learning Resources
Track A (Free)
Learn the above topics from this excellent Khan Academy course on statistics and probability.
Course link: https://www.khanacademy.org/math/statisticsprobability
While doing the Khan Academy course, use the StatQuest YouTube channel whenever you have doubts: https://www.youtube.com/@statquest
Use this free YouTube playlist: https://bit.ly/3QrSXis
Track B (Affordable Fees)
The Khan Academy course is generic and doesn’t include Python coding. To learn statistics with Python and the specifics of applying it to data science, check this course: https://codebasics.io/courses/math-statistics-for-dataprofessionals
Motivation
Petroleum engineer to data scientist: https://bit.ly/3REsqiL
Assignment
Finish all exercises in this playlist: https://bit.ly/3QrSXis
Finish all exercises in the Khan Academy course.
https://www.kaggle.com/code?searchQuery=exploratory+data+analysis
Use the above link to search for exploratory data analysis notebooks.
Practice EDA using at least 3 datasets.
e.g. https://www.kaggle.com/datasets/rishabhkarn/ipl-auction2023/data
Assignment
Perform EDA (exploratory data analysis) on at least 2 additional datasets on Kaggle.
Basics of relational databases.
Basic Queries: SELECT, WHERE, LIKE, DISTINCT, BETWEEN, GROUP BY, ORDER BY
Advanced Queries: CTE, Subqueries, Window Functions
Joins: Left, Right, Inner, Full
No need to learn database creation, indexes, triggers etc. as those things are rarely used by data scientists.
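You can practice most of these queries without installing a database server. Here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `customers` and `orders` tables and their rows are made up for illustration, and the window function assumes a reasonably recent SQLite (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meera', 'Pune');
INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 400.0);
""")

# Basic query: WHERE + LIKE + ORDER BY
pune = conn.execute(
    "SELECT name FROM customers WHERE city LIKE 'Pu%' ORDER BY name"
).fetchall()

# LEFT JOIN + GROUP BY: total spend per customer, including customers with no orders
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
""").fetchall()

# CTE + window function: rank customers by total spend
ranked = conn.execute("""
    WITH customer_totals AS (
        SELECT customer_id, SUM(amount) AS total
        FROM orders
        GROUP BY customer_id
    )
    SELECT customer_id, total,
           RANK() OVER (ORDER BY total DESC) AS spend_rank
    FROM customer_totals
    ORDER BY spend_rank
""").fetchall()

print(pune)
print(totals)
print(ranked)
conn.close()
```

Notice that Meera appears in `totals` with a total of 0 because of the LEFT JOIN. Change it to an inner join and she disappears, which is a quick way to see how the join types differ.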
Learning Resources
Track A
Track B
SQL course for data professionals:
https://codebasics.io/courses/sql-beginner-to-advanced-for-data-professionals
Core/Soft Skills
Presentation skills
Death by PowerPoint: https://youtu.be/Iwpi1Lm6dFo
Assignment
Participate in SQL resume project challenge on https://codebasics.io/
Link: https://codebasics.io/challenge/codebasics-resume-project-challenge/7
These challenges help you improve technical skills, soft skills and business understanding.
Make a LinkedIn post with a submission of your resume project challenge
Sample post: https://bit.ly/48Bg5mB
Codebasics is promoting winning entries to employers. This way you can get interview calls. We do this in two ways:
We have a database of employers hiring for data analyst positions. We send them the top 10 or 20 profiles based on performance.
LinkedIn post by Dhaval (who has more than 100k followers and some of them are HR managers, data analytics senior managers): https://bit.ly/3jnni5c
Handling NA values, outlier treatment, data normalization
One hot encoding, label encoding
Feature engineering
Train test split
Cross validation
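To see what these preprocessing steps actually do, here is a minimal sketch in plain Python on six made-up rows. In real projects you would normally use pandas and scikit-learn for this; the data, the mean-imputation choice, the roughly 67/33 split, and the `k_fold` helper are all assumptions made for illustration.

```python
import random

# Made-up rows: (age, city, bought). None marks a missing value.
rows = [
    (25, "Pune", 0), (32, "Delhi", 1), (None, "Pune", 0),
    (41, "Mumbai", 1), (29, "Delhi", 0), (35, "Mumbai", 1),
]

# 1. Handling NA values: replace a missing age with the mean of the known ages.
known = [age for age, _, _ in rows if age is not None]
mean_age = sum(known) / len(known)
ages = [age if age is not None else mean_age for age, _, _ in rows]

# 2. Data normalization: min-max scaling to the range [0, 1].
lo, hi = min(ages), max(ages)
scaled = [(a - lo) / (hi - lo) for a in ages]

# 3. One hot encoding: one 0/1 column per distinct city.
cities = sorted({city for _, city, _ in rows})   # ['Delhi', 'Mumbai', 'Pune']
one_hot = [[1 if city == c else 0 for c in cities] for _, city, _ in rows]

features = [[s] + oh for s, oh in zip(scaled, one_hot)]
labels = [bought for _, _, bought in rows]

# 4. Train test split: shuffle the row indices, then cut.
indices = list(range(len(rows)))
random.Random(42).shuffle(indices)               # fixed seed for repeatability
cut = int(0.67 * len(indices))                   # 4 of the 6 rows for training
train_idx, test_idx = indices[:cut], indices[cut:]

# 5. Cross validation: every fold serves as the validation set exactly once.
def k_fold(indices, k):
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

for train, val in k_fold(indices, k=3):
    print("train:", sorted(train), "validate:", sorted(val))
```

One caution: this sketch computes the mean and the min/max on all rows for brevity. In a real project, compute them on the training rows only and then apply them to the test rows, otherwise information from the test set leaks into training.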
Machine Learning: Model Building
Types of ML: Supervised, Unsupervised
Supervised: Regression vs Classification
Linear models
Linear regression, logistic regression
Gradient descent
Nonlinear models (tree-based models)
Decision tree
Random forest
XGBoost
Model evaluation
Regression: Mean Squared Error, Mean Absolute Error, MAPE
Classification: Accuracy, Precision-Recall, F1 Score, ROC Curve, Confusion matrix
Hyperparameter tuning: GridSearchCV, RandomizedSearchCV
Unsupervised: K means, Hierarchical clustering, Dimensionality reduction (PCA)
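Two ideas from this list are easy to try by hand before you reach for a library: gradient descent for linear regression, and the classification metrics that come out of a confusion matrix. The following is a minimal sketch in plain Python on made-up data; the learning rate, the iteration count, and the example predictions are assumptions chosen for illustration.

```python
# Part 1: linear regression trained with gradient descent.
# Made-up data that follows y = 2x + 1 exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def predict(w, b, x):
    return w * x + b

def mse(w, b):
    """Mean Squared Error of the line (w, b) on the data."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mae(w, b):
    """Mean Absolute Error of the line (w, b) on the data."""
    return sum(abs(predict(w, b, x) - y) for x, y in zip(xs, ys)) / len(xs)

w, b = 0.0, 0.0
learning_rate = 0.05
n = len(xs)
for _ in range(2000):
    errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
    grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))  # d(MSE)/dw
    grad_b = 2 / n * sum(errors)                             # d(MSE)/db
    w -= learning_rate * grad_w   # step against the gradient
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, MSE={mse(w, b):.6f}, MAE={mae(w, b):.6f}")

# Part 2: classification metrics from a confusion matrix.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # made-up model predictions

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)   # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)   # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)   # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)   # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"confusion matrix: [[{tn}, {fp}], [{fn}, {tp}]]")
print(f"accuracy={accuracy}, precision={precision}, recall={recall}, f1={f1}")
```

Try a larger learning rate such as 0.2 and watch the loss blow up instead of shrinking. Seeing that happen once makes the role of the learning rate much easier to remember.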
Learning Resources
YouTube playlist (more than 2 million views): https://bit.ly/3io5qq
First 21 videos - Feature engineering playlist: https://bit.ly/3IFa3Yf
Core/Soft Skills
Project Management
Kanban: https://youtu.be/jf0tlbt9lx0
Tools: JIRA, Notion
Motivation
How Kaggle helped this person become ML engineer: https://bit.ly/3RFVruy
Assignment
Complete all exercises in ML playlist: https://bit.ly/3io5qqX
Work on 2 Kaggle ML notebooks
Write 2 LinkedIn posts on whatever you have learnt in ML
Discord: Help people with at least 10 answers
Week 16, 17, 18: Machine Learning Projects with Deployment
You need to finish two end-to-end ML projects: one on regression, the other on classification.
Regression Project: Bangalore property price prediction
YouTube playlist link: https://bit.ly/3ivycWr
Project covers following
Data cleaning
Feature engineering
Model building and hyper parameter tuning
Write flask server as a web backend
Building website for price prediction
Deployment to AWS
Classification Project: Sports celebrity image classification
YouTube playlist link: https://bit.ly/3ioaMSU
Project covers following
Data collection and data cleaning
Feature engineering and model training
Flask server as a web backend
Building website and deployment
ATS Resume Preparation
Resumes are dying but not dead yet. Focus more on online presence.
Here is the resume tips video along with some templates you can use for your data analyst resume: https://www.youtube.com/watch?v=buQSI8NLOMw
Use this checklist to ensure you have the right ATS Resume: Check here.
Portfolio Building Resources:
You need a portfolio website in 2024. You can build your portfolio by using these free resources.
Upload your projects with code to GitHub and create a portfolio website using github.io.
Sample portfolio website: http://rajag0pal.github.io/
Helpful to add multiple links in one page.
Assignment
In the above two projects, make the following changes:
Use FastAPI instead of Flask. FastAPI tutorial: https://youtu.be/Wr1JjhTt1Xg
Regression project: Instead of property price prediction, take any other regression project of your interest from Kaggle.
Classification project: Instead of sports celebrity classification, take any other classification project of your interest from Kaggle and build an end-to-end solution along with deployment to AWS or Azure.
Add links to your projects in your resume and on LinkedIn.
(Tag Codebasics, Dhaval Patel and Hemanand Vadivel with the hashtag #dsroadmap24 so we can engage to increase your visibility)
Week 19, 20, 21: Deep Learning
Topics
What is a neural network? Forward propagation, back propagation
Building multilayer perceptron
Special neural network architectures
1. Convolutional neural network (CNN)
2. Sequence models: RNN, LSTM
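Forward propagation and back propagation are easier to understand once you have computed them by hand for a tiny network. Here is a minimal sketch in plain Python, assuming a made-up network with 2 inputs, 2 hidden neurons, and 1 output, sigmoid activations, and a squared-error loss; the weights, the single training example, and the learning rate are arbitrary illustrative values. Frameworks such as TensorFlow and PyTorch do the backward pass for you automatically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical tiny network: 2 inputs -> 2 hidden neurons -> 1 output.
W1 = [[0.1, -0.2], [0.4, 0.3]]   # W1[j][i]: weight from input i to hidden neuron j
b1 = [0.0, 0.1]
W2 = [0.5, -0.6]                 # W2[j]: weight from hidden neuron j to the output
b2 = 0.2

def forward(x, W1, b1, W2, b2):
    """Forward propagation: inputs -> hidden activations -> prediction."""
    h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j])
         for j in range(len(W1))]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
    return h, y_hat

def loss(y_hat, y):
    return 0.5 * (y_hat - y) ** 2

def backward(x, y, h, y_hat, W2):
    """Back propagation: apply the chain rule from the loss back to each weight."""
    d_out = (y_hat - y) * y_hat * (1 - y_hat)            # dLoss/d(output pre-activation)
    grad_W2 = [d_out * hj for hj in h]
    grad_b2 = d_out
    d_hidden = [d_out * W2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[dh * xi for xi in x] for dh in d_hidden]
    grad_b1 = d_hidden
    return grad_W1, grad_b1, grad_W2, grad_b2

# One training example and one gradient descent step.
x, y = [1.0, 0.5], 1.0
h, y_hat = forward(x, W1, b1, W2, b2)
before = loss(y_hat, y)
gW1, gb1, gW2, gb2 = backward(x, y, h, y_hat, W2)

lr = 0.5
W1 = [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W1, gW1)]
b1 = [b - lr * g for b, g in zip(b1, gb1)]
W2 = [w - lr * g for w, g in zip(W2, gW2)]
b2 = b2 - lr * gb2

_, y_hat_after = forward(x, W1, b1, W2, b2)
after = loss(y_hat_after, y)
print(f"loss before step: {before:.4f}, after step: {after:.4f}")
```

The loss should go down after the update. Repeating the forward pass, backward pass, and update over many examples is what training a multilayer perceptron amounts to.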
Learning Resources
Deep Learning playlist (TensorFlow): https://bit.ly/3vOZ3zV
Deep Learning playlist (PyTorch): https://bit.ly/3TzDbWp
Assignment
Instead of potato plant images use tomato plant images or some other image classification dataset.
Deploy to Azure instead of GCP.
Create a presentation as if you are presenting to stakeholders and upload video presentations on LinkedIn.
Week 22, 23, 24: NLP or Computer Vision
Many data scientists choose a specialized track which is either NLP or Computer vision. You don’t need to learn both.
Natural Language Processing (NLP)
Topics
Regex
Text representation: Count vectorizer, TF-IDF, BOW, Word2Vec, Embeddings
Text classification: Naïve Bayes
Fundamentals of the spaCy and NLTK libraries
One end to end project
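Regex tokenization, bag of words, and TF-IDF can all be worked out by hand on a few sentences. Here is a minimal sketch in plain Python; the three documents and the `tokenize` helper are made up, and the formula used is the plain textbook one (term frequency times log of N over document frequency). Libraries such as scikit-learn apply smoothing and normalization, so their numbers will differ from these.

```python
import math
import re
from collections import Counter

# Made-up mini corpus.
docs = [
    "Data science is fun.",
    "Machine learning is part of data science!",
    "Deep learning needs data.",
]

def tokenize(text):
    """Regex tokenizer: lowercase the text and keep runs of letters only."""
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(d) for d in docs]
vocab = sorted({tok for doc in tokenized for tok in doc})
counts = [Counter(doc) for doc in tokenized]

# Bag of words (what a count vectorizer produces): one count per vocabulary term.
bow = [[c[term] for term in vocab] for c in counts]

# TF-IDF: tf = count / document length, idf = log(N / number of docs containing the term).
n_docs = len(docs)
doc_freq = {term: sum(1 for c in counts if term in c) for term in vocab}
idf = {term: math.log(n_docs / doc_freq[term]) for term in vocab}
tfidf = [
    {term: (c[term] / len(doc)) * idf[term] for term in c}
    for c, doc in zip(counts, tokenized)
]

print(vocab)
print(bow[0])
print({term: round(score, 3) for term, score in tfidf[0].items()})
```

The word "data" appears in every document, so its TF-IDF score is 0, while "fun" appears in only one document and scores highest there. That is the whole point of TF-IDF: it downweights words that do not help tell documents apart.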
Learning Resources
NLP YouTube playlist: https://bit.ly/3XnjfEZ
Computer Vision (CV)
Topics
Basic image processing techniques: Filtering, Edge Detection, Image Scaling, Rotation
Library to use: OpenCV
Convolutional Neural Networks (CNN) – Already covered in deep learning.
Data preprocessing, augmentation – Already covered in deep learning.
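Before using OpenCV, it helps to see what filtering and edge detection do to raw pixel values. The following is a minimal sketch in plain Python, not OpenCV: it slides a 3x3 Sobel kernel over a made-up grayscale image that has a single vertical edge. The `filter2d` helper and the threshold of 500 are assumptions chosen for illustration.

```python
# Made-up 4x6 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
    [0, 0, 0, 255, 255, 255],
]

# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

def filter2d(img, kernel):
    """Slide a 3x3 kernel over the image (no padding) and sum the products."""
    rows, cols = len(img), len(img[0])
    out = []
    for r in range(1, rows - 1):
        out_row = []
        for c in range(1, cols - 1):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    acc += kernel[dr + 1][dc + 1] * img[r + dr][c + dc]
            out_row.append(acc)
        out.append(out_row)
    return out

gx = filter2d(image, SOBEL_X)

# Mark a pixel as an edge when the gradient magnitude is above a threshold.
edges = [[abs(v) > 500 for v in row] for row in gx]

for row in gx:
    print(row)
```

The response is large only where the image jumps from dark to bright and zero in the flat regions. OpenCV's filtering and edge detection functions apply the same idea, just much faster and with more options.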
Assignment
NLP Track: Complete exercises in this playlist: https://bit.ly/3XnjfEZ
Week 25 onwards
More projects
Online brand building through LinkedIn, Kaggle, Discord, Open Source contribution
Job application and Success
Tips for effective learning
Spend less time consuming information and more time:
Digesting
Implementing
Sharing
Group learning
Use partner-and-group-finder channel on codebasics discord server for group study and hold each other accountable for the progress of your study plan. Here is the discord server link: https://discord.gg/r42Kbuk
Conclusion
In summary, this comprehensive 6-month data science roadmap for beginners lays out a clear path to develop essential data science skills in 2024. By following the layered progression from programming fundamentals to statistical concepts to machine learning models and projects, learners can gain both theoretical and practical competencies.
The Roadmap for data science balances rigorous material with motivating stories of career transitions. Alongside technical expertise, it emphasizes critical soft skills like online personal branding, communication abilities, and business acumen. With diligent application and smart time management, the roadmap can equip aspiring data scientists with versatility to thrive in an ever-evolving landscape.
Frequently Asked Questions
What is the best route to become a data scientist?
Gain a quantitative degree, learn programming and data skills, build projects to showcase abilities, get experience through internships and entry-level roles, consider a master's degree, develop domain expertise and communication skills, stay up-to-date on the latest data science tools and techniques.
Can I learn data analysis in 6 months?
Learning foundational data analysis skills and becoming job-ready in 6 months is achievable with a data science roadmap. Focus on the basics like Excel, SQL, Python, data visualization, and statistical concepts. Apply your skills to projects and keep sharpening them through practice. Be patient with the learning process.
Do I need to learn cloud tech (Amazon SageMaker, Azure etc.)?
Big cloud service providers such as AWS, Azure, and Google Cloud have their own ML offerings, for example Amazon SageMaker in the case of AWS. As a fresher, it is OK if you are not familiar with these cloud platforms, but once you have some experience it is good to have know-how of at least one cloud ML platform.
Do I need to learn Gen AI?
Gen AI is a trendy topic, and the majority of junior data science positions do not demand Gen AI skills. If you have additional time and want to learn LangChain, a popular framework for building Gen AI apps, here is the playlist: https://bit.ly/3RYpxuw
How about BI tools (Power BI or Tableau)?
BI tools nowadays are mainly used by BI developers, data analysts, etc., so it is OK if you don’t learn them as a data scientist. Most of the time, when data scientists need BI dashboards, they get help from BI or data analyst teams. In small organizations, data scientists sometimes build BI dashboards themselves, but in general you should not worry about learning a BI tool for a data scientist career.