Data Science for Beginners | How to Learn Data Science in 2021

Data Science explained from a B.S. Data Science graduate

Data Science Graduate

Hey everyone, with the ever-increasing ubiquity and use of technology and data, companies and organizations are creating new and innovative ways to perform data analysis to create value for people. Some use cases are for personalized ads, entertainment recommendations, autonomous vehicles, and IoT infrastructure.

So in this article, I’ll show you how to learn fundamental data science skills, using my experience as a data science graduate as a guide.

Specifically, I’ll go over some key points and resources to learn data science.

Video Version

Here is the video version if you prefer to watch a video. Enjoy!

Key Points

So here are 4 key points to keep in mind while learning data science.

The first key point is to learn programming and statistics so you can start working on personal projects. In particular, I would recommend getting familiar with programming in Python or R. From my experience, I would start with Python since it’s one of the easiest programming languages to learn, and it has great data science packages for working with tabular data and performing machine learning. I’ll go into more depth on programming and stats later in this article.

Once you have a solid understanding of programming and stats, you should work on personal projects to apply your knowledge to a working project that you can add to your portfolio and demonstrate your expertise to employers.

As you’re learning programming and stats and working on your project, it’s important to set clear goals and keep a consistent work schedule. That keeps you accountable, and tracking your progress over time helps motivate you to continue learning and working on your project.

Once you feel like you have a good grasp of fundamental concepts and have a working project, it would be a good idea to learn more advanced concepts, such as AI, machine learning, NLP, and computer vision, to increase your expertise and incorporate these concepts into future projects.

Programming

So I’ll go over nine topics, step by step, that cover which specific data science skills to learn. The first step towards learning data science is to learn fundamental programming concepts, such as object-oriented programming, data structures, and algorithms.

For object-oriented programming, it’s important to know the concepts of a class, object, properties or instance variables, and methods or functions. In addition, it’s good to know encapsulation, abstraction, polymorphism, and inheritance. Some other important concepts to know include while and for loops and exception handling (try/except in Python, try/catch in many other languages).
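To make these terms concrete, here is a minimal sketch in Python. The Animal, Dog, and Cat classes and the describe_all function are made-up names for illustration only; they just show a class, instance variables, methods, inheritance, polymorphism, a for loop, and try/except in one place.

```python
class Animal:
    """Base class: bundles data (properties) with behavior (methods)."""

    def __init__(self, name):
        self.name = name  # instance variable (property)

    def speak(self):
        # Subclasses are expected to override this method.
        raise NotImplementedError("Subclasses must implement speak()")


class Dog(Animal):  # inheritance: Dog reuses Animal's __init__
    def speak(self):  # polymorphism: same method name, different behavior
        return f"{self.name} says woof"


class Cat(Animal):
    def speak(self):
        return f"{self.name} says meow"


def describe_all(animals):
    """Call speak() on each object and handle the ones that cannot speak."""
    lines = []
    for animal in animals:  # for loop over a list of objects
        try:
            lines.append(animal.speak())
        except NotImplementedError:  # exception handling
            lines.append(f"{animal.name} makes no sound")
    return lines


sounds = describe_all([Dog("Rex"), Cat("Tom"), Animal("Generic")])
print(sounds)
```

Notice that describe_all never checks which class it is dealing with. It just calls speak() and lets each object decide what to do, which is the main idea behind polymorphism.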

For data structures, the building blocks are primitive data types, such as integer, character, and boolean. On top of those, there are non-primitive data structures, such as an array, linked list, stack, queue, tree, and graph.
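Here is a small sketch of a few of these in Python, using only the standard library. The variable names and the tiny graph are made up for illustration. Note that Python has no separate character type, so a one-character string stands in for it.

```python
from collections import deque

# Primitive values
count = 3        # integer
letter = "a"     # Python has no char type; a 1-character str is used instead
is_ready = True  # boolean

# Stack: last in, first out (a plain list works well)
stack = []
stack.append(1)
stack.append(2)
stack.append(3)
top = stack.pop()  # removes and returns the most recently added item

# Queue: first in, first out (deque is efficient at both ends)
queue = deque([1, 2, 3])
first = queue.popleft()  # removes and returns the oldest item

# Graph: a dictionary mapping each node to its neighbors
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}


def breadth_first_order(graph, start):
    """Visit nodes level by level, using a queue to track what comes next."""
    visited = [start]
    to_visit = deque([start])
    while to_visit:
        node = to_visit.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                to_visit.append(neighbor)
    return visited


order = breadth_first_order(graph, "A")
print(top, first, order)
```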

For algorithms, it’s good to know the main algorithm design approaches: recursion, divide and conquer, dynamic programming, greedy, brute force, and backtracking.
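As a rough illustration of three of these approaches, the sketch below computes Fibonacci numbers with plain recursion, speeds that up by caching results (the core idea of dynamic programming), and uses binary search as a simple example of halving the problem at each step. The function names are my own, and these are just textbook examples rather than anything specific to data science.

```python
from functools import lru_cache


def fib_recursive(n):
    """Plain recursion: simple, but recomputes the same values many times."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n):
    """Same recursion, but cached results make each value computed once."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


def binary_search(sorted_items, target):
    """Halve the search range each step; returns the index or -1."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1


print(fib_recursive(10))
print(fib_memo(50))
print(binary_search([1, 3, 5, 7, 9, 11], 7))
```

Calling fib_recursive(50) would take a very long time, while fib_memo(50) returns almost instantly. That difference is a good way to see why the choice of approach matters.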

As I mentioned earlier, I would suggest learning programming in Python since it’s a great first programming language to learn due to its easy-to-read syntax and great data science packages, such as NumPy and Pandas.

NumPy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
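Here is a minimal sketch of what that looks like in practice. The numbers in the array are arbitrary and only there to show the operations.

```python
import numpy as np

# A 2-dimensional array (2 rows, 3 columns)
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

shape = matrix.shape                # (rows, columns)
doubled = matrix * 2                # element-wise math, no loop needed
column_means = matrix.mean(axis=0)  # mean of each column
product = matrix @ matrix.T         # matrix multiplication with the transpose

print(shape)
print(doubled)
print(column_means)
print(product)
```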

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool to work with tabular data, such as a spreadsheet or CSV.
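For example, here is a small sketch that builds a table by hand and runs a few common operations on it. The passenger names and numbers are made up for illustration.

```python
import pandas as pd

# A DataFrame is a table: each key is a column, each list holds the rows
passengers = pd.DataFrame({
    "name": ["Alice", "Bob", "Cara"],
    "age": [22, 38, 26],
    "fare": [10.0, 70.0, 10.0],
})

older_than_25 = passengers[passengers["age"] > 25]  # filter rows by a condition
mean_fare = passengers["fare"].mean()               # summarize a column
oldest_first = passengers.sort_values("age", ascending=False)  # sort rows

print(older_than_25)
print(mean_fare)
print(oldest_first)
```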

I personally started learning data science with Python and these packages, so I can speak from personal experience that these tools are fundamental to learning data science.

In addition to these tools, I would suggest using Jupyter notebooks as your go-to coding environment since I have used them on every project I have worked on. If you’re interested in learning more about Jupyter notebooks, I made an Intro to Jupyter video, which is linked here.

Statistics

For statistics, it’s good to know fundamental concepts, such as mean/median/mode, standard deviation, confidence interval, combination, permutation, and probability. Essentially, high school stats and math courses are sufficient to get started with learning data science.
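Most of these can be tried out with just Python’s standard library. The sketch below uses a made-up list of test scores. The confidence interval uses a normal approximation for simplicity; that is an assumption on my part, and for a sample this small a t-distribution would usually be the better choice.

```python
import math
import statistics

scores = [70, 80, 80, 90, 100]  # made-up sample data

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)
stdev = statistics.stdev(scores)  # sample standard deviation

# Approximate 95% confidence interval for the mean (normal approximation)
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
margin = z * stdev / math.sqrt(len(scores))
confidence_interval = (mean - margin, mean + margin)

# Counting: choose 2 of 5 items, ignoring order vs. respecting order
combinations = math.comb(5, 2)
permutations = math.perm(5, 2)

# Probability of drawing a score of 80 at random from the list
probability_of_80 = scores.count(80) / len(scores)

print(mean, median, mode, round(stdev, 2))
print(confidence_interval)
print(combinations, permutations, probability_of_80)
```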

Exploratory Data Analysis (EDA)

Once you understand programming and stats, you can begin doing exploratory data analysis. EDA is an approach to analyzing data sets to summarize their main characteristics, often using statistical graphics and other data visualization methods. The first step is to load the data, which can be done in Python using the Pandas library. For example, if you have a CSV file, you can use the read_csv() function in the Pandas library to load the data as a Pandas DataFrame, which is essentially a spreadsheet.
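Here is a minimal sketch of loading CSV data. To keep it runnable without a file on disk, it reads from a small made-up CSV string; with a real file you would pass the file path to read_csv() instead.

```python
import io

import pandas as pd

# Made-up CSV content; in practice this would live in a .csv file
csv_text = "name,age,survived\nAlice,29,1\nBob,,0\nCara,41,1\n"

# read_csv accepts a file path or a file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
```

Bob’s age is blank in the CSV, so Pandas loads it as a missing value (NaN). Spotting gaps like that is one of the first things EDA is for.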

Once you load the data, you can perform EDA by identifying the attributes (columns) in the data and their types, then cleaning the data and imputing missing values. Then, you can run various types of analysis by computing basic stats or creating charts. EDA is a great way to uncover insights fairly quickly before diving deeper into the data analysis.
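A rough sketch of those steps with Pandas might look like the following. The data is made up, and filling missing ages with the median is just one simple imputation choice among many.

```python
import pandas as pd

# Made-up data with one missing age and one duplicated row
df = pd.DataFrame({
    "age": [29.0, None, 41.0, 41.0],
    "fare": [10.0, 20.0, 30.0, 30.0],
})

missing_counts = df.isna().sum()     # how many values are missing per column
deduplicated = df.drop_duplicates()  # data cleaning: drop repeated rows

# Data imputation: fill missing ages with the median age
median_age = deduplicated["age"].median()
cleaned = deduplicated.assign(age=deduplicated["age"].fillna(median_age))

summary = cleaned.describe()         # basic stats for each numeric column

print(missing_counts)
print(cleaned)
print(summary)
```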

Data Visualization

As I touched on, part of EDA includes data visualization, which is another important aspect of data science since it’s the graphical representation of information and data. By using visual elements, such as charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data. In other words, data viz makes your insights easier for your audience to understand and digest so they can more quickly take action based on them.

I personally started working with Matplotlib, which is a Python plotting and charting library, since it’s a great and simple way to create charts.
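Here is a minimal sketch of a line chart in Matplotlib. The numbers are made up, and the "Agg" backend line is only there so the script also runs on a machine without a display; in a Jupyter notebook you can leave that line out and the chart shows up inline.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in Jupyter
import matplotlib.pyplot as plt

# Made-up values for illustration
years = [2017, 2018, 2019, 2020]
projects_completed = [1, 3, 4, 6]

fig, ax = plt.subplots()
ax.plot(years, projects_completed, marker="o")  # line chart with point markers
ax.set_xticks(years)                            # show one tick per year
ax.set_xlabel("Year")
ax.set_ylabel("Projects completed")
ax.set_title("Projects completed per year")

# To save the chart as an image, call fig.savefig with a file name of your choice.
```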

I then learned other charting libraries, such as Seaborn, Folium, and D3.

I haven’t tried Tableau, but I know it’s another great data viz tool.

Machine Learning

Another important aspect of data science to learn and incorporate into your projects is machine learning, which is a branch of AI focused on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so.

I used the scikit-learn package in Python to learn and work with machine learning. It provides simple and efficient tools for predictive data analysis in classification, regression, clustering, dimensionality reduction, model selection, and preprocessing. It’s also built on NumPy, SciPy, and Matplotlib. I’ll go over a project that you can use to practice machine learning with scikit-learn later in this article.
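To give a feel for the workflow, here is a minimal classification sketch. It uses the small iris flower data set that ships with scikit-learn and a logistic regression model; both are my choices for illustration, and other data sets and models follow the same fit and predict pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Features (flower measurements) and labels (species) from a built-in data set
X, y = load_iris(return_X_y=True)

# Hold out part of the data to check the model on examples it has not seen
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn from the training data

predictions = model.predict(X_test)  # predict labels for the held-out data
accuracy = accuracy_score(y_test, predictions)

print(f"Accuracy on held-out data: {accuracy:.2f}")
```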

Web Scraping

One way to get data to use for your project is to perform web scraping, which is the automated extraction of data from websites.

I have used the Beautiful Soup library in Python (the beautifulsoup4 package, imported as bs4) to perform web scraping. Beautiful Soup is a Python library for pulling data out of HTML and XML files.
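Here is a small sketch of pulling data out of HTML with Beautiful Soup. To keep it self-contained, it parses a made-up HTML string instead of downloading a real page; the page content, class name, and links are all hypothetical. On a real site, check the terms of use before scraping.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded web page
html = """
<html><body>
  <h1>Top Movies</h1>
  <ul>
    <li class="movie"><a href="/movies/1">Movie One</a></li>
    <li class="movie"><a href="/movies/2">Movie Two</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")  # parse with Python's built-in parser

page_title = soup.find("h1").get_text()

movies = []
for item in soup.find_all("li", class_="movie"):  # every <li class="movie">
    link = item.find("a")
    movies.append({"title": link.get_text(), "url": link["href"]})

print(page_title)
print(movies)
```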

Databases

To store your data in one location to use later, you could use a database, which is an organized collection of data. Most databases contain multiple tables, which may each include several different data fields. For example, a company database may include tables for products, services, employees, and financial records.

One type of database is a relational database, which stores and provides access to data points that are related to one another. Relational databases are based on the relational model, which is an intuitive, straightforward way of representing data in tables. You can represent the relationships between tables in an Entity Relationship Diagram, or ER Diagram.

One way to work with databases is to use SQL, which stands for Structured Query Language. SQL is a special-purpose programming language designed for managing data in a relational database. I used SQL in some of my projects, and it’s a great way to store, query, and update data.
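You can try SQL from Python without installing anything, since the sqlite3 module comes with the standard library. The sketch below creates a temporary in-memory database; the employees table and its rows are made up for illustration.

```python
import sqlite3

# An in-memory database that disappears when the connection closes
conn = sqlite3.connect(":memory:")

conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL)"
)

# Insert made-up rows; the ? placeholders keep values separate from the SQL
conn.executemany(
    "INSERT INTO employees (name, department, salary) VALUES (?, ?, ?)",
    [("Ana", "Data", 100.0), ("Ben", "Data", 120.0), ("Cy", "Sales", 80.0)],
)

# Query: average salary per department
rows = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department ORDER BY department"
).fetchall()

conn.close()
print(rows)
```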

Deployment

Another good concept to learn is the deployment of your data science project or ML model. The concept of deployment in data science refers to the application of a model for prediction using new data. Generally, building a model is not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that the customer can use it.
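Real deployments vary a lot, but the core idea of applying a trained model to new data can be sketched in a few lines. This example is only an assumption about one simple approach: it trains a scikit-learn model, saves it with pickle, loads it back as a separate application would, and predicts on a new measurement. The new_flower values are made up, and in practice you would write the saved bytes to a file and only load pickle files you trust.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Train a model (normally done once, ahead of time)
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Save the trained model; in practice these bytes would go into a file
saved_model = pickle.dumps(model)

# Later, possibly in another program: load the model and use it
loaded_model = pickle.loads(saved_model)

new_flower = [[5.1, 3.5, 1.4, 0.2]]  # one new row of measurements
prediction = loaded_model.predict(new_flower)

print(prediction)
```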

One way to learn deployment is through internships.

Other Topics

Some other topics to learn include Natural Language Processing, or NLP, and Deep Learning. NLP is a branch of AI that deals with the interaction between computers and humans using natural language. The ultimate objective of NLP is to read, decipher, understand, and make sense of human languages in a manner that is valuable. Two common NLP techniques for turning text into numbers that a model can use are bag-of-words and TF-IDF.
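Here is a minimal sketch of both techniques using scikit-learn, which is my choice of library for this example. The three short documents are made up. Bag-of-words counts how often each word appears in each document, while TF-IDF down-weights words that appear in most documents.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

documents = [
    "data science is fun",
    "machine learning is fun",
    "data is everywhere",
]

# Bag-of-words: one row per document, one column per word, values are counts
bow = CountVectorizer()
counts = bow.fit_transform(documents)
vocabulary = sorted(bow.vocabulary_)  # the words, in column order

# TF-IDF: same layout, but common words get lower weights
tfidf = TfidfVectorizer()
weights = tfidf.fit_transform(documents)

# Compare a word found in every document with a word found in only one
is_column = tfidf.vocabulary_["is"]
science_column = tfidf.vocabulary_["science"]
weight_is = weights[0, is_column]
weight_science = weights[0, science_column]

print(vocabulary)
print(counts.toarray())
print(weight_is, weight_science)
```

In the first document, "is" ends up with a lower weight than "science" because "is" appears in all three documents and so tells you less about what makes that document different.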

Deep learning is a subset of machine learning that uses neural networks with many layers, loosely inspired by the workings of the human brain, to process data and find patterns for use in decision making. These networks can learn from labeled data, and they can also learn from data that is unstructured or unlabeled, such as images and text.

These concepts are generally more advanced and overkill for a beginner project, so I would recommend learning these concepts after you have worked on multiple projects.

Working on Projects

Now, I’ll switch gears and talk about how to get started working on a project.

A great website to get started working on projects and get different ideas is Kaggle. It offers a no-setup, customizable Jupyter Notebooks environment as well as access to free GPUs and a huge repository of community-published data and code. One of the first projects I found on Kaggle and worked on was the Titanic: Machine Learning from Disaster project. It’s a great first project to dive into ML and familiarize yourself with how the Kaggle platform works. The project is simple: use machine learning to create a model that predicts which passengers survived the Titanic shipwreck. The description provides an overview of the project and data sets. There is also published code from other data scientists that you can download and run on your own to see their process and get familiar with writing ML code in Python or another language.

So, I would recommend starting with this project or browsing Kaggle to find another project with a topic that you’re interested in.

Once you have completed your project, I would recommend publishing your work and code on GitHub, LinkedIn, and your personal website, if you have one. That way, you can send a link to your project to employers and recruiters so they can take a look at your project and code.

From my experience, personal projects are a great way to demonstrate to recruiters that you have expertise in data science concepts.

One pro tip I have is to include links to your GitHub, LinkedIn, and your personal website directly on your resume. That way, if you send a recruiter a PDF of your resume, they can click on the links and be taken directly to your project.

While it may be initially daunting and overwhelming to learn data science, it’s important to set SMART goals and plan out a reasonable work schedule around your other commitments.

SMART stands for specific, measurable, achievable, relevant, and time-bound. Personally, as I have followed this strategy, I have found it easier to articulate my specific goals every week and track my progress over time using a consistent work schedule.

Resources

Now, let’s go over some great resources to help you get started learning data science.

One way to learn data science is through a bachelor’s degree program in data science. I went over my entire data science degree curriculum in depth in another video, which I linked down below in the description.

One set of colleges to research if you’re planning on going to college is the UC system, which has some of the best universities in the world. As of this writing, I know of four UC campuses that offer a bachelor’s degree in data science: UCSD, UC Irvine, UC Berkeley, and UC Davis.

I can only speak for the data science program at UCSD since that’s where I graduated from, but I’m sure all of these campuses have great data science programs.

If you’re looking to learn data science online, I would recommend looking into online courses. There are a ton of platforms out there offering data science courses. Some of them include Udemy and 365 Data Science, which I have seen other data science YouTubers mention in their videos. I personally haven’t looked into any online course, so I can’t say for certain which specific course is the best one. So, definitely do some research, if you’re interested in an online course.

If you’re looking for either online resources or to learn broader data science concepts, I would recommend looking into Codecademy and Treehouse.

Codecademy is a great resource where they have created their own courses on data science. They have many beginner friendly courses on the job essentials of a data scientist, how to analyze data with Python working on an actual project, data visualization in Python, and much more.

Treehouse also creates their own online courses tailored for people of all skill levels and backgrounds to learn programming, design, and more, all on your own time. They have topics ranging from Python and data analysis to machine learning and databases.

I personally used Codecademy when I first started learning about programming, so I would recommend looking into that to get started. As for Treehouse, I haven’t used it, but I have heard anecdotally from friends that it has great courses.

If you’re looking for a more formal education than online courses but don’t want to go to college, I would recommend looking into data science bootcamps, such as Metis. I first heard about Metis from my former manager at one of my internships, so anecdotally it sounds like a great way to learn data science and establish a professional network.

Another great resource to learn data science is YouTube since there are many YouTubers out there that focus on data science and stats, such as StatQuest.

Also, I want to shout out Ken and Tina for being a huge inspiration for starting my own YouTube channel focused on data science. They have a ton of great videos on data science, tech, and career advice, so definitely check them out and subscribe to all of their channels.

So I hope you found a lot of value in this article.

If you found this article helpful, be sure to:

→ hit the clap button, and

→ follow me on Medium and my YouTube channel.

Also, comment down below what you learned from this article and what other topics you want to see.

And share this article with anyone interested in learning about data science.

With that said, thanks for reading!


Data science, machine learning, AI, and career advice from a B.S. Data Science Graduate. My Links: https://linktr.ee/datasciencegraduate