How can I become a data scientist from an absolute beginner level to an advanced level?

Awantik Das
Posted on May 22, 2017, 12:01 p.m.

Learn data science

Here is a learning path that works. As a trainer myself, I have seen many success stories.

Most importantly, believe that you can do it. That belief alone gets roughly 30% of the work done.

Python is arguably the best programming language for this because it is easy to learn and absorb. Spend 25–30 hours mastering the basics. Be thorough with lists, dictionaries, functional programming, classes, regular expressions, iterators and generators.
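To make that checklist concrete, here is a small practice sketch that touches each topic once. The sample sentence, the `Dataset` class and the `running_total` function are made up for illustration; only the standard library is used.

```python
import re
from functools import reduce

# List and dictionary: count word frequencies in a sentence.
words = "the quick brown fox jumps over the lazy dog the end".split()
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Functional programming: filter, map and reduce over a range of numbers.
squares_of_evens = list(map(lambda n: n * n, filter(lambda n: n % 2 == 0, range(10))))
total = reduce(lambda a, b: a + b, squares_of_evens)


# Class: a tiny container with one method (hypothetical example class).
class Dataset:
    def __init__(self, values):
        self.values = list(values)

    def mean(self):
        return sum(self.values) / len(self.values)


# Regular expression: pull the integers out of free text.
numbers = [int(m) for m in re.findall(r"\d+", "order 66 shipped in 3 days")]


# Generator: yield running totals lazily instead of building a list up front.
def running_total(values):
    total_so_far = 0
    for value in values:
        total_so_far += value
        yield total_so_far


print(counts["the"])
print(squares_of_evens, total)
print(Dataset([1, 2, 3, 4]).mean())
print(numbers)
print(list(running_total([1, 2, 3, 4])))
```

Try rewriting each piece in a different style, for example the word count with a list comprehension, to check that the idea has really sunk in.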

Start with NumPy; about 5 hours is enough. A wide variety of documentation and tutorials is available for it.
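A first NumPy session might look like the sketch below. The array contents are arbitrary; the point is to practice array creation, axis-wise aggregation, broadcasting and boolean masking.

```python
import numpy as np

# Create a 3x4 array holding the integers 0..11.
a = np.arange(12).reshape(3, 4)

# Aggregate along an axis: axis=0 gives one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D row of means is subtracted from every row of a.
centered = a - col_means

# Boolean masking: select only the even entries.
evens = a[a % 2 == 0]

print(a.shape)
print(col_means)
print(centered)
print(evens)
```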

Pandas is used extensively for data wrangling and processing. This needs about 10 hours. Practice by processing some CSV and Excel data.
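As a rough idea of what such CSV practice can look like, the sketch below loads a tiny made-up sales table, fills a missing value, and aggregates by city. The column names and figures are invented for illustration; the CSV is read from an in-memory string so the example runs without any file on disk.

```python
import io

import pandas as pd

# Hypothetical sales data; one "units" value is deliberately missing.
csv_text = """city,product,units,price
Pune,pen,10,2.5
Pune,book,3,12.0
Delhi,pen,4,2.5
Delhi,book,,12.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Wrangling: treat the missing unit count as zero and restore the integer dtype.
df["units"] = df["units"].fillna(0).astype(int)

# Derive a new column, then aggregate it per city.
df["revenue"] = df["units"] * df["price"]
revenue_by_city = df.groupby("city")["revenue"].sum()

print(df)
print(revenue_by_city)
```

For a real file, pass its path to `pd.read_csv`; `pd.read_excel` works similarly for Excel sheets, though it needs an extra engine package such as openpyxl installed.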

Data visualization is another important aspect. Matplotlib is a good library to start with. Another ~5 hours will boost your confidence.
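A first plot can be as small as the sketch below. The data and the output filename `first_plot.png` are arbitrary choices; the `Agg` backend is selected only so that the script also runs on a machine without a display.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

# Simple data: x and its square.
x = list(range(10))
y = [v ** 2 for v in x]

# One figure with one set of axes, labelled properly.
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o", label="y = x squared")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A first line plot")
ax.legend()

# Save the figure to disk and release it.
out_path = Path("first_plot.png")
fig.savefig(out_path, dpi=100)
plt.close(fig)
```

Once this works, try `ax.scatter`, `ax.bar` and `ax.hist` on the same pattern.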

scikit-learn, a Python machine learning library, ships with many ready-to-use datasets. Start using those.
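For example, the classic iris dataset can be loaded in a couple of lines. The choice of iris here is just one option; `load_diabetes`, `load_wine` and others follow the same pattern.

```python
from sklearn.datasets import load_iris

# Load a small built-in dataset; no download is required.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # number of samples and features
print(iris.feature_names)  # what each column measures
print(iris.target_names)   # the class labels
```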

Now it is time for machine learning. Remind yourself that you don't have to reinvent the algorithms here. The first level is to use them to solve problems.

First, try applying linear regression to very simple datasets. Then move on to scikit-learn's pipeline, hyperparameter and cross-validation concepts. This will give you a fair idea of how to solve problems.
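Here is one way these three concepts can fit together, as a sketch rather than a recipe. It assumes the built-in diabetes dataset. Plain linear regression has no hyperparameter worth tuning, so the tuning step swaps in `Ridge` (regularized linear regression) and searches over its `alpha`; the grid values are arbitrary.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Step 1: plain linear regression, scored with 5-fold cross-validation.
baseline_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("baseline R^2 per fold:", baseline_scores)

# Step 2: a pipeline chains preprocessing and the model into one estimator.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])

# Step 3: hyperparameter search; "model__alpha" targets alpha of the "model" step.
param_grid = {"model__alpha": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="r2")
search.fit(X, y)

print("best alpha:", search.best_params_["model__alpha"])
print("best mean R^2:", search.best_score_)
```

Putting the scaler inside the pipeline matters: it is refitted on each training fold, so nothing from the validation fold leaks into preprocessing.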

Then move on to other kinds of algorithms, such as classification and clustering.
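A minimal taste of both, again on the iris dataset. The specific estimators (`LogisticRegression` for classification, `KMeans` for clustering) and the parameter values are just reasonable starting points, not the only choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Classification (supervised): learn from labelled data, test on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = accuracy_score(y_test, clf.predict(X_test))
print("test accuracy:", accuracy)

# Clustering (unsupervised): group the samples without looking at the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print("cluster of first ten samples:", cluster_labels[:10])
```

Note that the cluster numbers are arbitrary identifiers; they need not match the class numbers in `y`.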

At this stage, you will be confident enough to tackle most problems and to dig deeper into machine learning algorithms.

Happy Learning.


Awantik Das is a technology evangelist currently working as a corporate trainer. He has trained more than 3,000 professionals from Fortune 500 companies, including Cognizant, Mindtree, HappiestMinds, CISCO and others. He is also involved in talent acquisition consulting for leading companies on niche technologies. Previously, he worked with technology companies such as CISCO, Juniper and Rancore (a Reliance Group company).



