Here is the journey that works. Being a trainer myself, I have seen many success stories.
Most importantly, believe that you can do it. At least 30% of the work gets done with this.
Python is definitely the best programming language for this because of how easy it is to learn & absorb. Spend 25–30 hours mastering it. Be thorough with lists, dictionaries, functional programming, classes, regular expressions, iterators & generators.
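As a rough self-check for this stage, here is a small sketch that touches most of the topics listed above using only the standard library. The sample sentence and the names `running_total` and `Countdown` are made up for illustration.

```python
import re
from collections import Counter

text = "Python is easy to learn. Learn Python, then learn NumPy."

# Regular expression: pull out lowercase word tokens.
words = re.findall(r"[a-z]+", text.lower())

# Dictionary-style counting: Counter is a dict subclass mapping word -> frequency.
counts = Counter(words)

# Functional style: keep the distinct words longer than four letters, sorted.
long_words = sorted(set(filter(lambda w: len(w) > 4, words)))


def running_total(values):
    """Generator: yields the cumulative sum lazily, one value at a time."""
    total = 0
    for value in values:
        total += value
        yield total


class Countdown:
    """Class that supports iteration by implementing __iter__."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


totals = list(running_total([1, 2, 3, 4]))
countdown = list(Countdown(3))

print(counts.most_common(2))
print(long_words)
print(totals, countdown)
```

If every line here reads naturally to you, the Python part of the journey is in good shape.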
Start with NumPy; ~5 hours is fine for this. A wide variety of documentation & tutorials is available for it.
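To give a feel for what those NumPy hours cover, here is a minimal sketch of array creation, axis-wise aggregation, broadcasting, boolean indexing and matrix multiplication. The array values are arbitrary.

```python
import numpy as np

# A 2x3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

# Aggregate down the rows to get one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D means are subtracted from every row.
centered = a - col_means

# Boolean indexing: select the elements greater than 2.
selected = a[a > 2]

# Matrix product of a (2x3) with its transpose (3x2).
product = a @ a.T

print(col_means)
print(centered)
print(selected)
print(product)
```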
Pandas is extensively used in data wrangling & processing. This would need ~10 hours. Practice by processing some CSV & Excel data.
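A typical first exercise looks something like the sketch below: read a CSV, fill a missing value, and aggregate by group. The data is invented and read from an in-memory string so the example runs without any file on disk.

```python
import io

import pandas as pd

# Made-up sales data; note the missing value for Pune in Feb.
csv_text = """city,month,sales
Pune,Jan,100
Pune,Feb,
Delhi,Jan,80
Delhi,Feb,120
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Replace the missing sales figure with 0 before aggregating.
df["sales"] = df["sales"].fillna(0)

# Total sales per city.
totals = df.groupby("city")["sales"].sum()

print(df)
print(totals)
```

For Excel files, `pd.read_excel` plays the same role as `pd.read_csv`, though it needs an extra engine package such as openpyxl installed.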
Data visualization is another important aspect. Matplotlib is definitely a good library to learn. Another ~5 hours here will boost your confidence.
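A first Matplotlib plot can be as small as the sketch below. The numbers are made up, and the `Agg` backend plus the temporary output path are assumptions chosen so the script runs without a display; in a notebook you would usually just call `plt.show()`.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data for illustration.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 71, 80]

fig, ax = plt.subplots()
ax.plot(hours, scores, marker="o", label="score")
ax.set_xlabel("Hours of practice")
ax.set_ylabel("Score")
ax.set_title("Practice vs. score")
ax.legend()

# Save the figure as a PNG in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "practice_vs_score.png")
fig.savefig(out_path)
print("saved to", out_path)
```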
scikit-learn, a Python machine learning library, has many datasets already available. Start using those.
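For instance, the iris dataset ships with scikit-learn and loads without any download. This sketch only inspects it; the choice of iris is just one option among the bundled datasets.

```python
from sklearn.datasets import load_iris

# Load one of the small datasets bundled with scikit-learn.
iris = load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 flowers, 4 measurements each
print(iris.feature_names)  # names of the 4 columns
print(iris.target_names)   # names of the 3 species
```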
Now it is time for machine learning. Remember, & tell yourself, that you don’t have to reinvent algorithms here. The first level is to make use of them to solve problems.
First, try applying linear regression on very simple datasets. Next, move on to the pipeline, hyperparameter & cross-validation concepts of scikit-learn. This will give you a fair idea of how to solve problems.
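The steps above could be sketched roughly as follows, using the diabetes dataset bundled with scikit-learn. Plain linear regression has no hyperparameter worth tuning, so for the grid-search step this sketch swaps in `Ridge` (regularized linear regression) and tunes its `alpha`; that swap and the candidate values are my own choices for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)

# Step 1: linear regression inside a pipeline, scored with
# 5-fold cross-validation (the default score for regressors is R^2).
linear = Pipeline([
    ("scale", StandardScaler()),
    ("model", LinearRegression()),
])
linear_scores = cross_val_score(linear, X, y, cv=5)

# Step 2: hyperparameter search. "model__alpha" means the `alpha`
# parameter of the pipeline step named "model".
ridge = Pipeline([
    ("scale", StandardScaler()),
    ("model", Ridge()),
])
search = GridSearchCV(ridge, {"model__alpha": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X, y)

print("linear regression mean R^2:", linear_scores.mean())
print("best ridge params:", search.best_params_)
print("best ridge mean R^2:", search.best_score_)
```

Putting the scaler inside the pipeline means it is refit on each training fold, so nothing from the held-out fold leaks into preprocessing.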
Then, go for other kinds of algorithms, such as classification & clustering.
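As one possible starting point, the sketch below runs a classifier and a clustering algorithm on the iris dataset. Logistic regression and k-means are only example choices here, not the only algorithms worth trying.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Classification: learn from labelled examples, then check on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# Clustering: group the same flowers WITHOUT using the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)

print("test accuracy:", accuracy)
print("first ten cluster assignments:", cluster_labels[:10])
```

The key difference to notice is that the classifier is given `y` during training, while k-means never sees it.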
At this stage, you will be confident enough to take on new problems & dig deeper into machine learning algorithms.
Big Data is a problem statement: the size of the data being processed has grown to hundreds of petabytes (1 PB = 1000 TB). Yahoo Mail generates some 40-50 PB of data every day. Yahoo has to read that 40-50 PB of data & filter out spam. E-commerce...
Data needs computation to get information out of it. The size of the data can be really huge, so huge data is broken down into chunks & stored across different systems.
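The chunking idea can be illustrated on a single machine with plain Python. In this toy sketch each chunk stands in for the data held by one system; the helper names and the tiny "mailbox" are made up, and a real cluster would run the per-chunk step on separate machines.

```python
from collections import Counter


def split_into_chunks(records, chunk_size):
    """Break a large list of records into chunks of at most chunk_size."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_labels(chunk):
    """Work done independently on one chunk (one system, in a real cluster)."""
    return Counter(chunk)


def combine(partials):
    """Merge the per-chunk results into one overall answer."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Made-up stand-in for a huge mailbox: one label per email.
emails = ["spam", "ok", "ok", "spam", "ok", "spam", "spam", "ok", "ok", "ok"]

chunks = split_into_chunks(emails, chunk_size=4)
partials = [count_labels(chunk) for chunk in chunks]
totals = combine(partials)

print(len(chunks), "chunks")
print(totals)
```

Because each chunk is processed on its own, adding more systems lets more chunks be handled at the same time; only the small partial results need to be brought together at the end.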