Data analysis and processing

Python libraries in data analysis

Pandas

PyMongo

NumPy arrays

Array creation

Fancy indexing

Array functions

Loading and saving data

Loading an array

NumPy random numbers

An overview of the Pandas package

Series

The essential basic functionality

Head and tail

Functional statistics

Sorting

Computational tools

Advanced uses of Pandas for data analysis

The Panel data

The matplotlib API primer

Figures and subplots

Scatter plots

Contour plots

Legends and annotations

Additional Python data visualization tools

MayaVi

Time series primer

Resampling time series

Upsampling time series data

Timedeltas

Interacting with data in text format

Writing data to text format

HDF5

Interacting with data in Redis

List

Ordered set

Data munging

Filtering

Reshaping data

Grouping data

An overview of machine learning models

Data representation in scikit-learn

Unsupervised learning – clustering and dimensionality reduction

Introducing predictive modelling

Ensemble of statistical algorithms

Historical data

Business context

Task matrix for predictive modelling

LinkedIn's "People also viewed" feature

How is it done?

How was it done?

Anaconda

Installing a Python package

Installing Python packages with pip

IDEs for Python

Reading the data – variations and examples

Delimiters

Case 1 – reading a dataset using the read_csv method

Use cases of the read_csv method

Reading a .txt dataset with a comma delimiter

Case 2 – reading a dataset using the open method of Python

Changing the delimiter of a dataset

Case 4 – miscellaneous cases

Writing to a CSV or Excel file

Handling missing values

What constitutes missing data?

Treating missing values

Imputation

Visualizing a dataset by basic plotting

Histograms

Subsetting a dataset

Selecting rows

Creating new columns

Various methods for generating random numbers

Generating random numbers following probability distributions

Cumulative density function

Normal distribution

Geometry and mathematics behind the calculation of pi

Grouping the data – aggregation, filtering, and transformation

Filtering

Miscellaneous operations

Method 1 – using the Customer Churn Model

Method 3 – using the shuffle function

Merging/joining datasets

Left Join

An example of the Inner Join

An example of the Right Join

Random sampling and the central limit theorem

Null versus alternate hypothesis

Confidence intervals, significance levels, and p-values

A step-by-step guide to do a hypothesis test

Chi-square tests

Understanding the maths behind linear regression

Fitting a linear regression model and checking its efficacy

Making sense of result parameters

F-statistics

Implementing linear regression with Python

Multiple linear regression

Variance Inflation Factor

Training and testing data split of models

Feature selection with scikit-learn

Handling categorical variables

Handling outliers

Linear regression versus logistic regression

Contingency tables

Odds ratio

Estimation using the Maximum Likelihood Method

Log likelihood function

Making sense of logistic regression parameters

Likelihood Ratio Test statistic

Implementing logistic regression with Python

Data exploration

Creating dummy variables for categorical variables

Implementing the model

Cross validation

The ROC curve

Introduction to clustering – what, why, and how?

How is clustering used?

Mathematics behind clustering

Euclidean distance

Minkowski distance

Normalizing the distances

Single linkage

Average linkage

Ward's method

K-means clustering

Importing and exploring the dataset

Hierarchical clustering using scikit-learn

Interpreting the cluster

The elbow method

Introducing decision trees

Understanding the mathematics behind decision trees

Entropy

ID3 algorithm to create a decision tree

Reduction in Variance

Handling a continuous numerical variable

Implementing a decision tree with scikit-learn

Cross-validating and pruning the decision tree

Regression tree algorithm

Understanding and implementing random forests

Implementing a random forest using Python

Important parameters for random forests

Best practices for coding

Defining functions for substantial individual tasks

Example 2

Avoid hard-coding of variables as much as possible

Using standard libraries, methods, and formulas

Best practices for algorithms

Best practices for business contexts

Data, information, knowledge, and insight

Information

Data analysis and insight

Transforming data into information

Data preprocessing

Organizing data

Transforming information into knowledge

Data visualization history

Minard's Russian campaign (1812)

Statistical graphics (1850-1915)

How does visualization help decision-making?

Data visualization today

Visualization plots

Bar graphs

Box plots

Scatter plots

KDE plots

Why does visualization require planning?

A sports example

Creating interesting stories with data

Reader-driven narratives

The State of the Union address

A few other example narratives

Perception and presentation methods

Some best practices for visualization

Correlation

Location-specific or geodata

Trends over time

Development tools

Anaconda from Continuum Analytics

Event listeners

Circular layout

Balloon layout

NumPy, SciPy, and MKL functions

NumPy universal functions

An example of interpolation

SciPy

The vectorized numerical derivative

The performance of Python

Slicing

Array indexing

Logical indexing

Stacks

Sets

Dictionaries

Sparse matrices

Dictionaries for memoization

Visualization using matplotlib

Installing word clouds

Web feeds

Plotting the stock price chart

The visualization example in sports

The deterministic model

The stochastic model

What exactly is Monte Carlo simulation?

Monte Carlo simulation in basketball

Implied volatilities

The simulation model

The diffusion-based simulation

Schelling's Segregation Model

K-nearest neighbors

Bayesian linear regression

Classification methods

Linear regression

An example

The Naïve Bayes classifier

Installing TextBlob

The Naïve Bayes classifier using TextBlob

k-nearest neighbors

Support vector machines

Installing scikit-learn

Directed graphs and multigraphs

Displaying graphs

NetworkX

PageRank

Analysis of social networks

The directed acyclic graph test

A genetic programming example

Computer simulation

SciPy's random functions

Signal processing

Visualization methods using HTML5

D3.js for visualization

We provide classroom-based as well as online training. Since this is hands-on training, batches generally contain no more than 4 people.

We will provide web services specific study material as the course progresses. You will have lifetime access to all the code and basic settings needed for this Data Analysis course through our GitHub account, along with the study material that we share with you. You can use these for quick reference.

Feel free to email us at support@zekelabs.com, and we will get back to you as soon as possible about your queries on the Data Analysis course.

We have tie-ups with various companies and placement organizations, to whom we connect our learners. Each Data Analysis training ends with career consulting.

A minimum of 2-3 industry-standard projects on Data Analysis will be provided.

Yes, we provide our own course completion certificate to all students. Each Data Analysis training in Bangalore ends with a training and project completion certificate.

You can pay by card (debit/credit), cash, cheque, or net banking. You can pay in two installments.

We take immense pride in providing post-training career consulting for Data Analysis training.
