Statistics for Data Scientists Training

Statistics for Data Scientists Course:
This statistics course provides a solid foundation for anyone getting started with data science or machine learning. Probability, regression, and sampling are integral parts of the course. It also covers cost functions, distance between vectors, hyper-parameter tuning, and regularization, along with plotting data and calculating accuracy and error.

Statistics for Data Scientists Course Curriculum

1. Exploratory Data Analysis

Elements of Structured Data

Further Reading

Rectangular Data

Data Frames and Indexes

Nonrectangular Data Structures

Further Reading

Estimates of Location

Mean

Median and Robust Estimates

Example: Location Estimates of Population and Murder Rates

Further Reading

Estimates of Variability

Standard Deviation and Related Estimates

Estimates Based on Percentiles

Example: Variability Estimates of State Population

Further Reading

Exploring the Data Distribution

Percentiles and Boxplots

Frequency Table and Histograms

Density Estimates

Further Reading

Exploring Binary and Categorical Data

Mode

Expected Value

Further Reading

Correlation

Scatterplots

Further Reading

Exploring Two or More Variables

Hexagonal Binning and Contours (Plotting Numeric versus Numeric Data)

Two Categorical Variables

Categorical and Numeric Data

Visualizing Multiple Variables

2. Data and Sampling Distributions

Random Sampling and Sample Bias

Bias

Random Selection

Size versus Quality: When Does Size Matter?

Sample Mean versus Population Mean

Further Reading

Selection Bias

Regression to the Mean

Further Reading

Sampling Distribution of a Statistic

Central Limit Theorem

Standard Error

Further Reading

The Bootstrap

Resampling versus Bootstrapping

Further Reading

Confidence Intervals

Further Reading

Normal Distribution

Standard Normal and QQ-Plots

Long-Tailed Distributions

Further Reading

Student’s t-Distribution

Further Reading

Binomial Distribution

Further Reading

Poisson and Related Distributions

Poisson Distributions

Exponential Distribution

Estimating the Failure Rate

Weibull Distribution

Further Reading

Summary

3. Statistical Experiments and Significance Testing

A/B Testing

Why Have a Control Group?

Why Just A/B? Why Not C, D…?

For Further Reading

Hypothesis Tests

The Null Hypothesis

Alternative Hypothesis

One-Way, Two-Way Hypothesis Test

Further Reading

Resampling

Permutation Test

Example: Web Stickiness

Exhaustive and Bootstrap Permutation Test

Permutation Tests: The Bottom Line for Data Science

For Further Reading

Statistical Significance and P-Values

P-Value

Alpha

Type 1 and Type 2 Errors

Data Science and P-Values

Further Reading

t-Tests

Further Reading

Multiple Testing

Further Reading

Degrees of Freedom

Further Reading

ANOVA

F-Statistic

Two-Way ANOVA

Further Reading

Chi-Square Test

Chi-Square Test: A Resampling Approach

Chi-Squared Test: Statistical Theory

Fisher’s Exact Test

Relevance for Data Science

Further Reading

Multi-Arm Bandit Algorithm

Further Reading

Power and Sample Size

Sample Size

Further Reading

Summary

4. Regression and Prediction

Simple Linear Regression

The Regression Equation

Fitted Values and Residuals

Least Squares

Prediction versus Explanation (Profiling)

Further Reading

Multiple Linear Regression

Example: King County Housing Data

Assessing the Model

Cross-Validation

Model Selection and Stepwise Regression

Weighted Regression

Prediction Using Regression

The Dangers of Extrapolation

Confidence and Prediction Intervals

Factor Variables in Regression

Dummy Variables Representation

Factor Variables with Many Levels

Ordered Factor Variables

Interpreting the Regression Equation

Correlated Predictors

Multicollinearity

Confounding Variables

Interactions and Main Effects

Testing the Assumptions: Regression Diagnostics

Outliers

Influential Values

Heteroskedasticity, Non-Normality and Correlated Errors

Partial Residual Plots and Nonlinearity

Polynomial and Spline Regression

Polynomial

Splines

Generalized Additive Models

5. Classification

Naive Bayes

Why Exact Bayesian Classification Is Impractical

The Naive Solution

Numeric Predictor Variables

Further Reading

Discriminant Analysis

Covariance Matrix

Fisher’s Linear Discriminant

A Simple Example

Further Reading

Logistic Regression

Logistic Response Function and Logit

Logistic Regression and the GLM

Generalized Linear Models

Predicted Values from Logistic Regression

Interpreting the Coefficients and Odds Ratios

Linear and Logistic Regression: Similarities and Differences

Assessing the Model

Further Reading

Evaluating Classification Models

Confusion Matrix

The Rare Class Problem

Precision, Recall, and Specificity

ROC Curve

AUC

Lift

Further Reading

Strategies for Imbalanced Data

Undersampling

Oversampling and Up/Down Weighting

Data Generation

Cost-Based Classification

Exploring the Predictions

6. Statistical Machine Learning

K-Nearest Neighbors

A Small Example: Predicting Loan Default

Distance Metrics

One Hot Encoder

Standardization (Normalization, Z-Scores)

Choosing K

KNN as a Feature Engine

Tree Models

A Simple Example

The Recursive Partitioning Algorithm

Measuring Homogeneity or Impurity

Stopping the Tree from Growing

Predicting a Continuous Value

How Trees Are Used

Further Reading

Bagging and the Random Forest

Bagging

Random Forest

Variable Importance

Hyperparameters

Boosting

The Boosting Algorithm

XGBoost

Regularization: Avoiding Overfitting

Hyperparameters and Cross-Validation

7. Unsupervised Learning

Principal Components Analysis

A Simple Example

Computing the Principal Components

Interpreting Principal Components

Further Reading

K-Means Clustering

A Simple Example

K-Means Algorithm

Interpreting the Clusters

Selecting the Number of Clusters

Hierarchical Clustering

A Simple Example

The Dendrogram

The Agglomerative Algorithm

Measures of Dissimilarity

Model-Based Clustering

Multivariate Normal Distribution

Mixtures of Normals

Selecting the Number of Clusters

Further Reading

Scaling and Categorical Variables

Scaling the Variables

Dominant Variables

Categorical Data and Gower’s Distance

Problems with Clustering Mixed Data

Frequently Asked Questions

What are the modes of training for the "Statistics for Data Scientists" course?

This "Statistics for Data Scientists" course is delivered as instructor-led training (ILT). The trainer travels to your office and delivers the training on your premises. If you need training space, we can provide a fully equipped lab with all the required facilities. Online instructor-led training is also available on request. Online training is live: the instructor's screen is visible and the instructor's voice is audible. Participants' screens are also visible, and participants can ask questions during the live session.

Will I be provided with any study material during the "Statistics for Data Scientists" training?

Participants will be provided with study material specific to "Statistics for Data Scientists". Participants will have lifetime access to all the code and resources needed for this "Statistics for Data Scientists" course. Our public GitHub repository and the study material will also be shared with the participants.

What is the pedagogy of zekeLabs?

All zekeLabs courses are hands-on. The code and documents used in class are provided to the participants. A cloud lab and virtual machines are provided to every participant during the "Statistics for Data Scientists" training.

What is the duration of this course?

The duration of the "Statistics for Data Scientists" training varies with several factors, including the team's prior knowledge of the subject, the team's learning objectives, and the customization needed in the course, among others. Contact us to learn more about the "Statistics for Data Scientists" course duration.

What would be the venue for the "Statistics for Data Scientists" training?

The "Statistics for Data Scientists" training is organised at the client's premises. We have delivered and continue to deliver "Statistics for Data Scientists" training in India, the USA, Singapore, Hong Kong, and Indonesia. We also have state-of-the-art training facilities, available based on client requirements.

Who is the trainer for "Statistics for Data Scientists" training?

Our subject matter experts (SMEs) have more than ten years of industry experience. This ensures that the learning program is a 360-degree holistic knowledge and learning experience. The course program has been designed in close collaboration with experts working at organizations such as Google, Microsoft, and Amazon.

Can we customize this course based on our requirements?

Yes, absolutely. For every training, we conduct a technical call between our Subject Matter Expert (SME) and the technical lead of the team undergoing training. The course is tailored to the participants' current expertise, the objectives of the team undergoing the training program, and the short-term and long-term objectives of the organisation.

How can I reach out to you if I have any other queries regarding the "Statistics for Data Scientists" course?

Email us at [email protected] or call us at +91 8041690175, and we will get back to you at the earliest with answers to your queries about the "Statistics for Data Scientists" course.

