Machine Learning with Spark Training

Machine Learning with Spark Course:
Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. This course provides a broad introduction to machine learning, data mining, and statistical pattern recognition. Topics include: (i) supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks); (ii) unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning); and (iii) best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course also draws on numerous case studies and applications, so you will learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

Machine Learning with Spark Course Curriculum

1. Getting Up and Running with Spark

Installing and setting up Spark locally

Spark clusters

The Spark programming model

SparkContext and SparkConf

The Spark shell

Resilient Distributed Datasets

Creating RDDs

Spark operations

Caching RDDs

Broadcast variables and accumulators

The first step to a Spark program in Scala

The first step to a Spark program in Java

The first step to a Spark program in Python

Getting Spark running on Amazon EC2

Launching an EC2 Spark cluster

2. Designing a Machine Learning System

Introducing MovieStream

Business use cases for a machine learning system

Personalization

Targeted marketing and customer segmentation

Predictive modeling and analytics

Types of machine learning models

The components of a data-driven machine learning system

Data ingestion and storage

Data cleansing and transformation

Model training and testing loop

Model deployment and integration

Model monitoring and feedback

Batch versus real time

An architecture for a machine learning system

3. Obtaining, Processing, and Preparing Data with Spark

Accessing publicly available datasets

The MovieLens 100k dataset

Exploring and visualizing your data

Exploring the user dataset

Exploring the movie dataset

Exploring the rating dataset

Processing and transforming your data

Filling in bad or missing data

Extracting useful features from your data

Numerical features

Categorical features

Derived features

Transforming timestamps into categorical features

Text features

Simple text feature extraction

Normalizing features

Using MLlib for feature normalization

Using packages for feature extraction

4. Building a Recommendation Engine with Spark

Types of recommendation models

Content-based filtering

Collaborative filtering

Matrix factorization

Extracting the right features from your data

Extracting features from the MovieLens 100k dataset

Training the recommendation model

Training a model on the MovieLens 100k dataset

Training a model using implicit feedback data

Using the recommendation model

User recommendations

Generating movie recommendations from the MovieLens 100k dataset

Item recommendations

Generating similar movies for the MovieLens 100k dataset

Evaluating the performance of recommendation models

Mean Squared Error

Mean average precision at K

Using MLlib's built-in evaluation functions

RMSE and MSE

MAP

Summary
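As a rough illustration of the "Mean average precision at K" topic in this module, here is a minimal plain-Python sketch. It is not MLlib's built-in evaluation API. The function names and the sample item IDs are hypothetical, the ranked list is assumed to contain no duplicates, and the sketch follows the common definition in which each user's score is divided by min(number of relevant items, K).

```python
def average_precision_at_k(actual, predicted, k):
    """Average precision at K for a single user.

    actual:    items the user really interacted with (any iterable)
    predicted: ranked list of recommended items, best first
    """
    actual_set = set(actual)
    if not actual_set:
        return 0.0
    hits = 0
    score = 0.0
    # Walk the top-K recommendations; each hit adds the precision at that rank.
    for rank, item in enumerate(predicted[:k], start=1):
        if item in actual_set:
            hits += 1
            score += hits / rank
    return score / min(len(actual_set), k)


def mean_average_precision_at_k(all_actual, all_predicted, k):
    """Mean of the per-user average precision at K."""
    scores = [
        average_precision_at_k(a, p, k)
        for a, p in zip(all_actual, all_predicted)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical data for two users (item IDs are made up for illustration).
actual_items = [[1, 2, 3], [7]]
recommended = [[1, 4, 2, 5], [8, 7]]
map_score = mean_average_precision_at_k(actual_items, recommended, k=4)
print(map_score)
```

A ranking metric like this rewards placing relevant items near the top of the list, which is why it is often reported alongside error metrics such as MSE for recommendation models.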

5. Building a Classification Model with Spark

Types of classification models

Linear models

Logistic regression

Linear support vector machines

The naïve Bayes model

Decision trees

Extracting the right features from your data

Extracting features from the Kaggle/StumbleUpon evergreen classification dataset

Training classification models

Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset

Using classification models

Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset

Evaluating the performance of classification models

Accuracy and prediction error

Precision and recall

ROC curve and AUC

Improving model performance and tuning parameters

Feature standardization

Additional features

Using the correct form of data

Tuning model parameters

Linear models

Decision trees

The naïve Bayes model

Cross-validation

6. Building a Regression Model with Spark

Types of regression models

Least squares regression

Decision trees for regression

Extracting the right features from your data

Extracting features from the bike sharing dataset

Creating feature vectors for the linear model

Creating feature vectors for the decision tree

Training and using regression models

Training a regression model on the bike sharing dataset

Evaluating the performance of regression models

Mean Squared Error and Root Mean Squared Error

Mean Absolute Error

Root Mean Squared Log Error

The R-squared coefficient

Computing performance metrics on the bike sharing dataset

Linear model

Decision tree

Improving model performance and tuning parameters

Transforming the target variable

Impact of training on log-transformed targets

Tuning model parameters

Creating training and testing sets to evaluate parameters

The impact of parameter settings for linear models

The impact of parameter settings for the decision tree
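The evaluation metrics listed in this module (MSE, RMSE, MAE, RMSLE, and R-squared) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the MLlib API and not the course's own code. The function name and the sample values are hypothetical, and RMSLE assumes non-negative targets and predictions.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE, RMSLE and R-squared for two equal-length lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]

    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # RMSLE compares log(1 + value), so it needs non-negative inputs.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2
            for a, p in zip(actual, predicted)) / n
    )

    # R-squared: 1 - (residual sum of squares / total sum of squares).
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    # Undefined when every actual value is identical.
    r2 = 1.0 - (mse * n) / total_ss if total_ss > 0 else float("nan")

    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "rmsle": rmsle,
        "r2": r2,
    }


# Hypothetical targets and predictions, chosen only for illustration.
metrics = regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(metrics)
```

RMSLE penalises relative rather than absolute differences, which is one reason a log-transformed target can change which model looks best.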

7. Building a Clustering Model with Spark

Types of clustering models

K-means clustering

Initialization methods

Variants

Mixture models

Hierarchical clustering

Extracting the right features from your data

Extracting features from the MovieLens dataset

Extracting movie genre labels

Training the recommendation model

Normalization

Training a clustering model

Training a clustering model on the MovieLens dataset

Making predictions using a clustering model

Interpreting cluster predictions on the MovieLens dataset

Interpreting the movie clusters

Evaluating the performance of clustering models

Internal evaluation metrics

External evaluation metrics

Computing performance metrics on the MovieLens dataset

Tuning parameters for clustering models

Selecting K through cross-validation

8. Dimensionality Reduction with Spark

Types of dimensionality reduction

Principal Components Analysis

Singular Value Decomposition

Relationship with matrix factorization

Clustering as dimensionality reduction

Extracting the right features from your data

Extracting features from the LFW dataset

Exploring the face data

Visualizing the face data

Extracting facial images as vectors

Normalization

Training a dimensionality reduction model

Running PCA on the LFW dataset

Visualizing the Eigenfaces

Interpreting the Eigenfaces

Using a dimensionality reduction model

Projecting data using PCA on the LFW dataset

The relationship between PCA and SVD

Evaluating dimensionality reduction models

Evaluating k for SVD on the LFW dataset

9. Advanced Text Processing with Spark

What's so special about text data?

Extracting the right features from your data

Term weighting schemes

Feature hashing

Extracting the TF-IDF features from the 20 Newsgroups dataset

Exploring the 20 Newsgroups data

Applying basic tokenization

Improving our tokenization

Removing stop words

Excluding terms based on frequency

A note about stemming

Training a TF-IDF model

Analyzing the TF-IDF weightings

Using a TF-IDF model

Document similarity with the 20 Newsgroups dataset and TF-IDF features

Training a text classifier on the 20 Newsgroups dataset using TF-IDF

Evaluating the impact of text processing

Comparing raw features with processed TF-IDF features on the 20 Newsgroups dataset

Word2Vec models

Word2Vec on the 20 Newsgroups dataset

10. Real-time Machine Learning with Spark Streaming

Online learning

Stream processing

An introduction to Spark Streaming

Input sources

Transformations

Actions

Window operators

Caching and fault tolerance with Spark Streaming

Creating a Spark Streaming application

The producer application

Creating a basic streaming application

Streaming analytics

Stateful streaming

Online learning with Spark Streaming

Streaming regression

A simple streaming regression program

Creating a streaming data producer

Creating a streaming regression model

Streaming K-means

Online model evaluation

Comparing model performance with Spark Streaming

Frequently Asked Questions

What are the modes of training for the "Machine Learning with Spark" course?

The "Machine Learning with Spark" course is instructor-led training (ILT). The trainer travels to your office and delivers the training on your premises. If you need a training space, we can provide a fully equipped lab with all the required facilities. Online instructor-led training is also available if required. Online training is live: the instructor's screen is visible and the instructor's voice is audible, participants' screens are also visible, and participants can ask questions during the session.

Will I be provided with any study material during the "Machine Learning with Spark" training?

Participants will be provided with "Machine Learning with Spark"-specific study material and will have lifetime access to all the code and resources needed for the course. Our public GitHub repository will also be shared with the participants.

What is the pedagogy of zekeLabs?

All zekeLabs courses are hands-on. The code and documents used in class are provided to the participants. A cloud lab and virtual machines are provided to every participant during the "Machine Learning with Spark" training.

What is the duration of this course?

The duration of the "Machine Learning with Spark" training depends on several factors, including the team's prior knowledge of the subject, the team's learning objectives, and the amount of customization needed. Contact us to learn more about the "Machine Learning with Spark" course duration.

What would be the venue for the "Machine Learning with Spark" training?

The "Machine Learning with Spark" training is organised at the client's premises. We have delivered, and continue to deliver, "Machine Learning with Spark" training in India, the USA, Singapore, Hong Kong, and Indonesia. We also have state-of-the-art training facilities, available based on client requirements.

Who is the trainer for the "Machine Learning with Spark" training?

Our subject matter experts (SMEs) have more than ten years of industry experience, which helps make the learning program a well-rounded experience. The course program has been designed in close collaboration with experts working in organizations such as Google, Microsoft, and Amazon.

Can we customize this course based on our requirements?

Yes, absolutely. For every training, we hold a technical call between our subject matter expert (SME) and the technical lead of the team undergoing the training. The course is tailored to the current expertise of the participants, the objectives of the team, and the short-term and long-term objectives of the organisation.

How can I reach out to you if I have any other queries regarding the "Machine Learning with Spark" course?

Email us at [email protected] or call us at +91 8041690175, and we will get back to you as soon as possible about your queries on the "Machine Learning with Spark" course.
