Installing and setting up Spark locally
The Spark programming model
The Spark shell
Creating RDDs
Caching RDDs
The first step to a Spark program in Scala
The first step to a Spark program in Python
Launching an EC2 Spark cluster
Introducing MovieStream
Personalization
Predictive modeling and analytics
The components of a data-driven machine learning system
Data cleansing and transformation
Model deployment and integration
Batch versus real time
Accessing publicly available datasets
Exploring and visualizing your data
Exploring the movie dataset
Processing and transforming your data
Extracting useful features from your data
Derived features
Text features
Normalizing features
Using packages for feature extraction
Types of recommendation models
Matrix factorization
Extracting features from the MovieLens 100k dataset
Training a model on the MovieLens 100k dataset
Using the recommendation model
User recommendations
Item recommendations
Evaluating the performance of recommendation models
Mean average precision at K
RMSE and MSE
MAP
Summary
Types of classification models
Logistic regression
The naïve Bayes model
Extracting the right features from your data
Training a classification model on the Kaggle/StumbleUpon evergreen classification dataset
Generating predictions for the Kaggle/StumbleUpon evergreen classification dataset
Evaluating the performance of classification models
Precision and recall
Improving model performance and tuning parameters
Additional features
Tuning model parameters
Decision trees
Cross-validation
Types of regression models
Least squares regression
Extracting the right features from your data
Creating feature vectors for the linear model
Training and using regression models
Evaluating the performance of regression models
Mean Absolute Error
The R-squared coefficient
Linear model
Improving model performance and tuning parameters
Impact of training on log-transformed targets
Creating training and testing sets to evaluate parameters
The impact of parameter settings for the decision tree
Types of clustering models
Initialization methods
Mixture models
Extracting the right features from your data
Extracting movie genre labels
Normalization
Training a clustering model on the MovieLens dataset
Interpreting cluster predictions on the MovieLens dataset
Evaluating the performance of clustering models
External evaluation metrics
Tuning parameters for clustering models
Types of dimensionality reduction
Singular Value Decomposition
Clustering as dimensionality reduction
Extracting features from the LFW dataset
Visualizing the face data
Normalization
Running PCA on the LFW dataset
Interpreting the Eigenfaces
Projecting data using PCA on the LFW dataset
Evaluating dimensionality reduction models
What's so special about text data?
Term weighting schemes
Extracting the TF-IDF features from the Newsgroups dataset
Improving our tokenization
Excluding terms based on frequency
Training a TF-IDF model
Using a TF-IDF model
Newsgroups dataset and TF-IDF features
Newsgroups dataset using TF-IDF
Comparing raw features with processed TF-IDF features on the Newsgroups dataset
Word2Vec models
Online learning
An introduction to Spark Streaming
Transformations
Window operators
Creating a Spark Streaming application
Creating a basic streaming application
Stateful streaming
Streaming regression
Creating a streaming data producer
Streaming K-means
Comparing model performance with Spark Streaming