Data analysis and processing
Python libraries in data analysis
Pandas
PyMongo
NumPy arrays
Array creation
Fancy indexing
Array functions
Loading and saving data
Loading an array
NumPy random numbers
An overview of the Pandas package
Series
The essential basic functionality
Head and tail
Functional statistics
Sorting
Computational tools
Advanced uses of Pandas for data analysis
The Panel data
The matplotlib API primer
Figures and subplots
Scatter plots
Contour plots
Legends and annotations
Additional Python data visualization tools
MayaVi
Time series primer
Resampling time series
Upsampling time series data
Timedeltas
Interacting with data in text format
Writing data to text format
HDF5
Interacting with data in Redis
List
Ordered set
Data munging
Filtering
Reshaping data
Grouping data
An overview of machine learning models
Data representation in scikit-learn
Unsupervised learning – clustering and dimensionality reduction
Introducing predictive modelling
Ensemble of statistical algorithms
Historical data
Business context
Task matrix for predictive modelling
LinkedIn's "People also viewed" feature
How is it done?
How is it done?
How is it done?
How is it done?
How was it done?
Anaconda
Installing a Python package
Installing Python packages with pip
IDEs for Python
Reading the data – variations and examples
Delimiters
Case 1 – reading a dataset using the read_csv method
Use cases of the read_csv method
Reading a .txt dataset with a comma delimiter
Case 2 – reading a dataset using the open method of Python
Changing the delimiter of a dataset
Case 4 – miscellaneous cases
Writing to a CSV or Excel file
Handling missing values
What constitutes missing data?
Treating missing values
Imputation
Visualizing a dataset by basic plotting
Histograms
Subsetting a dataset
Selecting rows
Creating new columns
Various methods for generating random numbers
Generating random numbers following probability distributions
Cumulative distribution function
Normal distribution
Geometry and mathematics behind the calculation of pi
Grouping the data – aggregation, filtering, and transformation
Filtering
Miscellaneous operations
Method 1 – using the Customer Churn Model
Method 3 – using the shuffle function
Merging/joining datasets
Left Join
An example of the Inner Join
An example of the Right Join
Random sampling and the central limit theorem
Null versus alternate hypothesis
Confidence intervals, significance levels, and p-values
A step-by-step guide to performing a hypothesis test
Chi-square tests
Understanding the maths behind linear regression
Fitting a linear regression model and checking its efficacy
Making sense of result parameters
F-statistics
Implementing linear regression with Python
Multiple linear regression
Variance Inflation Factor
Training and testing data split of models
Feature selection with scikit-learn
Handling categorical variables
Handling outliers
Linear regression versus logistic regression
Contingency tables
Odds ratio
Estimation using the Maximum Likelihood Method
Log likelihood function
Making sense of logistic regression parameters
Likelihood Ratio Test statistic
Implementing logistic regression with Python
Data exploration
Creating dummy variables for categorical variables
Implementing the model
Cross validation
The ROC curve
Introduction to clustering – what, why, and how?
How is clustering used?
Mathematics behind clustering
Euclidean distance
Minkowski distance
Normalizing the distances
Single linkage
Average linkage
Ward's method
K-means clustering
Importing and exploring the dataset
Hierarchical clustering using scikit-learn
Interpreting the cluster
The elbow method
Introducing decision trees
Understanding the mathematics behind decision trees
Entropy
ID3 algorithm to create a decision tree
Reduction in Variance
Handling a continuous numerical variable
Implementing a decision tree with scikit-learn
Cross-validating and pruning the decision tree
Regression tree algorithm
Understanding and implementing random forests
Implementing a random forest using Python
Important parameters for random forests
Best practices for coding
Defining functions for substantial individual tasks
Example 2
Avoid hard-coding of variables as much as possible
Using standard libraries, methods, and formulas
Best practices for algorithms
Best practices for business contexts
Data, information, knowledge, and insight
Information
Data analysis and insight
Transforming data into information
Data preprocessing
Organizing data
Transforming information into knowledge
Data visualization history
Minard's Russian campaign (1812)
Statistical graphics (1850-1915)
How does visualization help decision-making?
Data visualization today
Visualization plots
Bar graphs
Box plots
Scatter plots
KDE plots
Why does visualization require planning?
A sports example
Creating interesting stories with data
Reader-driven narratives
The State of the Union address
A few other example narratives
Perception and presentation methods
Some best practices for visualization
Correlation
Location-specific or geodata
Trends over time
Development tools
Anaconda from Continuum Analytics
Event listeners
Circular layout
Balloon layout
NumPy, SciPy, and MKL functions
NumPy universal functions
An example of interpolation
SciPy
The vectorized numerical derivative
The performance of Python
Slicing
Array indexing
Logical indexing
Stacks
Sets
Dictionaries
Sparse matrices
Dictionaries for memoization
Visualization using matplotlib
Installing word clouds
Web feeds
Plotting the stock price chart
The visualization example in sports
The deterministic model
The stochastic model
What exactly is Monte Carlo simulation?
Monte Carlo simulation in basketball
Implied volatilities
The simulation model
The diffusion-based simulation
Schelling's Segregation Model
K-nearest neighbors
Bayesian linear regression
Classification methods
Linear regression
An example
The Naïve Bayes classifier
Installing TextBlob
The Naïve Bayes classifier using TextBlob
k-nearest neighbors
Support vector machines
Installing scikit-learn
Directed graphs and multigraphs
Displaying graphs
NetworkX
PageRank
Analysis of social networks
The directed acyclic graph test
A genetic programming example
Computer simulation
SciPy's random functions
Signal processing
Visualization methods using HTML5
D3.js for visualization