
Data Science Revealed

56,99 €*

Delivery time: Available immediately

Data Science Revealed, Apress
With Feature Engineering, Data Visualization, Pipeline Development, and Hyperparameter Tuning
By Tshepo Chris Nokeri, available in digital form in the heise Shop

Product information "Data Science Revealed"

Get insight into data science techniques such as data engineering and visualization, statistical modeling, machine learning, and deep learning. This book teaches you how to select variables, optimize hyperparameters, develop pipelines, and train, test, and validate machine and deep learning models. Each chapter includes a set of examples allowing you to understand the concepts, assumptions, and procedures behind each model.

The book covers parametric methods or linear models that combat under- or over-fitting using techniques such as Lasso and Ridge. It includes complex regression analysis with time series smoothing, decomposition, and forecasting. It takes a fresh look at non-parametric models for binary classification (logistic regression analysis) and ensemble methods such as decision trees, support vector machines, and naive Bayes. It covers the most popular non-parametric method for time-event data (the Kaplan-Meier estimator). It also covers ways of solving classification problems using artificial neural networks such as restricted Boltzmann machines, multi-layer perceptrons, and deep belief networks. The book discusses unsupervised learning clustering techniques such as the K-means method, agglomerative and DBSCAN approaches, and dimension reduction techniques such as Feature Importance, Principal Component Analysis, and Linear Discriminant Analysis. And it introduces driverless artificial intelligence using H2O.

After reading this book, you will be able to develop, test, validate, and optimize statistical machine learning and deep learning models, and engineer, visualize, and interpret sets of data.

WHAT YOU WILL LEARN

* Design, develop, train, and validate machine learning and deep learning models

* Find optimal hyperparameters for superior model performance

* Improve model performance using techniques such as dimension reduction and regularization

* Extract meaningful insights for decision making using data visualization

WHO THIS BOOK IS FOR

Beginning and intermediate level data scientists and machine learning engineers

TSHEPO CHRIS NOKERI harnesses advanced analytics and artificial intelligence to foster innovation and optimize business performance. He has delivered complex solutions to companies in the mining, petroleum, and manufacturing industries. He completed a bachelor’s degree in information management and graduated with an honors degree in business science at the University of the Witwatersrand on a TATA Prestigious Scholarship and a Wits Postgraduate Merit Award. He was also awarded the Oxford University Press Prize.

Section 1: Parametric Methods

Chapter 1: An Introduction to Simple Linear Regression

Chapter goal: Introduces the reader to parametric methods and the underlying assumptions of regression.

Subtopics

• Regression assumptions.

• Detecting missing values.

• Descriptive analysis.

• Understand correlation.

o Plot Pearson correlation matrix.

• Determine covariance.

o Plot covariance matrix.

• Create and reshape arrays.

• Split data into training and test data.

• Normalize data.

• Find best hyper-parameters for a model.

• Build your own model.

• Review model performance.

o Mean Absolute Error.

o Mean Squared Error.

o Root Mean Squared Error.

o R-squared.

o Plotting Actual Values vs. Predicted Values.

• Residual diagnosis.

o Normal Q-Q Plot.

o Cook’s D Influence Plot.

o Plotting predicted values vs. residual values.

o Plotting Fitted Values vs. Residual Values.

o Plotting Leverage Values vs. Residual Values.

o Plotting Fitted Values vs. Studentized Residual Values.

o Plotting Leverage Values vs. Studentized Residual Values.
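
To give a flavor of the workflow these subtopics describe, here is a minimal simple linear regression sketch in scikit-learn, using synthetic data as a stand-in for the book's datasets; the book's own listings may differ.

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=(100, 1))           # a single synthetic feature, kept 2-D
    y = 2.5 * x.ravel() + rng.normal(0, 1, 100)     # linear relationship plus noise

    x_train, x_test, y_train, y_test = train_test_split(
        x, y, test_size=0.2, random_state=0)        # split into training and test data

    scaler = StandardScaler()                       # normalize the feature
    x_train = scaler.fit_transform(x_train)
    x_test = scaler.transform(x_test)

    model = LinearRegression().fit(x_train, y_train)
    y_pred = model.predict(x_test)

    print("MAE :", mean_absolute_error(y_test, y_pred))
    print("MSE :", mean_squared_error(y_test, y_pred))
    print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
    print("R2  :", r2_score(y_test, y_pred))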

Chapter 2: Advanced Parametric Methods

Chapter goal: Highlights methods of dealing with the problem of under-fitting and over-fitting.

Subtopics

• Issue of multi-collinearity.

• Explore methods of dealing with the problem of under-fitting and over-fitting.

• Understand Ridge, RidgeCV and Lasso regression models.

• Find best hyper-parameters for a model.

• Build regularized models.

• Compare performance of different regression methods.

o Mean Absolute Error.

o Mean Squared Error.

o Root Mean Squared Error.

o R-squared.

o Plotting actual values vs. predicted values.
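
The regularized models compared here can be sketched as follows, assuming scikit-learn and the x_train/x_test/y_train/y_test splits from the previous sketch; the alpha values are illustrative only.

    import numpy as np
    from sklearn.linear_model import Ridge, RidgeCV, Lasso
    from sklearn.metrics import mean_squared_error, r2_score

    alphas = np.logspace(-3, 3, 13)                            # candidate penalty strengths

    ridge_cv = RidgeCV(alphas=alphas).fit(x_train, y_train)    # cross-validated alpha
    ridge = Ridge(alpha=ridge_cv.alpha_).fit(x_train, y_train)
    lasso = Lasso(alpha=0.01).fit(x_train, y_train)            # alpha chosen for illustration

    for name, model in [("ridge", ridge), ("lasso", lasso)]:
        y_pred = model.predict(x_test)
        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        print(name, "RMSE:", rmse, "R2:", r2_score(y_test, y_pred))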

Chapter 3: Time Series Analysis

Chapter goal: Covers a model for identifying trends and patterns in sequential data and how to forecast a series.

• What is time series analysis?

• Underlying assumptions of time series analysis.

• Different types of time series analysis models.

• The ARIMA model.

• Test of stationarity.

o Conduct an Augmented Dickey-Fuller (ADF) test.

• Test of white noise.

• Test of correlation.

o Plot Lag Plot.

o Plot Lag vs Autocorrelation Plot.

o Plot ACF.

o Plot PACF.

• Understand trends and seasonality.

o Plot seasonal components.

• Smooth a time series using moving average, standard deviation, and exponential techniques.

o Plot the smoothed time series.

• Determine rate of return and rolling rate of return.

• Determine parameters of ARIMA model.

• Build ARIMA model.

• Forecast ARIMA.

o Plot forecast.

• Residual diagnosis
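
A short ARIMA sketch of these steps, assuming statsmodels and a synthetic date-indexed pandas Series as a stand-in; the (p, d, q) order is illustrative, not a recommendation.

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.stattools import adfuller
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(0)
    index = pd.date_range("2020-01-01", periods=120, freq="M")
    series = pd.Series(np.cumsum(rng.normal(0.5, 1.0, 120)), index=index)  # synthetic trend series

    adf_stat, p_value, *_ = adfuller(series.dropna())   # Augmented Dickey-Fuller stationarity test
    print("ADF statistic:", adf_stat, "p-value:", p_value)

    rolling_mean = series.rolling(window=12).mean()     # moving-average smoothing

    result = ARIMA(series, order=(1, 1, 1)).fit()       # (p, d, q) chosen for illustration
    print(result.forecast(steps=12))                    # forecast the next 12 periods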

Chapter 4: High-Quality Time Series

Chapter goal: Explores Prophet for better time series forecasts.

• Difference between statsmodels and Prophet.

• Understand components in Prophet.

• Data preprocessing.

• Develop a model using Prophet.

• Forecast a series.

o Plot the forecast.

o Plot seasonal components.

• Evaluate model performance using Prophet.
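
Forecasting with Prophet might look like the following, assuming the prophet package (fbprophet in older releases) and a synthetic DataFrame with the two columns Prophet expects, "ds" (dates) and "y" (values).

    import numpy as np
    import pandas as pd
    from prophet import Prophet

    dates = pd.date_range("2022-01-01", periods=200, freq="D")
    df = pd.DataFrame({"ds": dates,
                       "y": np.sin(np.arange(200) / 7.0) + np.arange(200) * 0.02})  # stand-in series

    model = Prophet()                                    # additive trend + seasonality model
    model.fit(df)

    future = model.make_future_dataframe(periods=30)     # extend 30 days past the history
    forecast = model.predict(future)

    model.plot(forecast)                                 # forecast with uncertainty intervals
    model.plot_components(forecast)                      # trend and seasonal components
    print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())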

Chapter 5: Logistic Regression

Chapter goal: Introduces the reader to logistic regression, a powerful classification model.

Subtopics

• Find missing values

• Understand correlation.

o Plotting Pearson correlation matrix.

• Determine covariance.

o Plotting covariance matrix.

• PCA for dimension reduction.

o Plotting scree plot.

• Normalize data.

• Hyper-parameter tuning.

• Create a pipeline.

• Develop a Logit model.

• Model evaluation.

o Tabulate classification report.

o Tabulate confusion matrix.

o Plot ROC Curve

o Find AUC.

o Plot Precision Recall Curve.

o Find APS.

o Plot learning curve.
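
A minimal sketch of a scaling, PCA, and logistic regression pipeline with hyperparameter tuning, assuming scikit-learn and using its bundled breast cancer dataset as a stand-in for the book's data.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

    X, y = load_breast_cancer(return_X_y=True)           # stand-in binary classification data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    pipe = Pipeline([("scaler", StandardScaler()),
                     ("pca", PCA(n_components=2)),
                     ("logit", LogisticRegression())])
    grid = GridSearchCV(pipe, {"logit__C": [0.01, 0.1, 1.0, 10.0]}, cv=5)   # tune C
    grid.fit(X_train, y_train)

    y_pred = grid.predict(X_test)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))
    print("AUC:", roc_auc_score(y_test, grid.predict_proba(X_test)[:, 1]))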

Chapter 6: Dimension Reduction and Multivariate Analysis Using Linear Discriminant Analysis

Chapter goal: Discusses the difference between linear discriminant analysis and logistic regression and how linear discriminant analysis can be used for purposes other than classification.

Subtopics

• Difference between logistic regression and discriminant analysis.

• Purpose of discriminant analysis.

• Model fitting.

• Model evaluation.

o Tabulate classification report.

o Tabulate confusion matrix.

o Plot ROC Curve

o Find AUC.

o Plot Precision Recall Curve.

o Find APS.

o Plot learning curve.
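
Linear discriminant analysis can be sketched like this, assuming scikit-learn and the X_train/X_test/y_train/y_test splits from the logistic regression sketch above.

    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import classification_report, confusion_matrix

    lda = LinearDiscriminantAnalysis(n_components=1)   # at most n_classes - 1 components
    lda.fit(X_train, y_train)

    X_projected = lda.transform(X_train)               # doubles as a supervised dimension reducer
    y_pred = lda.predict(X_test)                       # and as a classifier
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))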

Section 2: Ensemble Methods

Chapter 7: Finding Hyperplanes Using Support Vector Machines

Chapter goal: Highlights ways of finding hyperplanes using the linear support vector classifier, including its pros and cons.

• Understand support vector machine.

• Find hyperplanes using SVM.

• Scenarios in which SVM performs better.

• Disadvantages of SVM.

• Model fitting.

• Model evaluation.

o Tabulate classification report.

o Tabulate confusion matrix.

o Plot ROC curve

o Find AUC.

o Plot Precision Recall curve.

o Find APS.

o Plot learning curve.
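
A linear support vector classifier along these lines might look as follows, assuming scikit-learn and the splits used above; C and max_iter are illustrative.

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC
    from sklearn.metrics import classification_report, confusion_matrix

    svc = make_pipeline(StandardScaler(),
                        LinearSVC(C=1.0, max_iter=10000))  # scale, then fit the linear SVM
    svc.fit(X_train, y_train)

    y_pred = svc.predict(X_test)
    print(classification_report(y_test, y_pred))
    print(confusion_matrix(y_test, y_pred))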

Chapter 8: Classification Using Decision Trees

Chapter goal: Explores how decision trees are formed and how to visualize them.

Subtopics

• Discussing entropy.

• Information gain.

• Structure of decision trees.

• Visualizing decision trees.

• Model fitting.

• Model evaluation.

o Tabulating classification report.

o Tabulating confusion matrix.

o Plotting ROC curve

o Finding AUC.

o Plotting Precision Recall curve.

o Finding APS.

o Plotting learning curve.
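
Fitting and visualizing a decision tree can be sketched as below, assuming scikit-learn, matplotlib, and the splits used above; max_depth is illustrative.

    import matplotlib.pyplot as plt
    from sklearn.tree import DecisionTreeClassifier, plot_tree
    from sklearn.metrics import classification_report

    tree = DecisionTreeClassifier(criterion="entropy",     # split on information gain
                                  max_depth=3, random_state=0)
    tree.fit(X_train, y_train)

    print(classification_report(y_test, tree.predict(X_test)))

    plot_tree(tree, filled=True)    # show splits, entropy, and class counts per node
    plt.show()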

Chapter 9: Back to the Classic

Chapter goal: Gives an overview of this classic algorithm and explains why it is still relevant today.

Subtopics

• The Naïve Bayes theorem.

• Unpacking Gaussian Naïve Bayes.

• Model fitting.

• Hyper-parameter tuning.

• Create a pipeline.

• Model evaluation.

o Tabulate classification report.

o Tabulate confusion matrix.

o Plot ROC Curve

o Find AUC.

o Plot Precision Recall Curve.

o Find APS.

o Plot learning curve.
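
A Gaussian naive Bayes pipeline with light tuning might be sketched as follows, assuming scikit-learn and the splits used above; the var_smoothing grid is illustrative.

    import numpy as np
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.naive_bayes import GaussianNB
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import classification_report

    pipe = Pipeline([("scaler", StandardScaler()), ("nb", GaussianNB())])
    grid = GridSearchCV(pipe, {"nb__var_smoothing": np.logspace(-9, -1, 9)}, cv=5)
    grid.fit(X_train, y_train)

    print("best parameters:", grid.best_params_)
    print(classification_report(y_test, grid.predict(X_test)))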

Section 3: Non-Parametric Methods

Chapter 10: Finding Similarities and Dissimilarities Using Cluster Analysis

Chapter goal: Explains clustering and explores three main clustering algorithms (K-Means, Agglomerative, and DBSCAN).

• An introduction to cluster analysis.

• Types of clustering algorithms.

• Normalize data.

• Dimension reduction using PCA.

o Finding number of components

• Find number of clusters.

o Elbow curve.

• Clustering using K-Means.

• Fit K-Means model.

• Plot K-Means clusters.

• Clustering using Agglomerative algorithm.

o Techniques of calculating similarities/dissimilarities

• Fit Agglomerative.

• Plot Agglomerative clusters.

• Clustering using Density-Based Spatial Clustering of Applications with Noise (DBSCAN).

• Fit DBSCAN.

• Plot DBSCAN clusters.
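
The three clustering approaches can be sketched as follows, assuming scikit-learn and synthetic blob data as a stand-in; the cluster counts, eps, and min_samples values are illustrative.

    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

    X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)  # stand-in data

    X_scaled = StandardScaler().fit_transform(X)            # normalize
    X_pca = PCA(n_components=2).fit_transform(X_scaled)     # reduce to two dimensions

    inertias = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_pca).inertia_
                for k in range(1, 10)]                      # values for an elbow curve

    kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
    agglo_labels = AgglomerativeClustering(n_clusters=3).fit_predict(X_pca)
    dbscan_labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X_pca)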

Chapter 11: Survival Analysis

Chapter goal: Provides an overview of survival analysis (a model commonly used in the medical and insurance industries) by detailing the most widely used estimator, the Kaplan-Meier fitter.

Subtopics

• Create a survival table.

• The survival function.

• An introduction to the Kaplan Meier Estimator.

• Finding confidence intervals.

• Tabulating cumulative density estimates.

• Tabulating survival function estimates.

• Plotting survival curve.

• Plotting cumulative density.

• Model evaluation.
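
A Kaplan-Meier sketch of these steps, assuming the lifelines package and using its bundled Waltons example data, which has a duration column "T" and an event indicator column "E".

    import matplotlib.pyplot as plt
    from lifelines import KaplanMeierFitter
    from lifelines.datasets import load_waltons

    df = load_waltons()                          # stand-in survival table with T and E columns

    kmf = KaplanMeierFitter()
    kmf.fit(durations=df["T"], event_observed=df["E"])

    print(kmf.survival_function_.head())         # survival function estimates
    print(kmf.cumulative_density_.head())        # cumulative density estimates
    print(kmf.confidence_interval_.head())       # confidence intervals

    kmf.plot_survival_function()                 # survival curve
    kmf.plot_cumulative_density()                # cumulative density curve
    plt.show()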

Chapter 12: Neural Networks

Chapter goal: Discusses the fundamentals of neural networks and ways of optimizing networks for better accuracy.

Subtopics

• Forward propagation.

• Backward propagation.

• Forward pass.

• Backward pass.

• Cost function.

• Gradient.

• The vanishing gradient problem.

• Other functions.

• Optimizing networks.

• Bernoulli Restricted Boltzmann Machine.

• Multi-Layer Perceptron.

• Regularizing networks.

• Dropping layers.

• Model evaluation.

o Tabulate classification report.

o Tabulate confusion matrix.

o Plot ROC Curve

o Find AUC.

o Plot Precision Recall Curve.

o Find APS.

o Plot training and validation loss across epochs.

o Plot training and validation accuracy across epochs.
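
A small multi-layer perceptron for binary classification might be sketched as follows, assuming TensorFlow/Keras and the splits used above; layer sizes and epoch counts are illustrative.

    from sklearn.preprocessing import StandardScaler
    from tensorflow import keras
    from tensorflow.keras import layers

    scaler = StandardScaler()
    X_train_s = scaler.fit_transform(X_train)        # normalize before feeding the network
    X_test_s = scaler.transform(X_test)

    model = keras.Sequential([
        keras.Input(shape=(X_train_s.shape[1],)),
        layers.Dense(16, activation="relu"),
        layers.Dropout(0.2),                         # regularize by dropping units
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    history = model.fit(X_train_s, y_train, validation_split=0.2,
                        epochs=50, batch_size=32, verbose=0)

    # history.history holds training and validation loss/accuracy per epoch,
    # ready to be plotted across epochs.
    print(model.evaluate(X_test_s, y_test, verbose=0))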

Chapter 13: Driverless AI Using H2O

Chapter goal: Covers a new library that helps organizations accelerate their adoption of AI.

• How H2O works.

• Data processing.

• Model training.

• Model evaluation.

• AutoML.
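
An H2O AutoML run can be sketched as follows, assuming the open source h2o package (with Java available) and reusing the scikit-learn breast cancer data as a stand-in; note this uses AutoML rather than the commercial Driverless AI product.

    import h2o
    from h2o.automl import H2OAutoML
    from sklearn.datasets import load_breast_cancer

    h2o.init()
    data = load_breast_cancer(as_frame=True).frame       # stand-in dataset with a "target" column
    frame = h2o.H2OFrame(data)
    frame["target"] = frame["target"].asfactor()         # treat the target as categorical
    train, test = frame.split_frame(ratios=[0.8], seed=0)

    aml = H2OAutoML(max_models=5, seed=0)                 # train and rank several models
    aml.train(y="target", training_frame=train)

    print(aml.leaderboard)                                # ranked candidate models
    print(aml.leader.model_performance(test))             # evaluate the best model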

Article details

Provider:
Apress
Author:
Tshepo Chris Nokeri
Article number:
9781484268704
Published:
March 6, 2021