Book Description
Through concrete examples, minimal theory, and two production-ready Python frameworks, Scikit-Learn and TensorFlow, author Aurélien Géron helps you master the concepts and tools needed to build intelligent systems. You will learn a range of techniques, from simple linear regression all the way to deep neural networks. Exercises in each chapter help you apply what you have learned; all you need to get started is some programming experience.
Explore machine learning, particularly neural networks
Use Scikit-Learn to track an example machine learning project end to end
Explore several training models, including support vector machines, decision trees, random forests, and ensemble methods
Use the TensorFlow library to build and train neural networks
Dive into neural network architectures, including convolutional networks, recurrent networks, and deep reinforcement learning
Learn techniques for training and scaling deep neural networks
Apply practical code examples without acquiring excessive machine learning theory or algorithm detail
About the Author
Aurélien Géron is a machine learning consultant. A former Googler, he led YouTube's video classification team from 2013 to 2016. From 2002 to 2012 he was founder and CTO of Wifirst, a leading wireless ISP in France, and in 2001 he was founder and CTO of Polyconseil, the company that now manages the electric car sharing service Autolib'.
Praise
"This book is an excellent introduction to the theory and practice of solving problems with neural networks. It covers the key points involved in building effective applications, along with the background needed to understand emerging techniques. I recommend it to any reader interested in learning practical machine learning."
—— Pete Warden
TensorFlow Mobile Lead
Table of Contents
Preface
Part I. The Fundamentals of Machine Learning
1. The Machine Learning Landscape
What Is Machine Learning?
Why Use Machine Learning?
Types of Machine Learning Systems
Supervised/Unsupervised Learning
Batch and Online Learning
Instance-Based Versus Model-Based Learning
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
Nonrepresentative Training Data
Poor-Quality Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
Stepping Back
Testing and Validating
Exercises
2. End-to-End Machine Learning Project
Working with Real Data
Look at the Big Picture
Frame the Problem
Select a Performance Measure
Check the Assumptions
Get the Data
Create the Workspace
Download the Data
Take a Quick Look at the Data Structure
Create a Test Set
Discover and Visualize the Data to Gain Insights
Visualizing Geographical Data
Looking for Correlations
Experimenting with Attribute Combinations
Prepare the Data for Machine Learning Algorithms
Data Cleaning
Handling Text and Categorical Attributes
Custom Transformers
Feature Scaling
Transformation Pipelines
Select and Train a Model
Training and Evaluating on the Training Set
Better Evaluation Using Cross-Validation
Fine-Tune Your Model
Grid Search
Randomized Search
Ensemble Methods
Analyze the Best Models and Their Errors
Evaluate Your System on the Test Set
Launch, Monitor, and Maintain Your System
Try It Out!
Exercises
3. Classification
MNIST
Training a Binary Classifier
Performance Measures
Measuring Accuracy Using Cross-Validation
Confusion Matrix
Precision and Recall
Precision/Recall Tradeoff
The ROC Curve
Multiclass Classification
Error Analysis
Multilabel Classification
Multioutput Classification
……
Part II. Neural Networks and Deep Learning
A. Exercise Solutions
B. Machine Learning Project Checklist
C. SVM Dual Problem
D. Autodiff
E. Other Popular ANN Architectures
Index
Excerpt
From Hands-On Machine Learning with Scikit-Learn and TensorFlow (English reprint edition):
3. It is quite possible to speed up training of a bagging ensemble by distributing it across multiple servers, since each predictor in the ensemble is independent of the others. The same goes for pasting ensembles and Random Forests, for the same reason. However, each predictor in a boosting ensemble is built based on the previous predictor, so training is necessarily sequential, and you will not gain anything by distributing training across multiple servers. Regarding stacking ensembles, all the predictors in a given layer are independent of each other, so they can be trained in parallel on multiple servers. However, the predictors in one layer can only be trained after the predictors in the previous layer have all been trained.
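To make that independence concrete, here is a minimal Scikit-Learn sketch; the make_moons toy dataset and the hyperparameter values are illustrative assumptions, not taken from the book. n_jobs=-1 fits the independent predictors on all available CPU cores, the single-machine analogue of spreading them across servers.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy dataset; any classification data would do.
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# The 500 trees are mutually independent, so they can be fit in
# parallel: n_jobs=-1 uses every available CPU core (the
# single-machine analogue of distributing across servers).
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    n_jobs=-1, random_state=42)
bag_clf.fit(X, y)
```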
4. With out-of-bag evaluation, each predictor in a bagging ensemble is evaluated using instances that it was not trained on (they were held out). This makes it possible to have a fairly unbiased evaluation of the ensemble without the need for an additional validation set. Thus, you have more instances available for training, and your ensemble can perform slightly better.
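In Scikit-Learn this takes a single constructor flag; a minimal sketch, again assuming the illustrative make_moons dataset:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# oob_score=True evaluates each predictor on the held-out
# (out-of-bag) instances it was not trained on; this requires
# bootstrap sampling, which is on by default.
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    oob_score=True, random_state=42)
bag_clf.fit(X, y)

# A fairly unbiased accuracy estimate, with no separate
# validation set held out.
print(bag_clf.oob_score_)
```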
5. When you are growing a tree in a Random Forest, only a random subset of the features is considered for splitting at each node. This is true as well for Extra-Trees, but they go one step further: rather than searching for the best possible thresholds, like regular Decision Trees do, they use random thresholds for each feature. This extra randomness acts like a form of regularization: if a Random Forest overfits the training data, Extra-Trees might perform better. Moreover, since Extra-Trees don't search for the best possible thresholds, they are much faster to train than Random Forests. However, they are neither faster nor slower than Random Forests when making predictions.
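Since ExtraTreesClassifier is a drop-in replacement for RandomForestClassifier in Scikit-Learn, the comparison is easy to run; a sketch under the same illustrative-dataset assumption:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Identical hyperparameters; the only difference is that Extra-Trees
# split on random thresholds instead of searching for the best ones.
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
ext_clf = ExtraTreesClassifier(n_estimators=500, random_state=42)

for name, clf in (("Random Forest", rnd_clf), ("Extra-Trees", ext_clf)):
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean())
```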
6. If your AdaBoost ensemble underfits the training data, you can try increasing the number of estimators or reducing the regularization hyperparameters of the base estimator. You may also try slightly increasing the learning rate.
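These remedies map directly onto AdaBoostClassifier's constructor arguments; a hedged sketch, where the specific values (and the make_moons data) are illustrative assumptions rather than a recipe:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

# Against underfitting: more estimators, a less regularized base
# estimator (here, deeper trees), and a slightly higher learning
# rate. The values below are illustrative only.
ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=2),  # deeper base trees
    n_estimators=400,                     # more estimators
    learning_rate=0.8,                    # slightly higher learning rate
    random_state=42)
ada_clf.fit(X, y)
```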
……