Data Science Programming All-in-One For Dummies

eBook

Published 4 December 2019, 1st edition 2019
29,99 €
(incl. VAT)

Bibliographic Data
ISBN/EAN: 9781119626138
Language: English
Length: 768 pp., 21.42 MB
E-Book
Format: PDF
DRM: Adobe DRM

Description

Your logical, linear guide to the fundamentals of data science programming

Data science is exploding, in a good way, with a forecast of 1.7 megabytes of new information created every second for each human being on the planet by 2020 and 11.5 million job openings by 2026. It clearly pays dividends to be in the know. This friendly guide charts a path through the fundamentals of data science and then delves into the actual work: linear regression, logistic regression, machine learning, neural networks, recommender engines, and cross-validation of models.

Data Science Programming All-in-One For Dummies is a compilation of the key data science, machine learning, and deep learning programming languages: Python and R. It helps you decide which programming languages are best for specific data science needs. It also gives you the guidelines to build your own projects to solve problems in real time.

Get grounded: the ideal start for new data professionals
What lies ahead: learn about specific areas that data is transforming
Be meaningful: find out how to tell your data story
See clearly: pick up the art of visualization

Whether you're a beginning student or already mid-career, get your copy now and add even more meaning to your life, and everyone else's!

About the Authors

John Mueller has produced 114 books and more than 600 articles on topics ranging from functional programming techniques to working with Amazon Web Services (AWS).

Luca Massaron, a Google Developer Expert (GDE), interprets big data and transforms it into smart data through simple and effective data mining and machine learning techniques.

Contents

Introduction 1

About This Book 1

Foolish Assumptions 3

Icons Used in This Book 4

Beyond the Book 4

Where to Go from Here 5

Book 1: Defining Data Science 7

Chapter 1: Considering the History and Uses of Data Science 9

Considering the Elements of Data Science 10

Considering the emergence of data science 10

Outlining the core competencies of a data scientist 11

Linking data science, big data, and AI 12

Understanding the role of programming 12

Defining the Role of Data in the World 13

Enticing people to buy products 13

Keeping people safer 14

Creating new technologies 15

Performing analysis for research 16

Providing art and entertainment 17

Making life more interesting in other ways 18

Creating the Data Science Pipeline 18

Preparing the data 18

Performing exploratory data analysis 18

Learning from data 19

Visualizing 19

Obtaining insights and data products 19

Comparing Different Languages Used for Data Science 20

Obtaining an overview of data science languages 20

Defining the pros and cons of using Python 22

Defining the pros and cons of using R 23

Learning to Perform Data Science Tasks Fast 25

Loading data 26

Training a model 26

Viewing a result 26

Chapter 2: Placing Data Science within the Realm of AI 29

Seeing the Data to Data Science Relationship 30

Considering the data architecture 30

Acquiring data from various sources 31

Performing data analysis 32

Archiving the data 33

Defining the Levels of AI 33

Beginning with AI 34

Advancing to machine learning 39

Getting detailed with deep learning 43

Creating a Pipeline from Data to AI 47

Considering the desired output 47

Defining a data architecture 47

Combining various data sources 47

Checking for errors and fixing them 48

Performing the analysis 48

Validating the result 49

Enhancing application performance 49

Chapter 3: Creating a Data Science Lab of Your Own 51

Considering the Analysis Platform Options 52

Using a desktop system 53

Working with an online IDE 53

Considering the need for a GPU 54

Choosing a Development Language 56

Obtaining and Using Python 58

Working with Python in this book 58

Obtaining and installing Anaconda for Python 59

Defining a Python code repository 64

Working with Python using Google Colaboratory 69

Defining the limits of using Azure Notebooks with Python and R 71

Obtaining and Using R 72

Obtaining and installing Anaconda for R 72

Starting the R environment 73

Defining an R code repository 75

Presenting Frameworks 76

Defining the differences 76

Explaining the popularity of frameworks 77

Choosing a particular library 79

Accessing the Downloadable Code 80

Chapter 4: Considering Additional Packages and Libraries You Might Want 81

Considering the Uses for Third-Party Code 82

Obtaining Useful Python Packages 83

Accessing scientific tools using SciPy 84

Performing fundamental scientific computing using NumPy 85

Performing data analysis using pandas 85

Implementing machine learning using Scikit-learn 86

Going for deep learning with Keras and TensorFlow 86

Plotting the data using matplotlib 87

Creating graphs with NetworkX 88

Parsing HTML documents using Beautiful Soup 88

Locating Useful R Libraries 89

Using your Python code in R with reticulate 89

Conducting advanced training using caret 90

Performing machine learning tasks using mlr 90

Visualizing data using ggplot2 91

Enhancing ggplot2 using esquisse 91

Creating graphs with igraph 91

Parsing HTML documents using rvest 92

Wrangling dates using lubridate 92

Making big data simpler using dplyr and purrr 93

Chapter 5: Leveraging a Deep Learning Framework 95

Understanding Deep Learning Framework Usage 96

Working with Low-End Frameworks 97

Chainer 97

PyTorch 98

MXNet 98

Microsoft Cognitive Toolkit/CNTK 99

Understanding TensorFlow 100

Grasping why TensorFlow is so good 101

Making TensorFlow easier by using TFLearn 102

Using Keras as the best simplifier 102

Getting your copy of TensorFlow and Keras 103

Fixing the C++ build tools error in Windows 106

Accessing your new environment in Notebook 108

Book 2: Interacting with Data Storage 109

Chapter 1: Manipulating Raw Data 111

Defining the Data Sources 112

Obtaining data locally 112

Using online data sources 117

Employing dynamic data sources 121

Considering other kinds of data sources 123

Considering the Data Forms 124

Working with pure text 124

Accessing formatted text 125

Deciphering binary data 126

Understanding the Need for Data Reliability 128

Chapter 2: Using Functional Programming Techniques 131

Defining Functional Programming 132

Differences with other programming paradigms 132

Understanding its goals 133

Understanding Pure and Impure Languages 134

Using the pure approach 134

Using the impure approach 134

Comparing the Functional Paradigm 135

Imperative 135

Procedural 136

Object-oriented 136

Declarative 136

Using Python for Functional Programming Needs 137

Understanding How Functional Data Works 138

Working with immutable data 139

Considering the role of state 139

Eliminating side effects 140

Passing by reference versus by value 140

Working with Lists and Strings 142

Creating lists 144

Evaluating lists 144

Performing common list manipulations 146

Understanding the Dict and Set alternatives 147

Considering the use of strings 148

Employing Pattern Matching 150

Looking for patterns in data 150

Understanding regular expressions 152

Using pattern matching in analysis 155

Working with pattern matching 156

Working with Recursion 159

Performing tasks more than once 159

Understanding recursion 161

Using recursion on lists 162

Considering advanced recursive tasks 163

Passing functions instead of variables 164

Performing Functional Data Manipulation 165

Slicing and dicing 166

Mapping your data 167

Filtering data 168

Organizing data 169

Chapter 3: Working with Scalars, Vectors, and Matrices 171

Considering the Data Forms 172

Defining Data Type through Scalars 173

Creating Organized Data with Vectors 174

Defining a vector 175

Creating vectors of a specific type 175

Performing math on vectors 176

Performing logical and comparison tasks on vectors 176

Multiplying vectors 177

Creating and Using Matrices 178

Creating a matrix 178

Creating matrices of a specific type 179

Using the matrix class 181

Performing matrix multiplication 181

Executing advanced matrix operations 183

Extending Analysis to Tensors 185

Using Vectorization Effectively 186

Selecting and Shaping Data 187

Slicing rows 188

Slicing columns 188

Dicing 189

Concatenating 189

Aggregating 194

Working with Trees 195

Understanding the basics of trees 195

Building a tree 196

Representing Relations in a Graph 198

Going beyond trees 198

Arranging graphs 199

Chapter 4: Accessing Data in Files 201

Understanding Flat File Data Sources 202

Working with Positional Data Files 203

Accessing Data in CSV Files 205

Working with a simple CSV file 205

Making use of header information 208

Moving On to XML Files 209

Working with a simple XML file 209

Parsing XML 211

Using XPath for data extraction 212

Considering Other Flat-File Data Sources 214

Working with Nontext Data 215

Downloading Online Datasets 218

Working with package datasets 218

Using public domain datasets 219

Chapter 5: Working with a Relational DBMS 223

Considering RDBMS Issues 224

Defining the use of tables 225

Understanding keys and indexes 226

Using local versus online databases 227

Working in read-only mode 228

Accessing the RDBMS Data 228

Using the SQL language 229

Relying on scripts 231

Relying on views 231

Relying on functions 232

Creating a Dataset 233

Combining data from multiple tables 233

Ensuring data completeness 234

Slicing and dicing the data as needed 234

Mixing RDBMS Products 234

Chapter 6: Working with a NoSQL DBMS 237

Considering the Ramifications of Hierarchical Data 238

Understanding hierarchical organization 238

Developing strategies for freeform data 239

Performing an analysis 240

Working around dangling data 241

Accessing the Data 243

Creating a picture of the data form 243

Employing the correct transiting strategy 244

Ordering the data 247

Interacting with Data from NoSQL Databases 248

Working with Dictionaries 249

Developing Datasets from Hierarchical Data 250

Processing Hierarchical Data into Other Forms 251

Book 3: Manipulating Data Using Basic Algorithms 253

Chapter 1: Working with Linear Regression 255

Considering the History of Linear Regression 256

Combining Variables 257

Working through simple linear regression 257

Advancing to multiple linear regression 260

Considering which question to ask 262

Reducing independent variable complexity 263

Manipulating Categorical Variables 265

Creating categorical variables 266

Renaming levels 267

Combining levels 268

Using Linear Regression to Guess Numbers 269

Defining the family of linear models 270

Using more variables in a larger dataset 271

Understanding variable transformations 274

Doing variable transformations 275

Creating interactions between variables 277

Understanding limitations and problems 282

Learning One Example at a Time 283

Using Gradient Descent 283

Implementing Stochastic Gradient Descent 283

Considering the effects of regularization 287

Chapter 2: Moving Forward with Logistic Regression 289

Considering the History of Logistic Regression 290

Differentiating between Linear and Logistic Regression 291

Considering the model 291

Defining the logistic function 292

Understanding the problems that logistic regression solves 294

Fitting the curve 295

Considering a pass/fail example 296

Using Logistic Regression to Guess Classes 297

Applying logistic regression 297

Considering when classes are more 298

Defining logistic regression performance 300

Switching to Probabilities 301

Specifying a binary response 301

Transforming numeric estimates into probabilities 302

Working through Multiclass Regression 305

Understanding multiclass regression 305

Developing a multiclass regression implementation 306

Chapter 3: Predicting Outcomes Using Bayes 309

Understanding Bayes' Theorem 310

Delving into Bayes' history 310

Considering the basic theorem 312

Using Naïve Bayes for Predictions 313

Finding out that Naïve Bayes isn't so naïve 314

Predicting text classifications 315

Getting an overview of Bayesian inference 318

Working with Networked Bayes 324

Considering the network types and uses 324

Understanding Directed Acyclic Graphs (DAGs) 327

Employing networked Bayes in predictions 328

Deciding between automated and guided learning 332

Considering the Use of Bayesian Linear Regression 332

Considering the Use of Bayesian Logistic Regression 333

Chapter 4: Learning with K-Nearest Neighbors 335

Considering the History of K-Nearest Neighbors 336

Learning Lazily with K-Nearest Neighbors 337

Understanding the basis of KNN 337

Predicting after observing neighbors 338

Choosing the k parameter wisely 341

Leveraging the Correct k Parameter 342

Understanding the k parameter 342

Experimenting with a flexible algorithm 343

Implementing KNN Regression 345

Implementing KNN Classification 347

Book 4: Performing Advanced Data Manipulation 351

Chapter 1: Leveraging Ensembles of Learners 353

Leveraging Decision Trees 354

Growing a forest of trees 356

Seeing Random Forests in action 358

Understanding the importance measures 360

Configuring your system for importance measures with Python 361

Seeing importance measures in action 361

Working with Almost Random Guesses 364

Understanding the premise 365

Bagging predictors with AdaBoost 366

Meeting Again with Gradient Descent 369

Understanding the GBM difference 369

Seeing GBM in action 371

Averaging Different Predictors 372

Chapter 2: Building Deep Learning Models 373

Discovering the Incredible Perceptron 374

Understanding perceptron functionality 375

Touching the nonseparability limit 376

Hitting Complexity with Neural Networks 378

Considering the neuron 379

Pushing data with feed-forward 381

Defining hidden layers 383

Executing operations 384

Considering the details of data movement through the neural network 386

Using backpropagation to adjust learning 387

Understanding More about Neural Networks 390

Getting an overview of the neural network process 391

Defining the basic architecture 391

Documenting the essential modules 393

Solving a simple problem 396

Looking Under the Hood of Neural Networks 399

Choosing the right activation function 399

Relying on a smart optimizer 401

Setting a working learning rate 402

Explaining Deep Learning Differences with Other Forms of AI 402

Adding more layers 403

Changing the activations 405

Adding regularization by dropout 406

Using online learning 407

Transferring learning 407

Learning end to end 408

Chapter 3: Recognizing Images with CNNs 409

Beginning with Simple Image Recognition 410

Considering the ramifications of sight 410

Working with a set of images 411

Extracting visual features 417

Recognizing faces using Eigenfaces 419

Classifying images 423

Understanding CNN Image Basics 427

Moving to CNNs with Character Recognition 429

Accessing the dataset 430

Reshaping the dataset 431

Encoding the categories 432

Defining the model 432

Using the model 433

Explaining How Convolutions Work 435

Understanding convolutions 435

Simplifying the use of pooling 439

Describing the LeNet architecture 440

Detecting Edges and Shapes from Images 446

Visualizing convolutions 447

Unveiling successful architectures 449

Discussing transfer learning 450

Chapter 4: Processing Text and Other Sequences 453

Introducing Natural Language Processing 454

Defining the human perspective as it relates to data science 454

Considering the computer perspective as it relates to data science 455

Understanding How Machines Read 456

Creating a corpus 457

Performing feature extraction 457

Understanding the BoW 458

Processing and enhancing text 459

Maintaining order using n-grams 461

Stemming and removing stop words 462

Scraping textual datasets from the web 465

Handling problems with raw text 470

Storing processed text data in sparse matrices 473

Understanding Semantics Using Word Embeddings 478

Using Scoring and Classification 482

Performing classification tasks 482

Analyzing reviews from e-commerce 485

Book 5: Performing Data-Related Tasks 491

Chapter 1: Making Recommendations 493

Realizing the Recommendation Revolution 494

Downloading Rating Data 495

Navigating through anonymous web data 496

Encountering the limits of rating data 499

Leveraging SVD 506

Considering the origins of SVD 506

Understanding the SVD connection 508

Chapter 2: Performing Complex Classifications 509

Using Image Classification Challenges 510

Delving into ImageNet and Coco 511

Learning the magic of data augmentation 513

Distinguishing Traffic Signs 516

Preparing the image data 517

Running a classification task 520

Chapter 3: Identifying Objects 525

Distinguishing Classification Tasks 526

Understanding the problem 526

Performing localization 527

Classifying multiple objects 528

Annotating multiple objects in images 529

Segmenting images 530

Perceiving Objects in Their Surroundings 531

Considering vision needs in self-driving cars 531

Discovering how RetinaNet works 532

Using the Keras-RetinaNet code 534

Overcoming Adversarial Attacks on Deep Learning Applications 538

Tricking pixels 539

Hacking with stickers and other artifacts 541

Chapter 4: Analyzing Music and Video 543

Learning to Imitate Art and Life 544

Transferring an artistic style 545

Reducing the problem to statistics 546

Understanding that deep learning doesn't create 548

Mimicking an Artist 548

Defining a new piece based on a single artist 549

Combining styles to create new art 550

Visualizing how neural networks dream 551

Using a network to compose music 551

Other creative avenues 552

Moving toward GANs 553

Finding the key in the competition 554

Considering a growing field 556

Chapter 5: Considering Other Task Types 559

Processing Language in Texts 560

Considering the processing methodologies 560

Defining understanding as tokenization 561

Putting all the documents into a bag 562

Using AI for sentiment analysis 566

Processing Time Series 574

Defining sequences of events 574

Performing a prediction using LSTM 575

Chapter 6: Developing Impressive Charts and Plots 579

Starting a Graph, Chart, or Plot 580

Understanding the differences between graphs, charts, and plots 580

Considering the graph, chart, and plot types 582

Defining the plot 583

Drawing multiple lines 584

Drawing multiple plots 584

Saving your work 586

Setting the Axis, Ticks, and Grids 587

Getting the axis 587

Formatting the ticks 590

Adding grids 590

Defining the Line Appearance 591

Working with line styles 592

Adding markers 593

Using Labels, Annotations, and Legends 594

Adding labels 595

Annotating the chart 596

Creating a legend 598

Creating Scatterplots 599

Depicting groups 599

Showing correlations 600

Plotting Time Series 603

Representing time on axes 604

Plotting trends over time 605

Plotting Geographical Data 608

Getting the toolkit 608

Drawing the map 609

Plotting the data 613

Visualizing Graphs 615

Understanding the adjacency matrix 615

Using NetworkX basics 615

Book 6: Diagnosing and Fixing Errors 619

Chapter 1: Locating Errors in Your Data 621

Considering the Types of Data Errors 622

Obtaining the Required Data 624

Considering the data sources 624

Obtaining reliable data 625

Making human input more reliable 626

Using automated data collection 628

Validating Your Data 629

Figuring out what's in your data 629

Removing duplicates 631

Creating a data map and a data plan 632

Manicuring the Data 634

Dealing with missing data 634

Considering data misalignments 639

Separating out useful data 640

Dealing with Dates in Your Data 640

Formatting date and time values 641

Using the right time transformation 641

Chapter 2: Considering Outrageous Outcomes 643

Deciding What Outrageous Means 644

Considering the Five Mistruths in Data 645

Commission 645

Omission 646

Perspective 646

Bias 647

Frame-of-reference 648

Considering Detection of Outliers 649

Understanding outlier basics 649

Finding more things that can go wrong 651

Understanding anomalies and novel data 651

Examining a Simple Univariate Method 653

Using the pandas package 653

Leveraging the Gaussian distribution 655

Making assumptions and checking out 656

Developing a Multivariate Approach 657

Using principal component analysis 658

Using cluster analysis 659

Automating outliers detection with Isolation Forests 661

Chapter 3: Dealing with Model Overfitting and Underfitting 663

Understanding the Causes 664

Considering the problem 664

Looking at underfitting 665

Looking at overfitting 666

Plotting learning curves for insights 668

Determining the Sources of Overfitting and Underfitting 670

Understanding bias and variance 671

Having insufficient data 671

Being fooled by data leakage 672

Guessing the Right Features 672

Selecting variables like a pro 673

Using nonlinear transformations 676

Regularizing linear models 684

Chapter 4: Obtaining the Correct Output Presentation 689

Considering the Meaning of Correct 690

Determining a Presentation Type 691

Considering the audience 691

Defining a depth of detail 692

Ensuring that the data is consistent with audience needs 693

Understanding timeliness 693

Choosing the Right Graph 694

Telling a story with your graphs 694

Showing parts of a whole with pie charts 694

Creating comparisons with bar charts 695

Showing distributions using histograms 697

Depicting groups using boxplots 699

Defining a data flow using line graphs 700

Seeing data patterns using scatterplots 701

Working with External Data 702

Embedding plots and other images 703

Loading examples from online sites 703

Obtaining online graphics and multimedia 704

Chapter 5: Developing Consistent Strategies 707

Standardizing Data Collection Techniques 707

Using Reliable Sources 709

Verifying Dynamic Data Sources 711

Considering the problem 712

Analyzing streams with the right recipe 714

Looking for New Data Collection Trends 715

Weeding Old Data 716

Considering the Need for Randomness 717

Considering why randomization is needed 718

Understanding how probability works 718

Index 721

Information on E-Books

"E-book" stands for digital book. Reading this kind of book requires either special software for computers, tablets, and smartphones or an e-book reader. Because many different e-book file formats exist, there are a few things to keep in mind.
We deliver digital books in three formats: EPUB with DRM (Digital Rights Management), EPUB without DRM, and PDF. For PDF and EPUB without DRM, you only need to check that your e-book reader is compatible. If a format with DRM is used, you additionally need a free Adobe® Digital Editions account. When you download an e-book that requires Adobe® Digital Editions, you receive an ACSM file, which must be added to Digital Editions and linked to your account. Some e-book readers (for example, PocketBook Touch) also support entering the Adobe account login details directly, so these ACSM files can be copied straight to the device.
Because e-books can be downloaded only for a limited time, usually 6 months, you should always keep a backup copy on permanent storage (hard drive, USB stick, or CD). The number of downloads is also limited to a maximum of 5.