Data science is an interdisciplinary field that combines programming, statistics, and domain expertise to extract insights and knowledge from data. It is a cornerstone of modern industry and research, driving innovation in areas from business intelligence to scientific discovery.

Proficiency with a dedicated ecosystem of programming tools is essential for storing, cleaning, analyzing, and visualizing data efficiently, which makes these skills highly valuable for researchers, analysts, and developers.

About the book

*Python Data Science Handbook* is a comprehensive desk reference that covers the core Python tools used for data science. It is intended for working scientists, data analysts, and developers who are already familiar with reading and writing Python code and wish to apply it to data-intensive tasks. The book provides clear, practical examples for tackling day-to-day issues in data manipulation, transformation, cleaning, visualization, and model building.

What you will learn

Readers will gain practical knowledge of the essential Python libraries that form the data science stack, starting with IPython and Jupyter as interactive computational environments. They will learn to use NumPy for efficient numerical array operations, pandas for manipulating labeled and columnar data in DataFrames, and Matplotlib for creating a wide range of data visualizations. Finally, the book covers how to apply key machine learning algorithms using Scikit-Learn.
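As a rough illustration of how NumPy and pandas divide the work, the sketch below uses NumPy for elementwise array arithmetic and boolean masking, then pandas for a grouped aggregation on a DataFrame. It is not taken from the book; the city names and temperatures are made-up toy values chosen only to keep the arithmetic easy to check.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic over a whole array, with no explicit Python loop.
# The temperature values below are hypothetical toy data.
temps_c = np.array([12.0, 15.5, 9.0, 20.0])
temps_f = temps_c * 9 / 5 + 32      # arithmetic operators act elementwise (ufuncs)
warm = temps_c[temps_c > 10]        # a boolean mask keeps only matching elements

# pandas: labeled, columnar data in a DataFrame.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Oslo", "Lima"],  # hypothetical labels
    "temp_c": temps_c,
})

# Split the rows by city, average each group, and combine the results.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(temps_f)
print(warm)
print(mean_by_city)
```

The same data passes from a raw NumPy array into a labeled pandas structure, which mirrors the progression from Part II to Part III of the book.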

Table of contents

Preface.

Part I. Jupyter: Beyond Normal Python

Getting Started in IPython and Jupyter.
Enhanced Interactive Features.
Debugging and Profiling.

Part II. Introduction to NumPy

Understanding Data Types in Python.
The Basics of NumPy Arrays.
Computation on NumPy Arrays: Universal Functions.
Aggregations: min, max, and Everything in Between.
Computation on Arrays: Broadcasting.
Comparisons, Masks, and Boolean Logic.
Fancy Indexing.
Sorting Arrays.
Structured Data: NumPy’s Structured Arrays.

Part III. Data Manipulation with Pandas

Introducing Pandas Objects.
Data Indexing and Selection.
Operating on Data in Pandas.
Handling Missing Data.
Hierarchical Indexing.
Combining Datasets: concat and append.
Combining Datasets: merge and join.
Aggregation and Grouping.
Pivot Tables.

Book details

Title: Python Data Science Handbook
Author(s): Jake VanderPlas
Main category: Data Science
Subcategory: Python
Language: English
License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License

Legal notice: This book is shared for educational purposes only. The content is distributed under Creative Commons licenses or with explicit permission from the author. FreeProgrammingBooks may host files that comply with their respective licenses.