Scripts and datasets for the O'Reilly book Python Polars: The Definitive Guide

Python Polars: The Definitive Guide

License: MIT | Polars Discord server

Welcome to the official repository of the book Python Polars: The Definitive Guide by Jeroen Janssens and Thijs Nieuwdorp. The book is still being written and is scheduled to be published by O'Reilly in November 2024.


Description

Get ready to speed up your data analysis and start working with larger-than-memory datasets. Polars offers a blazingly fast, multi-threaded, elegant API for data loading, manipulation, and processing. Authors Jeroen Janssens and Thijs Nieuwdorp walk you through every aspect of Python Polars as they tackle practical use cases using real-world datasets. You’ll not only learn the syntax, but also understand the underlying concepts. You don’t need to have any experience with Pandas or Spark, but if you do, this book will help you make a smooth transition.

With this definitive guide at your side, you’ll be able to:

  • Process larger-than-memory datasets at record speed
  • Apply the eager, lazy, and streaming APIs of Polars and decide when to use which
  • Transition smoothly from Pandas or Spark to Polars
  • Integrate Polars into your existing codebase
  • Work with Arrow and Parquet to efficiently read and write data
  • Translate complex ETL tasks into efficient and elegant queries

Outline

Note that this outline is subject to change.

Front matter

  • Foreword by Ritchie Vink, creator of Polars
  • Acknowledgements

Part 1: Begin

  1. Introducing Polars
  2. First Steps
  3. Transitioning from Pandas to Polars
  4. Transitioning from Spark to Polars

Part 2: Load

  1. Data Types and Data Structures
  2. Eager and Lazy APIs
  3. Reading and Writing Data

Part 3: Express

  1. Beginning Expressions
  2. Continuing Expressions
  3. Combining Expressions

Part 4: Transform

  1. Selecting and Creating Columns
  2. Filtering and Sorting Rows
  3. Working with Special Data Types
  4. Summarizing and Aggregating
  5. Joining and Concatenating
  6. Reshaping

Part 5: Advance

  1. Creating Visualizations
  2. Extending Polars
  3. SQL with Polars
  4. Debugging and Testing with Polars
  5. Polars Internals
  6. Integrating with Other Tools