/python-api-inspect

Statistics to better understand how Python is used and written


SciPy 2019 Lightning Talk

Motivation

The goal of this package is to provide statistics that help us better understand how Python is used and written.

A package maintainer might ask:

  • Can certain functions be deprecated?
  • How are my users using my package in tests vs. source vs. notebooks?
  • What should I include in tutorials?
  • Are new features being adopted?

Python core maintainers might ask:

  • What are the most and least used stdlib modules?
  • Is the community moving away from one module?
  • Let's inform PEPs with actual statistics!

This work exposes a queryable SQLite web API via Datasette.

NOTE: This dataset is currently extremely biased, because we are parsing the top 4,000 repositories for a few scientific libraries listed in data/whitelist. It is not a representative sample of the Python ecosystem, nor of the scientific Python ecosystem as a whole. Further work is needed to make the dataset less biased.

Interesting Questions

As with any project that provides large datasets, interpretation is even more important than the data itself. Here we provide some guiding questions.

Workflow

This package consists of components that expose a SQLite database via Datasette. Originally, the package provided CSV files with API usage statistics for packages. The problem is that a fixed set of CSV files cannot anticipate all the questions users may have. We therefore provide a SQL interface for asking custom questions of the (currently) 6 GB database.
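To illustrate the kind of custom question a SQL interface makes possible, here is a minimal sketch using Python's built-in sqlite3 module. The table `function_usage` and its columns (`project`, `namespace`, `function_name`, `n_calls`) are hypothetical and invented for this example; they are not the actual schema of the database, and the rows are made-up sample data.

```python
import sqlite3


def top_functions(conn, namespace, limit=3):
    """Return (function_name, total_calls) rows for one namespace, most used first."""
    query = """
        SELECT function_name, SUM(n_calls) AS total
        FROM function_usage
        WHERE namespace = ?
        GROUP BY function_name
        ORDER BY total DESC, function_name ASC
        LIMIT ?
    """
    return conn.execute(query, (namespace, limit)).fetchall()


# Hypothetical schema and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE function_usage ("
    "project TEXT, namespace TEXT, function_name TEXT, n_calls INTEGER)"
)
conn.executemany(
    "INSERT INTO function_usage VALUES (?, ?, ?, ?)",
    [
        ("proj_a", "numpy", "array", 10),
        ("proj_b", "numpy", "array", 5),
        ("proj_a", "numpy", "zeros", 3),
        ("proj_b", "scipy", "optimize", 2),
    ],
)

rows = top_functions(conn, "numpy")
print(rows)
```

A question such as "which functions in a namespace are used least?" only needs a different ORDER BY, which is the flexibility that precomputed CSV files could not offer.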

The scripts involved in this work:

  1. Assemble a list of important repositories/projects that depend on libraries such as numpy, scipy, requests, tensorflow, etc. This work would not be possible without libraries.io. (scripts/librariesio.sh)
  2. Construct the database by inspecting the source code and AST of every Python file and notebook in the repositories. (scripts/inspect.sh)
  3. Expose the SQLite database via Datasette. (scripts/serve.sh)
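The AST inspection in step 2 can be sketched roughly as follows. This is not the code in scripts/inspect.sh; it is a simplified illustration, using only the standard library ast module, of how calls such as `np.array(...)` might be counted and attributed to the imported module. The function name `count_attribute_calls` is an assumption made for this example, and the sketch handles only plain `import x` and `import x as y` statements.

```python
import ast
from collections import Counter


def count_attribute_calls(source):
    """Count calls of the form `alias.func(...)`, keyed by `module.func`."""
    tree = ast.parse(source)

    # Map each local name to the module it was imported from,
    # e.g. {"np": "numpy", "os": "os"}.
    aliases = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                aliases[alias.asname or alias.name] = alias.name

    # Count every call whose callee is an attribute of an imported name.
    counts = Counter()
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            base = node.func.value
            if isinstance(base, ast.Name) and base.id in aliases:
                counts[f"{aliases[base.id]}.{node.func.attr}"] += 1
    return counts


example = """
import numpy as np
import os

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
c = np.zeros(3)
print(os.getcwd())
"""

usage = count_attribute_calls(example)
for name, n in sorted(usage.items()):
    print(name, n)
```

A real inspection would also need to handle `from x import y`, nested attributes such as `np.linalg.norm`, and notebook cells; those cases are left out here to keep the sketch short.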

Tests

The tests depend on pytest and are a great demonstration of what python-api-inspect can capture.

pytest