pfilter

pfilter is a command-line tool for filtering your CSV dataset by percentiles.

Motivation. While working on the CaM project, we needed to filter out GitHub repositories that were too small or too large, as measured by their number of files. No readily available command-line tool did this, so we created pfilter.

How to use

First, install it from PyPI:

pip install pfilter

Now, execute it with the following flags:

pfilter --csv=foo.csv --c=age --lower=0.05 --upper=0.95 --o=filtered.csv

Here, --csv is the path to your source CSV file, --c is the column to filter by, --lower is the lower percentile given as a fraction (the maximum is 1, so 0.05 is the 5th percentile, or P5 for short), --upper is the upper percentile given the same way (so 0.95 is the 95th percentile, or P95 for short), and --o is the path where the filtered dataset is written.
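To make the percentile bounds concrete, here is a minimal sketch of this kind of filtering using only the Python standard library. It is not pfilter's actual implementation: the helper names `quantile` and `filter_rows`, the linear-interpolation quantile, the inclusive bounds, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def quantile(sorted_values, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation.

    Assumption: pfilter may compute percentiles differently.
    """
    if not sorted_values:
        raise ValueError("cannot take a quantile of an empty sequence")
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def filter_rows(rows, column, lower, upper):
    """Keep rows whose value in `column` lies between the two percentiles.

    Assumption: both bounds are inclusive.
    """
    values = sorted(float(r[column]) for r in rows)
    low, high = quantile(values, lower), quantile(values, upper)
    return [r for r in rows if low <= float(r[column]) <= high]


# Hypothetical sample data standing in for foo.csv.
source = io.StringIO("name,age\na,1\nb,20\nc,30\nd,40\ne,99\n")
rows = list(csv.DictReader(source))

# Keep only rows between the 25th and 75th percentiles of "age".
kept = filter_rows(rows, "age", 0.25, 0.75)
print([r["name"] for r in kept])  # prints ['b', 'c', 'd']
```

With this sample, P25 of age is 20 and P75 is 40, so the rows with ages 1 and 99 are dropped as outliers.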

How to contribute

Fork the repository, make your changes, and send us a pull request. We will review your changes and apply them to the master branch shortly, provided they don't violate our quality standards. To avoid frustration, please run the full build before sending us your pull request:

poetry build

You will need Python 3.11+ installed.