Unilist

📑 Load any newline-separated file as a generator.

Currently supports files on the local filesystem, at HTTP(S) endpoints, and at S3 URIs, in plaintext, CSV, and JSONL formats. Alternatively, you can set up your own virtual URLs.

Install

Install and update using pip:

pip install unilist

Usage

from unilist import Unilist

lines = list(Unilist('./file.txt'))
print(lines)

csv = list(Unilist('https://example.com/file.csv'))
print(csv)

# requires Unilist.setup({ ... })
# or /usr/local/bin/aws
records = list(Unilist('s3://example/file.jsonl.gz'))
print(records)
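
Because the loader is described as a generator, items can be consumed lazily instead of being collected with list(). The following is a minimal stdlib sketch of that idea, not Unilist's actual implementation; the names iter_lines and iter_jsonl are hypothetical and exist only for illustration.

```python
import itertools
import json
from typing import Iterator


def iter_lines(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield lines of a text file one at a time, without the trailing newline."""
    with open(path, encoding=encoding) as handle:
        for line in handle:
            yield line.rstrip("\n")


def iter_jsonl(path: str) -> Iterator[dict]:
    """Parse each non-blank line of a JSONL file as one JSON record."""
    for line in iter_lines(path):
        if line.strip():  # skip blank lines
            yield json.loads(line)


def head(items: Iterator, count: int) -> list:
    """Take at most `count` items without reading the rest of the source."""
    return list(itertools.islice(items, count))
```

With a generator, something like head(iter_jsonl('./file.jsonl'), 10) reads only as much of the file as it needs, which matters for large inputs.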

S3 setup

boto3

If you don't mind an extra dependency (boto3), install with:

pip install unilist[boto3]

Example setup

Unilist.setup({
    's3': {
        'aws_access_key': '___your_access_key___',
        'aws_secret_access_key': '___your_secret_key___',
    },
})

awscli

Alternatively, you can provide a path to the aws binary.

Unilist.setup({
  's3': {
    'aws_bin': '/usr/local/bin/aws'
  }
})
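
For context, one way a loader could use the aws binary is to stream an object to stdout with `aws s3 cp <uri> -` and read it line by line. The sketch below illustrates that approach under stated assumptions: it is not necessarily how Unilist invokes the CLI, the function names are hypothetical, and it handles uncompressed objects only.

```python
import subprocess
from typing import Iterator, List


def build_aws_cp_command(aws_bin: str, s3_uri: str) -> List[str]:
    """Build an `aws s3 cp <uri> -` command; `-` sends the object to stdout."""
    if not s3_uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return [aws_bin, "s3", "cp", s3_uri, "-"]


def stream_s3_lines(aws_bin: str, s3_uri: str) -> Iterator[str]:
    """Yield lines of an uncompressed S3 object as the CLI emits them."""
    command = build_aws_cp_command(aws_bin, s3_uri)
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as process:
        for line in process.stdout:
            yield line.rstrip("\n")
    # Leaving the `with` block waits for the process, so returncode is set here.
    if process.returncode != 0:
        raise RuntimeError(f"aws exited with status {process.returncode}")
```

Keeping the command construction in its own function makes it easy to check without network access or credentials.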

Integration

pandas

import pandas as pd
df = pd.DataFrame(Unilist('vfs://path/to/file.jsonl'))

Configuration

Unilist.setup({
    's3': {
        'cache_dir': '/tmp',
        'aws_access_key': '___your_access_key___',
        'aws_secret_access_key': '___your_secret_key___',
    },
    'virtual': {
        'vfs': '/custom/root/path',
        'c4': './local/c4',
    },
    'http': {
        'headers': {
            'accept': 'text/plain',
        },
        'encoding': 'utf-8',
    },
    'jsonl': {},
})
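
The 'virtual' section maps a URL scheme to a root path. The sketch below shows one plausible reading of that mapping, where the scheme is swapped for its configured root. This is an assumption about the behaviour, not Unilist's documented resolution logic, and resolve_virtual is a hypothetical helper.

```python
from typing import Dict


def resolve_virtual(url: str, virtual: Dict[str, str]) -> str:
    """Replace a configured virtual scheme with its root path (assumed behaviour)."""
    scheme, separator, rest = url.partition("://")
    if not separator or scheme not in virtual:
        return url  # not a virtual URL; leave it untouched
    return virtual[scheme].rstrip("/") + "/" + rest.lstrip("/")


virtual_roots = {"vfs": "/custom/root/path", "c4": "./local/c4"}
print(resolve_virtual("vfs://path/to/file.jsonl", virtual_roots))
# prints /custom/root/path/path/to/file.jsonl
```

Under this reading, URLs with an unconfigured scheme, such as https:// or s3://, pass through unchanged and are left to the other loaders.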

Development

Install from source

git clone git@github.com:petlack/unilist.git
cd unilist
pip install -e .
pip install -e .[boto3]

Run tests

pipenv run pytest

Meta

CONTRIBUTING

LICENSE (MIT)