/bed-reader

A library for easy, fast, and efficient reading & writing of PLINK Bed files

Primary LanguageRustOtherNOASSERTION

PyPI version Build Status PyPI

Read and write the PLINK BED format, simply and efficiently.

This is the Python README. For Rust, see README-rust.md.

Highlights

  • Fast multi-threaded Rust engine.
  • Supports all Python indexing methods. Slice data by individuals (samples) and/or SNPs (variants).
  • Used by PySnpTools, FaST-LMM, and PyStatGen.
  • Supports PLINK 1.9.
  • Read data locally or from the cloud, efficiently and directly.

Install

Full version: With all optional dependencies:

pip install bed-reader[samples,sparse]

Minimal version: Depends only on numpy:

pip install bed-reader

Usage

Read genomic data from a .bed file.

>>> import numpy as np
>>> from bed_reader import open_bed, sample_file
>>>
>>> file_name = sample_file("small.bed")
>>> bed = open_bed(file_name)
>>> val = bed.read()
>>> print(val)
[[ 1.  0. nan  0.]
 [ 2.  0. nan  2.]
 [ 0.  1.  2.  0.]]
>>> del bed

Read every second individual and SNPs (variants) from 20 to 30.

>>> file_name2 = sample_file("some_missing.bed")
>>> bed2 = open_bed(file_name2)
>>> val2 = bed2.read(index=np.s_[::2,20:30])
>>> print(val2.shape)
(50, 10)
>>> del bed2

List the first 5 individual (sample) ids, the first 5 SNP (variant) ids, and every unique chromosome. Then, read every genomic value in chromosome 5.

>>> with open_bed(file_name2) as bed3:
...     print(bed3.iid[:5])
...     print(bed3.sid[:5])
...     print(np.unique(bed3.chromosome))
...     val3 = bed3.read(index=np.s_[:,bed3.chromosome=='5'])
...     print(val3.shape)
['iid_0' 'iid_1' 'iid_2' 'iid_3' 'iid_4']
['sid_0' 'sid_1' 'sid_2' 'sid_3' 'sid_4']
['1' '10' '11' '12' '13' '14' '15' '16' '17' '18' '19' '2' '20' '21' '22'
 '3' '4' '5' '6' '7' '8' '9']
(100, 6)

From the cloud: open a file and read data for one SNP (variant) at index position 2.

>>> with open_bed("https://raw.githubusercontent.com/fastlmm/bed-sample-files/main/small.bed") as bed:
...     val = bed.read(index=np.s_[:,2], dtype="float64")
...     print(val)
[[nan]
 [nan]
 [ 2.]]

Project Links