pyFastq 0.1

A simple Python 2.7 library to parse FASTQ files and handle Illumina 1.8+ FASTQ sequences.

Creation : 2015/04/02

Last update : 2015/04/02

FastqReader

Parsing function that reads a FASTQ file and returns an iterator of FastqSeq objects. When the file is exhausted, the generator raises a StopIteration exception that indicates the number of valid sequences parsed. If a FASTQ sequence is invalid, it is skipped. Any part of the sequence name following a blank space is removed.
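
The behaviour described above can be illustrated with a minimal sketch. This is not the FastqReader implementation: the function name `read_fastq` is hypothetical, records are assumed to span exactly four lines, and plain tuples are yielded instead of FastqSeq objects. It does not report the number of parsed sequences.

```python
import io


def read_fastq(handle):
    """Yield (name, seq, qual) tuples from an iterable of FASTQ lines.

    Hypothetical helper for illustration only. Assumes each record
    occupies exactly four lines (header, sequence, '+', quality).
    """
    lines = (line.rstrip("\r\n") for line in handle)
    while True:
        header = next(lines, None)
        if header is None:
            return  # end of input reached
        seq = next(lines, "")
        sep = next(lines, "")
        qual = next(lines, "")

        # Skip records with a malformed header or separator line
        if not header.startswith("@") or not sep.startswith("+"):
            continue
        # Skip empty records and sequence/quality length mismatches
        if not seq or len(seq) != len(qual):
            continue
        # Keep only the part of the name before the first blank space
        parts = header[1:].split()
        if not parts:
            continue
        yield parts[0], seq, qual


# Usage with an in-memory file; the second record is invalid and skipped
demo = io.StringIO(
    u"@read1 extra info\nACGT\n+\nIIII\n"
    u"@bad\nACGT\n+\nII\n"
    u"@read2\nGG\n+\n!!\n"
)
for name, seq, qual in read_fastq(demo):
    print(name, seq, qual)
```

Reading four lines at a time keeps the sketch short, but it means a file with wrapped (multi-line) sequences would not be parsed correctly.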

FastqSeq : Simple object representing a Fastq sequence

FastqSeq is a simple Python class whose instances represent a FASTQ sequence. The object is initialized with:

  • A name for the sequence without @
  • A DNA sequence as a string
  • A quality score, given as an Illumina 1.8+ encoded quality string, a numpy.ndarray of integers in Phred+33, or a Python list of integers in Phred+33
  • Optionally, a short text description

The DNA sequence and the quality score must have the same length; otherwise an assertion error is raised.
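
As a rough illustration of how the different quality inputs and the length check could fit together, here is a sketch using a hypothetical helper, `normalize_quality`. It is not part of pyFastq. It assumes that integer inputs are plain Phred scores (for example 40) and that each character of a quality string encodes `score + 33`. A plain list stands in for the numpy.ndarray to keep the example dependency-free.

```python
def normalize_quality(seq, qual):
    """Return the quality as a list of integer Phred scores.

    Hypothetical helper for illustration only. `qual` may be a
    Phred+33 encoded string or any sequence of integer scores.
    """
    if isinstance(qual, str):
        # Decode Phred+33: ASCII code of each character minus 33
        scores = [ord(char) - 33 for char in qual]
    else:
        scores = [int(score) for score in qual]
    # Sequence and quality must describe the same number of bases
    assert len(seq) == len(scores), "seq and qual must have the same length"
    return scores


# Both calls describe the same four bases
from_string = normalize_quality("ACGT", "II5!")
from_list = normalize_quality("ACGT", [40, 40, 20, 0])
print(from_string == from_list)
```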

After creation the object has the following fields:

  • name = name of the sequence without @.
  • seq = The DNA sequence of the FASTQ sequence, stored as a simple string.
  • qual = A NumPy integer ndarray representing the Phred quality of the bases (supports all np.array methods)
  • descr = A description of the fastq sequence.
  • qualstr = A string of letters corresponding to the sequence quality in illumina 1.8+ Phred 33 encoding
  • fastqstr = The sequence formatted as a FASTQ string. If the field "descr" is present, it is included in the output sequence name after a space
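
To make the relationship between these fields concrete, the sketch below builds a FASTQ record from a name, a sequence, integer Phred scores and an optional description. The function `format_fastq` is hypothetical and only mirrors what the fields above describe. The exact layout of the library's output (for example the trailing newline) is an assumption.

```python
def format_fastq(name, seq, scores, descr=""):
    """Return a four-line FASTQ record as a string.

    Hypothetical helper for illustration only. `scores` is a
    sequence of integer Phred scores, encoded here as Phred+33.
    """
    # Encode each score as the character with ASCII code score + 33
    qualstr = "".join(chr(score + 33) for score in scores)
    # Append the description to the name after a space, if present
    header = "@" + name + (" " + descr if descr else "")
    return "{}\n{}\n+\n{}\n".format(header, seq, qualstr)


print(format_fastq("read1", "ACGT", [40, 40, 20, 0], descr="sample A"))
```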

The object supports slicing ([0:10]), concatenation (seq1 + seq2) and the len() function.
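
One way such behaviour can be obtained in Python is through the `__getitem__`, `__add__` and `__len__` special methods. The class below, `MiniFastqSeq`, is a hypothetical stand-in and not the FastqSeq implementation. In particular, joining names with an underscore on concatenation is an assumption, and qualities are kept in a plain list rather than a numpy.ndarray.

```python
class MiniFastqSeq(object):
    """Hypothetical, simplified stand-in for a FASTQ sequence object."""

    def __init__(self, name, seq, qual, descr=""):
        assert len(seq) == len(qual), "seq and qual must have the same length"
        self.name = name
        self.seq = seq
        self.qual = list(qual)
        self.descr = descr

    def __len__(self):
        # Length of the object is the number of bases
        return len(self.seq)

    def __getitem__(self, item):
        # Only slices are supported in this sketch
        if not isinstance(item, slice):
            raise TypeError("only slicing is supported")
        # Slice sequence and quality with the same bounds
        return MiniFastqSeq(self.name, self.seq[item], self.qual[item], self.descr)

    def __add__(self, other):
        # Assumption: the combined name joins both names with "_"
        return MiniFastqSeq(
            self.name + "_" + other.name,
            self.seq + other.seq,
            self.qual + other.qual,
        )


first = MiniFastqSeq("a", "ACGTAC", [40] * 6)
second = MiniFastqSeq("b", "GG", [30, 30])
merged = first + second
print(len(first[0:4]), merged.seq, len(merged))
```

Slicing the sequence and the quality together keeps both fields aligned, so the length invariant checked in the constructor still holds for every derived object.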

Authors and Contact

Adrien Leger - 2015