To Do List
sagnikbanerjee15 opened this issue · 0 comments
sagnikbanerjee15 commented
- Write up a Dockerfile that will have abridge, samtools, zpaq, and fclqc
- Rewrite the abridge script to call the underlying software directly. No need to use docker and/or singularity.
- Remove all occurrences of "informative CIGAR" and rename those to "integrated CIGAR"
- Add options to calculate the space saved from the program for each SAM field. Report this in the log file
- Develop a modular approach to compression and decompression. This will be necessary for troubleshooting and also for incorporating enhancements in the future
- #4
- Create a single program to compress both single-end and paired-end data. Similarly, create one program to decompress both single-end and paired-end data
- Store more information on the first line of the compressed file in addition to the flags. For example, the endedness of the data (single-end or paired-end)
- Add comments for each function
- Add more functions and decide if you wish to make those inline
- Write a function to convert numeric data to a string. Use type-casting when calling the function. Write separate functions for signed and unsigned numbers
- Similarly, create functions for converting strings to numbers
- Examine the code to read directly from BAM files
- Optimize memory allocations
- Read directly from a BAM file - https://www.biostars.org/p/44424/, https://stackoverflow.com/questions/52915853/how-to-build-a-simple-main-cpp-file-using-samtools-c-api, https://samtools.sourceforge.net/sam-exam.shtml
- Incorporate SAMBAMBA & BAM in the comparison. Also, compare with different ranges of compression levels
- Perform tests with SAM/BAM files that contain CIGAR without mismatch indicators and also CIGAR with mismatch indicators
- Compile the Rust code and check if it could be made faster with the C compiler
- Consider removing the section where a multi-line FASTA file is generated. Instead, modify the code snippet to read from a multi-line FASTA file
- Prepare the CWL workflow for carrying out all comparisons. Write a single workflow for both RNA-Seq and DNA-Seq reads
- Write a launcher for processing all the samples
- Write CWL scripts for the following software:
  - Deez
  - Samcomp
  - CSAM
  - Samtools (BAM & CRAM)
  - Genozip2
- Remove the adjustment done to quality scores since in this version those will never be stored with the iCIGAR
- Adjust the MAPQ value. Store X in place of 255 but check if substantial space reduction can be achieved
- While generating BAM and CRAM files for comparison, retain only the relevant tags - do not store everything
- Add spring to the compressor list in place of zpaq
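
For the item above about reporting the space saved for each SAM field, here is a minimal sketch of how the accounting could work. It is not taken from the abridge code. The function names `field_sizes` and `percent_saved` are hypothetical, and it assumes the compressed size of each field can be measured separately, which depends on how the compressed file ends up being laid out. Tab separators are not counted.

```python
# The 11 mandatory SAM columns, in the order the SAM specification defines them.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]


def field_sizes(sam_lines):
    """Return the bytes used by each mandatory field, plus all optional tags."""
    sizes = {name: 0 for name in SAM_FIELDS}
    sizes["TAGS"] = 0
    for line in sam_lines:
        if line.startswith("@"):  # header lines are not alignment records
            continue
        columns = line.rstrip("\n").split("\t")
        for name, value in zip(SAM_FIELDS, columns):
            sizes[name] += len(value.encode())
        # Everything after column 11 is an optional tag.
        sizes["TAGS"] += sum(len(tag.encode()) for tag in columns[11:])
    return sizes


def percent_saved(original_bytes, compressed_bytes):
    """Percentage of space saved; 0.0 when the original is empty."""
    if original_bytes == 0:
        return 0.0
    return 100.0 * (original_bytes - compressed_bytes) / original_bytes
```

The log could then print one line per field, pairing each `field_sizes` total with the measured compressed size for that field and the result of `percent_saved`.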