cse283-fp

CSE283 Final Project. Kunal and Mike

Test text

The GTF file all_tf.gtf was retrieved from the UCSC Table Browser using table refGene, output format GTF, and the input accessions in all_tf.txt (generated by format_TFs.sh). all_tf.txt is a list of all TF names from List_of_TFs.txt. Note that 3 of the 2223 TF names in all_tf.txt could not be retrieved by UCSC, so they are absent from all_tf.gtf. all_tf.gtf is passed to Cufflinks to identify differentially expressed genes.
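As a quick sanity check on the retrieved annotation, the transcript IDs in a GTF file can be collected with a few lines of Python. This is a minimal sketch, not part of the project scripts: the function name `transcript_ids` is hypothetical, and it assumes the standard 9-column, tab-separated GTF layout with a `transcript_id "..."` attribute in the last column.

```python
import re

# Matches the transcript_id attribute in the 9th (attributes) column of a GTF line.
TRANSCRIPT_ID_RE = re.compile(r'transcript_id "([^"]+)"')


def transcript_ids(gtf_lines):
    """Return the set of transcript IDs found in an iterable of GTF lines.

    Hypothetical helper: assumes tab-separated GTF with 9 columns.
    Blank lines, comment lines, and malformed lines are skipped.
    """
    ids = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids
```

For example, `transcript_ids(open("all_tf.gtf"))` would return the distinct transcript IDs, whose count can be compared against expectations before running Cufflinks.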

The 2-column table id_2_name.txt maps each transcript ID, as listed in the GTF all_tf.gtf, to its gene symbol. It was retrieved from the UCSC Table Browser using table refGene, the input accessions in all_tf.txt, and manually selecting the 'name' (transcript ID) and 'name2' (gene symbol) fields.
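The mapping table can be loaded into a dictionary and used to list TF names that have no transcript. The sketch below is illustrative only: `load_id_to_name` and `missing_names` are hypothetical helpers, and it assumes the file is tab-separated, that any header line starts with `#`, and that names in all_tf.txt match gene symbols exactly (case-sensitive).

```python
def load_id_to_name(lines):
    """Build a {transcript_id: gene_symbol} dict from 2-column, tab-separated lines.

    Assumes header/comment lines start with '#'; lines with fewer
    than two columns are skipped.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            continue
        mapping[fields[0]] = fields[1]
    return mapping


def missing_names(tf_names, mapping):
    """Return the sorted TF names that do not appear as a gene symbol in mapping."""
    found = set(mapping.values())
    return sorted(name for name in tf_names if name not in found)
```

With `mapping = load_id_to_name(open("id_2_name.txt"))` and the names read from all_tf.txt, `missing_names` gives one way to list the names that UCSC did not return, under the assumptions above.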

encode_antibodies.html is from http://genome.ucsc.edu/cgi-bin/hgEncodeVocab?ra=encode/cv.ra&type=Antibody&bgcolor=FFFEE8

wget "http://genome.ucsc.edu/cgi-bin/hgFileSearch?hgsid=276643515&db=hg19&hgt_tsDelRow=&hgt_tsAddRow=&tsName=&tsDescr=&tsGroup=Any&fsFileType=Any&hgt_mdbVar1=dataType&hgt_mdbVal1=ChipSeq&hgt_mdbVar2=cell&hgt_mdbVal2=GM12878&hgt_mdbVal2=H1-hESC&hgt_mdbVal2=HeLa-S3&hgt_mdbVal2=HepG2&hgt_mdbVal2=K562&hgt_mdbVar3=view&hgt_mdbVal3=Peaks&hgfs_Search=search" -O all_GM12878_K562_H1-HESC_HEP-G2_HELA-S3.html
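The query string in the wget command above can also be assembled programmatically, which makes the selected cell lines easier to change. This is a sketch only: `build_file_search_url` is a hypothetical helper, the parameter names and values are copied from the command above, and the `hgsid` value appears to be specific to a browser session, so it may need to be replaced.

```python
from urllib.parse import urlencode

BASE_URL = "http://genome.ucsc.edu/cgi-bin/hgFileSearch"


def build_file_search_url(cell_lines, hgsid="276643515"):
    """Rebuild the hgFileSearch URL for ChipSeq 'Peaks' files in the given cell lines.

    Parameter names and values mirror the wget command in this README;
    hgsid is assumed to be session-specific.
    """
    params = [
        ("hgsid", hgsid),
        ("db", "hg19"),
        ("hgt_tsDelRow", ""),
        ("hgt_tsAddRow", ""),
        ("tsName", ""),
        ("tsDescr", ""),
        ("tsGroup", "Any"),
        ("fsFileType", "Any"),
        ("hgt_mdbVar1", "dataType"),
        ("hgt_mdbVal1", "ChipSeq"),
        ("hgt_mdbVar2", "cell"),
    ]
    # One repeated hgt_mdbVal2 entry per selected cell line.
    params += [("hgt_mdbVal2", cell) for cell in cell_lines]
    params += [
        ("hgt_mdbVar3", "view"),
        ("hgt_mdbVal3", "Peaks"),
        ("hgfs_Search", "search"),
    ]
    return BASE_URL + "?" + urlencode(params)


CELL_LINES = ["GM12878", "H1-hESC", "HeLa-S3", "HepG2", "K562"]
url = build_file_search_url(CELL_LINES)
```

The resulting `url` can then be passed to wget or to `urllib.request.urlopen` to save the search results page.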