author: Ji Huang
date: 2019-04-09
last modified date: 2020-09-03
This repository holds datasets that I have made publicly available. Most of them are tables that I refer to constantly. If you are interested, you can use the readr package to retrieve them directly from GitHub.
To read a table in R directly, copy the link address by right-clicking the Download button, then run readr::read_tsv(copy_link) in R.
PS: FTP links are not rendered correctly on GitHub. Please open the md file to find them.
- RAPDB_MSU_ID_conversion_20190411.txt.bz2. For converting rice gene IDs from RAPDB to MSU7 and vice versa. Refer to my post for how I built this table.
readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/RAPDB_MSU_ID_conversion_20190411.txt.gz")
rapdb | msu7 |
---|---|
Os01g0100100 | LOC_Os01g01010 |
Os01g0100200 | LOC_Os01g01019 |
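
Once the table is loaded, converting a vector of IDs is a simple lookup. The sketch below is a minimal base-R illustration that rebuilds the two preview rows inline instead of downloading the file. The `rapdb_to_msu7()` helper and the `Os99g9999999` query ID are hypothetical and exist only for this example.

```r
# Toy copy of the two preview rows above; in practice, use the full table
# returned by readr::read_tsv().
conv <- data.frame(
  rapdb = c("Os01g0100100", "Os01g0100200"),
  msu7  = c("LOC_Os01g01010", "LOC_Os01g01019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up MSU7 IDs for a vector of RAPDB IDs.
# IDs that are missing from the table come back as NA.
rapdb_to_msu7 <- function(ids, table = conv) {
  table$msu7[match(ids, table$rapdb)]
}

# "Os99g9999999" is a made-up ID used to show the NA behaviour.
query  <- c("Os01g0100200", "Os99g9999999", "Os01g0100100")
result <- rapdb_to_msu7(query)
print(result)
```

Note that `match()` keeps only the first hit per ID. If an ID can map to several partners in the full table, `merge()` on the `rapdb` column would keep all of them.
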
- rice_annotation_rapdb_msu7_20190412.txt.bz2. This table includes rice gene annotation from RAPDB and MSU. The RAPDB annotation was downloaded from "Gene annotation information in tab-delimited text format"; the MSU7 annotation came from the MSU website. I kept some useful columns and renamed them. There are 5339 RAPDB genes that have more than one transcript. All columns are from RAPDB except `msu7` and `msu7_annotation`.
readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/rice_annotation_rapdb_msu7_20190412.txt.gz")
rapdb | transcript | description | msu7 | msu7_annotation | oryzabase_synonym | oryzabase_name | transcript_evidence | orf_evidence | flcDNA_cloneID |
---|---|---|---|---|---|---|---|---|---|
Os01g0100100 | Os01t0100100-01 | RabGAP/TBC domain containing protein. | LOC_Os01g01010 | TBC domain containing protein, expressed | NA | NA | AK242339 (DDBJ, antisense transcript) | Q655M0 (UniProt) | J075199P03 |
Os01g0100100 | Os01t0100100-01 | RabGAP/TBC domain containing protein. | LOC_Os01g01010 | TBC domain containing protein, expressed | NA | NA | AK242339 (DDBJ, antisense transcript) | Q655M0 (UniProt) | J075199P03 |
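
Because a gene with several transcripts occupies several rows, it can help to count distinct transcripts per gene before joining this table to other data. The following is a minimal base-R sketch on toy data. Only the column names `rapdb` and `transcript` come from the table above; every ID in the example is made up for illustration.

```r
# Toy data (hypothetical IDs); replace with the table read by readr::read_tsv().
anno <- data.frame(
  rapdb      = c("OsGeneA", "OsGeneA", "OsGeneB", "OsGeneB", "OsGeneC"),
  transcript = c("OsGeneA-01", "OsGeneA-01", "OsGeneB-01", "OsGeneB-02", "OsGeneC-01"),
  stringsAsFactors = FALSE
)

# Drop repeated gene/transcript pairs, then count transcripts per gene.
pairs <- unique(anno[, c("rapdb", "transcript")])
n_tx  <- table(pairs$rapdb)

# Genes with more than one distinct transcript.
multi <- names(n_tx)[as.vector(n_tx) > 1]
print(multi)
```
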
- ptfdb_maizeTF_list_orgainzed_v4.txt.gz. This is a list of maize transcription factors from PlantTFDB. The original IDs were v3; I added the v4 IDs.
readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/ptfdb_maizeTF_list_orgainzed_v4.txt.gz")
v3id | type | v4id |
---|---|---|
AC149475.2_FG005 | C2H2 | Zm00001d048404 |
AC149818.2_FG008 | C2H2 | Zm00001d048400 |
AC149818.2_FG009 | LBD | Zm00001d048401 |
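
A common use of this table is to group v4 gene IDs by TF family, or to look up the family of a given gene. The base-R sketch below rebuilds the three preview rows inline rather than downloading the file. The object names `by_family` and `family_of` are introduced here for illustration only.

```r
# Toy copy of the three preview rows above.
tf <- data.frame(
  v3id = c("AC149475.2_FG005", "AC149818.2_FG008", "AC149818.2_FG009"),
  type = c("C2H2", "C2H2", "LBD"),
  v4id = c("Zm00001d048404", "Zm00001d048400", "Zm00001d048401"),
  stringsAsFactors = FALSE
)

# Named list: one character vector of v4 IDs per TF family.
by_family <- split(tf$v4id, tf$type)
print(by_family$C2H2)

# Named vector for the reverse lookup: v4 ID -> family.
family_of <- setNames(tf$type, tf$v4id)
print(family_of[["Zm00001d048401"]])
```
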
- maize_v3Tov4_function.tsv.gz. This table has both the maize v3-to-v4 ID mapping and the v4 gene functions. Both came from Gramene, and I combined them.
  - B73v4.gene_function.txt. Contains maize gene short descriptions based on Gramene, ftp link here.
  - maize.v3TOv4.geneIDhistory.txt. Contains the maize gene version 3 to version 4 conversion, ftp link here.
readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/maize_v3Tov4_function.tsv.gz")
v3id | v4id | changes | method | type | annotation | source |
---|---|---|---|---|---|---|
AC148152.3_FG001 | Zm00001d007725 | No_change_in_genomic_sequence | Gene_Tree/Direct_mapping | 1-to-1 | Ankyrin repeat family protein | [source:homolog] |
- maizeTF_grassius_v4id_20190904.tsv.gz. This table has both the v3 and v4 IDs for the maize Grassius TF plasmids.
Plate Address | Stock number | GenBank accession | Gene model | Transcript | Template | type | v4id |
---|---|---|---|---|---|---|---|
OSU_P_1_A1 | pUT4010 | KJ727026 | GRMZM2G122614 | GRMZM2G122614_T01 | Synthetic | ARF | Zm00001d003011 |
OSU_P_1_B1 | pUT4013 | KJ727027 | GRMZM2G121111 | GRMZM2G121111_T01 | Synthetic | MYB_related | Zm00001d024809 |
- Ath_TF_list.txt. This is the list of Arabidopsis TF genes based on PlantTFDB. The file was downloaded on 2020-02-06.
readr::read_tsv("https://raw.githubusercontent.com/timedreamer/public_dataset/master/Ath_TF_list.txt")
TF_ID | Gene_ID | Family |
---|---|---|
AT3G25730.1 | AT3G25730 | RAV |
AT1G68840.1 | AT1G68840 | RAV |
AT1G68840.2 | AT1G68840 | RAV |
- ptfdb-grassius_maizeTF_list_orgainzed_v4.txt. This is the combined maize TF list from PlantTFDB and Grassius.
readr::read_tsv("https://raw.githubusercontent.com/timedreamer/public_dataset/master/ptfdb-grassius_maizeTF_list_orgainzed_v4.txt")
v3id | name | type | v4id |
---|---|---|---|
GRMZM2G048582 | ZmNLP17 | Nin-like | Zm00001d006293 |
GRMZM2G130374 | ZmWRKY3 | WRKY | Zm00001d030969 |
GRMZM2G398506 | ZmWRKY1 | WRKY | Zm00001d021947 |
- ptfdb_Osj_TF_list_wRAPDB.tsv. The rice TF list from PlantTFDB, with a column of the matching RAPDB IDs added.
readr::read_tsv("https://raw.githubusercontent.com/timedreamer/public_dataset/master/ptfdb_Osj_TF_list_wRAPDB.tsv")
TF_ID | Gene_ID | Family | rapdb |
---|---|---|---|
LOC_Os01g04750.1 | LOC_Os01g04750 | RAV | Os01g0140700 |
LOC_Os01g04800.1 | LOC_Os01g04800 | RAV | Os01g0141000 |
LOC_Os05g47650.1 | LOC_Os05g47650 | RAV | Os05g0549800 |
- maize.B73.AGPv4.aggregate.gaf.gz. The maize GAMER-GO annotation. It contains the `GENE -- GO_ID` mapping. The original file was downloaded from the MaizeGDB FTP.
library(dplyr)  # provides %>% and select()
zmaGO <- readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/maize.B73.AGPv4.aggregate.gaf.gz", skip=1)
# For use with `clusterProfiler`, you only need two columns.
zmaGO <- zmaGO %>% select(term_accession, db_object_id)
- agriGOv2_GOConsortium_term_v201608.txt.gz. The `GO_ID -- GO_annotation` mapping. The file was downloaded from AgriGOv2.
readr::read_tsv("https://github.com/timedreamer/public_dataset/raw/master/agriGOv2_GOConsortium_term_v201608.txt.gz", col_names = c("GO","type","name","number")) %>% select(GO, name)
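
The two GO tables above fit together as the term-to-gene and term-to-name inputs that `clusterProfiler` expects. The base-R sketch below shows the shape of those inputs on toy data. The column names follow the snippets above, but the gene-to-term pairs are made up for illustration, and `goName`, `term2gene`, `term2name`, and `my_genes` are hypothetical names.

```r
# Toy stand-in for the GAF table (gene-to-term pairs are made up).
zmaGO <- data.frame(
  db_object_id   = c("Zm00001d048404", "Zm00001d048400", "Zm00001d048401"),
  term_accession = c("GO:0003677", "GO:0003677", "GO:0006355"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the agriGO term table (columns GO and name, as above).
goName <- data.frame(
  GO   = c("GO:0003677", "GO:0006355"),
  name = c("DNA binding", "regulation of transcription, DNA-templated"),
  stringsAsFactors = FALSE
)

# Term first, gene second; term ID first, term name second.
term2gene <- zmaGO[, c("term_accession", "db_object_id")]
term2name <- goName[, c("GO", "name")]

# Attach a readable term name to every gene-term pair.
annotated <- merge(term2gene, term2name, by.x = "term_accession", by.y = "GO")
print(annotated)

# With clusterProfiler installed, a call along these lines should work
# (not run here; `my_genes` is a hypothetical vector of v4 gene IDs):
# clusterProfiler::enricher(my_genes, TERM2GENE = term2gene, TERM2NAME = term2name)
```
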