pinellolab/dictys

ValueError: <10 TFs found in motifs.motif file.


Hello. I'm currently working with a mouse dataset and following the steps outlined in the short multiome tutorial notebook. However, when I run dictys_helper makefile_check.py, it reports "Found 0 TFs in current dataset" and then fails. The motifs.motif file that I am using is from https://hocomoco11.autosome.org/final_bundle/hocomoco11/full/MOUSE/mono/HOCOMOCOv11_full_MOUSE_mono_homer_format_0.0001.motif. Any help would be much appreciated. The full output is below:

(dictys) [kurella@d05-28 yAL]$ dictys_helper makefile_check.py
/home1/kurella/.conda/envs/dictys/lib/python3.9/site-packages/dictys/scripts/helper/makefile_check.py:16: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
Joint profile: True
Found 4185 cells with RNA profile
Found 32123 genes with RNA profile
Found 4185 cells with ATAC profile
Found 529 motifs
Found 451 TFs
Found 0 TFs in current dataset
Missing 451 TFs in current dataset: AHR,AIRE,ALX1,ANDR,AP2A,AP2B,AP2C,AP2D,ARI3A,ARI5B,ARNT,ARNT2,ASCL1,ASCL2,ATF1,ATF2,ATF3,ATF4,ATF7,ATOH1,BACH1,BACH2,BARX1,BARX2,BATF,BATF3,BCL6,BHA15,BHE40,BHE41,BMAL1,BRAC,BRCA1,CDC5L,CDX1,CDX2,CEBPA,CEBPB,CEBPD,CEBPE,CEBPG,CEBPZ,CLOCK,COE1,COT1,COT2,CREB1,CREM,CRX,CTCF,CTCFL,CUX1,CUX2,CXXC1,DBP,DDIT3,DLX2,DLX3,DLX5,DMRT1,DMRTB,E2F1,E2F2,E2F3,E2F4,E2F5,E2F6,E2F7,E4F1,EGR1,EGR2,EGR3,EGR4,EHF,ELF1,ELF2,ELF3,ELF5,ELK1,ELK3,ELK4,EPAS1,ERG,ERR1,ERR2,ERR3,ESR1,ESR2,ETS1,ETS2,ETV2,ETV4,ETV5,ETV6,EVI1,FEV,FLI1,FOS,FOSB,FOSL1,FOSL2,FOXA1,FOXA2,FOXA3,FOXC1,FOXC2,FOXD1,FOXD3,FOXF1,FOXF2,FOXI1,FOXJ2,FOXJ3,FOXK1,FOXL2,FOXM1,FOXO1,FOXO3,FOXO4,FOXP2,FOXP3,FOXQ1,FUBP1,GABPA,GATA1,GATA2,GATA3,GATA4,GATA5,GATA6,GCM1,GCR,GFI1,GFI1B,GLI1,GLI2,GLI3,GLIS3,GRHL2,HAND1,HBP1,HEN1,HES1,HESX1,HEY2,HIC1,HIF1A,HINFP,HLF,HLTF,HMGA1,HMGA2,HNF1A,HNF1B,HNF4A,HNF4G,HNF6,HSF1,HSF2,HTF4,HXA1,HXA10,HXA13,HXA5,HXA7,HXA9,HXB1,HXB4,HXB6,HXB7,HXB8,HXC6,HXC8,HXC9,HXD10,HXD13,HXD4,HXD9,IKZF1,INSM1,IRF1,IRF2,IRF3,IRF4,IRF5,IRF7,IRF8,IRF9,ISL1,ITF2,JUN,JUNB,JUND,KAISO,KLF1,KLF15,KLF3,KLF4,KLF5,KLF6,KLF8,LEF1,LHX2,LHX3,LHX6,LYL1,MAF,MAFA,MAFB,MAFF,MAFG,MAFK,MAX,MAZ,MBD2,MCR,MECP2,MEF2A,MEF2C,MEF2D,MEIS1,MEIS2,MITF,MLXPL,MSGN1,MSX2,MSX3,MTF1,MXI1,MYB,MYBA,MYBB,MYC,MYCN,MYF5,MYF6,MYOD1,MYOG,NANOG,NDF1,NDF2,NF2L1,NF2L2,NFAC1,NFAC2,NFAC3,NFAC4,NFAT5,NFE2,NFIA,NFIB,NFIC,NFIL3,NFKB1,NFKB2,NFYA,NFYB,NFYC,NGN2,NKX21,NKX22,NKX25,NKX28,NKX31,NKX32,NKX61,NOBOX,NR0B1,NR1D1,NR1D2,NR1H2,NR1H3,NR1H4,NR1I2,NR1I3,NR2C1,NR2C2,NR2E3,NR2F6,NR4A1,NR4A2,NR4A3,NR5A2,NR6A1,NRF1,OLIG2,ONEC2,OTX1,OTX2,OVOL1,OVOL2,P53,P63,P73,PAX2,PAX3,PAX5,PAX6,PAX8,PBX1,PBX2,PBX3,PDX1,PEBB,PIT1,PITX1,PITX2,PKNX1,PLAG1,PO2F1,PO2F2,PO3F1,PO3F2,PO4F2,PO5F1,PO6F1,PPARA,PPARD,PPARG,PRD14,PRD16,PRDM1,PRDM5,PRDM9,PRGR,PROP1,PRRX1,PRRX2,PTF1A,PURA,RARA,RARB,RARG,REL,RELB,REST,RFX1,RFX2,RFX3,RFX6,RORA,RORG,RREB1,RUNX1,RUNX2,RUNX3,RXRA,RXRB,RXRG,SALL1,SALL4,SIX2,SIX4,SMAD1,SMAD2,SMAD3,SMAD4,SMCA5,SNAI1,SNAI2,SOX10,SOX13,SOX1
5,SOX17,SOX18,SOX2,SOX3,SOX4,SOX5,SOX9,SP1,SP2,SP3,SP4,SP5,SP7,SPI1,SPIB,SPZ1,SRBP1,SRBP2,SRF,SRY,STA5A,STA5B,STAT1,STAT2,STAT3,STAT4,STAT6,STF1,SUH,TAF1,TAL1,TBP,TBX2,TBX20,TBX21,TBX3,TBX5,TCF7,TEAD1,TEAD2,TEAD3,TEAD4,TEF,TF2L1,TF65,TF7L1,TF7L2,TFCP2,TFDP1,TFE2,TFE3,TFEB,TGIF1,THA,THA11,THB,TLX1,TWST1,TYY1,UBIP1,USF1,USF2,VDR,VSX2,WT1,XBP1,ZBT17,ZBT18,ZBT7A,ZBTB6,ZEB1,ZEP1,ZEP2,ZFHX3,ZFP42,ZFP57,ZFX,ZIC1,ZIC2,ZIC3,ZKSC1,ZN143,ZN148,ZN281,ZN322,ZN335,ZN423,ZN431
Traceback (most recent call last):
  File "/home1/kurella/.conda/envs/dictys/lib/python3.9/site-packages/dictys/scripts/helper/makefile_check.py", line 138, in <module>
    raise ValueError(s)
ValueError: <10 TFs found in motifs.motif file.

Hi @lingfeiwang. It appears that makefile_check.py is case-sensitive when matching gene names. The gene names in my expression.tsv file were not capitalized, but the ones in my motifs.motif file were. Making the capitalization consistent between the two files fixed the issue I was having.
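For anyone hitting the same error, here is a minimal sketch of how one might confirm that a mismatch is purely a capitalization problem before editing any files. It is not part of dictys: `case_only_mismatches` is a hypothetical helper, and the two name lists are made-up stand-ins for the TF names reported by makefile_check.py and the gene names in expression.tsv.

```python
def case_only_mismatches(tf_names, gene_names):
    """Map each TF name that fails an exact match to the gene name it
    would match if letter case were ignored (hypothetical helper)."""
    exact = set(gene_names)
    # Index the dataset's gene names by their upper-cased form.
    by_upper = {g.upper(): g for g in gene_names}
    return {
        tf: by_upper[tf.upper()]
        for tf in tf_names
        if tf not in exact and tf.upper() in by_upper
    }


# Made-up example: upper-case motif TF names vs. title-case gene names.
tfs = ["GATA1", "SPI1", "CTCF"]
genes = ["Gata1", "Spi1", "Actb"]
print(case_only_mismatches(tfs, genes))
# {'GATA1': 'Gata1', 'SPI1': 'Spi1'}
```

If most of the "missing" TFs show up in this mapping, the two files only disagree on capitalization; TF names that are absent from it (CTCF here) are missing from the dataset for some other reason.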

Good to know! Alternatively, you can correct the capitalization in the motif file, which appears to be what is causing the problem.
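A rough sketch of that alternative, rewriting the motif file rather than the expression matrix, might look like the following. It rests on two assumptions that are not confirmed in this thread: that each HOMER-format header line starts with ">" and carries the motif name in its second tab-separated field, and that the TF name is the part of the motif name before the first underscore. `recase_motif_headers` is a hypothetical helper, not a dictys function, and the example lines are made up.

```python
def recase_motif_headers(lines, gene_names):
    """Rewrite the TF prefix of each motif name to the dataset's own
    capitalization (hypothetical helper; header layout is assumed)."""
    by_upper = {g.upper(): g for g in gene_names}
    out = []
    for line in lines:
        if line.startswith(">"):
            newline = "\n" if line.endswith("\n") else ""
            fields = line.rstrip("\n").split("\t")
            if len(fields) > 1:
                # Assumed naming scheme: "<TF>_<suffix>" in the second field.
                tf, sep, rest = fields[1].partition("_")
                fields[1] = by_upper.get(tf.upper(), tf) + sep + rest
            line = "\t".join(fields) + newline
        out.append(line)
    return out


# Made-up two-line excerpt in the assumed layout.
motif_lines = [
    ">NNGCGTGNN\tAHR_MOUSE.H11MO.0.B\t3.3\n",
    "0.25\t0.25\t0.25\t0.25\n",
]
print(recase_motif_headers(motif_lines, ["Ahr", "Actb"])[0], end="")
```

TF names with no case-insensitive match in the dataset are left unchanged, as are the matrix rows. It is safer to write the result to a new file and rerun dictys_helper makefile_check.py than to overwrite the original motifs.motif.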