AttributeError: 'str' object has no attribute '_output_dir'

Question



I'm using Ubuntu on AWS EC2. lineage does not allow me to create an individual:

user662 = l.create_individual('User662', '/home/ubuntu/myprojectdir/AaronAzuma.zip')
Traceback (most recent call last):
File "", line 1, in
File "/home/ubuntu/myprojectdir/venv/lib/python3.8/site-packages/lineage/__init__.py", line 96, in create_individual
return Individual(name, raw_data, self._output_dir, **kwargs)
AttributeError: 'str' object has no attribute '_output_dir'

Answer 1 · 2021-01-18T05:13:56.000Z

Thanks for the issue. Can you provide more details or code snippets? I just tested installing and running the README examples in a Python 3.8 virtual environment without any issues.

Answer 2 · 2021-01-18T06:47:12.000Z

Thanks Andrew. After running your example data and seeing create_individual work, I realized that my issue is with the parsing. I had already converted the format from AncestryDNA to 23andMe and then tried to use create_individual. I receive a parsing error, which doesn't allow me to go forward. My other set of files also has 4 columns like 23andMe but no headers (from the H3Africa array with another lab).

$ sed -n 1,20p lineage/inputs/myfile.txt
#AncestryDNA raw data download
#This file was generated by AncestryDNA at: 07/31/2018 23:48:22 UTC
#Data was collected using AncestryDNA array version: V2.0
#Data is formatted using AncestryDNA converter version: V1.0
...
rsid chromosome position allele1allele2
rs369202065 1 569388 GG

$ python manage.py shell
Python 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0] on linux
>>> from lineage import Lineage
>>> l = Lineage()
>>> user111 = l.create_individual('User111', 'myfile.txt')
pandas.errors.ParserError: Too many columns specified: expected 5 and found 4

LaKisha


Answer 3 · 2021-01-19T05:34:17.000Z

Thanks LaKisha, that helps. lineage uses the snps library to parse files, so I transferred the issue here.

snps should be able to read raw AncestryDNA or 23andMe files without conversion... However, snps could be updated to handle the format you pasted as well. Do you have a link to the tool that produces that format?

As for the H3Africa files, can you confirm that an example file would look like this (tab-separated):

rs1	1	101	AA
rs2	1	102	CC
rs3	1	103	GG
rs4	1	104	TT
rs5	1	105	--
rs6	1	106	GC
rs7	1	107	TC
rs8	1	108	AT
.
.
.

Answer 4 · 2021-01-19T07:45:19.000Z

Hi Andrew, here is the script I'm using to convert my files from AncestryDNA to 23andMe format:

(venv) ubuntu@:~/myprojectdir/lineage/inputs$ for file in ./*.txt; do echo "converting from AncestryDNA to 23andMe format file:" $file; gawk -i inplace -F'\t' '{ print $1"\t"$2"\t"$3"\t"$4$5; }' $file; done

This results in a text file that looks like this:

rsid chromosome position allele1allele2
rs369202065 1 569388 GG
rs199476136 1 569400 TT
rs3131972 1 752721 AG
rs114525117 1 759036 GG
rs12124819 1 776546 AA
rs4040617 1 779322 AA
rs141175086 1 780397 CC
rs115093905 1 787173 GG
rs11240777 1 798959 AG

The H3Africa file looks like this after using the command line (tab):

h3a_37_1_54676_C_T 1 54676 AA
seq-h3a_37_1_61989_G_C 1 61989 CC
seq-h3a_37_1_62271_A_G 1 62271 AA
seq-h3a_37_1_64552_G_A 1 64552 AA
seq-h3a_37_1_104072_C_T 1 104072 GG
h3a_37_1_108310_T_C 1 108310 AA
h3a_37_1_110509_G_A 1 110509 GG
seq-h3a_37_1_118617_T_C 1 118617 GG
seq-h3a_37_1_256586_T_G 1 256586 AC
h3a_37_1_404672_G_A 1 404672 AA
kgp15717912 1 534247 GG

If it helps: after converting to 23andMe format, I convert the files to VCF format for downstream use. Your tool is really quick, plus the graph. It would be great if I could use it in my pipeline.

Here's my 23andMe to VCF conversion:

(venv) ubuntu@:~/myprojectdir/lineage/inputs$ for file in ./*txt; do echo "converting to vcf file:" $file; bcftools convert -c ID,CHROM,POS,AA -s ${file%.txt} --haploid2diploid -f ../references/Homo_sapiens.GRCh37.75.dna.primary_assembly.fa --tsv2vcf $file -Oz -o ${file%.txt}.vcf.gz; done

# Index multiple vcf files in prep to merge
for file in ./*.vcf.gz; do echo "indexing vcf file" $file; tabix $file; done

# Merge multiple vcf file into single vcf file
bcftools merge -Oz -o MergedSamples1.vcf.gz ../inputs/*.vcf.gz

# Clean MergedSamples file
bgzip -d ../results/MergedSamples.vcf.gz
grep ^"#" ../results/MergedSamples.vcf > ../results/MergedSamples0.vcf
awk -F$'\t' '{ if ( $3 ~ "rs" ) { print $0; } }' ../results/MergedSamples.vcf > ../results/MergedSamples1.vcf
awk -F$'\t' '{ if ( $3 !~ ";" ) { print $0; } }' ../results/MergedSamples1.vcf > ../results/MergedSamples2.vcf
cat ../results/MergedSamples0.vcf ../results/MergedSamples2.vcf > ../results/MergedSamplesEdited.vcf
sed -n 1,20p MergedSamplesEdited.vcf
gawk -i inplace '!a[$2]++' ../results/MergedSamplesEdited.vcf
bgzip ../results/MergedSamplesEdited.vcf


Answer 5 · 2021-01-20T05:19:42.000Z

Thanks LaKisha. snps / lineage can't parse your converted file because it tries to apply the AncestryDNA parser based on the header comments, and that parser expects whitespace separating the two alleles and their column headers.
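To make the mismatch concrete, here is a rough sketch that compares the source named in a file's header comments with the number of columns in its first data row. The `EXPECTED_COLUMNS` table and the `diagnose` helper are hypothetical and only illustrate the idea; this is not how snps detects formats internally.

```python
# Assumed column layouts, for illustration only:
# AncestryDNA: rsid, chromosome, position, allele1, allele2
# 23andMe:     rsid, chromosome, position, genotype
EXPECTED_COLUMNS = {"AncestryDNA": 5, "23andMe": 4}


def diagnose(lines):
    """Return (declared_source, expected_columns, found_columns)."""
    declared = None
    found = None
    for line in lines:
        if line.startswith("#"):
            # Remember the first source name mentioned in the comments.
            if declared is None:
                declared = next((n for n in EXPECTED_COLUMNS if n in line), None)
            continue
        fields = line.split()
        # Skip blank lines and the column-header row.
        if fields and fields[0] != "rsid":
            found = len(fields)
            break
    return declared, EXPECTED_COLUMNS.get(declared), found


converted_file = [
    "#AncestryDNA raw data download",
    "#Data was collected using AncestryDNA array version: V2.0",
    "rsid chromosome position allele1allele2",
    "rs369202065 1 569388 GG",
]
result = diagnose(converted_file)
print(result)
```

For the converted file pasted earlier in the thread, the comments still say AncestryDNA while the rows have only four columns, which matches the "expected 5 and found 4" error.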

But, you don't need to convert the file, since snps can already read AncestryDNA (and the other formats discussed in the README). Give that a try and let me know how it works.

As for the H3Africa file, snps should also be able to read that.

And if you need a VCF file, you can save the SNPs in VCF format.

Answer 6 · 2021-01-24T21:34:19.000Z

Closing since there are no updates required for this issue.

Answer 7 · 2021-01-25T04:44:01.000Z

Sorry, I closed the issue too early. Upon further investigation, snps should be updated to handle the H3Africa format since the generic parser is not invoked (an rsid is not in the first line). Also, the generic parser wouldn't be able to parse this due to multiple whitespace.

So to handle this, snps could do either (or both) of the following:

- check if "h3a" is in the first line and apply a parser similar to the AncestryDNA parser with multiple whitespace
- apply a generic parser as a last check that tries to read four or five column files with multiple whitespace
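The second option could be sketched roughly as follows. This is a standard-library illustration only, assuming that five-column rows hold the two alleles separately; `parse_genotype_lines` is a hypothetical name and not part of snps.

```python
def parse_genotype_lines(lines):
    """Parse four- or five-column genotype rows separated by any whitespace.

    Hypothetical helper for illustration; not part of snps.
    Returns a list of (rsid, chromosome, position, genotype) tuples.
    """
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()  # no argument: splits on runs of spaces or tabs
        if len(fields) == 4:
            rsid, chrom, pos, genotype = fields
        elif len(fields) == 5:
            rsid, chrom, pos, allele1, allele2 = fields
            genotype = allele1 + allele2
        else:
            raise ValueError(
                f"expected 4 or 5 columns, found {len(fields)}: {line!r}"
            )
        if not pos.isdigit():
            continue  # skip a column-header row such as "rsid chromosome ..."
        rows.append((rsid, chrom, int(pos), genotype))
    return rows


sample = [
    "h3a_37_1_54676_C_T   1   54676   AA",
    "seq-h3a_37_1_61989_G_C\t1\t61989\tCC",
    "rsid chromosome position allele1 allele2",
    "rs369202065 1 569388 G G",
]
rows = parse_genotype_lines(sample)
for row in rows:
    print(row)
```

Calling `str.split()` with no argument keeps the sketch indifferent to whether the columns are separated by tabs, single spaces, or several spaces.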

Answer 8 · 2021-01-27T14:09:48.000Z

Hi Andrew, I tried again with fresh AncestryDNA zip files. I'm still getting the same error message.

>>> s = SNPs("/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip")
>>> s.source
'AncestryDNA'
>>> s.build
37
>>> s.assembly
'GRCh37'
>>> s.count
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'SNPs' object has no attribute 'count'
>>> user662 = l.create_individual('User662', '/home/ubuntu/myprojectdir/lineage/inputs/Person1.zip')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/ubuntu/myprojectdir/venv/lib/python3.8/site-packages/lineage/__init__.py", line 96, in create_individual
    return Individual(name, raw_data, self._output_dir, **kwargs)
AttributeError: 'str' object has no attribute '_output_dir'
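For reference, one general way Python produces exactly this message is when an instance method is called on the class itself, for example if `l` were bound to `Lineage` rather than `Lineage()`. The first argument then lands in `self`. The session above does not show how `l` was defined, so this is only an assumption and not a confirmed diagnosis. The class below is a hypothetical stand-in, not lineage's code.

```python
class Lineage:
    """Hypothetical stand-in, not the real lineage.Lineage class."""

    def __init__(self, output_dir="output"):
        self._output_dir = output_dir

    def create_individual(self, name, raw_data=None):
        # Mirrors the shape of the line shown in the traceback.
        return (name, raw_data, self._output_dir)


def try_call(target):
    """Call create_individual on target and report the outcome as text."""
    try:
        return repr(target.create_individual("User662", "Person1.zip"))
    except AttributeError as exc:
        return f"AttributeError: {exc}"


on_instance = try_call(Lineage())  # bound method: self is the instance
on_class = try_call(Lineage)       # called on the class: self is "User662"
print(on_instance)
print(on_class)
```

If this is what happened, creating the object first with `l = Lineage()` (as in the earlier session) and then calling `l.create_individual(...)` would avoid the error.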


Answer 9 · 2021-02-01T06:08:04.000Z

Hi @lakishadavid, please try creating a new virtual environment and installing lineage again. I've updated it to support the latest version of snps. FYI, here are some additional installation directions: https://lineage.readthedocs.io/en/latest/installation.html