Error parsing date
anh151 opened this issue · 7 comments
Hello,
PharmCAT version 2.8.2
Environment: Linux Mint
java: jdk21
bcftools/bgzip/tabix: 1.18
We are trying to run PharmCAT on some local data. I ran the full pharmcat pipeline as well as each step individually, and I get the same error either way. v2.8.1 has the same issue. I can try other versions if needed. This is somewhat time-sensitive, so if there is an older version that you think does not have this issue, I can use that in the meantime.
cd /home/andrew/Desktop/bin/preprocessor && python3 -m pipenv run python /home/andrew/Desktop/bin/preprocessor/pharmcat_pipeline /home/andrew/Desktop/discovery/pharmcat_ready.vcf.gz --missing-to-ref -o /home/andrew/Desktop/discovery/pharmcat_10202023 -matcher
/home/andrew/.local/bin/jdk-21+35/bin/java -jar /home/andrew/Desktop/bin/preprocessor/pharmcat.jar -vcf pharmcat_10202023/ready.preprocessed.vcf.bgz -matcher
PharmCAT version: 2.8.2
Warning: Argument "-0"/"--missing-to-ref" supplied
THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.
Running PharmCAT with positions as missing vs reference can
lead to different results.
Processing /home/andrew/Desktop/discovery/pharmcat_ready.vcf.gz ...
/home/andrew/Desktop/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
* WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
* WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/andrew/Desktop/discovery/pharmcat_10202023/pharmcat_ready.missing_pgx_var.vcf
Running PharmCAT...
Checking files...
* Found 1 VCF file
Queueing up 1702 samples to process...
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
at com.google.gson.Gson.fromJson(Gson.java:1227)
at com.google.gson.Gson.fromJson(Gson.java:1137)
at com.google.gson.Gson.fromJson(Gson.java:1075)
at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
... 17 more
Thanks!
Andrew
We can't reproduce this problem on 2.8.2. Are you using the pharmcat jar as is?
This appears to be a problem parsing the allele translation data, which was fixed a long time ago...
Right, like he said. Are you using a version of the jar file you compiled yourself or are you using the jar downloaded from the release page? I tried an example VCF using the downloaded jar from the release page and I'm not seeing this problem.
I appreciate the quick responses.
Well, with pharmcat_pipeline, the pharmcat jar file is downloaded during the first run along with the reference sequence. Here is a run where I let the pipeline download the .jar file itself.
Could this be an environment- or data-specific issue? I can provide an example file if that would help, or I can try another environment.
cd /home/andrew/Desktop/bin/preprocessor && python3 -m pipenv run python /home/andrew/Desktop/bin/preprocessor/pharmcat_pipeline /home/andrew/Desktop/discovery/test.vcf.gz --missing-to-ref -o /home/andrew/Desktop/discovery/pharmcat_10202023 -matcher
PharmCAT version: 2.8.2
Warning: Argument "-0"/"--missing-to-ref" supplied
THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.
Running PharmCAT with positions as missing vs reference can
lead to different results.
Downloading pharmcat.jar...
Processing /home/andrew/Desktop/discovery/test.vcf.gz ...
/home/andrew/Desktop/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
* WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
* WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/andrew/Desktop/discovery/pharmcat_10202023/test.missing_pgx_var.vcf
Running PharmCAT...
Checking files...
* Found 1 VCF file
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
at com.google.gson.Gson.fromJson(Gson.java:1227)
at com.google.gson.Gson.fromJson(Gson.java:1137)
at com.google.gson.Gson.fromJson(Gson.java:1075)
at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
... 17 more
I also tried running in the All of Us environment and I get the same error.
export JAVA_HOME="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/jdk-21+35" && \
export BCFTOOLS_PATH="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/bcftools" && \
export BGZIP_PATH="/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/bgzip" && \
cd bin/preprocessor && \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/python/bin/python3.9 -m pipenv run python \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/preprocessor/pharmcat_pipeline \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/test.vcf.gz --missing-to-ref -o \
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/pharmcat_test -matcher
PharmCAT version: 2.8.2
Warning: Argument "-0"/"--missing-to-ref" supplied
THIS SHOULD ONLY BE USED IF: you sure your data is reference
at the missing positions instead of unreadable/uncallable at
those positions.
Running PharmCAT with positions as missing vs reference can
lead to different results.
Downloading pharmcat.jar...
Only 1 CPU, cannot use concurrent mode
Processing /home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/test.vcf.gz ...
/home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/bin/preprocessor/preprocessor/utilities.py:703: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
ref_pgx_regions = pd.concat([ref_pgx_regions, ref_pgx_regions.loc[idx_chr_m].assign(**{'CHROM': 'chrMT'})])
* WARNING: "chr22:42127530 REF=G ALT=CAC" does not match PharmCAT expectation of ALT at "chr22:42127530 REF=G ALT=GCA"
* WARNING: "chrX:154532990 REF=CGGT ALT=C" does not match PharmCAT expectation of REF at "chrX:154532990 REF=C ALT=T"
Adding back non-PGx variants at PGx positions...
* Cataloging 334 missing positions in /home/jupyter/workspaces/piii03variantfrequencyprojectpgxcontrolled/pharmcat_test/test.missing_pgx_var.vcf
Running PharmCAT...
Checking files...
* Found 1 VCF file
com.google.gson.JsonSyntaxException: Failed parsing 'Sep 27, 2023, 7:48:25 PM' as Date; at path $.modificationDate
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:90)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:75)
at com.google.gson.internal.bind.DateTypeAdapter.read(DateTypeAdapter.java:46)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$1.readIntoField(ReflectiveTypeAdapterFactory.java:212)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$FieldReflectionAdapter.readField(ReflectiveTypeAdapterFactory.java:433)
at com.google.gson.internal.bind.ReflectiveTypeAdapterFactory$Adapter.read(ReflectiveTypeAdapterFactory.java:393)
at com.google.gson.Gson.fromJson(Gson.java:1227)
at com.google.gson.Gson.fromJson(Gson.java:1137)
at com.google.gson.Gson.fromJson(Gson.java:1075)
at org.pharmgkb.pharmcat.util.DataSerializer.deserializeDefinitionsFromJson(DataSerializer.java:63)
at org.pharmgkb.pharmcat.definition.DefinitionReader.readFile(DefinitionReader.java:194)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:55)
at org.pharmgkb.pharmcat.definition.DefinitionReader.<init>(DefinitionReader.java:45)
at org.pharmgkb.pharmcat.definition.DefinitionReader.defaultReader(DefinitionReader.java:223)
at org.pharmgkb.pharmcat.Env.<init>(Env.java:43)
at org.pharmgkb.pharmcat.BatchPharmCAT.execute(BatchPharmCAT.java:269)
at org.pharmgkb.pharmcat.BatchPharmCAT.main(BatchPharmCAT.java:124)
Caused by: java.text.ParseException: Failed to parse date ["Sep 27, 2023, 7:48:25 PM"]: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:279)
at com.google.gson.internal.bind.DateTypeAdapter.deserializeToDate(DateTypeAdapter.java:88)
... 16 more
Caused by: java.lang.NumberFormatException: Invalid number: Sep
at com.google.gson.internal.bind.util.ISO8601Utils.parseInt(ISO8601Utils.java:316)
at com.google.gson.internal.bind.util.ISO8601Utils.parse(ISO8601Utils.java:133)
... 17 more
Hi Andrew, thanks for the detailed error messages. I was able to replicate the issue using specifically JDK21. We are looking into the issue now.
If you need to run this now please try with JDK 17 instead and that should work.
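For anyone hitting a similar trace: the thread does not state the root cause, but one plausible explanation (an assumption, not confirmed here) is that newer JDKs changed the default US date/time format to put a narrow no-break space (U+202F) before AM/PM, so a timestamp written with a plain space no longer matches the JDK's default format. The sketch below uses only the standard library and is not PharmCAT's actual fix. It parses the string from the error message with an explicit pattern and locale, which does not depend on the JDK's default format. The class and method names (`FixedPatternDateDemo`, `parseFixed`, `formatWithJdkDefault`) are made up for illustration.

```java
import java.text.DateFormat;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical demo class; not part of PharmCAT or Gson.
final class FixedPatternDateDemo {

  // Explicit pattern and locale, so parsing does not rely on the
  // JDK's default date/time format for the locale.
  private static final DateTimeFormatter FIXED =
      DateTimeFormatter.ofPattern("MMM d, yyyy, h:mm:ss a", Locale.US);

  // Parses strings such as "Sep 27, 2023, 7:48:25 PM".
  // A narrow no-break space (U+202F) is treated as an ordinary space,
  // so text written with either separator is accepted.
  static LocalDateTime parseFixed(String text) {
    String normalized = text.replace('\u202F', ' ');
    return LocalDateTime.parse(normalized, FIXED);
  }

  // Formats a value with the JDK's default US date/time style.
  // The exact output depends on the JDK in use, which is the point
  // of the diagnostic in main().
  static String formatWithJdkDefault(LocalDateTime value) {
    Date date = Date.from(value.atZone(ZoneId.systemDefault()).toInstant());
    DateFormat jdkDefault =
        DateFormat.getDateTimeInstance(DateFormat.DEFAULT, DateFormat.DEFAULT, Locale.US);
    return jdkDefault.format(date);
  }

  public static void main(String[] args) {
    LocalDateTime parsed = parseFixed("Sep 27, 2023, 7:48:25 PM");
    String rendered = formatWithJdkDefault(parsed);

    System.out.println("java.version: " + System.getProperty("java.version"));
    System.out.println("Fixed-pattern parse: " + parsed);
    // Reports whether this JDK's default format contains U+202F.
    System.out.println("Default format contains U+202F: "
        + (rendered.indexOf('\u202F') >= 0));
  }
}
```

Running this under both JDK 17 and JDK 21 should show whether the default format differs between them, while the fixed-pattern parse gives the same result on either.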
Thanks for the help!
JDK17 was successful.
-Andrew
PharmCAT 2.8.3 has been released and will work with Java 21.