Malformed input or input contains unmappable characters
dattachandan opened this issue · 2 comments
What happened?
When using the exporter class (org.mitre.synthea.export.Exporter) and running the run_synthea app, I see characters in the exported XML file names that cause exceptions. Has anyone seen this before, and can you recommend a fix?
Environment
- OS: Ubuntu 20.04
- Java: JDK 8 and 11
Relevant log output
java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: Raquel318_Henr?quez109_4b33f163-b07b-4ec7-b579-5ac453371d4f.xml
at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
at sun.nio.fs.AbstractPath.resolve(AbstractPath.java:53)
at org.mitre.synthea.export.Exporter.exportRecord(Exporter.java:164)
at org.mitre.synthea.export.Exporter.export(Exporter.java:56)
at org.healthlink.exporter.syntheainterface.PatientGenerator.generatePerson(PatientGenerator.java:389)
at org.healthlink.exporter.syntheainterface.PatientGenerator.lambda$run$2(PatientGenerator.java:221)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Yes, we've seen this before with certain file systems that don't support accented characters in filenames (#569). For posterity, do you happen to know the type of filesystem the files are being written to?
The quickest fix is to use only UUID filenames instead of including patient names in filenames, by setting the exporter.use_uuid_filenames config setting to true, either in src/main/resources/synthea.properties or on the command line: ./run_synthea --exporter.use_uuid_filenames=true ...
Alternatively, if you're willing to make code changes, you can add some filename sanitization to the org.mitre.synthea.export.Exporter.filename method.
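For anyone taking the code-change route, here is a rough sketch of what such sanitization could look like. It is not Synthea code: the FilenameSanitizer class and its sanitize method are hypothetical names, and how you would call it from Exporter.filename depends on that method's actual signature, which isn't shown in this thread. The example also assumes the name in the log was originally "Henríquez". The idea is to decompose accented letters, drop the accent marks, and replace anything else outside a conservative ASCII set with an underscore.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Synthea.
public class FilenameSanitizer {

  /** Returns an ASCII-only version of the given file name. */
  public static String sanitize(String name) {
    // Decompose accented characters, e.g. "í" becomes "i" plus a combining accent.
    String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
    // Drop the combining marks, leaving only the base letters.
    String stripped = decomposed.replaceAll("\\p{M}+", "");
    // Replace anything still outside a conservative safe set with an underscore.
    return stripped.replaceAll("[^A-Za-z0-9._-]", "_");
  }

  public static void main(String[] args) {
    // Assumed original name; \u00ed is "í", written as an escape so the
    // source file's own encoding does not matter.
    String original =
        "Raquel318_Henr\u00edquez109_4b33f163-b07b-4ec7-b579-5ac453371d4f.xml";
    System.out.println(sanitize(original));
  }
}
```

Note that this changes the exported file names (accents are lost), so anything that looks patients up by file name would need to apply the same sanitization.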
I was running it on macOS Monterey with Apple File System (APFS). It also happened on an ext4 Ubuntu volume.