datafaker-net/datafaker

Faker().locality().allSupportedLocales() returns empty when Datafaker is embedded in Spring Boot Uber JAR

Closed this issue · 25 comments

Faker().locality().allSupportedLocales() returns empty when Datafaker is embedded in Spring Boot Uber JAR.

Please provide way more detail.

Not many more details are needed:

  1. Create a Spring Boot App,
  2. Add DataFaker as dependency,
  3. Package app with spring boot plugin: https://docs.spring.io/spring-boot/maven-plugin/getting-started.html

Because the Datafaker JAR is nested inside another JAR, under BOOT-INF/lib, it's unable to find any locale:
https://docs.spring.io/spring-boot/specification/executable-jar/nested-jars.html

example.jar
 |
 +-META-INF
 |  +-MANIFEST.MF
 +-org
 |  +-springframework
 |     +-boot
 |        +-loader
 |           +-<spring boot loader classes>
 +-BOOT-INF
    +-classes
    |  +-mycompany
    |     +-project
    |        +-YourClasses.class
    +-lib
       +-dependency1.jar
       +-dependency2.jar

It's unable to find the yml files which are stored inside the datafaker JAR.

You can create a spring boot app using:
https://start.spring.io/

demo.zip

Make sure to add datafaker jar as dependency in pom.xml

Hi @jloisel , thanks for your report. I just had a look at the code, and yes, it's unlikely that this code will work with an uber jar indeed. What do you need this method for? If you have a suggestion or a PR on how to fix it, happy to accept it.

It's more robust to rely on datafaker listing the locales, instead of hardcoding them on our side. We use Datafaker to generate fake csv data files for test environments. I'm not really sure how this can be fixed, the only solution I see is to have a hardcoded list of known locales inside datafaker, instead of trying to list them from jar resources (unreliable).

Why not try whatever locales you need, with a reasonable fallback or error handling, instead of trying to enumerate on either side?

It's not a feasible approach to have a fallback / error handler for every external library called from your code. We cannot assume, for every method being called, that "it may not work". Every class / method should have a contract we can trust; otherwise it's of no use.

It's fine by me if you don't want to fix this issue. But, from now, you are aware that listing locales from inside any Spring Boot / Uber Jar application doesn't work at all. At least, I would suggest you document this issue / limitation so future users are aware of it.

We appreciate the report, I wasn't aware of this before, and we won't close the issue since it's an actual issue. We'd appreciate a PR to address the issue, that's the fastest way to get this resolved, and we're more than happy to merge that into the code base.

It's not a feasible approach to have a fallback / error handler for every external library called from your code.

I totally agree it's far from optimal, I was just trying to give an option to get you working asap.

@kingthorin @bodiam How important is the performance of this method?
I tried to enhance method new Faker().locality().allSupportedLocales() so that it recursively scans all jars, incl. spring boot jar etc.

  1. in DataFaker 2.3.1, it takes 28 ms
  2. in DataFaker 2.3.2-SNAPSHOT, it takes 77 ms.

Probably it's not critical, especially if allSupportedLocales() caches the result, and the following calls work very quickly?
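To illustrate the caching idea raised above, here is a minimal sketch that runs an expensive scan once and serves the stored result afterwards. The class name `CachedLocales` and the `Supplier` standing in for the scan are hypothetical; this is not Datafaker's actual implementation.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, not part of Datafaker: caches the result of a slow scan.
final class CachedLocales {
    private final Supplier<Set<String>> scanner; // stands in for the jar scan
    private volatile Set<String> cached;         // null until the first call

    CachedLocales(Supplier<Set<String>> scanner) {
        this.scanner = scanner;
    }

    /** Runs the scan on the first call only; later calls return the stored set. */
    Set<String> get() {
        Set<String> result = cached;
        if (result == null) {
            synchronized (this) {
                result = cached;
                if (result == null) {
                    // Immutable copy so callers cannot alter the cached value.
                    result = Set.copyOf(scanner.get());
                    cached = result;
                }
            }
        }
        return result;
    }
}
```

With this shape, a 77 ms scan would be paid once per instance, and subsequent calls would only read a field.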

@jloisel can you give the snapshot a try?

@asolntsev i don't think the performance here is super important, thanks for looking into this!

Thanks for the quick fix! Performance is not critical, as we only call it once during application initialization. Do you have a snapshot maven repository we can use to try it?

We can wait for a release if necessary, as we have a workaround for now.

EDIT: found the snapshot repository https://www.datafaker.net/documentation/getting-started/#snapshot-versions

I have tested this on our web application and it's crashing due to an OOM:

SEVERE: Servlet.service() for servlet [dispatcherServlet] in context with path [] threw exception [Handler dispatch failed: java.lang.OutOfMemoryError: Java heap space] with root cause
java.lang.OutOfMemoryError: Java heap space
	at java.base/java.io.InputStream.readNBytes(InputStream.java:445)
	at java.base/java.io.InputStream.readAllBytes(InputStream.java:346)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.newByteChannel(ZipFileSystem.java:977)
	at jdk.zipfs/jdk.nio.zipfs.ZipPath.newByteChannel(ZipPath.java:864)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newByteChannel(ZipFileSystemProvider.java:238)
	at java.base/java.nio.file.Files.newByteChannel(Files.java:380)
	at java.base/java.nio.file.Files.newByteChannel(Files.java:432)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystem.<init>(ZipFileSystem.java:177)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.getZipFileSystem(ZipFileSystemProvider.java:125)
	at jdk.zipfs/jdk.nio.zipfs.ZipFileSystemProvider.newFileSystem(ZipFileSystemProvider.java:120)
	at java.base/java.nio.file.FileSystems.newFileSystem(FileSystems.java:528)
	at java.base/java.nio.file.FileSystems.newFileSystem(FileSystems.java:400)
	at net.datafaker.providers.base.Locality$1.visitFile(Locality.java:93)
	at net.datafaker.providers.base.Locality$1.visitFile(Locality.java:84)
	at java.base/java.nio.file.Files.walkFileTree(Files.java:2811)
	at net.datafaker.providers.base.Locality$1.visitFile(Locality.java:96)
	at net.datafaker.providers.base.Locality$1.visitFile(Locality.java:84)
	at java.base/java.nio.file.Files.walkFileTree(Files.java:2811)
	at java.base/java.nio.file.Files.walkFileTree(Files.java:2882)
	at net.datafaker.providers.base.Locality.allSupportedLocales(Locality.java:109)

Our Spring Boot JAR weighs 363MB, with -Xmx512m. Our heaviest dependency is "Playwright", which weighs 171MB alone. Most other dependencies are less than 10MB.

The code could be revised to look specifically for the faker jar, or to have a max size for the jars it enumerates. (datafaker is currently just under 3MB, so only scanning jars under 10MB should be fine for quite a while)

It would also probably be a lot faster only checking for datafaker jar.
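The two suggestions above (match the datafaker jar by name, and skip anything over a size cap) could be combined in a small filter such as the sketch below. The class name, the `datafaker` file-name prefix and the 10MB cap are assumptions for illustration, not the code that ended up in Datafaker.

```java
// Hypothetical filter, not Datafaker's code: decides whether a nested jar is worth opening.
final class NestedJarFilter {
    // Assumed cap, taken from the "jars under 10MB" suggestion in this thread.
    static final long MAX_JAR_BYTES = 10L * 1024 * 1024;

    /**
     * @param path      entry path inside the outer JAR, e.g. "BOOT-INF/lib/some.jar"
     * @param sizeBytes uncompressed size of that entry
     */
    static boolean shouldOpen(String path, long sizeBytes) {
        // Keep only the file name, dropping any directory prefix.
        String fileName = path.substring(path.lastIndexOf('/') + 1);
        boolean looksLikeDatafaker = fileName.startsWith("datafaker") && fileName.endsWith(".jar");
        return looksLikeDatafaker && sizeBytes <= MAX_JAR_BYTES;
    }
}
```

With such a check in front of the scan, a 171MB dependency would be skipped by name before any attempt to read it into memory.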

Or you could have a static list of known files to look up internally, within a specific package, which would be the fastest but least maintainable option (since you would need to update the list every now and then when adding new locales).

For example, you could use ServiceLoader (built-in Java mechanism):
https://stackoverflow.com/questions/52204709/what-is-serviceloader-and-how-is-it-used

@jloisel @kingthorin But in this case we would not detect locales that might be present in the user's project.
If that's OK, then I suggest just having a hard-coded list of locales.
It's not a problem for maintenance because we can create a specific unit-test for verifying that all locales are present in this list.
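A rough sketch of that verification idea: compare a hard-coded list against the locale file names found in the source tree and report anything missing. The list contents, the class name and the `<tag>.yml` naming convention are assumptions made for this example only.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check, not Datafaker's code: finds locale files absent from a hard-coded list.
final class LocaleListCheck {
    // Illustrative list only; the real one would name every bundled locale.
    static final Set<String> KNOWN_LOCALES = Set.of("en", "en-US", "fr", "de");

    /** Maps a file name such as "en-US.yml" to "en-US"; returns null for non-.yml files. */
    static String localeTagOf(String fileName) {
        return fileName.endsWith(".yml")
                ? fileName.substring(0, fileName.length() - ".yml".length())
                : null;
    }

    /** Returns the locale tags present as files but missing from KNOWN_LOCALES. */
    static Set<String> missingFromList(Iterable<String> fileNames) {
        Set<String> missing = new TreeSet<>();
        for (String fileName : fileNames) {
            String tag = localeTagOf(fileName);
            if (tag != null && !KNOWN_LOCALES.contains(tag)) {
                missing.add(tag);
            }
        }
        return missing;
    }
}
```

A unit test could list the resource directory at build time (where plain file access works) and fail whenever `missingFromList` returns a non-empty set.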

Wasn't the original issue that they have a way to check the datafaker locales? That other libraries/code in their project support other locales is kind of irrelevant to datafaker.

With ServiceLoader, you can detect if users have their own locales. They will have to declare them via a file named META-INF/services/com.datafaker.locales.MyLocalService, which lists all the implementing classes. An implementing class could then provide the name of the file via a method, or an inputstream on it.

ServiceLoader is a kind of primitive / simple Inversion of Control mechanism (similar to what Spring does).

ServiceLoader<MyLocalService> services = ServiceLoader.load(MyLocalService.class);
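To make the one-liner above more concrete, here is a minimal sketch of what such a service interface and lookup might look like. `LocaleFileProvider` and its methods are hypothetical names invented for this example; Datafaker does not define this interface.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical SPI sketch, not part of Datafaker.
final class LocaleProviders {
    /** Implemented by any jar that wants to contribute a locale file. */
    public interface LocaleFileProvider {
        String localeTag();   // e.g. "en-US"
        InputStream open();   // stream over the locale's yml content
    }

    /** Collects the tags of all providers registered under META-INF/services. */
    static List<String> discoveredLocaleTags() {
        List<String> tags = new ArrayList<>();
        for (LocaleFileProvider provider : ServiceLoader.load(LocaleFileProvider.class)) {
            tags.add(provider.localeTag());
        }
        return tags;
    }
}
```

A provider is registered by shipping a text file under META-INF/services named after the interface's binary name, with one implementing class name per line. When no such file is on the classpath, the loop simply finds nothing and the list stays empty.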

Sounds reasonable to me, but outside my field of knowledge.

@jloisel While it's technically possible to load something à la "LocaleService" with the service loader mechanism, I don't see a need for this.

@kingthorin @jloisel Please check my PR, which greatly simplifies this "available locales" thing: #1350

@jloisel Please try the updated datafaker 2.3.2-SNAPSHOT

Thanks! Going to test next Monday.

I confirm it's now working and it's fast enough. Thanks! Do you plan to release a 2.3.2 including this fix soon?

@kingthorin @bodiam Yes, it's time to release DataFaker 2.3.2 !