cquiroz/scala-java-time

Reduce file size for Scala.js

exoego opened this issue · 8 comments

I am looking for a way to get the benefits of the java.time API while reducing the Scala.js-generated file size. I have started evaluating sbt-tzdb v0.1 for that purpose.

I could save 560 KB on average (fullOptJS, not gzipped) across several of my projects by removing the timezone DB with zonesFilter := { (z:String) => false}. That is a 25%~40% saving for my projects.
(Note: I assume my projects need no timezone DB, since my code base currently uses only OffsetDateTime, LocalDate and Duration.)
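
For reference, the setup described above might look roughly like the following build.sbt fragment. This is only a sketch: it assumes the sbt-tzdb plugin is already added in project/plugins.sbt and exposes TzdbPlugin, and the project name `app` is hypothetical.

```scala
// build.sbt (sketch): `app` is a hypothetical project name
lazy val app = project
  .enablePlugins(ScalaJSPlugin, TzdbPlugin)
  .settings(
    // Reject every zone id, so no timezone data is generated
    zonesFilter := { (z: String) => false }
    // To keep a few zones instead, a predicate on the zone id could be used:
    // zonesFilter := { (z: String) => z == "Europe/Paris" || z == "UTC" }
  )
```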

I also investigated why some projects save 40% while others save only 25%. My measurements show that my projects can save an additional 747 KB on average if DateTimeFormatter usage is eliminated completely. I suppose DateTimeFormatter pulls in a large amount of locale data, much like the timezone DB.

So... it would be great if one could filter the locales for DateTimeFormatter in the same way as the timezone DB. (Sorry in advance if my assumption about DateTimeFormatter is incorrect.)

Below are the file size measurements for my projects.

Project  Total (KB)  Rest of code (KB)  TimeZoneDB (KB)  DateTimeFormatter (KB)
A        2465        1153               560              752
B        2286         978               560              748
C        2246         939               560              747
D        2237         930               560              747
E        2166         862               560              744
F        2140         835               560              745
G        2130         825               560              745
H        2113         808               560              745
I        2105         801               559              745
J        2083         779               559              745
K        1311         753               557                0
L        1307         749               557                0
M        1283         725               557                0
N        1271         713               557                0
  • Total = Rest of code + TimeZoneDB + DateTimeFormatter
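
To put the table in proportion, the share of each component can be computed per project. The sketch below uses two rows copied from the table (A and N); the names `Row`, `share` and `SizeShare` are only illustrative.

```scala
object SizeShare {
  // One row of the table above; all sizes are in KB
  final case class Row(project: String, totalKb: Int, tzdbKb: Int, formatterKb: Int)

  // Percentage of the total output taken by one component
  def share(partKb: Int, totalKb: Int): Double = 100.0 * partKb / totalKb

  // Rows A and N, copied from the table
  val rows: List[Row] = List(
    Row("A", 2465, 560, 752),
    Row("N", 1271, 557, 0)
  )

  def main(args: Array[String]): Unit =
    rows.foreach { r =>
      val tz  = share(r.tzdbKb, r.totalKb)
      val dtf = share(r.formatterKb, r.totalKb)
      println(f"${r.project}: TimeZoneDB $tz%.1f%%, DateTimeFormatter $dtf%.1f%%")
    }
}
```

The timezone DB is a roughly constant ~560 KB, so its share is larger in the smaller projects (K to N), which do not use DateTimeFormatter.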

The chart below illustrates the table above.
[image: chart of the table above]

Thanks for this information. Reducing the size is a goal for me too, for example, the tzdb format is very developer unfriendly but at the same time produces the smallest amount of code.

At the same time, the ways to make Scala.js output smaller are not 100% clear to me, and I often have to resort to trial and error.

Regardless, here are some notes:

  • With version 2.0.0-M13, if you are not using timezones at all you can simply skip sbt-tzdb. The result is the same as having zonesFilter := { (z:String) => false}. That will save a few more bytes
  • DateTimeFormatter is indeed very large, though I hadn't realized how big it was until you measured it 👍 .
  • Locales are optimizable by Scala.js, at least in fullOptJS. Are you importing any locales?
  • Regardless, I'd like to do an sbt-cldr plugin with the same effect, where you could include only e.g. English and Spanish locales. At a minimum it should help speed up the build, if not reduce the size of the output.

Have you checked on your js how many locales are present? If possible can you check in both fast and full mode

With version 2.0.0-M13, if you are not using timezones at all you can simply skip sbt-tzdb. The result is the same as having zonesFilter := { (z:String) => false}. That will save a few more bytes

Oh, my first measurements were accidentally taken on 2.0.0-M12 with sbt-tzdb 😓 The table below is a quick update for one of my projects.

scala-java-time  DateTimeFormatter  fullOptJS (KB)  fastOptJS (KB)
M12              used               1726 (A)        6268
M12              not used            978            4408
M13-SNAPSHOT     used               1148 (B)        5212
M13-SNAPSHOT     not used            991 (C)        4453

So the savings in fullOptJS are:

  • TimeZone DB savings (A - B): 1726 - 1148 = 578 KB (close to the 560 KB from my first observation)
  • DateTimeFormatter savings (B - C): 1148 - 991 = 157 KB

Have you checked on your js how many locales are present? If possible can you check in both fast and full mode

No, I had not.

Actually, I am not sure how to identify locales in the Scala.js output... I can see 23 "locales.cldr.data.~~~" string literals in the JS files, listed below, in both fast and full mode, regardless of M12 vs M13 or DateTimeFormatter usage.

"locales.cldr.data.numericsystems$"
"locales.cldr.data.de$",
"locales.cldr.data.de_DE$",
"locales.cldr.data.en$",
"locales.cldr.data.en_001$",
"locales.cldr.data.en_CA$",
"locales.cldr.data.en_GB$",
"locales.cldr.data.en_US$",
"locales.cldr.data.fr$",
"locales.cldr.data.fr_CA$",
"locales.cldr.data.fr_FR$",
"locales.cldr.data.it$",
"locales.cldr.data.it_IT$",
"locales.cldr.data.ja$",
"locales.cldr.data.ja_JP$",
"locales.cldr.data.ko$",
"locales.cldr.data.ko_KR$",
"locales.cldr.data.root$",
"locales.cldr.data.zh$",
"locales.cldr.data.zh_Hans$",
"locales.cldr.data.zh_Hans_CN$",
"locales.cldr.data.zh_Hant$",
"locales.cldr.data.zh_Hant_TW$",

I will check again if you can suggest how to identify locales.
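
One rough way to list such literals is to scan the generated JS text with a regular expression. The sketch below assumes the literals always have the form "locales.cldr.data.<name>$" as in the list above; `LocaleLiterals` and `localeNames` are hypothetical helper names, not part of any library.

```scala
import scala.util.matching.Regex

object LocaleLiterals {
  // Matches quoted literals such as "locales.cldr.data.en_US$"
  // and captures the part between the prefix and the trailing '$'
  private val Literal: Regex =
    "\"locales\\.cldr\\.data\\.([A-Za-z0-9_]+)\\$\"".r

  // Distinct, sorted names found in the given JS source text.
  // Note: entries such as "numericsystems" are data objects, not locales.
  def localeNames(js: String): List[String] =
    Literal.findAllMatchIn(js).map(_.group(1)).toList.distinct.sorted
}
```

It could be applied to the fastOptJS and fullOptJS outputs in turn, for example by reading each file with scala.io.Source.fromFile(path).mkString and comparing the two lists.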

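Just for reference, I use DateTimeFormatter for parsing ISO strings and for formatting, as below.

import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter

// Parsing: strDate is an ISO-8601 string with an offset
val odt = OffsetDateTime.parse(strDate, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
// Formatting with a custom pattern
val formatter = DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm:ss")
val strRepl = odt.format(formatter)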

Thanks. I'll add some of your questions to the documentation.

The presence of those classes in fast mode is no surprise, but I had expected most of them to go away in full mode. Still, that is not as much data as it sounds, though it could be improved.

DateTimeFormatter is very large by itself; I suspect most of the size is in the actual code rather than in the locale data.

are you setting up your own locale? otherwise you'd be using English by default

are you setting up your own locale?

No, I am not.

otherwise you'd be using English by default

Now I see why there are strings like "JANUARY" and "Sunday Monday Tuesday Wednesday Thursday Friday Saturday".split(" ") in the JS file. Thanks.

I ran some tests, and it's correct that some extra locale data is left even in fullOptJS.
The other size issue comes simply from how complex DateTimeFormatter is, which leads to the increased size.

I'll try to reduce both problems in the upcoming weeks, now that M13 is ready.

FYI.

When I opened this issue, I used DateTimeFormatter.ISO_DATE_TIME to parse 2011-12-03T10:15:30+01:00.
However, ISO_DATE_TIME can also parse a datetime string with a timezone, like 2011-12-03T10:15:30+01:00[Europe/Paris].
Therefore, a lot of TimeZone-related code is generated into the JavaScript.

Recently I changed to DateTimeFormatter.ISO_OFFSET_DATE_TIME, which does not parse [Europe/Paris].
Although some TimeZone-related data such as "America/Argentina/Buenos_Aires" still remains in the JS, this reduced the JS size drastically (by about 50%).

This is just a use-site tip.
Of course, ISO_DATE_TIME should be used if the application is required to parse timezones.
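
As a small illustration of the difference, the sketch below parses both example strings with ISO_OFFSET_DATE_TIME. It uses only the standard java.time API; the object and method names are illustrative. The behaviour described is that of the JVM implementation, which scala-java-time is expected to mirror.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter
import scala.util.Try

object OffsetOnlyParsing {
  val offsetOnly = "2011-12-03T10:15:30+01:00"
  val withRegion = "2011-12-03T10:15:30+01:00[Europe/Paris]"

  // Parse with the offset-only formatter; a failure is captured instead of thrown
  def parseOffset(s: String): Try[OffsetDateTime] =
    Try(OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME))

  def main(args: Array[String]): Unit = {
    println(parseOffset(offsetOnly).isSuccess) // the plain offset form is accepted
    println(parseOffset(withRegion).isSuccess) // the trailing [Europe/Paris] is rejected
  }
}
```

Because the offset-only formatter never needs to resolve a region id, the parsing path does not have to reference the zone-id machinery, which is consistent with the size reduction reported above.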

That is a great insight, perhaps it could be added to the documentation?

scalajs-java-locales implemented locales filtering via sbt-locales.
I'll keep this open though as an umbrella for the topic of size reduction
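
For readers who want to try it, a locale filter might be configured roughly as follows. This is a sketch from memory, not verified against a specific sbt-locales release: the plugin name LocalesPlugin, the localesFilter setting and LocalesFilter.Selection are assumptions to check against the sbt-locales documentation, and the project name `app` is hypothetical.

```scala
// build.sbt (sketch): names below are assumptions, see the note above
lazy val app = project
  .enablePlugins(ScalaJSPlugin, LocalesPlugin)
  .settings(
    // Keep only English (US) and Spanish locale data in the generated output
    localesFilter := LocalesFilter.Selection("en-US", "es")
  )
```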