microsoft/TypeScript

Reduce typescript package size

pauldraper opened this issue · 57 comments

Search Terms

size, bloat, install

Suggestion

The typescript package is large, and it is only getting larger.

screenshot from 2018-10-13 11-12-06

Version 3.1.3 is a whopping 40MB.

Use Cases

TypeScript is used in many contexts.

A TypeScript formatter (e.g. prettier) does not need an entire compiler. It only needs a parser. And a 45MB scripted parser is orders of magnitude larger than one would normally expect. (For reference, the installed npm package for Esprima -- the most compatible and compliant ES parser in the ecosystem -- is a mere 0.3MB.)

Examples

Solution 1: Split up packages

  • typescript (existing package; depends on typescript-compiler, typescript-parser, typescript-server)
  • typescript-compiler (depends on typescript-parser)
  • typescript-parser
  • typescript-server (depends on typescript-compiler)

Optionally, there could be separate packages for typescript-config and typescript-i18n.

Solution 2: Don't duplicate code

There is a lot of code duplication between

  • lib/typescript.js
  • lib/typescriptServices.js
  • lib/tsserver.js
  • lib/tsserverlibrary.js

Don't duplicate the code.

Checklist

My suggestion meets these guidelines:

  • This wouldn't be a breaking change in existing TypeScript / JavaScript code
  • This wouldn't change the runtime behavior of existing JavaScript code
  • This could be implemented without emitting different JS based on the types of the expressions
  • This isn't a runtime feature (e.g. new expression-level syntax)

Some backstory in #23339.

Interesting reading, thanks.

TypeScript [2.9.0] has doubled in size since v2.0.0 - now 35 MB

It was "fixed" by #25901, released in 3.1.1, which was 40MB. 🙁


It won't be hard at all to shrink the package size. For example, lib/tsserver.js and lib/tsserverlibrary.js are 98% identical.

$ du -b node_modules/typescript/lib/tsserver.js node_modules/typescript/lib/tsserverlibrary.js
7290127	node_modules/typescript/lib/tsserver.js
7308140	node_modules/typescript/lib/tsserverlibrary.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/tsserverlibrary.js) | wc -c
7207205
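For readers who want the same measurement without a shell, here is a minimal sketch of what `comm -12 <(sort a) <(sort b) | wc -c` computes: the total size of the lines two files have in common. The function and sample names are hypothetical, and it assumes ASCII input so that string length can stand in for byte count.

```typescript
// Splits text into lines, ignoring the single trailing newline a file usually ends with.
function splitLines(text: string): string[] {
  if (text === "") return [];
  const body = text.endsWith("\n") ? text.slice(0, -1) : text;
  return body.split("\n");
}

// Approximates `comm -12 <(sort a) <(sort b) | wc -c`: the size of the
// multiset intersection of lines. Assumes ASCII, so length equals bytes.
function sharedLineChars(textA: string, textB: string): number {
  const remaining = new Map<string, number>();
  for (const line of splitLines(textA)) {
    remaining.set(line, (remaining.get(line) ?? 0) + 1);
  }
  let shared = 0;
  for (const line of splitLines(textB)) {
    const count = remaining.get(line) ?? 0;
    if (count > 0) {
      remaining.set(line, count - 1);
      shared += line.length + 1; // +1 for the newline that `wc -c` also counts
    }
  }
  return shared;
}

// Two made-up "bundles" that share two of their three lines.
const sampleToolA = "function parse() {}\nfunction check() {}\nfunction emit() {}\n";
const sampleToolB = "function parse() {}\nfunction check() {}\nfunction serve() {}\n";
console.log(sharedLineChars(sampleToolA, sampleToolB)); // 40
```

Line-level overlap is only a rough proxy for duplicated code, since identical boilerplate lines (for example lone closing braces) also count as shared.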

And 99% of lib/typescript.js is identical to those.

$ du -b node_modules/typescript/lib/typescript.js
6859801	node_modules/typescript/lib/typescript.js
$ comm -12 <(sort node_modules/typescript/lib/tsserver.js) <(sort node_modules/typescript/lib/typescript.js) | wc -c
6850490

And lib/typescriptServices.js is byte-for-byte identical to that.

$ sha1sum node_modules/typescript/lib/typescript.js node_modules/typescript/lib/typescriptServices.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescript.js
0cff9734eba3d721a7ba3c72026e16f267610e24  node_modules/typescript/lib/typescriptServices.js

And 99% of lib/typingsInstaller.js is identical to that.

$ du -b node_modules/typescript/lib/typingsInstaller.js
5285788	node_modules/typescript/lib/typingsInstaller.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/typescriptServices.js) | wc -c
5246999

And 80% of lib/tsc.js is identical to that.

$ du -b node_modules/typescript/lib/tsc.js
3912404	node_modules/typescript/lib/tsc.js
$ comm -12 <(sort node_modules/typescript/lib/typingsInstaller.js) <(sort node_modules/typescript/lib/tsc.js) | wc -c
3219205

That's nearly 30MB of duplication just in those few files (and this doesn't even include declaration files).
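One plausible way to arrive at the "nearly 30MB" figure is to add up the overlaps reported above. This is a sketch under that assumption; the comment does not say exactly how the total was derived, and the labels are just descriptions of the measurements.

```typescript
// Byte counts copied from the `comm` and `du` output quoted in this comment.
const overlapBytes: Array<[string, number]> = [
  ["tsserver.js vs tsserverlibrary.js", 7207205],
  ["tsserver.js vs typescript.js", 6850490],
  ["typescriptServices.js (byte-for-byte copy of typescript.js)", 6859801],
  ["typingsInstaller.js vs typescriptServices.js", 5246999],
  ["typingsInstaller.js vs tsc.js", 3219205],
];

const totalOverlapBytes = overlapBytes.reduce((sum, [, bytes]) => sum + bytes, 0);
console.log(totalOverlapBytes); // 29383700
console.log((totalOverlapBytes / 1e6).toFixed(1) + " MB"); // "29.4 MB"
```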

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

I can't begin to guess at the kinds of design decisions that produce this (or what kind of compatibilities the TS team needs to support), but I trust there is a solution the maintainers would be happy with.

It's done this way so that every file can be used by itself without having to deal with the nastiness of modules in JavaScript. Every file is a functional library/program in itself. I think that is a great thing, at the cost of some disk space.

Reading the linked issue #23339, it appears that the desire is in fact to (eventually) use modules.

#23339 (comment)

If we used modules, we'd be able to share each file and avoid this duplication

it is something we want to do, but no plans for the short term. that is where the majority of savings would come from.


nastiness of modules in JavaScript

ES module systems in general can be hit-and-miss, but reminder that we're talking specifically about an npm package.

npm, npm packages, node_modules, package.json, etc. all relate to Node.js (or clones), which supports CommonJS. Right?

I have two ideas, but I am not sure which one is better.

  1. Split the code at the source level.
    Right now, some common utils and helpers are shared between different components. We could split them by function, e.g. utils.ts -> utils.ts (common), utils.factory.ts (depends on the factory), utils.emitter.ts (depends on the emitter), etc.
    If you want only the factory or the emitter, just create a tsconfig.json file that includes the files it depends on.

  2. Analyze and transform the bundled file.
    The namespaces are compiled into many IIFEs that are passed the namespace instance.
    We could compile with target esnext and merge those IIFEs, then transform each ts.xxx = xxx into export xxx.
    Then we could pack the result as a normal ESM project and tree shake it.
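To make the second idea concrete, here is a toy sketch of the `ts.xxx = xxx` to `export` rewrite. It uses a regex where a real migration would need an AST transform, it only handles the simplest one-line form, and the function name is hypothetical.

```typescript
// Rewrites lines of the form `ts.name = name;` into `export { name };`.
// Lines whose right-hand side differs from the property name are left alone.
function namespaceAssignmentsToExports(source: string): string {
  return source.replace(
    /^([ \t]*)ts\.([A-Za-z_$][\w$]*)[ \t]*=[ \t]*\2[ \t]*;[ \t]*$/gm,
    (_match, indent: string, exportedName: string) =>
      `${indent}export { ${exportedName} };`,
  );
}

// A made-up fragment in the style of a namespace-compiled bundle.
const bundledFragment = [
  "function createScanner() {}",
  "ts.createScanner = createScanner;",
  'ts.version = "x";',
].join("\n");

console.log(namespaceAssignmentsToExports(bundledFragment));
```

Only the middle line changes here; the `ts.version` assignment is untouched because its right-hand side is not the same identifier.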

ping @DanielRosenwasser
What do you think about that🧐?

I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API - at which point, our consumers would actually be the ones winning from tree-shaking.

Splitting source on its own can help, but practically speaking the larger components like services and TSServer will need the entire core compiler.

I think that converting to modules is the most practical and obvious way to avoid duplicating most of the contents of tsc.js 3+ times.

A simpler solution: inspired from Busybox.

Combine N near-duplicate files into 1 polymorphic file that can do N things based on a parameter passed in.

It would introduce a performance overhead of parsing tiny % of unnecessary JS code, but can make the tool integration story way simpler. Maybe worth it?

One trivial way to know which feature is expected would be to directly copy the Busybox approach: symlink all the duplicate files and differentiate at runtime based on the __filename. Saves disk space, package size, bandwidth. There are more interesting options too.
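A minimal sketch of that Busybox-style dispatch, assuming a single shared file that exposes several tools. The tool table and function names are hypothetical, and the tool bodies are placeholders. The invoked path is passed in as a plain parameter because Node.js by default resolves symlinks when loading modules (see its `--preserve-symlinks` flags), so `__filename` may report the link target.

```typescript
// Hypothetical tool table; real entry points would share one copy of the compiler.
type ToolName = "tsc" | "tsserver" | "typingsInstaller";

const toolEntryPoints: Record<ToolName, (args: string[]) => string> = {
  tsc: (args) => `compile ${args.join(" ")}`,
  tsserver: () => "start language server",
  typingsInstaller: () => "install typings",
};

// Picks the tool from the name the script was invoked as, e.g. a symlink path.
function selectTool(invokedAs: string): ToolName | undefined {
  const baseName = invokedAs.split(/[\\/]/).pop() ?? "";
  const stem = baseName.replace(/\.js$/, "");
  return Object.prototype.hasOwnProperty.call(toolEntryPoints, stem)
    ? (stem as ToolName)
    : undefined;
}

// Runs whichever tool matches the invoked name, or fails loudly.
function runAs(invokedAs: string, args: string[]): string {
  const tool = selectTool(invokedAs);
  if (tool === undefined) throw new Error(`unknown tool: ${invokedAs}`);
  return toolEntryPoints[tool](args);
}

console.log(runAs("node_modules/typescript/lib/tsc.js", ["-p", "."])); // "compile -p ."
```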

From speaking with @RyanCavanaugh, it sounded like @orta was interested in working on this.

+1 for splitting up typescript into multiple packages. One major benefit would be that these individual packages (other than the "typescript" package) could use semantic versioning on at least their APIs then other libraries could just depend on the packages they need. Right now it's kind of a pain to maintain a library that has a peer dependency on the typescript package (without being super strict about the supported version).

orta commented

Yeah, I'm chatting with folks internally this week, but my goal is roughly:

  • Let the package typescript be the same as right now (as removing things would break the world) which provides all tooling

Then have subset packages which are smaller and focused on a specific task:

  • @typescript/tsc for folks who are just doing compilation (e.g. tsc compiles on the server, prettier for the AST)
  • @typescript/services for folks building dev tools like monaco-typescript, or executeprogram etc

I doubt I can offer any useful semver on them, as they link to the main TS version. That'd need the API to actually be classed as "stable" which doesn't look like that's happening soon.

Figuring out how/if we can reduce the main "typescript" is hopefully something I can get an idea about during ^

Removing tools from the package doesn't reduce overall size. Compilation, dev tools etc reuse a lot of the same code that is now copied to multiple commands without changes. The issue is how to share the very duplicated part between the tools, reduce the duplication, or pack the tools into one bundle.

Yeah, I'm chatting with folks internally this week, but my goal is roughly

Oh, we're generally for it (and have been for years, provided we still provide a services bundle for our (browser) consumers who use it) - we just need an automated way to remap the current namespace-based code layout into modules, this way we can keep a PR doing the migration up to date and not stop development on other things. I have a branch from two years ago that migrated all of src/compiler to modules (by hand) - checker.ts had something like 100 lines of imports on it. And that took quite awhile to make. That gave some of us some pause and reduced enthusiasm, but... I'm hoping the final result is still seen as worth it.

With respect to said automation, I think we could probably write a kind of codemod for it using the APIs we have today, but nobody's put in the effort yet.

mjbvz commented

@orta VS Code is very interested in this work. Right now we consume TypeScript in two ways:

  • tsserver.js: used by our JS/TS extension
  • typescript.js: used by our html extension

Each of those files is around 8MB on disk. Additionally, we are interested in shipping built-in support for tsc (tsc.js), but that's another 4.5MB and that's difficult for me to justify. It seems to me like all these various TypeScript components should be able to share a lot of code.

Let me know if you would like any additional info about how VS Code consumes TS


As a side note, typingsInstaller.js is pretty huge too (6MB)!! Does it pull in a lot of stuff from TS core?

orta commented

I brought this up during the most recent design meeting - #34899

Where the end result was basically, we're meeting about trying to get modules happening again

As mentioned above - all of these files are basically the same but with a bit of flavor difference because they represent different sets of the compiler + services - for example I think you can probably use tsserverlibrary for both the html + JS/TS cases in vscode, but tsc.js doesn't look like it lives in there.

orta commented

#35561 is looking like the answer to this, I'll keep my eye on PR to see how things change

I am skeptical that tree-shaking is useful for shipping our own package because presumably everything we ship is used in some capacity, or is part of our public API

If it's not much effort to add into the build, this could still be a worthy goal. There are a few consumers, like Prettier and the new VS Code JS debugger extension, who ship TypeScript in a bundled form. It would double the size on disk if you shipped both ESM and CommonJS in a single package--maybe it could be a separate set of /typescript.*-esm/ packages?

@connor4312 You'll be able to give it a shot when it's migrated, I'm just saying to temper expectations about the savings you'll see.

Has anybody tried implementing an executable typescript multiplexer following the native pattern of crunchgen or toybox, per @mihailik's suggestion? It would generate the most savings I think.

mhart commented

npm install typescript@4.0.2 results in a 60MB node_modules on my Mac (56MB of which is typescript itself). Typescript is by far the largest module in our stack (and we have 146 explicit deps in package.json) – would love to see some reduction here 🙏

Yup. This is the second largest module in my stack. typescript@4.0.3 is taking up 52M on disk. That's fine for prod, since people typically don't ship typescript in images, only the transpiled JS files, but a reduction in size can still impact the dev env significantly.

The install size of typescript@4.5.4 is 61 MB.

However most of that (51.8 MB) is just these six JavaScript files. Minifying them using uglify-js with just basic configuration reduces their size drastically (to 16.5 MB):

| File | Size | Minified size |
| --- | --- | --- |
| tsc.js | 5621 kB | 2206 kB |
| tsserver.js | 10378 kB | 3237 kB |
| tsserverlibrary.js | 10331 kB | 3220 kB |
| typescript.js | 9728 kB | 2989 kB |
| typescriptServices.js | 9728 kB | 2989 kB |
| typingsInstaller.js | 7298 kB | 2273 kB |

The resulting package size (25.7 MB) is less than half of the current install size at the cost of one additional build step. Is this maybe something that should be explored? I didn't manage to find any thread discussing this except for one mention in #23339.
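The figures in this comment can be cross-checked with a few lines of arithmetic. This sketch assumes the MB values were obtained by dividing the kB totals by 1024, which is the conversion that reproduces the quoted 51.8, 16.5 and 25.7; the variable names are made up.

```typescript
// [file, size in kB, minified size in kB], copied from the table above.
const minifyRows: Array<[string, number, number]> = [
  ["tsc.js", 5621, 2206],
  ["tsserver.js", 10378, 3237],
  ["tsserverlibrary.js", 10331, 3220],
  ["typescript.js", 9728, 2989],
  ["typescriptServices.js", 9728, 2989],
  ["typingsInstaller.js", 7298, 2273],
];

// Assumption: 1 MB = 1024 kB, which matches the figures quoted in the comment.
const kBToMB = (kB: number): number => kB / 1024;

const originalTotalMB = kBToMB(minifyRows.reduce((sum, [, sizeKB]) => sum + sizeKB, 0));
const minifiedTotalMB = kBToMB(minifyRows.reduce((sum, [, , minKB]) => sum + minKB, 0));

// Install size with the six files swapped for their minified versions.
const reportedInstallMB = 61;
const projectedInstallMB = reportedInstallMB - originalTotalMB + minifiedTotalMB;

console.log(originalTotalMB.toFixed(1)); // "51.8"
console.log(minifiedTotalMB.toFixed(1)); // "16.5"
console.log(projectedInstallMB.toFixed(1)); // "25.7"
```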

@vostrnad They're working on modularizing the compiler. #35561

From an outsider's perspective, it seems there hasn't been much work done on this recently. People from a lot of corners of the typescript universe have chipped in their approval of a smaller typescript package. I'm no expert on anything low-level, but I'm chipping in to start the discussion. Could the community be of any help with this?

I have (and another dev or two before me has) been working on #35210 (turning the TS package into modules, mentioned in this thread before), which would directly impact this by only having one copy of everything in the package (like most npm packages). Then, the package would be smaller, and the lack of namespace generation into single files would allow importers to properly tree shake (allowing consumers to ship less).

Forgive the lack of obvious progress; this work is done out of tree in a code transformer that will do the conversion from namespaces/outFile to modules in bulk, since this sort of thing is far too difficult of a task to do solely by hand (and probably not gradually either).

This is what my work environment github folder looks like; the repeated yellow chunks are all typescript in various node_modules folders.

lots of typescript

👋 Inspired by discussions here (especially @vostrnad's observation), I created a smaller redistribution of TypeScript: https://github.com/kidonng/typescript

It's not battle tested, but I've successfully used it to build the Vite repo.

  • @kidonng/tsc: install size
  • @kidonng/typescript: install size

FYI: We (Prettier) just reduced our bundled package size from ~3.5 MB to ~1.4 MB by manually removing unused code. prettier/prettier#13431

For those following this thread, I've just posted the PR that converts the codebase to be implemented with modules (#51387); with this change comes major changes to our build and packaging, including a 43% reduction in package size.

I am filing followup issues now that the modules PR has been merged.

One such issue of interest here is #51440; the TL;DR is that if we raise our minimum supported Node version to Node 12, we could safely ship our executables as ESM, which would save us roughly 7 MB more on top of the 43% reduction above.

The reduction from modules is very significant. (Thanks!!!!!!)

If your math is correct, that reduces the package size from 65MB to 36MB.

Which is still larger than it was when #23339 was filed, asking for it to be smaller.

But alas, such is progress.

This was the largest possible improvement to the size. More could be done, but it's not gonna cut in half again.

Eventually, we may be able to ship as ESM and achieve the smallest possible package. Or, go further and publish individual packages for parts of our repo. That goal's a long way off, but there is work left to be done here.

Confirmed, TS nightly is much smaller now, thanks!

  • Before: install size
  • After: install size

Following the migration to modules in typescript@5.0.0-dev.20221108, I ran my minification tests again. Using uglify-js on the five largest JavaScript files now reduces the package size from 35.6 MB to 18.0 MB:

| File | Size | Minified size |
| --- | --- | --- |
| tsc.js | 5097 kB | 2281 kB |
| tsserver.js | 7923 kB | 2999 kB |
| tsserverlibrary.js | 7886 kB | 2983 kB |
| typescript.js | 7338 kB | 2705 kB |
| typingsInstaller.js | 1756 kB | 985 kB |

I mentioned minification in the module conversion PR; we are restricted on that front because so many people still patch our package. If we minify, patching becomes difficult to impossible.

I'd love to be able to do so, but we have to figure out what to do about that first.

(We'd also probably not go "full" minify; we need to keep names for backtraces.)

Minify only saves space if you don't include source maps.

And excluding source maps seems like deal-breaker.

We already exclude source maps in the package, but our output is left "pretty" so that stack traces are meaningful when provided by downstream users.

If we were enabling minification, we would likely only have it remove whitespace and optimize syntax, leaving names in the output.

Re: ES Modules, I think we have to take performance as a serious goal. We get a big speed boost from esbuild's whole-program-aware bundling and giving that up for a better sticker number isn't a good trade-off for most users. People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

Yeah, this is something I want to performance test; my impression is that ESM imports should be as fast as the whole-program bundling. I think that the differences were really down to variance + load time.

People who want to vendor TS and get the smallest possible final output should pick up our mid-build artifacts and tree shake them.

It's worth noting that vendoring has some big tradeoffs which might leave a user worse off. If someone still installs TypeScript (due to another dependency, for custom build tasks, or for having their editor use a workspace version), that person gets even more duplication of TypeScript, possibly with mismatched versions.

This is closed, but since people do still follow this issue, #55273 is on the docket for an early 5.3 merge; this PR effectively replaces typescript.js with tsserverlibrary.js and removes the latter. This leaves typescript.js as the sole provider of the public API, saving roughly 8MB unpacked. Copy/pasting the package size report that is run on PRs:

| | Before | After | Diff | Diff (percent) |
| --- | --- | --- | --- | --- |
| Packed | 6.90 MiB | 5.48 MiB | -1.42 MiB | -20.61% |
| Unpacked | 38.74 MiB | 30.41 MiB | -8.33 MiB | -21.50% |

| File | Before | After | Diff | Diff (percent) |
| --- | --- | --- | --- | --- |
| lib/tsserverlibrary.d.ts | 570.95 KiB | 865.00 B | -570.10 KiB | -99.85% |
| lib/tsserverlibrary.js | 8.57 MiB | 1012.00 B | -8.57 MiB | -99.99% |
| lib/typescript.d.ts | 396.27 KiB | 570.95 KiB | +174.68 KiB | +44.08% |
| lib/typescript.js | 7.95 MiB | 8.57 MiB | +637.53 KiB | +7.84% |

As for our executables (and potentially an ESM API); that'll be handled by #51440 when I get to dealing with the long set of changes that are required to make that happen.

pi0 commented

Hi! First of all, thanks @jakebailey and the rest of the typescript team for constantly working on this matter to reduce the typescript install size 💙

With the awareness of all these efforts, I made an experimental project tslite.

tslite is a redistribution of TypeScript without API changes and with optimizations, like code minification, that probably won't be possible for the typescript package itself. Its (significantly) smaller size benefits the segment of users who directly install or need typescript as a peer dependency in their projects.

I hope this project will be helpful rather than something conflicting with the future roadmap of install size optimizations from the core package.

There is still more size work that can be done, specifically #51440.

However, I will note that the problem of package sizes is really not as bad as people think these days; every modern package manager uses hardlinks to a global cache, meaning that every install of TypeScript on a system will share the same backing files on disk. The "apparent" size may seem duplicative, but it's really all shared.

That and the install size seen on packagephobia is the unpacked size; the actual bits transferred from the registry are much, much smaller. Even gzip brings the tarball to about 6MB. tslite is smaller on that front at about 3MB, but overall most people only download each version of TypeScript once.

That combined with the hardlinking really means that we're talking about a few MB per system, paid once. One spends more network and disk space loading up Twitter or even GitHub via images and scripts that change often than the TS package.

I'm still going to try and make it smaller because I find it fun to do so, but it's a little moot IMO.

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball TS still adds a few seconds when I open a Stackblitz repro for Vite.

every modern package manager uses hardlinks to a global cache

Neither npm nor yarn use a global cache. (Unless Yarn is in PnP mode, which brings a number of issues.)

overall most people only download each version of TypeScript once

There are over 2,800 versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 TypeScript versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

This matters when opening a repo on an online IDE where there is no cache. My home connection is ~2MB/s, so even as a tarball TS still adds a few seconds when I open a Stackblitz repro for Vite.

That's certainly true. It's a shame that these systems do not cache their artifacts.


Neither npm nor yarn use a global cache. (Unless Yarn is in PnP mode, which brings a number of issues.)

Yarn 3 supports hard linking (https://yarnpkg.com/configuration/yarnrc#nmMode). If you're still using Yarn v1, you're not going to get any new features at all.

I was wrong about npm; it has a global cache but it copies the files.

There are 2,800+ versions of TypeScript. The chance that two different projects happen to install the same exact version is very low.

Even for a single npm install which dedups as much as possible, right now I'm looking at a project with 5 versions. (Why? jsii, postcss-loader, prettier-plugin-organize-imports, puppeteer-core, cosmiconfig-typescript-loader, plus the version for the project itself.)

There should really only be one TS version in a project; if this is happening, then some package is over-restricting what version of TS it needs. All modern package managers allow you to override versions within a workspace, and I would think it'd be safe to do that if space is a concern and your package manager can't hardlink.

It's also misleading to say that there are 2,800 versions of TypeScript; there are only a handful of stable releases. The rest are nightly builds.

People shouldn't have to override Typescript versions. The project I'm working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

People shouldn't have to override Typescript versions. The project I'm working on now has 70 dependencies and if they all required post-install customization npm would be pretty unusable.

I'm referring specifically to doing this in npm:

"overrides": {
    "typescript@*": "$typescript"
},

Or in yarn:

"resolutions": {
    "typescript@*": "$typescript"
},

Or in pnpm:

"pnpm": {
    "overrides": {
        "typescript@*": "$typescript"
    }
}

I am not referring to any sort of post-install patching, but just asking the package manager to resolve to a single version.

The point is that an override only seems reasonable because other dependencies don't require any extra setup. NPM repos are supposed to be low-effort installs and typescript should be no exception.

It's a shame that these systems do not cache their artifacts.

There should really only be one TS version in a project

npm; it has a global cache but it copies the files.

Yes, as you say, IDEs, package maintainers, and package managers should be aggressively deduplicating redundancies.

....

....

....

....

And TypeScript should be doing the same. (Right now it's something crazy like ~75% duplicate code.)

Yes, again, #27891 (comment) removed one more copy, and #51440 will remove even more (down to the absolute minimum of 2 copies one can have when shipping both CJS and ESM). I'm not sure what else I can say, I was just originally attempting to explain that a large bulk of situations do not benefit from the effort to lower the package size.

I think it'd be useful for people to be a bit more specific about what they care about so we can tailor our efforts.

For time-over-wire, deduplication isn't a great savings, since each additional copy is a tiny increment (compressed checker.ts (2 MB) is 416k, compressed 4x checker.ts is 418k)

For space-on-disk, uh, I'm going to need some more details. It's not 1998 anymore. 40 MB is 0.004% of a terabyte. If the problem is that there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do. That's a package manager problem, not a TypeScript problem; it's unrealistic to expect projects to put their effort into slimming down instead of having package managers duplicate less.

For bundling into other projects like web IDEs, treeshaking is going to be a big part of any successful strategy here. Identifying places where we can be more shakeable is a good thing.

For space-on-disk, uh, I'm going to need some more details.

I see where you're coming from, but npm is still the most commonly used package manager. The problem is npm, definitely, but who knows when this will change? Anything that reduces TS size definitely has impact on the ecosystem.

Same is true for the "time-over-wire" thing, shipping 1 MB less is probably not doing anything for one person, but if you multiply 1000 kb by 43 million downloads weekly, the picture looks totally different again.
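As a rough back-of-the-envelope check of that multiplication, assuming the 1000 kB saving applies to the bytes actually transferred on every one of the 43 million weekly downloads (an assumption, since caches and compression change the real number):

```typescript
// Figures from the comment above; decimal units (1 TB = 1e9 kB) are assumed.
const savedPerDownloadKB = 1000;
const weeklyDownloadCount = 43000000;

const weeklySavedTB = (savedPerDownloadKB * weeklyDownloadCount) / 1e9;
console.log(weeklySavedTB); // 43
```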

As far as I can tell, the details you need are written down here: https://github.com/pi0/tslite#how

mhart commented

There are plenty of cases where cache isn't available and the size of the package matters – for over the network size, number of files that need to be written, and amount of JS that needs to be parsed at execution time.

  1. Container builds. Installs in containers typically have no cache (would require volume mounting, etc), so installs are slower. The resulting container size also matters for a number of reasons, from execution time, to registry push time over slow networks, etc. People typically try to keep their container sizes to a minimum, so small packages help here.
  2. CI builds. Similar to above. Often done in containers. Many CI systems have caching abilities, but they can be complicated to set up – and many don't. So often typescript is being installed from scratch each time, for every single build, just adding time to every single build.
  3. Serverless environments. Environments like Lambda, Google Cloud Run, Cloudflare Workers, etc execute much better with smaller zipfiles/container sizes. Reducing dependencies in these environments is a known best practice. Large packages are frowned on. Some have limits on size.
  4. Performance. The more files parsed, the more JS parsed, the slower a package is to start.

If the problem is that there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do

35 to 10 is a 71% reduction.

The latest version (5.2.2) has tsc, tsserver, tsserverlibrary, and typescript which total 32MB but have only 9MB of unique content.

Removing the duplicate code drops the package from 41MB to 18MB, a 56% reduction.
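The percentages in this exchange follow directly from the quoted sizes. A small sketch of the arithmetic, using only the numbers stated in the comments (variable names are made up):

```typescript
// Package manager side: going from 35 copies of TS to 10.
const copiesReductionPercent = Math.round(((35 - 10) / 35) * 100);

// TypeScript side: 5.2.2 figures quoted above, in MB.
const packageSizeMB = 41;
const bundledFilesMB = 32; // tsc, tsserver, tsserverlibrary, typescript
const uniqueContentMB = 9;

const duplicateMB = bundledFilesMB - uniqueContentMB;
const dedupedPackageMB = packageSizeMB - duplicateMB;
const dedupReductionPercent = Math.round((duplicateMB / packageSizeMB) * 100);

console.log(copiesReductionPercent); // 71
console.log(dedupedPackageMB); // 18
console.log(dedupReductionPercent); // 56
```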

So.....actually, there is a lot that TS can do.


it's unrealistic to expect projects to put their effort into slimming down instead of having package managers duplicate less

The problem is largely a synthetic one introduced by TS's bundling. The source code (excluding tests) is only 32MB.

there are 35 copies of TS due to how a package manager behaves, going from 35 to even 10 is going to be a much bigger win than anything we could plausibly do

Tangential, but if you want to go that route, @RyanCavanaugh, there's a four-year-old PR open for PnP to dedup installs; maybe it could get some eyeballs :)

#35206

Totally agree! I wrote a small utility to extract the parameters of a function, and leveraged the TS compiler library for obvious reasons.

Works beautifully... but I have to include Typescript's 8 MB dependency, which effectively renders it useless.

The Typescript library needs to be tree-shakable! More to the point... why isn't it ALREADY tree-shakable??