serverless-heaven/serverless-webpack

Packaging external modules by copying instead of npm installing

jopicornell opened this issue · 16 comments

This is a Feature Proposal

Description

I use serverless-webpack mainly for the size reduction it gives each function when packed individually. The feature I wanted most is the external module packaging, but since I'm in the Galapagos Islands, where the internet is among the worst I've ever had, I can't do an npm install for every Lambda function. My project has a lot of Lambda functions and packaging alone would take roughly 2 to 3 hours (if there are no network problems that throw the whole process away).

My proposal is not to npm install, but to copy every module and its dependencies on disk from the main node_modules: if you npm install into the main node_modules, you already have all the modules on your computer, built and ready, so I don't see the need to npm install every time. I have my own fork where I've built and tested this idea. I've also written integration tests and they should pass (as my internet is extremely slow I can't deploy, but I've reviewed the packaged functions and they look great).

What do you think? Maybe I've misunderstood something, or npm install does something that makes copying from the node_modules dir a bad idea. I'm open to discussing this.

Similar or dependent issue(s):

Additional Data

  • Serverless-Webpack Version you're using: v3.0.0
  • Webpack version you're using: v3.6.0
  • Serverless Framework Version you're using: v1.22.0
  • Operating System: Windows 10
  • Stack Trace: N/A

The reason not to do a manual copy is that you'd effectively have to reimplement npm to make it work.

Let me explain this a bit:
The plugin works with individual packaging, so it only installs the modules that webpack reports as needed. It uses npm to fetch all modules and lets npm build a node_modules directory for the package - and in a second step it removes the unneeded dependencies for each function using npm prune.

But the node_modules folder is not just an aggregation of all the modules that npm installs there. It is an optimized tree of folders where npm deduplicates modules that are used somewhere in the dependencies, hoists them to the top level when needed, etc.

So the layout of node_modules in the project folder is optimized for exactly this case, where ALL modules contained in the project are installed. If you try to simulate the packages for a single function, i.e. remove some dependencies from the project and do an npm prune, you'll end up with a completely different layout.

If you now just copy the folders from the full project's node_modules, you'll miss all these optimizations and the resulting packages will be inefficient. Depending on npm's optimization you might even miss dependencies. Imagine this case:
You do an npm install twitter. npm might lay out the node_modules folder in 2 ways (which you cannot predict from the outside): either it keeps the request dependency inside the node_modules folder of the twitter module, or it moves the request module to the top level (if that's better for module optimization). Let's assume it did the latter. If you now copy the node_modules/twitter folder, you'll end up with a package that is missing the request module and it will break. There is no reproducible way to find out where the module lives that also guarantees the right version is used.
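To make the two possibilities concrete, the layouts npm might produce look roughly like this (just a sketch - the actual result depends on the rest of the dependency tree):

# Layout A: npm keeps request nested under twitter
node_modules/
└── twitter/
    └── node_modules/
        └── request/

# Layout B: npm hoists request to the top level
node_modules/
├── request/
└── twitter/   <- copying only node_modules/twitter misses request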

A second reason against packaging by copy is the case where you explicitly bundle some dependencies, i.e. you add one to node-externals' whitelist (let's take the twitter module as an example here). This lets webpack bundle the module, and webpack then reports its 2nd-level dependencies (in this case the request module) as new direct dependencies. Currently the plugin handles this case very well and installs the request dependency right into the function package.
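For reference, that scenario is configured through webpack-node-externals, roughly like this (a sketch with the twitter example; the same pattern appears with core-js further down in this thread):

// webpack.config.js (sketch): bundle 'twitter' into the compiled code instead of
// treating it as an external module. The plugin then installs the 2nd-level
// dependencies reported by webpack (e.g. 'request') into the function package.
const nodeExternals = require('webpack-node-externals');

module.exports = {
  target: 'node',
  externals: [
    nodeExternals({
      whitelist: ['twitter'],
    }),
  ],
};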

If you try to do that without npm and copy the modules instead, what would an algorithm look like that resolves the new request dependency? You cannot find it in the project's dependencies, because it is a 2nd-level dependency, and you cannot find it within the node_modules folders without replicating npm's internal functionality. Of course you can parse the installed package.json of the twitter module, but first you have to find out that it is the twitter module that led to the dependency - you cannot just copy any request module you find in the tree; you have to find exactly the one that caused the new dependency. Furthermore, the package.json files present in node_modules are transformed by npm, so you have to regard them as internal data created by npm and should not rely on their format.

The only way to have a stable and working copy mechanism, in combination with webpack's and npm's optimization algorithms, is to reimplement npm's functionality. For production use, npm install is imo the only way to guarantee packaging that emits working packages and supports all the configurations currently available in the plugin.
In my opinion a copy is not an option that can compare in any way to the current solution - in regards to quality, reliability and stability - or to its support of arbitrary webpack external packaging scenarios.

However, there are scenarios where these essential properties (quality, reliability and stability) are not needed: a debug environment and development. I see a "copy" approach as valid only for such a case, and then only without individual packaging (as a consequence of the issues mentioned above), which is fully ok for debugging and experimentation. Maybe there could be a switch to enable a "copy mode" (e.g. --dbg-copy-modules) that turns off individual packaging and just copies the project's node_modules into the service artifact.

One further thought 🤔: Would it be an option for you to set up an EC2 instance in AWS, which would have fast access to internet resources, and do your production deployments from there? Together with a possible switch as mentioned above, local testing would be possible without a fast connection - as the plugin now fully integrates with serverless-offline and invoke local, which use the local node modules, that should be possible even today.

A different approach to overcoming the slow network would be to set up a local NPM server with caching, which caches modules once retrieved and only downloads again when new versions are requested. We use Verdaccio, which works quite well for that. It can even be used as an NPM registry for locally scoped packages.
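If you go that route, pointing npm at the local registry is a one-line change, e.g. in the project's .npmrc (assuming Verdaccio runs on its default port 4873):

# .npmrc - route installs through the local caching registry
registry=http://localhost:4873/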

Yes, I also don't like the npm install on every deploy, but I see no way around it either. I think there was an older version of another webpack plugin that also just copied folders based on the first-level dependencies. That worked up to npm 3 (which still used a hierarchical structure where 2nd-level dependencies were a subfolder of their 1st-level dependency) but breaks with newer npm versions (which flatten all modules directly into the node_modules folder, as described by @HyperBrain).

Long story short, I don't think simple copying can work for complex projects that have transitive or non-obvious dependencies. A local NPM server seems most reasonable to me. A quick Google search turned up this one: http://willcodefor.beer/setup-your-own-npm-cache-server/ -- Would that work for you, @jpicornell ?

I can't do an npm install for every lambda function

The plugin only fetches the packages via install exactly one time. The reduced packages for the individual functions are created from this one install and do not fetch anything anymore (npm prune is used to re-optimize and delete the unneeded packages).
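Conceptually the flow looks like this (a simplified sketch, not the plugin's actual code):

# one real install, for the union of modules reported by webpack
npm install
# then, per function:
#   1. copy the installed node_modules into the function's package directory
#   2. write a package.json that lists only the modules this function needs
#   3. run npm prune there - it removes everything not listed, without any network access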

@HyperBrain and @arabold Thank you for all this amazing information! I will try it all (I hadn't thought about an EC2 instance and it's a great idea!). Local development is fine for now, because this plugin has an amazing integration with serverless-offline and they work REALLY WELL together. The problem was doing the deployments, and as I was using npm 3 I think I wasn't taking advantage of the npm caching in npm 5. I use Yarn in every project, so I didn't know how it works now on npm 5; I'll take a look. I use Bitbucket Pipelines, which is also a great deployment system. If it doesn't work, I'll look into that EC2 instance.

If the idea of turning my code into a "debug option" is still on your mind, I can open a pull request, but with all this information things are really clear: you don't need a copy function, but rather better integration with npm (which is getting better with newer npm versions, I think).

BTW, Verdaccio seems like a great option for npm caching if I need something reliable and solid for this.

Thank you guys! I'll keep an eye on the issues and see if I can contribute to this amazing project.

Oh, Pipelines for Bitbucket! We're using Bamboo, but I'm actually really interested in that too. I'll definitely take a look, as it sounds like this works well for you.

@jpicornell I'm happy that we could help you 😄. Regarding the copy, I think you're right and we should postpone that for now. It might need a deeper discussion first, so that a possible debug functionality is really thought through - as there are lots of side effects that have to be taken into account.

@arabold It's a great service, and really easy, with a node_modules cache. I'm really happy with it, and the configuration (with Docker) is really nice. It also has services for adding a DB server and other resources. Have a look and you'll love it! You also get 50 free build minutes per team.

@HyperBrain Yes, it's another discussion with deeper impact. One thing I was thinking about is that I wanted to test my packaged services locally. Why would I want to do that? In my project we use Sequelize as the ORM. The database drivers (mysql, postgres...) aren't pulled in with a plain require but loaded programmatically (one thing I hate). When that happens, webpack doesn't catch them as external modules, and if I don't test the packaged file I only get the errors once it's deployed. Maybe the integration with serverless-offline could do that with a flag or something. That's why I started packaging my services locally and found these problems.
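To illustrate, the kind of loading that trips webpack up is roughly this (a simplified sketch, not Sequelize's actual source):

// Sketch of a driver being loaded programmatically. Because the module name is
// built at runtime, webpack's static analysis cannot see that 'pg' or 'mysql2'
// is needed, so it is neither bundled nor reported as an external module.
function loadDialectModule(dialect) {
  const moduleName = { postgres: 'pg', mysql: 'mysql2' }[dialect];
  return require(moduleName); // dynamic require - invisible to webpack
}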

That's the main reason I posted it in this issue and not as another feature proposal. Maybe it's interesting for you to keep in mind that this can happen. I'll work on Bitbucket and see if I can get it working correctly with serverless-webpack. Many thanks again!

@jpicornell Dependencies that are not recognized by webpack as external dependencies can be added with the new forceInclude option in the configuration (since 3.1.0). So if you need, e.g., Postgres (pg) in the packaged node modules, you just have to make sure it's in the production dependencies in your package.json and add this to the serverless.yml:

# serverless.yml
custom:
  webpackIncludeModules:
    forceInclude:
      - pg

For more modules just add them to the forceInclude array (e.g. - mysql).

See also the README for a description.

Thanks for the good discussion. I'll close this now. Feel free to continue the discussion if needed.

OrKoN commented

Hey, I have a somewhat related problem. I use core-js for some polyfills and it's quite a big module. Webpack is smart enough to replace imports of core-js with imports of the particular modules inside it. But when serverless-webpack packages the modules, it adds the entire module as a dependency, making the bundle huge. Is there a way to do tree-shaking on node_modules somehow? I tried to have webpack bundle the modules, but it takes too long: it seems webpack tries to run Babel on node_modules. Does anyone happen to have a webpack config example that allows this?

The reason the whole module ends up in the package is that it is detected as an external module and installed into the package.

What you can do is add core-js to the node-externals whitelist, like this:

externals: [ nodeExternals({
    whitelist: [ 'core-js' ]
  }) ],

This will bundle only core-js (and only the used parts) into your code (with tree-shaking applied) and the 2nd level dependencies will be automatically included by the sls-webpack plugin.

OrKoN commented

@HyperBrain thanks! I have just tried it, but it does not seem to work. My webpack config is like this:

const nodeExternals = require('webpack-node-externals');
const slsw = require('serverless-webpack');

module.exports = {
  entry: slsw.lib.entries,
  target: 'node',
  externals: [
    nodeExternals({
      whitelist: ['core-js'],
    }),
  ],
  devtool: 'source-map',
  mode: 'production',
  module: {
    rules: [
      {
        test: /\.js$/,
        exclude: /node_modules/,
        loader: 'babel-loader',
        query: {
          presets: [
            [
              'env',
              {
                targets: { node: '6.10' },
                modules: 'commonjs',
                useBuiltIns: 'usage',
              },
            ],
          ],
        },
      },
    ],
  },
};

It still says: "Serverless: Packing external modules: core-js@2.5.3"
Am I missing something?

That's strange, but honestly I have not used the whitelist for some time myself, so I cannot tell whether it is generally broken or just an issue with core-js here. The last time I checked, it was with webpack 3 and an older version of the plugin.

According to the documentation of node-externals, the whitelist also accepts regexes as array elements. Maybe whitelist: [ /^core-js/ ] works.

BTW: You can set mode to mode: slsw.lib.webpack.isLocal ? 'development' : 'production'. That makes local debugging with sls-offline or sls invoke local easier.
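Putting both suggestions together, the relevant part of the config would look roughly like this (a sketch based on the config above):

const nodeExternals = require('webpack-node-externals');
const slsw = require('serverless-webpack');

module.exports = {
  // ...rest of the config as above...
  externals: [
    nodeExternals({
      // regex form, so deep imports like 'core-js/modules/...' are matched too
      whitelist: [/^core-js/],
    }),
  ],
  // 'development' for sls-offline / sls invoke local, 'production' for deployments
  mode: slsw.lib.webpack.isLocal ? 'development' : 'production',
};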

OrKoN commented

@HyperBrain Thanks! Actually the regex form did the trick!