timkendrick/recursive-copy

Out of memory on large number of files

rapkin opened this issue · 8 comments

When I try to copy a large number of files I get this error ("out of memory" - console output below).
The code looks like this:

const path = require('path')
const copy = require('recursive-copy')

copy(path.join(__dirname, 'source'), path.join(__dirname, 'destination'))
    .then((results) => console.info('Copied ' + results.length + ' files'))
    .catch((error) => console.error('Copy failed: ' + error))

I think this lib should handle a case like this (when there is a large number of files). Maybe the problem is with the results array (I'm not sure).

I made a synchronous version without the results array to prevent memory leaks: https://gist.github.com/rapkin/49f6ce6f8f27e1717b3f9dca9fe3d4a4
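
The idea is roughly like this (a simplified sketch, not the exact code from the gist):

const fs = require('fs')
const path = require('path')

// Walk the source tree with synchronous fs calls and copy file by file,
// without building up a results array.
function copySync (src, dest) {
    const stats = fs.statSync(src)
    if (stats.isDirectory()) {
        if (!fs.existsSync(dest)) fs.mkdirSync(dest)
        for (const entry of fs.readdirSync(src)) {
            copySync(path.join(src, entry), path.join(dest, entry))
        }
    } else {
        fs.copyFileSync(src, dest)
    }
}

copySync(path.join(__dirname, 'source'), path.join(__dirname, 'destination'))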

Console out:

PS C:\inetpub\wwwroot\api> node copy.js

<--- Last few GCs --->

[1048:000002268029C690]   292270 ms: Mark-sweep 1415.2 (1745.9) -> 1415.2 (1690.4) MB, 3238.6 / 2.3 ms  last resort GC in old space requested
[1048:000002268029C690]   295173 ms: Mark-sweep 1415.2 (1690.4) -> 1415.2 (1667.9) MB, 2902.4 / 10.8 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0000034F73BA5EC1 <JSObject>
    1: enqueue [C:\inetpub\wwwroot\api\node_modules\graceful-fs\graceful-fs.js:~251] [pc=0000032080ECB010](this=000001A57BB8BE21 <JSGlobal Object>,elem=000001BDDD3EF711 <JSArray[2]>)
    3: /* anonymous */ [C:\inetpub\wwwroot\api\node_modules\graceful-fs\graceful-fs.js:238] [bytecode=0000037525696349 offset=140](this=000001A57BB8BE21 <JSGlobal Object>,err=000001BDDD3EF5D9 <Error map = 000000DC...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

OS: Windows Server 2016
Nodejs: v8.9.1

Hi @rapkin – thanks for the bug report.

I've been trying to replicate this on my machine using your example script to copy a directory containing 100,000 files, each containing 1KB of random bytes. I've not seen any out-of-memory errors so far, so I've been having to guess at what's causing the memory issues.
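
In case you want to reproduce my test setup, a script along these lines will generate a comparable fixture directory (just a rough sketch):

const fs = require('fs')
const path = require('path')
const crypto = require('crypto')

// Create a flat directory of 100,000 files, each containing 1KB of random bytes.
const dir = path.join(__dirname, 'source')
if (!fs.existsSync(dir)) fs.mkdirSync(dir)

for (let i = 0; i < 100000; i++) {
    fs.writeFileSync(path.join(dir, 'file-' + i + '.bin'), crypto.randomBytes(1024))
}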

I've created a refactored version of this library which should allow much better garbage collection – could you give this a go and see if it solves your problems? You can install it via npm install timkendrick/recursive-copy#fix/gc

If that doesn't help, the refactored version also has a { debug: true } option which logs what's going on. If it's still not working then could you tell me what the debug output is?

I'd also be grateful if you could let me know the following:

  • How many files you're attempting to copy
  • What the total size of the source folder is
  • Whether the source folder structure is fairly flat or deeply nested
  • How much RAM is available to the Node process on your machine

I'd be surprised if the out-of-memory error is due to the results array, however I've also added a { results: false } option to the refactored version, so you can try that and see if it changes anything.
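
To try that out, pass it as an extra option in the same call, something like the following (I've changed the success log, since the per-file results aren't collected with that option):

copy(path.join(__dirname, 'source'), path.join(__dirname, 'destination'), { results: false })
    .then(() => console.info('Copy complete'))
    .catch((error) => console.error('Copy failed: ' + error))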

Thanks in advance!

Hi @timkendrick

  • Number of files: ~160,000
  • Total size of the folder: ~160 GB
  • The folder contains ~24,000 folders (~7 files in each folder)
  • RAM: 32 GB (20 GB free before I run the script)

The 'out of memory' error occurred when the process was using ~2.5 GB (15 GB were still free on this machine).
I'll try testing with your refactored version.

With the { results: false } option I still get "out of memory". I hope the additional details will help to find the problem.
Here are the last logs:

...
Copying C:\inetpub\wwwroot\api\resources\fffd3d9f-37fd-4153-b79a-08835e6eedb1\converted.webm…
Copying C:\inetpub\wwwroot\api\resources\fffd3d9f-37fd-4153-b79a-08835e6eedb1\thumb.jpeg…
Copying C:\inetpub\wwwroot\api\resources\fffd3d9f-37fd-4153-b79a-08835e6eedb1\palette.png…
Copied C:\inetpub\wwwroot\api\resources\0004aad5-b7d3-4cc1-a299-f0a973c2cf7a

<--- Last few GCs --->

[11272:00000216C7E5B010]   467160 ms: Mark-sweep 1418.4 (1784.4) -> 1418.3 (1727.9) MB, 3171.2 / 2.7 ms  last resort GC in old space requested
[11272:00000216C7E5B010]   470093 ms: Mark-sweep 1418.3 (1727.9) -> 1418.3 (1704.9) MB, 2932.9 / 5.5 ms  last resort GC in old space requested


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 000000A4623A5EC1 <JSObject>
    1: /* anonymous */ [C:\inetpub\wwwroot\api\node_modules\graceful-fs\graceful-fs.js:~236] [pc=000000CA727B7BB6](this=00000373CD58BE21 <JSGlobal Object>,err=000000DEA81C5669 <Error map = 000003891D469C99>,fd=0000013B30102311 <undefined>)
    2: arguments adaptor frame: 1->2
    3: /* anonymous */ [fs.js:~134] [pc=000000CA727B5374](this=000001B548B03B79 <FSReqWrap map = 000003891D469DF9>)
   ...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

Thanks for this information – can you also let me know what gets logged when you use the #fix/gc version and specify { debug: true }, as seen below?

const path = require('path')
const copy = require('recursive-copy')

copy(path.join(__dirname, 'source'), path.join(__dirname, 'destination'), { debug: true })
    .then((results) => console.info('Copied ' + results.length + ' files'))
    .catch((error) => console.error('Copy failed: ' + error))

That way hopefully I can figure out which stage of the copy is failing.

Thanks!

Sorry, just seen that you've provided this. I'll take a further look into it now.

Hi again @rapkin, and thanks for the detailed information, it really helped diagnose the problem.

I was able to replicate this on my machine, and it looks like the process is running out of memory due simply to the enormous number of parallel copy operations.

I've pushed a change to the #fix/gc branch which adds a concurrency option (default: 255) which limits the number of simultaneous copy operations to prevent this from happening.

This fixed the problem on my machine. It also worked without setting the { results: false } option, although it slows down significantly as the results array gets larger (due to frequent GC pauses). I'd recommend setting { results: false } in situations like yours.
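
For a folder like yours, the combination would look something like this (the concurrency value shown here is just the default, so feel free to tweak it):

copy(path.join(__dirname, 'source'), path.join(__dirname, 'destination'), {
    concurrency: 255,
    results: false
})
    .then(() => console.info('Copy complete'))
    .catch((error) => console.error('Copy failed: ' + error))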

Let me know if the current #fix/gc branch works for you and I'll push an update to npm.

Hi @timkendrick, good job - it works really nicely. Thanks for your help!

Great, that's published to npm as v2.0.8. Thanks for the help!