/external-sorting

An external sorting algorithm implementation for NodeJS 💾

Primary LanguageTypeScriptMIT LicenseMIT

Stargazers Downloads Issues Vulnerabilities MIT License Contributor Covenant


External Sorting

For my work I needed an external sorting algorithm to sort big arrays (for example: sort 32 array of ~300MB with 2GB of RAM at the same time), but I haven't found any resource which talks about this kind of solution for NodeJS, so I've created it, I've decided to share this part of my project with the community and I hope that the community will help me to improve my solution.

Quick examples

asc sort of strings separate with \n

import fs from 'fs';
import esort from 'external-sorting';

esort({
  input: fs.createReadStream('input_file'),
  output: fs.createWriteStream('output_file'),
  tempDir: __dirname,
  maxHeap: 100
})
  .asc()
  .then(() => {
    console.log('done');
  })
  .catch(console.error);

desc sort of numbers separate with \n

import fs from 'fs';
import esort from 'external-sorting';

await esort({
  input: fs.createReadStream('input_file'),
  output: fs.createWriteStream('output_file'),
  tempDir: __dirname,
  deserializer: parseFloat,
  serializer: (v: number) => v.toString(10),
  maxHeap: 100
})
  .desc()
  .then(() => {
    console.log('done');
  })
  .catch(console.error);

asc sort of objects by property a.b.c separate with \r\n

import fs from 'fs';
import esort from 'external-sorting';

await esort({
  input: fs.createReadStream('input_file'),
  output: fs.createWriteStream('output_file'),
  tempDir: __dirname,
  deserializer: JSON.parse,
  serializer: JSON.stringify,
  delimiter: '\r\n',
  maxHeap: 100
})
  .asc((obj) => obj.a.b.c)
  .then(() => {
    console.log('done');
  })
  .catch(console.error);

asc sort of objects by properties a, b.c and d separate with \n

import fs from 'fs';
import esort from 'external-sorting';

await esort({
  input: fs.createReadStream('input_file'),
  output: fs.createWriteStream('output_file'),
  tempDir: __dirname,
  deserializer: JSON.parse,
  serializer: JSON.stringify,
  maxHeap: 100
})
  .asc([
    (obj) => obj.a,
    (obj) => obj.b.c,
    (obj) => obj.d
  ])
  .then(() => {
    console.log('done');
  })
  .catch(console.error);

Benchmark

Mean [s] Min [s] Max [s] Relative
sort 500,000 string 2.637 ± 0.048 2.570 2.714 1.00
sort 500,000 number 3.691 ± 0.259 3.442 4.234 1.40 ± 0.10
sort 500,000 object 5.039 ± 0.262 4.741 5.407 1.91 ± 0.11
sort 1,000,000 string 5.887 ± 0.637 5.105 6.820 2.23 ± 0.24
sort 1,000,000 number 6.978 ± 0.499 6.531 7.966 2.65 ± 0.20
sort 1,000,000 object 9.665 ± 0.111 9.522 9.791 3.66 ± 0.08
Model Name: MacBook Pro
Model ID: MacBookPro14.3
Processor Name: Quad-Core Intel Core i7
Processor speed: 2.8 GHz
Number of processors: 1
Total number of cores: 4
Level 2 cache (per core): 256 KB
Level 3 cache: 6 MB
Hyper-Threading Technology: Enabled
Memory: 16 GB
SSD: Apple SM0512L

TODO

  • support .by of fast-sort

Credits

Thanks to @snovakovic for the fast-sort package, you can find it on NPM or GitHub

License

This project is licensed under the MIT License - see the LICENSE file for details