pinojs/pino

Question: CPU usage overhead vs console.log

Closed · 10 comments

Hi - I recently switched some semi-high-frequency logs (~50/second per pod) from console.log to Pino and have noticed that it's directly responsible for nearly doubling CPU utilization, which has caused extra pods to come online to stay within our usage thresholds in GKE.

I would have expected a small performance penalty, but not nearly 2x. The following two graphs come from pods running the same code, one from each of these two branches (where USE_PINO is assigned at random at boot time):

  if (USE_PINO) {
    LOG.debug(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`);
  } else {
    console.log(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`);
  }

where LOG is a global defined as

import pino from 'pino'
globalThis.LOG = pino({ base: {}, level: "debug", transport: undefined });

(I know the string interpolation is not the best way to do this, but it should be equally bad in either branch)
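
For reference, the object-style call I'd eventually switch to would look roughly like this (hypothetical, not what's deployed), with pino doing the serialization itself:

  // Hypothetical object-style call (not what we deployed); pino serializes the
  // fields itself rather than us building one big interpolated string.
  LOG.debug({ var1, var2, var3, var4 }, 'SomeString');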

[two images: CPU utilization graph for each branch]

Is this an expected amount of overhead? I've attempted some local profiling and load testing and am unable to discern the root cause, so wanted to ask if you had any ideas. Thank you!

Generating JSON has slightly more overhead than writing plain strings. Specifically, your long strings must be escaped to be valid JSON.
On top of that, the algorithm used for serializing objects differs from the one used by console.log, which caps serialization depth.


I see more CPU utilization in your graph, but I hardly see a reason for extra pods at that usage threshold. (If you keep Node.js CPU usage anywhere below 70-80%, you are wasting a lot of money.)
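
If you want to see the escaping/serialization part in isolation, here is a rough sketch (it only measures building each line, deliberately leaving out the stdout writes; numbers will vary):

'use strict'

// Rough sketch: compare building a plain log line vs. JSON-escaping the same
// line, which approximates the extra per-line work a JSON logger has to do.
const line = `SomeString[${'x'.repeat(200)}][${new Date().toISOString()}][two][three][four]`

let bytes = 0
let start = process.hrtime.bigint()
for (let i = 0; i < 100_000; i += 1) {
  bytes += (line + '\n').length
}
console.log(`plain string: ${Number(process.hrtime.bigint() - start) / 1e6} ms (${bytes} bytes)`)

bytes = 0
start = process.hrtime.bigint()
for (let i = 0; i < 100_000; i += 1) {
  bytes += `{"level":30,"time":${Date.now()},"msg":${JSON.stringify(line)}}\n`.length
}
console.log(`json-escaped: ${Number(process.hrtime.bigint() - start) / 1e6} ms (${bytes} bytes)`)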

Thanks for the quick reply! We've got the utilization target at 60% (I should bump that up), but this service only requests .5 CPU, which is why the pods are now hitting the horizontal autoscaler:

[image: CPU utilization chart for the service]

If the answer is that this is not unexpected, I'm OK with needing more resources (or tweaking what we currently have to utilize them better) - I was just a bit surprised at how much overhead there was and thought it was worth asking. Thanks again.

Can you please provide some numbers that show the discrepancy? The graphs are not very helpful.

Sure! What numbers specifically would be useful? I can't take an actual CPU profile on the production instances, but I can share the numbers that underlie the charts. For this service, the k8s configuration is set to:

limits:
  cpu: 1
requests:
  cpu: .5

minReplicas: 4
maxReplicas: 8

name: cpu
target:
  averageUtilization: 60

From the third chart I posted (which spans the moment we deployed Pino): before Pino we generally hovered between 4 and 6 replicas at ~60% of .5 CPU each, for a total of 1.2-1.4 CPUs used. After the Pino change, we immediately scaled up to our max of 8 replicas, and usage is now at 2.5 CPUs. That is more than 60% of our maximum of 4 CPUs (8 max replicas * .5 CPU), but it isn't scaling further because of the cap.
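
As a rough sanity check on that math (assuming the HPA targets average utilization of the CPU request, i.e. .5 per pod; the replica counts and totals are approximate figures from the dashboards):

const requestPerPod = 0.5       // resources.requests.cpu
const targetUtilization = 0.60  // HPA averageUtilization

const beforePino = { replicas: 5, cpusUsed: 1.3 } // ~4-6 replicas, 1.2-1.4 CPUs
const afterPino = { replicas: 8, cpusUsed: 2.5 }  // pinned at maxReplicas

for (const [label, { replicas, cpusUsed }] of Object.entries({ beforePino, afterPino })) {
  const utilization = cpusUsed / (replicas * requestPerPod)
  console.log(label, utilization.toFixed(2), utilization > targetUtilization ? 'wants to scale up' : 'steady')
}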

I totally get that structured logging is more expensive than writing a plain string to stdout; I was just a bit surprised that it nearly doubles our CPU usage rather than landing somewhere in the 10% overhead range.

This isn't causing a noticeable delay in processing, however. I've measured wall time with this:

let timeSpent = 0;
setInterval(() => {
  console.log(`PINO_PERFORMANCE_PROFILING USE_PINO=[${USE_PINO}] timeSpent=[${timeSpent}]`);
  timeSpent = 0;
}, 10_000);

...

const start = performance.now();
if (USE_PINO) {
  LOG.debug(...);
} else {
  console.log(...);
}
timeSpent += performance.now() - start;

The actual wall time spent in both paths is the same; it's just that one path uses a lot more CPU.

I'm still not understanding what "a lot more CPU" actually means. Something like:

'use strict'

const pino = require('pino')
const log = pino({ destination: '/dev/null' })
const { PerformanceObserver, performance } = require('node:perf_hooks');
const var1 = 'one'
const var2 = 'two'
const var3 = 'three'
const var4 = 'four'

const obs = new PerformanceObserver((items) => {});
obs.observe({ type: 'measure' });

const cpuMetrics = []
let cpu = process.cpuUsage()
cpuMetrics.push(cpu)

performance.mark('a', { detail: cpu })
for (let i = 0; i < 10_000; i += 1) {
  console.log(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`);
}
cpu = process.cpuUsage(cpu)
cpuMetrics.push(cpu)
performance.mark('b', { detail: cpu })

cpu = process.cpuUsage(cpu)
cpuMetrics.push(cpu)
performance.mark('c', { detail: cpu })
for (let i = 0; i < 10_000; i += 1) {
  log.info(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`)
}
cpu = process.cpuUsage(cpu)
cpuMetrics.push(cpu)
performance.mark('d', { detail: cpu })

performance.measure('console.log', 'a', 'b')
performance.measure('pino.info', 'c', 'd')

setTimeout(() => {
  console.log('\n\n\n\n\n')
  for (const entry of performance.getEntriesByType('mark')) {
    console.log(entry)
  }

  const time1 = cpuMetrics[1].user - cpuMetrics[0].user
  const time2 = cpuMetrics[3].user - cpuMetrics[2].user
  console.log(`console.log cpu time: ${time1}µs`)
  console.log(`pino.info cpu time: ${time2}µs`)
}, 2_000)

Which on my system produces:

PerformanceMark {
  name: 'a',
  entryType: 'mark',
  startTime: 124.7484188079834,
  duration: 0,
  detail: { user: 38019, system: 34800 }
}
PerformanceMark {
  name: 'b',
  entryType: 'mark',
  startTime: 179.44191932678223,
  duration: 0,
  detail: { user: 50136, system: 7629 }
}
PerformanceMark {
  name: 'c',
  entryType: 'mark',
  startTime: 179.4865436553955,
  duration: 0,
  detail: { user: 38085, system: 34815 }
}
PerformanceMark {
  name: 'd',
  entryType: 'mark',
  startTime: 194.47896003723145,
  duration: 0,
  detail: { user: 70935, system: 9116 }
}
console.log cpu time: 12117µs
pino.info cpu time: 32850µs

I'll wrap the logs with process.cpuUsage and report back.

A question from your example: since you're using process.cpuUsage(cpu) for the second call, wouldn't cpuMetrics[1] already hold the delta in usage from element 0 and so we shouldn't subtract again for time1? I haven't used this API before, but the docs say

The result of a previous call to process.cpuUsage() can be passed as the argument to the function, to get a diff reading.
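
For my own understanding, here's a minimal standalone check of that behavior (separate from the production code):

const first = process.cpuUsage()               // absolute usage since process start

for (let i = 0; i < 1e7; i += 1) Math.sqrt(i)  // burn a little CPU

const delta = process.cpuUsage(first)          // already a diff relative to `first`
console.log(delta)                             // { user, system } in µs for just this block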

OK, I've got some updated numbers following the above suggestion, gathered with:

const USE_PINO = Math.random() > 0.5;
let timeSpentUser = 0;
let timeSpentSystem = 0;
let numCalls = 0;
setInterval(() => {
  console.log(
    `PINO_PERFORMANCE_PROFILING USE_PINO=[${USE_PINO}] timeSpentSystem=[${timeSpentSystem}] timeSpentUser=[${timeSpentUser}] numCalls=[${numCalls}]`
  );
  timeSpentUser = 0;
  timeSpentSystem = 0;
  numCalls = 0;
}, 30_000);

...

function doThing() {
  
  ...

  const start = process.cpuUsage();
  if (USE_PINO) {
    LOG.debug(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`);
  } else {
    console.log(`SomeString[${var1}][${new Date().toISOString()}][${var2}][${var3}][${var4}]`);
  }
  const end = process.cpuUsage(start); // tracks delta since we provided start
  timeSpentUser += end.user;
  timeSpentSystem += end.system;
  numCalls++;

  ...

}

Here's a sample of the results (sorted for easier scanning). Sometimes user CPU is higher with Pino, sometimes it's lower. The system time, however, is always quite a bit higher with Pino.

USE_PINO=[false] timeSpentSystem=[3483]  timeSpentUser=[52337] numCalls=[762]
USE_PINO=[false] timeSpentSystem=[6560]  timeSpentUser=[77116] numCalls=[1144]
USE_PINO=[false] timeSpentSystem=[6651]  timeSpentUser=[60171] numCalls=[744]
USE_PINO=[false] timeSpentSystem=[6664]  timeSpentUser=[64707] numCalls=[815]
USE_PINO=[false] timeSpentSystem=[6777]  timeSpentUser=[53418] numCalls=[725]
USE_PINO=[false] timeSpentSystem=[7417]  timeSpentUser=[55363] numCalls=[741]
USE_PINO=[false] timeSpentSystem=[7924]  timeSpentUser=[82102] numCalls=[1185]
USE_PINO=[false] timeSpentSystem=[8187]  timeSpentUser=[75061] numCalls=[1086]
USE_PINO=[false] timeSpentSystem=[8373]  timeSpentUser=[82283] numCalls=[1063]
USE_PINO=[false] timeSpentSystem=[9058]  timeSpentUser=[75110] numCalls=[1095]
USE_PINO=[false] timeSpentSystem=[10542] timeSpentUser=[90660] numCalls=[1153]
USE_PINO=[true]  timeSpentSystem=[33149] timeSpentUser=[32700] numCalls=[786]
USE_PINO=[true]  timeSpentSystem=[42430] timeSpentUser=[40303] numCalls=[897]
USE_PINO=[true]  timeSpentSystem=[43592] timeSpentUser=[38381] numCalls=[863]
USE_PINO=[true]  timeSpentSystem=[50531] timeSpentUser=[45990] numCalls=[1125]
USE_PINO=[true]  timeSpentSystem=[53014] timeSpentUser=[57296] numCalls=[1032]
USE_PINO=[true]  timeSpentSystem=[54243] timeSpentUser=[59892] numCalls=[1091]
USE_PINO=[true]  timeSpentSystem=[55062] timeSpentUser=[52429] numCalls=[1201]
USE_PINO=[true]  timeSpentSystem=[64511] timeSpentUser=[54446] numCalls=[853]
USE_PINO=[true]  timeSpentSystem=[71357] timeSpentUser=[82430] numCalls=[1029]
USE_PINO=[true]  timeSpentSystem=[76478] timeSpentUser=[66858] numCalls=[1140]
USE_PINO=[true]  timeSpentSystem=[95019] timeSpentUser=[62797] numCalls=[1122]

Here are the associated CPU charts for the two nodes:

[two images: CPU chart for each node]

Thanks for taking a look at this! I think a likely answer is that this is indeed just unavoidable overhead, but I really appreciate the sanity check.

> A question from your example: since you're using process.cpuUsage(cpu) for the second call, wouldn't cpuMetrics[1] already hold the delta in usage from element 0 and so we shouldn't subtract again for time1?

My script may not be accurate. I cobbled it together during a short downtime window.

The overhead is likely the result of the escaping cost of the massive string. If you don't care about structured output and just want a faster console.log, you could use sonic-boom directly.
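
Something along these lines (untested sketch; check the sonic-boom README for the exact options in the version you install):

'use strict'

// Untested sketch: write pre-formatted lines straight to stdout through
// sonic-boom's buffered stream instead of going through console.log.
const SonicBoom = require('sonic-boom')

// fd 1 is stdout; minLength batches writes until ~4KB has been buffered.
const out = new SonicBoom({ fd: 1, minLength: 4096, sync: false })

out.write(`SomeString[${'one'}][${new Date().toISOString()}][two][three][four]\n`)

// Flush whatever is still buffered before the process exits.
process.on('beforeExit', () => out.flushSync())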

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.