webpack-contrib/copy-webpack-plugin

The new asyncMemoize function should memoize the original promise

Closed this issue · 4 comments

Modification Proposal

The new asyncMemoize function introduced in v12.0.1 (#760) waits until the promise returned by the memoized function is resolved before "activating" the cache.

Relevant snippet:

result = await /** @type {function(): any} */ (fn)();
cache = true;

The problem is that if the function returned by asyncMemoize is called again before the first function resolves (or rejects), the memoized function is executed a second time.

In my opinion, asyncMemoize should do exactly what memoize does when memoizing an async function: The returned function returns the promise that the memoized function returns.

I guess asyncMemoize exists because of TypeScript? I don't use TypeScript, so I don't know what is being solved by having a dedicated function to memoize async functions instead of just using memoize.

Expected Behavior / Situation

A function returned by asyncMemoize should not execute the memoized function more than once. Subsequent executions of the returned function should return the value that the memoized function returned the first time it was executed.

Actual Behavior / Situation

A function returned by asyncMemoize can execute the memoized function a second time (or more) if the returned function is executed a second time when the promise returned by the memoized function is still pending.

Please paste the results of npx webpack-cli info here, and mention other relevant information

This is not a problem yet, but it could be, so better fix it before it becomes a problem.

Sorry, I think you're a little mistaken, look at code:

const getGlobby = asyncMemoize(async () => {
  // @ts-ignore
  const { globby } = await import("globby");

  return globby;
});

/**
 * @template T
 * @param fn {(function(): any) | undefined}
 * @returns {function(): Promise<T>}
 */
function asyncMemoize(fn) {
  let cache = false;
  /** @type {T} */
  let result;

  return async () => {
    if (cache) {
      return result;
    }

    result = await /** @type {function(): any} */ (fn)();
    cache = true;

    // Allow to clean up memory for fn
    // and all dependent resources
    // eslint-disable-next-line no-undefined, no-param-reassign
    fn = undefined;

    return result;
  };
}

When I call getGlobby():

  • We don't have cache, so we run memorized function (using await), i.e. where we are having const { globby } = await import("globby");
  • After primise was resolved we have cache = globby;, i.e. we cached globby function
  • But because we have return async () => { /* logic */ } we are returning Promise and we need await it

So we don't run import("globby") multiple time, only once.

A function returned by asyncMemoize should not execute the memoized function more than once.

memoize works the same, every call getSomething() calls returned function and check cache, no cache - run memorized function.

Subsequent executions of the returned function should return the value that the memoized function returned the first time it was executed.

We already do it, but because we need to make an async operation we can't just return globby, only Promise<globby>

Why not just memoize? Because in such case we will return memorized function and every await will run code with import(...) and there's no point in doing this, you can just write await import("globby")

Anyway feel free to feedback

Ok, let's run the asyncMemoize globby example step by step:

  1. getBlobby is assigned a memoized function:
    const getGlobby = asyncMemoize(async () => {
      // @ts-ignore
      const { globby } = await import("globby");
    
      return globby;
    });
    • On this step, the memoized function has cache === false and result === undefined, and fn hasn't been called yet.
  2. getBlobby is called and returns a Promise
    • getBlobby sees cache === false, so it calls fn.
    • The Promise returned by getBlobby is not fulfilled until the promise returned by fn is fulfilled.
    • cache is still false, as getBlobby is awating the Promise returned by fn
  3. If at this point getBlobby is called again, fn will be called again, because cache === false.
  4. When the Promise returned by the first call to fn resolves, cache is set to true, result is set to the resolved value and the Promise returned by the first call to getBlobby is resolved with result.
  5. When the Promise returned by second the call to fn resolves, cache is set to true (no changes), result is set to the new resolved value and the Promise returned by the second call to getBlobby is resolved with the new result.
  6. From this point onwards, calls to getBlobby don't call fn.The returned Promise is a new Promise each time, and the resolved value is one corresponding to the second call to fn.

Now, let's run the memoize (not async) globby example step by step:

  1. getBlobby is assigned a memoized function:
    const getGlobby = memoize(async () => {
      // @ts-ignore
      const { globby } = await import("globby");
    
      return globby;
    });
    • On this step, the memoized function has cache === false and result === undefined, and fn hasn't been called yet.
  2. getBlobby is called and returns a Promise
    • getBlobby sees cache === false, so it calls fn.
    • Because there is no await, getBlobby immediately sets cache = true and result is assigned the Promise returned by fn.
    • The Promise returned by getBlobby is not fulfilled until the promise returned by fn is fulfilled, because they are the same Promise.
  3. If at this point getBlobby is called again, fn will not be called again, because cache === true.
    • The Promise returned by this call to getBlobby is the same promise returned by the first call to getBlobby, which is the same Promise that fn returned.
  4. When the Promise returned by fn resolves, the Promises returned by the first and second calls to getBlobby are resolved with the same value, because these three Promises are the same.
  5. From this point onwards, calls to getBlobby still return the same (resolved) Promise that fn returned.

I think that this should be fixed mostly because IMO asyncMemoize is not working as intended.

Also, Node already caches modules, so maybe the memoizing part should be removed entirely and just the lazy loading part should be kept. E.g.

const getGlobby = async () => {
  // @ts-ignore
  const { globby } = await import("globby");

  return globby;
};

The Promise returned by getBlobby is not fulfilled until the promise returned by fn is fulfilled.
cache is still false, as getBlobby is awating the Promise returned by fn

We have result = await /** @type {function(): any} */ (fn)();, so result is globby and cache is true and on the second run we return cache

Also, Node already caches modules, so maybe the memoizing part should be removed entirely and just the lazy loading part should be kept. E.g.

Yes, Node.js already caches modules, but caching on Node.js side is not fast as we want, just try it.

I think that this should be fixed mostly because IMO asyncMemoize is not working as intended.

It works, you are saying about caching different things, that is why you are thinking it doesn't work 😄

Anyway I made benchmark:

/**
 * @template T
 * @param fn {(function(): any) | undefined}
 * @returns {function(): T}
 */
function memoize(fn) {
  let cache = false;
  /** @type {T} */
  let result;

  return () => {
    if (cache) {
      return result;
    }

    result = /** @type {function(): any} */ (fn)();
    cache = true;
    // Allow to clean up memory for fn
    // and all dependent resources
    // eslint-disable-next-line no-undefined, no-param-reassign
    fn = undefined;

    return result;
  };
}

/**
 * @template T
 * @param fn {(function(): any) | undefined}
 * @returns {function(): Promise<T>}
 */
function asyncMemoize(fn) {
  let cache = false;
  /** @type {T} */
  let result;

  return async () => {
    if (cache) {
      return result;
    }

    result = await /** @type {function(): any} */ (fn)();
    cache = true;

    // Allow to clean up memory for fn
    // and all dependent resources
    // eslint-disable-next-line no-undefined, no-param-reassign
    fn = undefined;

    return result;
  };
}

const getGlobby = memoize(async () => {
  // @ts-ignore
  const { globby } = await import("globby");

  return globby;
});

const getGlobbyAsync = asyncMemoize(async () => {
  // @ts-ignore
  const { globby } = await import("globby");

  return globby;
});

const { Benchmark } = require("benchmark");

const suite = new Benchmark.Suite();

function p(fn) {
  return {
    defer: true,
    async fn(deferred) {
      await fn();
      deferred.resolve();
    },
  };
}

// Warn up
(async function warnup() {
  await import("globby");
})();

// add tests
suite
  .add(
    'await import("globby")',
    p(async () => {
      await import("globby");
    }),
  )
  .add(
    "await getGlobby()",
    p(async () => {
      await getGlobby();
    }),
  )
  .add(
    "await getGlobbyAsync()",
    p(async () => {
      await getGlobbyAsync();
    }),
  )
  // add listeners
  .on("cycle", (event) => {
    console.log(String(event.target));
  })
  .on("complete", function () {
    console.log(`Fastest is ${this.filter("fastest").map("name")}`);
  })
  // run async
  .run({ async: true });

And got:

await import("globby") x 162,695 ops/sec ±2.66% (84 runs sampled)
await getGlobby() x 5,478,279 ops/sec ±0.79% (86 runs sampled)
await getGlobbyAsync() x 4,613,087 ops/sec ±0.72% (88 runs sampled)
Fastest is await getGlobby()

So caching the original promises is faster, I will fix it

So caching the original promises is faster, I will fix it

Thanks!

So the bug won't happen now, but just to be clear, the problem with asyncMemoize happens when the memoized function is called a second time before the Promise returned by the first call resolves.

E.g.

const promise1 = getGlobbyAsync() // No await
// cache is still false at this point
const promise2 = getGlobbyAsync() // This calls import a second time

const [globby1, globby2] = await Promise.all(promise1, promise2)
// From here onwards cache is true, so getGlobbyAsync won't call import again