hardkoded/puppeteer-sharp

PuppeteerSharp "leaks" temporary files on exit

MarkPflug opened this issue · 10 comments

Description

PuppeteerSharp (or the browser process) leaves ~8mb of files in temp folder. This happens when the Browser object is disposed (async) immediately before process exit. We have a commandline utility that uses PuppeteerSharp to generate PDF documents. We found that calling it repeatedly to process a large number of PDFs would eventually fill the C: drive with temp file detritus. We discovered a workaround, which was to add a delay (await Task.Delay(1000)) after the browser disposal before exiting the process. With this delay, the temp files were deleted.

The symptom makes me think there is some un-awaited Task in the browser dispose code that is still pending after DisposeAsync completes. Reviewing the code, I don't see any obvious culprits.

Complete minimal example reproducing the issue

This code example is a C# top-level program. The only non-minimal part is adjusting the temp directory to make it easier to see the files that are created by isolating them from the base temp folder. I don't know how to make this a unit test, since it requires quick process termination to illustrate the issue.

using PuppeteerSharp;

// set temp folder env var to cause browser to write to a 
// specific location to make it easier to isolate the files
// issue repros without this, this just makes it easier to see
// and cleanup
var temp = Path.GetTempPath();
temp = Path.Combine(temp, "Puppeteer");
Directory.CreateDirectory(temp);
Environment.SetEnvironmentVariable("TMP", temp);

// repros with both msedge and chrome, I just happen to have edge installed
var opts = new LaunchOptions { ExecutablePath = "C:\\Program Files (x86)\\Microsoft\\Edge\\Application\\msedge.exe" };
await using (var b = await Puppeteer.LaunchAsync(opts))
{

} // <-- browser cleanup

// without the delay, the temp folder will have a subfolder with ~8mb
// with this delay, those files are deleted before exit
//await Task.Delay(1000); // <-- uncomment this line to demonstrate the workaround

Expected behavior:

The files in temp to be deleted without requiring a delay.

Actual behavior:

Each invocation of this program (without the delay) will leave ~8mb of files in the temp folder.

Versions

OS: both windows 10 and windows server 2022.
PuppeteerSharp: 18.0.3 (latest), but also repros on 10.1.0.
Framework: net framework 4.8 and net 8.0.

Additional Information

This isn't a high priority issue for us, since the workaround seems to resolve the issue. The behavior does feel like a bug though, so thought it worth reporting.

Further information: hitting a breakpoint right after the DisposeAsync returns, I see a thread with the following callstack:

System.Private.CoreLib.dll!Interop.Kernel32.DeleteFilePrivate(string path) Line 1573	C#
 	System.Private.CoreLib.dll!System.IO.FileSystem.RemoveDirectoryRecursive(string fullPath, ref Interop.Kernel32.WIN32_FIND_DATA findData, bool topLevel) Line 482	C#
 	System.Private.CoreLib.dll!System.IO.FileSystem.RemoveDirectoryRecursive(string fullPath, ref Interop.Kernel32.WIN32_FIND_DATA findData, bool topLevel) Line 502	C#
 	System.Private.CoreLib.dll!System.IO.FileSystem.RemoveDirectoryRecursive(string fullPath, ref Interop.Kernel32.WIN32_FIND_DATA findData, bool topLevel) Line 502	C#
 	System.Private.CoreLib.dll!System.IO.FileSystem.RemoveDirectoryRecursive(string fullPath, ref Interop.Kernel32.WIN32_FIND_DATA findData, bool topLevel) Line 502	C#
 	System.Private.CoreLib.dll!System.IO.FileSystem.RemoveDirectory(string fullPath, bool recursive) Line 438	C#
 	PuppeteerSharp.dll!PuppeteerSharp.Helpers.TempDirectory.DeleteAsync(string path)	Unknown
 	PuppeteerSharp.dll!PuppeteerSharp.Helpers.TempDirectory.DisposeCore()	Unknown
 	PuppeteerSharp.dll!PuppeteerSharp.Helpers.TempDirectory.Dispose()	Unknown
 	PuppeteerSharp.dll!PuppeteerSharp.States.ExitedState.EnterFrom(PuppeteerSharp.LauncherBase p, PuppeteerSharp.States.State fromState)	Unknown
 	PuppeteerSharp.dll!PuppeteerSharp.States.ProcessStartingState.StartCoreAsync.__OnProcessExited|2(object sender, System.EventArgs e)	Unknown
 	System.Diagnostics.Process.dll!System.Diagnostics.Process.RaiseOnExited()	Unknown
 	System.Diagnostics.Process.dll!System.Diagnostics.Process.CompletionCallback(object waitHandleContext, bool wasSignaled)	Unknown
 	System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state) Line 138	C#
 	System.Private.CoreLib.dll!System.Threading.RegisteredWaitHandle.PerformCallback(bool timedOut) Line 226	C#
 	System.Private.CoreLib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Line 690	C#
 	System.Private.CoreLib.dll!System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart() Line 1328	C#

This is almost certainly the cause of the issue, and I suspect this task needs to be awaited as part of the dispose call. Hopefully, I'll be able to figure this out and submit a PR.

I'm wondering whether we might need a retry logic.

I don't think it's a retry issue. I'm not quite understanding the relationship between browser and launcher. The launcher is invisible from my perspective, at least from the API I'm using. It looks like the LauncherBase.TempUserDataDir is disposed synchronously in two State transitions: Exited and Disposed. I haven't quite unraveled how those are invoked, but it looks like there is a callback that triggers them? I was hoping it might be obvious to someone already familiar with the codebase.

I think that might be fixed by making TempDirectory AsyncDisposable

@MarkPflug can you confirm this fixes the issue?

I'm still seeing the same behavior, the temp file remain, with that change.

@MarkPflug can you give it another look?

That looks like it fixed it! We won't be able to immediately benefit from this, but I look forward to removing our Task.Delay hack at some point in the future. Much appreciate your help with this!

If you want I can create a new version.

A release with this fix would be great, even though we might not be able to use it until our next release.