SimpleBrowserDotNet/SimpleBrowser

Memory Leak - Again

Opened this issue · 15 comments

As indicated in Issue #120, memory leaks have been reported by iEmiya. Leaks can be reproduced with this test:

A simple test for memory leaks.

        [Test]
        public void MemoryLeak()
        {
            // look : exception
            long? useTotalMemory = null;
            var task = System.Threading.Tasks.Task.Factory.StartNew(() =>
            {
                long threadId = 1;
                while (true)
                {
                    long totalMemory = GC.GetTotalMemory(false);
                    Trace.WriteLine(string.Format("Use total memory: {0:#,##0} byte(s)", totalMemory));
                    if (threadId % 20 == 0)
                    {
                        if (!useTotalMemory.HasValue) useTotalMemory = totalMemory;
                        else
                        {
                            if (totalMemory > useTotalMemory) throw new ApplicationException("Memory leak in 'SimpleBrowser'");
                            break; // Ok
                        }
                    }


                    try
                    {
                        threadId++;
                        Browser b1 = new Browser(Helper.GetFramesMock());
                        b1.Navigate("http://localhost/");
                        if (threadId % 2 == 0) throw new ApplicationException();
                        b1.Close();

                        GC.WaitForPendingFinalizers();
                        GC.Collect();
                    }
                    catch (Exception e)
                    {
                        Trace.Fail("Failed to navigate");
                    }
                }
            });

            task.Wait();
        }
Teun commented

Could you explain a bit more? I have run this code. I see it throw an exception every other loop. I see the number of allocated byte also go up and down in this rhythm. What do we expect to see? What is the hypothesis here? Do we think that SimpleBrowser frees up something that it forgets to do when an exception happens?

What I see in this code is that the forced GC happens in certain cases (threadId % 2 == 1) and not in other cases (threadId % 2 == 0). Then it figures that the allocated memory is going up and down. Or do you guys see something different happening?

I haven't tried the code provided by @iEmiya myself. My SimpleBrowser application has been running continuously (24/7) for weeks without an apparent memory issue. I try not to fix problems that aren't problems. ;)

@kevingy my experience is completely opposite, 32GB less than 12 hours.
The correct code is working correctly.
However, no proper application should not lead to tragic consequences.

I think that the "error" in keeping unnecessary windows open to the global static variable.
I do not understand the reason for such an approach. I do not understand why do you select a solution (What sense to store, something that is not required).
In the absence of a global static variable for errors in the use of the class, debris will be removed GC, currently debris remains on forever =)

@Teun I apologize that I could not answer before.
The purpose of the test is that we open the pages and starts its processing. There are many reasons for an exception. If a developer forgets to call Close () method to close the page, it will lead to "memory leaks" as an open page will remain in the global static variable.
The reasons for this erroneous behavior may be a few, but the main - Exception Handling. The second is less trivial - the desire not to create each time an instance of the transition between pages indefinitely without calling Close (), as a result we store a huge amount do not need pages.

Teun commented

Ah, now I understand what you mean with the "memory leak". You mean that we
keep a reference to windows even if the code using SimpleBrowser is not. We
follow the pattern that real browsers normally do: when you instantiate new
windows, they stay around until you close them. You can still get a
reference to those windows and it would be a bug if windows are garbage
collected because you didn't keep a reference to them.

But I see your point. What would you suggest? We could track the references
not in a static variable, but somewhere else. Something that would go out
of scope once you unreference the last window in the scope.

2015-02-26 10:20 GMT+01:00 Emiya notifications@github.com:

@Teun https://github.com/Teun I apologize that I could not answer
before.
The purpose of the test is that we open the pages and starts its
processing. There are many reasons for an exception. If a developer forgets
to call Close () method to close the page, it will lead to "memory leaks"
as an open page will remain in the global static variable.
The reasons for this erroneous behavior may be a few, but the main -
Exception Handling. The second is less trivial - the desire not to create
each time an instance of the transition between pages indefinitely without
calling Close (), as a result we store a huge amount do not need pages.


Reply to this email directly or view it on GitHub
#121 (comment)
.

@iEmiya Having thought more about the problem, I am starting to agree that a static variable probably isn't the best approach, but removing the static and breaking all SimpleBrowser users who were appropriately and correctly calling Close() and ClearWindows() is not a good change. The change has to be backwards compatible enough to allow enough time for those of us who are cleaning up after ourselves to update our scripts without downtime. Perhaps leave the Close() and ClearWindows() methods in place without a body and marked with an Obsolete method attribute for a few months?

According to comments on Issue #108, @axefrog has a fix for all of this already. Perhaps we can nudge him to submit his changes. I suspect his change will close issues #121 and #108.

@Teun ^_^ "real browsers" sometimes eat all the memory too. I do not think it's worth taking.

@kevingy The error has been seen in #86, then the most obvious solution was to change the static variable to a local # 118 and small changes in the code.
Perhaps to save compatibility need to think about more in details interface, but in the implementation of #118 I have not noticed or anything to break the old functionality, if you use a local variable to SimpleBrowser.

Perhaps we should think over the use of access to a static variable, but unfortunately in my projects I did not use access to this property. If you have the opportunity showcased the sample code where you want to use such a field.

I stand by my position that @axefrog has already changed the code in question, but has not yet committed the change. In my opinion, the changes from #86 and #118 (since they are really the same changes) are short-sighted hacks, unnecessarily fixing a problem that's not a problem. Based upon comments on #108, the solution proposed by and supposedly coded by @axefrog is superior to the changes in #86 and #118. (Specifically, "When creating a new BrowserWindow class, you can specify an existing context object or let it create its own. The context object provides all of the functionality that was _previously_ in static properties for window management.")

I have said my piece. Do as you will (so long as you don't break the build for every SimpleBrowsr user who is properly cleaning up memory.)

@kevingy Okay. Let me give a simple example against the existence of the current statics:

Presented house which falls due only to the fact that it does not close the door.
Do not you find that would be weird. There is no malice. A man comes in, gets a pass, up the elevator, passes through several rooms and that logical, go out of the building.
But our building falls, when during the day accumulate a lot of open doors (note that some of us do not even touched door - they open themselves). I believe that this is a bad building.

I believe that the use of knowledge about the open door (browser windows) does not flow from which it is made a wrong decision in the very architecture.
I believe that now has access to this flow field is not safe to use it in the decision is a source of additional errors.

Use barriers, synchronization, Dispose or destructors - only complicate the decision as a whole and do not lead to anything but strange ideas between streaming access to all your open windows.
I believe that the simplest solution - remove it from the solution and use the current context.
You decide whether the "building" only to fall due to open "doors" or not. I believe that there is should be at the door closers.

This may be related to or duplicated by Issue #206

I tried to address this in a fork: https://github.com/mikaelliljedahl/SimpleBrowser
by making the Browser object implement IDisposable pattern.
I guess the problem was that the reference from _allWindows is never removed.
Static list of object references needs to be cleared which is done in the Close() method but you never reach it in the sample code.

I simply added: : IDisposable and this

public void Dispose()
{
foreach(var frame in Frames)
{
frame.Dispose();
}
this.Close();

    }

_allWindows isn't the main culprit of the leak. It contributes, but there is a more insidious leak.

I'd have to look for where it is exactly, but there is a collection of HTML elements that grows infinitely on every navigation just in case the user want's to click the back button from wherever they are back to the first page. If you navigate 100 times, then click back 100 times, the collection doesn't shrink. Navigating to a different page from the first page only grows the collection larger.

I highly recommend, when ever possible, open a browser, navigate as little a possible, then destroy the browser and create a new one next time. Don't ever create one browser and use it for a long time.

That is not the definition of a memory leak. If the object is not still in use, the GC should be able to collect. Your example of a never ending use of the same browser is not applicable for this issue. That is just outside the scope of the library to stop the programmer from doing :)

I am pretty sure I fixed the real problem with the disposable pattern.

That is not the definition of a memory leak.

I have never referred to this problem as a "memory leak." Others have, but I know the real reason for what appears to be a "memory leak" - and it's not a memory leak. I have investigated several threads on this same issue. They all point back to an infinitely growing collection. Which, as you say, is not a memory leak.

If the object is not still in use, the GC should be able to collect.

YES, EXACTLY!!! The items in the _allWindows or _logs collections will remain after being disposed, as they are still referenced by the _allWindows or _logs collections that contains them. The GC won't think they are eligible to be cleaned up. The objects are still in use. Even if the window is Close()'d, the memory is still referenced from _logs.

I don't know who implemented this, but it's a huge mess that is on my to do list to clean up - and stop infinitely consume memory. That mess can't be cleaned up by implementing IDisposable.

Your example of a never ending use of the same browser is not applicable for this issue. That is just outside the scope of the library to stop the programmer from doing :)

That is fair. My contention from the start was that SimpleBrowser should work like any other browser. In this case SimpleBrowser should work infinitely without consuming all of the system's memory. (That said, Google Chrome is currently using 4GB of RAM as I type this.) My recommendation was intended as a workaround until the real issue could be fixed - and implementing IDisposable is not the fix.

I am pretty sure I fixed the real problem with the disposable pattern.

You didn't, but you think you did - and you've been telling everyone who will listen that you have fixed it, but you haven't.

Did you test your hypothesis? Did you create a test where large memory usage was cleared by using your IDisposable implementation? Can you show me your data? I can't imagine how this would ever work.

Like you said, this isn't a memory leak, so why would the disposable pattern clean up anything? What you have is a fundamental misunderstanding of the disposable pattern and IDisposable implementations, in general.

I actually used it in production. Sorry I can't share the code, but the memory issues we had were solved by the "fix"

New Browser objects were created, did their scraping job, then was disposed by a timer triggered function in Azure.