SimpleBrowserDotNet/SimpleBrowser

Thread Safety, Memory Leaks etc

Closed this issue · 7 comments

I discovered some problems using SimpleBrowser in multithreaded web-crawler
There are List<Browser> _allWindows member so it's impossible to run multiple isolated threads. It also cause memory leaks if you don't call Close

I propose to store opened windows @ top-level browser instead of static field. This way you can run multiple isolated browser without affecting each other...

I'm too lame to setup Git onto this PC so I just post updated file to Gist:
https://gist.github.com/dimzon/466f94f469c17f9bbc81

While your changes in Browser.cs compile, these changes break the unit tests (specifically SimpleBrowser.UnitTests\OfflineTests\WindowsAndFrames.cs). Any change that doesn't completely compile cannot be integrated.

I don't mind importing the change for you, if you don't have Git installed, but I'm not going to update unit tests for your change.

Teun commented

Related question: what exactly is the top window? If I open several firefox
windows they do affect each other (when they use the same names in target
attributes). We should try to follow that pattern IMO.

Teun
Op 27 jun. 2014 21:42 schreef "Kevin Yochum" notifications@github.com:

While your changes in Browser.cs compile, these changes break the unit
tests (specifically
SimpleBrowser.UnitTests\OfflineTests\WindowsAndFrames.cs). Any change that
doesn't completely compile cannot be integrated.

I don't mind importing the change for you, if you don't have Git
installed, but I'm not going to update unit tests for your change.


Reply to this email directly or view it on GitHub
#86 (comment)
.

My general impression of this change is that it was hacked together to meet the specific need(s) of the OP rather than considering the best approach for SimpleBrowser. My intention was to re-create his change in my fork and create a pull request for group review before bringing it into the official repository.

Related question: what exactly is the top window?

My general impression of this change is that it was hacked together to meet the specific need(s) of the OP rather than considering the best approach for SimpleBrowser.

The SimpleBrowser is a library for unit testing and site automation (web-crawlers) isn't it? Unfortunally it's impossible to use it in multi-threaded environment since design flaws. You need a way to isolate different "transactions". If you don't like "top window" concept just take shared state out to separate "BrowserContext" object so Browsers created by different context behave like they are running on different PCs.

btw I'm using SImpleBrowser in multi-threaded environment like a web-crawler (tour search aggregation engine) to deal with ASP.NET forms and their annoying postbacks. And it's working on production servers (up to 500 users at same time).

PS. I really like yet another part of SImpleBrowser - your HTML Parser, seems like it's much stable than HTMLAgilityPack. Can You refactor it to separate independent class library?

PPS. Yet another feature requests

  1. OnBeforeFrameLoadEvent - to be able to skip IFRAME loading/parsing via EventArgs.Cancel=true
  2. OnHttpErrorEvent - to be able to retry http-request attempt via EventArgs.Retry=true
Teun commented

I agree with Dmitry that it would be good to allow for isolated sets of
browsers that do not interact with each other.

As for multi-threading, we never designed for concurrency, so it would not
surprise me if there are issues of thread-safety. However, the fact that
the windows collection is shared by the whole process is not a concurrency
issue, but just the design. But as said, we might want to reconsider.

Possible solution: the browser has a reference to a "windowspace". When we
create new brwosers from the current browser (as with frames and navigating
a target link), we pass in the window space. But if you create a new
browser unsing new SimpleBrowser(), it would get a fresh windowspace.

It may break existing unit tests, but I wouldn't expect it to break any
existing genuine clients.

Teun
Op 28 jun. 2014 02:14 schreef "Dmitry" notifications@github.com:

Related question: what exactly is the top window?

My general impression of this change is that it was hacked together to
meet the specific need(s) of the OP rather than considering the best
approach for SimpleBrowser.

The SimpleBrowser is a library for unit testing and site automation
(web-crawlers) isn't it? Unfortunally it's impossible to use it in
multi-threaded environment since design flaws. You need a way to isolate
different "transactions". If you don't like "top window" concept just take
shared state out to separate "BrowserContext" object so Browsers created by
different context behave like they are running on different PCs.

btw I'm using SImpleBrowser in multi-threaded environment like in
web-crawler to deal with ASP.NET forms and their annoying postbacks.

PS. I really like yet another part of SImpleBrowser - you HTML Parser,
seems like it's much stable than HTMLAgilityPack. Can You refactor it to
separate independent class library?


Reply to this email directly or view it on GitHub
#86 (comment)
.

dimzon said:

The SimpleBrowser is a library for unit testing and site automation (web-crawlers) isn't it?

It is. I never said that what you were trying to fix wasn't needed or worthwhile. I said that your approach seemed to be what you needed, not necessarily what SimpleBrowser needed. I've looked at your code. On the surface, the changes appear to be short-sighted and lacking in design. Your solution works (I assume). I'm just not convinced that it is the best or most elegant solution to the problem. I simply expressed that concern. Teun went a step further, offering suggestions for improvement.

Teun said:

I agree with Dmitry that it would be good to allow for isolated sets of browsers that do not interact with each other.

I agree also.

Teun said:

It may break existing unit tests, but I wouldn't expect it to break any existing genuine clients.

The unit tests were main concern. Failing unit tests is as serious an issue as breaking the build. I was going to let the question of whether the actual code changes were appropriate be decided by code review. Just because I was unimpressed doesn't mean that the changes won't be accepted. This is a team-driven project, not a Kevin-driven project.

dimzon said:

PS. I really like yet another part of SImpleBrowser - your HTML Parser, seems like it's much stable than HTMLAgilityPack. Can You refactor it to separate independent class library?
PPS. Yet another feature requests

  1. OnBeforeFrameLoadEvent - to be able to skip IFRAME loading/parsing via EventArgs.Cancel=true
  2. OnHttpErrorEvent - to be able to retry http-request attempt via EventArgs.Retry=true

These are all great suggestions. I'll create separate issues for them so that they can be tracked independently rather than being caught up in the mix.

Kevin

Integrated by Teun:
Author: Teun Duynstee github@duynstee.com#mailto:github@duynstee.com
Date: 4 months ago (6/9/2014 4:59:41 PM)
Commit hash: a8f86f6