A simple C# program that collects email addresses from websites. It implements a simple node search algorithm over linked pages, somewhat resembling A*.
It opens pages, downloads their HTML, and parses it to extract simple email addresses.
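The extraction step can be pictured with a regular expression. The sketch below is an illustration only, not the project's parser: the `EmailExtractor` class and its pattern are assumptions, and the pattern deliberately covers only common `name@domain.tld` addresses, in the spirit of the "simple" addresses mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the project.
static class EmailExtractor
{
    // Deliberately simple pattern: local part, '@', domain with at least one dot.
    // It does not cover every address that RFC 5322 allows.
    private static readonly Regex EmailPattern = new Regex(
        @"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}",
        RegexOptions.Compiled);

    // Returns the distinct addresses found in the HTML, in order of first appearance.
    public static List<string> Extract(string html)
    {
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var result = new List<string>();
        foreach (Match m in EmailPattern.Matches(html))
        {
            if (seen.Add(m.Value))
            {
                result.Add(m.Value);
            }
        }
        return result;
    }
}
```

Addresses that differ only in letter case are treated as duplicates, which is a simplification. A trailing sentence period is not captured, because the pattern must end with at least two letters after the last dot.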
- Language: C#
- Platform: .NET Core 3.1.0
- IDE used: Visual Studio Community 2019
- Using Visual Studio: double-click `MailCollector.sln` and press the green play button; it runs the tests according to the arguments in `Properties/launchSettings.json`.
- Using the shell: `dotnet run --tests` launches the tests, or `dotnet run [PATH] [DEPTH]`, where PATH is the file path or URL of the page you want to open and DEPTH is how many levels of links you want to explore.
- Working examples:
  - `dotnet run http://www.csszengarden.com/ 1`
  - `dotnet run ./TestCases/ProvidedTest/index.html 3`
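The two shell forms above could be handled with a small argument check like the following. This is a hypothetical sketch, not the project's actual entry point: `RunOptions` and `ArgParser` are invented names, and rejecting negative depths is an assumption.

```csharp
// Hypothetical container for the command-line options described above.
sealed class RunOptions
{
    public bool RunTests { get; set; }
    public string Path { get; set; }
    public int Depth { get; set; }
}

// Hypothetical parser, not part of the project.
static class ArgParser
{
    // Accepts either "--tests" or "[PATH] [DEPTH]"; returns null on any other input.
    public static RunOptions Parse(string[] args)
    {
        if (args.Length == 1 && args[0] == "--tests")
        {
            return new RunOptions { RunTests = true };
        }

        // PATH may be a local file or a URL; DEPTH must be a non-negative integer.
        if (args.Length == 2 && int.TryParse(args[1], out int depth) && depth >= 0)
        {
            return new RunOptions { Path = args[0], Depth = depth };
        }

        return null;
    }
}
```

Returning `null` keeps the sketch short; a real entry point would print a usage message in that case.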
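Using the collector from your own code:

```csharp
using System;
using System.Collections.Generic;
using AleungcMailCollector;

class Program
{
    static void Main(string[] args)
    {
        MailCollector collector = new MailCollector();
        WebBrowser browser = new WebBrowser();
        // Collect emails from the given page (a local file path or a URL, as in the CLI examples).
        List<string> emailList = collector.GetEmailsInPageAndChildPages(browser, "PATH_TO_PAGE", 0);

        foreach (string email in emailList)
            Console.WriteLine(email);
    }
}
```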
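What DEPTH means can be illustrated with a depth-limited breadth-first search. This is a swapped-in technique, not necessarily what `GetEmailsInPageAndChildPages` does (the project describes its own search as resembling A*): `DepthLimitedCrawler`, the `fetch` delegate, and both regular expressions are assumptions. Passing pages in through `fetch` keeps the sketch runnable without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical crawler, not part of the project.
static class DepthLimitedCrawler
{
    private static readonly Regex HrefPattern =
        new Regex("href\\s*=\\s*\"([^\"]+)\"", RegexOptions.IgnoreCase);
    private static readonly Regex EmailPattern =
        new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}");

    // Breadth-first search from startUrl. Depth 0 scans only the start page,
    // depth 1 also scans the pages it links to, and so on.
    // fetch returns a page's HTML, or null if the page cannot be loaded.
    // Links are used verbatim: relative URLs are not resolved, to keep it short.
    public static List<string> Collect(string startUrl, int maxDepth, Func<string, string> fetch)
    {
        var emails = new List<string>();
        var seenEmails = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var visited = new HashSet<string> { startUrl };
        var queue = new Queue<(string Url, int Depth)>();
        queue.Enqueue((startUrl, 0));

        while (queue.Count > 0)
        {
            var (url, depth) = queue.Dequeue();
            string html = fetch(url);
            if (html == null)
            {
                continue;
            }

            foreach (Match m in EmailPattern.Matches(html))
            {
                if (seenEmails.Add(m.Value))
                {
                    emails.Add(m.Value);
                }
            }

            if (depth >= maxDepth)
            {
                continue; // do not follow links past the depth limit
            }

            foreach (Match m in HrefPattern.Matches(html))
            {
                string link = m.Groups[1].Value;
                if (link.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase))
                {
                    continue; // the address itself is already matched by EmailPattern
                }
                if (visited.Add(link))
                {
                    queue.Enqueue((link, depth + 1));
                }
            }
        }
        return emails;
    }
}
```

With three chained pages, depth 0 returns only the start page's addresses, and each extra level adds the pages linked from the previous level. The `visited` set keeps a page that links back to an earlier one from being fetched twice.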
Known limitations:
- It is a bit slow: an unoptimized search over every reachable page slows down quickly, because the number of pages to fetch grows with the depth.
- It does not work with https or with addresses that deviate too much from the examples; handling more cases would require more work.
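One way to accept more link shapes, including `https` pages, is to resolve every `href` against the URL of the page it came from with `System.Uri`. The `LinkResolver` helper below is a hypothetical sketch, not part of the project. It parses the link as a relative reference first because .NET on Unix-like systems may treat a rooted path such as `/about.html` as an absolute `file://` URI when absolute URIs are allowed.

```csharp
using System;

// Hypothetical helper, not part of the project.
static class LinkResolver
{
    // Resolves href against the page it was found on and returns the result
    // only if it is an http or https URL; otherwise returns null
    // (mailto:, javascript:, malformed links, ...).
    public static Uri Resolve(Uri pageUri, string href)
    {
        Uri resolved;

        // Try the relative form first: on Unix-like systems .NET may read a
        // rooted path such as "/about.html" as an absolute file:// URI.
        if (Uri.TryCreate(href, UriKind.Relative, out Uri relative))
        {
            if (!Uri.TryCreate(pageUri, relative, out resolved))
            {
                return null;
            }
        }
        else if (!Uri.TryCreate(href, UriKind.Absolute, out resolved))
        {
            return null;
        }

        bool isWeb = resolved.Scheme == Uri.UriSchemeHttp
                  || resolved.Scheme == Uri.UriSchemeHttps;
        return isWeb ? resolved : null;
    }
}
```

Filtering on the scheme after resolution means `http` and `https` links are followed the same way, while `mailto:` links are left to the email extraction step.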