HF Vault
A crawler for the Hammerfest forum.
About
On the 27th of March 2020, Motion Twin published a heartbreaking message on their Twinoid platform. As Flash is nearing its end of support, they will gradually close and remove their older games from the web.
Moments after the official announcement, hundreds of players from multiple countries united to save these games and memories. Thus was born Eternal-Twin, a project led by Demurgos and MrPapy and dedicated to saving the communities and memories of this part of video game history.
HF Vault is a crawler for archiving the Hammerfest forum. It scrapes each and every page and stores the data in a PostgreSQL database, while computing the right year for each forum post.
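This README does not describe how the year is computed. As an illustration only, the sketch below assumes that a post shows just a day and a month, that posts are read in chronological order, and that the year of the first post is known. The names PartialDate and inferYears are hypothetical and are not taken from the HF Vault sources.

```fsharp
open System

/// A date as it is assumed to appear on a forum post: no year.
type PartialDate = { Month: int; Day: int }

/// Assign a year to each date of a chronologically ordered list.
/// The year starts at `firstYear` and is incremented whenever the
/// (month, day) pair goes backwards compared to the previous post.
let inferYears (firstYear: int) (dates: PartialDate list) : DateTime list =
    let step (year, previous) (date: PartialDate) =
        let key = (date.Month, date.Day)
        // A date earlier in the calendar than the previous one means
        // that a new year has started in between.
        let year' =
            match previous with
            | Some p when key < p -> year + 1
            | _ -> year
        DateTime(year', date.Month, date.Day), (year', Some key)
    dates
    |> List.mapFold step (firstYear, None)
    |> fst

// Example: three posts around a new year, starting in 2006.
let example =
    inferYears 2006
        [ { Month = 11; Day = 30 }
          { Month = 12; Day = 31 }
          { Month = 1; Day = 2 } ]
```

Here example ends with a date in 2007, because the last post falls earlier in the calendar than the one before it. The sketch cannot detect a gap of more than a year between two consecutive posts, and it would need extra care for a 29th of February.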
This project is for documentation purposes only. The forum has already been archived. Please don’t abuse the poor servers from 2006.
Getting started
Prerequisites
- .NET Core 3.1 runtime
- PostgreSQL ⩾ 11 database initialised with the latest migration
- There must be a config.json file in your current working directory, configured as:
  { "ConnectionStrings": { "hf-vault": <fill in your connection string> } }
Usage
Synopsis
dotnet <path-to-hf-vault.dll> --help
dotnet <path-to-hf-vault.dll> [-r REALM]
Or if running the SDK in the project’s root:
dotnet run -- --help
dotnet run -- [-r REALM]
Options
Mandatory arguments to long options are mandatory for short options too.
-r, --realm=REALM
Tell the scraper which host to use.
Value | Host |
---|---|
FR | "http://www.hammerfest.fr" |
EN | "http://www.hfest.net" |
ES | "http://www.hammerfest.es" |
-h, --help
Display the usage and exit.
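The realm table above could be modelled as a small union type. This is a minimal sketch rather than the actual implementation: Realm, hostOf and tryParseRealm are hypothetical names, and accepting lowercase values is an assumption of the sketch.

```fsharp
/// The realms accepted by -r / --realm.
type Realm =
    | FR
    | EN
    | ES

/// Host to crawl for a given realm (values from the table above).
let hostOf (realm: Realm) : string =
    match realm with
    | FR -> "http://www.hammerfest.fr"
    | EN -> "http://www.hfest.net"
    | ES -> "http://www.hammerfest.es"

/// Parse the REALM argument; returns None for an unknown value.
/// Case-insensitive matching is an assumption of this sketch.
let tryParseRealm (value: string) : Realm option =
    match value.ToUpperInvariant() with
    | "FR" -> Some FR
    | "EN" -> Some EN
    | "ES" -> Some ES
    | _ -> None
```

With a closed union type, the compiler warns if a realm is added without a matching host.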
Compiling
Prerequisites
- .NET Core 3.1 SDK
- PostgreSQL ⩾ 11 database initialised with the latest migration
- Add the connection string to ./src/development.settings.json
- Copy ./src/development.settings.json to the project root (next to the README) as config.json.
Your project directory should look like this:
.
├── AUTHORS.md
├── ChangeLog.md
├── config.json
├── COPYING
├── hf-vault.fsproj
├── migration
│   └── ...
├── NEWS.md
├── README.md
└── src
    ├── development.settings.json
    ├── Dto
    │   └── ...
    ├── Forum
    │   └── ...
    └── ...
Building
To build the project (by default to ./bin/Debug/netcoreapp3.0/hf-vault.dll), run:
dotnet build
Or, to build and run the project in one step, type:
dotnet run -- --help
If it shows the usage and some log lines, it’s all good!
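! Warning ! The build will fail if the database hasn’t been properly initialised, because the FSharp.Data.Npgsql type provider checks the queries against the database at compile time.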
Overview
- ./src: all the source files.
- ./src/HfVault.fs: entry point and crawler implementation.
- ./src/Forum: these modules each represent a hierarchical layer of the forum (root > theme > thread > post); every module provides a type and a load function.
- ./src/Dto: these modules represent the logic using types and help validate the data before inserting it into the database.
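To give an idea of the "one type and one load function per layer" convention described for ./src/Forum, here is a heavily simplified sketch of two layers. The fields and signatures are hypothetical. The real modules presumably fetch and parse HTML pages, which is left out here so that the example stays self-contained.

```fsharp
// Hypothetical shape of two layers; names and fields are illustrative only.
module Post =
    type T = { Author: string; Body: string }

    /// Build a post from an already extracted (author, body) pair.
    let load (author: string, body: string) : T =
        { Author = author.Trim(); Body = body.Trim() }

module Thread =
    type T = { Title: string; Posts: Post.T list }

    /// Build a thread by loading each of its posts with the layer below.
    let load (title: string) (rawPosts: (string * string) list) : T =
        { Title = title.Trim()
          Posts = rawPosts |> List.map Post.load }

// Each layer delegates to the load function of the layer below it.
let thread = Thread.load " Hello " [ (" igor ", " First! ") ]
```

Because every layer exposes the same pair of names, a crawler can walk the hierarchy by calling load on one layer and feeding its children to the next.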
Libraries used
- HtmlAgilityPack: parsing HTML
- Aether: optics
- FSharp.Data.Npgsql: PostgreSQL type provider
- Logary: logging
- Hopac: mainly used for logging, the rest is classic F# Async
Acknowledgements
- Motion Twin whose awesome games have created great moments in all of France’s middle schools
- Eternalfest for keeping this game alive
- Eternal-Twin for keeping the memories alive
License
Distributed under the GNU General Public License v3.