Spider is a tool to crawl freesites in Freenet and create a static index of all freesites

Spider

Spider crawls freesites in Hyphanet, extracts various information from them and creates an index of these freesites. I currently use it to create my uncensored index Spider 1 and my censored index Clean-Spider 1.

Requirements

Build

  1. Download the source code and extract it.
  2. Open a command prompt in the root-directory of the extracted source code.
  3. Run the following command: gradlew distZip. This will create the zip-archive build/distributions/Spider.zip.
  4. Extract the generated zip-archive.

Run

Run bin/spider help to view short usage information. Before the first start, the database of Spider must be initialized with bin/spider init. Alternatively, you can use my Spider database: a key to the most recent SQL dump can be found on the About / FAQ page in (Clean-)Spider.

Important files

Activelink

All activelink images can be found in the folder activelink.

  • activelink.xcf = Activelink for Spider
  • activelink-clean.xcf = Activelink for Clean-Spider
  • activelink-source.xcf = Activelink for the source code of Spider

The xcf files can be opened with GIMP.

Settings

All settings files can be found in the folder src/main/dist.

  • spider.properties = Settings file for Spider
  • spider-clean.properties = Settings file for Clean-Spider

Spider always uses the settings file spider.properties, so to run Clean-Spider you have to rename spider-clean.properties to spider.properties.
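The switch to the Clean-Spider settings could be scripted roughly as follows. This is only a sketch: it copies the file instead of renaming it so that both originals stay in place, the function name use_clean_settings and the backup name spider.properties.orig are made up for this example, and the directory that holds the settings files is passed as an argument.

```shell
#!/bin/sh
# Sketch: activate the Clean-Spider settings in a directory that holds both files.
# Assumptions: copying is used instead of renaming, and the backup name
# "spider.properties.orig" is hypothetical (not something Spider knows about).
use_clean_settings() {
    dir="${1:-.}"
    if [ ! -f "$dir/spider-clean.properties" ]; then
        echo "spider-clean.properties not found in $dir" >&2
        return 1
    fi
    # Keep the regular settings once, so switching back later stays possible.
    if [ -f "$dir/spider.properties" ] && [ ! -f "$dir/spider.properties.orig" ]; then
        cp "$dir/spider.properties" "$dir/spider.properties.orig"
    fi
    # Spider only reads spider.properties, so the clean settings take its place.
    cp "$dir/spider-clean.properties" "$dir/spider.properties"
}
```

For example, use_clean_settings src/main/dist would prepare the settings in the source tree. To switch back, copy spider.properties.orig over spider.properties again.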

Tasks

For each new edition of the freesite I run the following tasks in Spider:

  • reset-all-highlight
  • update-online 60
  • crawl
  • update-offline
  • crawl
  • reset-all-offline
  • crawl
  • add-freesite-from-fms
  • add-freesite-from-frost
  • crawl
  • update-online
  • crawl
  • export-database

You can either run these tasks individually or run them as a task list.
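Running the tasks individually could look like the sketch below. It assumes that each task is invoked as bin/spider followed by the task name and its optional argument, in the same way as bin/spider help and bin/spider init. The function name run_edition_tasks and the SPIDER variable are made up for this example.

```shell
#!/bin/sh
# Sketch: run the per-edition tasks one at a time and stop at the first failure.
# Assumption: a task is started as "bin/spider <task> [argument]".
SPIDER="${SPIDER:-bin/spider}"

run_edition_tasks() {
    while read -r task; do
        [ -n "$task" ] || continue
        # $task is left unquoted on purpose so "update-online 60" becomes two arguments.
        # stdin is redirected so the command cannot consume the remaining task lines.
        $SPIDER $task </dev/null || { echo "task failed: $task" >&2; return 1; }
    done <<'EOF'
reset-all-highlight
update-online 60
crawl
update-offline
crawl
reset-all-offline
crawl
add-freesite-from-fms
add-freesite-from-frost
crawl
update-online
crawl
export-database
EOF
}
```

Setting SPIDER=echo before calling run_edition_tasks prints the task names instead of running them, which is a cheap way to check the order.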

Task list

Run the above tasks as a task list using bin/spider run-task-list. Spider executes the tasks one after another in the given order. It also saves its state, so you can interrupt the process at any time and later continue where you left off. You can restart the task list with bin/spider reset-task-list (a finished task list resets itself automatically) and show the current progress with bin/spider show-task-list.
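A small wrapper around the task list commands might look like this. It is a sketch, not part of Spider: the function name run_or_resume and the SPIDER variable are made up, and it assumes that bin/spider exits with a non-zero status when the run is interrupted or fails.

```shell
#!/bin/sh
# Sketch: start or resume the task list and print the progress if it stops early.
# Assumption: a non-zero exit status means the run was interrupted or failed.
SPIDER="${SPIDER:-bin/spider}"

run_or_resume() {
    if $SPIDER run-task-list; then
        echo "task list finished"
    else
        # Show how far the task list got; calling run_or_resume again continues from there.
        $SPIDER show-task-list
        return 1
    fi
}
```

Because Spider saves its state, calling run_or_resume again after an interruption continues the task list instead of starting over.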

Set up a development environment

  1. Run git clone https://github.com/Spider-Admin/Spider.git.
  2. Change to the cloned repository.
  3. Run gradlew clean cleanEclipse eclipse.
  4. Copy all files from src/main/dist to the current directory.
  5. Import the project from the current directory into Eclipse.

Publish the freesite

I used jSite 1 to publish (Clean-)Spider. Just create a new project for (Clean-)Spider in jSite and publish it with java -cp path/to/jSite/jSite-0.14-jar-with-dependencies.jar de.todesbaum.jsite.main.CLI --project=projectname.

Additionally, I shared the private key of Clean-Spider with ArneBab, Bombe, xor and nextgens. I split the private key with ssss using ssss-split -t 2 -n 4 and sent one part to each of them. At least 2 of the 4 parts are needed to recover the private key (for example with ssss-combine -t 2).

Contact

Author: Spider-Admin

Freemail: spider-admin@tlc66lu4eyhku24wwym6lfczzixnkuofsd4wrlgopp6smrbojf3a.freemail 2

Frost: Spider-Admin@Z+d9Knmjd3hQeeZU6BOWPpAAxxs

FMS: Spider-Admin

Sone: Spider-Admin 1

I do not regularly read the email associated with GitHub.

License

Spider by Spider-Admin@Z+d9Knmjd3hQeeZU6BOWPpAAxxs is licensed under the Apache License, Version 2.0.

Footnotes

  1. Link requires a running Hyphanet node at http://localhost:8888/

  2. Freemail requires a running Hyphanet node