A tool to fetch and store Hacker News data in a SQLite database.
To install and use the Hacker News Data Fetcher, follow these steps:
- Install:
pip install hn-data-fetcher
- Run the Script:
- The script can be run in four different modes: update, backfill, overwrite, and overwrite-from-date.
- Use the following command to run the script:
hn_data_fetcher --mode <mode> [--start-id <start_id>] [--start-date <start_date>] [--db-name <db_name>] [--concurrent-requests <concurrent_requests>] [--update-interval <update_interval>] [--db-queue-size <db_queue_size>] [--db-commit-interval <db_commit_interval>] [--tcp-limit <tcp_limit>]
- Parameters:
  --mode: Operation mode. Choices are update, backfill, overwrite, or overwrite-from-date.
  --start-id: Starting ID for overwrite mode (required if mode is overwrite).
  --start-date: Starting date for overwrite-from-date mode in YYYY-MM-DD format (required if mode is overwrite-from-date).
  --db-name: Path to the SQLite database file to store HN items (default: hn2.db).
  --concurrent-requests: Maximum number of concurrent API requests to HN (default: 1000).
  --update-interval: How often to update the progress bar, in number of items processed (default: 1000).
  --db-queue-size: Maximum size of the database operation queue (default: 1000).
  --db-commit-interval: How often to commit database transactions, in number of items (default: 1000).
  --tcp-limit: Maximum number of TCP connections. 0 means unlimited (default: 0).
- Examples:
- To update the database with new items:
hn-data-fetcher --mode update
- To backfill the database with historical items:
hn-data-fetcher --mode backfill
- To overwrite existing items starting from a specific ID:
hn-data-fetcher --mode overwrite --start-id 1000
- To overwrite existing items starting from a specific date:
hn-data-fetcher --mode overwrite-from-date --start-date 2024-01-01
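When the fetcher is driven from another Python program, the mode rules above (overwrite needs --start-id, overwrite-from-date needs --start-date in YYYY-MM-DD format) can be checked before the command is launched. The following is only a sketch: build_command is a hypothetical helper that is not part of the package, and the executable name hn-data-fetcher is assumed from the examples above.

```python
import shlex
from datetime import datetime
from typing import List, Optional

# Modes as listed in this document.
MODES = ("update", "backfill", "overwrite", "overwrite-from-date")


def build_command(
    mode: str,
    start_id: Optional[int] = None,
    start_date: Optional[str] = None,
    db_name: Optional[str] = None,
    executable: str = "hn-data-fetcher",  # assumption: name used in the examples
) -> List[str]:
    """Hypothetical helper: assemble an argument list from the documented flags."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "overwrite" and start_id is None:
        raise ValueError("overwrite mode requires start_id")
    if mode == "overwrite-from-date" and start_date is None:
        raise ValueError("overwrite-from-date mode requires start_date")
    if start_date is not None:
        # Raises ValueError if the date is not in YYYY-MM-DD format.
        datetime.strptime(start_date, "%Y-%m-%d")

    cmd = [executable, "--mode", mode]
    if start_id is not None:
        cmd += ["--start-id", str(start_id)]
    if start_date is not None:
        cmd += ["--start-date", start_date]
    if db_name is not None:
        cmd += ["--db-name", db_name]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it; to execute it, pass the list
    # to subprocess.run(cmd, check=True).
    print(shlex.join(build_command("overwrite", start_id=1000)))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess.run without shell quoting.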
- Monitor Progress:
- The script provides a progress bar with an estimated time of arrival (ETA) for completion.
- It also handles errors gracefully and ensures that the database is updated correctly.
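One way to confirm that a run actually wrote data is to inspect the database file with Python's built-in sqlite3 module. This document does not describe the table layout, so the sketch below makes no assumption about it: it lists whatever tables exist and counts their rows. The function name table_row_counts is hypothetical.

```python
import sqlite3
from pathlib import Path
from typing import Dict


def table_row_counts(db_path: str) -> Dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    # Open the file read-only so that inspecting it can never modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        # sqlite_master lists all schema objects; skip SQLite's internal tables.
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        conn.close()
```

For example, calling table_row_counts("hn2.db") after a run with the default --db-name should report the tables the fetcher created. Because the connection is read-only, the call raises sqlite3.OperationalError if the file does not exist instead of creating an empty database.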
- Graceful Shutdown:
- You can stop the script at any time by pressing Ctrl+C. The script will handle the shutdown gracefully, ensuring that all ongoing transactions are completed.
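The general pattern behind this kind of shutdown, combined with interval-based commits like those controlled by --db-commit-interval, can be sketched as follows. This is an illustration of the technique only, not the tool's actual implementation: the function store_items, the demo_items table, and its columns are all hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple


def store_items(
    conn: sqlite3.Connection,
    items: Iterable[Tuple[int, str]],
    commit_interval: int = 1000,
) -> int:
    """Insert (id, payload) rows, committing every `commit_interval` rows.

    If Ctrl+C arrives mid-run, rows written so far are still committed.
    """
    # Hypothetical demo schema, not the schema used by hn-data-fetcher.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS demo_items (id INTEGER PRIMARY KEY, payload TEXT)"
    )
    stored = 0
    try:
        for item_id, payload in items:
            conn.execute(
                "INSERT OR REPLACE INTO demo_items (id, payload) VALUES (?, ?)",
                (item_id, payload),
            )
            stored += 1
            if stored % commit_interval == 0:
                conn.commit()  # periodic commit bounds the work lost on a crash
    except KeyboardInterrupt:
        print("Interrupted, committing pending rows...")
    finally:
        conn.commit()  # flush whatever is still pending, interrupted or not
    return stored
```

A larger commit interval means fewer, bigger transactions, which is usually faster in SQLite; the final commit in the finally block is what keeps an interrupt from discarding the last partial batch.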
- Install Development Dependencies:
- Install the package in editable mode and development dependencies:
pip install -e .
pip install -r requirements-dev.txt
- Run Tests:
- Execute the test suite:
pytest tests/ -v