/url-queue

Primary LanguageTypeScript

url-queue

An service that creates a job queue to fetch HTML from URLs. It caches recent jobs and their responses for 60 seconds using Redis and stores the results persistently in a database using TypeORM. This service should be able to be scaled up across multiple instances in a cluster or a processor using PM2.

Getting Started

  • Make sure you have at least node v10.13.0-v11.x.x (node 12 is not yet supported) and yarn v1.13.0+ installed (npm at your own risk).
  • Make sure you have both Redis and a relational database installed locally and running.
  • Clone the repo
$ git clone https://github.com/flamingYawn/url-queue.git
$ cd url-queue
  • Install dependencies
$ yarn
  • Install ts-node and typeorm globally
$ yarn global add ts-node typeorm
  • Create a new database named url_queue in whatever RDB you're using.
  • Replace the example environment variables in .env.
  • Start the server for the first time
$ yarn start
  • Once you start, yarn start should set up the database with the correct table.

And then you should be good to go.

Usage

To start a url-fetching job, go to

/api/v1/url/www.google.com

Note: The url cannot have any /s in it, so only root URLs will work

The response should look something like this:

{
    "id": "a23ada507b8f957270ec9f9782f1818a"
}

Then, take that id and go to

/api/v1/result/a23ada507b8f957270ec9f9782f1818a

And it should serve the page you requested (sans subsequent calls for images/scripts/etc.)