/pgscraper

Postgresql extension for scraping data from web

Primary LanguageRust

PGScraper

This is a small extension to allow you to scrape data from web directly from your postgres database.

This is a pet project I'm working to learn Rust.

A few useful commands:

To init your pgx and provide some instances only to test the extension, use it only once:

cargo pgx init

After cloning, you can install the extension:

cargo pgx install

Run on Postgres13:

cargo pgx run pg13

Then, on psql you'll need to enable the extension:

CREATE EXTENSION pgscraper;

And now you have access to two functions:

select html_select('title', 'https://blog.timescale.com');
┌───────────────────────────────┐
│          html_select          │
├───────────────────────────────┤
│ <title>Timescale Blog</title> │
└───────────────────────────────┘
(1 row)

Or just internal text:

 select html_select_text('title', 'https://blog.timescale.com');
┌──────────────────┐
│ html_select_text │
├──────────────────┤
│ Timescale Blog  ↵│
│                  │
└──────────────────┘
(1 row)

This project is inspired on the Bad Postgres extension ideas!