Scrapes product pricing and info from the PaknSave NZ website. Product information and price snapshots can be stored on Azure CosmosDB, or this program can simply log to console. Images can be sent to an API for resizing, analysis and other processing.
The scraper is powered by Microsoft Playwright. It requires the .NET 6 SDK and PowerShell to run. Azure CosmosDB is optional.
First clone or download this repo, change directory into /src, then restore and build the .NET packages with:
dotnet restore && dotnet build
The Playwright Chromium browser must then be downloaded and installed using:
pwsh bin/Debug/net6.0/playwright.ps1 install chromium
If running in dry mode, the program is now ready to use with:
dotnet run dry
To set optional advanced parameters, create appsettings.json.
If using CosmosDB, set the CosmosDB endpoint and key using the format:
{
  "COSMOS_ENDPOINT": "<your cosmosdb endpoint uri>",
  "COSMOS_KEY": "<your cosmosdb primary key>"
}
To override the default store with a specific location, set geolocation coordinates as latitude and longitude. The store closest to the coordinates will be selected.
{
  "GEOLOCATION_LAT": "-41.21",
  "GEOLOCATION_LONG": "174.91"
}
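How the closest store is chosen is not described here, and it may well be done by the website itself once the browser reports a location. As a rough illustration of what "closest" means, the Python sketch below picks the nearest entry from a list by great-circle (haversine) distance. The store names and coordinates are hypothetical placeholders, not real store data, and none of these function names come from this project.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def closest_store(lat, lon, stores):
    """Return the store dict with the smallest distance to (lat, lon)."""
    return min(stores, key=lambda s: haversine_km(lat, lon, s["lat"], s["long"]))

# Hypothetical stores, for illustration only.
STORES = [
    {"name": "store-a", "lat": -41.23, "long": 174.92},
    {"name": "store-b", "lat": -36.85, "long": 174.76},
]

# The settings file stores the coordinates as strings, so convert them first.
settings = {"GEOLOCATION_LAT": "-41.21", "GEOLOCATION_LONG": "174.91"}
nearest = closest_store(float(settings["GEOLOCATION_LAT"]),
                        float(settings["GEOLOCATION_LONG"]),
                        STORES)
print(nearest["name"])
```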
To dry run the scraper, logging each product to the console:
dotnet run dry
To run the scraper, logging each product to the console and storing it in the database:
dotnet run
Sample dry run output:
P1234567 | Coconut Supreme Slice | 350g | $ 5.89 | $16.83 /kg
P5345284 | Cookies Gluten Free Delicious Choc Chip Cookie | 250g | $ 4.89 | $19.56 /kg
P5678287 | Cookies Gluten Free Delicious Macadamia Cookie | 250g | $ 4.89 | $19.56 /kg
P3457825 | Belgium Slice | Each | $ 5.89 |
P5789285 | Cookies Gluten Free Delicious Double Choc Chip | 250g | $ 4.89 | $19.56 /kg
P2356288 | Bakery Crunchy Bran Biscuits With Sultanas | 230g | $ 4.49 | $19.52 /kg
P2765307 | Sanniu Evergreen Variant Biscuits | 4 x 132g | $ 6.36 | $12.05 /kg
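The last column of each log line is a price per kilogram, and the figures are consistent with dividing the price by the pack weight. Whether the scraper computes this value or reads it from the website is not stated, so the Python sketch below is only one way to reproduce the numbers. The function names and the size pattern are assumptions that cover just the formats seen in the sample (such as 350g, 4 x 132g and Each).

```python
import re

# Matches sizes like "350g", "1.5kg" or "4 x 132g" (assumed formats only).
SIZE_PATTERN = re.compile(r"^(?:(\d+)\s*x\s*)?(\d+(?:\.\d+)?)\s*(kg|g)$", re.IGNORECASE)

def size_in_kg(size):
    """Convert a size string to kilograms, or None if it has no weight."""
    match = SIZE_PATTERN.match(size.strip())
    if match is None:
        return None  # e.g. "Each" carries no weight information
    count = int(match.group(1)) if match.group(1) else 1
    amount = float(match.group(2))
    grams = amount * 1000 if match.group(3).lower() == "kg" else amount
    return count * grams / 1000

def unit_price_per_kg(price, size):
    """Price per kilogram rounded to cents, or None when size is not a weight."""
    kg = size_in_kg(size)
    if not kg:
        return None
    return round(price / kg, 2)

print(unit_price_per_kg(5.89, "350g"))
print(unit_price_per_kg(6.36, "4 x 132g"))
print(unit_price_per_kg(5.89, "Each"))
```

Products sold by the item, like the Belgium Slice line above, simply have no unit price, which matches the empty last column in the sample.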
Sample product document as stored in CosmosDB. This sample was re-run on multiple days to capture changing prices:
{
  "id": "P1234567",
  "name": "Coconut Supreme Slice",
  "size": "350g",
  "currentPrice": 5.89,
  "category": [
    "biscuits"
  ],
  "priceHistory": [
    {
      "date": "2023-05-04T01:00:00",
      "price": 5.89
    },
    {
      "date": "2023-01-02T01:00:00",
      "price": 5.49
    }
  ],
  "unitPrice": 16.83,
  "unitName": "kg"
}
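The stored document keeps the latest price in currentPrice and one priceHistory entry per observed price. The project's actual update logic is not shown here, so the following Python sketch is only an assumption about how such a document could be maintained: a new entry is added when the scraped price differs from the current one, with the newest entry first as in the sample. The function name record_price is hypothetical.

```python
def record_price(product, price, date):
    """Add a price snapshot if the price changed. Returns True when updated."""
    if product.get("currentPrice") == price:
        return False  # unchanged price, so no new history entry
    # Newest entry goes first, mirroring the ordering in the sample document.
    product.setdefault("priceHistory", []).insert(0, {"date": date, "price": price})
    product["currentPrice"] = price
    return True

# Document state after the first scrape in the sample.
product = {
    "id": "P1234567",
    "name": "Coconut Supreme Slice",
    "size": "350g",
    "currentPrice": 5.49,
    "priceHistory": [{"date": "2023-01-02T01:00:00", "price": 5.49}],
}

changed = record_price(product, 5.89, "2023-05-04T01:00:00")
unchanged = record_price(product, 5.89, "2023-05-05T01:00:00")
print(changed, unchanged, len(product["priceHistory"]))
```

Skipping unchanged prices keeps the history short when a product is scraped daily, at the cost of not recording when each check took place.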