A project that scrapes web pages, built with NodeJS, MongoDB and NestJS.
To scrape a web page, first register its configuration through the pages endpoint.
Method: POST
http://localhost:3000/pages
{
  "url": "https://listado.mercadolibre.com.mx/running-shoes",
  "provider": "Mercado Libre",
  "identifiers": {
    "main": ".ui-search-result__wrapper",
    "image": ".slick-slide.slick-active img",
    "price": ".ui-search-price.ui-search-price--size-medium.ui-search-item__group__element .ui-search-price__second-line .price-tag .price-tag-fraction",
    "title": ".ui-search-item__group.ui-search-item__group--title h2"
  }
}
- url: the page to scrape
- provider: the company the data comes from
- identifiers: the CSS selectors used to extract the data
  - main: selector for the element that wraps each product
  - image: selector for the product image
  - price: selector for the product price
  - title: selector for the product title
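As a rough sketch of how the configuration above might be sent, the following builds the POST request and sends it with the fetch built into Node 18+. The helper names buildPageRequest and registerPage are hypothetical, and treating every field as required is an assumption, not something the API is known to enforce.

```javascript
// Hypothetical helper: lists the fields missing from a page configuration.
// Assumption: url, provider and all four identifiers are required.
function missingFields(page) {
  const missing = [];
  if (!page.url) missing.push("url");
  if (!page.provider) missing.push("provider");
  for (const key of ["main", "image", "price", "title"]) {
    if (!page.identifiers || !page.identifiers[key]) {
      missing.push(`identifiers.${key}`);
    }
  }
  return missing;
}

// Builds the request for POST /pages without sending it.
function buildPageRequest(baseUrl, page) {
  const missing = missingFields(page);
  if (missing.length > 0) {
    throw new Error(`Missing fields: ${missing.join(", ")}`);
  }
  return {
    url: `${baseUrl}/pages`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(page),
    },
  };
}

// Sends the configuration and returns the HTTP status code.
async function registerPage(baseUrl, page) {
  const { url, options } = buildPageRequest(baseUrl, page);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`POST /pages failed with status ${response.status}`);
  }
  return response.status;
}
```

With the API running locally, registerPage("http://localhost:3000", page) would send the JSON body shown above.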
Once the pages are configured, set the following environment variables:
- DB_PORT
- DB_USER
- DB_PASSWORD
- DB_HOST
- DB_NAME
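How the scraper combines these variables is not documented here. A minimal sketch, assuming they are joined into a standard MongoDB connection string, could look like this (buildMongoUri is a hypothetical helper, not part of the project):

```javascript
// Hypothetical helper: assembles a MongoDB connection string from the
// DB_* variables. Assumption: the scraper connects with user/password auth.
function buildMongoUri(env) {
  const names = ["DB_PORT", "DB_USER", "DB_PASSWORD", "DB_HOST", "DB_NAME"];
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  // Percent-encode credentials so characters such as "@" or ":" stay valid.
  const user = encodeURIComponent(env.DB_USER);
  const password = encodeURIComponent(env.DB_PASSWORD);
  return `mongodb://${user}:${password}@${env.DB_HOST}:${env.DB_PORT}/${env.DB_NAME}`;
}
```

Calling buildMongoUri(process.env) at startup fails fast when a variable is missing, instead of failing later during the first database call.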
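To run the scraping job, import index.js from WebScraping/src/index.js and call the exported function:
// Load the job exported by WebScraping/src/index.js
const scrapPages = require('./WebScraping/src');
// Run the scraping job
scrapPages();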
Once the job has run, you can retrieve the saved product data:
Method: GET
http://localhost:3000/products?offset=0&limit=5
Param | value |
---|---|
offset | 0 |
limit | 5 |
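The query string for a request like the one above can be assembled with the standard URL class rather than by hand. This is only a sketch: buildListUrl is a hypothetical helper, and the validation rules are assumptions about what the API accepts.

```javascript
// Hypothetical helper: builds a list URL such as /pages?offset=0&limit=5.
// Assumption: offset is a non-negative integer and limit a positive integer.
function buildListUrl(baseUrl, resource, offset = 0, limit = 5) {
  if (!Number.isInteger(offset) || offset < 0) {
    throw new RangeError("offset must be a non-negative integer");
  }
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const url = new URL(`/${resource}`, baseUrl);
  url.searchParams.set("offset", String(offset));
  url.searchParams.set("limit", String(limit));
  return url.toString();
}
```

The resource argument is the endpoint name, so the same helper covers any list endpoint that takes offset and limit.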
REST API built with NestJS and MongoDB.
$ npm install
Set the following environment variable for the API:
- DB_URL: the MongoDB connection URL.
# development
$ npm run start
# watch mode
$ npm run start:dev
# production mode
$ npm run start:prod
Method: GET
http://localhost:3000/pages?offset=0&limit=5
Param | value |
---|---|
offset | 0 |
limit | 5 |
⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃
Method: POST
http://localhost:3000/pages
{
  "url": "https://listado.mercadolibre.com.mx/running-shoes",
  "provider": "Mercado Libre",
  "identifiers": {
    "main": ".ui-search-result__wrapper",
    "image": ".slick-slide.slick-active img",
    "price": ".ui-search-price.ui-search-price--size-medium.ui-search-item__group__element .ui-search-price__second-line .price-tag .price-tag-fraction",
    "title": ".ui-search-item__group.ui-search-item__group--title h2"
  }
}
⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃
Method: GET
http://localhost:3000/products?offset=0&limit=5
Param | value |
---|---|
offset | 0 |
limit | 5 |
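To read more than one batch, a client can keep raising offset until a batch comes back short. The sketch below assumes that offset counts items to skip (not a page index) and that the endpoint returns a JSON array; neither is confirmed here. The names nextOffset, collectAll and fetchPage are hypothetical.

```javascript
// Returns the offset of the next batch, or null when the last batch was short.
function nextOffset(offset, limit, received) {
  return received < limit ? null : offset + limit;
}

// Collects every item by calling fetchPage(offset, limit) until a short batch.
// fetchPage is a caller-supplied function that resolves to an array.
async function collectAll(fetchPage, limit = 5) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const items = [];
  let offset = 0;
  while (offset !== null) {
    const batch = await fetchPage(offset, limit);
    items.push(...batch);
    offset = nextOffset(offset, limit, batch.length);
  }
  return items;
}
```

A fetchPage implementation would typically request /products with the given offset and limit and return the parsed response body.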
⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃
Method: POST
http://localhost:3000/products
{
  "price": "1,299",
  "image": "",
  "title": "Fila Ray Tracer Blanco/negro",
  "provider": "Mercado Libre"
}
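In the body above, price is a string with a thousands separator. A client that needs a number could convert it as sketched below; parsePrice is a hypothetical helper, and treating the comma as a thousands separator is an assumption based on this one sample value.

```javascript
// Hypothetical helper: converts a price string such as "1,299" to a number.
// Assumption: commas separate thousands and a period marks decimals.
function parsePrice(text) {
  const normalized = String(text).replace(/,/g, "").trim();
  // Reject anything that is not a plain decimal number after cleanup.
  if (!/^\d+(\.\d+)?$/.test(normalized)) {
    return null;
  }
  return Number(normalized);
}
```

Returning null for unparseable input lets the caller decide whether to skip the product or store the raw string.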
⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃ ⁃
Author: bautistaj
Package: postman-to-markdown