class API(username, password, localhost=False)
Provides the APIs to interact with the database
def history_get(self, **kwargs)
def history_post(self, **kwargs)
def seller_get(self, **kwargs)
def seller_patch(self, phone_number, **kwargs)
def seller_post(self, **kwargs)
def vehicle_get(self, **kwargs)
def vehicle_patch(self, vin, **kwargs)
def vehicle_post(self, **kwargs)
- pscraper.scraper.marketplaces.autotrader
- pscraper.scraper.marketplaces.carmax
- pscraper.scraper.marketplaces.cars
- pscraper.scraper.marketplaces.helpers
def scrape(zip_code, search_radius, target_states, api)
Scrape data about electric vehicles from all supported marketplaces using the specified parameters
zip_code
: str
: The zip code to perform the search in
search_radius
: int
: The search radius for the specified zip code
target_states
: list
: The states to search in (e.g. ['CA', 'NV'])
api
: API
: Pscraper API to communicate with the database
list of tuples
: (time, vehicles) per marketplace
def scrape_autotrader(zip_code, search_radius, target_states, api)
def scrape_carmax(zip_code, search_radius, target_states, api)
def scrape_cars(zip_code, search_radius, target_states, api)
Scrape EV data from cars.com filtering with the specified parameters
zip_code
: str
: The zip code to perform the search in
search_radius
: int
: The search radius for the specified zip code
target_states
: list
: The states to search in (e.g. ['CA', 'NV'])
api
: API
: Pscraper API to communicate with the backend
total
: int
: Total number of cars scraped
def get_cars_com_response(url, session)
Scrapes vehicle and page information from url
url
: str
: Url to get the response from
session
: requests.sessions.Session
: Session to use for sending requests
dict
: Parsed information about the url and the vehicles it contains
def validate_params(search_radius, target_states)
Validates that target_states are eligible states and search_radius is valid
search_radius
: int
: Radius to scrape in
target_states
: list
: States provided by the scraper
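The check described above could look roughly like the sketch below. The set of eligible states, the rule that a radius must be a positive integer, and the use of ValueError are all assumptions made for illustration; the actual rules in pscraper may differ.

```python
# Hypothetical set of eligible states; the real list lives in pscraper and may differ.
VALID_STATES = {"CA", "NV", "OR", "WA", "AZ"}


def validate_params(search_radius, target_states):
    """Raise ValueError if the radius or any state is not acceptable."""
    # Assumption: a valid radius is a positive integer (bool is rejected explicitly).
    if (
        isinstance(search_radius, bool)
        or not isinstance(search_radius, int)
        or search_radius <= 0
    ):
        raise ValueError(f"Invalid search radius: {search_radius!r}")

    # Collect every state that is not in the eligible set so the error names them all.
    invalid = [state for state in target_states if state not in VALID_STATES]
    if invalid:
        raise ValueError(f"Invalid target states: {invalid}")
```

Raising an exception rather than returning a flag lets a scraper fail before any request is sent, though the real function may report problems differently.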
def get_seller_id(vehicle, api, session)
Returns a seller id (primary key). Searches for an existing seller by phone number; if none is found, creates a new seller and returns its id.
Requires the seller to have streetAddress, city and state. If any are missing, returns -1.
vehicle
: dict
: Vehicle whose seller needs to be created/searched
api
: API
: Pscraper API that allows retrieval/creation of marketplaces
session
: requests.sessions.Session
: Google Maps Session to use for geolocating seller
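The lookup-or-create flow for get_seller_id can be sketched as follows. The FakeAPI class, the layout of the vehicle dict (a nested "seller" entry with a phone_number key) and the return shapes of seller_get and seller_post are hypothetical stand-ins, not the real Pscraper API; geolocation of the seller is left out, so the session argument is unused here.

```python
class FakeAPI:
    """In-memory stand-in for the Pscraper API (hypothetical return shapes)."""

    def __init__(self):
        self.sellers = []

    def seller_get(self, **kwargs):
        # Return every stored seller whose fields match all keyword filters.
        return [
            s for s in self.sellers
            if all(s.get(key) == value for key, value in kwargs.items())
        ]

    def seller_post(self, **kwargs):
        # Store the seller and assign a sequential primary key.
        seller = dict(kwargs, id=len(self.sellers) + 1)
        self.sellers.append(seller)
        return seller


REQUIRED_FIELDS = ("streetAddress", "city", "state")


def get_seller_id(vehicle, api, session=None):
    seller = vehicle.get("seller", {})  # assumed layout of the vehicle dict
    # Return -1 when any required address field is missing or empty.
    if any(not seller.get(field) for field in REQUIRED_FIELDS):
        return -1
    # Reuse an existing seller with the same phone number if there is one.
    existing = api.seller_get(phone_number=seller.get("phone_number"))
    if existing:
        return existing[0]["id"]
    # Otherwise create the seller and return the new id.
    return api.seller_post(**seller)["id"]
```

Calling the function twice with the same vehicle returns the same id and leaves a single seller in the store.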
def update_vehicle(vehicle, api, google_maps_session)
Updates the vehicle's last date and duration if it exists in the database; creates a new vehicle if it doesn't. Updates the vehicle's price/seller if either differs from the existing price/seller.
vehicle
: dict
: vehicle to be created/updated
api
: API
: Pscraper API that allows retrieval/creation of marketplaces
google_maps_session
: requests.sessions.Session
: Google Maps Session to use for geolocating seller
def request_wrapper(method, success_codes)
class BaseAPI(base_url, auth)
def get_full_url(self, url)
def get_request(self, url, params)
def patch_request(self, url, data)
def post_request(self, url, data)
def get_geolocation(address, session)
Finds latitude and longitude from a human readable address using Google Maps API.
You need to set the environment variable GCP_API_TOKEN to your Google Maps API token
address
: str
: Human readable address
session
: requests.sessions.Session
: Session to use for geolocating
latitude, longitude
: tuple
: Lat, Lng found from Google Maps API
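A minimal sketch of such a lookup against the Google Geocoding API is shown below. It assumes the session behaves like a requests Session (a get method accepting params and returning an object with json()), and returning None when nothing is found is an assumption; the real function may signal failure differently.

```python
import os

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def get_geolocation(address, session):
    """Return (latitude, longitude) for address, or None if nothing was found."""
    # The API token is read from the GCP_API_TOKEN environment variable.
    params = {"address": address, "key": os.environ["GCP_API_TOKEN"]}
    response = session.get(GEOCODE_URL, params=params)
    results = response.json().get("results", [])
    if not results:
        return None  # assumption: the real function may handle this case differently
    # The first result is taken as the best match.
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]
```

Passing the session in, rather than creating one per call, lets the caller reuse a single connection across many seller lookups.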
def get_traceback()
Get formatted traceback information after exception
text
: str
: Traceback text
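One plausible way to implement get_traceback uses the standard library's traceback.format_exc; whether pscraper does exactly this is an assumption.

```python
import traceback


def get_traceback():
    """Return the formatted traceback of the exception currently being handled."""
    return traceback.format_exc()


# Usage: call it inside an except block, while the exception is still active.
try:
    1 / 0
except ZeroDivisionError:
    text = get_traceback()
```

Here text holds the multi-line traceback string, which ends with the exception type and message.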
def measure_time(func)
A decorator to call a function and track the time it takes for the function to finish execution
func
: Function to be decorated
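A timing decorator of this kind can be sketched as follows. Returning an (elapsed, result) tuple is an assumption, suggested by the (time, vehicles) pairs that scrape returns; count_vehicles is a hypothetical function used only for the demonstration.

```python
import functools
import time


def measure_time(func):
    """Wrap func so that each call returns (elapsed_seconds, result)."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        return elapsed, result

    return wrapper


@measure_time
def count_vehicles(vehicles):
    # Hypothetical stand-in for a scraper function.
    return len(vehicles)


elapsed, total = count_vehicles(["vin-1", "vin-2"])
```

time.perf_counter is used because it is a monotonic clock intended for measuring short durations.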
def send_slack_message(**kwargs)
Sends a message in Slack. If only one argument is provided (channel), it sends traceback information about
the most recent exception. You need to set the SLACK_API_TOKEN environment variable to your Slack workspace API token.
kwargs
: Keyword arguments to be used as payload for WebClient
def send_slack_report(cars_et, cars_total, at_et, at_total, cm_et, cm_total, states, channel='#daily-job')
Posts a scraping report to the Slack channel #daily-job. You need to set the SLACK_API_TOKEN environment variable to your Slack workspace API token. Uses utils.misc.send_slack_message.
cars_et
: Time in seconds it took to scrape cars.com
cars_total
: Number of vehicles scraped from cars.com
at_et
: Time in seconds it took to scrape autotrader
at_total
: Number of vehicles scraped from autotrader
cm_et
: Time in seconds it took to scrape carmax
cm_total
: Number of vehicles scraped from carmax
states
: Scraped states to include in the report
channel
: Slack channel to send the report to, default: #daily-job