Getting Unreachable hosts error when trying to scrape data
beeena opened this issue · 0 comments
I'm trying to scrape data using the following command.
docker run -it --env-file=./config/development/dev.env -e "CONFIG=$(cat ./config/config.json | jq -r tostring)" algolia/docsearch-scraper
I have verified that the API key and App ID are correct, but the run still fails with an "Unreachable hosts" error.
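One thing that may be worth ruling out (an assumption, not a confirmed diagnosis for this report): `docker run --env-file` passes values through verbatim, so a line such as `APPLICATION_ID="..."` would hand the quotes to the scraper as part of the ID. The sketch below uses a hypothetical helper, `find_suspicious_env_values`, to flag quoted or whitespace-padded values in the text of an env file. The variable names and values in the sample are placeholders.

```python
def find_suspicious_env_values(env_text):
    """Return (name, reason) pairs for values that would be passed through verbatim.

    Hypothetical helper for illustration; it only inspects text and does not
    call Docker or Algolia.
    """
    problems = []
    for raw_line in env_text.splitlines():
        stripped = raw_line.strip()
        # Skip blank lines, comments, and lines that are not NAME=VALUE pairs.
        if not stripped or stripped.startswith("#") or "=" not in raw_line:
            continue
        name, value = raw_line.split("=", 1)
        name = name.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            problems.append((name, "value is wrapped in quotes"))
        elif value != value.strip():
            problems.append((name, "value has leading or trailing whitespace"))
    return problems


# Placeholder content standing in for ./config/development/dev.env
sample = 'APPLICATION_ID="YOUR_APP_ID"\nAPI_KEY=abc123\n# comment\n'
for name, reason in find_suspicious_env_values(sample):
    print(name + ": " + reason)
```

If the helper reports nothing for the real file, this particular cause is unlikely and the credentials themselves or the network path are the next things to check.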
Error
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/src/index.py", line 119, in <module>
    run_config(environ['CONFIG'])
  File "/root/src/index.py", line 45, in run_config
    config.query_rules
  File "/root/src/algolia_helper.py", line 21, in __init__
    self.index_name_tmp
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 127, in copy_rules
    return self.copy_index(src_index_name, dst_index_name, request_options)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/search_client.py", line 94, in copy_index
    request_options,
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 35, in write
    return self.request(verb, hosts, path, data, request_options, timeout)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 72, in request
    return self.retry(hosts, request, relative_url)
  File "/root/.local/share/virtualenvs/root-BuDEOXnJ/lib/python3.6/site-packages/algoliasearch/http/transporter.py", line 94, in retry
    raise AlgoliaUnreachableHostException("Unreachable hosts")
algoliasearch.exceptions.AlgoliaUnreachableHostException: Unreachable hosts
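The traceback ends in the client's `retry` loop, which suggests every host it tried failed. Since Algolia hostnames are derived from the application ID, a quick DNS check can help separate a malformed ID from a network problem. This is a minimal sketch: `candidate_hosts`, `resolves`, and `report` are hypothetical helpers, and the host patterns are the commonly documented write hosts, assumed here rather than read from the client in this image.

```python
import socket


def candidate_hosts(app_id):
    """Build the write hostnames assumed to be derived from an application ID."""
    hosts = [app_id + ".algolia.net"]
    hosts += ["{}-{}.algolianet.com".format(app_id, i) for i in (1, 2, 3)]
    return hosts


def resolves(host):
    """Return True if the hostname resolves via DNS, False otherwise."""
    try:
        socket.getaddrinfo(host, 443)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: the name is not a valid hostname.
        return False


def report(app_id):
    """Print one line per candidate host; needs network access when called."""
    for host in candidate_hosts(app_id):
        print(host, "resolves" if resolves(host) else "DOES NOT RESOLVE")
```

Calling `report` with the exact `APPLICATION_ID` value from the env file, from the same machine that runs the container, shows whether any host resolves. If none do, the ID is probably wrong or carries stray characters; if all do, a proxy or firewall between the container and Algolia is a more likely suspect.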
config.json
{
  "index_name": "dev_RESORTIFI_HELP",
  "start_urls": [
    "https://help.resortifi.com/"
  ],
  "sitemap_urls": [
    "https://help.resortifi.com/sitemap.xml"
  ],
  "sitemap_alternate_links": true,
  "stop_urls": [
    "/tests"
  ],
  "selectors": {
    "lvl0": {
      "selector": "(//ul[contains(@class,'menu__list')]//a[contains(@class, 'menu__link menu__link--sublist menu__link--active')]/text() | //nav[contains(@class, 'navbar')]//a[contains(@class, 'navbar__link--active')]/text())[last()]",
      "type": "xpath",
      "global": true,
      "default_value": "Documentation"
    },
    "lvl1": "header h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "lvl4": "article h4",
    "lvl5": "article h5, article td:first-child",
    "lvl6": "article h6",
    "text": "article p, article li, article td:last-child"
  },
  "strip_chars": " .,;:#",
  "custom_settings": {
    "separatorsToIndex": "_",
    "attributesForFaceting": [
      "language",
      "version",
      "type",
      "docusaurus_tag"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ]
  },
  "conversation_id": [
    "833762294"
  ],
  "nb_hits": 46250
}
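To rule out the config itself, the step performed by `jq -r tostring` (parse the file, emit it as one compact line) can be reproduced in Python. The traceback already reaches `algolia_helper.py`, so the config most likely parsed, but the sketch below makes that explicit. `compact_config` is a hypothetical helper, and treating `index_name` and `start_urls` as mandatory is an assumption made for this check.

```python
import json

# Assumption: keys treated as mandatory for the purpose of this check.
REQUIRED_KEYS = ("index_name", "start_urls")


def compact_config(text):
    """Parse config text and return it as a single compact JSON line.

    Raises json.JSONDecodeError on invalid JSON and ValueError when one of
    the assumed-required keys is missing.
    """
    config = json.loads(text)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    # Compact separators give one line without extra spaces.
    return json.dumps(config, separators=(",", ":"))


# Trimmed-down stand-in for ./config/config.json
example = '{"index_name": "dev_RESORTIFI_HELP", "start_urls": ["https://help.resortifi.com/"]}'
print(compact_config(example))
```

If this succeeds on the real file, the `CONFIG` variable is unlikely to be the cause, which leaves the credentials and the network path to Algolia.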