help
rs-11 opened this issue · 2 comments
rs-11 commented
Something doesn't work:(
System: Ubuntu 18.04.4 LTS
Most likely not a problem with document-fetcher, but with my system. Maybe somebody can help me anyway.
lxml is installed with both pip and pip3.
Command I ran:
sudo python3 main.py
Output:
Traceback (most recent call last):
  File "main.py", line 73, in <module>
    loop.run_until_complete(main())
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "main.py", line 60, in main
    await asyncio.gather(*producers)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/downloader.py", line 47, in custom_producer
    return await func(session, queue)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/custom/analysis.py", line 13, in parse_main_page
    await validate_url(session, queue, links_to_pdf, BASE_URL)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/custom/utils.py", line 10, in validate_url
    soup = BeautifulSoup(html, "lxml")
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
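A note on diagnosing this kind of error: the traceback shows the script running under Python 3.7 with bs4 loaded from /usr/lib/python3/dist-packages, so a pip or pip3 install may have put lxml somewhere this interpreter (especially under sudo) does not look. The sketch below is only an illustration, not part of document-fetcher: `pick_parser` is a hypothetical helper that checks whether lxml is importable by the running interpreter and otherwise falls back to the standard library's "html.parser".

```python
import importlib.util
import sys


def pick_parser():
    """Return "lxml" if this interpreter can import it, else "html.parser".

    Hypothetical helper for illustration; not taken from the project.
    """
    # find_spec returns None when the top-level module cannot be located
    # on this interpreter's import path.
    if importlib.util.find_spec("lxml") is not None:
        return "lxml"
    return "html.parser"


if __name__ == "__main__":
    # Shows which Python binary is actually running and what it can see.
    print("interpreter:", sys.executable)
    print("parser:", pick_parser())
```

Running this the same way as the failing command (for example with sudo python3) shows what that particular interpreter can import. The returned name could then be passed as the second argument to BeautifulSoup in place of the hard-coded "lxml".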
GeorgOhneH commented
Should be fixed in 80967c5.
rs-11 commented
It works, thanks a lot.
On Linux the file paths are not correct: all files are stored in '/' (root). However, I managed to fix that by changing /downloader.py line 21.
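The thread does not show what line 21 of downloader.py contained, so the real cause is unknown. As a general illustration only, one way files can end up under '/' is joining a base directory with a name that starts with a slash: on POSIX, the join discards everything before an absolute component. The names `join_naive` and `join_safe` and the example paths are hypothetical.

```python
import posixpath


def join_naive(base_dir, name):
    # If `name` starts with "/", posixpath.join treats it as absolute
    # and drops `base_dir` entirely.
    return posixpath.join(base_dir, name)


def join_safe(base_dir, name):
    # Strip leading slashes so `name` is always treated as relative.
    return posixpath.join(base_dir, name.lstrip("/"))


if __name__ == "__main__":
    print(join_naive("/home/user/docs", "/lecture1.pdf"))
    print(join_safe("/home/user/docs", "/lecture1.pdf"))
```

This is a sketch under the stated assumption, not a description of the fix that was actually applied.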