GeorgOhneH/ethz-document-fetcher

help

rs-11 opened this issue · 2 comments

rs-11 commented

Something doesn't work :(

System: Ubuntu 18.04.4 LTS

This is most likely not a problem with document-fetcher but with my system. Maybe somebody can help me anyway.

I have installed lxml with both pip and pip3.

Command I ran:
sudo python3 main.py

Output:

Traceback (most recent call last):
  File "main.py", line 73, in <module>
    loop.run_until_complete(main())
  File "/usr/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
    return future.result()
  File "main.py", line 60, in main
    await asyncio.gather(*producers)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/downloader.py", line 47, in custom_producer
    return await func(session, queue)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/custom/analysis.py", line 13, in parse_main_page
    await validate_url(session, queue, links_to_pdf, BASE_URL)
  File "/root/Documents/ethz-doc/ethz-document-fetcher/custom/utils.py", line 10, in validate_url
    soup = BeautifulSoup(html, "lxml")
  File "/usr/lib/python3/dist-packages/bs4/__init__.py", line 165, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

Should be fixed in 80967c5.
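The traceback shows the script running under `/usr/lib/python3.7` with the system copy of bs4 from `/usr/lib/python3/dist-packages`, so one likely cause is that the interpreter launched by `sudo python3` cannot see the lxml that pip/pip3 installed (for example, because it went into a different Python version or a user site-packages directory). The sketch below is a hypothetical helper, not necessarily what commit 80967c5 does: it checks whether the running interpreter can import lxml and falls back to Python's built-in `html.parser` otherwise. The names `lxml_available` and `choose_parser` are illustrative assumptions.

```python
import importlib.util
import sys


def lxml_available() -> bool:
    """Return True if the interpreter running this code can import lxml."""
    return importlib.util.find_spec("lxml") is not None


def choose_parser() -> str:
    """Prefer lxml; fall back to the stdlib-backed "html.parser" feature of bs4."""
    return "lxml" if lxml_available() else "html.parser"


if __name__ == "__main__":
    # Run this with the same command as the fetcher (e.g. `sudo python3 check.py`)
    # so it inspects the same interpreter and the same site-packages.
    print("interpreter:", sys.executable)
    print("lxml importable:", lxml_available())
    print("parser to use:", choose_parser())
```

If the check reports that lxml is not importable, installing it through that exact interpreter (e.g. `sudo python3 -m pip install lxml`) ties the package to the Python that runs `main.py`. Alternatively, a call such as `BeautifulSoup(html, choose_parser())` in `validate_url` would avoid the `FeatureNotFound` error at the cost of using the slower built-in parser.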

rs-11 commented

It works, thanks a lot.
On Linux the file paths are not correct: all files are stored in '/' (root). However, I managed to fix that by changing line 21 of /downloader.py.
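The thread does not say what line 21 of downloader.py contained, so the following is only an illustration of one common way files end up directly under '/' on Linux: `os.path.join` discards every component before an absolute one, so joining a base directory with a path that starts with "/" silently points at the filesystem root. The helper `safe_join` and the example paths are hypothetical and are not necessarily the change rs-11 made.

```python
import os


def safe_join(base_dir: str, relative_path: str) -> str:
    """Join relative_path under base_dir, even if it starts with a separator.

    os.path.join drops every component before an absolute one, so a
    leading "/" on the second argument redirects the file to the
    filesystem root instead of base_dir.
    """
    # Strip leading slashes/backslashes so the part is treated as relative.
    return os.path.join(base_dir, relative_path.lstrip("/\\"))


if __name__ == "__main__":
    base = "/home/user/ethz-docs"
    # Naive join: on Linux this prints /analysis/sheet1.pdf (lands under root).
    print(os.path.join(base, "/analysis/sheet1.pdf"))
    # Safe join: on Linux this prints /home/user/ethz-docs/analysis/sheet1.pdf.
    print(safe_join(base, "/analysis/sheet1.pdf"))
```

Stripping the leading separator keeps the design simple; an alternative is to build every download path from a single base directory (for example, one derived from the script's own location) so that no component is ever absolute.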