Throws Exception when downloading certain courses
Robotic-Brain opened this issue · 1 comment
Robotic-Brain commented
Hi,
PFERD told me to report a bug so I'm doing that...
I'll give more information when I have time to investigate further.
Possibly related: #67
Version:
PFERD 3.4.2 (https://github.com/Garmelon/PFERD)
Config:
[DEFAULT]
working_dir = /data/pferd
redownload = never-smart
on_conflict = no-delete
tasks = 2
downloads = 1
task_delay = 0.1
links = plaintext
videos = yes
forums = yes
transform =
(.*) -re->> "{g1.replace(' ', '_')}"
[crawl:HOC/WS22_ARS_Reflections_9003053]
type = kit-ilias-web
target = https://ilias.studium.kit.edu/goto.php?target=crs_1890802&client_id=produktiv
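For readers unfamiliar with the transform rule in the config above, here is a rough Python sketch of what it appears to do to a single path: match the whole path with the regex `(.*)` and replace spaces in the first capture group with underscores. The function name `apply_rule` and the sample file name are made up for illustration, and the use of a full match is an assumption; this is not PFERD's actual implementation.

```python
import re


def apply_rule(path: str) -> str:
    """Approximate `(.*) -re->> "{g1.replace(' ', '_')}"` for one path.

    Hypothetical helper: assumes the regex must match the whole path.
    """
    match = re.fullmatch(r"(.*)", path)
    if match is None:
        # Rule does not apply, so the path is left unchanged.
        return path
    g1 = match.group(1)  # first capture group, called g1 in the rule
    return g1.replace(" ", "_")


print(apply_rule("Einleitung - Folien.pdf"))  # Einleitung_-_Folien.pdf
print(apply_rule("."))                        # .
```

The second call is consistent with the log line "Match found, updated path to '.'" in the output: the root path contains no spaces, so the rule matches but changes nothing.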
Output:
Loading crawl:HOC/WS22_ARS_Reflections_9003053
Warning Please avoid using too many parallel requests as these are the KIT ILIAS
instance's greatest bottleneck.
Running crawl:HOC/WS22_ARS_Reflections_9003053
Loading cookies
Sharing cookies
'/data/pferd/HOC/WS22_ARS_Reflections_9003053/.cookies' has newest mtime so far
Loading cookies from '/data/pferd/HOC/WS22_ARS_Reflections_9003053/.cookies'
Creating base directory at '/data/pferd/HOC/WS22_ARS_Reflections_9003053'
Loading previous report from '/data/pferd/HOC/WS22_ARS_Reflections_9003053/.report'
Failed to load report
[Errno 2] No such file or directory: '/data/pferd/HOC/WS22_ARS_Reflections_9003053/.report'
Inferred crawl target: URL https://ilias.studium.kit.edu/goto.php?target=crs_1890802&client_id=produktiv
Decision: Crawl '.'
Testing rule 1: (.*) -re->> "{g1.replace(' ', '_')}"
Match found, updated path to '.'
Final result: '.'
Answer: Yes
Parsing HTML page for '.'
URL: https://ilias.studium.kit.edu/goto.php?target=crs_1890802&client_id=produktiv
Page is a normal folder, searching for elements
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="-obj_xoct-" class="icon xoct medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_default.svg"/> for card title <a
href="ilias.php?baseClass=ilObjPluginDispatchGUI&cmd=forward&ref_id=1956243&forwardCmd=showContent"
id="il_ui_fw_6398c3348493a1_82967459">Einleitung & Organisatorisches<span data-list-item-id="lg_div_1956243_pref_1890802"></span></a>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type for <a
href="ilias.php?baseClass=ilObjPluginDispatchGUI&cmd=forward&ref_id=1956243&forwardCmd=showContent"
id="il_ui_fw_6398c3348493a1_82967459">Einleitung & Organisatorisches<span data-list-item-id="lg_div_1956243_pref_1890802"></span></a>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="Datei" class="icon file medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_file.svg"/> for card title <button class="btn btn-link" data-action=""
id="il_ui_fw_6398c334576cf0_75968013">Einleitung & Organisatorisches - Folien.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type for <button class="btn btn-link" data-action="" id="il_ui_fw_6398c334576cf0_75968013">Einleitung &
Organisatorisches - Folien.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="Datei" class="icon file medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_file.svg"/> for card title <button class="btn btn-link" data-action=""
id="il_ui_fw_6398c334590272_09102644">Einleitung & Organisatorisches - Skript.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type for <button class="btn btn-link" data-action="" id="il_ui_fw_6398c334590272_09102644">Einleitung &
Organisatorisches - Skript.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="Datei" class="icon file medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_file.svg"/> for card title <button class="btn btn-link" data-action=""
id="il_ui_fw_6398c3345a65b1_83325404">ARs ReflecTIonis - Kurshandbuch.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type for <button class="btn btn-link" data-action="" id="il_ui_fw_6398c3345a65b1_83325404">ARs ReflecTIonis -
Kurshandbuch.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="Datei" class="icon file medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_file.svg"/> for card title <button class="btn btn-link" data-action=""
id="il_ui_fw_6398c3345bea55_15233962">Übersicht Studienbereiche und Studiengänge.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type for <button class="btn btn-link" data-action="" id="il_ui_fw_6398c3345bea55_15233962">Übersicht Studienbereiche und
Studiengänge.pdf</button>
Warning Encountered unexpected HTML structure, ignoring element.
Could not extract type from <img alt="Datei" class="icon file medium outlined"
src="./Customizing/global/skin/kit/images/outlined/icon_file.svg"/> for card title <button class="btn btn-link" data-action=""
id="il_ui_fw_6398c33461eef2_39766784">Einleitung - Übungsaufgaben mit Lösung und Erläuterung.pdf</button>
Crawled '.'
Error An unexpected exception occurred
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/PFERD/pferd.py", line 156, in run
await crawler.run()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/http_crawler.py", line 193, in run
await super().run()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/crawler.py", line 85, in wrapper
return await f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/crawler.py", line 338, in run
await self._run()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_web_crawler.py", line 208, in _run
await self._crawl_url(self._target)
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_web_crawler.py", line 263, in _crawl_url
await gather_elements()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_web_crawler.py", line 104, in wrapper
return await f(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_web_crawler.py", line 258, in gather_elements
elements.extend(page.get_child_elements())
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_html.py", line 102, in get_child_elements
return self._find_normal_entries()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_html.py", line 548, in _find_normal_entries
result += self._find_cards()
File "/usr/local/lib/python3.10/site-packages/PFERD/crawl/ilias/kit_ilias_html.py", line 688, in _find_cards
description = caption_parent.find_next_sibling("div").getText().strip()
AttributeError: 'NoneType' object has no attribute 'getText'
╭──────────────────────────────────────────────────────────────────────────────╮
│ Please copy your program output and send it to the PFERD maintainers, either │
│ directly or as a GitHub issue: https://github.com/Garmelon/PFERD/issues/new │
╰──────────────────────────────────────────────────────────────────────────────╯
Report for crawl:HOC/WS22_ARS_Reflections_9003053
Error 'NoneType' object has no attribute 'getText'
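The traceback suggests that `caption_parent.find_next_sibling("div")` returned `None` for at least one card, so the chained `.getText()` call failed. Below is a minimal sketch of a guard around that lookup. It is not PFERD's code: `card_description`, `TagLike` and `FakeTag` are hypothetical names, and `FakeTag` is only a stand-in for a parsed HTML element so the snippet runs without third-party packages.

```python
from typing import Optional, Protocol


class TagLike(Protocol):
    """Hypothetical minimal interface: the two calls seen in the traceback."""

    def find_next_sibling(self, name: str) -> Optional["TagLike"]: ...

    def getText(self) -> str: ...


def card_description(caption_parent: TagLike) -> Optional[str]:
    """Return the stripped text of the next <div> sibling, or None if absent."""
    sibling = caption_parent.find_next_sibling("div")
    if sibling is None:
        # Card without a description block: return None instead of raising.
        return None
    return sibling.getText().strip()


class FakeTag:
    """Stand-in for a parsed HTML element, used only to exercise the helper."""

    def __init__(self, text: str = "", sibling: Optional["FakeTag"] = None) -> None:
        self._text = text
        self._sibling = sibling

    def find_next_sibling(self, name: str) -> Optional["FakeTag"]:
        return self._sibling

    def getText(self) -> str:
        return self._text


with_description = FakeTag(sibling=FakeTag("  Slides for week 1  "))
without_description = FakeTag()

print(card_description(with_description))     # Slides for week 1
print(card_description(without_description))  # None
```

Whether a missing description should be treated as empty, skipped, or logged as a warning is a decision for the maintainers; the sketch only shows that checking for `None` before calling `.getText()` avoids the `AttributeError`.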
I-Al-Istannen commented