Document how to read a dump from a local file
fenopa opened this issue · 7 comments
Is this not possible?
It would be great to have such an option. It would also allow reading from other wikis, such as the OSM Wiki; see https://wiki.openstreetmap.org/wiki/Wiki#Wiki_Dumps_.2F_Export
Let's introduce a new MediaWikiDumpFile class:
from mediawiki_dump.dumps import MediaWikiDumpFile
Actually, it's already there :-) The LocalFileDump class is your friend here. I'll update the README to describe that case as well.
https://github.com/macbre/mediawiki-dump/blob/master/mediawiki_dump/dumps.py#L181-L196
@matkoniecz, can you check the following code with the OSM wiki dump that you've mentioned?
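from mediawiki_dump.dumps import LocalWikipediaDump
from mediawiki_dump.reader import DumpReader

# Point the dump at a local bz2-compressed XML export
dump = LocalWikipediaDump(dump_file="path/to/osm.dump.xml.bz2")
reader = DumpReader()
pages = reader.read(dump)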
Traceback (most recent call last):
File "/home/mateusz/Documents/install_moje/OSM_software/fetch_osm_wiki/dump_reader.py", line 16, in <module>
for page in reader.read(dump):
File "/home/mateusz/.local/lib/python3.10/site-packages/mediawiki_dump/reader.py", line 247, in read
for chunk in dump.get_content():
File "/home/mateusz/.local/lib/python3.10/site-packages/mediawiki_dump/dumps.py", line 144, in get_content
yield decompressor.decompress(chunk)
OSError: Invalid data stream
I also tried unpacking the file separately and pointing to the unpacked file as the input. Looking at the file itself, I see no obvious corruption.
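For anyone hitting the same traceback: Python's bz2 decompressor raises `OSError: Invalid data stream` when the bytes it receives are not a valid bzip2 stream. A quick way to narrow this down might be to check what the file actually starts with. The sketch below uses only the standard library; `sniff_format` is a hypothetical helper name, not part of mediawiki-dump. Whether the dump class always applies bz2 decompression (which would also explain the failure on the unpacked file) is only a guess based on the traceback.

```python
# Well-known leading bytes ("magic numbers") of common archive formats.
SIGNATURES = [
    (b"BZh", "bzip2"),
    (b"\x1f\x8b", "gzip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\xfd7zXZ\x00", "xz"),
]


def sniff_format(path):
    """Guess the container format of `path` from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(8)
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    # Skip an optional UTF-8 BOM and whitespace before looking for markup.
    if head.lstrip(b"\xef\xbb\xbf \t\r\n").startswith(b"<"):
        return "plain XML or HTML"
    return "unknown"
```

If `sniff_format("path/to/osm.dump.xml.bz2")` returns anything other than `bzip2`, the file extension and the real format disagree, and a bz2-based reader would be expected to fail on it.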
@macbre Do you have a known example of such a download working? Maybe I should download a small Wikipedia dump and check whether it loads from a file for me?
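As a library-independent sanity check, the same file can be read with the standard library alone. The sketch below assumes a MediaWiki XML export (optionally bz2-compressed, detected here by file extension only) and lists the first few page titles; `iter_titles` is a hypothetical helper, not something mediawiki-dump provides. If this works on a file that the library rejects, the problem is more likely in how the file is being passed to the library than in the dump itself.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_titles(path, limit=5):
    """Yield up to `limit` page titles from a .xml or .xml.bz2 export."""
    opener = bz2.open if path.endswith(".bz2") else open
    with opener(path, "rb") as stream:
        count = 0
        for _event, elem in ET.iterparse(stream, events=("end",)):
            # Tags carry the export namespace, e.g. "{http://...}title",
            # so compare only the local part of the tag name.
            if elem.tag.rsplit("}", 1)[-1] == "title":
                yield elem.text
                count += 1
                if count >= limit:
                    return
            # Release the element's contents to keep memory use low.
            elem.clear()
```

For example, `list(iter_titles("path/to/osm.dump.xml.bz2"))` should either produce a handful of titles or fail with an error pointing at the compression or the XML.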
Is it working for you?