zkemail/archive.prove.email

Make upload work with .pst files from Outlook

Closed this issue · 7 comments

The Outlook export downloads a .pst file, not an .mbox. Update the scripts and documentation accordingly.

I got this error:

<class 'pypff.message'> 72: "None"
Traceback (most recent call last):
  File "~/archive.prove.email/util/pst_scraper.py", line 67, in <module>
    decode_pst()
  File "~/archive.prove.email/util/pst_scraper.py", line 62, in decode_pst
    parse_item(0, pst.get_root_folder(), 0)
  File "~/archive.prove.email/util/pst_scraper.py", line 53, in parse_item
    parse_item(i, sub_item, depth + 1)
  File "~/archive.prove.email/util/pst_scraper.py", line 53, in parse_item
    parse_item(i, sub_item, depth + 1)
  File "~/archive.prove.email/util/pst_scraper.py", line 53, in parse_item
    parse_item(i, sub_item, depth + 1)
  File "~/archive.prove.email/util/pst_scraper.py", line 50, in parse_item
    print(f'{indent}{type(item)} {index}: "{item.name}"', file=sys.stderr)
                                            ^^^^^^^^^
AttributeError: 'pypff.item' object has no attribute 'name'
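
The AttributeError above comes from reading `item.name` on an object that has no such attribute. A minimal sketch of a more tolerant log line, assuming the only requirement is to print something for every item; `describe_item` is a hypothetical helper, not a function from `pst_scraper.py`:

```python
def describe_item(item, index, depth):
    """Format one log line for an item without assuming it has a `name`."""
    indent = "  " * depth
    # getattr with a default returns None instead of raising AttributeError
    # when the object (e.g. a bare pypff.item) has no `name` attribute.
    name = getattr(item, "name", None)
    return f'{indent}{type(item).__name__} {index}: "{name}"'
```

In `parse_item`, the result could be printed to `sys.stderr` in place of the f-string that currently fails.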

Also run the scraper on these files:

richoutlooknfo.pst
03-18-10.pst

New feedback from Yush:

Traceback (most recent call last):
  File "/Users/aayushgupta/Documents/.projects.nosync/zkemail/archive.prove.email/util/pst_scraper.py", line 67, in <module>
    decode_pst()
  File "/Users/aayushgupta/Documents/.projects.nosync/zkemail/archive.prove.email/util/pst_scraper.py", line 62, in decode_pst
    parse_item(0, pst.get_root_folder(), 0)
  File "/Users/aayushgupta/Documents/.projects.nosync/zkemail/archive.prove.email/util/pst_scraper.py", line 53, in parse_item
    parse_item(i, sub_item, depth + 1)
  File "/Users/aayushgupta/Documents/.projects.nosync/zkemail/archive.prove.email/util/pst_scraper.py", line 53, in parse_item
    parse_item(i, sub_item, depth + 1)
  File "/Users/aayushgupta/Documents/.projects.nosync/zkemail/archive.prove.email/util/pst_scraper.py", line 52, in parse_item
    sub_item = item.sub_items[i]
               ~~~~~~~~~~~~~~^^^
OSError: pypff_item_get_sub_item_by_index: unable to retrieve item type object.
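
This OSError is raised while indexing `item.sub_items[i]`, so one unreadable child aborts the whole walk. A rough sketch of skipping such children instead, assuming the item exposes `number_of_sub_items` alongside the `sub_items` sequence seen in the traceback; `iter_sub_items` is a hypothetical helper, and whether skipping is acceptable depends on how much data those items hold:

```python
import sys


def iter_sub_items(item):
    """Yield (index, sub_item) pairs, skipping children that cannot be read."""
    # Assumption: the item reports its child count via `number_of_sub_items`.
    count = getattr(item, "number_of_sub_items", 0)
    for i in range(count):
        try:
            sub_item = item.sub_items[i]
        except OSError as err:
            # Log the failure and continue rather than crashing the export.
            print(f"skipping sub-item {i}: {err}", file=sys.stderr)
            continue
        yield i, sub_item
```

The recursive call in `parse_item` could then loop over `iter_sub_items(item)` rather than indexing directly.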

Also make sure to remove duplicate lines in the export.

@Divide-By-0 Here's a new version: https://github.com/zkemail/archive.prove.email/blob/pst_scaper-test/util/pst_scraper.py
Please check whether it works with your .pst file.

The new version should fix the crash you reported, and it should also filter out any duplicates.
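
For reference, duplicate filtering of this kind can be sketched as below. This is only an illustration that assumes the export is a sequence of text lines and that the first occurrence of each line should be kept; `unique_lines` is a hypothetical name, and the linked script may do it differently:

```python
def unique_lines(lines):
    """Yield each distinct line once, preserving first-seen order."""
    seen = set()
    for line in lines:
        # Compare without the trailing newline so "a" and "a\n" count as equal.
        key = line.rstrip("\n")
        if key in seen:
            continue
        seen.add(key)
        yield line
```

Because it is a generator, it can wrap an open file object without loading the whole export into memory, although the `seen` set still grows with the number of distinct lines.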

@Divide-By-0 I have now pushed the latest changes (the updates that you tested with your Outlook dataset) to the main branch. Feel free to close this issue.