kdeldycke/mail-deduplicate

TypeError: 'NoneType' object is not subscriptable (mail with no Date)

turian opened this issue · 1 comment

turian commented

Describe the bug

Mails without a Date header still crash mdedup with a TypeError.

To reproduce

Steps to reproduce the behavior:

  1. The full mdedup CLI invocation you used.

    $ mdedup --verbosity=DEBUG ./my_maildir/
    
  2. The data set leading to the bug.

From - Wed Jan 10 23:48:08 2024
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
MIME-Version: 1.0
From: "Mailbox Support" <support@XXXXXX.com>
To: "Joseph Turian" <joseph@XXXXXX.com>
Subject: Tips for Using Mailbox in Gmail
Content-Type: multipart/alternative;
 boundary="----mailcomposer-?=_1-1368110365847"

------mailcomposer-?=_1-1368110365847
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

Hi Joseph,

Expected behavior

If an email has no timestamp, I would expect either a fallback / tie-break strategy, or for datetime.datetime.now() to be assigned.
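
To illustrate the datetime.datetime.now() fallback, here is a minimal sketch. The helper name timestamp_or_now is hypothetical and not part of mail_deduplicate, and the sample timestamp is arbitrary. It treats an undated mail as if it were dated now, so select-oldest would always prefer a dated duplicate over it.

```python
import datetime
from typing import Optional


def timestamp_or_now(timestamp: Optional[float]) -> float:
    """Hypothetical fallback: treat an undated mail as if it were dated now."""
    if timestamp is not None:
        return timestamp
    return datetime.datetime.now().timestamp()


# One dated mail (arbitrary epoch value) and one undated mail.
timestamps = [timestamp_or_now(t) for t in (1368110365.0, None)]

# The dated mail is the oldest, because the undated one falls back to "now".
oldest = min(timestamps)
print(oldest)
```

Whether "now" is the right fallback depends on the strategy: under select-newest it would make undated mails win, which may not be what is wanted.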

CLI output

time mdedup --strategy select-oldest \
        --action copy-selected \
        --export merged.mbox \
...
info: ◼ 4 mails sharing hash c7418e431b4ddf4a3dbc3968d975d1efa33fcb9707c1b6e15b3be4cd
info: Check mail differences are below the thresholds.
Execution time: 2961.877 seconds.
Traceback (most recent call last):
  File "/opt/homebrew/anaconda3/bin/mdedup", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/__main__.py", line 55, in main
    mdedup(prog_name=mdedup.name)
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click_extra/commands.py", line 337, in main
    return super().main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click_extra/commands.py", line 398, in invoke
    return super().invoke(ctx)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/cloup/_context.py", line 47, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/cli.py", line 420, in mdedup
    dedup.build_sets()
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/deduplicate.py", line 453, in build_sets
    duplicates.categorize_candidates()
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/deduplicate.py", line 306, in categorize_candidates
    selected = apply_strategy(self.conf.strategy, self)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/strategy.py", line 266, in apply_strategy
    return set(method(duplicates))
               ^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/strategy.py", line 47, in select_oldest
    f"Select all mails sharing the oldest {duplicates.oldest_timestamp} "
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/deduplicate.py", line 183, in oldest_timestamp
    return min(map(attrgetter("timestamp"), self.pool))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/site-packages/mail_deduplicate/mail.py", line 121, in timestamp
    return email.utils.mktime_tz(email.utils.parsedate_tz(value))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/anaconda3/lib/python3.11/email/_parseaddr.py", line 193, in mktime_tz
    if data[9] is None:
       ~~~~^^^
TypeError: 'NoneType' object is not subscriptable
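
The last two frames of the traceback can be reproduced with the standard library alone. This is a minimal sketch, assuming the value passed to parsedate_tz comes from a lookup of the Date header that returns None when the header is missing (I have not checked how mail.py obtains it); the sample message is made up.

```python
import email
import email.utils

# A made-up message with no Date header, similar to the data set above.
raw = "From: support@example.com\nSubject: Tips\n\nHi Joseph,\n"
msg = email.message_from_string(raw)

value = msg.get("Date")                   # None: the header is absent
parsed = email.utils.parsedate_tz(value)  # None: nothing to parse

try:
    email.utils.mktime_tz(parsed)         # subscripts None internally
    error = None
except TypeError as exc:
    error = str(exc)

print(error)
```

So parsedate_tz does not raise on a missing date; it returns None, and mktime_tz is what fails.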

Environment

All data on execution context as provided by $ mdedup --version:

mdedup, version 7.3.0

Additional context

Related to #62 #132

turian commented

A possible fix, if None is the best return value (which the existing with contextlib.suppress(ValueError) suggests), would be:

if value is None:
    return None
with contextlib.suppress(ValueError):
    ...

You could also use with contextlib.suppress(ValueError, TypeError), but that is more permissive and could hide unrelated TypeErrors.
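
Spelled out as a standalone function, the guard could look like the sketch below. The name timestamp_from_date_header is hypothetical and this is not the actual code in mail.py. It also checks the result of parsedate_tz, which goes slightly beyond the fix above, so that a present but unparseable Date header returns None as well.

```python
import contextlib
import email.utils
from typing import Optional


def timestamp_from_date_header(value: Optional[str]) -> Optional[float]:
    """Hypothetical helper: epoch seconds, or None if there is no usable Date."""
    # Missing header: nothing to parse.
    if value is None:
        return None
    # parsedate_tz returns None for strings it cannot parse.
    parsed = email.utils.parsedate_tz(value)
    if parsed is None:
        return None
    # Keep suppressing ValueError only, for example an out-of-range date.
    with contextlib.suppress(ValueError):
        return email.utils.mktime_tz(parsed)
    return None


print(timestamp_from_date_header(None))
print(timestamp_from_date_header("Thu, 01 Jan 1970 00:00:00 +0000"))
```

Callers such as oldest_timestamp would then need to cope with None values, since min() cannot compare None with numbers.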