Documenting one user's conversion to Notmuch/mu from Thunderbird

Converting to Search-Based Email: Notmuch or mu

1. Questions

Here are the most important questions, given the purpose and the problem:

  1. Tools. Am I accurately assessing all available tools?

  2. User scenarios. What are the best tools, or trade-offs of each tool, for each user scenario?

  3. Workflows. Do you know a better workflow than what’s presented for any tool or usage scenario?

2. Purpose

2.1. Summary

I want to transition my email environment from a classic setup to a tag-and-index-based system (most likely Notmuch- or mu-based), and I’m seeking help.

  1. Near-term. I’m seeking significant feedback on this document from the Notmuch, mu, and related user communities. Most importantly, I seek answers to these questions.

  2. Longer-term. This document might be reusable by others, including future new users, if someone matures it over time. (In that case, the tone would shift from being centered on my problems to that of a more general-purpose document.) I couldn’t find similar content on the web. Please advise if I missed something.

With the right system, I’m hopeful I can make email management a much-less-time-consuming burden on my work…​ and my life.

2.2. Details

DISCLAIMER: this document may be in flux as I learn and attempt to master this domain.

Why write this doc instead of just trudging through the tool testing myself? Read on.

3. The Problem

I find email classification, recall, and translation to work/task tracking to be much more laborious than they should be. I expect the computer to be faster than I am and tailored to my preferred workflow. Instead, I’m constantly waiting for the computer, forced to type or mouse-click inefficiently (I’m a vim-loving keyboard jockey), and/or stuck with a poor workflow.

More specifically:

  • My existing environment

    • At least 4300 IMAP-based email folders with 1MM (million) email messages served via multiple IMAP servers.

    • Primary MUA = Mozilla Thunderbird 17.0.11

    • Mac OS 10.9.5 (Mavericks)

See Appendix 2: More Problem Details and user scenarios for more context and requirements.

(An aside: why title this document "search-based email"? It was a suggestion from the community, and it seems better than "tag-based email," my original title. I’ve also considered "index-based email," but "search" seems a bit more descriptive than "index," and less esoteric. I’m open to any title that best resonates with the community.)

4. Possible Solutions

After reviewing currently-available tools, I’m seeking:

  1. fast search on my entire email-message/IMAP collection (my "email database"),

  2. search/index/tag-based email categorization, and

  3. easy extensibility with Linux and Mac OS X command-line "primitives" or other custom scripts/commands.

It’s fair to ask: did I come up with the above "requirements" by a) adjusting to features from currently-available tools, or b) by independently dreaming up my own specifications? More the former/(a). I’d much prefer Jarvis be my email interface, but he’s not currently (or at least not economically) available.

5. Tool Choices

5.1. Core tool choice - Indexing

My investigation thus far suggests the implementation path hinges on choosing one of two applications, Notmuch or mu. They seem to be the best (or at least the most popular) cores for an email-message indexing and tagging tool suite, and they appear to be mutually exclusive.

Is this assessment accurate? What other tools/options might I be overlooking?

My comparison analysis:

  1. Initial tests show Notmuch performing approximately 15 times faster than mu.

  2. mu can embed its metadata (tags, etc) "natively" into the IMAP content/messages. Notmuch cannot. However, muchsync (maybe other tools?) can replicate this metadata, at the cost of additional process and infrastructure.

  3. #1 greatly outweighs #2. Because of this, Notmuch "wins" (with me), pending the retest results of #1 (above) and additional feedback from others.
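
Since the 15x figure is pending a retest, a small timing harness might make the comparison repeatable. The following is only a sketch: the function names are made up for illustration, and the document does not say which operation was measured (indexing or searching), so the `mu index` and `notmuch new` commands in the usage comment are an assumption to be replaced with whatever was actually compared.

```python
import statistics
import subprocess
import time


def time_command(argv, repeats=3):
    """Run argv `repeats` times and return the median wall-clock seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so terminal rendering does not skew the timing.
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def speed_ratio(slow_argv, fast_argv, repeats=3):
    """Return how many times longer slow_argv takes than fast_argv."""
    return time_command(slow_argv, repeats) / time_command(fast_argv, repeats)


# Assumed usage (commands are placeholders for the operation under test):
#   ratio = speed_ratio(["mu", "index"], ["notmuch", "new"])
#   print(f"mu took {ratio:.1f}x as long as notmuch")
```

Using the median of several runs reduces the influence of a cold disk cache on the first run, which matters a lot for a mail store of this size.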

What other trade-offs might motivate me to employ mu over Notmuch?

Since Notmuch won (for now), the rest of this document may be more Notmuch-specific.

5.2. IMAP-to-Maildir syncing

Notmuch seems to work best with (or maybe requires?) the Maildir format. The following tools (presumably) all sync an IMAP server to a Maildir filesystem.

5.2.2. Choice

I’ve currently chosen mbsync, aka isync.

5.2.3. Comments

  • I’ve used mbsync more than any other tool listed here, and it’s thus far working nicely.

  • Search "mbsync vs offlineimap" to see more.

  • I understand getmail the least. It’s less referenced (on the web) for this usage/context than either offlineimap or mbsync. Why is this? Is it not a viable alternative to the above? getmail’s website seems to primarily (?) pitch it as a fetchmail replacement.

  • A seemingly popular reference: "OfflineIMAP sucks," 2012-08-30

5.4. Auto-detect if IMAP resync needed

(I’ve not yet started this implementation.)

5.4.1. client→server checking

5.4.2. server→client checking

  • https://github.com/athoune/imapidle + some of my own Python scripting, which I’m hopeful will not be difficult.

  • mswatch

    • http://mswatch.sourceforge.net

    • requires IMAP-server-side shell access - difficult if not impossible to get for all my IMAP accounts.

    • this might also be a client→server option

    • wrapping imapidle with my own Python script that triggers mbsync seems like a better, more-flexible alternative
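
The wrapper idea above could start from something like the following sketch. It deliberately does not use the imapidle API (which is not shown in this document); instead it assumes that whatever IDLE listener is chosen can call a plain Python function when the server reports new mail. `SyncTrigger` and its parameters are hypothetical names, and `mbsync -a` (sync all configured channels) is the assumed sync command.

```python
import subprocess
import time


class SyncTrigger:
    """Run a sync command when notified, at most once per `min_interval` seconds."""

    def __init__(self, argv=("mbsync", "-a"), min_interval=30.0,
                 runner=subprocess.run, clock=time.monotonic):
        self.argv = list(argv)
        self.min_interval = min_interval
        self.runner = runner      # injectable so the logic can be tested offline
        self.clock = clock
        self._last_run = None

    def notify(self):
        """Call from the IDLE callback; returns True if a sync was started."""
        now = self.clock()
        if self._last_run is not None and now - self._last_run < self.min_interval:
            return False  # too soon after the previous sync, skip this one
        self._last_run = now
        self.runner(self.argv, check=False)
        return True
```

A burst of IDLE notifications then results in one sync rather than many. The trade-off of this simple throttle is that a message arriving inside the quiet interval waits for the next notification, so a periodic fallback sync is probably still wanted.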

5.5. MUA

5.5.1. Summary

Given the problem, the work to find and master the best/better MUA(s) for me and concurrently learn a new search-and-tag-based email classification paradigm seems like my biggest challenge.

mutt-kz and alot currently present the most-attractive solutions (for me), but it’s early.

A dark-horse candidate: notmuch.el, an Emacs front-end.

5.5.2. Details

5.6. Synchronizing Notmuch metadata across machines

(I’ve not yet started this implementation.)

6. Usage-Scenario Challenges in Priority-First Order

My usage-scenario challenges include but may not be limited to:

6.1. Which MUA(s)?

Decide which MUA(s) to use, particularly deciding on a primary MUA. This is technically not a usage-scenario, but currently represents my biggest challenge. See the MUA options.

6.2. Support old-style IMAP-folder paradigm

While I may be moving to a search/index/tag-based paradigm, I still need to access my 4k+ IMAP folders as I did before, at least while I transition away from the folder-based workflow I currently use in Mozilla Thunderbird (TB) via TB’s Nostalgy add-on.

TB-Nostalgy also offers excellent keyboard-shortcut-mapping capability and is one of the few great features of Thunderbird that I’d like to replicate in my new MUA.

6.3. Support IMAP-account separation

  1. I have multiple email accounts, which is not uncommon. I want to "view" each one separately, such that emails and folders from account X do not clutter my view of emails/folders when viewing account Y.

  2. It would be extremely helpful to additionally support a "combined" view/mode of all my accounts. But this is not an absolute requirement, simply because #1 is currently more important than #2.
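
One possible way to get both views with Notmuch is to build queries from its `path:` search prefix, which accepts a trailing `/**` to include subdirectories. The sketch below assumes each account is synced into its own top-level directory under the mail root (for example `work/` and `home/`, both hypothetical names without spaces); the helper names are made up for illustration.

```python
def account_query(account, extra=None):
    """Query matching every message stored under one account's directory."""
    query = f"path:{account}/**"
    if extra:
        # "and" binds tighter than "or", so no grouping is needed here.
        query = f"{query} and {extra}"
    return query


def combined_query(accounts):
    """Query matching messages from any of the given accounts."""
    return " or ".join(account_query(a) for a in accounts)


def search_argv(query):
    """Argument vector for running the query through the notmuch CLI."""
    return ["notmuch", "search", query]
```

For example, `search_argv(account_query("work", "tag:unread"))` yields a command that lists unread threads for the assumed `work` account only. The same query strings could be stored as saved searches or virtual folders in whichever MUA is chosen.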

6.4. Initial tagging

6.5. Moving msgs after applying tags

  • Context, details: mutt-kz thread: "Moving msgs after applying tags?".

  • Will messages retain Notmuch-associated metadata (tags, etc) for the lifetime of a message, including after folder moves, without any special configuration?

    • I’m used to moving messages between folders in order to classify them. Further, I would like to keep a clean Inbox and other folders for my non-Notmuch-based email clients, thus (presumably) requiring message moving.

    • Once I associate Notmuch-metadata (by adding tags, or whatever metadata/etc scenarios might be involved with Notmuch) with a message, I need said metadata/tags/etc to associate with a message forever, regardless of wherever I put said message. Is this the way it works "out of the box" with Notmuch-based systems?
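
As far as the Notmuch documentation describes it, tags live in the Notmuch database keyed by Message-ID rather than by file path, so a file that moves between Maildir folders should keep its tags once `notmuch new` has noticed the rename. This is worth confirming with the community. One wrinkle, stated here as an assumption to verify against the mbsync version in use: mbsync commonly records the IMAP UID in the filename as `,U=<n>`, and carrying that into another folder can confuse the next sync. A minimal sketch of a move helper that strips it (the function name is hypothetical):

```python
import os
import re
import shutil

# Assumed mbsync filename convention: "<unique>,U=<uid>:2,<flags>"
_MBSYNC_UID = re.compile(r",U=\d+")


def move_message(src_path, dest_maildir):
    """Move one Maildir message file into dest_maildir/cur; return the new path."""
    name = _MBSYNC_UID.sub("", os.path.basename(src_path))
    if ":2," not in name:
        name += ":2,"  # files in cur/ carry an info suffix, even if empty
    dest_dir = os.path.join(dest_maildir, "cur")
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, name)
    shutil.move(src_path, dest_path)
    return dest_path
```

After moving files this way, running `notmuch new` lets Notmuch pick up the new paths; the message content, including its Message-ID header, is untouched by the move.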

6.7. MUA folder-based searching with name auto-completion

6.8. Piping email-message content through shell commands

I want to "pipe" the content of:

  1. one email message,

  2. many email messages (by selecting multiple emails at the same time), or

  3. an entire IMAP folder of emails

to any command/script of my choosing.

Example, potential solutions, not yet tested:
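
One untested possibility for the whole-folder case, sketched with only the Python standard library: open the Maildir folder with the `mailbox` module and feed each raw message to a command on stdin. The function name is hypothetical, and the folder path and command are whatever the caller supplies.

```python
import mailbox
import subprocess


def pipe_folder(maildir_path, argv):
    """Feed every message in a Maildir folder to argv on stdin.

    Returns the list of each invocation's captured stdout (bytes).
    """
    outputs = []
    # create=False: fail loudly instead of creating a folder by accident.
    box = mailbox.Maildir(maildir_path, create=False)
    for key in box.iterkeys():
        raw = box.get_bytes(key)
        done = subprocess.run(argv, input=raw, stdout=subprocess.PIPE, check=True)
        outputs.append(done.stdout)
    return outputs
```

For the one-message and many-message cases, the same loop could run over file paths instead of a folder; `notmuch search --output=files <query>` prints the matching files, which might serve as the selection step.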

6.12. Highlight folders with unread or new emails

How to do this? Including support for virtual folders. Details here.
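
For real (non-virtual) folders, unread counts can be derived from the Maildir layout itself: anything in `new/` is undelivered to the reader, and files in `cur/` are unread when their `:2,` info suffix lacks the `S` (seen) flag. The sketch below walks a mail root and reports folders with unread mail; the function names are hypothetical, and virtual folders are not covered (for those, something like `notmuch count` with the folder's query plus `tag:unread` may be the analogous check).

```python
import os


def unread_count(maildir_path):
    """Count messages in one Maildir folder that are not marked seen."""
    count = 0
    new_dir = os.path.join(maildir_path, "new")
    cur_dir = os.path.join(maildir_path, "cur")
    if os.path.isdir(new_dir):
        count += len(os.listdir(new_dir))
    if os.path.isdir(cur_dir):
        for name in os.listdir(cur_dir):
            _, sep, flags = name.rpartition(":2,")
            # No info suffix at all is treated as "not seen".
            if not sep or "S" not in flags:
                count += 1
    return count


def folders_with_unread(mail_root):
    """Map each folder (relative to mail_root) to its unread count, if nonzero."""
    result = {}
    for dirpath, dirnames, _ in os.walk(mail_root):
        if "cur" in dirnames and "new" in dirnames:
            n = unread_count(dirpath)
            if n:
                result[os.path.relpath(dirpath, mail_root)] = n
    return result
```

An MUA hook or status-bar script could call `folders_with_unread` after each sync and highlight the returned folder names.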

6.13. Sync Notmuch tags with maildir flags

Does anyone use notmuchsync, and does it work well?
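
Separately from notmuchsync, Notmuch itself has a `maildir.synchronize_flags` configuration setting that maps Maildir flags to tags: `D` to `draft`, `F` to `flagged`, `P` to `passed`, `R` to `replied`, and the absence of `S` to `unread`. The sketch below only illustrates that mapping from a filename; it is not how Notmuch implements it, and the function name is hypothetical.

```python
# Flag-to-tag mapping as described for maildir.synchronize_flags.
FLAG_TO_TAG = {"D": "draft", "F": "flagged", "P": "passed", "R": "replied"}


def tags_from_filename(name):
    """Return the set of flag-derived tags implied by a Maildir filename."""
    _, sep, flags = name.rpartition(":2,")
    if not sep:
        flags = ""  # no info suffix: no flags set
    tags = {tag for flag, tag in FLAG_TO_TAG.items() if flag in flags}
    if "S" not in flags:
        tags.add("unread")  # "seen" is inverted: a missing S means unread
    return tags
```

Custom tags (anything beyond these five) have no Maildir flag equivalent, which is presumably where tools like notmuchsync or muchsync come in.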

6.17. Writing HTML-formatted messages

6.18. Forwarding messages with attachments

7. Appendix 1: Purpose: More Details

7.1. My user profile

  • In summary, I’m a vim and Python lover, a keyboard jockey, and a reasonably-experienced, fairly-technical, demanding user. And like many others, I receive a remarkable amount of email in diverse contexts.

  • I’m historically-trained as a software and computer-systems engineer.

    • I’ve significant experience with programming in a variety of programming languages and system-administering a variety of OSes including but not limited to: C, C++, Java, Ada, perl, Python; Windows, many commercial Unix-es, Linux, VMS, MacOSX. My favorite "Swiss army knife" language is Python. If I’ve time, I’m open to extending/fixing Python programs. I’d like to learn Ruby and Go.

  • I’m now more of a "business person." In spite of this:

    • vim remains my primary editor (I hate moving my hand from the keyboard to the mouse or trackpad),

    • Mac OS X is my primary computing platform,

    • and I still significantly code in Python to solve "glueware" problems.

    • I also still dabble in Linux (mostly Debian/Ubuntu) and MacOSX sysadmin.

  • Learning new systems/languages/applications/software is old hat…​

    • …​but it’s now harder only because of time constraints from expanded business responsibilities.

  • Some might describe me as an impatient, unforgiving computing user. I hate being faster than the computer. Further, when the computer/software/application says its job is done, I want it to be done. However, some environments and applications perform significant, asynchronous activity even after reporting they are done servicing a request. (Thunderbird is notorious for this.) And this drives me nuts. "Computer, if you need more time to complete a job, don’t lie to me. I can go do other things while I wait for you. But please do not delay me further after you already said you were done."

Despite my history assimilating to new applications/environments, the search-and-tag-based-classification paradigm still seems significantly different and a bit daunting to this "old school IMAP-folder user", and may (or may not?) take some time to master. See Usage-Scenario Challenges in Priority-First Order. For example, opening alot for the first time and looking at a staggering 50k+ emails in my "inbox" can give someone pause; hopefully Initial tagging will take care of that.

Additionally, the search-and-tag-based documentation resources—​to describe new-user-paradigm shifts and present the most-popular toolsets—​seem disjointed and/or non-existent. Hence some of the motivation to present this document.

7.2. Why write this doc?

Why did I spend the time to write this document, instead of just trying all the tools?

  1. Email is too important not to "get it right." Or at least, email is too "frequent," probably my most-frequent life activity (very unfortunately).

  2. Brute-force "experience" may be too inefficient. I’d rather learn from others' experiences than inefficiently replay them all myself.

  3. This document may help future newbies. And possibly accelerate new-user population growth.

  4. Defining requirements up front: this usually works. Rarely have I regretted taking the time to well-define requirements (separate from design and/or solution) for any significant software or tool-adoption project.

  5. I might learn something I wouldn’t have previously found. It’s possible this document might attract enough attention for people to offer solutions (applications, workflows, or whatever) I might not have otherwise discovered.

  6. Breaking my production email "IMAP database" testing new apps would be very…​ bad. My businesses and projects rely on my email system to be top-notch solid. If my email gets corrupted, lost, etc - things go very bad, very fast. Especially if I’m unknowingly messing up my email. Hence, I’m rather cautious about correct implementation.

In any case, I’m hopeful that experienced and diverse feedback from the search/index/tag-based-email-using communities can help avoid these problems. At least, it seemed like the most-effective way, as the space doesn’t (yet) seem friendly to newbies.

8. Appendix 2: More Problem Details

(DISCLAIMER: This section’s under construction, and not complete.)

OS X is great, but TB is difficult. Thunderbird is old, buggy, troublesome, slow, basically inextensible (for me, anyway), and as I understand it, feature frozen. I’m tired of debating with the mozillaZine support team about TB’s bugs and limitations. Among other things, its IMAP sync is slow and unreliable. It literally (and unfortunately, inconsistently) deletes IMAP folders on its own whim, asynchronously, sometimes when I least expect it. Sometimes it loses track of the folders it didn’t delete, and simply creates new ones, bloating my mbox files (TB only reliably supports mbox) terribly over time.

Additionally, the TB text/formatting editor is legendarily bad/buggy. I’d desperately prefer to simply edit in vim, and edit rich/html text in markdown or asciidoc and convert to html with a rendering engine, and I suspect I could script-integrate such capability…​ if I had an MUA that could play nicely with external scripts.
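
The script-integration idea might look roughly like this with Python's standard `email` package: write the body as plain text in vim, render it to HTML with an external engine, and attach both as a `multipart/alternative` message. This is a sketch only. `render_html` below is a stand-in that merely escapes the text into a `<pre>` block; a real markdown or asciidoc renderer would be substituted for it, and the function names are hypothetical.

```python
import html
from email.message import EmailMessage


def render_html(text):
    """Stand-in renderer: replace with a markdown/asciidoc-to-HTML converter."""
    return "<html><body><pre>" + html.escape(text) + "</pre></body></html>"


def build_message(sender, recipient, subject, text, renderer=render_html):
    """Build a message carrying both the plain source text and rendered HTML."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)                                   # text/plain part
    msg.add_alternative(renderer(text), subtype="html")     # text/html part
    return msg
```

Keeping the plain-text part means recipients with text-only clients still see the original markup source, which tends to be readable on its own.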

Further, I’m a keyboard jockey (eg: vim lover) and Python programmer. I’ve maxed out TB’s keyboard-shortcut-ness (eg: TB’s Nostalgy add-on) best I can tell, and it’s still limiting. I have external tools (some developed by me and/or my team) to parse and perform "magic" (like task-tracking and bug-report integration) on email folders and individual messages, and TB, with its lack of proper maildir support and difficult extensibility, makes it extremely difficult if not impossible to integrate with those tools.

In short, it’s time to move on from Thunderbird.

9. Appendix 3: General Setup Guides

(Previously-referenced guides or sections of guides listed elsewhere in this doc are not duplicated here. The following is provided here for my general reference; maybe others will find these references useful.)

9.3. generic-mutt stuff

9.3.1. mutt and Maildir

mbsync + mutt

From the mbsync(1) man page as of 2015-07:

Flatten delim
    Flatten the hierarchy within this Store by substituting
    the canonical hierarchy delimiter / with delim. This can
    be useful when the MUA used to access the Store provides
    suboptimal handling of hierarchical mailboxes, as is the
    case with Mutt. A common choice for the delimiter is ".".
    Note that flattened sub-folders of the INBOX always end up
    under Path, including the "INBOXdelim" prefix.

Does this mean mutt does not work with Maildir hierarchical subfolders?
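
To make the man page excerpt concrete, the name substitution it describes can be illustrated as follows. This only mimics the renaming, under the assumption that "/" is the canonical delimiter as the excerpt states; it is not mbsync's implementation, it does not answer the mutt question above, and the function name is hypothetical.

```python
def flatten_mailbox(name, delim="."):
    """Replace the canonical hierarchy delimiter "/" with delim."""
    return name.replace("/", delim)


# Illustration with made-up mailbox names:
for mailbox_name in ("INBOX/lists/notmuch", "work/projects/alpha"):
    print(mailbox_name, "->", flatten_mailbox(mailbox_name))
```

With "." as the delimiter, a nested mailbox becomes a single directory name, so an MUA that only scans one directory level can still find it.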