karlicoss/HPI

module suggestion: firefox history/bookmarks/etc.

redthing1 opened this issue · 7 comments

I think a module for accessing Firefox data would be very useful.
This documentation on the Mozilla website details how the data is stored.

I am thinking that a separate script could be used to walk through that database and generate a JSON dump (similarly to how rexport works), and then an HPI module could provide access to that data.
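As a rough illustration of that idea, here is a minimal sketch of such a dump script. It assumes the history lives in a `places.sqlite` file with the `moz_places` and `moz_historyvisits` tables, and that `visit_date` is stored as microseconds since the Unix epoch; the function name `dump_visits` and the output field names are made up for this example and are not part of any existing tool.

```python
import json
import sqlite3
from datetime import datetime, timezone


def dump_visits(db_path: str) -> str:
    """Return every visit in a Firefox places database as a JSON string."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            """
            SELECT p.url, p.title, v.visit_date
            FROM moz_historyvisits AS v
            JOIN moz_places AS p ON p.id = v.place_id
            ORDER BY v.visit_date
            """
        ).fetchall()
    finally:
        conn.close()

    visits = [
        {
            "url": url,
            "title": title,
            # visit_date is assumed to be microseconds since the Unix epoch
            "dt": datetime.fromtimestamp(
                visit_date / 1_000_000, tz=timezone.utc
            ).isoformat(),
        }
        for url, title, visit_date in rows
    ]
    return json.dumps(visits, indent=2)
```

It is safer to run this against a copy of the database rather than the one the browser currently has open.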

I'm new to this project so I am not yet really familiar with how modules work, but when I have time I will attempt it and submit a PR.

Oh, it looks like one of the contributors to this repo @seanbreckenridge has already done this!

I wonder if there is already a corresponding HPI module or does it still need to be written?

I personally don't use bookmarks in the browser (I just have a text file with a script to open stuff), so I haven't written anything to parse them yet. Feel free to open an issue on ffexport if that's something you're interested in.

Otherwise yeah, ffexport lets you export history; I have a script here that saves my history sqlite file every couple of weeks.

The my.browsing file on my branch uses parts of ffexport to load the data in; it also copies the live history database when computing the history, so the result includes both the backups and the current history.

As a demo:

>>> from collections import Counter
>>> from urllib.parse import urlparse
>>> from my.browsing import history
>>> Counter(map(lambda v: urlparse(v.url).netloc, history())).most_common(5)
[('github.com', 39666), ('discord.com', 21064), ('www.youtube.com', 19497), ('duckduckgo.com', 19152), ('www.google.com', 9598)]

No need to export it to JSON (though ffexport can do that); it merges and removes duplicates from copies of the sqlite files directly.
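To make the copy-then-merge idea concrete, here is a minimal sketch. It is not how ffexport is actually implemented; the names `read_visits` and `merge_visits` are hypothetical, and it assumes Firefox-style `moz_places` and `moz_historyvisits` tables, treating two visits as duplicates when both the URL and the timestamp match.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

# (url, visit_date), where visit_date is the raw integer stored in the database
Visit = Tuple[str, int]


def read_visits(db_path: Path) -> List[Visit]:
    """Read visits from a temporary copy, leaving the original file untouched."""
    with tempfile.TemporaryDirectory() as tmp:
        copy = Path(tmp) / db_path.name
        # A live database may also have a write-ahead log next to it,
        # which this sketch ignores.
        shutil.copy(db_path, copy)
        conn = sqlite3.connect(str(copy))
        try:
            return conn.execute(
                "SELECT p.url, v.visit_date FROM moz_historyvisits AS v "
                "JOIN moz_places AS p ON p.id = v.place_id"
            ).fetchall()
        finally:
            conn.close()


def merge_visits(db_paths: Iterable[Path]) -> Iterator[Visit]:
    """Yield each distinct (url, visit_date) pair once across all databases."""
    seen = set()
    for db_path in db_paths:
        for visit in read_visits(db_path):
            if visit not in seen:
                seen.add(visit)
                yield visit
```

Passing the backed-up files plus the live database to `merge_visits` would then give one deduplicated stream of visits, since overlapping backups contain many of the same rows.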

I know karlicoss uses promnesia, so that may be why that hasn't been incorporated into HPI.

Just as an update, I've since converted that into browserexport, which supports reading history from:

  • Firefox (and Waterfox)
  • Chrome (and Chromium, Brave, Vivaldi)
  • Safari
  • Palemoon

If you wanted to use this, you could install my HPI modules alongside this repository (see here)

Run hpi module install my.browsing to install dependencies

Set up a config block in your config file like:

# uses browserexport https://github.com/seanbreckenridge/browserexport
class browsing:
    # folder which contains your backed up databases
    export_path: Paths = "~/data/browsing"

    # additionally, read history from my active firefox database
    from browserexport.browsers.firefox import Firefox

    live_databases: Paths = Firefox.locate_database()

Then use the history function:

[ ~ ] $ ipython

In [1]: from my.browsing import history

In [2]: visits = list(history())

In [3]: len(visits)
Out[3]: 390621

[ ~ ] $ hpi query --limit 1 my.browsing.history
[{"url":"https://duckduckgo.com/?q=Brave+Verified+sites&t=brave","dt":"2020-07-21T00:11:23.544069+00:00","metadata":{"title":"Brave Verified sites at DuckDuckGo","description":null,"preview_image":null,"duration":null}}]

No support for bookmarks yet (I just use this); I may add it in the future if someone is interested.

That's great, thanks!
I'll experiment with hooking it up to cachew, and definitely would be up for using it in Promnesia!

Sounds good - I think I already have it hooked up to cachew, unless you mean something different. Corresponding promnesia Source for now

The only thing missing before a PR is the FirefoxMobile Browser/logic; I need to export a db from my (now rooted) phone and look at the browser source file in promnesia.

Ah -- by cachew support, I meant 'incremental' caching, so if you add a new database, you'd ideally just 'merge' it in with the previously cached results. Kind of what the madness here was achieving, but without the madness :) https://github.com/karlicoss/promnesia/blob/ea9d9ef8e654c9daee7f7fb1ac458d586f8d4393/src/promnesia/sources/browser.py#L50-L51
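A minimal sketch of what 'incremental' merging could look like, to illustrate the idea. This is not cachew's API or the promnesia code linked above; the `IncrementalMerge` class and its `load` callback are hypothetical, and the cache here lives in memory only rather than on disk.

```python
from typing import Callable, Iterable, List, Set, Tuple

# (url, timestamp) pair identifying a single visit
Visit = Tuple[str, int]


class IncrementalMerge:
    """Merge visits database by database, reading each database only once."""

    def __init__(self, load: Callable[[str], Iterable[Visit]]) -> None:
        self._load = load                 # reads all visits from one database
        self._processed: Set[str] = set() # databases already merged in
        self._visits: Set[Visit] = set()  # merged, deduplicated results

    def update(self, db_paths: Iterable[str]) -> List[Visit]:
        for path in db_paths:
            if path in self._processed:
                continue  # already merged, so skip the expensive read
            self._visits.update(self._load(path))
            self._processed.add(path)
        # sort by timestamp, then url, for a stable result
        return sorted(self._visits, key=lambda v: (v[1], v[0]))
```

Calling `update` again after adding one new backup only reads that new file; everything merged earlier is reused as-is.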

@redthing1 browser history has a module here now; see here to set it up

If bookmarks from the databases are something you're still interested in, feel free to create an issue here.