Dropbox Search

Index and search the contents of your documents stored in Dropbox. It requires Node.js and Solr, supports many document types and keyword shortcuts, and updates the index as you add or edit files. It comes bundled with a simple web front end that shows snippets and loads results via Ajax.

Example Searches

recipe                Search for text (case and stemming aware)
"cake recipe"         Phrase search
jam recipe in:Files   Match documents within a folder
when:yesterday        All documents modified yesterday
recipe when:2012      Matches from year 2012
by:Mike               Match the given author
dogs type:image       Return only images
where:40"47'          Match lat/long in image metadata
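
As an illustration of how a when: value such as yesterday or 2012 might be resolved, here is a minimal sketch. The function name whenToRange is hypothetical, and the choice of UTC day boundaries and a half-open range (start inclusive, end exclusive) is an assumption, not necessarily what the project does.

```javascript
// Hypothetical helper: turn a when: value into a UTC date range.
// Assumption: days are cut at UTC midnight; "end" is exclusive.
function whenToRange(value, now = new Date()) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  let start;
  let end;

  if (value === 'today' || value === 'yesterday') {
    // Midnight (UTC) at the start of the current day.
    start = new Date(Date.UTC(
      now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate()));
    if (value === 'yesterday') {
      start = new Date(start.getTime() - DAY_MS);
    }
    end = new Date(start.getTime() + DAY_MS);
  } else {
    // Accepts year, year-mm, or year-mm-dd.
    const m = /^(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?$/.exec(value);
    if (!m) return null; // not a recognised date expression
    const year = Number(m[1]);
    const month = m[2] ? Number(m[2]) - 1 : 0;
    const day = m[3] ? Number(m[3]) : 1;
    start = new Date(Date.UTC(year, month, day));
    if (m[3]) {
      end = new Date(Date.UTC(year, month, day + 1)); // one day
    } else if (m[2]) {
      end = new Date(Date.UTC(year, month + 1, 1));   // one month
    } else {
      end = new Date(Date.UTC(year + 1, 0, 1));       // one year
    }
  }
  return { start: start.toISOString(), end: end.toISOString() };
}
```

The two ISO timestamps could then be placed in a range filter on whichever date field the schema defines.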

Installation

  1. Get a Dropbox API key.
  2. Set up and run your Solr instance.
     • Use the included solr/schema.xml file; note the lines marked EDIT dropbox-search.
  3. Edit the environment variables as shown below.
  4. Run npm install to download dependencies (solr, dbox, express, dateformat).
  5. Run node indexer.js to index your documents.
  6. Run node server.js to launch the web app.
  7. Browse to http://localhost:8888/search

Environment Variables

DROPBOX_APP_KEY =
DROPBOX_APP_SECRET =
DROPBOX_UID =
DROPBOX_OAUTH_TOKEN =
DROPBOX_OAUTH_SECRET =
SOLR_HOST = 127.0.0.1
SOLR_PORT = 8983
ROOT_PATH = /

The code doesn't yet implement the OAuth protocol, so for now you must complete the OAuth flow manually and provide the resulting token and secret.
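
A minimal sketch of how the variables above might be read and validated at startup. The loadConfig helper and the returned property names are hypothetical, and treating the SOLR_HOST, SOLR_PORT, and ROOT_PATH values shown above as fallback defaults is an assumption.

```javascript
// Hypothetical helper: collect settings from an environment object
// (for example process.env) and fail early if credentials are missing.
function loadConfig(env) {
  const required = [
    'DROPBOX_APP_KEY',
    'DROPBOX_APP_SECRET',
    'DROPBOX_UID',
    'DROPBOX_OAUTH_TOKEN',
    'DROPBOX_OAUTH_SECRET',
  ];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    appKey: env.DROPBOX_APP_KEY,
    appSecret: env.DROPBOX_APP_SECRET,
    uid: env.DROPBOX_UID,
    oauthToken: env.DROPBOX_OAUTH_TOKEN,
    oauthSecret: env.DROPBOX_OAUTH_SECRET,
    // Assumption: the values listed above act as defaults.
    solrHost: env.SOLR_HOST || '127.0.0.1',
    solrPort: parseInt(env.SOLR_PORT || '8983', 10),
    rootPath: env.ROOT_PATH || '/',
  };
}
```

Calling loadConfig(process.env) before indexing would surface a missing token immediately rather than partway through a run.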

Indexing

Dropbox-search uses Solr's ExtractingRequestHandler to index multiple file types, including: pdf, doc and docx (Word), xls (Excel), ppt, odt, csv, html, rtf, txt, and more. In addition to text content, it extracts metadata such as author and date. For image files, it extracts EXIF metadata like gps_latitude.
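
To make the extraction step more concrete, the sketch below builds the URL a file would be posted to. The buildExtractUrl name is hypothetical; the /solr/update/extract path is the conventional location of the handler but depends on how your Solr instance and cores are configured, and using the Dropbox path as the document id is an assumption.

```javascript
// Hypothetical helper: build the URL for posting one file to Solr's
// ExtractingRequestHandler. The handler path is an assumption and may
// differ in your Solr setup (for example when using named cores).
function buildExtractUrl(host, port, docId) {
  const params = new URLSearchParams({
    'literal.id': docId, // sets the document's id field
    commit: 'true',      // make the document searchable right away
  });
  return `http://${host}:${port}/solr/update/extract?${params}`;
}
```

The file's bytes would then be sent as the body of a POST request to that URL, with the extracted text and metadata mapped to fields by schema.xml.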

I also define some useful shortcuts like:

  • when: matches a date (e.g. today, yesterday, year-mm-dd, year-mm, or year)
  • type: matches a file type (e.g. image or rtf)
  • in: matches files within the given folder or path fragment (e.g. MyFiles)
  • by: same as author
  • where: matches gps_latitude or gps_longitude
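
One way the shortcuts listed above could be separated from the free-text part of a query is sketched below. The parseQuery name and the returned shape are hypothetical, and the tokenizer is deliberately simple: it keeps quoted phrases together but does not handle a quoted value after a shortcut.

```javascript
// Shortcut names taken from the list above.
const SHORTCUTS = ['when', 'type', 'in', 'by', 'where'];

// Hypothetical helper: split a query into free text and shortcut filters.
function parseQuery(query) {
  const filters = {};
  const terms = [];
  // A token is either a "quoted phrase" or a run of non-space characters.
  const tokens = query.match(/"[^"]*"|\S+/g) || [];
  for (const token of tokens) {
    const m = /^([a-z]+):(.+)$/.exec(token);
    if (m && SHORTCUTS.includes(m[1])) {
      filters[m[1]] = m[2];
    } else {
      terms.push(token); // plain word, phrase, or unknown prefix
    }
  }
  return { text: terms.join(' '), filters };
}
```

For example, parseQuery('jam recipe in:Files') yields the text "jam recipe" and an in filter with the value "Files"; each filter can then be translated into a restriction on the matching Solr field.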

The indexer listens for Dropbox API delta events to fetch documents that need to be added or removed from the index.
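
The sketch below shows how a batch of delta entries might be split into files to index and paths to remove. It assumes the entry shape of the Dropbox v1 delta call, where each entry is a [path, metadata] pair and a null metadata means the path was deleted; the planIndexChanges name is hypothetical.

```javascript
// Hypothetical helper: decide what to do with each delta entry.
// Assumption: entries are [path, metadata] pairs, metadata is null for
// deletions, and metadata.is_dir marks folders (which hold no content).
function planIndexChanges(entries) {
  const toIndex = [];
  const toRemove = [];
  for (const [path, metadata] of entries) {
    if (metadata === null) {
      toRemove.push(path);  // deleted in Dropbox: drop from the index
    } else if (!metadata.is_dir) {
      toIndex.push(path);   // new or changed file: fetch and re-index
    }
  }
  return { toIndex, toRemove };
}
```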

Note: Dropbox may rate-limit excessive file fetches by returning 503 errors. I try to handle this by queueing file fetches so that at most one happens per second.
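
The queueing idea can be sketched as follows. This is an illustration rather than the project's actual code: createFetchQueue, fetchFile, and the injectable schedule parameter are hypothetical names, and retrying after a 503 is left out.

```javascript
// Hypothetical sketch: run at most one fetch per interval.
// fetchFile(path) starts the download; schedule defaults to setTimeout
// and is injectable so the timing can be controlled in tests.
function createFetchQueue(fetchFile, intervalMs = 1000, schedule = setTimeout) {
  const pending = [];
  let running = false;

  function next() {
    if (pending.length === 0) {
      running = false; // idle: the next push may fetch immediately
      return;
    }
    running = true;
    fetchFile(pending.shift());
    schedule(next, intervalMs); // wait before starting the next fetch
  }

  return {
    push(path) {
      pending.push(path);
      if (!running) next();
    },
  };
}
```

The first queued path is fetched right away, and every later one waits for the interval to elapse, so a burst of delta entries is spread out over time.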

File type icons © Dropbox Icon Library.