
This PHP script sorts your documents into subfolders (using hardlinks) based on the hashtags it finds in your documents' filenames.

Language: PHP. License: MIT.


FileBasedMiniDMS.php by Stefan Weiss (2017)
Version 0.11 08.06.2017


Version 0.11 (08.06.2017)

  • New: automatic OCR and automatic rename

Version 0.02 (02.03.2016)

  • Initial release of this file-based document management system.
  • Sorts files with hashtags into hashtag folders.


  1. Place this file on your FileServer/NAS
  2. For OCR (Step 1): Install Docker and pull an ocrmypdf image, e.g. docker pull jbarlow83/ocrmypdf
  3. For automatic rename (Step 1.1): Make sure that pdftotext is available.
  4. Adjust settings for this script in config.php to fit your needs
  5. Create a cronjob on your FileServer/NAS to execute this script regularly. (In DSM you can do this in Control Panel -> Task Scheduler.) The job may need to run with root privileges.
    e.g. php /volume1/home/stefan/Scans/FileBasedMiniDMS.php
    or redirect stdout and stderr to a log file to see PHP warnings/errors:
    php /volume1/home/stefan/Scans/FileBasedMiniDMS.php >> /volume1/home/stefan/Scans/my.log 2>&1


This script works in three steps. Each step can be turned on/off in config.php:

Step 1: OCR

Runs OCR on the PDF files in $inboxfolder whose filenames match $matchWithoutOCR.
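The OCR step can be pictured roughly as follows. This is only a sketch, not the script's actual code: the helper names findPdfsNeedingOcr and buildOcrCommand are hypothetical, it assumes $matchWithoutOCR holds a PCRE pattern, and it assumes the container is fed through stdin/stdout, which is one of the invocation styles described in the OCRmyPDF Docker documentation.

```php
<?php
// Hypothetical helper: list the files in the inbox whose filename matches
// the given pattern (assumed here to be a PCRE regex such as '/^scan_.*\.pdf$/i').
function findPdfsNeedingOcr(string $inboxFolder, string $pattern): array
{
    $matches = [];
    foreach (scandir($inboxFolder) ?: [] as $entry) {
        $path = $inboxFolder . DIRECTORY_SEPARATOR . $entry;
        if (is_file($path) && preg_match($pattern, $entry) === 1) {
            $matches[] = $path;
        }
    }
    sort($matches);
    return $matches;
}

// Hypothetical helper: build the shell command that pipes one PDF through
// the ocrmypdf container. "- -" tells ocrmypdf to read stdin and write stdout.
function buildOcrCommand(string $input, string $output, string $image = 'jbarlow83/ocrmypdf'): string
{
    return sprintf(
        'docker run --rm -i %s - - < %s > %s',
        escapeshellarg($image),
        escapeshellarg($input),
        escapeshellarg($output)
    );
}
```

The command string could then be passed to exec(). Building it separately keeps the quoting of filenames with spaces and hashtags in one place.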

Step 1.1: Rename OCR'ed files based on keywords and date

The PDF is renamed to the following structure: "<date> <name> <tags>.pdf"

<date>: The script tries to find a date in the PDF. If none is found, the current date is used.
<name>: You can define $renamerules. The first rule that matches the OCR'ed content of the first page is used. You can use the operators & (AND) and , (OR) as well as the wildcards ? and *.
<tags>: In $tagrules you can specify your tags. All matching rules will add their tag to the filename. You can use the same operators here.
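To make the rule semantics concrete, here is a minimal sketch of how such rules could be evaluated. It is not taken from FileBasedMiniDMS.php. All function names are hypothetical, and several points are assumptions: rules are stored as "rule => result" array entries, "," (OR) binds more loosely than "&" (AND), matching is case-insensitive, and dates appear as YYYY-MM-DD or DD.MM.YYYY.

```php
<?php
// Turn a term with ? and * wildcards into a case-insensitive regex.
function wildcardToRegex(string $term): string
{
    $quoted = preg_quote(trim($term), '/');
    // preg_quote escaped the wildcards as \* and \?; map them to regex equivalents.
    $quoted = str_replace(['\*', '\?'], ['.*', '.'], $quoted);
    return '/' . $quoted . '/is';
}

// Assumed precedence: "," separates alternatives, "&" joins terms within one alternative.
function ruleMatches(string $rule, string $text): bool
{
    foreach (explode(',', $rule) as $alternative) {
        $allTermsFound = true;
        foreach (explode('&', $alternative) as $term) {
            if (trim($term) === '' || preg_match(wildcardToRegex($term), $text) !== 1) {
                $allTermsFound = false;
                break;
            }
        }
        if ($allTermsFound) {
            return true;
        }
    }
    return false;
}

// <name>: the first matching rename rule wins.
function pickName(array $renameRules, string $text): ?string
{
    foreach ($renameRules as $rule => $name) {
        if (ruleMatches((string) $rule, $text)) {
            return $name;
        }
    }
    return null;
}

// <tags>: every matching tag rule contributes its tag.
function collectTags(array $tagRules, string $text): array
{
    $tags = [];
    foreach ($tagRules as $rule => $tag) {
        if (ruleMatches((string) $rule, $text)) {
            $tags[] = '#' . ltrim($tag, '#');
        }
    }
    return $tags;
}

// <date>: first valid date in the text, otherwise the supplied fallback (e.g. today).
function findDate(string $text, string $fallback): string
{
    if (preg_match('/\b(\d{4})-(\d{2})-(\d{2})\b/', $text, $m)
        && checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
        return "$m[1]-$m[2]-$m[3]";
    }
    if (preg_match('/\b(\d{2})\.(\d{2})\.(\d{4})\b/', $text, $m)
        && checkdate((int) $m[2], (int) $m[1], (int) $m[3])) {
        return "$m[3]-$m[2]-$m[1]";
    }
    return $fallback;
}

// Assemble "<date> <name> <tags>.pdf".
function buildFilename(string $date, string $name, array $tags): string
{
    return trim($date . ' ' . $name . ' ' . implode(' ', $tags)) . '.pdf';
}
```

With the hypothetical rules ['telekom&rechnung' => 'Telekom Bill'] and ['rechnung,invoice' => 'bills'], a page containing "Telekom Rechnung vom 08.06.2017" would yield "2017-06-08 Telekom Bill #bills.pdf" under these assumptions.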

Step 2: Tagging

This script creates a subfolder for each hashtag it finds in your filenames and creates a hardlink to the file in that folder. Documents are expected to be stored flat in one folder. Filenames must follow the pattern "<any name> #hashtag1 #hashtag2.extension".

e.g. "Documents/Scans/2015-12-25 Bill of Santa Clause #bills #2015.pdf" will be linked into:

  • "Documents/Scans/tags/2015/2015-12-25 Bill of Santa Clause #bills.pdf"
  • "Documents/Scans/tags/bills/2015-12-25 Bill of Santa Clause #2015.pdf"
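The mapping from a filename to its link locations can be sketched like this. The function names hashtagLinkTargets and createTagLinks are hypothetical and the code is an illustration of the naming scheme shown in the example above, not the script's own implementation. It assumes that each link name drops the tag of the folder it lives in, as the two example paths suggest.

```php
<?php
// Compute "tag => link path" for a file named "<any name> #tag1 #tag2.extension".
function hashtagLinkTargets(string $filename, string $tagsFolder): array
{
    $info = pathinfo(basename($filename));
    $extension = isset($info['extension']) ? '.' . $info['extension'] : '';
    $stem = $info['filename'];

    if (!preg_match_all('/#([^\s#]+)/u', $stem, $m)) {
        return []; // no hashtags, nothing to link
    }
    $tags = array_unique($m[1]);
    // The plain name is the stem with all hashtags removed.
    $name = trim(preg_replace('/\s*#[^\s#]+/u', '', $stem));

    $targets = [];
    foreach ($tags as $tag) {
        // Inside the folder for $tag, the link keeps only the other tags.
        $others = array_map(
            function ($t) { return '#' . $t; },
            array_diff($tags, [$tag])
        );
        $linkName = trim($name . ' ' . implode(' ', $others)) . $extension;
        $targets[$tag] = $tagsFolder . '/' . $tag . '/' . $linkName;
    }
    return $targets;
}

// Create the tag folders and hardlinks for one source file.
function createTagLinks(string $sourcePath, string $tagsFolder): void
{
    foreach (hashtagLinkTargets($sourcePath, $tagsFolder) as $target) {
        $dir = dirname($target);
        if (!is_dir($dir)) {
            mkdir($dir, 0777, true);
        }
        if (!file_exists($target)) {
            link($sourcePath, $target); // hardlink: same file content, second directory entry
        }
    }
}
```

Because link() creates hardlinks, the source file and the tags folder need to be on the same filesystem.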


Q: How do I assign another tag to my file?
A: Simply rename the file in the $scanfolder and add the tag at the end (but before the extension).

Q: How can I fix a typo in a document's filename?
A: Simply rename the file in the $scanfolder. The tags are created from scratch at the next scheduled run, and the old links and tags are removed automatically.


Make sure you have a backup before you start using this script. You use this software at your own risk.