Warning marver is not ready, there is very little UI and there are no guarantees it won't photoshop all your family photos with clown hats on your uncle. Watch the repo for updates, or come back in a few months.
marver scours your files and makes them all pretty and viewable, pulling as much information as possible.
- Search
- If someone searches for
pink clouds
and aclouds
tag exists, add theclouds
tag to the file filter. - If someone searches
pink clouds #australia
thenaustralia
should be added as a tag filter, andpink clouds
should be used as the semantic search term. - If a search query contains a name, bias towards images with that face
- Index the file name as an embedding
- Index text from OCR
- Negate search terms with
-
, for exampleapple -device
would search for apples, not apple devices. - Upload a photo/video and use perceptual hash/embeddings to find similar images/videos
- Index existing and generated subtitles
- Clicking on a subtitle-provided result will take you to that time in the video
- If someone searches for
- Subtitles
- Use whisper to generate subtitles.
- Extract existing subtitles from files
- Face recognitition
- Find all images with a face by clicking on that face
- Its possible to auto-guess names for faces.
- If a file named
ryan soccer 2002.jpg
has a single face, it's a good indicator that the person's name is "Ryan". - If a file named
dave ryan.jpg
andellie ryan.jpg
exist, we can assume that the name "Ryan" is for the face that is in both photos. - If we have enough of these matches we can be fairly confident in the name.
- If a file named
- OCR
- Clean up OCR results automaticaly and/or with a small LLM
- Translate OCR results into the user's language
- Overlay the bounding box on the image for copy/paste
- Run OCR on videos
- It might be possible to train a small model that takes in CLIP embeddings and determines if a frame has text in it.
- Only running OCR on frames "different enough" from the last one would be a good optimization
- Extract metadata from other files
- If
img_22.png
andimg_22.png.json
(json/xml/csv), try rip data from the json file and use it as metadata. - If we find
.sqlite
or.db
files, a small LLM could check if they contain useful metadata and extract it in bulk if they do. - Having some recognised schemas for common formats would be good - tools like
yt-dlp
,gallery-dl
and others may be common sources of metadata with standard formats we have schemas for. - As a fallback, a small LLM could generate schemas for metadata files we find, which we can then blindly apply to other files just like the preset schemas we might have.
- If
- Automatic collections
- Folders under the source folder should always become a collection,
photos/2021/August
should be in aAugust
collection that is a child of the2021
collection. - Look for gallery indexes in file names and generate collections from them
img_1.png
andimg_2.png
should be in a collection together- Heuristics will probably be useful here. If there are lots of gaps,
img_1.png
andimg_3.png
we should probably ignore it. If there are only one or two gaps across two dozen photos, we should probably include them. - Don't generate collections that are too large, maybe 50 items max.
- Use a small LLM or train a model to find and generate collections that aren't as obvious.
- Folders under the source folder should always become a collection,
- Tags
- Tags should be a graph
- If you have an
apple pie
tag that is a child of thepies
tag, searching forpies
should also search forapple pie
.
- If you have an
- Tags should be a graph
- Support other media types
STL/OBJ/3D
files should have a rendered interactive view and have thumbnails- PDFs, markdown, and other text-based files should be viewable with thumbnails
- Index text from text-based files (pdfs, markdown, txt)