hound-search/hound

Bug: Big-ish files not being indexed

critchtionary opened this issue · 1 comment

Version 0.5.1, running in Docker.

In one of our repositories we have a ~150KB YAML file that Hound returns no search results for. The issue seems to be related to the size of the file: if I split it into two separate files, both halves can be searched. However, we also have a 2.2MB JSON file for which search works perfectly fine.

Steps to reproduce:

  1. Create a new Git repo
  2. Commit exampleyaml.txt (all YAML values have been replaced with random strings to remove any sensitive data)
  3. Configure hound to index this repo
  4. Attempt to search for a string in this file e.g. permanent

Something else that points to it being size-related: in my first attempt to remove sensitive data, I replaced every character in each value string with the letter a. That version of the file was searchable in Hound, possibly because it could be compressed to a smaller size.

Splitting this file is not a suitable workaround, since there may be other unsearchable files that we are not aware of.

Hmm, thank you for opening this bug. I wonder if it's related to the 32-bit-based indexing Hound uses (see #351). That would be consistent with your attempt to replace everything with the letter a, since that would in theory create a much smaller index. If that is the cause, a fix would unfortunately have to wait until we have the time to rewrite the indexing to use 64-bit offsets.
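
For anyone following along, here is a rough sketch of why the all-a version of the file might produce a much smaller index. It assumes a trigram-style index (one entry per distinct 3-character sequence) and is not a model of Hound's actual index format. The helper distinct_trigrams and the generated strings are hypothetical stand-ins for the two sanitized variants of the YAML values.

```python
import random
import string


def distinct_trigrams(text: str) -> int:
    """Count the distinct 3-character substrings in text."""
    return len({text[i:i + 3] for i in range(len(text) - 2)})


# Hypothetical stand-ins for the two sanitized variants (~150KB each).
rng = random.Random(0)
random_values = "".join(
    rng.choice(string.ascii_lowercase) for _ in range(150_000)
)
all_a_values = "a" * 150_000

# Random strings contain thousands of distinct trigrams.
print(distinct_trigrams(random_values))
# A run of a single letter contains exactly one distinct trigram ("aaa").
print(distinct_trigrams(all_a_values))  # 1
```

This only shows that the two variants differ enormously in trigram variety at the same file size. It does not confirm that the 32-bit offsets are what breaks the random-string version.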