Bug: Big-ish files not being indexed
critchtionary opened this issue · 1 comments
Version 0.5.1, running in Docker.
In one of our repositories we have a ~150KB YAML file that Hound does not return search results for. The issue seems related to the size of the file: if I split it into two separate files, both halves can be searched. However, search works perfectly fine for another 2.2MB JSON file in the same setup.
Steps to reproduce:
- Create a new Git repo
- Commit exampleyaml.txt (have replaced all YAML values with random strings to remove any sensitive data)
- Configure hound to index this repo
- Attempt to search for a string in this file, e.g. permanent
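For anyone without access to the attached file, here is a minimal Python sketch that generates a stand-in of roughly the same size for the commit step above. It is not the original exampleyaml.txt: the `key_N: value` layout, the value length, and the names `random_value` and `build_yaml` are all assumptions made for illustration, so it may or may not trigger the same behaviour.

```python
import random
import string


def random_value(rng, length=24):
    """Return a random alphanumeric string, standing in for a scrubbed YAML value."""
    alphabet = string.ascii_letters + string.digits
    return "".join(rng.choice(alphabet) for _ in range(length))


def build_yaml(target_bytes=150_000, seed=0):
    """Build flat `key_N: value` lines until the text reaches target_bytes.

    The flat layout is hypothetical; the real file's structure is not known.
    """
    rng = random.Random(seed)  # seeded so the output is reproducible
    lines = []
    size = 0
    index = 0
    while size < target_bytes:
        line = f"key_{index}: {random_value(rng)}\n"
        lines.append(line)
        size += len(line)
        index += 1
    return "".join(lines)


text = build_yaml()
print(f"generated {len(text)} bytes in {text.count(chr(10))} lines")
```

The returned text can be written to a file in the test repo (for example with `pathlib.Path("generated_example.yaml").write_text(text)`), committed, and then searched for any one of the generated values.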
Something else that points to it being size-related: my first attempt to remove sensitive data replaced every character in each value string with the letter a. That file was searchable in Hound, possibly because it was able to compress to a smaller size.
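To put a rough number on the "compresses to a smaller size" idea, here is a small Python sketch comparing two documents of identical length, one with random values and one with every value character replaced by a. It uses zlib purely as a stand-in for "how redundant is this data"; whether Hound's index compresses anything this way is an assumption, and `make_values` and `as_yaml` are hypothetical helpers.

```python
import random
import string
import zlib


def compressed_size(text):
    """Size in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8")))


def make_values(count, length, seed=0):
    """Return `count` random lowercase strings of the given length."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(count)]


def as_yaml(values):
    """Render values as flat `key_N: value` lines (hypothetical layout)."""
    return "".join(f"key_{i}: {v}\n" for i, v in enumerate(values))


random_values = make_values(2000, 40)
masked_values = ["a" * len(v) for v in random_values]  # same lengths, all 'a'

random_doc = as_yaml(random_values)
masked_doc = as_yaml(masked_values)

print("random values:", len(random_doc), "bytes raw,", compressed_size(random_doc), "compressed")
print("masked values:", len(masked_doc), "bytes raw,", compressed_size(masked_doc), "compressed")
```

Both documents have the same raw size, so any difference in searchability between them would point at content rather than file length alone.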
Splitting this file is not a suitable workaround, since there may be other unsearchable files that we are not aware of.
Hmm, thank you for opening this bug. I wonder if it's related to the 32-bit-based indexing Hound uses (see #351). That would align with your attempt to replace everything with the letter a, since that would in theory create a much smaller index. Unfortunately, that would mean waiting until we have the time to rewrite the indexing to use 64-bit offsets.
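As a rough illustration of the "much smaller index" point, and assuming the index is keyed on trigrams (three-byte sequences), the Python sketch below counts distinct trigrams in random content versus all-a content of the same length. It does not test the 32-bit offset hypothesis; `distinct_trigrams` is a hypothetical helper, not part of Hound.

```python
import random
import string


def distinct_trigrams(data: bytes) -> int:
    """Count the distinct 3-byte windows in data (0 if shorter than 3 bytes)."""
    return len({data[i:i + 3] for i in range(len(data) - 2)})


rng = random.Random(0)
length = 50_000

# Random lowercase content, similar in spirit to the scrubbed values.
random_data = "".join(rng.choice(string.ascii_lowercase) for _ in range(length)).encode("ascii")
# Same length, but every character replaced with 'a'.
masked_data = b"a" * length

print("random content:", distinct_trigrams(random_data), "distinct trigrams")
print("masked content:", distinct_trigrams(masked_data), "distinct trigrams")
```

The all-a content has a single distinct trigram, while the random content has thousands, so a trigram-keyed index over the two inputs would differ greatly in size even though the files are equally long.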