Canop/rhit

cannot open any files, just get -- Error: no log file found in "..."

tstack opened this issue · 8 comments

I used cargo install on macOS and can't seem to get this working. I tried a couple of files that seem like they should work, and it just reports the error mentioned in the title. I even copied a test log message from one of the source files and pasted it into a file, but that doesn't work either.

Canop commented

Can you please paste one of your log files somewhere? If you don't want to make it public, you can post it privately in miaou.

I tried the example logs from here:

https://github.com/elastic/examples/blob/master/Common%20Data%20Formats/nginx_logs/nginx_logs

And, as I said, I tried this log line from one of the tests:

static SIO_PULL_LINE: &str = r#"10.232.28.160 - - [22/Jan/2021:02:49:30 +0000] "GET /socket.io/?EIO=3&transport=polling&t=NSd_nu- HTTP/1.1" 200 99 "https://miaou.dystroy.org/3" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36""#;

It looks like the is_access_log_path() function is called against the filename I gave on the command-line (!?):

pub fn is_access_log_path(path: &Path) -> bool {

Checking for an explicit name like "access.log" is not going to work.

Canop commented

OK. Sorry if I appear dense, but what you're saying is that your files have a different naming pattern?

Note: I just noticed that rhit tests the name even when a single file is given. I'll change that.

you're saying is that your files have a different naming pattern?

Yes, my logs (and, I would guess, many other folks' logs) have a prefix on the file name (e.g. slserver.access.log).

I would suggest not looking at the file name and, instead, reading the first line of the file and seeing if it can be parsed successfully.
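To make the suggestion concrete, here is a minimal sketch using only the standard library. The function names and the shape test are hypothetical, not rhit's actual parser: the check only verifies that a line has three fields, a bracketed date, a quoted request, and a three-digit status. It also assumes the file is plain text, so a compressed file would have to be decompressed first.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};
use std::path::Path;

/// Rough shape test (hypothetical, not rhit's parser):
/// `addr ident user [date] "request" status ...`
fn looks_like_access_log_line(line: &str) -> bool {
    // The date is expected between the first '[' and the first ']'.
    let (Some(open), Some(close)) = (line.find('['), line.find(']')) else {
        return false;
    };
    if close < open {
        return false;
    }
    // Before the date: remote address, ident, user.
    if line[..open].split_whitespace().count() != 3 {
        return false;
    }
    // After the date: a quoted request, then a three-digit status code.
    let rest = line[close + 1..].trim_start();
    let Some(rest) = rest.strip_prefix('"') else {
        return false;
    };
    let Some(end_quote) = rest.find('"') else {
        return false;
    };
    rest[end_quote + 1..]
        .split_whitespace()
        .next()
        .map_or(false, |s| s.len() == 3 && s.chars().all(|c| c.is_ascii_digit()))
}

/// Reads only the first line of an uncompressed file and applies the shape test.
/// `read_line` returns an error if the line is not valid UTF-8.
fn first_line_looks_like_access_log(path: &Path) -> io::Result<bool> {
    let mut reader = BufReader::new(File::open(path)?);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(looks_like_access_log_line(&line))
}
```

With a check like this, a file named slserver.access.log or nginx_logs would be accepted on content alone, regardless of its name.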

Canop commented

Reading the first line won't be done by default, because we first have to unzip files and that's the slowest operation. But I think I'll start with:

  1. devising a more general naming pattern
  2. not checking the name when a single file is given (i.e. not a directory)
  3. having an option to not check the names

Canop commented

With the last commit, rhit should be able to open your logs.

Feel free to reopen if you consider the problem isn't solved now.

Reading the first line won't be done by default, because we first have to unzip files and that's the slowest operation.

Are you sure? I don't see how decompressing the first chunk of a file would be that expensive. You're also checking for the ".gz" file extension. Do you know how GzDecoder::new() behaves if the stream is not gzipped? You might want to try to gunzip the file and fall back to treating it as a non-compressed file if that fails.
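A related way to avoid depending on the ".gz" extension, sketched here with the standard library only, is to peek at the first two bytes of the stream: gzip data starts with the magic number 0x1f 0x8b. This is a different technique from trying GzDecoder and falling back on failure, and it says nothing about how GzDecoder::new() behaves on non-gzip input. The function name is hypothetical.

```rust
use std::io::{self, BufRead};

/// The two-byte magic number that starts every gzip stream.
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];

/// Peeks at the buffered bytes without consuming them, so the same reader
/// can afterwards be handed to a decompressor or read as plain text.
/// Assumption: the first `fill_buf` call yields at least two bytes when the
/// stream has them, which holds for a `BufReader` over a file or a `Cursor`.
fn starts_with_gzip_magic<R: BufRead>(reader: &mut R) -> io::Result<bool> {
    let buf = reader.fill_buf()?;
    Ok(buf.starts_with(&GZIP_MAGIC))
}
```

Because nothing is consumed, the caller could branch on the result and pass the reader to a gzip decoder only when the magic number is present.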

Canop commented

You're right. The penalty isn't very significant (about 5% to 10% of the total time). With the --no-name-check option, all files are tried.
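The name-check behavior discussed in this thread (a looser naming pattern, no check for an explicitly given file, and an option to skip the check) can be sketched as follows. This is an illustration only, not rhit's code: the function names, the `contains("access.log")` rule, and the boolean parameters are all assumptions.

```rust
use std::path::Path;

/// Hypothetical looser pattern: accept any file name containing "access.log",
/// which covers prefixed names (slserver.access.log) and rotated or
/// compressed ones (access.log.2.gz).
fn looks_like_access_log_name(path: &Path) -> bool {
    path.file_name()
        .and_then(|name| name.to_str())
        .map_or(false, |name| name.contains("access.log"))
}

/// Decide whether a file should be read at all.
/// `given_explicitly`: the path was passed directly, not found in a directory.
/// `no_name_check`: stands in for the --no-name-check option.
fn should_read(path: &Path, given_explicitly: bool, no_name_check: bool) -> bool {
    given_explicitly || no_name_check || looks_like_access_log_name(path)
}
```

Under these assumptions, a file called nginx_logs is read when it is named on the command line or when the check is disabled, and skipped only when it is discovered inside a directory with the check active.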