rkalla/cloudfront-log-parser

Not easy for caller to handle specific exception cases well


Right now the parse method throws:
IOException - any IO issue, including creating the original GZipInputStream
RuntimeException - malformed content

Throwing IOException is fine, except that it is also thrown when closing the private GZipInputStream fails. The caller has no access to that stream, so it cannot try to close or dispose of it correctly.

Throwing RuntimeException is a bit misleading, because the actual problem is more specific: malformed content.

The caller would want to know that, so it can skip or discard the log file instead of retrying the parse operation in case the original failure was caused by the library, memory, or the VM runtime.

Fixed. parse has the following behavior:

  1. IllegalArgumentException - thrown when something is wrong with the method arguments.
  2. IOException - thrown when the underlying InputStream cannot be read from or some other IO problem with it.
  3. MalformedContentException - thrown when the parser finds incorrectly formatted CloudFront logs.
  4. RuntimeException - thrown when parse() tries to cleanly close the internal GZipInputStream but fails to. The caller does not need to worry much about this one; it should just make sure to close the input stream it passed in.
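
For illustration, here is a minimal sketch of how a caller might react to each of the four cases above. It does not use the library itself: `MalformedContentException` and `parse` below are hypothetical stand-ins written for this example, and whether the real exception is checked or unchecked is an assumption (it is modeled as unchecked here). The `Outcome` enum and the `handle` method are also made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the library's exception; the real class may differ
// (for example, it could be a checked exception).
class MalformedContentException extends RuntimeException {
    MalformedContentException(String message) { super(message); }
}

class ParseHandlingSketch {
    // What the caller decides to do with a log file (names are illustrative).
    enum Outcome { PARSED, BAD_ARGS, RETRY_LATER, SKIP_FILE, IGNORE_CLOSE_FAILURE }

    // Hypothetical stand-in for parse(): it only checks the leading "#Version:"
    // directive so the example can run without the library.
    static void parse(InputStream in) throws IOException {
        if (in == null) throw new IllegalArgumentException("in cannot be null");
        String content = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        if (!content.startsWith("#Version:")) {
            throw new MalformedContentException("missing #Version directive");
        }
    }

    static Outcome handle(InputStream in) {
        try {
            parse(in);
            return Outcome.PARSED;
        } catch (IllegalArgumentException e) {
            return Outcome.BAD_ARGS;             // caller bug: fix the arguments
        } catch (MalformedContentException e) {
            return Outcome.SKIP_FILE;            // bad log file: retrying will not help
        } catch (IOException e) {
            return Outcome.RETRY_LATER;          // stream problem: a retry may succeed
        } catch (RuntimeException e) {
            return Outcome.IGNORE_CLOSE_FAILURE; // internal close failed: close our own stream
        }
    }
}
```

The more specific unchecked exceptions are caught before the generic `RuntimeException`, otherwise the last branch would swallow them. In real code the caller would typically open its own stream in a try-with-resources block so that it is closed in every case.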

Reopened the issue to troubleshoot a GitHub Milestone that automatically closes itself and becomes unusable. It is a GitHub bug:
http://support.github.com/discussions/issues-issues/594-milestone-not-used

Additionally, the protected "parseFieldsDirective" method was updated to throw MalformedContentException as well if the log type cannot be determined from the Fields directive, which it should always be able to do.

It checks up to 5 field names unique to DOWNLOAD and STREAMING log files to determine type; if it cannot, there is something wrong with the log file content.
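
A rough sketch of that kind of check follows. This is not the library's code: the class, the method name `detectLogType`, the nested exception, and the two sets of field names are assumptions chosen for illustration, so the fields the library actually inspects may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FieldsDirectiveSketch {
    enum LogType { DOWNLOAD, STREAMING }

    // Hypothetical stand-in for the library's exception.
    static class MalformedContentException extends RuntimeException {
        MalformedContentException(String message) { super(message); }
    }

    // Assumed field names that appear in only one of the two log formats.
    private static final Set<String> DOWNLOAD_ONLY = new HashSet<>(Arrays.asList(
            "cs-method", "cs(Host)", "sc-status", "cs(Referer)", "cs(User-Agent)"));
    private static final Set<String> STREAMING_ONLY = new HashSet<>(Arrays.asList(
            "x-event", "x-sname", "x-page-url", "x-file-ext", "x-sid"));

    // Returns the log type implied by the first type-specific field name found,
    // or throws if the directive is missing or contains none of them.
    static LogType detectLogType(String fieldsLine) {
        if (fieldsLine == null || !fieldsLine.startsWith("#Fields:")) {
            throw new MalformedContentException("missing #Fields directive");
        }
        String[] names = fieldsLine.substring("#Fields:".length()).trim().split("\\s+");
        for (String name : names) {
            if (DOWNLOAD_ONLY.contains(name)) return LogType.DOWNLOAD;
            if (STREAMING_ONLY.contains(name)) return LogType.STREAMING;
        }
        throw new MalformedContentException("cannot determine log type from: " + fieldsLine);
    }
}
```

Fields shared by both formats, such as `date` or `sc-bytes`, are skipped because they say nothing about the type.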