bluedenim/log4j-s3-search

Extra end of line added when CsvParameterLayout is used

oshyshkin opened this issue · 10 comments

Hi. I'm using CsvParameterLayout to write data in CSV format.
My config looks like:

<Log4j2Appender name="s3Appender">
    <CsvParameterLayout delimiter=","/>
    <verbose>false</verbose>

    <stagingBufferAge>1</stagingBufferAge>
    <s3Bucket>bucket</s3Bucket>
    <s3Path>path/</s3Path>

    <s3Region>us-east-1</s3Region>
</Log4j2Appender>

And I call logger like this:

logger.info("Another round through the loop!","0","0","0")

I expect to see data inside the file in the following format:

0,0,0
0,0,0
0,0,0

But instead I see:

0,0,0

0,0,0

0,0,0

I tried the RollingFile appender and it looks fine. Probably some \n is being appended when S3Appender stores the data.

I am adding a line separator here: https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/src/main/java/com/van/logging/aws/S3PublishHelper.java#L84
However, maybe I wouldn't have to if I just added %n to the layout.
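For context, the fix amounts to only appending a separator when the formatted event does not already end with one, so a layout that emits its own record terminator (like CsvParameterLayout) doesn't produce blank lines. A minimal sketch of that idea (class and method names are made up for illustration, not the actual S3PublishHelper code):

```java
import java.util.List;

public class SeparatorSketch {
    // Concatenate buffered, already-formatted events into one batch
    // destined for a single S3 object. A separator is added only when
    // the layout did not already terminate the line, which avoids the
    // doubled newlines reported above.
    static String join(List<String> formattedEvents) {
        String sep = System.lineSeparator();
        StringBuilder batch = new StringBuilder();
        for (String event : formattedEvents) {
            batch.append(event);
            if (!event.endsWith(sep)) {
                batch.append(sep);
            }
        }
        return batch.toString();
    }
}
```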

You rock! Thanks for a quick response.
Well, I'm not sure which is more suitable; I'll try to look at the log4j2 sources and see how they handle it. But I guess it is easier to add a line separator later (in the layout or the message) than to remove it once the log is written.

Release 2.3.0 should also address this. No config changes necessary from what you had. Please verify.

Everything looks good. Thank you!

The only thing that bothers me is that each time logging starts, it creates a small file and only then starts to accumulate messages. For example:
(screenshot)
My task is to store audit logs to S3 for a program that runs in a batch style. So if I run it 10k times, I'll have 10k extra small files, which is not good for S3. That's not a problem for me now; I'm just wondering why it works that way.

Secondly, do you know approximately how long your batch program takes to finish each time you run it? Also: how long does the program take for the FIRST run?

Not sure about the first time, but it should be the same as on average: 15-20 minutes.

I assume you’ve tuned the stagingBufferSize or stagingBufferAge parameters. The example programs’ values (especially stagingBufferSize, since it says to publish as soon as it gathers 10 lines of logs) are very small, both to allow easier testing and to verify early that the setup is correct. Since there is no easy way to “append” to an S3 key, each publish will be to a different key.
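For example, for a batch job that runs 15-20 minutes, a less aggressive configuration might look like the sketch below (the values are illustrative assumptions, not recommendations; tune them to your log volume):

```xml
<Log4j2Appender name="s3Appender">
    <CsvParameterLayout delimiter=","/>

    <!-- Publish only after ~5000 events have accumulated... -->
    <stagingBufferSize>5000</stagingBufferSize>
    <!-- ...or after 20 minutes, whichever comes first, so a
         15-20 minute run produces roughly one object per run. -->
    <stagingBufferAge>20</stagingBufferAge>

    <s3Bucket>bucket</s3Bucket>
    <s3Path>path/</s3Path>
    <s3Region>us-east-1</s3Region>
</Log4j2Appender>
```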

I see. As you mentioned, I do tune the buffer parameters, and I set them very low (for example, I write logs to the bucket each minute) for debugging purposes. The only thing that bothers me is the first file, because it is super small compared to the rest of the files. But I think the reason is this: ...it says to publish as soon as it gathers 10 lines of logs

... allow easier testing and to verify early the set up is correct.

Yes, I know the problem. Maybe as an improvement, we could try putting some test file (without log messages) with a standard name like __S3_APPENDER_SUCCESS__ so it will be overwritten each time (the only small file), and you can simply delete it if needed. Anyhow, it is not a problem for me now.

Thank you for the advice! I'll try to play with buffer arguments to see how it affects logging and let you know if I find something.