Extra end of line added when CsvParameterLayout is used

Question

Extra end of line added when CsvParameterLayout is used

oshyshkin opened this issue 5 years ago · 10 comments

Hi. I'm using CsvParameterLayout to wite data in CSV format.
My config looks like:

<Log4j2Appender name="s3Appender">
    <CsvParameterLayout delimiter=","/>
    <verbose>false</verbose>

    <stagingBufferAge>1</stagingBufferAge>
    <s3Bucket>bucket</s3Bucket>
    <s3Path>path/</s3Path>

    <s3Region>us-east-1</s3Region>
</Log4j2Appender>

And I call logger like this:

logger.info("Another round through the loop!","0","0","0")

I expect to see data insisde file in the next format:

0,0,0
0,0,0
0,0,0

But instead I see:

0,0,0

0,0,0

0,0,0

I tried RollingFile appender and it looks fine. Probably there is some \n appending when S3Appender store data.

Answer 1 · 2019-07-18T16:44:55.000Z

Ok. I will look into this also Get Outlook for Android<https://aka.ms/ghei36>

…

________________________________ From: oshyshkin <notifications@github.com> Sent: Thursday, July 18, 2019 3:56:39 AM To: bluedenim/log4j-s3-search Cc: Subscribed Subject: [bluedenim/log4j-s3-search] Extra end of line added when CsvParameterLayout is used (#42) Hi. I'm using CsvParameterLayout to wite data in CSV format. My config looks like: <Log4j2Appender name="s3Appender"> <CsvParameterLayout delimiter=","/> <verbose>false</verbose> <stagingBufferAge>1</stagingBufferAge> <s3Bucket>bucket</s3Bucket> <s3Path>path/</s3Path> <s3Region>us-east-1</s3Region> </Log4j2Appender> And I call logger like this: logger.info("Another round through the loop!","0","0","0") I expect to see data insisde file in the next format: 0,0,0 0,0,0 0,0,0 But instead I see: 0,0,0 0,0,0 0,0,0 [image]<https://user-images.githubusercontent.com/13902645/61452094-a49b2c00-a963-11e9-8482-e78e0cf95377.png> I tried RollingFile appender and it looks fine. Probably there is some \n appending when S3Appender store data. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub<#42?email_source=notifications&email_token=AAOPF6AJQHBZO2U56EDOBJLQABD6PA5CNFSM4IEZPEVKYY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4G77DPMA>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAOPF6DDQC6Y6RYYCKWDY53QABD6PANCNFSM4IEZPEVA>.

Answer 2 · 2019-07-19T05:40:53.000Z

I am adding a line separator here: https://github.com/bluedenim/log4j-s3-search/blob/master/appender-core/src/main/java/com/van/logging/aws/S3PublishHelper.java#L84
However, maybe I didn't have to if I just add %n to the layout.

Answer 3 · 2019-07-19T08:12:03.000Z

You rock! Thanks for a quick response.
Well, I'm not sure what is more suitable, I'll try to look at log4j2 sources and see how they handle it. But I guess it is easier to add a line separator later (in layout or message) than remove it when your log is written.

Answer 4 · 2019-07-19T17:07:59.000Z

I think for logging its better to defer that to the pattern. Tbh i think its a bug to add it in code. But i want to see how the cvs pattern affects the solr and elasticsearch integration since those publishers may assume each mesg to be on a new line. Will update Get Outlook for Android<https://aka.ms/ghei36>

…

________________________________ From: oshyshkin <notifications@github.com> Sent: Friday, July 19, 2019 1:12:03 AM To: bluedenim/log4j-s3-search <log4j-s3-search@noreply.github.com> Cc: Van Ly <vancly@hotmail.com>; Comment <comment@noreply.github.com> Subject: Re: [bluedenim/log4j-s3-search] Extra end of line added when CsvParameterLayout is used (#42) You rock! Thanks for a quick response. Well, I'm not sure what is more suitable, I'll try to look at log4j2 sources and see how they handle it. But I guess it is easier to add a line separator later (in layout or message) than remove it when your log is written. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#42?email_source=notifications&email_token=AAOPF6BEHBTGHFFCBTLJ2VDQAFZNHA5CNFSM4IEZPEVKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2K5VFY#issuecomment-513137303>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAOPF6E2BGMGLAAHOBLMMSDQAFZNHANCNFSM4IEZPEVA>.

Answer 5 · 2019-07-21T01:49:42.000Z

Release 2.3.0 should also address this. No config changes necessary from what you had. Please verify.

Answer 6 · 2019-07-21T08:54:20.000Z

Everything looks good. Thank you!

Answer 7 · 2019-07-21T09:00:58.000Z

The only thing bothers me is that each time you start logging it creates a small file and only then it starts to accumulate messages. For example:

My task is to store audit logs to S3 for a program which works in batch kinda style. So if I run it 10k times I'll have 10k extra small files, which is not good for S3. That's not a problem for me now, just wondering why does it work in such way.

Answer 8 · 2019-07-21T16:05:41.000Z

Some possibilities: I assume you’ve tuned the stagingBufferSize or stagingBufferAge parameters. The example programs’ values (especially stagingBufferSize since it says to publish as soon as it gathers 10 lines of logs) are very small to both allow easier testing and to verify early the set up is correct. Since there is no easy way to “append” to an S3 key, each publish will be to a different key. In practice, you should increase stagingBufferSize to fit your needs once everything is set up. Secondly, do you know approximately how long does your batch program take to finish each time you run it? Also: how long does the program take for the FIRST run? One limitation of the approach of a in-process logger like log4j-s3-search is that it will die when you program dies. So if any run of your batch program lasts only for a short time and finishes, then vey little log is collected. And as your program exits, the appender flushes and publishes on cleanup. An alternative is to run a separate log aggregator program with log4j-s3-search to collect log events to then publish on its own so that it has an independent life cycle than your batch program. Your batch program can send log events to it as it runs. This is the approach taken by some analytics packages. The problem with this approach is that it increases the complexity of the infrastructure a bit because now you have to ensure the aggregator program is always up and running and your batch program can communicate with it efficiently. In summary, I don’t know if it’s possible now, but if you can get your batch to run longer before exiting and set up stagingBufferSize to be large enough, the S3 entries should be larger. Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

…

________________________________ From: oshyshkin <notifications@github.com> Sent: Sunday, July 21, 2019 2:00:59 AM To: bluedenim/log4j-s3-search <log4j-s3-search@noreply.github.com> Cc: Van Ly <vancly@hotmail.com>; Comment <comment@noreply.github.com> Subject: Re: [bluedenim/log4j-s3-search] Extra end of line added when CsvParameterLayout is used (#42) The only thing bothers me is that each time you start logging it creates a small file and only then it starts to accumulate messages. For example: [image]<https://user-images.githubusercontent.com/13902645/61589134-88d79600-abae-11e9-8cc5-e8953b449dd2.png> My task is to store audit logs to S3 for a program which works in batch kinda style. So if I run it 10k times I'll have 10k extra small files, which is not good for S3. That's not a problem for me now, just wondering why does it work in such way. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#42?email_source=notifications&email_token=AAOPF6EM54MZG63J2YLJCVLQAQQUXA5CNFSM4IEZPEVKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2N7FDY#issuecomment-513536655>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAOPF6CMOTZBA67PSVTY2ZDQAQQUXANCNFSM4IEZPEVA>.

Answer 9 · 2019-07-21T16:09:52.000Z

You can experiment with https://github.com/bluedenim/log4j-s3-search/tree/master/appender-log4j2-sample. Run the program for 3+ minutes each time and adjust the stagingBufferSize here: https://github.com/bluedenim/log4j-s3-search/blob/master/appender-log4j2-sample/src/main/resources/log4j2.xml#L15 to a larger and larger value between runs. You should see the S3 entries getting bigger in size (and fewer in numbers). Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

…

________________________________ From: Van Ly <vancly@hotmail.com> Sent: Sunday, July 21, 2019 9:05:34 AM To: bluedenim/log4j-s3-search <reply@reply.github.com> Subject: RE: [bluedenim/log4j-s3-search] Extra end of line added when CsvParameterLayout is used (#42) Some possibilities: I assume you’ve tuned the stagingBufferSize or stagingBufferAge parameters. The example programs’ values (especially stagingBufferSize since it says to publish as soon as it gathers 10 lines of logs) are very small to both allow easier testing and to verify early the set up is correct. Since there is no easy way to “append” to an S3 key, each publish will be to a different key. In practice, you should increase stagingBufferSize to fit your needs once everything is set up. Secondly, do you know approximately how long does your batch program take to finish each time you run it? Also: how long does the program take for the FIRST run? One limitation of the approach of a in-process logger like log4j-s3-search is that it will die when you program dies. So if any run of your batch program lasts only for a short time and finishes, then vey little log is collected. And as your program exits, the appender flushes and publishes on cleanup. An alternative is to run a separate log aggregator program with log4j-s3-search to collect log events to then publish on its own so that it has an independent life cycle than your batch program. Your batch program can send log events to it as it runs. This is the approach taken by some analytics packages. The problem with this approach is that it increases the complexity of the infrastructure a bit because now you have to ensure the aggregator program is always up and running and your batch program can communicate with it efficiently. In summary, I don’t know if it’s possible now, but if you can get your batch to run longer before exiting and set up stagingBufferSize to be large enough, the S3 entries should be larger. Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

________________________________ From: oshyshkin <notifications@github.com> Sent: Sunday, July 21, 2019 2:00:59 AM To: bluedenim/log4j-s3-search <log4j-s3-search@noreply.github.com> Cc: Van Ly <vancly@hotmail.com>; Comment <comment@noreply.github.com> Subject: Re: [bluedenim/log4j-s3-search] Extra end of line added when CsvParameterLayout is used (#42) The only thing bothers me is that each time you start logging it creates a small file and only then it starts to accumulate messages. For example: [image]<https://user-images.githubusercontent.com/13902645/61589134-88d79600-abae-11e9-8cc5-e8953b449dd2.png> My task is to store audit logs to S3 for a program which works in batch kinda style. So if I run it 10k times I'll have 10k extra small files, which is not good for S3. That's not a problem for me now, just wondering why does it work in such way. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#42?email_source=notifications&email_token=AAOPF6EM54MZG63J2YLJCVLQAQQUXA5CNFSM4IEZPEVKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2N7FDY#issuecomment-513536655>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AAOPF6CMOTZBA67PSVTY2ZDQAQQUXANCNFSM4IEZPEVA>.

Answer 10 · 2019-07-22T06:32:28.000Z

Secondly, do you know approximately how long does your batch program take to finish each time you run it? Also: how long does the program take for the FIRST run?

Not sure about the first time, but it should be the same as on average: 15-20 minutes.

I assume you’ve tuned the stagingBufferSize or stagingBufferAge parameters. The example programs’ values (especially stagingBufferSize since it says to publish as soon as it gathers 10 lines of logs) are very small to both allow easier testing and to verify early the set up is correct. Since there is no easy way to “append” to an S3 key, each publish will be to a different key.

I see. As you mentioned, I do tune buffer parameters and I set them very low (for example I write logs to a bucket each minute) for debugging purpose. The only thing bothers me is the first file because it is super small in comparison to the rest files. But I think the reason is the next: ...it says to publish as soon as it gathers 10 lines of logs

... allow easier testing and to verify early the set up is correct.

Yes, I know the problem. Maybe as an improvement, we can try put some test file (without log messages) with a standard name like __S3_APPENDER_SUCCESS__ so it will be overwritten each time (the only small file) and you can simply delete it if needed. Anyhow, it is not a problem for me now.

Thank you for the advice! I'll try to play with buffer arguments to see how it affects logging and let you know if I find something.