bluedenim/log4j-s3-search

Multiple files publishing with S3 location control

oshyshkin opened this issue · 6 comments

Hi,

I wonder if it is possible to store logs in multiple locations on S3. For example, we have the standard arguments:

  • s3Bucket: my_bucket
  • s3Path: my_path

In this case, logs will be stored to:

 s3://my_bucket/my_path/timestamp_host_cache 

(according to composeNamespacedCacheName in BufferPublisher)

But I would like to add some kind of partitioning. In my process, it is important to isolate each job and its logs, so the desired output would look like the example below:

 s3://my_bucket/my_path/job=value1/timestamp_host_cache 
 s3://my_bucket/my_path/job=value2/timestamp_host_cache 
etc.

Here value1 and value2 are values that we supply at runtime somehow.
I'm not quite sure how the log data is accumulated or whether the behavior I described is possible.
Any ideas?

Thanks!

Hello,

If possible, I suggest exploring the option of creating multiple appenders and loggers and having each logger used by one "job" in your example.

log4j.appender.S3AppenderJob1.s3Bucket=my_bucket
log4j.appender.S3AppenderJob1.s3Path=my_path/job_value1
log4j.appender.S3AppenderJob2.s3Bucket=my_bucket
log4j.appender.S3AppenderJob2.s3Path=my_path/job_value2
...

log4j.logger.job1=INFO, S3AppenderJob1, ConsoleAppender
log4j.logger.job2=INFO, S3AppenderJob2, ConsoleAppender
....

Then, in the code:

Logger logger = Logger.getLogger("job" + jobNumber);  // jobNumber = 1 or 2 or ....
logger.info(...);
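
If it helps, the same lookup in Kotlin would be roughly this (just a sketch; the helper name is made up, and it assumes the log4j 1.x properties configuration above):

import org.apache.log4j.Logger  // log4j 1.x API, matching the properties configuration above

// Each job resolves to its own logger ("job1", "job2", ...), which in turn writes
// through its own S3 appender (S3AppenderJob1, S3AppenderJob2, ...) and therefore its own s3Path.
fun logForJob(jobNumber: Int, message: String) {
    Logger.getLogger("job$jobNumber").info(message)
}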

This works if there is a manageable number of jobs. If you're dealing with 100s or more, then it gets crazy, and we'll need to add something to support this.

Let me know if the above works for your case.

I do use a few loggers for different events (I need to store logs in separate locations), but for the current case it won't work. The problem is that the number of jobs is dynamic. My application is a parser that takes a list of files and parses them elsewhere: 1 list = 1 job. I don't control the job value (ID), so I cannot set it ahead of time.

However, I can re-initialize the logger at runtime at the beginning of each job. Something like this (Kotlin syntax):

val builder = ConfigurationBuilderFactory.newConfigurationBuilder()
val parsedRecordsAppender = builder.newAppender("recordsAuditData", "Log4j2Appender")
        .addAttribute("s3Bucket", "my_bucket")
        .addAttribute("s3Path", "my_path/to/table/job_id=$jobId")
        .addAttribute("s3SseKeyType", "SSE_S3")

builder.add(parsedRecordsAppender)

// logger, parsedBundlesLogger, and jobId are defined elsewhere in the application
val parsedRecordsLogger = LogManager.getLogger("RecordsAuditData")
val started = System.currentTimeMillis()

// Loop for 1 minute
while (System.currentTimeMillis() - started < TimeUnit.MINUTES.toMillis(1)) {
    logger.info("Another round through the loop. Job ID: $jobId")
    parsedRecordsLogger.info("Another round through the loop!", 1, 2, 3)
    parsedBundlesLogger.info("Another round through the loop!", 0, 2, 3)

    // Sleep for 7 seconds before logging again so we don't produce too much data
    Thread.sleep(TimeUnit.SECONDS.toMillis(7))
}

I'm not familiar with Java, so I looked up code here and slightly modified it.
My XML config file:

<?xml version="1.0" encoding="UTF-8"?>

<Configuration status="INFO">
    <Appenders>
        <Console name="consoleAppender" target="SYSTEM_OUT">
            <PatternLayout pattern="[%d %t %-5p] - %c - %m%n"/>
        </Console>
        <Log4j2Appender name="recordsAuditData">
            <CsvParameterLayout delimiter=","/>
            <verbose>false</verbose>
            <stagingBufferAge>10</stagingBufferAge>
            <s3Region>us-east-1</s3Region>
        </Log4j2Appender>
        <Log4j2Appender name="bundlesAuditData">
            <CsvParameterLayout delimiter=","/>
            <verbose>false</verbose>
            <stagingBufferAge>10</stagingBufferAge>
            <s3Region>us-east-1</s3Region>
        </Log4j2Appender>
        <!-- Async appenders should go at the end: https://logging.apache.org/log4j/2.x/manual/migration.html -->
        <!-- Not sure this works -->
        <Async name="asyncRecords">
            <AppenderRef ref="recordsAuditData"/>
        </Async>
        <Async name="asyncBundles">
            <AppenderRef ref="bundlesAuditData"/>
        </Async>
    </Appenders>
    <Loggers>
        <Logger name="RecordsAuditData" level="INFO" additivity="false">
            <AppenderRef ref="asyncRecords" level="INFO"/>
        </Logger>
        <Logger name="BundlesAuditData" level="INFO" additivity="false">
            <AppenderRef ref="asyncBundles" level="INFO"/>
        </Logger>
        <Root level="INFO">
            <AppenderRef ref="consoleAppender" />
        </Root>
    </Loggers>
</Configuration>

As you can see, I'm trying to configure the logger programmatically, but I'm not sure whether my implementation is wrong or this won't work in general (I don't see output in s3://my_bucket/my_path/to/table/job_id=$jobId/ and there are no exceptions).
Am I right to add your appender as Log4j2Appender?

Hm. Kotlin is interesting. I've been wanting to learn.

Anyway, back to business: I don't know how to programmatically add/modify the configuration from an existing log4j2.xml. However, if I build the configuration from scratch, it seems to work. For example, I don't even have a log4j2.xml, only a single file, main.kt, whose contents are:

import org.apache.logging.log4j.Level
import org.apache.logging.log4j.LogManager
import org.apache.logging.log4j.core.config.Configurator
import org.apache.logging.log4j.core.config.builder.api.ConfigurationBuilderFactory

fun main() {
    val jobId = 45
    val builder = ConfigurationBuilderFactory.newConfigurationBuilder()
    val parsedRecordsAppender = builder.newAppender("recordsAuditData", "Log4j2Appender")
        .addAttribute("s3Bucket", "my_bucket")
        .addAttribute("s3Path", "my_path/to/table/job_id=$jobId")
        .addAttribute("stagingBufferSize", "3")
        .add(builder.newLayout("PatternLayout").
            addAttribute("pattern", "%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"))

    builder.add(parsedRecordsAppender)
    builder.add(builder.newLogger("RecordsAuditData", Level.DEBUG, true).
        add(builder.newAppenderRef("recordsAuditData")).
        addAttribute("additivity", false))

    Configurator.initialize(builder.build())

    val logger = LogManager.getLogger("RecordsAuditData")

    logger.info("Another round through the loop.")
    logger.info("Another round through the loop!", 1,2,3)
    logger.info("Another round through the loop!", 0,2,3)
}

This seems to put something into my S3 bucket.

The program doesn't exit for some reason, but I think that's a different problem.
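
For your per-job case, one option might be to factor the same builder calls into a helper keyed by the job ID. A rough sketch, using the same imports as main.kt above (the helper name is made up):

fun initLoggingForJob(jobId: String) {
    val builder = ConfigurationBuilderFactory.newConfigurationBuilder()
    // Same appender as in the example, but with the job ID baked into the S3 path
    val appender = builder.newAppender("recordsAuditData", "Log4j2Appender")
        .addAttribute("s3Bucket", "my_bucket")
        .addAttribute("s3Path", "my_path/to/table/job_id=$jobId")
        .addAttribute("stagingBufferSize", "3")
        .add(builder.newLayout("PatternLayout")
            .addAttribute("pattern", "%d{HH:mm:ss,SSS} [%t] %-5p %c{36} - %m%n"))
    builder.add(appender)
    builder.add(builder.newLogger("RecordsAuditData", Level.DEBUG, true)
        .add(builder.newAppenderRef("recordsAuditData"))
        .addAttribute("additivity", false))
    // Caveat: Configurator.initialize() only takes effect if logging hasn't already been
    // initialized for the process; reconfiguring an already-running context is a separate problem.
    Configurator.initialize(builder.build())
}

Calling initLoggingForJob("45") before the job's first log statement should then send that job's RecordsAuditData output under my_path/to/table/job_id=45/, the same way the standalone example above does.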

I hope you liked Kotlin)

I tried to build a logger from scratch too, but it didn't work for some reason. I think I need more time to play with it. I'll let you know how it's going next Monday.

Thanks for the example!

Any progress with this? I'd like to check up with you before I close this.

Well, I have tried a few examples, but they didn't work for me.
For now, we have postponed the project, so the issue is not relevant for the near future.
Thanks for asking!