This project converts unstructured nginx log file data into actionable, structured data within a MySQL database. It is built on a Java Spring Boot + JPA (native MySQL queries only :) stack, and is set up for local deployment using docker-compose.
Smart logs API follows a standard 3-tier architectural approach. There are five packages in total:
- `controller`: Web layer
  - `Api`: Exposes all web endpoints.
- `dao`: Data access layer
  - `DatabaseApiInterface`: Handles the direct interaction with the MySQL DB.
- `model`: Business entities
  - `NginxLogfile`: Represents a single log file.
  - `NginxLogfileRow`: Represents a single line within such a file.
- `service`: Most of the business logic lives here.
  - `Detokenizer`: The core of this system; takes in a single log file line and outputs a map of the detokenized content.
  - `LogfileResolver`: Log files are stored in the cloud. The resolver takes in a "key" or URI and fetches the corresponding file. In this demo it just fetches a few static files from the target classpath, but the rest of the application doesn't know or care.
  - `NginxLogfileParser`: Parses an nginx log file and saves its data into the database.
  - `Token`: Represents a single expression in the collective token stream that makes up a line within a log file.
- `dto`: The objects that are actually returned from the API endpoints.
  - `Top5Tuple`: The return type of the `/top5` endpoint; `{url, hits}`.
  - `RangeTuple`: The return type of the `/rate` endpoint; `{url, dateTime, hits}`.
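For illustration, the two DTOs might look roughly like this (a hypothetical sketch: the record form and the field types are assumptions based on the descriptions above, not taken from the project source):

```java
import java.time.LocalDateTime;

// Hypothetical shapes only; names follow the README, field types are guesses.
record Top5Tuple(String url, long hits) {}                           // returned by /top5
record RangeTuple(String url, LocalDateTime dateTime, long hits) {}  // returned by /rate
```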
In addition to the above, you'll find `init.sql` in the root directory. You do not need to execute it, as docker-compose will launch and initialize the database automatically.
- A `POST` request is made to the `/enqueue-log-files` API endpoint, containing an array of object keys as a request parameter.
- These keys are placed on a thread-pool-based queue, which uses the `LogfileResolver` service to fetch the actual file associated with a given key (from S3, etc.).
- The parsing process consists of two primary stages (see the sketch after this list):
  - Parsing the nginx `log_format` config directive to compose an exact list of expected tokens.
  - Parsing an actual line / token stream, and matching the actual content against the token templates compiled in the previous stage.
- The newly structured data is persisted into the database.
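To make the two stages concrete, here is a minimal, self-contained sketch of the idea. It is not the project's actual `Detokenizer`/`Token` code; the `$variable`-based splitting and the literal-anchored matching are assumptions about the general technique:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the two parsing stages described above.
public class TwoStageParseSketch {

    // Stage 1: split a log_format string into $variables and literal separators.
    static List<String> compileTemplate(String logFormat) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\$\\w+").matcher(logFormat);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) tokens.add(logFormat.substring(last, m.start())); // literal
            tokens.add(m.group());                                                  // $variable
            last = m.end();
        }
        if (last < logFormat.length()) tokens.add(logFormat.substring(last));
        return tokens;
    }

    // Stage 2: walk the template; the text between two literal separators is
    // the value of the $variable sitting between them.
    static Map<String, String> detokenize(List<String> template, String line) {
        Map<String, String> values = new LinkedHashMap<>();
        int pos = 0;
        for (int i = 0; i < template.size(); i++) {
            String tok = template.get(i);
            if (!tok.startsWith("$")) {
                pos += tok.length(); // skip the literal (real code would verify it matches)
                continue;
            }
            String nextLiteral = i + 1 < template.size() ? template.get(i + 1) : null;
            int end = nextLiteral == null ? line.length() : line.indexOf(nextLiteral, pos);
            if (end < 0) throw new IllegalArgumentException("line does not match template");
            values.put(tok.substring(1), line.substring(pos, end));
            pos = end;
        }
        return values;
    }

    public static void main(String[] args) {
        String fmt = "$remote_addr - $remote_user [$time_local] \"$request\" $status";
        String line = "127.0.0.1 - alice [10/Oct/2000:13:55:36 -0700] \"GET / HTTP/1.0\" 200";
        System.out.println(detokenize(compileTemplate(fmt), line));
        // => {remote_addr=127.0.0.1, remote_user=alice, time_local=..., request=..., status=200}
    }
}
```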
Since all of the above parsing takes place on a separate thread, the `/top5` and `/rate` API endpoints are available from launch time.
Note: in this demo, the file-fetching stage is simulated by passing in the locations of static classpath resources.
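Putting the above flow together, a rough sketch of the enqueue path could look like the following, assuming a plain `ExecutorService` as the thread-pool-backed queue. The pool size, the request binding, and the method names on `LogfileResolver` and `NginxLogfileParser` are assumptions, not the project's actual signatures:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Api {

    private final ExecutorService queue = Executors.newFixedThreadPool(4);
    private final LogfileResolver resolver;   // resolves a key to an actual file
    private final NginxLogfileParser parser;  // parses the file and persists it

    public Api(LogfileResolver resolver, NginxLogfileParser parser) {
        this.resolver = resolver;
        this.parser = parser;
    }

    @PostMapping("/enqueue-log-files")
    public void enqueueLogFiles(@RequestParam List<String> keys) {
        // Each key is handed off to the pool, so parsing never blocks the
        // request thread and /top5 and /rate stay responsive from launch time.
        for (String key : keys) {
            queue.submit(() -> parser.parse(resolver.resolve(key)));
        }
    }
}
```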
Clone this repo, `cd` into the project root directory, and then execute `docker-compose up`.
The API is exposed on port 8080 and the database on port 3306 (make sure both ports are free in your environment).
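Once the containers are up, you can smoke-test the API. The client below is purely hypothetical; the `keys` parameter name and the sample key values are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class SmokeTest {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // Enqueue two demo log files by key, passed as a request parameter.
        HttpRequest enqueue = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/enqueue-log-files?keys=demo1.log,demo2.log"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        System.out.println(client.send(enqueue, HttpResponse.BodyHandlers.ofString()).statusCode());

        // Query the top 5 URLs by hits.
        HttpRequest top5 = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/top5"))
                .build();
        System.out.println(client.send(top5, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```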