Personal challenge to write some code every day for 100 days.
Runtime comparison of different ways to count the log message `notifyStuckThreadDetected` in Tomcat logs.
Insights
- A basic Node.js implementation is faster than `grep` or `fgrep`.
- Currently, `rg` (ripgrep) outperforms all approaches (so far).
- Running things in parallel is not faster than `fs.readFileSync` in this scenario.
- See: report-task-1.md
Repeat the example in a new language: Go
First steps (https://go.dev/doc/tutorial/getting-started)
```
# Init module
go mod init tilmanschweitzer/tomat-stuck-thread-log-parse

# Execute
go run .
```
Insights
- Some aspects, like appending strings to arrays, are more different than I expected
- Execution time is 2-3 times faster than with Node.js
- Possible next step: learn to write more idiomatic Go code or parallelize the execution.
First draft to implement a concurrent Go version.
Insights
- Ran into issues with too many open file handles
- Ran into errors with semacquire when processing too many files in parallel
- Output is still unordered
Rework the concurrent Go implementation to create an ordered output and fix the other issues.
- The approach to order the output does not feel elegant -> TODO: Read more Go code to find better approaches
- TODO: Refactor the code
Add synchronous python implementation as preparation for an async version tomorrow.
Add Node.js implementation using worker threads to speed up parsing each file.
Insights
- The reworked implementation with worker threads is not far from the Go speed
- Good example to understand the difference between I/O-intensive and CPU-intensive workloads
- Helpful to compare different parallelization concepts between Go, Python and Node.js
Reorganize files and add basic JS tests.
Insights
- Currently, only Jasmine seems to work with ES modules out-of-the-box.
- I should use TDD for further features
Add sync implementation in Java.
Insights
- Execution time is much better than I expected and comparable to Node.js and Python
- Implementing the basic version felt much smoother than I expected
- Parallelizing the sync implementation with `Stream.parallel()` was much simpler than all previous parallel implementations (still unordered in the first attempt; see the sketch below)
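A minimal sketch of how that parallelization looks (file paths, class and method names here are my own illustration, not the actual implementation):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ParallelCountSketch {
    public static void main(String[] args) {
        List<Path> logFiles = List.of(Path.of("logs/catalina.2022-01-01.log")); // assumed path

        long count = logFiles.stream()
                .parallel() // the only change needed to parallelize the sync version
                .mapToLong(ParallelCountSketch::countStuckThreadLines)
                .sum();

        System.out.println("notifyStuckThreadDetected occurrences: " + count);
    }

    private static long countStuckThreadLines(Path file) {
        try (var lines = Files.lines(file)) {
            return lines.filter(line -> line.contains("notifyStuckThreadDetected")).count();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```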
Add async and ordered implementation in Java.
Activate the async implementation with the parameter `--async`.
Split up the implementation into separate classes with a common interface and abstract class.
Insights
- Executors are very easy to handle, and creating an ordered output from a multi-core parallelization feels much easier than in Node.js or Go (see the sketch below)
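A minimal sketch of the ordered-output idea (class and method names are assumptions, not the actual code): submitting one task per file to an ExecutorService yields the Futures in submission order, so collecting their results in that order already produces an ordered output:

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class OrderedExecutorSketch {
    static List<String> parseAll(List<Path> files) throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (Path file : files) {
                futures.add(executor.submit(() -> parseFile(file))); // hypothetical per-file parse
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // blocks in submission order -> ordered output
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    private static String parseFile(Path file) {
        // Placeholder for the actual parsing logic.
        return file + ": 0 stuck threads";
    }
}
```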
Add basic JUnit 5 test setup for Java
First attempt to parse stack traces to be able to generate statistical reports.
Add test case with multiple stuck threads and try to extract meaningful information from the stack traces.
Insights
- Currently, the direction of the features is not clear, and I feel a little stuck with the next steps
- It should be possible to add parameters that skip the new statistics feature, so the speed of the implementations can still be compared
Refactoring of the whole Java implementation. Allow enabling different handlers via commandline parameter to skip costly but unnecessary calculations. Split up the parser and the stuck thread handlers to meet the open-closed principle (see the sketch after the insights below).
Insights
- The open-closed principle really helped to reason about the relationship between the parser and the handler
- The existing test cases helped a lot to find defects
- Tests should cover more lines
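This is roughly how I understand the split (the interfaces below are illustrative; the real names and signatures differ): the parser stays closed for modification, while new handlers can be plugged in without touching it:

```java
import java.util.List;

// Illustrative interfaces; the actual project uses different names and signatures.
interface LogLineHandler {
    void onLine(String line);
    String result();
}

class CountingHandler implements LogLineHandler {
    private long count = 0;

    @Override
    public void onLine(String line) {
        if (line.contains("notifyStuckThreadDetected")) {
            count++;
        }
    }

    @Override
    public String result() {
        return count + " stuck thread notifications";
    }
}

class LogFileParser {
    // The parser never changes when a new handler is added (open-closed principle).
    void parse(List<String> lines, List<LogLineHandler> handlers) {
        for (String line : lines) {
            for (LogLineHandler handler : handlers) {
                handler.onLine(line);
            }
        }
    }
}
```

A statistics handler, for example, only needs to implement the handler interface; the parser stays untouched.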
Use one StuckThreadHandler per file to avoid concurrency issues. The new approach with the separation of the parser led to shared state in the handler instances. This caused race conditions and ConcurrentModificationExceptions when started with the `--async` parameter.
Insights
- It's not necessary to make everything immutable, but Suppliers can help to provide a new instance for every potentially concurrent operation (sketched below).
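A minimal sketch of the Supplier approach (names are assumptions): every parse task asks the Supplier for a fresh handler instead of sharing one instance across threads:

```java
import java.nio.file.Path;
import java.util.List;
import java.util.function.Supplier;

class ConcurrentParseSketch {
    // Hypothetical handler with mutable internal state.
    interface StuckThreadHandler {
        void onLine(String line);
    }

    static void parseAll(List<Path> files, Supplier<StuckThreadHandler> handlerSupplier) {
        files.parallelStream().forEach(file -> {
            // Every file gets its own handler instance, so no state is shared between threads.
            StuckThreadHandler handler = handlerSupplier.get();
            for (String line : readLines(file)) {
                handler.onLine(line);
            }
        });
    }

    private static List<String> readLines(Path file) {
        // Placeholder for the actual file reading.
        return List.of();
    }
}
```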
Remove coupling to static Files functions to allow tests without mocking static functions. The implementation relied on the methods `Files.readLines` and `Files.walk`. With this dependency on the global state of the file system, it is hard to test parts of the functionality. Therefore, I added interfaces to remove the direct dependency on these static functions.
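Sketched roughly (interface and method names are my own, not the actual code), the abstraction looks like this; a test can then supply an in-memory implementation instead of mocking static methods:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

// Illustrative abstraction: the parser depends on this interface instead of
// calling the static Files methods directly.
interface FileSystemAccess {
    List<String> readLines(Path file);
    Stream<Path> walk(Path root);
}

class RealFileSystemAccess implements FileSystemAccess {
    @Override
    public List<String> readLines(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public Stream<Path> walk(Path root) {
        try {
            return Files.walk(root);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```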
Add tests to check how the parsers walk through the files.
Add a heuristic to weight the code lines depending on their position. Usually, code lines higher in the stack trace contain more meaningful information than the framework or filter code further down. Therefore, a weight based on the average line number of a code line can help to generate more useful reports.
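The exact weighting is not spelled out here, so this is only an assumed illustration of the idea, decaying the weight with the average position of the code line in the stack trace:

```java
// Illustrative weighting (assumed formula): lines near the top of the stack trace
// count more than framework/filter code further down.
class LineWeightSketch {
    static double weight(double averageLineNumberInStackTrace) {
        // Position 0 (top of the trace) gets weight 1.0; the weight decreases for later lines.
        return 1.0 / (1.0 + averageLineNumberInStackTrace);
    }

    public static void main(String[] args) {
        System.out.println(weight(0));   // 1.0   -> application code at the top
        System.out.println(weight(20));  // ~0.05 -> framework/filter code further down
    }
}
```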
Add tests for StuckThread and refactor related code parts.
Extract core classes into a separate Maven module to create a stricter barrier between the parser and the commandline interface.
Move print function to commandline app to separate interface.
Replace the LogFileParserResult interface with a generic type to make the parser completely independent of the String output (`getPrintableResult`).
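A hedged sketch of what the generic interfaces could look like (based on the identifiers mentioned above, but the actual signatures may differ):

```java
import java.nio.file.Path;
import java.util.List;

// Illustrative generic interface: the parser produces an arbitrary result type.
interface LogFileParser<T> {
    T parse(List<Path> files);
}

// Only the commandline app knows how to turn a result into printable output.
interface ResultPrinter<T> {
    String getPrintableResult(T result);
}
```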
Add test setup with bats-core to verify commandline output of different implementations
Start with sudoku-solver as a new challenge. Implement a first draft of a Java application to parse sudoku data. Use a public-domain dataset with 9 million sudokus: https://www.kaggle.com/datasets/rohanrao/sudoku
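A minimal parsing sketch (the 81-character puzzle strings with '0' for empty cells match the linked dataset; the class and method names are my own):

```java
// Illustrative parsing of one dataset row: an 81-character string of digits,
// where '0' marks an empty cell.
class SudokuParserSketch {
    static int[][] parseGrid(String digits) {
        if (digits.length() != 81) {
            throw new IllegalArgumentException("Expected 81 characters, got " + digits.length());
        }
        int[][] grid = new int[9][9];
        for (int i = 0; i < 81; i++) {
            grid[i / 9][i % 9] = digits.charAt(i) - '0';
        }
        return grid;
    }

    public static void main(String[] args) {
        String puzzle = "0".repeat(81); // placeholder row; real rows come from the CSV
        int[][] grid = parseGrid(puzzle);
        System.out.println("Parsed grid with " + grid.length + " rows");
    }
}
```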
Add validation methods to check the correctness of sudoku solutions as preparation for the backtracking implementation of the solver.
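A hedged sketch of such a validation (names are assumptions, not the actual API): every row, column and 3x3 box may contain each value at most once:

```java
// Illustrative validation: a 9x9 grid (0 = empty) is valid if no row, column
// or 3x3 box contains a duplicate value.
class SudokuValidationSketch {
    static boolean isValid(int[][] grid) {
        for (int i = 0; i < 9; i++) {
            boolean[] rowSeen = new boolean[10];
            boolean[] colSeen = new boolean[10];
            boolean[] boxSeen = new boolean[10];
            for (int j = 0; j < 9; j++) {
                if (duplicate(rowSeen, grid[i][j])) return false;          // row i
                if (duplicate(colSeen, grid[j][i])) return false;          // column i
                int boxRow = (i / 3) * 3 + j / 3;                          // box i, cell j
                int boxCol = (i % 3) * 3 + j % 3;
                if (duplicate(boxSeen, grid[boxRow][boxCol])) return false;
            }
        }
        return true;
    }

    private static boolean duplicate(boolean[] seen, int value) {
        if (value == 0) return false; // empty cells never conflict
        if (seen[value]) return true;
        seen[value] = true;
        return false;
    }
}
```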
Add SudokuSolver interface and a backtracking implementation. Implement an ExecutionTimer class to be able to measure the execution time and to quantify improvements.
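A minimal backtracking sketch over the same 9x9 int grid (the actual BacktrackingSolver certainly differs in structure):

```java
// Illustrative backtracking: try every value in the first empty cell, recurse,
// and undo the assignment when the branch fails. Names are assumptions.
class BacktrackingSketch {
    static boolean solve(int[][] grid) {
        for (int row = 0; row < 9; row++) {
            for (int col = 0; col < 9; col++) {
                if (grid[row][col] != 0) continue;
                for (int value = 1; value <= 9; value++) {
                    if (canPlace(grid, row, col, value)) {
                        grid[row][col] = value;
                        if (solve(grid)) return true;
                        grid[row][col] = 0; // backtrack
                    }
                }
                return false; // no value fits this empty cell
            }
        }
        return true; // no empty cell left -> solved
    }

    private static boolean canPlace(int[][] grid, int row, int col, int value) {
        for (int i = 0; i < 9; i++) {
            if (grid[row][i] == value || grid[i][col] == value) return false;
            int boxRow = (row / 3) * 3 + i / 3;
            int boxCol = (col / 3) * 3 + i % 3;
            if (grid[boxRow][boxCol] == value) return false;
        }
        return true;
    }
}
```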
Add a first DeductiveSudokuSolver implementation, which still needs a fallback solver for some sudokus but already speeds up the execution time.
Extend the DeductiveSudokuSolver to solve more sudokus by ruling out possible values indirectly. Still, some sudokus require more complicated deductive methods and fall back to the BacktrackingSolver.
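Greatly simplified, the deduction-plus-fallback idea looks like the sketch below (only "naked singles"; the real DeductiveSudokuSolver applies more rules, and the names are assumptions):

```java
import java.util.HashSet;
import java.util.Set;

// Simplified deduction: repeatedly fill cells that have exactly one remaining
// candidate. If that is not enough, the caller falls back to a backtracking solver.
class DeductiveSketch {
    static boolean solveByDeduction(int[][] grid) {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int row = 0; row < 9; row++) {
                for (int col = 0; col < 9; col++) {
                    if (grid[row][col] != 0) continue;
                    Set<Integer> candidates = candidatesFor(grid, row, col);
                    if (candidates.size() == 1) {
                        grid[row][col] = candidates.iterator().next();
                        progress = true;
                    }
                }
            }
        }
        return isSolved(grid); // false -> hand the grid over to the backtracking fallback
    }

    private static Set<Integer> candidatesFor(int[][] grid, int row, int col) {
        Set<Integer> candidates = new HashSet<>();
        for (int value = 1; value <= 9; value++) {
            candidates.add(value);
        }
        for (int i = 0; i < 9; i++) {
            candidates.remove(grid[row][i]);                                       // same row
            candidates.remove(grid[i][col]);                                       // same column
            candidates.remove(grid[(row / 3) * 3 + i / 3][(col / 3) * 3 + i % 3]); // same box
        }
        return candidates;
    }

    private static boolean isSolved(int[][] grid) {
        for (int[] row : grid) {
            for (int cell : row) {
                if (cell == 0) return false;
            }
        }
        return true;
    }
}
```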
Split classes up into multiple packages. Clean up several code parts and separate deduction levels as speed optimization. Add toString method to analyze the internal model of the deductive solver.
Extend DeductiveSudokuSolver with a method to solve conjugate pairs.
Extend DeductiveSudokuSolver with a method to find and rule out XY wings.
Use advent of code 2020 as TDD exercise.
Extract folders into separate repositories.
```
# sudoku-solver
git filter-branch -f --prune-empty --msg-filter 'sed "s/sudoku-solver: //g"' --subdirectory-filter sudoku-solver/java master
git remote set-url origin git@github.com:tilmanschweitzer/sudoku-solver.git
```

New repository: sudoku-solver

```
# tomcat-stuck-thread-log-parser
git filter-branch -f --prune-empty --msg-filter 'sed "s/tomcat-stuck-thread-log-parser: //g"' --subdirectory-filter tomcat-stuck-thread-log-parser master
git remote set-url origin git@github.com:tilmanschweitzer/tomcat-stuck-thread-log-parser.git
```

New repository: tomcat-stuck-thread-log-parser

```
# advent-of-code
git filter-branch -f --prune-empty --msg-filter 'sed "s/advent-of-code: //g"' --subdirectory-filter advent-of-code master
git remote set-url origin git@github.com:tilmanschweitzer/advent-of-code.git
```

New repository: advent-of-code
- Add tests for Node.js (and check out test libs for Go)
- Implementation in Java?
- Add statistics on the stack traces of the stuck threads to find similarities quickly