This project has been developed as part of the course "Advanced Software Engineering" at the BHT. It is a commandline tool containing the basic functionality of searching for one or more search patterns inside a raw text string, a text file or a directory containing several txt-files. As can be seen in the UML diagrams and the DDD files, the vision of the project is to provide a tool that can be used to search for information in a smart way by using different search algorithms and NLP techniques.
This tool has been developed and tested using Python 3.9.
To install this commandline tool, execute the following commands in your terminal:
git clone https://github.com/bogdankostic/SmartSearch.git
cd SmartSearch
sudo -H ./install.sh
After installation, you can use this tool directly from the commandline by executing the following command:
search [-h] [-n] [-i] SEARCH_PATTERN [SEARCH_PATTERN ...] TEXT_INPUT
Positional Arguments:
- SEARCH_PATTERN – Search pattern to search for in the provided text inputs
- TEXT_INPUT – Raw text, text file or directory containing .txt-files
Optional Arguments:
- -h / --help – Show help message explaining how to use this tool
- -n / --naive – Use naive string matching algorithm instead of Boyer-Moore algorithm
- -i / --case-insensitive – Perform case-insensitive search
Each match is printed on a new line in one of the following tab-separated formats:
SEARCH_PATTERN \t POSITION_IN_TEXT/FILE
FILE_NAME \t SEARCH_PATTERN \t POSITION_IN_FILE
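To make the `-n` option and the output format more concrete, here is a minimal sketch of naive string matching that builds one tab-separated line per match. The names `naive_search` and `format_match` are hypothetical and not taken from the SmartSearch code base. The sketch also assumes that positions are 0-based character offsets, which the description above does not specify.

```python
from typing import List


def naive_search(pattern: str, text: str, case_insensitive: bool = False) -> List[int]:
    """Return the start offsets of all (possibly overlapping) matches."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    if case_insensitive:
        # Assumption: lowercasing both sides is sufficient for this sketch.
        # For a few Unicode characters, lower() changes the string length,
        # which would shift the reported offsets.
        pattern, text = pattern.lower(), text.lower()
    positions = []
    # Try every possible alignment of the pattern against the text.
    for start in range(len(text) - len(pattern) + 1):
        if text[start:start + len(pattern)] == pattern:
            positions.append(start)
    return positions


def format_match(pattern: str, position: int, file_name: str = "") -> str:
    """Build one output line; the file name is prepended only if given."""
    fields = [pattern, str(position)]
    if file_name:
        fields.insert(0, file_name)
    return "\t".join(fields)


if __name__ == "__main__":
    for pos in naive_search("an", "Banana bandana"):
        print(format_match("an", pos))
```

The naive approach compares the pattern at every offset, so its worst-case cost grows with the product of text length and pattern length. Boyer-Moore, the tool's default algorithm, can skip alignments and is usually faster on longer patterns.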
Throughout the project, Git and GitHub were used as tools for version control. The commit history can be found here.
The directory uml contains the following UML diagrams as images and PlantUML files:
- Class Diagram
- Sequence Diagram
- Component Diagram
The event storming, the core domain diagram, and the relationship mapping can be found in the domain_driven_design.pdf file.
Code metrics are tracked on Coveralls (test coverage) and SonarCloud (code quality).
Examples of clean code development principles used in the project:
- Don't Repeat Yourself: The project uses functions and classes to avoid code duplication.
  Example: Input validation in BaseMatcher.
- Usage of type hints to specify the types of function arguments and return values. (Example)
- Usage of docstrings to document functions and classes. (Example)
- Usage of meaningful variable and function names. (Example)
- Short functions that do one thing. (Example)
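The principles in this list can be illustrated with a short sketch. It is not the project's actual `BaseMatcher`. The class names `Matcher` and `FindMatcher` and the method names are hypothetical, chosen only to show input validation written once in a base class, together with type hints, docstrings, and short single-purpose functions.

```python
from abc import ABC, abstractmethod
from typing import List


class Matcher(ABC):
    """Hypothetical base class; shared validation lives here once (DRY)."""

    def search(self, pattern: str, text: str) -> List[int]:
        """Validate the inputs, then delegate to the concrete algorithm."""
        self._validate(pattern, text)
        return self._find(pattern, text)

    @staticmethod
    def _validate(pattern: str, text: str) -> None:
        """Reject inputs that no matcher can handle."""
        if not isinstance(pattern, str) or not isinstance(text, str):
            raise TypeError("pattern and text must be strings")
        if not pattern:
            raise ValueError("pattern must not be empty")

    @abstractmethod
    def _find(self, pattern: str, text: str) -> List[int]:
        """Return the start offsets of all matches."""


class FindMatcher(Matcher):
    """Concrete matcher built on str.find; it inherits the validation."""

    def _find(self, pattern: str, text: str) -> List[int]:
        positions = []
        start = text.find(pattern)
        while start != -1:
            positions.append(start)
            # Continue one character later so overlapping matches are found.
            start = text.find(pattern, start + 1)
        return positions
```

Any further matcher subclass only has to implement `_find`, so the validation rules cannot drift apart between algorithms.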
My personal clean code development cheat sheet can be found in the clean_code_cheat_sheet.md file.
The project uses GitHub Actions for continuous integration and delivery. The workflow can be found here. The workflow is triggered on every push to the main branch and runs the tests, measures the test coverage, and uploads the coverage report to Coveralls.
The unit tests for the project can be found in the test directory.
The tests can be executed by running the following command:
pytest test/
Throughout the project, the PyCharm IDE was used. My favorite key shortcuts are:
- ⌘ Command + ⇧ Shift + F: Search in all files
- ⌘ Command + ⇧ Shift + R: Replace in all files
- ⌘ Command + /: Comment/uncomment code
- ⌘ Command + K: Commit changes
- ⌘ Command + ⇧ Shift + K: Push changes
As the main use case for the SmartSearch application is to find information by executing search queries, the Domain Specific Language for the SmartSearch project could be inspired by SQL or a similar query language. An example of a query that would use most of the features of the DSL is:
SELECT
document
FROM
document_idx
WHERE
exact_search('software engineering')
AND semantic_search('How to construct a DSL?', similarity = 0.8)
AND meta.year >= 2021
This query would return the documents that satisfy all three conditions: they contain the exact phrase 'software engineering', they are semantically similar to 'How to construct a DSL?' with a similarity threshold of 80%, and their meta field 'year' is greater than or equal to 2021.
Using a DSL that is inspired by SQL has many benefits. SQL is a powerful language that already comes with many features that would be useful for the SmartSearch application, such as aggregation functions, sorting, and filtering. Furthermore, many developers are already familiar with SQL, so they would be able to use the DSL without much additional training. They would just need to learn the specific functions and features of the SmartSearch DSL.
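The intended meaning of the example query can be sketched in Python as a set of conditions combined with AND. This sketch does not parse the DSL and is not part of SmartSearch. `Document`, `select`, `meta_at_least`, and the helper `_words` are hypothetical names, and `semantic_search` uses plain word overlap (Jaccard similarity) as a stand-in for a real semantic model, which the project would need an NLP component for.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class Document:
    """Hypothetical document record with free text and numeric metadata."""
    text: str
    meta: Dict[str, int] = field(default_factory=dict)


Predicate = Callable[[Document], bool]


def _words(text: str) -> Set[str]:
    """Lowercased set of word tokens, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def exact_search(phrase: str) -> Predicate:
    """Condition: the document contains the phrase verbatim."""
    return lambda doc: phrase in doc.text


def semantic_search(query: str, similarity: float) -> Predicate:
    """Condition: word overlap with the query reaches the threshold.

    Assumption: Jaccard overlap is only a placeholder for semantic similarity.
    """
    query_words = _words(query)

    def predicate(doc: Document) -> bool:
        doc_words = _words(doc.text)
        union = query_words | doc_words
        score = len(query_words & doc_words) / len(union) if union else 0.0
        return score >= similarity

    return predicate


def meta_at_least(key: str, minimum: int) -> Predicate:
    """Condition: the meta field exists and is >= minimum."""
    return lambda doc: key in doc.meta and doc.meta[key] >= minimum


def select(documents: Iterable[Document], *conditions: Predicate) -> List[Document]:
    """Return the documents for which every condition holds (logical AND)."""
    return [doc for doc in documents if all(cond(doc) for cond in conditions)]
```

A call such as `select(docs, exact_search('software engineering'), semantic_search('How to construct a DSL?', similarity=0.5), meta_at_least('year', 2021))` mirrors the WHERE clause above. The lower threshold reflects that word overlap is a much cruder measure than the similarity score the DSL has in mind.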
The project follows aspects of functional programming, for example:
- Final data structures: data structures used are immutable, for example tuples (see here).
- Side-effect free functions: functions are designed to be side-effect free, for example the search function in the NaiveMatcher class (see here).
- Anonymous functions: lambda functions are used to define a defaultdict (see here).
The project itself had no need for higher-order functions, that is, functions used as parameters or return values, so I created the file functional_programming.py to demonstrate these aspects of functional programming.
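The aspects named in this section can be sketched in a few lines. This is an illustration only and does not reproduce the contents of functional_programming.py. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Iterable, List, Tuple

# Final data structure: a match is an immutable (pattern, position) tuple.
Match = Tuple[str, int]


def make_normalizer(case_insensitive: bool) -> Callable[[str], str]:
    """Function as return value: choose the normalization once, reuse it."""
    return str.lower if case_insensitive else (lambda s: s)


def find_matches(
    pattern: str, text: str, normalize: Callable[[str], str]
) -> Tuple[Match, ...]:
    """Function as parameter; side-effect free, returns a new tuple."""
    pattern, text = normalize(pattern), normalize(text)
    return tuple(
        (pattern, start)
        for start in range(len(text) - len(pattern) + 1)
        if text.startswith(pattern, start)
    )


def group_by_pattern(matches: Iterable[Match]) -> DefaultDict[str, List[int]]:
    """Collect positions per pattern without mutating the input."""
    # Anonymous function as the default factory of the defaultdict.
    grouped: DefaultDict[str, List[int]] = defaultdict(lambda: [])
    for pattern, position in matches:
        grouped[pattern].append(position)
    return grouped
```

Because `find_matches` receives the normalization as an argument, case handling can be swapped without touching the matching logic, for example `find_matches("AN", "banana", make_normalizer(True))`.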