We applied SOLID principles to the application design so that the application can scale easily:
- Each class has a single job (single responsibility principle). For example, the `EnglishDocumentProcessor` class is responsible for processing English documents only; to process Spanish documents, we would add a new class that handles Spanish only.
- Our `EnglishDocumentProcessor` class doesn't depend on the database engine (open-closed principle), so whichever database the application uses, we can connect to it without changing the processor.
- Every class implements only the interfaces whose methods it needs (interface segregation principle).
- All classes depend on abstractions, not on concretions (dependency inversion principle). No class we implement depends on instantiating another class, which keeps the classes decoupled.
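
The decoupling described above can be illustrated with a minimal sketch. The names `DocumentProcessor`, `IndexWriter`, `IndexEntry`, `InMemoryIndexWriter` and `Indexer`, as well as the `processDocument` signature, are assumptions made for illustration and are not taken from the project's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row: one word, one document, and its sorting score.
final class IndexEntry {
    final String word;
    final String doc;
    final double score;

    IndexEntry(String word, String doc, double score) {
        this.word = word;
        this.doc = doc;
        this.score = score;
    }
}

// Abstraction for language-specific processing (assumed signature).
interface DocumentProcessor {
    List<IndexEntry> processDocument(List<String> documents);
}

// Abstraction for storage; callers never name a concrete database.
interface IndexWriter {
    void write(List<IndexEntry> entries);
}

// Stand-in writer that keeps rows in memory; a SQLite-backed writer
// would implement the same interface.
final class InMemoryIndexWriter implements IndexWriter {
    final List<IndexEntry> rows = new ArrayList<>();

    @Override
    public void write(List<IndexEntry> entries) {
        rows.addAll(entries);
    }
}

// Depends only on the two interfaces, so a Spanish processor or another
// database can be plugged in without editing this class.
final class Indexer {
    private final DocumentProcessor processor;
    private final IndexWriter writer;

    Indexer(DocumentProcessor processor, IndexWriter writer) {
        this.processor = processor;
        this.writer = writer;
    }

    void index(List<String> documents) {
        writer.write(processor.processDocument(documents));
    }
}
```

In this sketch, supporting a new language or a new storage engine means adding a class that implements one of the interfaces; `Indexer` itself stays unchanged.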
- `TextDocumentReader` (under the `documentreader` package): This is a TODO class that will let us handle the scale of our search engine by parsing text documents and passing them to our language processors. We can also implement similar classes that each parse one specific document type, such as `PdfDocumentReader` and `DoxDocumentReader`.
- `EnglishDocumentProcessor` (under the `langdocumentprocessors` package): This class is responsible for implementing our inverted index, term frequency, inverse document frequency, and sorting based on the product of `termFreq` and `inverseDocFreq`, for English documents only.
- `SqliteIndexWriter` (under the `storagewriter` package): This class inserts words, documents, and their ranking (based on the sorting mechanism we use) into our `SqliteDB`.
- `SqliteIndexReader` (under the `storagereader` package): This class handles querying the `SqliteDB`.
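
The scoring that `EnglishDocumentProcessor` is described as performing can be sketched roughly as follows. The exact formulas are not stated here, so this sketch assumes term frequency is the word's count divided by the document's token count and inverse document frequency is the natural log of (number of documents / number of documents containing the word). The class `TfIdfSketch`, its method names, and the naive tokenizer are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TfIdfSketch {

    // Lowercase and split on non-letters; a deliberately naive tokenizer.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^a-z]+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Inverted index: word -> (document position -> occurrences there).
    static Map<String, Map<Integer, Integer>> invertedIndex(List<String> docs) {
        Map<String, Map<Integer, Integer>> index = new HashMap<>();
        for (int d = 0; d < docs.size(); d++) {
            for (String token : tokenize(docs.get(d))) {
                index.computeIfAbsent(token, k -> new HashMap<>())
                     .merge(d, 1, Integer::sum);
            }
        }
        return index;
    }

    // tf * idf for one word in one document (assumed formulas, see above).
    static double score(String word, int doc, List<String> docs) {
        Map<Integer, Integer> postings = invertedIndex(docs).get(word);
        if (postings == null || !postings.containsKey(doc)) {
            return 0.0;
        }
        double termFreq = (double) postings.get(doc) / tokenize(docs.get(doc)).size();
        double inverseDocFreq = Math.log((double) docs.size() / postings.size());
        return termFreq * inverseDocFreq;
    }

    // Positions of the documents containing `word`, highest score first.
    static List<Integer> rank(String word, List<String> docs) {
        Map<Integer, Integer> postings =
                invertedIndex(docs).getOrDefault(word, new HashMap<>());
        List<Integer> ranked = new ArrayList<>(postings.keySet());
        ranked.sort((a, b) -> Double.compare(score(word, b, docs), score(word, a, docs)));
        return ranked;
    }
}
```

The sketch rebuilds the index on every call to keep the code short; a real processor would build it once. Under the assumed formula, a word that appears in every document gets a score of zero.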
- We used `SqliteDB` because it is easy to use and suits the small scale we work at:
  - Created a database called `search_engine.db`.
  - Created a table: `CREATE TABLE search_engine (id INTEGER PRIMARY KEY AUTOINCREMENT, word VARCHAR, docs VARCHAR, sorting_score FLOAT)`.
  - Created an index on the `word` field, because we always filter on `word` (`WHERE word=='word we need to filter with'`).
  - Our `search_engine.db` SQLite file is under the `resources` directory, inside the `database` directory.
- We unit tested our `EnglishDocumentProcessor` class by passing a list of documents to its `processDocument` method (to get the actual result).
- We built an `ArrayList` holding the words with their documents and the tf*idf sorting score for each word in a specific document (the expected result).
- We used JUnit's `assertEquals` method to compare the expected result with the actual result.
- We integration tested our `EnglishDocumentProcessor` and `SqliteIndexWriter` classes by checking whether the result returned from the `processDocument` method of `EnglishDocumentProcessor` is inserted into our `SqliteDB` successfully.
- We functionally tested the application by querying our `SqliteDB` using the `read` method of the `SqliteIndexReader` class:
  - The `read` method accepts the word we need to query from `SqliteDB` as a parameter.
  - It then issues an SQL statement: `SELECT docs FROM search_engine WHERE word=='brown' ORDER BY sorting_score DESC`.
  - We then built an `ArrayList` and added the expected result to it: `[document 1, document 2]`.
  - Finally, we compared the expected result to the actual result using `assertEquals`.
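
The behaviour that the functional test checks (filter by word, then order by score descending) can be mimicked without a database. The sketch below is an in-memory stand-in, not the project's `SqliteIndexReader`; the `Row` and `InMemoryIndexReader` classes and the `read` signature are assumptions, and any scores passed in are purely illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical row mirroring the word, docs and sorting_score columns.
final class Row {
    final String word;
    final String docs;
    final double sortingScore;

    Row(String word, String docs, double sortingScore) {
        this.word = word;
        this.docs = docs;
        this.sortingScore = sortingScore;
    }
}

// In-memory stand-in for the reader (assumed shape of the read method).
final class InMemoryIndexReader {
    private final List<Row> rows;

    InMemoryIndexReader(List<Row> rows) {
        this.rows = new ArrayList<>(rows);
    }

    // Mirrors: SELECT docs ... WHERE word == ? ORDER BY sorting_score DESC
    List<String> read(String word) {
        List<Row> matches = new ArrayList<>();
        for (Row row : rows) {
            if (row.word.equals(word)) {
                matches.add(row);
            }
        }
        // Highest sorting score first.
        matches.sort(Comparator.comparingDouble((Row r) -> r.sortingScore).reversed());
        List<String> docs = new ArrayList<>();
        for (Row row : matches) {
            docs.add(row.docs);
        }
        return docs;
    }
}
```

A stand-in like this lets the ordering assertion run quickly in isolation, while the functional test against the SQLite file remains the check on the real query.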