- Easy integration: just write the files to the right folder
- Index documents on demand
- Asynchronous indexing
- Automatically handles adding, changing and deleting documents
- Embedded web server with search API
- Aggregates results into a single PDF file
- Serves images with resizing support
index.folder
: Apache Lucene index destination

docs.folder
: Folder containing the documents to index

images.folder
: Folder containing the images to serve

highlight.enabled
: Highlight search matches in the output file

http.port
: Port the embedded web server listens on
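As a rough illustration of how these keys might be read, here is a minimal sketch using `java.util.Properties`. It assumes the settings live in a standard `key=value` properties file, which the list above does not confirm. The class name `ConfigSketch`, the sample values, and the fallback defaults are hypothetical; only the key names come from the list above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class ConfigSketch {

    // Hypothetical sample values; only the key names are taken from the README.
    // Forward slashes are used because a backslash is an escape character
    // in the properties format.
    static final String SAMPLE = String.join("\n",
        "index.folder=C:/Filebox/Index",
        "docs.folder=C:/Filebox/Docs",
        "images.folder=C:/Filebox/Images",
        "highlight.enabled=true",
        "http.port=8099");

    // Parses properties-formatted text into a Properties object.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Reads http.port; the 8099 fallback is an assumption, not a documented default.
    static int httpPort(Properties props) {
        return Integer.parseInt(props.getProperty("http.port", "8099").trim());
    }

    // Reads highlight.enabled; the "false" fallback is also an assumption.
    static boolean highlightEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty("highlight.enabled", "false").trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = load(SAMPLE);
        System.out.println("docs.folder = " + props.getProperty("docs.folder"));
        System.out.println("http.port = " + httpPort(props));
        System.out.println("highlight.enabled = " + highlightEnabled(props));
    }
}
```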
stopwords.txt
: List of stop words (used by Lucene StopFilter)

delimiters.txt
: List of custom word delimiters (used in tokenization)
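To make the role of these two files concrete, the following plain-Java sketch splits text on whitespace plus a set of custom delimiter characters, lowercases each token, and drops stop words. It is only an illustration of the idea: it does not use Lucene, it is not the project's tokenizer, and the class name `TokenizeSketch` and the sample delimiters and stop words are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenizeSketch {

    // Splits on whitespace and on any custom delimiter character,
    // lowercases tokens, and skips tokens found in the stop-word set.
    static List<String> tokenize(String text, Set<Character> delimiters, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isWhitespace(c) || delimiters.contains(c)) {
                flush(current, stopWords, tokens);
            } else {
                current.append(Character.toLowerCase(c));
            }
        }
        flush(current, stopWords, tokens); // emit the trailing token, if any
        return tokens;
    }

    // Emits the buffered token unless it is empty or a stop word, then resets the buffer.
    private static void flush(StringBuilder current, Set<String> stopWords, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Hypothetical contents of delimiters.txt and stopwords.txt.
        Set<Character> delimiters = Set.of('-', '_');
        Set<String> stopWords = Set.of("the", "is");
        System.out.println(tokenize("The PUMP-01_valve is open", delimiters, stopWords));
    }
}
```

With the sample sets above, a tag such as `PUMP-01_valve` is indexed as three separate tokens, so a search for `valve` alone can match it.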
mvn clean exec:java -Dexec.mainClass="br.com.wpivotto.Main"
mvn clean package
- Put any text file inside the configured documents folder.
Example: C:\Filebox\Docs
- The system will extract the text from the document and index it automatically
- Open a browser window and go to http://localhost:PORT/search?q=TERM
- All search results will be grouped into a single PDF document and made available for download
Returns a single PDF file with all document matches (see Lucene Query Syntax).
- URL: /search?q=[query]
- Method: GET
- URL Params
  Required: q=[string]
- Success Response:
  - Code: 200
    Content: Complete data stream of the file contents.
- Sample Call:
  GET http://localhost:8099/search?q=test
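The same call can be made from code. The sketch below builds the search URL with the query URL-encoded (useful once Lucene query syntax with spaces or operators is involved) and optionally saves the response to a file using the JDK's `java.net.http.HttpClient` (Java 11+). The class name `SearchClientSketch` and its arguments are hypothetical, and since only the 200 response is documented here, any other status is simply treated as a failure.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

public class SearchClientSketch {

    // Builds http://HOST:PORT/search?q=QUERY with the query URL-encoded.
    static URI searchUri(String host, int port, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create("http://" + host + ":" + port + "/search?q=" + encoded);
    }

    // Sends the GET request and writes the response body (the PDF) to target.
    static Path download(URI uri, Path target) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        HttpResponse<Path> response =
            client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status: " + response.statusCode());
        }
        return response.body();
    }

    // Usage: java SearchClientSketch.java QUERY [OUTPUT.pdf]
    public static void main(String[] args) throws Exception {
        String query = args.length > 0 ? args[0] : "test";
        URI uri = searchUri("localhost", 8099, query);
        System.out.println("GET " + uri);
        // Only contact the server when an output path is given.
        if (args.length > 1) {
            System.out.println("Saved " + download(uri, Path.of(args[1])));
        }
    }
}
```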
- Search in electrical projects
- Search in instruction manuals
- Search in calibration reports
- Search in piping and instrumentation diagrams (P&ID)
- Show pictures in embedded browsers (ActiveX) on SCADA systems
- Handle text orientation in tokenizer
- Support for pattern replace (CAD files tend to export words without spacing)