A collection of mostly very useful scripts containing various algorithms.
These scripts are provided as-is. There is no guarantee that they will work. You will need to understand them to use them in your projects.
Requests, fixes, suggestions, and new scripts are welcome. Please use issues to provide feedback.
- Column Locator detects text columns in a document
- Dynamic Fuzzy Search Locator POWERFUL fuzzy search a document for values from a previous locator!
- Compare 2 documents POWERFUL script that detects all differences between two documents
- NLP (Natural Language Processing)
- Passport MRZ Locator
- Run Previous Locators from Script VERY POWERFUL your script locators now know which locators they depend on and run them on demand, only if needed, saving you valuable time. Just press Test on the locator and everything is automatically calculated!
- UK VAT Locator looks up VAT IDs online at the UK government. Only works inside the UK.
- Webservice
- Scripting Field Formatters
- Fuzzy Field Formatter useful to make a spellchecker!
- Name Suggestor Demo
- UK VAT Formatter
- Fuzzy Validation Rule useful for finding unusual spellings and suggesting potential corrections
- Move Zones by Script
- Perform Zone OCR in script
- Register Zones on difficult pages.
- Automatically Generate Zone Locators from external coordinate data
- How to Use Table Locators
- Copy Zones into a Table
- Copy Subfields into a Table
- Fast Table Lassoing quickly and interactively select table columns and rows in the Validation Interface
- 3-way Line Item Matching demo a complete project showing Line Item Matching Locator, 3-way matching and interactive SQL database lookup in Validation
- Table Detection by Gridlines
- Table Extraction by Regex
- Table Header Pack Parser
- Insert Missing Rows into a Table automatically finds missing rows that the table locator missed
- Force Table Locator to use a particular algorithm the table locator has 5 internal algorithms that are all run and voted against each other. Here you decide which algorithm always wins
- Validate Table Rows with a Fuzzy Database
- Write Table to CSV
- Table Scripting Framework a powerful & generic approach to enhance table locators
- How to customize any locator
- Force Format Locator to search across multiple lines the format locator only searches within each line of text. This makes it search across line boundaries.
- Fuzzy search a database from script
- Database script functions
- Fuzzy search a dictionary
- Update Database per Document POWERFUL changes a fuzzy database instantly per document. If you know who the document is from, you can search ONLY for their address, phone number, date of birth. The database will contain no-one else
- Fuzzy Dictionary Substitution POWERFUL fuzzy search a document for words/phrases and return associated fields for these values
- Fast Table Lassoing demo video and script quickly and interactively select table columns and rows in the Validation Interface
- Custom Classification
- Page Classification
- Page Locators VERY POWERFUL write locators at the page level
- Paragraph Classification
- String Classification VERY POWERFUL classify any string, even a word or phrase!
- Text Layout Classification VERY POWERFUL a completely new classification strategy. No configuration required. It classifies a page based on the position of every word on the page. It is very sensitive to subtle changes between similar documents. If your forms only vary slightly, this will detect that!
- Find Left Margin of a Page very precise, fuzzy detection of the left margin of a page with sub-pixel accuracy. Useful for comparing two pages and for paragraph detection
- Field Copy VERY POWERFUL This is the most important KT script! intelligently & recursively copy a field, locator, alternative, subfield, cell, row, xdoc into another. This script will dramatically simplify your own scripts and make them much more readable.
- File System Get All files, File_Exists, Dir_Exists, File_NameWithoutExtension etc
- Sorting Alternatives
- Fuzzy Match Text VERY POWERFUL fuzzy match any two pieces of text. 0%=no match, 100%=exact match
- IBAN validation
- JSON quick and dirty JSON parser
- Quicksort VERY POWERFUL sort alternatives fast by confidence, alphabetically, coordinates, page, textline, etc.
- String Regex Split a string via regex, e.g. "2004-12-23" into "2004","12","23"
- Numbers to Text Convert numbers to text, e.g. "1234" to "one thousand two hundred and thirty four". Useful for checking that numbers match their text form
- Write Fields to CSV
- Write Table to CSV
- Write Fields to Excel including colors, formats, images and more!
- Detect Page Size detects whether a page is A4, A3, US Letter, Foolscap, etc. Landscape vs Portrait. Works well on cropped images too
- Text Deskew If a document is not deskewed before or during OCR, the textlines can be messed up. This calculates the page skew AFTER OCR and then realigns all the words into their correct text lines.
- Convert PDF to TIFF VERY POWERFUL convert your PDF samples to TIFF while preserving the Text layer. Speeds up locator testing 10x!
- Gibberish/Nonsense/Bad OCR Detection check if a document is mostly unreadable OCR or a corrupted/encrypted PDF. Useful for language detection as well
- How to read Russian Invoices
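
As an illustration of what the IBAN validation entry in the list checks, here is a minimal Python sketch of the standard mod-97 checksum test. It is not the script from this collection (those scripts run inside the project's own scripting environment); the function name `iban_is_valid` is hypothetical, and the sketch deliberately skips per-country length rules.

```python
def iban_is_valid(iban: str) -> bool:
    """Return True if the IBAN passes the mod-97 checksum (hypothetical helper)."""
    compact = iban.replace(" ", "").upper()
    # General limits only: 15 to 34 characters, ASCII letters and digits.
    # Assumption: per-country length rules are NOT checked in this sketch.
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Country code is two letters, followed by two check digits.
    if not (compact[:2].isalpha() and compact[2:4].isdigit()):
        return False
    # Move the first four characters to the end, then map A=10 ... Z=35.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    # A valid IBAN leaves a remainder of 1 when divided by 97.
    return int(digits) % 97 == 1


print(iban_is_valid("GB82 WEST 1234 5698 7654 32"))  # True
print(iban_is_valid("GB83 WEST 1234 5698 7654 32"))  # False (wrong check digits)
```

A checksum pass only shows that the string is well formed; it does not confirm that the account exists.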
These are advanced scripting techniques for accessing project and locator settings via script. This gives you the power to create, delete and edit classes, fields, locators, and almost any setting in the project. This is very dangerous and can destroy your projects. Also note that the Project Builder will not be updated with changes you make to the project, which will cause GUI errors. Tread carefully: you are on your own, so don't expect support from Tech Support!
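
Returning to the Numbers to Text entry in the list above, the conversion it describes can be sketched in Python as follows. This is an illustrative sketch and not the script from this collection: the names `number_to_text` and `_below_thousand` are hypothetical, and it assumes non-negative integers below one trillion, written without hyphens and with "and" as in the example from the list.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]


def _below_thousand(n: int) -> str:
    """Spell out 1..999, e.g. 234 -> 'two hundred and thirty four'."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        words.append(ONES[hundreds] + " hundred")
    if rest:
        if hundreds:
            words.append("and")
        if rest < 20:
            words.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            words.append(TENS[tens])
            if ones:
                words.append(ONES[ones])
    return " ".join(words)


def number_to_text(n: int) -> str:
    """Spell out an integer in the range 0 <= n < 10**12 (assumed limit)."""
    if not 0 <= n < 10**12:
        raise ValueError("only 0 <= n < 10**12 is supported in this sketch")
    if n == 0:
        return "zero"
    parts = []
    for value, name in SCALES:
        count, n = divmod(n, value)
        if count:
            parts.append(_below_thousand(count) + " " + name)
    if n:
        # After a larger group, a remainder under 100 takes "and",
        # as in "one thousand and five".
        prefix = "and " if parts and n < 100 else ""
        parts.append(prefix + _below_thousand(n))
    return " ".join(parts)


print(number_to_text(int("1234")))  # one thousand two hundred and thirty four
```

To check that a number matches its text form, the extracted text can be lower-cased, stripped of hyphens and compared with `number_to_text` of the extracted digits.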