Aszaló -- a general purpose search interface for annotated sentences inspired by Mazsola (Verb Argument Browser)

Mazsola (Verb Argument Browser) is a web application built on a data set of 28 million syntactically analysed sentences and 500,000 verb structures. The database is actually a TSV file (tab-separated values), which Mazsola searches with the grep command to serve the queries specified on the web interface.

The platform returns examples from the corpus for the selected features of verb argument structures (e.g. the case fragment of the verb argument). The radio buttons on the interface select one criterion from the available feature set, and the results are classified by it; further criteria can be used to narrow the search. The platform lists the results in order of importance (salience) according to the values of the selected feature. This provides an overview of the multitude of examples returned.
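The classify-and-narrow idea can be illustrated with a minimal sketch. The field names and sample rows below are hypothetical, and plain frequency is used as a stand-in for salience, since the actual salience measure is not described here.

```python
from collections import defaultdict

# Hypothetical annotated sentences; field names and values are illustrative only.
SENTENCES = [
    {"verb": "ad", "case": "ACC", "text": "sentence 1"},
    {"verb": "ad", "case": "DAT", "text": "sentence 2"},
    {"verb": "ad", "case": "ACC", "text": "sentence 3"},
    {"verb": "kap", "case": "ACC", "text": "sentence 4"},
]


def classify(rows, group_by, **filters):
    """Narrow rows by exact-match filters, then group them by one feature.

    Groups are ordered by descending size (an assumed stand-in for salience),
    with ties broken alphabetically by the feature value.
    """
    groups = defaultdict(list)
    for row in rows:
        if all(row.get(field) == value for field, value in filters.items()):
            groups[row[group_by]].append(row)
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


# Classify the examples of the verb "ad" by the case of its argument.
ranked = classify(SENTENCES, "case", verb="ad")
for value, examples in ranked:
    print(value, len(examples))
```

Each group keeps its example sentences, so an interface built on this could show the most frequent feature values first and the matching examples under each.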

Aszaló generalizes the basic idea of Mazsola to modern technologies, other databases and usage needs (configurable field list, SQL database, JSON export, etc.).

Features

  • Ability to configure fields and set default values in the HTML form
  • Permalink for each search
  • Ability to handle sparse features by separate SQL tables
  • JSON and TSV export of the query result (ready for machine/non-interactive usage)
  • CLI frontend
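As an illustration of the sparse-feature and export items above, here is a minimal sketch that assumes a hypothetical schema (the tables `sentence` and `preverb` are invented for this example and are not the actual Aszaló DB schema). A feature that most sentences lack is stored in its own table and joined in at query time, and the result is serialized to TSV or JSON.

```python
import json
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sentence (id INTEGER PRIMARY KEY, verb TEXT NOT NULL, text TEXT NOT NULL);
    -- Sparse feature: only sentences that actually have a preverb get a row here.
    CREATE TABLE preverb (
        sentence_id INTEGER PRIMARY KEY REFERENCES sentence(id),
        value TEXT NOT NULL
    );
    INSERT INTO sentence VALUES (1, 'verb_a', 'sentence 1'), (2, 'verb_b', 'sentence 2'),
                                (3, 'verb_c', 'sentence 3');
    INSERT INTO preverb VALUES (1, 'agyon'), (3, 'agyon');
    """
)


def query(prev=None):
    """Return sentences as dicts, optionally restricted to one preverb value."""
    sql = (
        "SELECT s.id, s.verb, p.value AS prev, s.text "
        "FROM sentence AS s LEFT JOIN preverb AS p ON p.sentence_id = s.id"
    )
    params = ()
    if prev is not None:
        sql += " WHERE p.value = ?"
        params = (prev,)
    cur = conn.execute(sql + " ORDER BY s.id", params)
    columns = [description[0] for description in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def to_tsv(rows):
    """Serialize rows as TSV with a header line; missing values become empty cells."""
    if not rows:
        return ""
    lines = ["\t".join(rows[0])]
    for row in rows:
        lines.append("\t".join("" if v is None else str(v) for v in row.values()))
    return "\n".join(lines) + "\n"


def to_json(rows):
    """Serialize rows as JSON, keeping non-ASCII characters readable."""
    return json.dumps(rows, ensure_ascii=False)


print(to_tsv(query("agyon")), end="")
print(to_json(query()))
```

The LEFT JOIN keeps sentences without the sparse feature in unfiltered results (with an empty value), while a filter on the feature only touches the small side table.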

Setup

  1. This software is tested on Python 3.10 on Linux
  2. Install requirements from requirements.txt
  3. Create config.yaml with the appropriate values (see Configuration section) and create an SQLite database
  4. Run the application:
    • Serve the app defined in main.py with an ASGI server such as uvicorn, e.g. uvicorn main:app
    • Or run main.py directly with CLI arguments

Configuration

You are encouraged to create your very own database in the Aszaló DB schema! :)

Plenty of example scripts, configurations and documentation are provided to start with. Details on the configurations and advanced options can be found here.

In case of questions, feel free to ask!

Examples

Example configs and scripts are bundled to make it easy to get started, using the following databases:

Web UI usage examples

CLI usage examples

  • The actual corpus forms for the agyon preverb in TSV format: python3 main.py --prev agyon --sort actform
  • The actual corpus forms for the agyon preverb in JSON format: python3 main.py --prev agyon --sort actform --format JSON
  • The actual corpus forms for the agyon preverb in HTML format, limited to the 30th-40th occurrence: python3 main.py --prev agyon --sort actform --limit 10 --page 2
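The flags used in these commands could be parsed along the following lines. This is a hypothetical sketch built only from the flags visible in the examples above: the real main.py may define them differently, and the TSV default is an assumption based on the first example, which passes no --format flag.

```python
import argparse


def build_parser():
    """Build a parser for the flags seen in the usage examples (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Sketch of an Aszaló-like CLI frontend")
    parser.add_argument("--prev", help="preverb to search for, e.g. agyon")
    parser.add_argument("--sort", help="feature to classify the results by, e.g. actform")
    # Assumption: TSV is the default output format.
    parser.add_argument("--format", default="TSV", help="output format, e.g. TSV or JSON")
    parser.add_argument("--limit", type=int, help="maximum number of results per page")
    parser.add_argument("--page", type=int, help="which page of results to return")
    return parser


# Parse the arguments of the third example command instead of sys.argv.
args = build_parser().parse_args(
    ["--prev", "agyon", "--sort", "actform", "--limit", "10", "--page", "2"]
)
print(args.prev, args.sort, args.format, args.limit, args.page)
```

Passing an explicit argument list to parse_args keeps the sketch runnable without a command line; a real entry point would call parse_args() with no arguments.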

License

This project is licensed under the terms of the GNU LGPL 3.0 license.

Acknowledgement

This software is inspired by Mazsola but has no part in common with it. The authors would like to thank Bálint Sass and Ágnes Kalivoda for their great databases, which can be used in this software. Aszaló could not have been created without the initial idea and implementation of Bálint Sass.

The authors created this software in the hope of encouraging researchers to create databases similar to the aforementioned ones, as these help the corpus linguistics community gain valuable insights into the data they are using.