Aszaló -- a general purpose search interface for annotated sentences inspired by Mazsola (Verb Argument Browser)
Mazsola (Verb Argument Browser) is a web application built upon the 28 million syntactically analysed sentences and 500,000 verb structures data set.
The database is actually a TSV file (containing tabulated data) on which Mazsola uses the grep command to serve queries that can be specified on the web interface.
The platform returns examples from the corpus for the selected features of verb argument structures (e.g. the case fragment of the verb argument). The radio buttons on the interface can be used to select a criterion from the available feature set by which the results are classified, and further criteria can be used to narrow the search. The platform lists the results in order of importance (salience) according to the selected feature values. This provides an overview of the multitude of examples returned.
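To make the classify-and-narrow idea concrete, here is a minimal sketch in Python. It assumes a tiny in-memory TSV with hypothetical columns (verb, case, sentence), and it uses plain frequency as a stand-in for salience; the actual salience measure and data layout of Mazsola are not reproduced here.

```python
import csv
import io
from collections import Counter

# Hypothetical toy data; column names and rows are illustrative only.
TOY_TSV = (
    "verb\tcase\tsentence\n"
    "ad\tACC\tsentence 1\n"
    "ad\tDAT\tsentence 2\n"
    "ad\tACC\tsentence 3\n"
    "kap\tACC\tsentence 4\n"
)


def group_by_feature(tsv_text, group_field, **filters):
    """Count the values of group_field among rows matching all filters."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter(
        row[group_field]
        for row in reader
        if all(row[key] == value for key, value in filters.items())
    )
    # Most frequent values first (frequency stands in for salience here).
    return counts.most_common()


# Narrow the search to one verb, then classify the hits by case.
ranking = group_by_feature(TOY_TSV, "case", verb="ad")
print(ranking)
```

The selected radio button corresponds to `group_field`, while the additional narrowing criteria correspond to the keyword filters.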
Aszaló generalizes the basic idea of Mazsola to modern technologies, other databases and usage needs (configurable field list, SQL database, JSON export, etc.).
- Ability to configure fields and set default values in the HTML form
- Permalink for each search
- Ability to handle sparse features by separate SQL tables
- JSON and TSV export of the query result (ready for machine/non-interactive usage)
- CLI frontend
- This software is tested on Python 3.10 on Linux
- Install requirements from requirements.txt
- Create config.yaml with the appropriate values (see Configuration section) and create an SQLite database
- Run the application: serve the app class in main.py with an ASGI server like uvicorn, e.g. uvicorn main:app
- Run main.py with CLI arguments
You are encouraged to create your very own database in the Aszaló DB schema! :)
Plenty of example scripts, configurations and documentation are provided to start with. Details on the configurations and advanced options can be found here.
In case of questions, feel free to ask!
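As a rough illustration of what building such a database can involve, the sketch below creates an in-memory SQLite database in which a sparse feature lives in its own table and is joined in at query time. The table names (example, example_note) and the note column are hypothetical; only the field names prev and actform are taken from the PrevCons demo queries. The real Aszaló DB schema is described in the bundled documentation and may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: dense fields in the main table, a sparse
# feature (present for only a few rows) in a separate table.
conn.executescript(
    """
    CREATE TABLE example (
        id      INTEGER PRIMARY KEY,
        prev    TEXT NOT NULL,
        actform TEXT NOT NULL
    );
    CREATE TABLE example_note (
        example_id INTEGER NOT NULL REFERENCES example(id),
        note       TEXT NOT NULL
    );
    """
)

conn.executemany(
    "INSERT INTO example (id, prev, actform) VALUES (?, ?, ?)",
    [(1, "agyon", "form_a"), (2, "agyon", "form_b"), (3, "be", "form_c")],
)
# Only one row carries the sparse feature.
conn.execute(
    "INSERT INTO example_note (example_id, note) VALUES (?, ?)", (2, "rare")
)

# LEFT JOIN keeps rows without the sparse feature (note is NULL/None).
rows = conn.execute(
    """
    SELECT e.actform, n.note
    FROM example AS e
    LEFT JOIN example_note AS n ON n.example_id = e.id
    WHERE e.prev = ?
    ORDER BY e.actform
    """,
    ("agyon",),
).fetchall()
print(rows)
```

Keeping the sparse feature in a separate table avoids a mostly empty column in the main table, at the cost of one join per query.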
Example configs and scripts are bundled to make it easy to get started with the following databases:
- PrevCons created by Ágnes Kalivoda (the database is also bundled and the created demo service is available at: https://aszalo.onrender.com/ )
- Mazsola (Verb Argument Browser) created by Bálint Sass (the database is too large to be included in this repository, but the conversion script is included in the scripts directory)
- The actual corpus forms for the agyon (lit. to death) preverb in HTML format: https://aszalo.onrender.com/?prev=agyon&sort=actform
- The actual corpus forms for the agyon preverb in TSV format: https://aszalo.onrender.com/?prev=agyon&sort=actform&format=TSV
- The actual corpus forms for the agyon preverb in JSON format: https://aszalo.onrender.com/?prev=agyon&sort=actform&format=JSON
- The actual corpus forms for the agyon preverb in HTML format, limited to the 30th-40th occurrence: https://aszalo.onrender.com/?prev=agyon&sort=actform&limit=10&page=2
- The actual corpus forms for the agyon preverb in TSV format: python3 main.py --prev agyon --sort actform
- The actual corpus forms for the agyon preverb in JSON format: python3 main.py --prev agyon --sort actform --format JSON
- The actual corpus forms for the agyon preverb in TSV format, limited to the 30th-40th occurrence: python3 main.py --prev agyon --sort actform --limit 10 --page 2
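The web and CLI examples above use the same parameter names, so one query description can be turned into either form. The following sketch shows this in Python; the helper names to_permalink and to_cli_args are hypothetical, and the one-to-one mapping between query parameters and long options is an assumption based only on the examples listed here.

```python
from urllib.parse import urlencode

# Demo service URL taken from the examples above.
BASE_URL = "https://aszalo.onrender.com/"


def to_permalink(params):
    """Build a permalink for the web interface from query parameters."""
    return BASE_URL + "?" + urlencode(params)


def to_cli_args(params):
    """Build the equivalent main.py command line (assumed 1:1 mapping)."""
    args = ["python3", "main.py"]
    for key, value in params.items():
        args += [f"--{key}", str(value)]
    return args


query = {"prev": "agyon", "sort": "actform", "format": "JSON"}
url = to_permalink(query)
argv = to_cli_args(query)
print(url)
print(" ".join(argv))
```

The resulting argument list can be passed to subprocess.run for non-interactive use, and the URL can be fetched with any HTTP client.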
This project is licensed under the terms of the GNU LGPL 3.0 license.
This software is inspired by but has no common part with Mazsola. The authors of this software would like to gratefully thank Bálint Sass and Ágnes Kalivoda for their great databases which can be used in this software. Aszaló could not be created without the initial idea and implementation of Bálint Sass.
The authors created this software in the hope of encouraging researchers to create databases similar to the aforementioned ones, as these help the corpus linguistics community to gain valuable insights into the data they are using.