Converts CHAT, FoLiA, PaQu metadata, plaintext and TEI XML files to Alpino XML files. Each sentence in the input file is parsed separately.
pip install corpus2alpino
corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
Or, from the project root:
python -m corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
from corpus2alpino.converter import Converter
from corpus2alpino.annotators.alpino import AlpinoAnnotator
from corpus2alpino.collectors.filesystem import FilesystemCollector
from corpus2alpino.targets.memory import MemoryTarget
from corpus2alpino.writers.lassy import LassyWriter
alpino = AlpinoAnnotator("localhost", 7001)
converter = Converter(
    FilesystemCollector(["folia.xml"]),
    # Not needed when using the PaQuWriter
    annotators=[alpino],
    # This can also be ConsoleTarget, FilesystemTarget
    target=MemoryTarget(),
    # Set to merge treebanks, also possible to use PaQuWriter
    writer=LassyWriter(True))
# get the Alpino XML output, combined into one treebank XML file
parses = converter.convert()
print(''.join(parses)) # <treebank><alpino_ds ... /></treebank>
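Because the merged output is a single `<treebank>` document, it can be inspected with the standard library. The following is a minimal sketch, not part of corpus2alpino: the sample string stands in for `''.join(parses)`, its sentences and `version` attribute are made up for illustration, and the helper name `sentences_from_treebank` is hypothetical. It assumes each `alpino_ds` element carries a `sentence` child, as Alpino XML normally does.

```python
import xml.etree.ElementTree as ET

# Stand-in for ''.join(parses); real output would contain full Alpino parses.
# The content below is invented for illustration only.
treebank_xml = (
    '<treebank>'
    '<alpino_ds version="1.3"><sentence>Dit is een zin .</sentence></alpino_ds>'
    '<alpino_ds version="1.3"><sentence>Dit is nog een zin .</sentence></alpino_ds>'
    '</treebank>'
)


def sentences_from_treebank(xml_text):
    """Return the text of the <sentence> child of every <alpino_ds> element.

    Hypothetical helper: assumes a merged treebank with <alpino_ds> children.
    """
    root = ET.fromstring(xml_text)
    return [ds.findtext("sentence", default="") for ds in root.iter("alpino_ds")]


sentences = sentences_from_treebank(treebank_xml)
for sentence in sentences:
    print(sentence)
```

When the treebanks are not merged, each item in `parses` would need to be parsed on its own instead.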
It is possible to add custom properties to (existing) Lassy/Alpino files. This is done using a CSV file that lists the node attributes and values to look for, along with the custom properties to assign to matching nodes.
For example:
python -m corpus2alpino tests/example_lassy.xml -e tests/enrichment.csv -of lassy
See corpus2alpino.annotators.enrich_lassy for more information.
python -m unittest
See: https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives
Make sure setuptools and wheel are installed. Then from the virtualenv:
pip install build
python -m build
twine upload dist/*
- Alpino parser running as a server:
Alpino batch_command=alpino_server -notk server_port=7001
- Python 3.8 or higher
- libfolia-dev
- libxml2-dev
sudo apt install libfolia-dev libxml2-dev
pip install -r requirements.txt
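Since the converter depends on an Alpino server listening on a port, it can help to confirm that the port accepts connections before starting a long conversion. The sketch below is not part of corpus2alpino: the function name `alpino_server_reachable` is hypothetical, and the defaults of `localhost` and `7001` are taken from the commands shown in this README. It only checks that a TCP connection can be opened, not that the listener is actually Alpino.

```python
import socket


def alpino_server_reachable(host="localhost", port=7001, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper; it does not verify that the listener is Alpino.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if alpino_server_reachable():
        print("Something is listening on localhost:7001")
    else:
        print("No server reachable on localhost:7001")
```

Opening and immediately closing a connection is a cheap check; whether the Alpino server logs or otherwise reacts to such an empty connection is not covered here.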