This repository includes a Python script that can produce graphical renderings of the dependency parse trees as returned by the Rosette API syntax/dependencies
endpoint. The script relies on GraphViz to produce SVG image files that can be viewed in a web browser.
The Python script deps_to_graph.py
is written for Python 3 (specifically Python 3.6.0, but should be compatible with most versions of Python 3). To install the Python dependencies it's recommended to setup a virtual environment with virtualenv
. To begin, make sure virtualenv
is installed and up to date with pip
:
$ pip3 install -U virtualenv
Create a virtual environment in the current directory:
$ python3 $(which virtualenv) .
Then, activate the virtual environment:
$ source bin/activate
Finally, install the Python 3 dependences from the requirements.txt
file:
$ pip3 install -r requirements.txt
deps_to_graph.py
relies on the dot
program from the GraphViz package to produce SVG images, so the last step is to install GraphViz. GraphViz can be installed from most popular managers (Homebrew, RPM, etc.). Once you've installed the GraphViz package, you can check if the dot
binary is available:
$ which dot
If the above command prints a path to the dot
binary, and you've followed the earlier steps to install the Python dependencies, then you're all set to start visualizing some dependency parse trees!
This section describes how to use deps_to_graph.py
:
$ ./deps_to_graph.py -h
usage: deps_to_graph.py [-h] [-i INPUT] [-u] [-o OUTPUT] [-k KEY] [-a API_URL]
[-l LANGUAGE] [-b]
Render Rosette API dependency parse trees as SVG via Graphviz
optional arguments:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Path to a file containing input data (if not specified
data is read from stdin) (default: None)
-u, --content-uri Specify that the input is a URI (otherwise load text
from file) (default: False)
-o OUTPUT, --output OUTPUT
Path to a file where SVG will be written (if not
specified data is written to stdout) (default: None)
-k KEY, --key KEY Rosette API Key (default: None)
-a API_URL, --api-url API_URL
Alternative Rosette API URL (default:
https://api.rosette.com/rest/v1/)
-l LANGUAGE, --language LANGUAGE
A three-letter (ISO 639-2 T) code that will override
automatic language detection (default: None)
-b, --label-indices Add token index labels to show the original token
order; this can help in reading the trees, but it adds
visual clutter (default: False)
The script takes input, makes a request to the Rosette API on your behalf, and converts the resulting JSON into an SVG image. The SVG data is written to /dev/stdout
by default, or to the given file path specified with the -o/--output
option.
Note: If you want to circumvent supplying your Rosette API key with -k/--key
(or typing it at the prompt) on every execution of the script, you can set a ROSETTE_USER_KEY
enviroment variable:
$ echo "export ROSETTE_USER_KEY=<your-user-key>" >> ~/.bash_profile
$ . ~/.bash_profile
deps_to_graph.py
can read input from /dev/stdin
, a file path, or a URI.
$ echo "Let's make a graph" | ./deps_to_graph.py > graph.svg
You can look at the resulting graph.svg
by opening it in your web browser.
It will look like this:
# create a sample text file to read from
$ echo "The rules governing the hierarchical structure of a sentence are called “syntax” in linguistics. Knowing the structure of a sentence helps to understand how the words in a sentence are related to each other." > data.txt
# create an SVG visualization using the content of the text file as input
$ ./deps_to_graph.py -i data.txt > data.svg
The results will look like this:
Another option is to rely on Rosette API to extract the textual content of a web page and parse it. You can do this by giving a URI with the -i/--input
option and additionally specifying the -u/--content-uri
option to indicate that the input is a URI:
$ ./deps_to_graphy.py -u -i 'www.your-favorite-site.com' > your-favorite-site.svg
One additional convenience option is the -b/--label-indices
option. This labels each node in the parse tree with an index. This can help with reading the content from the graph as it shows the original order of the word tokens. This is left off by default as it adds visual clutter, but it can be a useful option:
$ ./deps_to_graph.py -b -i data.txt > labeled-data.svg
With token index labels, the trees looks like this: