MatplotLLM is a natural language layer over Matplotlib to visualize data. The main purpose is to accelerate building a certain your way of visualizing data points without meddling with innards of a tool like matplotlib. As of now, this is a system to be used from within Emacs/Org system. The motivation is coming from something I wrote in a blog on AI co-programming here.
You might be interested in reading my blog post on MatplotLLM too.
There are two descriptions that you need to provide, both in natural language. One that describes the data source. Second that describes how to plot. The first is a static text description, which you can change as needed between calls of course.
For second, you can provide an iterative description like in a conversation interface. You could start with a first pass description and then keep adding more specifications as feedback.
You can use these in an org-babel source block with language name of matplotllm
as shown in below example. There is an org mode divider -----
used to separate
data description and plot description. In current design this distinction might
look useless, but it might be helpful later on. In plot description, you add
empty lines to give iterative feedback. Every redraw shows the current code to
the LLM, provides feedback, and asks for new code.
You will have to set the value of matplotllm-openai-key
to use this first. We
call GPT4
as the backing LLM right now.
The example below is trying to reproduce—I haven’t done justice to it yet—the plot in my blog post on Learning Colemak-DH.
The data file to read is named `log.txt`. Here is how it looks like:
+ [2023-07-20 Thu] 97 WPM, acc 98%
Stopped daily tracking
+ [2023-05-16 Tue] 66 WPM, acc 91% | 66 WPM, acc 87%
+ [2023-05-15 Mon] 68 WPM, acc 89% | 65 WPM, acc 90% | 71 WPM, acc 93% | Colemak-DH as default.
+ [2023-05-14 Sun] 65 WPM, acc 92% | 62 WPM, acc 87% | 65 WPM, acc 91% | 70 WPM, acc 90%
Each line is for a day has WPM and accuracy entries from multiple tries in a day. Some line may have ill-structured text which you can ignore.
-----
Plot a minimal scatter plot that shows WPM plotted against dates. Use accuracy values as color of the scatter plot, darker (blue purple) is more accurate.
Clear up the axis and only show faint grid lines, and show dates where ticks have months written in them without crowding them too much.
Annotate the first and last point with actual WPM value.
Few development notes:
- The iterative feedback could be provided in a better shell like way rather than in a babel source block.
- Many times the generated code fails. This happens silently right now and needs better visibility. Though I might want to keep code out of sight and prefer things like heuristic retries over actually letting user debug. My target user—not in the Emacs version—is someone who doesn’t want to use matplotlib manually.
- Features like saving the plot in high quality, allowing stylistic templates in system prompts, etc. would come in later versions.
- Plot description usually needs a lot of iterations to get to what you want and also just to keep redraws consistent. This is obvious as writing specifications of anything brings out many hidden unshared models between the parties involved. Data description, on the other hand, works well once specified.
LLM Assisted data visualization system
Copyright (c) 2023 Abhinav Tushar and contributors
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.