napari-chatgpt

Home of Omega, a napari-aware autonomous LLM-based agent specialised in image processing and analysis.

A napari plugin that leverages OpenAI's Large Language Model ChatGPT to implement Omega, a napari-aware agent capable of performing image processing and analysis tasks in a conversational manner.

This repository was created as a 'weekend project' by Loic A. Royer, who leads a research group at the Chan Zuckerberg Biohub. It leverages OpenAI's ChatGPT API via the LangChain Python library, as well as napari, a fast, interactive, multi-dimensional image viewer for Python, another of Loic's weekend projects.

What is Omega?

Omega is an LLM-based, tool-armed autonomous agent that demonstrates the potential for Large Language Models (LLMs) to be applied to image processing, analysis and visualisation. Can LLM-based agents write image processing code and napari widgets, correct their coding mistakes, perform follow-up analysis, and control the napari viewer? The answer appears to be yes.

In this video, I ask Omega to segment an image using the SLIC algorithm. It makes a first attempt using the implementation in scikit-image, but fails because it passes a nonexistent 'multichannel' parameter. Realising that, Omega tries again and, this time, succeeds:

1_RGBImageSlicSegmentationSyntaxErrorCorrection_evenfaster.mp4

After loading a sample 3D image of cell nuclei in napari, I ask Omega to segment the nuclei using the Otsu method. My first request was very vague, so it just segmented foreground versus background. I then ask it to split the foreground into distinct segments, one for each connected component. Omega makes a rookie mistake by forgetting to import numpy as 'np'. No problem: it notices, tries again, and succeeds:

2_Cells3DOtsuThenLabelsRecoversFromError_evenfaster.mp4
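For readers curious about what 'Otsu threshold, then label connected components' looks like in code, here is a minimal sketch. Omega most likely used scikit-image (`skimage.filters.threshold_otsu` and `skimage.measure.label`); to stay self-contained, this version uses only numpy and follows the same between-class-variance formulation of Otsu's method. The function names are hypothetical, and the labeling uses face connectivity (no diagonal neighbours), which is an assumption of this sketch rather than scikit-image's default.

```python
from collections import deque

import numpy as np


def otsu_threshold(image: np.ndarray, nbins: int = 256) -> float:
    """Otsu's method: pick the threshold that maximises between-class variance."""
    hist, edges = np.histogram(image.ravel(), bins=nbins)
    hist = hist.astype(np.float64)
    centers = (edges[:-1] + edges[1:]) / 2
    # Class weights (pixel counts) below/above every candidate split.
    w0 = np.cumsum(hist)
    w1 = np.cumsum(hist[::-1])[::-1]
    # Class means; np.maximum avoids dividing by zero for empty classes.
    m0 = np.cumsum(hist * centers) / np.maximum(w0, 1e-12)
    m1 = (np.cumsum((hist * centers)[::-1]) / np.maximum(w1[::-1], 1e-12))[::-1]
    # Variance between the class "bins 0..i" and the class "bins i+1..end".
    between = w0[:-1] * w1[1:] * (m0[:-1] - m1[1:]) ** 2
    return float(centers[np.argmax(between)])


def label_components(mask: np.ndarray) -> np.ndarray:
    """Give each face-connected foreground region its own integer label (1, 2, ...)."""
    labels = np.zeros(mask.shape, dtype=np.int32)
    # One +1/-1 step along each axis: 4-connectivity in 2D, 6-connectivity in 3D.
    steps = []
    for axis in range(mask.ndim):
        for delta in (-1, 1):
            step = [0] * mask.ndim
            step[axis] = delta
            steps.append(step)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue  # already reached from an earlier seed
        current += 1
        labels[seed] = current
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed
            point = queue.popleft()
            for step in steps:
                nb = tuple(p + d for p, d in zip(point, step))
                inside = all(0 <= c < s for c, s in zip(nb, mask.shape))
                if inside and mask[nb] and not labels[nb]:
                    labels[nb] = current
                    queue.append(nb)
    return labels


# Usage on a synthetic image with two separate bright squares.
image = np.zeros((20, 20))
image[2:6, 2:6] = 100.0
image[10:15, 10:15] = 100.0
labels = label_components(image > otsu_threshold(image))
print(labels.max())  # -> 2
```

The same two functions work unchanged on 3D arrays such as the nuclei stack, since both the histogram and the neighbour steps are dimension-agnostic.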

As LLMs continue to improve, Omega will become even more adept at handling complex image processing and analysis tasks. The current version of ChatGPT, 3.5, has a training-data cutoff in 2021, which means that it lacks nearly two years of knowledge about the napari API and its usage, as well as about the latest versions of popular libraries like scikit-image, OpenCV, numpy, scipy, etc. Despite this, you can see in the videos below that it is quite capable. While ChatGPT 4.0 is a significant upgrade, it is not yet widely available.

Omega could eventually help non-experts process and analyse images, especially in the bioimage domain. It is also potentially valuable for educational purposes, as it could assist in teaching image processing and analysis, making it more accessible. Although ChatGPT, which powers Omega, may not yet be on par with an expert image analyst or computer vision expert, it is just a matter of time...

Omega holds a conversation with the user and uses the following tools to answer questions, download and operate on images, write widgets for napari, and more:

napari related tools:

napari viewer control: Gives Omega the ability to control all aspects of the napari viewer.
napari query: Gives Omega the ability to query information about the state of the viewer, of its layers, and their contents.
napari widget maker: Gives Omega the ability to make napari functional widgets that take layers as input and return a new layer.

cell segmentation tools:

cell and nuclei segmentation: This tool specialises in segmenting cells and nuclei in images using some predefined segmentation algorithms. Right now only cellpose is implemented.

Generic python installation queries:

python function signature query: Lets Omega query the signature of a function when it is unsure how to call it and what the names and types of its parameters are.

web search related tools:

web search: Gives Omega access to the knowledge available on the web
web image search: Streamlined path to search the web for images and open them in napari
wikipedia search: Gives Omega access to all of Wikipedia
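The function signature query tool is, at heart, Python introspection. A minimal sketch of the idea using the standard-library `importlib` and `inspect` modules; `query_signature` is a hypothetical name, not the plugin's actual tool code:

```python
import importlib
import inspect


def query_signature(dotted_name: str) -> str:
    """Return e.g. 'json.dumps(obj, *, skipkeys=False, ...)' for a dotted name."""
    parts = dotted_name.split(".")
    # Import the longest prefix that is a module, then walk the remaining attributes.
    for i in range(len(parts), 0, -1):
        try:
            obj = importlib.import_module(".".join(parts[:i]))
        except ImportError:
            continue
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return f"{dotted_name}{inspect.signature(obj)}"
    raise ImportError(f"no importable module in {dotted_name!r}")


print(query_signature("json.dumps"))
```

Walking from the longest importable prefix lets the same call handle both top-level functions (`json.dumps`) and functions in submodules (`os.path.join`).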

Installation from within napari:

You can install napari-chatgpt directly from within napari in the Plugins>Install/Uninstall Plugins menu. (Please note that the Omega agent will happily install packages in the corresponding environment.)

IMPORTANT NOTE: Make sure you have a recent version of napari! Ideally the latest one!

Installation in a new conda environment (RECOMMENDED):

Make sure you have an Anaconda/Miniconda installation on your system. Ask ChatGPT what that is all about if you are unsure ;-)

Create environment:

conda create -y -n napari-chatgpt -c conda-forge python=3.9

Activate environment:

conda activate napari-chatgpt

Install napari in the environment using conda-forge: (important on Apple M1/M2)

conda install -c conda-forge napari

Install napari-chatgpt in the environment:

pip install napari-chatgpt

Installation variations:

To install the latest development version:

git clone https://github.com/royerlab/napari-chatgpt.git
cd napari-chatgpt
pip install -e .

or:

pip install git+https://github.com/royerlab/napari-chatgpt.git

Requirements:

You need an OpenAI key; there is no way around this, unless we add support for other, potentially local, LLMs compatible with LangChain (llama.cpp and similar models come to mind). However, this would likely come at the cost of cognitive performance, which I am not sure is worth it at this point. Please prove me wrong. You can get your OpenAI key by signing up here. Developing Omega cost me $13.97, hardly a fortune. OpenAI pricing on ChatGPT 3.5 is very reasonable at $0.002 per 1K tokens, which means $2 per million tokens, or roughly 750,000 words. A bargain. Now, ChatGPT 4.0 is about 10x more expensive... but that could eventually drop, hopefully.
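The pricing arithmetic above, written out as a small calculator. The price is the ChatGPT 3.5 figure quoted above (not live OpenAI pricing), and the 0.75 words-per-token ratio is a common rule of thumb for English text, assumed here; the function names are hypothetical:

```python
PRICE_PER_1K_TOKENS = 0.002  # USD, ChatGPT 3.5 price as quoted above
WORDS_PER_TOKEN = 0.75  # rule-of-thumb assumption for English text


def cost_of_words(n_words: float, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Estimated cost in USD of processing n_words of English text."""
    tokens = n_words / WORDS_PER_TOKEN
    return tokens / 1000 * price_per_1k


def tokens_for_budget(dollars: float, price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """How many tokens a given budget buys at the given price per 1K tokens."""
    return dollars / price_per_1k * 1000


print(cost_of_words(750_000))  # 750,000 words ~ 1M tokens
print(tokens_for_budget(13.97))  # the development cost quoted above
```

At the quoted rate, 750,000 words come to about $2, and $13.97 buys roughly 7 million tokens; at a price about 10x higher, the same budget buys about a tenth as many.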

Note: you can limit the burn rate to a certain amount of dollars per month, just in case you let Omega think over the weekend and forget to stop it (don't worry, this is actually not possible).

Usage:

Once all is installed, and if it is not already running, start napari:

napari

You can then open the Omega napari plugin via the Plugins menu.

This opens the plugin as a widget; you now need to actually start Omega.

If you have not set the 'OPENAI_API_KEY' environment variable, as is typically done, Omega will ask you for your OpenAI API key and store it safely, encrypted, on your machine (~/.omega_api_keys/OpenAI.json).

Just enter an encryption/decryption key and your OpenAI key; every time you start Omega afterwards, it will only ask for the decryption key.

(The idea is that you might not be able to remember your OpenAI key by heart, but you can remember your own password or passphrase.)
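The key lookup order described above (environment variable first, then a prompt) can be sketched as follows. `resolve_openai_key` is a hypothetical name, and the encrypted storage in ~/.omega_api_keys/OpenAI.json is deliberately not reproduced: the `ask` callback merely stands in for it.

```python
import getpass
import os
from typing import Callable, Mapping


def resolve_openai_key(
    environ: Mapping[str, str] = os.environ,
    ask: Callable[[str], str] = getpass.getpass,
) -> str:
    """Prefer the OPENAI_API_KEY environment variable; otherwise ask the user."""
    key = environ.get("OPENAI_API_KEY", "").strip()
    if key:
        return key
    # Hypothetical fallback: Omega instead unlocks its encrypted key store here.
    return ask("OpenAI API key: ").strip()
```

Passing the environment and the prompt function as parameters keeps the lookup testable without touching the real environment or the terminal.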

You can then point your browser to http://0.0.0.0:9000/ and start having a hopefully nice chat with Omega.

Example prompts:

Here are example prompts/questions/requests to try:

What is your name?
What tools do you have available?
Make me a Gaussian blur widget with sigma parameter
Open this tiff file in napari: https://people.math.sc.edu/Burkardt/data/tif/at3_1m4_03.tif
Make a widget that applies the transformation: y = x^alpha + y^beta with alpha and beta two parameters.
Create a widget to multiply two images
Can you open in napari a photo of Albert Einstein?
Downscale by a factor 3x the image on layer named 'img'
Rename selected layer to 'downscaled_image'
Upscale image 'downscaled_image' by a factor 3 using some smart interpolation scheme of your choice (not nearest-neighbor)
Caveat: this one tends to make a plugin instead of actually doing the job
How many channels does the image on layer 0 have?
Make an image sharpening filter widget, expose relevant parameters
Can you open this file in napari: https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr
Split the two channels of the first layer (first axis) into two separate layers
Switch viewer to 3d
Create a napari widget for a function that takes two image layers and returns a 3D image stack of n images where each 2D image corresponds to a linear blending of the two layer images between 0 and 1.
[Loaded the ‘cell’ sample image] There is one cell in the image on the first layer; it is roughly circular and brighter than its surroundings. Can you write segmentation code that returns a labels layer for it?
Can you create a widget to blend two images?
Can you tell me more about the 'guided Canny edge filter' ?
Write a configurable RGB to grayscale widget, ensure weights sum to 1
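To give a sense of what the 'linear blending stack' prompt above asks for, here is a hedged numpy sketch of its core computation: slice k of the output is (1 - t_k)·A + t_k·B, with t_k evenly spaced in [0, 1]. `blend_stack` is a hypothetical name, and wrapping it as a napari widget (where the two image layers would be passed in as arrays) is left out:

```python
import numpy as np


def blend_stack(image_a: np.ndarray, image_b: np.ndarray, n: int) -> np.ndarray:
    """Stack of n images going linearly from image_a (t=0) to image_b (t=1)."""
    if image_a.shape != image_b.shape:
        raise ValueError("both images must have the same shape")
    if n < 2:
        raise ValueError("need at least two images to span 0..1")
    a = image_a.astype(np.float32)
    b = image_b.astype(np.float32)
    t = np.linspace(0.0, 1.0, n, dtype=np.float32)
    # Reshape t to (n, 1, 1, ...) so it broadcasts over the image axes.
    t = t.reshape((n,) + (1,) * a.ndim)
    return (1.0 - t) * a + t * b


stack = blend_stack(np.zeros((64, 64)), np.full((64, 64), 10.0), n=5)
print(stack.shape)  # (5, 64, 64)
```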

Video Demos:

Not everyone will want to, or be able to, get an API key for the latest and best LLM models, so here are videos showcasing what's possible. You will notice that Omega sometimes fails on its first attempt, typically because of mistaken function parameters or other syntax errors. But it also often recovers by having access to the error message and reasoning its way to the right piece of code. This is what ChatGPT 3.5 can do; imagine what will be possible with 4.0 and future, more capable models...

In this first video, I ask Omega to make a napari widget to convert images from RGB to grayscale:

1.2_MozartConvertToGrayscale_good_evenfaster.mp4
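The core of an RGB-to-grayscale widget is a weighted sum over the channel axis. A minimal sketch, which also normalises the weights so they sum to 1 (as one of the example prompts requests); the default weights are the ITU-R BT.601 luma coefficients, which may differ from what Omega chose in the video, and `rgb_to_grayscale` is a hypothetical name:

```python
import numpy as np


def rgb_to_grayscale(rgb: np.ndarray, weights=(0.299, 0.587, 0.114)) -> np.ndarray:
    """Weighted sum of the R, G, B channels; weights are normalised to sum to 1."""
    w = np.asarray(weights, dtype=np.float64)
    if w.shape != (3,) or np.any(w < 0) or w.sum() == 0:
        raise ValueError("expected three non-negative weights, not all zero")
    w = w / w.sum()  # enforce the 'weights sum to 1' requirement
    # Contract the last (channel) axis; an alpha channel, if any, is ignored.
    return rgb[..., :3].astype(np.float64) @ w


white = np.full((4, 4, 3), 255, dtype=np.uint8)
print(rgb_to_grayscale(white)[0, 0])  # white stays at (about) 255
```

Exposing `weights` as widget parameters and normalising inside the function means the user never has to make them add up by hand.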

Of course, Omega is capable of holding a conversation: it sort of knows 'who it is', and it can search the web and Wikipedia. Eventually, I imagine it could leverage its search abilities to improve its responses, and I have seen it do so a few times:

1.7_SayHelloTellMe_okish_evenfaster.mp4

Following up on the previous video, I ask Omega to create a new labels layer containing just the largest segment. The script that Omega writes has another rookie mistake: it confuses layers and images. The error message then leads Omega to think that it got the name of the layer wrong, setting it off on a quest to find the name of the labels layer. It succeeds at writing code that searches for the labels layer, and uses that name to write a script that then extracts the largest segment into its own layer. Not bad:

3_Cells3DOtsuThenLabelsSelectLargestSegment_evenfaster.mp4
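Extracting the largest segment is a one-pass count over label ids. A minimal numpy sketch, assuming non-negative integer labels with 0 as background; `keep_largest_segment` is a hypothetical name, not the code Omega wrote:

```python
import numpy as np


def keep_largest_segment(labels: np.ndarray) -> np.ndarray:
    """Return a labels array containing only the largest non-background segment."""
    # Pixel (or voxel) count per label id; requires non-negative integer labels.
    counts = np.bincount(labels.ravel())
    counts[0] = 0  # label 0 is background and must never win
    if counts.max() == 0:
        return np.zeros_like(labels)  # no segments at all
    largest = counts.argmax()
    return np.where(labels == largest, labels, 0).astype(labels.dtype)
```

In napari, this would be applied to the labels layer's `.data` array, e.g. `viewer.add_labels(keep_largest_segment(layer.data))`, rather than to the layer object itself, which is exactly the layer-versus-image confusion described above.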

In this video, I ask Omega to write a 'segmentation widget'. Pretty unspecific. The answer is a vanilla yet effective widget that uses the Otsu approach to threshold the image and then finds the connected components. Note that when you ask Omega to make a widget, it won't know of any runtime issues with the code because it is not running the code itself, yet. It can tell if there is a syntax problem though... Nevertheless, the widget ends up working just fine:

4_CellsSegment_evenfaster.mp4
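The difference between syntax problems and runtime issues is easy to see in Python: source code can be parsed with the built-in `compile()` without being executed. A minimal sketch with a hypothetical `syntax_error_of` helper; we don't claim this is how Omega checks its widget code:

```python
from typing import Optional


def syntax_error_of(source: str) -> Optional[str]:
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        compile(source, "<generated widget>", "exec")  # parses, never executes
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


print(syntax_error_of("def widget(image:\n    return image\n"))
print(syntax_error_of("result = np.zeros(3)\n"))  # None: parses fine
```

The second snippet passes the check even though, if numpy was never imported as `np`, it would fail with a `NameError` when run: the same kind of rookie mistake seen in the Otsu video, which only shows up at runtime.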

Now it gets more interesting. Following up on the previous video, can we ask Omega to do some follow-up analysis on the segments themselves? I ask Omega to list the 10 largest segments and compute their areas and centroids. No problem:

5_CellsListSegmentsByAreaTrimmed_evenfaster.mp4

Note: You could even ask for it in markdown format, which would look better (not shown here).
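scikit-image's `regionprops` (with its `area` and `centroid` properties) is the usual tool for this and may well be what Omega used. The numpy-only sketch below computes the same two quantities with `np.bincount`; `largest_segments` is a hypothetical name, areas are in pixels (voxels in 3D), and centroids are in array index coordinates:

```python
import numpy as np


def largest_segments(labels: np.ndarray, top: int = 10):
    """(label, area, centroid) for the `top` largest segments, largest first."""
    flat = labels.ravel()
    areas = np.bincount(flat)  # pixel/voxel count per label id
    # Row k holds every element's coordinate along axis k, in the same order as `flat`.
    coords = np.indices(labels.shape).reshape(labels.ndim, -1)
    coord_sums = np.stack(
        [np.bincount(flat, weights=c, minlength=areas.size) for c in coords]
    )
    order = [int(i) for i in np.argsort(areas)[::-1] if i != 0 and areas[i] > 0]
    return [
        (i, int(areas[i]), tuple(float(s) for s in coord_sums[:, i] / areas[i]))
        for i in order[:top]
    ]
```

Each returned row could then be formatted as a markdown table, as suggested in the note above.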

Next I ask Omega to make a widget that lets me filter segments by area. And it works beautifully. Arguably it is not rocket science, but the thought-to-widget time ratio must be in the hundreds when comparing Omega to an average user trying to write their own widget:

6_CellsSegmentsFilterByArea_evenfaster.mp4
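Filtering segments by area reduces to a lookup table indexed by label id. A minimal sketch, with `min_area` and `max_area` as the parameters a widget would expose; `filter_by_area` is a hypothetical name:

```python
from typing import Optional

import numpy as np


def filter_by_area(labels: np.ndarray, min_area: int = 0,
                   max_area: Optional[int] = None) -> np.ndarray:
    """Zero out every segment whose pixel count falls outside [min_area, max_area]."""
    areas = np.bincount(labels.ravel())
    keep = areas >= min_area
    if max_area is not None:
        keep &= areas <= max_area
    keep[0] = False  # background stays background
    # keep[labels] looks up, for each pixel, whether its segment survives.
    return np.where(keep[labels], labels, 0).astype(labels.dtype)
```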

This is an example of a failed widget. I ask for a widget that can do dilations and erosions. The widget is created but is 'broken' because Omega made the mistake of using floats for the number of dilations and erosions (in the next video, I tell Omega to fix it):

7_CellsErosionDilationFirstPart_evenfaster.mp4

Following up on the previous video, I explain that I want the two parameters (number of erosions and dilations) to be integers. Notice that I exploit the conversational nature of the agent by assuming that it remembers what the widget is about:

8_CellsErosionDilationSecondPartFixed_evenfaster.mp4
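The fix boils down to typing the repeat counts as integers; in a napari function widget built with magicgui, an `int` annotation is shown as an integer spin box, whereas `float` gives a float one. `scipy.ndimage.binary_dilation` and `binary_erosion`, which take an `iterations` argument, would be the usual tools. The numpy-only sketch below (hypothetical helper names, cross-shaped structuring element assumed) also rejects non-integer counts explicitly:

```python
import numpy as np


def _check_iterations(iterations) -> None:
    # Reject floats (and bools) up front: a repeat count must be a whole number.
    if isinstance(iterations, bool) or not isinstance(iterations, (int, np.integer)):
        raise TypeError(f"iterations must be an int, got {type(iterations).__name__}")
    if iterations < 0:
        raise ValueError("iterations must be >= 0")


def _neighbours(mask: np.ndarray, axis: int):
    """The mask shifted by one element both ways along `axis`, padded with False."""
    pad = [(1, 1) if a == axis else (0, 0) for a in range(mask.ndim)]
    padded = np.pad(mask, pad, constant_values=False)
    n = mask.shape[axis]
    before = np.take(padded, np.arange(0, n), axis=axis)  # value at index - 1
    after = np.take(padded, np.arange(2, n + 2), axis=axis)  # value at index + 1
    return before, after


def dilate(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary dilation with a cross-shaped (face-connected) structuring element."""
    _check_iterations(iterations)
    out = mask.astype(bool)
    for _ in range(iterations):
        grown = out.copy()
        for axis in range(out.ndim):
            before, after = _neighbours(out, axis)
            grown |= before | after
        out = grown
    return out


def erode(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Binary erosion; pixels beyond the image border count as background."""
    _check_iterations(iterations)
    out = mask.astype(bool)
    for _ in range(iterations):
        shrunk = out.copy()
        for axis in range(out.ndim):
            before, after = _neighbours(out, axis)
            shrunk &= before & after
        out = shrunk
    return out
```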

This video demos a specialised 'cell and nuclei segmentation tool' that leverages cellpose 2.0 to segment cell cytoplasm or nuclei. In general, we can't assume that LLMs know about every single image processing library, especially for specific domains, so it can be a good strategy to provide such specialised tools. After Omega successfully segments the nuclei, I ask it to count them. Answer: 340. Notice that the generated code 'searches' for the layer named 'segmented' with a loop. Cute:

9_CellsSegmentCellPoseCount_evenfaster.mov
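Counting objects in a labels image amounts to counting its distinct non-zero ids. The sketch below mimics the name-search loop described above, using a hypothetical `Layer` dataclass as a stand-in for a napari layer so it runs without napari; in napari itself, the layer list can be indexed by name directly (`viewer.layers['segmented']`), which makes the loop unnecessary:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Layer:
    """Hypothetical stand-in for a napari layer: only a name and a data array."""
    name: str
    data: np.ndarray


def find_layer(layers, name: str) -> Layer:
    # Linear search by name, like the loop in the generated code described above.
    for layer in layers:
        if layer.name == name:
            return layer
    raise KeyError(f"no layer named {name!r}")


def count_objects(labels: np.ndarray) -> int:
    """Number of distinct non-zero labels (0 is background)."""
    return int(np.count_nonzero(np.unique(labels)))


labels = np.zeros((6, 6), dtype=np.int32)
labels[0, 0], labels[2, 2:4], labels[5, 5] = 1, 2, 3
layers = [Layer("nuclei", np.ones((6, 6))), Layer("segmented", labels)]
print(count_objects(find_layer(layers, "segmented").data))  # -> 3
```

Using `np.unique` rather than `labels.max()` keeps the count correct even when label ids are not contiguous.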

Enough with cells. Apparently, ChatGPT's 'memory' is filled with unnecessary information: it knows the URL of Albert Einstein's photo on Wikipedia, and, combined with the 'napari file open' tool, it can therefore open that photo in napari:

10_EinsteinPhoto_success_evenfaster.mp4

You can ask for rather incongruous widgets, widgets you would probably never write yourself because you only need them once. Here I ask for a widget that applies a rather odd non-linear transformation to each pixel. The result is predictably boring, but it works, and I don't think that the answer was 'copy-pasted' from somewhere else...

10.5_EinsteinPixelFormula_evenfaster.mp4

In this one, starting again from our beloved Albert, I ask Omega to rename that layer to 'Einstein', which looks better than just 'array'. Then I ask Omega to apply a Canny edge filter. Predictably, it uses scikit-image:

11_EinsteinPhotoCannyEdge_evenfaster.mp4

Then I ask for a 'Canny edge detection widget'. It happily makes the widget and offers relevant parameters:

11.5_EinsteinPhotoCannyEdgeWidget_good_evenfaster.mp4

Following up on the previous video, I play with dilations on the edge image. Omega has some trouble when I ask it to 'do it again'. Fine, sometimes you have to be a bit more explicit:

12_EinsteinPhotoCannyEdgeMorphological_good_evenfaster.mp4

You can also experiment with more classic 'numpy' code by creating and manipulating arrays and visualising the output live:

12.5_3DArrayFormulaProject_evenfaster.mp4
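This is the kind of numpy code involved: build a 3D array from a formula, then project it to 2D. The formula here is an arbitrary example, not the one used in the video, and `formula_volume` is a hypothetical name; in napari, `viewer.add_image(volume)` would display the result live:

```python
import numpy as np


def formula_volume(size: int = 64) -> np.ndarray:
    """Sample an (arbitrary) radial formula on a size^3 grid spanning [-1, 1]^3."""
    axis = np.linspace(-1.0, 1.0, size)
    z, y, x = np.meshgrid(axis, axis, axis, indexing="ij")
    r = np.sqrt(x**2 + y**2 + z**2)
    return np.cos(8 * r) * np.exp(-2 * r)  # concentric shells that fade outwards


volume = formula_volume()
projection = volume.max(axis=0)  # maximum-intensity projection along z
print(volume.shape, projection.shape)  # (64, 64, 64) (64, 64)
```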

This video demonstrates that Omega understands many aspects of the napari viewer API. It can switch viewing modes, translate layers, etc.:

13_NapariViewerControl_evenfaster.mp4

I never thought this one would work: I ask Omega to open an mp4 video from a URL in napari and then use OpenCV to detect people. It does it. But the one thing that Omega does not know is that creating a layer for each frame of the video is not a practical approach. It's not clear what happened to the colors, though; probably an RGB ordering or format issue:

15_LoadMP4VideoFromURLOpenCVPeopleDetection_evenfaster.mp4
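OpenCV's `cv2.VideoCapture(...).read()` returns frames in BGR channel order, so displaying them unconverted swaps red and blue; this may explain the colors above, though we can't be sure from the video alone. The usual fix is `cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)`. The numpy-only sketch below (hypothetical `frames_to_stack`) performs the same channel reversal and also gathers all frames into one (T, H, W, 3) array, which napari can show as a single RGB layer with a time slider (e.g. `viewer.add_image(stack, rgb=True)`) instead of one layer per frame:

```python
import numpy as np


def frames_to_stack(frames, bgr_input: bool = True) -> np.ndarray:
    """Stack (H, W, 3) frames into one (T, H, W, 3) array, optionally BGR -> RGB."""
    stack = np.stack(list(frames), axis=0)
    if bgr_input:
        stack = stack[..., ::-1]  # reverse the channel axis: B, G, R -> R, G, B
    return np.ascontiguousarray(stack)


# Two synthetic 'frames' that are pure red in OpenCV's BGR order.
red_bgr = np.zeros((2, 3, 3), dtype=np.uint8)
red_bgr[..., 2] = 255
stack = frames_to_stack([red_bgr, red_bgr])
print(stack.shape, stack[0, 0, 0])  # (2, 2, 3, 3) and a pure-red RGB pixel
```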

Disclaimer:

Do not use this software lightly: it will download libraries of its own volition and write any code that it deems necessary, and it might actually do what you ask, even if it is a bad idea. Also, beware that it might misunderstand what you ask and then do something bad. For example, it is unwise to use Omega to delete 'some' files from your system; it might end up deleting more than that if your request is unclear.
To be 100% safe, we recommend that you use this software from within a sandboxed virtual machine.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Contributing

Contributions are extremely welcome. Tests can be run with tox, please ensure the coverage at least stays the same before you submit a pull request.

License

Distributed under the terms of the BSD-3 license, "napari-chatgpt" is free and open source software

Issues

If you encounter any problems, please file an issue along with a detailed description.
