plotting in jupyter without using python
marcrasi opened this issue · 19 comments
Currently, you can make inline plots in Jupyter by asking swiftplot for a base64 encoded png and then passing that to display(base64EncodedPNG:). But display(base64EncodedPNG:) is implemented using Python and it would be nice to do it using native Swift.
KernelCommunicator.swift is the thing that sends display messages from Swift to Jupyter. The file has a lot of comments explaining what the things do.
Here's a simple example of code using KernelCommunicator.swift:
KernelCommunicator.handleParentMessage { (m: KernelCommunicator.ParentMessage) in
    // This print statement should execute before each cell starts executing,
    // and it'll print out the jupyter "parent message".
    print(m)
}
KernelCommunicator.afterSuccessfulExecution {
    // This print statement should execute after each cell executes.
    print("Cell executed")
    // Messages returned in this array will get sent to jupyter for displaying.
    // So you'll probably want to collect plots into some kind of global, and then
    // return them whenever this callback gets called.
    return []
}
(I haven't actually run this, so there may be some typos.)
These "messages" that you return are described here: https://jupyter-client.readthedocs.io/en/stable/messaging.html (specifically the display_data messages).
The reason you need a "handleParentMessage" is that each message includes a field specifying what the parent message is. So you'll need to keep track of that (in some global probably) and use it when creating the messages.
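A minimal sketch of that bookkeeping, assuming the parent header is kept as raw JSON text. The `DisplayQueue` and `PendingDisplay` names are hypothetical and do not come from swift-jupyter; the three methods stand in for what the parent-message handler, a display function, and the after-execution callback would each do.

```swift
// Hypothetical types for illustration only; not part of KernelCommunicator.
struct PendingDisplay {
    let parentHeaderJSON: String  // header of the request that started the cell
    let base64PNG: String         // image payload to show
}

struct DisplayQueue {
    var currentParentHeaderJSON = "{}"
    var pending: [PendingDisplay] = []

    // Would be called from the parent-message handler, before each cell runs.
    mutating func noteParent(headerJSON: String) {
        currentParentHeaderJSON = headerJSON
    }

    // Would be called by a display function while the cell runs.
    mutating func enqueue(base64PNG: String) {
        pending.append(PendingDisplay(parentHeaderJSON: currentParentHeaderJSON,
                                      base64PNG: base64PNG))
    }

    // Would be called after the cell finishes: hand back what was queued, then reset.
    mutating func drain() -> [PendingDisplay] {
        let queued = pending
        pending.removeAll()
        return queued
    }
}

var queue = DisplayQueue()
queue.noteParent(headerJSON: #"{"msg_id":"abc","msg_type":"execute_request"}"#)
queue.enqueue(base64PNG: "iVBORw0KGgo=")
let drained = queue.drain()
print(drained.count, drained[0].parentHeaderJSON)
```

Each queued item captures the parent header at the time it was created, so a message is still attributed to the right cell even if it is only sent after execution finishes.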
Jupyter does message signatures, and the keys necessary for that should be in KernelCommunicator.jupyterSession. Though there is some flag when you run jupyter notebook that turns off signature verification so that you don't have to implement this immediately.
This project contains some Swift code that constructs and signs jupyter messages: https://github.com/KelvinJin/iSwift . You might be able to borrow some of it.
One thing that might be useful would be to add some print statements in the python display implementation consumeDisplayMessages to see what the messages are supposed to look like.
@marcrasi when I try to print displayMessages in consumeDisplayMessages I get
[__lldb_expr_1.KernelCommunicator.JupyterDisplayMessage(parts: [__lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference])]
I don't understand this. Also I tried looking up the messages part of this
let displayMessages = IPythonDisplay.socket.messages.map {
    KernelCommunicator.JupyterDisplayMessage(parts: $0.map { bytes($0) })
}
here: https://www.tensorflow.org/swift/api_docs/Structs/PythonObject
But I couldn't find anything.
Is the purpose of EnableIPythonDisplay to just display images?
Are we looking to just shift the decoding of the image to Swift or the whole display portion to Swift?
Is it so that the message will have the data for the image we need to display, and we need to create that message once our display function is called?
Meanwhile I'll start implementing scatter plot.
when I try to print displayMessages in consumeDisplayMessages I get
[__lldb_expr_1.KernelCommunicator.JupyterDisplayMessage(parts: [__lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference, __lldb_expr_1.KernelCommunicator.BytesReference])]
If you add a custom description to the BytesReference (in KernelCommunicator.swift), it should print out something more informative:
public var description: String {
    (bytes + [0]).withUnsafeBufferPointer { ptr in
        String(cString: ptr.baseAddress!)
    }
}
Also I tried looking up the messages part of this
let displayMessages = IPythonDisplay.socket.messages.map {
    KernelCommunicator.JupyterDisplayMessage(parts: $0.map { bytes($0) })
}
here: https://www.tensorflow.org/swift/api_docs/Structs/PythonObject
But I couldn't find anything.
PythonObject is a magical thing using dynamicMemberLookup that lets you access fields on the underlying python object that it refers to. In this case, the "socket" gets constructed here: https://github.com/google/swift-jupyter/blob/77aae916c14d0ce8daf0bd09554b77641c684652/EnableIPythonDisplay.swift#L64 which is a call to this python function: https://github.com/google/swift-jupyter/blob/77aae916c14d0ce8daf0bd09554b77641c684652/swift_shell/__init__.py#L46 . So it's an instance of the CapturingSocket and the messages field is just the array of messages that got appended to it.
Is the purpose of EnableIPythonDisplay to just display images?
The purpose of it is to register a handler for jupyter messages coming from IPython. When you ask IPython to display an image, it generates a jupyter message. If you have run EnableIPythonDisplay, then EnableIPythonDisplay handles the message by forwarding it to the jupyter kernel, which causes the image to get displayed.
Are we looking to just shift the decoding of the image to Swift or the whole display portion to Swift?
Shift everything to Swift. Specifically, there could be some "EnableSwiftDisplay.swift" file that is like "EnableIPythonDisplay.swift" except that it never uses python. When you run it, it defines some methods (like display(base64EncodedPNG: String)) that cause messages to get forwarded out to the kernel.
@marcrasi sorry for the late reply. I'll go through the links above this weekend and try to get a better understanding of what's going on and what I need to do. I think I might need a little bit more time to implement this.
@marcrasi the description was as follows:
display_data
<IDS|MSG>
4326a28f31796adb360892e9c8f3768018f267d9dd9c83ad1704d2b7d2fb5955
{"version":"5.3","date":"2019-06-16T10:49:34.713911Z","session":"02b0c143-470869163c6607700f0d5e72","username":"karthik","msg_type":"display_data","msg_id":"e9136fe3-a6ebe33878d97aec10c2c781"}
{}
{}
{"data":{"image/png":"\n","text/plain":"<IPython.core.display.Image object>"},"metadata":{},"transient":{}}
What is this? 4326a28f31796adb360892e9c8f3768018f267d9dd9c83ad1704d2b7d2fb5955
Is this the HMAC signature? How can I get it?
Do the empty braces refer to the parent header and metadata?
And what is the "text/plain" tag in the data part?
My understanding is that I need to create the above message with the relevant data and create a JupyterDisplayMessage from its BytesReference. But I still have a question:
What exactly is a parent message and parent header? Where can I get it from?
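For reference, the seven parts printed above line up with the wire format in the Jupyter messaging docs: identity (topic) frame, the `<IDS|MSG>` delimiter, the HMAC signature, then header, parent header, metadata and content as serialized JSON. The sketch below only shows that ordering; `wireFrames` and its parameter names are made up for illustration and are not part of KernelCommunicator.

```swift
// Illustrative helper: assembles the frames of one message in wire order.
func wireFrames(identities: [String],
                signature: String,
                header: String,
                parentHeader: String,
                metadata: String,
                content: String) -> [String] {
    return identities + ["<IDS|MSG>", signature, header, parentHeader, metadata, content]
}

let frames = wireFrames(
    identities: ["display_data"],
    signature: "",  // empty string when message signing is disabled
    header: #"{"msg_type":"display_data"}"#,
    parentHeader: "{}",
    metadata: "{}",
    content: #"{"data":{},"metadata":{},"transient":{}}"#)

for (index, frame) in frames.enumerated() {
    print(index, frame)
}
```

Read this way, the long hex string is the signature frame and the two `{}` frames are the parent header and the metadata. The "text/plain" entry is a plain-text fallback representation for frontends that cannot render the image.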
@marcrasi I tried to implement this. Here's a gist: https://gist.github.com/KarthikRIyer/064d90f4895df5f0592de48e9259b152
But this does not work. I am also unable to print and debug.
I get this in the terminal
[E 23:15:41.122 NotebookApp] Uncaught exception in zmqstream callback
Traceback (most recent call last):
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 456, in _handle_events
self._handle_recv()
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 486, in _handle_recv
self._run_callback(callback, msg)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 438, in _run_callback
callback(*args, **kwargs)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 203, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/notebook/services/kernels/handlers.py", line 313, in _on_zmq_reply
idents, fed_msg_list = self.session.feed_identities(msg_list)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/jupyter_client/session.py", line 844, in feed_identities
idx = msg_list.index(DELIM)
ValueError: b'<IDS|MSG>' is not in list
ERROR:asyncio:Exception in callback BaseAsyncIOLoop._handle_events(65, 1)
handle: <Handle BaseAsyncIOLoop._handle_events(65, 1)>
Traceback (most recent call last):
File "/usr/lib/python3.6/asyncio/events.py", line 145, in _run
self._callback(*self._args)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/tornado/platform/asyncio.py", line 138, in _handle_events
handler_func(fileobj, events)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 456, in _handle_events
self._handle_recv()
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 486, in _handle_recv
self._run_callback(callback, msg)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 438, in _run_callback
callback(*args, **kwargs)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/zmq/eventloop/zmqstream.py", line 203, in <lambda>
self.on_recv(lambda msg: callback(self, msg), copy=copy)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/notebook/services/kernels/handlers.py", line 313, in _on_zmq_reply
idents, fed_msg_list = self.session.feed_identities(msg_list)
File "/home/karthik/swift-jupyter/venv/lib/python3.6/site-packages/jupyter_client/session.py", line 844, in feed_identities
idx = msg_list.index(DELIM)
ValueError: b'<IDS|MSG>' is not in list
What am I doing wrong? Am I missing something?
@marcrasi any ideas regarding this?
Nice start! I experimented with it a bit and made a few changes to get it working: https://gist.github.com/marcrasi/3e11704c135481f21d2abd73c4e32b16
That works for me, if I run jupyter with jupyter notebook --Session.key='b""' to disable message signing.
It still needs some cleanup. For example, I used Python to do some JSON decoding, which defeats the purpose of writing a pure-Swift implementation :p. So that should be switched to use some Swift thing.
The changes I made to fix it were:
- I needed to .dropLast() on the utf8CString to remove the null terminator. I discovered this by adding self.log.error(messages) here, which showed me that the messages being forwarded to the jupyter client had extra null terminators in them.
- I set the parent header in the messages to the parent header that we receive from jupyter.
- I made the Message optional so that it doesn't try to send empty messages to jupyter. The empty messages were confusing it and causing errors.
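The null terminator point is easy to check in isolation. This small sketch uses only the standard library's `utf8CString` property; the variable names are illustrative.

```swift
let text = "<IDS|MSG>"

// utf8CString yields the UTF-8 bytes followed by a trailing 0 (C-string terminator).
let withTerminator = Array(text.utf8CString)
// Dropping the last element leaves exactly the bytes of the string.
let withoutTerminator = Array(text.utf8CString.dropLast())

print(withTerminator.count, withoutTerminator.count)
```

With the terminator left in, the delimiter frame would be `<IDS|MSG>` plus an extra zero byte, which would be consistent with the `b'<IDS|MSG>' is not in list` error in the traceback above.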
@marcrasi thanks a lot for fixing it! I've converted the python JSON part to Swift:
https://gist.github.com/KarthikRIyer/064d90f4895df5f0592de48e9259b152
Now for the message signing I think the relevant part I should be looking at is:
After the delimiter is the HMAC signature of the message, used for authentication. If authentication is disabled, this should be an empty string. By default, the hashing function used for computing these signatures is sha256.
The signature is the HMAC hex digest of the concatenation of:
- A shared key (typically the key field of a connection file)
- The serialized header dict
- The serialized parent header dict
- The serialized metadata dict
- The serialized content dict
In Python, this is implemented via:
import hashlib
from hmac import HMAC

# once:
digester = HMAC(key, digestmod=hashlib.sha256)

# for each message
d = digester.copy()
for serialized_dict in (header, parent, metadata, content):
    d.update(serialized_dict)
signature = d.hexdigest()
stated here: https://jupyter-client.readthedocs.io/en/stable/messaging.html
For SHA256, if we want to use CommonCrypto, we'll need a bridging header. How can I do that here?
https://stackoverflow.com/a/38788437/9126612
I was looking at this way to use sha256.
Bridging headers are header files that let you import Objective-C code into Swift.
Here are the docs on importing Objective-C into Swift:
https://developer.apple.com/documentation/swift/imported_c_and_objective-c_apis/importing_objective-c_into_swift
I'll have to look if we can lift the code from somewhere.
So I've used CommonCrypto before with Swift, and here's a sample bridging header for it:
#ifndef BridgingHeader_h
#define BridgingHeader_h
#import <CommonCrypto/CommonCrypto.h>
#endif /* BridgingHeader_h */
There's also a modulemap that can be used to pull this into a framework.
However, CommonCrypto isn't available on non-Mac platforms, so I don't think that will help here. There's IBM's BlueCryptor: https://github.com/IBM-Swift/BlueCryptor , but I don't know how heavy of a dependency you want to introduce for doing this.
@BradLarson I'm trying out BlueCryptor. It doesn't compile with the S4TF toolchain. It compiles with the original Swift release.
I was trying out the example that they gave in the docs:
import Cryptor

let myKeyData = "0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b"
let myData = "4869205468657265"
let key = CryptoUtils.byteArray(fromHex: myKeyData)
let data : [UInt8] = CryptoUtils.byteArray(fromHex: myData)
let hmac = HMAC(using: HMAC.Algorithm.sha256, key: key).update(byteArray: data)?.final()
print("\(hmac)")
if let string = String(bytes: hmac!, encoding: .utf8) {
    print(string)
} else {
    print("not a valid UTF-8 sequence")
}
I added a few lines to convert the hmac into a String. This says that the result is not a valid UTF-8 sequence. I printed out the hmac variable and got this:
Optional([176, 52, 76, 97, 216, 219, 56, 83, 92, 168, 175, 206, 175, 11, 241, 43, 136, 29, 194, 0, 201, 131, 61, 167, 38, 233, 55, 108, 46, 50, 207, 247])
I also used the same key and data in an online tool to get hmac using sha256 and the bytes I got there were completely different. I think sha256 hmac is supposed to give 64 bytes output. BlueCryptor gave me just 32 bytes and the online tool gave 64 bytes.
I asked on IBM Swift's Slack. The string I was comparing against is the hex encoding of the bytes, so I had to use another function from their lib to get a String.
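To illustrate the byte versus hex difference without any crypto library: a SHA-256 HMAC is 32 raw bytes, and its hex digest is those same bytes written as 64 hexadecimal characters. The sketch below hex-encodes the exact byte array printed above; `hexString` is a hypothetical helper, not a BlueCryptor function.

```swift
// The 32 digest bytes printed by the BlueCryptor example above.
let digest: [UInt8] = [
    176, 52, 76, 97, 216, 219, 56, 83, 92, 168, 175, 206, 175, 11, 241, 43,
    136, 29, 194, 0, 201, 131, 61, 167, 38, 233, 55, 108, 46, 50, 207, 247,
]

// Hypothetical helper: two lowercase hex characters per byte, zero-padded.
func hexString(_ bytes: [UInt8]) -> String {
    return bytes.map { byte -> String in
        let hex = String(byte, radix: 16)
        return hex.count == 1 ? "0" + hex : hex
    }.joined()
}

let signature = hexString(digest)
print(signature.count)  // 64
print(signature)        // b0344c61d8db38535ca8afceaf0bf12b881dc200c9833da726e9376c2e32cff7
```

That hex string is the expected HMAC-SHA-256 value for test case 1 in RFC 4231, which uses the same key and data as the example, so the 32 bytes from BlueCryptor were correct and only the encoding differed.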
But I can't use the official Swift release with swift-jupyter because it doesn't have python 3.6 support. I'm able to compile Cryptor using the latest S4TF nightly build, but I don't know why it too doesn't have python 3.6.
So now I'm trying to build it myself.
@marcrasi I think using the library @BradLarson suggested is a good option. I got the signing to work using it. But the current nightly build of S4TF doesn't have python3.6 so we can't use it with swift-jupyter. I had to build it myself with python3. The swift version that Google Colab is using right now is not able to compile the BlueCryptor library.
But the nightly build with python3 should work just fine.
Here is the new code: https://gist.github.com/KarthikRIyer/064d90f4895df5f0592de48e9259b152
If this is ok I'll open a PR.
Cool!
I don't understand what the problem with python3.6 and the S4TF nightly is. The tests that we run on the nightly builds run swift-jupyter on a system with python3.6, and they are passing. So it should work. Is it not working for you?
I have a lot of comments on the implementation, but a PR is a good place to do comments, so feel free to open one whenever :) Here's my initial round of comments:
- Let's name it EnableJupyterDisplay.swift -- in the future when Python interop has become obsolete, everyone will expect everything to be pure Swift, so "Swift" in the name will be redundant.
- All the global variables and functions should be wrapped in an enum JupyterDisplay, to avoid polluting the global namespace. (Like EnableIPythonDisplay.swift does with enum IPythonDisplay).
- Instead of message: Message?, do messages: [Message] so that there can be multiple images displayed from one cell.
- Use a JSON library to construct the JSON instead of using string interpolation. This will be safer against unescaped characters, and it should make the code more readable.
- hmacSignature should be a computed property (var hmacSignature: String { <do some computation> return result }) instead of a stored property that you have to remember to update.
- messageParts should also be a computed property.
- Message should be a struct.
- I'll go through and thoroughly comment on Swift style when there is a PR.
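A rough sketch of two of these suggestions, the JSON library and the computed property, using Foundation's `JSONEncoder`. The `DisplayContent` and `Message` types and the `contentJSON` property are hypothetical stand-ins, not the code from the gist; the field names follow the display_data content shown earlier in the thread.

```swift
import Foundation

// Hypothetical model of the display_data content dictionary.
struct DisplayContent: Encodable {
    var data: [String: String]
    var metadata: [String: String] = [:]
    var transient: [String: String] = [:]
}

// A struct, so copies are independent values.
struct Message {
    var content: DisplayContent

    // Computed from `content` on every access, so it cannot go stale.
    var contentJSON: String {
        let encoder = JSONEncoder()
        encoder.outputFormatting = .sortedKeys  // stable key order, for readability only
        guard let bytes = try? encoder.encode(content),
              let json = String(data: bytes, encoding: .utf8) else {
            return "{}"  // defensive fallback; encoding string dictionaries should not fail
        }
        return json
    }
}

// A value containing quotes, which naive string interpolation would not escape.
let message = Message(content: DisplayContent(data: ["text/plain": "say \"hi\""]))
print(message.contentJSON)
```

Because the encoder handles escaping, a value with quotes or backslashes still produces valid JSON, which is the safety argument made in the comment above.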
I don't understand what the problem with python3.6 and the S4TF nightly is. The tests that we run on the nightly builds run swift-jupyter on a system with python3.6, and they are passing. So it should work. Is it not working for you?
When I downloaded the nightly build no python 3.6 folder was present. I downloaded one yesterday.
I'll complete the above tasks and open a PR.
Is there a plan to cut off Python interop? Why so? Isn't it good to have interop with multiple languages?
Is there a plan to cut off Python interop? Why so? Isn't it good to have interop with multiple languages?
We never plan to remove it. But a long term goal of our project is to build up enough libraries in Swift that Python interop becomes unnecessary nearly all the time. (Having a plotting library in Swift is one part of that :D).
Oh ok. I understand.
@marcrasi PR opened