Slice visitor's `visitString` method doesn't handle sentinel-terminated slices
Closed this issue · 3 comments
Description
Deserializing into a sentinel-terminated slice from a string does not work (a compiler error is raised).
The problem is because the slice visitor's visitString
method always returns a non-terminated u8
slice (e.g., []u8
or []const u8
, I forget). Those slices can't cast into a a sentinel-terminated one, hence the error.
How to Reproduce the Bug
const std = @import("std");
const json = @import("json");
pub fn main() !void {
_ = try json.fromSlice(std.heap.page_allocator, [:0]u8, "\"foobar\"");
}
Additional Context
No response
So the case is similar to https://github.com/ziglang/zig/blob/master/lib/std/json.zig#L1689-L1693 however getty splits that code into 2 domains:
Deserializer.deserializeString()
which PRE-allocates memory, doesn't know (or does it?) the type of slice, and callsvisitor.visitString()
visitor.visitString()
which knows the type (Value
const field) and takes pre-allocated memory
The issue is that the 2nd would lack 1 byte for sentinel when a sentinel slice is returned conditionally.
I have not found a way to get info about sentinel in the 1st layer. If it was easily possible then it could conditionally call either allocator.dupe()
or allocator.dupeZ()
.
One solution might be to delegate duplicataction and/or unescaping string inside of the Visitor instead of its caller.
Okay, I've decided to try shifting the allocation responsibility to the visitor. That would:
- Let us fix this problem since the visitor would be able to tell whether the final type is sentinel-terminated and thus allocate appropriately.
- Make things a bit simpler for deserializer methods (I think). They wouldn't need to allocate anything. But if they do (e.g., intermediate values), just always deallocate it before returning.
Now, Getty actually worked like that before. But I switched from it for reasons I can't seem to remember... I think it had to do with deserializing aggregate types (e.g., sequences, maps). However, I can't think of any reasons why this shouldn't work now though, so I'll give it a try and see what happens.
edit: I remember one reason: performance. The visitor has to copy the string to its allocated buffer, even though the string might already be allocated (say, for strings that needs escaping). Moving escaping into the visitor wouldn't work though, since they're supposed to be data format-agnostic and data formats each have their own escaping rules.
edit: I remember one reason: performance.
We can avoid this pretty easily with a deserializer-defined DB, so now I really can't think of any reason to not use this new allocation model. I really do need to document it somewhere though...