Attempt to get element -1 from stack of size -1
olegrok opened this issue · 8 comments
I have a problem that happens only under load.
The problem occurs only on macOS (GC64 seems to be enabled there).
Avro-schema version: 3.0.3
Tarantool version: 2.2.1
Also, we use only the "validate" method (no flatten, unflatten, etc.).
The problem appears when I load a large amount of data into Tarantool (more than 2 GB).
[string "avro.utils.fstack"]:30: Attempt to get element -1 from stack of size -1
stack traceback:
[string "avro.utils.fstack"]:30: in function 'get'
...ects/tdg/.rocks/share/tarantool/avro_schema/frontend.lua:946: in function 'copy_data_eh'
...ects/tdg/.rocks/share/tarantool/avro_schema/frontend.lua:965: in function 'validate'
my object:
{
"id": 1,
"value": 1,
"body": "tdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdg"
}
Schema:
{
    "type": "record",
    "name": "TestObject",
    "logicalType": "Aggregate",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "value", "type": "int"},
        {"name": "body", "type": "string*"}
    ]
}
The problem does not appear if jit.off() is called.
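Since a global jit.off() hides the problem, one way to narrow it down might be to disable JIT compilation for individual functions only. The sketch below assumes a LuaJIT-based runtime such as Tarantool; `suspect` is a hypothetical stand-in for whichever function is under suspicion, not a function from avro-schema. Note that the second argument of jit.off covers functions defined lexically inside the target, not the functions it calls.

```lua
-- Minimal sketch, assuming LuaJIT (the `jit` library is built in).
local jit = require('jit')

-- Hypothetical stand-in for the function under suspicion.
local function suspect(x)
    local sum = 0
    for i = 1, 100 do
        sum = sum + x * i
    end
    return sum
end

-- Disable JIT compilation for `suspect` and for every function
-- defined lexically inside it. Functions that `suspect` merely
-- calls are NOT affected and need their own jit.off() call.
jit.off(suspect, true)

-- The function still runs, now always through the interpreter,
-- while the rest of the program stays eligible for compilation.
local result = suspect(2)
```

To switch off a whole module at once, the usual idiom is to put `jit.off(true, true)` at the top of that module's file, which applies to the file's main chunk and everything defined in it.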
Can you share a reproducer?
It happens inside my application. The avro-schema code that I extracted does not reproduce it.
I think JIT traces are broken inside my application, and the root of the problem could be somewhere else.
I propose to work on a reproducer (at least via avro-schema, ideally reduced to plain Lua code) for a fixed amount of time (say, two working days) and:
- If it succeeds, file an issue against tarantool/tarantool or tarantool/luajit regarding GC64 / macOS.
- If it fails, close this issue (or what else can we do?).
The logic looks like this:
local fiber = require('fiber')
local json = require('json')
local avro_schema = require('avro_schema')

-- Union schema with a single record branch.
local json_schema = [[
[
    {
        "type": "record",
        "name": "TestObject",
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "value", "type": "int"},
            {"name": "body", "type": "string*"}
        ]
    }
]
]]

local schema = json.decode(json_schema)
local ok, handle = avro_schema.create(schema)
assert(ok, handle)

local object = {
    TestObject = {
        id = 1,
        value = 1,
        body = "tdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdgtdg",
    }
}

local function validate_object(obj)
    local ok, err = avro_schema.validate(handle, obj)
    assert(ok, err)
end

-- Sanity check before starting the load.
validate_object(object)

box.cfg{memtx_memory = 4 * 2^30}
local space = box.schema.space.create('test_space', {if_not_exists = true})
space:create_index('pk', {if_not_exists = true})

local function insert_object(obj)
    space:replace({obj.id, obj.value, obj.body})
end

-- Each worker fiber validates and inserts objects with its own
-- sequence of ids: i, i + worker_count, i + 2 * worker_count, ...
local worker_count = 1e3
for i = 1, worker_count do
    local obj = table.deepcopy(object)
    obj['TestObject']['id'] = i
    fiber.new(function()
        while true do
            validate_object(obj)
            insert_object(obj['TestObject'])
            obj['TestObject']['id'] = obj['TestObject']['id'] + worker_count
        end
    end)
end
But it does not seem to reproduce the problem.
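When a reduced script behaves differently from the full application, it may help to compare how the JIT compiler treats each of them. The sketch below counts trace events through jit.attach, the hook that LuaJIT's bundled jit.v and jit.dump modules are built on. It assumes a LuaJIT-based runtime; the `stats` table, the `on_trace` callback, and the arithmetic loop standing in for the real workload are all illustrative names, not part of the reproducer above.

```lua
-- Minimal sketch, assuming LuaJIT (the `jit` library is built in).
local jit = require('jit')

-- Counters for the trace events reported by the JIT compiler.
local stats = {start = 0, stop = 0, abort = 0, flush = 0}

-- The first callback argument is the event kind:
-- "start", "stop", "abort" or "flush".
local function on_trace(what)
    stats[what] = (stats[what] or 0) + 1
end

-- Attach the callback to trace events.
jit.attach(on_trace, 'trace')

-- Placeholder workload; in practice this would be the code under test.
local acc = 0
for i = 1, 1e5 do
    acc = acc + i % 7
end

-- Calling jit.attach without an event name detaches the callback.
jit.attach(on_trace)

print(('traces: started=%d stopped=%d aborted=%d flushed=%d'):format(
    stats.start, stats.stop, stats.abort, stats.flush))
```

If the counters differ noticeably between the reduced script and the application under load, that difference is a hint about which code paths still need to be carried over into the reproducer.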
Can you share a reproducer based on your application (it is okay to do so privately; preferably via an issue in the application repository)?
I haven't faced this issue for a long time. As far as I remember, it was a perf test for TDG1. I'm not sure I'll be able to reproduce it again, but I should probably try.
Also, feel free to close this issue. I'm not sure it's an avro-schema issue; it looks like a LuaJIT bug. The worst part is that I don't have an isolated test case.
Even with a non-isolated test case, we can at least make a guess about its similarity to other known problems and try to bisect on tarantool and/or luajit commits.