man-group/ArcticDB

Frequent log messages to run `list_symbols` on some of my libraries

Closed this issue · 10 comments

Describe the bug

How can I get rid of this log messaging?

See here for more information: https://docs.arcticdb.io/technical/on_disk_storage/#symbol-list-caching

To resolve, run `list_symbols` through to completion frequently.

Note: This warning will only appear once.

20240920 15:34:01.207517 3812 E arcticdb.root | E_ASSERTION_FAILURE Cannot write string symbol name, existing symbols are numeric

20240920 15:34:01.515011 3812 W arcticdb.symbol | Ignoring error while trying to compact the symbol list: E_ASSERTION_FAILURE Cannot write string symbol name, existing symbols are numeric

I already did what the message suggests, running:

symbol_list = library.list_symbols()

on every read and write operation.

Also, this log message did not appear just once; it appears on every read or write operation for me.

# My current code executes this on every read and write op
# Is this why the log appears multiple times?
ac = self.__init_db()
...
ac.get_library(lib, create_if_missing=False)

Is there some way to suppress this log message?

Steps/Code to Reproduce

See here for more information: https://docs.arcticdb.io/technical/on_disk_storage/#symbol-list-caching

To resolve, run `list_symbols` through to completion frequently.

Note: This warning will only appear once.

20240920 15:34:01.207517 3812 E arcticdb.root | E_ASSERTION_FAILURE Cannot write string symbol name, existing symbols are numeric

20240920 15:34:01.515011 3812 W arcticdb.symbol | Ignoring error while trying to compact the symbol list: E_ASSERTION_FAILURE Cannot write string symbol name, existing symbols are numeric

Expected Results

Unnecessary log messaging

OS, Python Version and ArcticDB Version

Python: 3.11.0 | packaged by conda-forge | (main, Oct 25 2022, 06:12:32) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22621-SP0
ArcticDB: 4.5.0

Backend storage used

AWS S3

Additional Context

No response

Thanks for reporting this. The issue is, I think, explained by the E_ASSERTION_FAILURE Cannot write string symbol name, existing symbols are numeric.

We have an under-documented feature here, which is that you can use integers as symbol names, e.g.,

lib.write(9, df)
lib.read(9).data

However, you can't currently mix these with string symbol names as you'll get this issue.

If that is the problem here, then the work-around is to

  • create your symbols with all string names, lib.write(str(symbol), df).
  • delete any symbols with integer names.
  • I would also call lib.reload_symbol_list() to make sure that you have a consistent symbol-list
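
The three steps above could be sketched roughly as follows. This is a minimal sketch rather than a tested migration: `rename_int_symbols` is a hypothetical helper, the integer names are passed in by the caller rather than discovered, and `lib` is assumed to behave like an ArcticDB library exposing `read`, `write`, `delete` and `reload_symbol_list`. Whether the integer `9` and the string `"9"` are kept as distinct symbols in storage is not confirmed here, so try it on a scratch library first.

```python
from typing import Any, Iterable, List


def rename_int_symbols(lib: Any, int_symbols: Iterable[Any]) -> List[str]:
    """Re-write integer-named symbols under string names, then drop the originals.

    Hypothetical helper: `lib` is assumed to behave like an ArcticDB library,
    and `int_symbols` must be supplied by the caller.
    """
    renamed: List[str] = []
    for symbol in int_symbols:
        if isinstance(symbol, str):
            continue  # already a string name, nothing to do
        data = lib.read(symbol).data  # latest version only
        new_name = str(symbol)
        lib.write(new_name, data)     # step 1: string-named copy
        lib.delete(symbol)            # step 2: remove the integer-named symbol
        renamed.append(new_name)
    lib.reload_symbol_list()          # step 3: rebuild a consistent symbol list
    return renamed
```

Only the latest version of each symbol is carried over; older versions and any metadata are not copied by this sketch.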

In terms of fixing this issue, I think there are two options.

Either

  • deprecate integer symbol name support and,
  • change symbol-list compaction so this error is thrown (not just a log message),

or,

  • fix symbol-list compaction to work for both ints and strings.

Thank you @jamesmunro, I will try out your suggested work-around

I've tested the issue a bit more widely. With the lmdb backend, list_symbols throws, so it doesn't support any int-named symbols at all.

import numpy as np
import arcticdb as adb
lib = adb.Arctic('lmdb://test1').create_library('test')
lib.write(9, np.array([1,2,3]))
lib.list_symbols()
InternalException                         Traceback (most recent call last)
<ipython-input-19-c96ae63b6644> in <cell line: 5>()
      3 lib = adb.Arctic('lmdb://test1').create_library('test')
      4 lib.write(9, np.array([1,2,3]))
----> 5 lib.list_symbols()

1 frames
/usr/local/lib/python3.10/dist-packages/arcticdb/version_store/library.py in list_symbols(self, snapshot_name, regex)
   1470             Symbols in the library.
   1471         """
-> 1472         return self._nvs.list_symbols(snapshot=snapshot_name, regex=regex)
   1473 
   1474     def has_symbol(self, symbol: str, as_of: Optional[AsOf] = None) -> bool:

/usr/local/lib/python3.10/dist-packages/arcticdb/version_store/_store.py in list_symbols(self, all_symbols, snapshot, regex, prefix, use_symbol_list)
   2137                 log.warning("Cannot use symbol list with all_symbols=True as it only stores undeleted symbols")
   2138             use_symbol_list = False
-> 2139         return list(self.version_store.list_streams(snapshot, regex, prefix, use_symbol_list, all_symbols))
   2140 
   2141     def compact_symbol_list(self) -> int:

InternalException: std::bad_variant_access(std::get: wrong index for variant)

lib
Library(Arctic(config=S3(endpoint=s3_name_endpoint_name, bucket=my_bucket_name)), path=equities, storage=s3_storage)

Calling this:
lib.reload_symbol_list()
gave me:
arcticdb_ext.exceptions.InternalException: E_ASSERTION_FAILURE Read invalid serialized key

In my AWS S3 Console I went through all my symbols and could not find any int or float symbol names. However, I have many names like "sp500_index", "sp500_index_monthly", "euribor_1_month", "fed_funds_6_month_cont", "u.s._midwest_domestic_hot-rolled_coil_steel_commodity_future_cont_month_1".

They are all strings, though I did not explicitly call lib.write(str(symbol), df).
I assumed that since the names are strings before the write operation, they would also be stored as strings in ArcticDB after the write operation.

Could the u.s. in "u.s._midwest_domestic_hot-rolled_coil_steel_commodity_future_cont_month_1" have caused the issue?

Hi @philsv. That error suggests a different issue.
arcticdb_ext.exceptions.InternalException: E_ASSERTION_FAILURE Read invalid serialized key

It is not recognizing an object in S3 as an ArcticDB object. The key is the first part of the object name. Have you separately written objects to the S3 bucket? That would explain an unrecognised object.

As a work-around, I think at this point it's easiest to remake the library: call create_library, then copy over the symbols. For the latest versions that would be: lib.write(symbol, lib.read(symbol).data).
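
A copy loop along those lines might look like the sketch below. `copy_latest_versions` is a hypothetical helper, and both arguments are only assumed to behave like ArcticDB libraries with `read` and `write`. It records symbols that fail to read instead of stopping, so one damaged symbol does not block the rest.

```python
from typing import Any, Dict, Iterable, List, Tuple


def copy_latest_versions(
    src_lib: Any, dst_lib: Any, symbols: Iterable[str]
) -> Tuple[List[str], Dict[str, str]]:
    """Copy the latest version of each symbol from src_lib to dst_lib.

    Hypothetical helper. Returns the symbols that were copied and a
    {symbol: error message} mapping for the ones that failed.
    """
    copied: List[str] = []
    failed: Dict[str, str] = {}
    for symbol in symbols:
        try:
            dst_lib.write(symbol, src_lib.read(symbol).data)
        except Exception as exc:  # broad on purpose: record the failure and carry on
            failed[symbol] = f"{type(exc).__name__}: {exc}"
        else:
            copied.append(symbol)
    return copied, failed
```

This copies only the data of the latest version; earlier versions and metadata are left behind in the old library.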

If you would like us to help you more with this issue then can you please send us a list of objects in the library?
e.g.

Find the full storage name of the library:

aws s3 ls 's3://<BUCKET>/<LIBRARY>'

then take the full library path printed and list all the items under /vref/, e.g.

aws s3 ls 's3://<BUCKET>/<LIBRARY>1727270002981265920/vref/'

You can send the results to arcticdb@man.com.

RE: lib.write(str(symbol), df)

If isinstance(symbol, str) is true then there is no need for str(). It is only needed if isinstance(symbol, int).
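
As a small illustration of that rule, a guard like the one below could be applied to names before writing. `as_symbol_name` is a hypothetical helper, not part of ArcticDB, and rejecting other types is a choice made for this sketch.

```python
def as_symbol_name(symbol: object) -> str:
    """Return a string symbol name; convert ints, reject everything else."""
    if isinstance(symbol, str):
        return symbol        # already a string: no conversion needed
    if isinstance(symbol, int) and not isinstance(symbol, bool):
        return str(symbol)   # integer name: convert before writing
    raise TypeError(f"Unsupported symbol name type: {type(symbol).__name__}")
```

It would be used as lib.write(as_symbol_name(symbol), df). NumPy integer types are not instances of int, so they would need converting first.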

@jamesmunro I think what caused the error arcticdb_ext.exceptions.InternalException: E_ASSERTION_FAILURE Read invalid serialized key on my side was using lib.reload_symbol_list().

Just a moment ago I was recreating the library and this error popped up:

Traceback (most recent call last):
  File "c:\Users\user\anaconda3\envs\py11\Lib\site-packages\arcticdb\version_store\library.py", line 1070, in read
    return self._nvs.read(
           ^^^^^^^^^^^^^^^
  File "c:\Users\user\anaconda3\envs\py11\Lib\site-packages\arcticdb\version_store\_store.py", line 1725, in read
    read_result = self._read_dataframe(symbol, version_query, read_query, read_options)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\user\anaconda3\envs\py11\Lib\site-packages\arcticdb\version_store\_store.py", line 1799, in _read_dataframe
    return ReadResult(*self.version_store.read_dataframe_version(symbol, version_query, read_query, read_options))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
arcticdb_ext.storage.NoDataFoundException: When trying to read version 0 of symbol `usd_cash_crude_palm_oil_electronic_commodity_future_cont_1_cme`, failed to read key i:usd_cash_crude_palm_oil_electronic_commodity_future_cont_1_cme:0:0xde40806327158334@1702432447748890800[1274659200000000000,1366588800000000001]: Not found: Composite: i:usd_cash_crude_palm_oil_electronic_commodity_future_cont_1_cme:0:0xde40806327158334@1702432447748890800[1274659200000000000,1366588800000000001], 

In terminal:

20240925 16:43:33.846556 13032 W arcticdb.storage | Failed to find segment for key 'i:usd_cash_crude_palm_oil_electronic_commodity_future_cont_1_cme:0:0xde40806327158334@1702432447748890800[1274659200000000000,1366588800000000001]' : No response body.

I think that might have been the issue.
Fortunately this is just an old dataset we are not currently using, but I can't really tell what exactly was wrong with it.

For all the other symbols in the library, no issues.

Have you separately written objects to the S3 bucket?

If you mean loading the data in batches, no, I did not. But I update the datasets very frequently (weekly, daily).

Not found: Composite: i: is saying that you're missing an index object. That shouldn't be possible normally, as it's written by ArcticDB before the object that refers to it. This would suggest that it's been removed (and not by ArcticDB), or there is a bug here.

These errors you're getting all point to either missing or malformed objects in the S3 bucket. I think we would need to understand your environment and setup better to get to the bottom of this.

  • You're using AWS S3?
  • Are you using any kind of proxy on top of S3?
  • Are you reading and writing to the same bucket? - i.e. are you using replicated buckets?
  • Are you able to provide a script that replicates the errors?

You can also get a detailed log with ARCTICDB_AWS_LogLevel_int=6, see: https://docs.arcticdb.io/latest/runtime_config/#logging-configuration. I wouldn't post the result of that here as it may contain information you don't want to share.
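
From Python, that setting could be applied as in the sketch below. `enable_verbose_aws_logging` is a hypothetical helper, and the sketch assumes the variable has to be in the environment before arcticdb is first imported; check the linked page if it does not take effect.

```python
import os


def enable_verbose_aws_logging(level: int = 6) -> None:
    """Set the ArcticDB AWS log level through the environment."""
    os.environ["ARCTICDB_AWS_LogLevel_int"] = str(level)


enable_verbose_aws_logging()
# Import arcticdb only after this point (assumption: the value is read at load time).
```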

Closing as no reply