Field limit of 131072 characters?
I am getting `_csv.Error: field larger than field limit (131072)` when validating a CLDF dataset.

Sure, I am storing large strings (whole book chapters in mixed Markdown/HTML) in single CSV cells. If a ChapterTable in a CLDF dataset is not the proper way to store such data, what is?

If that is the proper way and this error is not supposed to happen, should I just raise the limit myself, like this?

```python
import csv
import sys

csv.field_size_limit(sys.maxsize)
```
The `cldf` command has an option to enlarge the field size limit, see:

```
$ cldf -h
usage: cldf [-h] [--log-level LOG_LEVEL] [-z FIELD_SIZE_LIMIT] COMMAND ...

optional arguments:
  -h, --help            show this help message and exit
  --log-level LOG_LEVEL
                        log level [ERROR|WARN|INFO|DEBUG] (default: 20)
  -z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
                        Maximum length of a single field in any input CSV
                        file. (default: 131072)
```
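So something like the following should work (the metadata path is just a placeholder):

```sh
$ cldf -z 10000000 validate path/to/cldf-metadata.json
```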
I thought about making a larger value the default, but that would have meant
- overriding Python's default, and
- potentially removing a check that is often useful, e.g. for catching use of the wrong cell delimiter when reading (see the sketch below).
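To illustrate that second point: with the wrong delimiter, an entire row is parsed as a single field, so the default limit flags the mistake almost immediately. A minimal sketch (file content is made up):

```python
import csv
import io

# A semicolon-delimited row read with the default comma delimiter:
# the whole line becomes one (potentially huge) field.
data = io.StringIO("id;form;gloss\n1;" + "x" * 200000 + ";tree\n")
reader = csv.reader(data)  # default delimiter=','
try:
    for row in reader:
        pass
except csv.Error as e:
    print(e)  # field larger than field limit (131072)
```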
If you call `Dataset.validate` from Python code, then yes, explicitly setting the field size limit via `csv.field_size_limit` would be the recommended way.
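A minimal sketch of that approach, assuming a pycldf `Dataset` loaded from a metadata file (the path is a placeholder):

```python
import csv
import sys

from pycldf import Dataset

# Raise the stdlib csv module's per-field limit before any CSV is read.
csv.field_size_limit(sys.maxsize)

ds = Dataset.from_metadata("path/to/cldf-metadata.json")  # placeholder path
ds.validate()
```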
Ah, I didn't even check there. Thanks!