Some uploads failing due to character set
Closed this issue · 3 comments
jorvis commented
Example dataset title: Stress-induced β cell early senescence .....
On upload the error is:
Exception: Failed to insert metadata: 1366 (HY000): Incorrect string value: '\xCE\xB2 cel...' for column 'title' at row 1
To support this, need to change the encoding to utf8mb4
jorvis commented
Here are the commands needed to patch an existing DB. @adkinsrs , you'll need to do this for your local devel instances.
ALTER TABLE dataset DROP INDEX text_idx;
ALTER TABLE dataset DROP INDEX text_with_geo_idx;
ALTER TABLE dataset DROP INDEX text_with_geo_pubmed_idx;
ALTER TABLE dataset MODIFY COLUMN title VARCHAR(255)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL;
ALTER TABLE dataset MODIFY COLUMN ldesc TEXT
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE dataset MODIFY COLUMN geo_id VARCHAR(50)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE dataset MODIFY COLUMN pubmed_id VARCHAR(20)
CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE dataset ADD FULLTEXT INDEX text_idx (title, ldesc);
ALTER TABLE dataset ADD FULLTEXT INDEX text_with_geo_idx (title, ldesc, geo_id);
ALTER TABLE dataset ADD FULLTEXT INDEX text_with_geo_pubmed_idx (title, ldesc, geo_id, pubmed_id);
jorvis commented
This has been deployed on NeMO.
adkinsrs commented
Will do. Will need to check the collation for tables I added for nemoarchive importing which required collation as well (to reference with other tables)
…------------------------------------------------------
Ricky "Shaun" Adkins
Bioinformatics Software Engineer II
Institute for Genome Sciences
University of Maryland, School of Medicine
Phone (410) 706-6766
________________________________
From: Joshua Orvis ***@***.***>
Sent: Friday, December 1, 2023 12:11 PM
To: IGS/gEAR ***@***.***>
Cc: Adkins, Ricky (Shaun) ***@***.***>; Mention ***@***.***>
Subject: Re: [IGS/gEAR] Some uploads failing due to character set (Issue #633)
Closed #633<#633> as completed.
—
Reply to this email directly, view it on GitHub<#633 (comment)>, or unsubscribe<https://github.com/notifications/unsubscribe-auth/ABLHI6VWHRDKLP2O642OC2TYHIFS3AVCNFSM6AAAAABADBYLM2VHI2DSMVQWIX3LMV45UABCJFZXG5LFIV3GK3TUJZXXI2LGNFRWC5DJN5XDWMJRGEZDINRRGMYDCMQ>.
You are receiving this because you were mentioned.Message ID: ***@***.***>