[Bug] 22021: invalid byte sequence for encoding "UTF8": 0x00
pyliakm opened this issue · 3 comments
pyliakm commented
Context / Scenario
I am using PostgreSQL with pgvector, and I get an exception when the import pipeline saves the memory records to the database. Kernel Memory is configured as follows:
var kernelMemory = new KernelMemoryBuilder()
    .WithPostgresMemoryDb(new PostgresConfig() { ConnectionString = _supabaseConfig.ConnectionString })
    .WithOpenAIDefaults(_encryptionModelService.Decrypt(organization!.OpenAIAccessToken))
    .WithContentDecoder<CustomImageDecoder>()
    .Build<MemoryServerless>();
What happened?
I expect it to work with any PDF document, but the following PDF fails:
SynergyOS Design Guide.pdf
Importance
I cannot use Kernel Memory
Platform, Language, Versions
Microsoft.KernelMemory.MemoryDb.Postgres v0.62.240604.1
Relevant log output
Microsoft.KernelMemory.Postgres.PostgresException: 22021: invalid byte sequence for encoding "UTF8": 0x00
---> Npgsql.PostgresException (0x80004005): 22021: invalid byte sequence for encoding "UTF8": 0x00
at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
Exception data:
Severity: ERROR
SqlState: 22021
MessageText: invalid byte sequence for encoding "UTF8": 0x00
Where: unnamed portal parameter $4
File: mbutils.c
Line: 1665
Routine: report_invalid_encoding
--- End of inner exception stack trace ---
at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Postgres.PostgresMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveRecordAsync(DataPipeline pipeline, IMemoryDb db, MemoryRecord record, HashSet`1 createdIndexes, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)
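For context: PostgreSQL cannot store the NUL character (U+0000) in text values, so any string parameter containing it is rejected with SQLSTATE 22021, which is what the log shows for parameter $4. A possible workaround is to strip NUL characters from the extracted text before it reaches the memory DB. The sketch below assumes the 0x00 comes from text extracted from the PDF; `StripNulChars` is a hypothetical helper, not part of Kernel Memory, and this is not necessarily how the upstream fix works.

```csharp
using System;

// Hypothetical helper, not part of Kernel Memory: removes every NUL (U+0000)
// character, which PostgreSQL rejects in text values (SQLSTATE 22021).
// string.Replace(string, string) performs an ordinal search, so "\0" is matched reliably.
static string StripNulChars(string text) => text.Replace("\0", string.Empty);

// Example: text as it might come out of a PDF extractor, with embedded NULs.
string extracted = "SynergyOS\0 Design\0 Guide";
string cleaned = StripNulChars(extracted);

Console.WriteLine(cleaned);                // prints "SynergyOS Design Guide"
Console.WriteLine(cleaned.Contains('\0')); // prints "False"
```

Where to call it depends on the pipeline; with a custom decoder such as `CustomImageDecoder` above, applying it to the text the decoder produces is one option (an assumption, not verified against the Kernel Memory decoder contract).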
pyliakm commented
> @pyliakm could you provide a PDF that allows to reproduce this error?
I have added it to the issue description.
dluc commented
Thanks for the report! Bug fixed