microsoft/kernel-memory

[Bug] 22021: invalid byte sequence for encoding "UTF8": 0x00

pyliakm opened this issue · 3 comments

Context / Scenario

I am using PostgreSQL + pgvector, and I got an exception when saving the result to the database.

var kernelMemory = new KernelMemoryBuilder()
    .WithPostgresMemoryDb(new PostgresConfig() { ConnectionString = _supabaseConfig.ConnectionString })
    .WithOpenAIDefaults(_encryptionModelService.Decrypt(organization!.OpenAIAccessToken))
    .WithContentDecoder<CustomImageDecoder>()
    .Build<MemoryServerless>();

What happened?

I expect the import to work for any PDF document. This is the PDF that fails:
SynergyOS Design Guide.pdf

Importance

I cannot use Kernel Memory

Platform, Language, Versions

Microsoft.KernelMemory.MemoryDb.Postgres v0.62.240604.1

Relevant log output

Microsoft.KernelMemory.Postgres.PostgresException: 22021: invalid byte sequence for encoding "UTF8": 0x00
 ---> Npgsql.PostgresException (0x80004005): 22021: invalid byte sequence for encoding "UTF8": 0x00
   at Npgsql.Internal.NpgsqlConnector.ReadMessageLong(Boolean async, DataRowLoadingMode dataRowLoadingMode, Boolean readingNotifications, Boolean isReadingPrependedMessage)
   at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlDataReader.NextResult(Boolean async, Boolean isConsuming, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteReader(Boolean async, CommandBehavior behavior, CancellationToken cancellationToken)
   at Npgsql.NpgsqlCommand.ExecuteNonQuery(Boolean async, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
  Exception data:
    Severity: ERROR
    SqlState: 22021
    MessageText: invalid byte sequence for encoding "UTF8": 0x00
    Where: unnamed portal parameter $4
    File: mbutils.c
    Line: 1665
    Routine: report_invalid_encoding
   --- End of inner exception stack trace ---
   at Microsoft.KernelMemory.Postgres.PostgresDbClient.UpsertAsync(String tableName, PostgresMemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Postgres.PostgresMemory.UpsertAsync(String index, MemoryRecord record, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.SaveRecordAsync(DataPipeline pipeline, IMemoryDb db, MemoryRecord record, HashSet`1 createdIndexes, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Handlers.SaveRecordsHandler.InvokeAsync(DataPipeline pipeline, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Pipeline.InProcessPipelineOrchestrator.RunPipelineAsync(DataPipeline pipeline, CancellationToken cancellationToken)
   at Microsoft.KernelMemory.Pipeline.BaseOrchestrator.ImportDocumentAsync(String index, DocumentUploadRequest uploadRequest, CancellationToken cancellationToken)

dluc commented

@pyliakm could you provide a PDF that reproduces this error?

pyliakm commented

> @pyliakm could you provide a PDF that reproduces this error?

I have added it to the issue description.

dluc commented

Thanks for the report! Bug fixed.
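
The thread does not show how the bug was fixed. For anyone hitting the same error on an affected version, here is a minimal workaround sketch. It assumes the 0x00 byte comes from text extracted from the PDF, and relies on the fact that PostgreSQL text values cannot contain the NUL character. `TextSanitizer` and `RemoveNullChars` are hypothetical names, not part of Kernel Memory; where to call the helper (for example, inside a custom decoder before the text is returned) is also an assumption.

```csharp
using System;

// Hypothetical helper, not part of Kernel Memory.
public static class TextSanitizer
{
    // Removes U+0000 characters, which PostgreSQL rejects in text values
    // with error 22021 (invalid byte sequence for encoding "UTF8": 0x00).
    public static string RemoveNullChars(string? text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return string.Empty;
        }

        // String.Replace(string, string) performs an ordinal comparison,
        // so only literal NUL characters are removed.
        return text.Replace("\0", string.Empty);
    }
}
```

For example, `TextSanitizer.RemoveNullChars("Synergy\0OS")` returns `"SynergyOS"`. Stripping the character rather than replacing it with a space keeps words intact, but either choice may suit different documents.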