StopWords are missing from deserialization of result from Indices.GetSettingsAsync
Closed this issue · 7 comments
Elastic.Clients.Elasticsearch version: 8.18.3 (upgrade from 8.17.1)
Elasticsearch version: 8.17.2
.NET runtime version: net8.0
Operating system version: Linux
Description of the problem including expected versus actual behavior:
I use custom Polish stopwords in my project, and I need to display all index settings to a managing user.
In version 8.17.1 this worked fine, although it relied on a temporary workaround (reading from BackingDictionary, which is marked as Obsolete). In version 8.18.3, the resulting JSON contains an empty object {} in place of "stopwords" instead of the array of strings ["a","aby",....]. All other properties are displayed correctly, as in the earlier version.
Steps to reproduce:
- Define some custom stopwords in index settings (simplified):
IndexSettings settings = new IndexSettings();
IndexSettingsAnalysis analysis = new IndexSettingsAnalysis();
string[] stopwords = ["a", "aby", "bo", "co", "i", "z", ];
StopWords sw = new StopWords(stopwords);
TokenFilters tokenFilters = new TokenFilters();
tokenFilters.Add("custom_polish_stop", new StopTokenFilter() {Stopwords = sw, RemoveTrailing = true} );
analysis.TokenFilters = tokenFilters;
settings.Analysis = analysis;
CreateIndexRequest createIndexRequest = new CreateIndexRequest(Indices.Index(indexName));
createIndexRequest.Settings = settings;
var creationStatus = await elasticClient.Indices.CreateAsync(createIndexRequest);
// ...
- Read these settings:
GetIndicesSettingsRequest request = new GetIndicesSettingsRequest(Indices.Index(indexName)) {};
IndexSettings settings = null;
var settingsResult = await elasticClient.Indices.GetSettingsAsync(request);
if (settingsResult.IsValidResponse)
{
settings = settingsResult.Values[indexName].Settings.Index;
}
//....
var returned = new JsonResult(new {Settings = settings});
- Compare returned json with initial values
Expected behavior
I believe this problem was introduced when the new StopWords type was added as a union (it takes either a predefined language name OR an array of strings). The deserializer apparently cannot "guess" which variant should be returned and gives up, returning an empty object. Oddly, in the debugger I see both values under the Stopwords property: Item1 = Turkish (why? I didn't set any language) and Item2 = my array of stopwords (see screenshot).
I expect the Stopwords property to contain a language name OR an array of strings, depending on what was set. Even returning both Item1 and Item2 would be OK (but then should the language be set to "custom"?).
I'm sorry for any errors in the code; if I'm doing something wrong, please explain how to get all index settings correctly.
Hi @apr-un ,
Regarding the union containing both variants (language and stopwords array): this is not an issue, since the union also contains a tag that specifies which variant we are actually dealing with. The other value must be ignored, since it is most likely invalid or, as in this case, simply the first member of the enum.
The correct way of accessing union values is described here:
#8572 (comment)
Regarding the actual issue:
Could you please show the code that you use to serialize the IndexSettings object to a JSON string? I suspect that you are not using the RequestResponseSerializer.
Hi @flobernd ,
thanks for the quick response. My actual code returns an ActionResult from a public method, as seen in the screenshot, so instead of
var returned = new JsonResult(new {Settings = settings});
I actually have
ActionResult result; // depending on the flow, this can be a JsonResult, NoContentResult or StatusCodeResult
//....
result = new JsonResult(new {Settings = settings, MappingView = testMappings});
//...
return result;
As you can see, I'm returning the entire settings object here as a JsonResult, without any special handling...
MappingView is just a List of manually crafted strings with the mappings (serializing the TypeMapping properties throws an error, but that's not important here).
I didn't know about RequestResponseSerializer - are there any examples of using it?
@apr-un All Elasticsearch types in the client library must be (de-)serialized using the RequestResponseSerializer, otherwise the behavior is completely undefined:
var json = client.ElasticsearchClientSettings.RequestResponseSerializer.SerializeToString(...);
There are more useful extension methods like SerializeToString() in the Elastic.Transport.Extensions namespace.
Besides that:
The client versions < 8.19 do not support round-trip serialization. This means that types only used in requests can only be serialized while types only used in responses can only be deserialized. Quite a few types are used in both contexts and some other types might still work - but there are no guarantees.
We always suggest using custom POCO classes when working with data from the client (serialization, storing in a database, etc.). This introduces a bit of overhead for assigning all properties of interest to the POCOs, but on the other hand it also ensures that your model does not break if the Elasticsearch classes in the client change (as happened here with Stopwords).
If you don't want to use custom POCOs:
Round-trip serialization support is available starting from 8.19.x-preview and 9.0.x.
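As a minimal sketch of the serializer requirement above, the settings could be turned into JSON with the client's own serializer and returned as raw content instead of being passed to JsonResult. The class name IndexSettingsJson, the helper GetSettingsJsonAsync and the chosen status codes are hypothetical. The RequestResponseSerializer property, SerializeToString() and the Values[indexName].Settings.Index access are taken from the snippets in this thread, and exact overloads may differ between client versions.

```csharp
using System.Threading.Tasks;
using Elastic.Clients.Elasticsearch;
using Elastic.Clients.Elasticsearch.IndexManagement;
using Elastic.Transport.Extensions; // SerializeToString() extension mentioned above
using Microsoft.AspNetCore.Mvc;

public static class IndexSettingsJson
{
    // Hypothetical helper: fetches the index settings and serializes them
    // with the client's RequestResponseSerializer instead of System.Text.Json defaults.
    public static async Task<ActionResult> GetSettingsJsonAsync(
        ElasticsearchClient client, string indexName)
    {
        var request = new GetIndicesSettingsRequest(Indices.Index(indexName));
        var response = await client.Indices.GetSettingsAsync(request);
        if (!response.IsValidResponse)
        {
            // Assumption: treat any invalid response as an upstream failure.
            return new StatusCodeResult(502);
        }

        var settings = response.Values[indexName].Settings?.Index;
        if (settings == null)
        {
            return new NotFoundResult();
        }

        // Serialize with the client's own serializer so that union types
        // such as StopWords are written correctly.
        var json = client.ElasticsearchClientSettings
            .RequestResponseSerializer
            .SerializeToString(settings);

        // Return the JSON string as-is; wrapping it in JsonResult would encode it a second time.
        return new ContentResult
        {
            Content = json,
            ContentType = "application/json",
            StatusCode = 200
        };
    }
}
```

Returning a ContentResult keeps ASP.NET Core from re-serializing the string, which is the step that produced the empty object in the first place.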
Thanks for explaining, I wasn't aware of the requirement to use RequestResponseSerializer :(
Is it mentioned somewhere in the documentation?
I tried using RequestResponseSerializer on the settings (exactly as in your example) and it correctly returns all settings with all stopwords :)
Thank you very much
PS:
I wasn't using POCOs here because I just need a quick glance at the values in the index, and mapping all the settings properties is real overhead :D
In the meantime I was able to get my stopwords using a workaround with .Match, but it looks overcomplicated:
//....
string[] stopwords = null;
ActionResult result = null; // initialized so that "return result" compiles on every path
GetIndicesSettingsRequest request = new GetIndicesSettingsRequest(Indices.Index(indexName)) {};
IndexSettings settings = null;
var settingsResult = await elasticClient.Indices.GetSettingsAsync(request);
if (settingsResult.IsValidResponse)
{
    settings = settingsResult.Values[indexName].Settings.Index;
    if (settings != null)
    {
        var sw = settings.Analysis.TokenFilters.FirstOrDefault(t => t.Key == "custom_polish_stop");
        if (sw.Value != null)
        {
            // Match takes one callback per union variant: first = language, second = real stopwords.
            // "?." covers the language variant, for which the first callback returns null.
            stopwords = ((StopTokenFilter)sw.Value).Stopwords.Match(_ => null, collection => collection)?.ToArray();
        }
        //... simplified
        result = new JsonResult(new {Settings = settings, Stopwords = stopwords});
    }
}
return result;
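The .Match workaround above could also be folded into a small POCO, along the lines of the custom-POCO suggestion earlier in the thread. This is only a sketch: StopFilterView and StopFilterMapper are hypothetical names, and it assumes that Stopwords is a nullable reference type exposing the two-callback Match shown above (language first, stopword collection second).

```csharp
using System.Linq;
using Elastic.Clients.Elasticsearch.Analysis;

// Hypothetical view model: holds only the values the UI needs,
// so it can be serialized safely with any JSON serializer.
public sealed record StopFilterView(string Name, string? Language, string[]? Stopwords);

public static class StopFilterMapper
{
    // Copies the active union variant of Stopwords into plain properties.
    public static StopFilterView ToView(string name, StopTokenFilter filter)
    {
        var sw = filter.Stopwords; // assumption: null when no stopwords were configured

        return new StopFilterView(
            name,
            // First variant: predefined language. ToString() yields the enum member
            // name, which is not necessarily the wire value used by Elasticsearch.
            sw?.Match<string?>(language => language.ToString(), _ => null),
            // Second variant: explicit list of stopwords.
            sw?.Match<string[]?>(_ => null, words => words.ToArray()));
    }
}
```

With this, a call such as StopFilterMapper.ToView("custom_polish_stop", (StopTokenFilter)sw.Value) yields an object that JsonResult can serialize without involving the client's types.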
@flobernd I found a mention of this requirement in the documentation (RequestResponseSerializer must be used for serialization/deserialization), but only for version 9.0.0 of the Elastic client:
https://www.elastic.co/docs/release-notes/elasticsearch/clients/dotnet - point 10. Serialization
Please consider updating the docs for 8.18.x to state this requirement under the Breaking Changes section.
Thanks & have a nice weekend :)
@apr-un Happy to clarify this in the docs, but it's not a breaking change, since the behavior has been the same since NEST (v7).
Going to add this on Monday. Have a nice weekend as well!
