Column filter doesn't work
beauttie opened this issue · 8 comments
Expected Behavior
If I search for a column by the exact name or a string with a wildcard, I would expect to see search results listing tables containing that column.
Current Behavior
Currently, no search results are returned when applying the column filter as shown in the screenshot below.
I also see this same error message when searching for a column by the exact name.
Possible Solution
I was able to resolve this bug by changing the value of the column field in this line to column_names.keyword, as shown in the screenshot below.
Steps to Reproduce
I ran the search, metadata, and frontend services locally on my computer. I used this AwsSearchConfig module for the search, a DevConfig module that connects to a Neo4j proxy client for the metadata, and this LocalConfig module for the frontend.
Context
I couldn't deploy a version of the frontend with the column filter as it would incorrectly return no search results. As instructed in these docs, I ran the configured SearchMetadatatoElasticsearchTask and confirmed that the new table index has the new mappings. If I get the new index via the Kibana console, what I do notice is that there is a column_names property in addition to the columns property.
"mappings" : {
  "_meta" : {
    "version" : 2
  },
  "properties" : {
    ...,
    "column_names" : {
      "type" : "text",
      "fields" : {
        "keyword" : {
          "type" : "keyword",
          "ignore_above" : 256
        }
      }
    },
    "columns" : {
      "type" : "text",
      "fields" : {
        "general" : {
          "type" : "text",
          "term_vector" : "with_positions_offsets",
          "analyzer" : "general_analyzer"
        },
        "keyword" : {
          "type" : "keyword"
        },
        "ngram" : {
          "type" : "text",
          "term_vector" : "with_positions_offsets",
          "analyzer" : "ngram_analyzer_table_columns"
        }
      },
      "term_vector" : "with_positions_offsets",
      "analyzer" : "stemming_analyzer"
    },
    ...
  }
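As a rough aid for checking an index for this condition, the sketch below inspects the "properties" section of a mapping and reports which of the two column-related fields are present, together with their subfields. The helper name find_column_fields is hypothetical and not part of Amundsen, and the sample dictionary is a trimmed copy of the mapping above (analyzer settings omitted).

```python
from typing import Dict, List


def find_column_fields(properties: Dict[str, dict]) -> Dict[str, List[str]]:
    """Return the column-related fields found in a mapping, with their subfield names."""
    found: Dict[str, List[str]] = {}
    for name in ("columns", "column_names"):
        if name in properties:
            # "fields" holds the multi-field definitions (e.g. keyword, ngram).
            found[name] = sorted(properties[name].get("fields", {}))
    return found


# Trimmed copy of the "properties" section shown above (assumption: analyzers omitted).
properties = {
    "column_names": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "columns": {
        "type": "text",
        "fields": {
            "general": {"type": "text"},
            "keyword": {"type": "keyword"},
            "ngram": {"type": "text"},
        },
    },
}

result = find_column_fields(properties)
print(result)
```

If both keys appear in the result, the index contains both a columns and a column_names property.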
I also called the Elasticsearch REST APIs to confirm what I was seeing above, based on the request body parameters logged out by es_proxy_v2_1. For example, when I submit this GET request, I get no hits:
GET table_search_index_v2_1/_search
{"query": {"bool": {"filter": [{"bool": {"should": [{"wildcard": {"columns.keyword": "*key"}}], "minimum_should_match": 1}}]}}, "from": 0, "size": 10, "highlight": {"fields": {"name": {"type": "fvh", "number_of_fragments": 0}, "description": {"type": "fvh", "number_of_fragments": 0}, "columns.general": {"type": "fvh", "number_of_fragments": 10, "order": "score"}, "column_descriptions": {"type": "fvh", "number_of_fragments": 5, "order": "score"}}}}
whereas this GET request (identical to the one above except for the field name under the wildcard key) returns hits:
GET table_search_index_v2_1/_search
{"query": {"bool": {"filter": [{"bool": {"should": [{"wildcard": {"column_names.keyword": "*key"}}], "minimum_should_match": 1}}]}}, "from": 0, "size": 10, "highlight": {"fields": {"name": {"type": "fvh", "number_of_fragments": 0}, "description": {"type": "fvh", "number_of_fragments": 0}, "columns.general": {"type": "fvh", "number_of_fragments": 10, "order": "score"}, "column_descriptions": {"type": "fvh", "number_of_fragments": 5, "order": "score"}}}}
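The two request bodies differ only in the field passed to the wildcard clause. A minimal sketch of building both bodies from one function makes that single difference explicit. The function name build_column_filter_query is hypothetical, and the highlight section of the original requests is left out for brevity.

```python
import json


def build_column_filter_query(field: str, pattern: str, size: int = 10) -> dict:
    """Build a search body that filters on a single wildcard clause against `field`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {
                        "bool": {
                            "should": [{"wildcard": {field: pattern}}],
                            "minimum_should_match": 1,
                        }
                    }
                ]
            }
        },
        "from": 0,
        "size": size,
    }


# Field from the request that returned no hits in this report.
no_hits_body = build_column_filter_query("columns.keyword", "*key")
# Field from the request that returned hits in this report.
hits_body = build_column_filter_query("column_names.keyword", "*key")

print(json.dumps(no_hits_body))
print(json.dumps(hits_body))
```

Either body can be pasted under GET table_search_index_v2_1/_search in the Kibana console to compare the results.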
Your Environment
Amundsen version used:
amundsendatabuilder == 7.4.3
amundsensearch == 4.1.0
amundsenmetadata == 3.12.0
amundsenfrontend == 4.2.0
I also have an AWS OpenSearch Service (OSS) domain using Elasticsearch 7.10.
You should not have both a columns and a column_names field in your mappings. We only have columns, and the existing filtering functionality works for the column filter. Do you periodically drop and recreate your indices? If so, does it create both fields every time?
I just made a small fix. This is a Python inheritance issue that causes the mapping of the parent class to be used rather than that of the newest search class. Thanks for raising this! #2121
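For readers unfamiliar with this kind of bug, here is a minimal, generic sketch of one way a parent class's mapping can end up being used in place of a subclass override. This is not the actual Amundsen code: the class names, attribute name, and field values are all hypothetical and chosen only for illustration.

```python
class BaseSearchProxy:
    # Hypothetical mapping from filter name to index field.
    FILTER_FIELDS = {"column": "parent_field"}

    def field_for_buggy(self, filter_name: str) -> str:
        # Referencing the parent class by name ignores any subclass override.
        return BaseSearchProxy.FILTER_FIELDS[filter_name]

    def field_for(self, filter_name: str) -> str:
        # Looking the attribute up through self honours the subclass override.
        return self.FILTER_FIELDS[filter_name]


class NewSearchProxy(BaseSearchProxy):
    # The newer class overrides the mapping.
    FILTER_FIELDS = {"column": "child_field"}


proxy = NewSearchProxy()
print(proxy.field_for_buggy("column"))  # prints "parent_field"
print(proxy.field_for("column"))        # prints "child_field"
```

In this sketch, the subclass defines the mapping it wants, but any inherited method that names the parent class directly keeps returning the parent's value.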
We recreate the index daily, and it creates both fields every time.
@allisonsuarez I just tried building with your fix in this line, but it doesn't fix the issue. Could this instead be an issue with how the mappings are created in amundsendatabuilder?
@beauttie are you using the packages? The release for that fix was just made.
@allisonsuarez I updated the metadata, search, and frontend services to what was in the search-4.1.1 release, and I still see the same bug. Again, I think this is an issue with how the mappings are created in amundsendatabuilder. We currently use the latest version, 7.4.3.
Hi @beauttie, I'm going to close this issue in a triaged state. People can upvote it, or someone can choose to work on it from there.