koursaros-ai/nboost

Index multiple fields.

Closed this issue · 6 comments

I couldn't index multiple fields into an Elasticsearch index using the nboost-index command. My csv file contains 5 columns, and I want to index all the fields and search on one of them. How can I achieve that in NBoost?

nboost-index is a helper to index documents with a single text column that can be ranked on with BERT.

So you can either use the --id_col and --body_col switches with nboost-index to specify a single text column to index,

or index the whole csv in Elasticsearch as you normally would (e.g. this link) and then specify which field you want to rerank on with nboost using --cvalues_path and the jsonpath of the choice.

I just added new documentation for the dsl. You can find it here.
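To make the second option concrete, here is a minimal sketch of what "the field you want to rerank on" means once all columns are indexed. It runs on a hand-written, hypothetical search response rather than a live cluster; the field names (title, passage, author) and the helper values_to_rerank are assumptions for illustration and are not part of nboost. The value passed to --cvalues_path would be a jsonpath selecting the same field from each hit; its exact syntax is covered in the dsl documentation, not here.

```python
# Hypothetical Elasticsearch search response; field names are illustrative only.
response = {
    "hits": {
        "hits": [
            {"_id": "1", "_source": {"title": "A", "passage": "first text", "author": "x"}},
            {"_id": "2", "_source": {"title": "B", "passage": "second text", "author": "y"}},
        ]
    }
}


def values_to_rerank(response, field):
    """Return the text of `field` from every hit, in result order."""
    return [hit["_source"][field] for hit in response["hits"]["hits"]]


# Every column is indexed, but only one field is handed to the reranker.
candidates = values_to_rerank(response, "passage")
print(candidates)
```

The other fields stay in each hit and are returned to the client untouched; only the selected text is scored.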

@colethienes I also encountered the same problem, could you be more specific on how to index multiple columns? Thanks!

BTW, great work!

I just updated the nboost-index tool to automatically index multiple columns (using the column headers as the field names). Use --id_col to specify whether the first column is an id. You can also check nboost-index --help for more options.

Let me know if you have any further issues.

It works with 0.2.1. I noticed some differences between 0.2.0 and 0.2.1.


  • The default value of id_col is False.
  • In the CLI, there is no col_name setting anymore.

Does this mean there is no need to set col_name, and nboost will index all the columns in the csv file?

Thanks!

That's correct. There is no need to set column names anywhere except in the csv header. If set to True, --id_col will assume the first column of the csv contains ids (_id for Elasticsearch).
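For anyone who wants to see the behaviour described above spelled out, here is a rough sketch of the mapping from csv rows to documents. It is not nboost's actual code: the function csv_to_bulk_actions, its parameters, and the sample data are all hypothetical. It only illustrates the two rules from this thread, namely that header names become field names, and that with id_col set the first column becomes _id instead of a field. The dicts it returns use the action shape accepted by elasticsearch-py's helpers.bulk.

```python
import csv
import io


def csv_to_bulk_actions(csv_text, index, id_col=False):
    """Turn csv text into bulk index actions (illustrative, not nboost's code)."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)  # the first row supplies the field names
    actions = []
    for row in reader:
        if id_col:
            # First column is the document id; the rest are fields.
            source = dict(zip(headers[1:], row[1:]))
            actions.append({"_index": index, "_id": row[0], "_source": source})
        else:
            # Every column is a field; Elasticsearch assigns the id.
            actions.append({"_index": index, "_source": dict(zip(headers, row))})
    return actions


sample = "id,title,passage\n7,Hello,first text\n8,World,second text\n"
with_ids = csv_to_bulk_actions(sample, index="demo", id_col=True)
without_ids = csv_to_bulk_actions(sample, index="demo")
print(with_ids[0])
print(without_ids[0])
```

Note that with id_col left at its default of False, the first column is indexed as an ordinary field named after its header.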