--fts ignores --parameters, --field, --sort
Hi,
I am doing ia search --parameters="..."
...but I do not know what parameters it accepts.
Is there a list or documentation anywhere?
My goal is to return a small number of results sorted by most recently "added" first.
- on the website that is sort=-publicdate
- and in advanced search it is sort createdate desc
- this page says sort_by=-addeddate
But those do not seem to work with ia search, or maybe I am doing it wrong?
I have also tried
ia search --parameters="rows=10" --sort="addeddate desc" "hanafuda"
ia search --parameters="rows:10" --sort="created_on desc" "hanafuda"
Any help appreciated.
Thanks!
OK, I figured it out and support seems to be missing, so I will rename the issue.
ia search 'hanafuda' --parameters rows:10 --field addeddate --sort "addeddate desc"
- returns expected results (GOOD)
But...
ia search 'hanafuda' --fts --parameters rows:10 --field addeddate --sort "addeddate desc"
- returns more rows than requested (BAD)
- returns unsorted results (BAD)
I am using:
pip install internetarchive
- version 3.4.0
The confusion here is that ia search uses various endpoints depending on several things. It uses the Scrape API by default, Advanced Search when either rows or page parameters are specified, and our beta FTS API when either --fts or --dsl-fts are specified.
The reasoning behind this is that the Advanced Search API is not designed for scraping/retrieving full result sets (it's capable of doing so, but it's not designed for it). The Scrape API is designed for dumping full result sets. I assume that most people want full result sets when using ia search, and that's why the Scrape API is the default. When a user specifies that they only want a subset of the results (i.e. via page or rows params), then Advanced Search is used.
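The selection rule described above can be sketched in a few lines of Python. This is only an illustration of the behaviour as described in this thread, not the actual code in internetarchive: the function name pick_endpoint and the returned labels are made up for the example, and the precedence of --fts over rows/page is inferred from the report above, where rows was ignored once --fts was added.

```python
def pick_endpoint(params, fts=False, dsl_fts=False):
    """Return a label for the backend `ia search` is described as using.

    Hypothetical helper for illustration only; the labels are not real
    identifiers from the internetarchive package.
    """
    # --fts / --dsl-fts route to the beta FTS API. Assumed to take
    # precedence, since rows was ignored in the report once --fts was set.
    if fts or dsl_fts:
        return "fts"
    # Asking for a subset of results (rows or page) selects Advanced Search.
    if "rows" in params or "page" in params:
        return "advanced_search"
    # Otherwise the full result set is dumped through the Scrape API.
    return "scrape"
```

Under these assumptions, pick_endpoint({"rows": "10"}) selects Advanced Search, which is why --sort and --field worked in the first command, while adding fts=True selects the FTS API, where rows has no effect.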
Then there's the FTS API. This is in beta, is not currently documented publicly, and is subject to change. The specific parameter you're after though is size as opposed to rows:
» ia search 'hanafuda' --fts --parameters size:10 | wc -l
10
--fields is not currently supported with --fts; all indexed fields are returned by default. addeddate is not returned, but publicdate is (under .fields.meta_publicdate). Sorting is not supported in the beta FTS API at this time.
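Since the FTS API does not sort, one possible workaround is to sort the returned records on the client side. The sketch below is a rough illustration under several assumptions: that each output line of ia search is one JSON object (consistent with the wc -l count above), that the date sits under fields.meta_publicdate as either a string or a one-element list, and that the dates are ISO 8601 strings that sort correctly as text. The function name sort_by_publicdate is made up for the example.

```python
import json


def sort_by_publicdate(json_lines, newest_first=True):
    """Sort JSON-lines search results by fields.meta_publicdate.

    Hypothetical helper; the record layout is assumed from the
    description in this thread, not from documented API behaviour.
    """
    # Parse one JSON object per non-empty line.
    records = [json.loads(line) for line in json_lines if line.strip()]

    def publicdate(record):
        value = record.get("fields", {}).get("meta_publicdate", "")
        # Assumption: the field may be wrapped in a list; take the first entry.
        if isinstance(value, list):
            value = value[0] if value else ""
        return value

    # ISO 8601 timestamps compare correctly as plain strings.
    # Records without a date sort as the oldest.
    return sorted(records, key=publicdate, reverse=newest_first)
```

The function accepts any iterable of lines, so the output of ia search 'hanafuda' --fts --parameters size:10 could be piped into a small script that passes sys.stdin to it. Note that this only orders the records that were returned; it does not make the API return the most recent items.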
Sorry for the confusion. We hope to consolidate these endpoints in the future!
Thanks @jjjake very informative. I'll keep an eye on progress.
It seems very wasteful to query the whole set when I only want the X most recent items (for example, any new items since the last time I ran the query). But maybe I'm overthinking it!? I prefer to keep things lean and save time and electricity on this earth.
The "beta FTS API" doesn't seem to point to the right endpoint.
Results from "ia search" are not the same as those shown by https://archive.org/search?query=...
The JS on that page uses https://archive.org/services/search/beta/page_production/, which returns cleaner results.
Is there any plan to switch to that endpoint?