magnusmanske/petscan_rs

Insource search, to allow a search like the one on-wiki


What's the problem?

On a site like Commons, I can use an insource search (including a regexp) to find a specific phrase within the text of a page. For example, searching for "insource:usda-nurseryandseedcatalog" in the File namespace on Commons would find:

https://commons.wikimedia.org/wiki/File:Plant_Lambert%27s_quality_seed_always_(IA_plantlambertsqua1972rela).pdf
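For reference, the same on-wiki search can be sent through the standard MediaWiki search API (`action=query&list=search`). The sketch below only builds the request URL and does not call the network. The helper name `insource_search_url` is hypothetical. It assumes the File namespace is number 6 and that the regexp form of the keyword is written `insource:/…/`.

```python
from urllib.parse import urlencode

# Commons API endpoint (the standard MediaWiki api.php location).
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"


def insource_search_url(term: str, namespace: int = 6, regex: bool = False) -> str:
    """Build a MediaWiki search URL for an insource: query (hypothetical helper)."""
    # insource:"..." matches a phrase; insource:/.../ matches a regexp.
    keyword = f"insource:/{term}/" if regex else f'insource:"{term}"'
    params = {
        "action": "query",
        "list": "search",
        "srsearch": keyword,
        "srnamespace": namespace,  # assumption: 6 = File namespace
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(insource_search_url("usda-nurseryandseedcatalog"))
    print(insource_search_url("usda-nursery.*catalog", regex=True))
```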

It is not currently possible to use such an insource search to select specific items from a results set generated by PetScan.

What is desired:

The ability to specify an insource: regexp that limits the results displayed from a results set generated by PetScan.

I want this functionality to set up queries that find, for example, PD-expired items with a post-1925 publication date. I can't do this with the current PetScan. (I think it could be done with Quarry, but that requires writing complex SQL, which I lack the expertise to do.)

This is done now. You can enter a "filter search", just like the on-wiki search. For each result, PetScan automatically inserts the page ID into that search and runs it. If the search returns the page, the query matches and the page is kept; otherwise it is removed.
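A minimal Python sketch of the filtering logic described above, assuming a search backend that returns the IDs of the pages matching a query string. `filter_pages`, `build_filter_query` and the toy `fake_search` backend are hypothetical, not PetScan's code; in particular, appending a `pageid:` term is an assumption about how the page ID is inserted, not confirmed syntax.

```python
from typing import Callable, Dict, List

# Hypothetical type: a search backend that takes a full search string
# and returns the IDs of the pages it matched.
SearchFn = Callable[[str], List[int]]


def build_filter_query(filter_search: str, page_id: int) -> str:
    # Assumption: the page ID is inserted as an extra "pageid:" term.
    return f"{filter_search} pageid:{page_id}"


def filter_pages(page_ids: List[int], filter_search: str, run_search: SearchFn) -> List[int]:
    """Keep only the pages that the filter search returns."""
    kept = []
    for page_id in page_ids:
        query = build_filter_query(filter_search, page_id)
        # If the search returns this page, the query matched: keep it.
        if page_id in run_search(query):
            kept.append(page_id)
    return kept


# Made-up page texts standing in for wikitext, keyed by page ID.
FAKE_SOURCES: Dict[int, str] = {
    101: "catalog 1931 usda-nurseryandseedcatalog",
    102: "catalog 1910",
    103: "usda-nurseryandseedcatalog 1920",
}


def fake_search(query: str) -> List[int]:
    # Toy stand-in for the wiki search: splits off the pageid: term and
    # treats an insource:"..." filter as a plain substring test.
    text, _, pid_str = query.rpartition(" pageid:")
    page_id = int(pid_str)
    if text.startswith("insource:"):
        text = text[len("insource:"):]
    needle = text.strip('"')
    return [page_id] if needle in FAKE_SOURCES.get(page_id, "") else []


if __name__ == "__main__":
    # prints [101, 103]
    print(filter_pages([101, 102, 103], 'insource:"usda-nurseryandseedcatalog"', fake_search))
```

Because this design runs one search per page in the results set, the number of search requests grows linearly with the size of that set.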

Hmm, https://petscan.wmflabs.org/?psid=18625999 didn't generate any results.

By my understanding of the new feature, it should have. But I don't see it in the UI yet. Is it waiting for a deployment push to Labs?

Ah, forgot to push it live! Done now. Example: https://petscan.wmflabs.org/?psid=18638683

Nice.
BTW, I plan on limiting queries with the new function to around 20 at a time; I can't review faster than that.
Thanks for including the guidance note. :)