CXuesong/WikiClientLibrary

Add support for bounding box in GeoSearchGenerator

zstadler opened this issue · 11 comments

The Wikimedia geosearch API supports a bounding box as an alternative to coordinates+radius as a geographic selector:

gsbbox: Bounding box to search in: pipe (|) separated coordinates of top left and bottom right corners.

and provides an example:

api.php?action=query&list=geosearch&gsbbox=37.8|-122.3|37.7|-122.4
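As a minimal sketch of how such a value could be assembled on the client, the helper below joins the four numbers in the order used by the example (top, left, bottom, right). The `GsBbox` class and its `Format` method are hypothetical names for illustration and are not part of WikiClientLibrary; the sketch performs no range validation.

```csharp
using System.Globalization;

// Hypothetical helper, not part of WikiClientLibrary.
public static class GsBbox
{
    // Builds a gsbbox value as "top|left|bottom|right".
    // InvariantCulture keeps '.' as the decimal separator regardless of the OS locale.
    public static string Format(double top, double left, double bottom, double right)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}|{1}|{2}|{3}",
            top, left, bottom, right);
    }
}
```

Calling `GsBbox.Format(37.8, -122.3, 37.7, -122.4)` reproduces the value from the example URL. The pipe characters still need URL encoding (`%7C`) when the value is placed in a query string.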

Since the coordinates+radius approach is limited to a 10000 meter radius, combining multiple requests to cover a larger area is a challenge. A bounding box search, on the other hand, is easier to aggregate and to integrate with other geographic systems.

Please consider adding support for search based on a bounding box.

See also this Wikimedia API bug report related to the use of gsbbox for geosearch.

Thanks for your links, @zstadler! I will check on this and work on the implementation after the holiday, which is tomorrow 😄

Published v0.7.0-int.6. You may now use GeoSearchGenerator.BoundingRectangle to specify a small rectangle with the left (longitude), top (latitude), width, height and search for the pages.

I'm planning to refactor the GeoSearch, GeoCoordinate and GeoCoordinateRectangle APIs. I'm going to extract Dimension and Global from the GeoCoordinate structure, and GeoCoordinateRectangle may need some polishing. If you have any more suggestions or feature requests regarding these APIs, feel free to open another issue and let me know 😉

Thanks for this! :-)
What's a small rectangle?
I'm getting the following error:
OperationFailedException: toobig: Bounding box is too big - the exception should indicate which bbox I should be using I think...
Also toobig is missing a space :-)

I've tried this roughly, and ranges of less than 0.2 degrees in longitude and latitude seem okay.

[Fact]
public async Task WpEnGeoSearchTest2()
{
    var site = await WpEnSiteAsync;
    var gen = new GeoSearchGenerator(site) { BoundingRectangle = new GeoCoordinateRectangle(1.9, 47.1, 0.2, 0.2) };
    var result = await gen.EnumItemsAsync().Take(20).FirstOrDefaultAsync(r => r.Page.Title == "France");
    ShallowTrace(result);
    Assert.NotNull(result);
    Assert.True(result.IsPrimaryCoordinate);
}

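My hypothesis is that on the MW API server, you ultimately cannot bypass the radius limitation of GeoSearch. At about 111 km per degree of latitude, 10 km is roughly 0.09 degrees, so a box spanning a 20 km diameter is about 0.18 degrees, which roughly matches the 0.2-degree range that worked above.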

So if you are planning to scan a larger area of the earth, you may need to split your range into a grid and request the smaller tiles one by one from the client.
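The grid approach could be sketched as follows. This is only an illustration: `BoundingBoxTiler` and `Split` are hypothetical names, the 0.2-degree default side comes from the rough experiment above rather than from any documented limit, and the sketch assumes that `top` is the northern edge with `height` extending southwards, matching the left/top/width/height order of GeoCoordinateRectangle.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of WikiClientLibrary.
public static class BoundingBoxTiler
{
    // Splits a rectangle (in degrees) into equally sized tiles whose sides
    // do not exceed maxSide. Tiles are returned row by row, from the top-left.
    // Arguments are validated when enumeration starts, because this is an iterator.
    public static IEnumerable<(double Left, double Top, double Width, double Height)> Split(
        double left, double top, double width, double height, double maxSide = 0.2)
    {
        if (width <= 0 || height <= 0 || maxSide <= 0)
            throw new ArgumentOutOfRangeException(nameof(width), "All sizes must be positive.");

        // Number of columns and rows needed so that no tile side exceeds maxSide.
        int cols = (int)Math.Ceiling(width / maxSide);
        int rows = (int)Math.Ceiling(height / maxSide);
        double tileWidth = width / cols;
        double tileHeight = height / rows;

        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                // Assumption: latitude decreases as we move down from the top edge.
                yield return (left + c * tileWidth, top - r * tileHeight, tileWidth, tileHeight);
            }
        }
    }
}
```

Each returned tuple could then be passed to `new GeoCoordinateRectangle(left, top, width, height)` and assigned to `BoundingRectangle` for one request per tile. Because the tiles share edges, a page lying exactly on a boundary may show up in two responses, so deduplicating by page ID on the client is advisable.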

And toobig is actually the error code from MW API response, like permissiondenied or badtoken.

Thanks for the quick response!
This is what I do right now with the 10 km radius search, only the circles overlap, and I thought I'd be able to do it in one bbox call instead of around 1000.
Here's the relevant code I was hoping to simplify... :-/
https://github.com/IsraelHikingMap/Site/blob/5bf63fc2a0e2c1a22bf82d3f1175141b45c25356/IsraelHiking.API/Services/Poi/WikipediaPointsOfInterestAdapter.cs#L77

When using the GeoSearchGenerator, it seems that I can't get past the pagination size of 500 results.
The following returns 500 items, but I don't know how to continue to the next page:

    var geoSearchGenerator = new GeoSearchGenerator(new WikiSite(wikiClient, new SiteOptions($"https://he.wikipedia.org/w/api.php")))
    {
        BoundingRectangle = GeoCoordinateRectangle.FromBoundingCoordinates(34.75, 32, 34.9, 32.15),
        PaginationSize = 1000 // this is ignored
    };
    var results = await geoSearchGenerator.EnumItemsAsync().ToListAsync(); // this returns only 500...

Let me know if you want me to open a new issue for this, or if I'm missing something.

Same request from the browser:
https://he.wikipedia.org/w/api.php?action=query&maxlag=5&list=geosearch&gsradius=10&gsprimary=primary&gslimit=500&gsbbox=32.15%7C34.75%7C32%7C34.9
It seems like the response doesn't have a continuation parameter, but I'm not sure...

It seems so. GeoSearch does not support pagination for now. Here is an example response for https://en.wikipedia.org/w/api.php?action=query&maxlag=5&list=geosearch&gsradius=10&gsprimary=primary&gslimit=2&gsbbox=32.15%7C34.75%7C32%7C34.9:

{
    "batchcomplete": "",
    "query": {
        "geosearch": [
            {
                "pageid": 18328987,
                "ns": 0,
                "title": "Beit Zvi",
                "lat": 32.078408333333336,
                "lon": 34.821713888888894,
                "dist": 489.4,
                "primary": ""
            },
            {
                "pageid": 46324352,
                "ns": 0,
                "title": "HaAliya HaShniya Garden",
                "lat": 32.0697,
                "lon": 34.8148,
                "dist": 1127.4,
                "primary": ""
            }
        ]
    }
}
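The MediaWiki action API normally signals that more results are available through a top-level `continue` object, and the response above carries only `batchcomplete` even though `gslimit=2` was requested. A minimal sketch of that check with `System.Text.Json` is shown below; `ContinuationCheck` and `HasContinuation` are hypothetical names for illustration and do not reflect how WikiClientLibrary handles continuation internally.

```csharp
using System.Text.Json;

// Hypothetical helper, not part of WikiClientLibrary.
public static class ContinuationCheck
{
    // Returns true when the API response contains a top-level "continue" object,
    // which is how the action API indicates that another page can be requested.
    public static bool HasContinuation(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;
        // TryGetProperty is only valid on JSON objects, so check the kind first.
        return root.ValueKind == JsonValueKind.Object
            && root.TryGetProperty("continue", out _);
    }
}
```

For a body shaped like the response above, the method returns `false`, which is consistent with the generator stopping after the first batch.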

I think the continuation problem was originally tracked in phab:T95241, which was closed as a duplicate of phab:T78703.

Unfortunately, I don't think T78703 is going to be resolved soon...

Let's use #64 to track this.