georust/gdal

[Feature Request] Guess the Driver Type based on the file extension

Atreyagaurav opened this issue · 18 comments

Currently, you can open a dataset with just Dataset::open("filename.ext") without giving the driver, but we cannot create a file without specifying a driver. If there is a way, please let me know.

My attempt to call Dataset::open on a new file failed, and using Dataset::open_ex as below didn't work either:

Dataset::open_ex(
    filename,
    DatasetOptions {
        open_flags: gdal::GdalOpenFlags::GDAL_OF_UPDATE
            .union(gdal::GdalOpenFlags::GDAL_OF_VECTOR),
        ..Default::default()
    },
)

I want to be able to output to any GIS file format: GPKG, shp, json, and so on.

So far I only know how to make a new file like this:

let driver = DriverManager::get_driver_by_name(driver)?;
let mut out_data = driver.create_vector_only(filename)?;

Since we need to pass the filename as well as the driver, it feels redundant. It is also extra work for the user, and a place where they might make mistakes.

So is there a way to add DriverManager::get_driver_by_extension, which can guess the driver based on the file extension?
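As a stopgap until something like that exists, a caller could keep a small hand-written table. The sketch below is only an illustration: driver_name_for is a hypothetical helper, not part of the gdal crate, and the table covers just the three formats mentioned above. The returned string would then be passed to DriverManager::get_driver_by_name.

```rust
use std::path::Path;

/// Hypothetical helper (not part of the gdal crate): map a file extension
/// to a GDAL vector driver short name using a hand-written table.
fn driver_name_for(filename: &str) -> Option<&'static str> {
    // Take the text after the last dot and compare case-insensitively.
    let ext = Path::new(filename)
        .extension()?
        .to_str()?
        .to_ascii_lowercase();
    match ext.as_str() {
        "gpkg" => Some("GPKG"),
        "shp" => Some("ESRI Shapefile"),
        "json" | "geojson" => Some("GeoJSON"),
        // Anything else is unknown to this table.
        _ => None,
    }
}
```

The obvious weakness is that the table has to be maintained by hand and knows nothing about which drivers are actually loaded, which is why a lookup based on driver metadata is preferable.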

Considering that Dataset::open doesn't need a driver, I thought maybe there was already a function, but it seems to call C functions and take a list of valid drivers, so I wasn't able to find how to replicate that for writing new files.

The GDAL tools use these two internal functions. I've thought before about implementing something similar, so if you want to file a PR, it's going to be appreciated.

I don't know if we can get the driver metadata, so it might not be straightforward.

So here is a crude implementation. It works.

Atreyagaurav@22a1cf7

We can probably put it in once_cell and save a HashMap if we are likely to call the function a lot. I think we probably won't call this function enough times to justify that, but we can do it.

I looked at the code you linked. Those functions seem to manually check for gpkg and shp and not much else. But GDAL can handle so many extensions, so there must be something more general. We can look at that if this implementation seems too crude.

If you think this is good enough, I can add documentation and other things and make a pull request.

I think we should keep the logic a little closer to the original. We should probably pass in the filename (because .shp.zip is annoying to handle and the caller might forget) and check the driver capabilities.
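To illustrate why taking the whole filename helps, here is a rough sketch of the extension step. It assumes the only compound extensions that need special handling are shp.zip and gpkg.zip; lookup_extension is a hypothetical name, and the sketch uses only the standard library rather than the gdal crate.

```rust
use std::path::Path;

/// Hypothetical helper: return the lowercased extension to use for driver
/// lookup, treating "shp.zip" and "gpkg.zip" as single extensions.
fn lookup_extension(filename: &str) -> Option<String> {
    let lower = filename.to_ascii_lowercase();
    // Check compound extensions first; a plain extension lookup would
    // only ever see "zip" for these files.
    for compound in ["shp.zip", "gpkg.zip"] {
        let suffix = format!(".{}", compound);
        if lower.ends_with(suffix.as_str()) {
            return Some(compound.to_string());
        }
    }
    // Otherwise fall back to the text after the last dot, if any.
    Path::new(&lower)
        .extension()
        .and_then(|e| e.to_str())
        .map(str::to_string)
}
```

With this, "roads.shp.zip" yields "shp.zip" while "archive.zip" still yields "zip", so a caller who passes the filename cannot forget the special case.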

I don't think we should cache the result, since the drivers can be loaded and unloaded at runtime.

OK, this one is more or less a rewrite of the function:

Atreyagaurav@5f44a24

The original one only checks for DMD_EXTENSIONS but this one checks for DMD_EXTENSION as well. Other than that it should be similar.
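For readers who don't want to open the commit, the matching logic can be sketched roughly as follows. This is not the code from the commit: DriverInfo, sample_drivers and guess_driver are hypothetical stand-ins so the example runs without GDAL, and the sample values are illustrative rather than real driver metadata. It assumes DMD_EXTENSIONS is a space-separated list and DMD_EXTENSION a single value.

```rust
/// Hypothetical stand-in for the metadata a real lookup would read
/// from each registered driver.
struct DriverInfo {
    short_name: &'static str,
    extension: Option<&'static str>,  // models DMD_EXTENSION (single value)
    extensions: Option<&'static str>, // models DMD_EXTENSIONS (space-separated)
    can_create: bool,                 // models DCAP_CREATE
}

/// Illustrative sample values only, not actual GDAL metadata.
fn sample_drivers() -> Vec<DriverInfo> {
    vec![
        DriverInfo { short_name: "GPKG", extension: Some("gpkg"),
                     extensions: Some("gpkg gpkg.zip"), can_create: true },
        DriverInfo { short_name: "GeoJSON", extension: None,
                     extensions: Some("json geojson"), can_create: true },
        DriverInfo { short_name: "ReadOnlyExample", extension: Some("ro"),
                     extensions: None, can_create: false },
    ]
}

/// Return the short name of the first driver that can create files and
/// lists `ext` in either metadata item (case-insensitive).
fn guess_driver(drivers: &[DriverInfo], ext: &str) -> Option<&'static str> {
    drivers
        .iter()
        .filter(|d| d.can_create) // skip drivers without create support
        .find(|d| {
            let in_list = d.extensions.map_or(false, |list| {
                list.split_whitespace().any(|e| e.eq_ignore_ascii_case(ext))
            });
            let single = d.extension.map_or(false, |e| e.eq_ignore_ascii_case(ext));
            in_list || single
        })
        .map(|d| d.short_name)
}
```

Checking the single-value item as well means a driver that only sets DMD_EXTENSION is still found, which is the difference from the original described above.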

The tests all pass on my laptop, but I don't know if they'll pass on others (if drivers are missing or something).

Since we're checking for DCAP_CREATE, should we make a Dataset::create() function to parallel Dataset::open(), and call it from there?

Not sure when I'll be able to look at it properly, but please file a PR so we can keep track of it.

Since we're checking for DCAP_CREATE, should we make a Dataset::create() function to parallel Dataset::open(), and call it from there?

But we already have Driver::create_xxx and.. Dataset::open, hmm dunno.

Done. It seems to work for my use case. I don't know about other use-cases and edge cases.

I've corrected the suggestions from clippy. Please refer to the comments in the pull request for any other details. I'll correct any errors that might come in the CI once it's been run again.

Commenting here, as I don't think you'll get a notification if I comment on the pull request.

If you're the one approving the CI run: I've updated the code, so can you approve the CI tests? It should pass this time. Feel free to review in your free time; I just want to make sure I can update it to pass the tests if it fails again. Sorry for the trouble.

Sorry, I do get notifications, but haven't yet had a chance to take a proper look. And I'm a little confused about your GPKG issue; as far as I know, it should work fine on CI (I can see the failure in the Actions history, but have no explanation for it).

Anyway, I just triggered the CI in the PR.

Yeah, I saw that issue, and saw some other tests using gpkg, so I don't understand it either. But the new test should account for that: if the GPKG driver is available and the test still fails, I'll look into it. And if it doesn't fail this time, then maybe the GPKG driver is not available while running that test.

EDIT: Thank you, I also just figured out how to run CI on my fork, so I can use that for trial and error if they fail again. Tests passing locally and failing in CI have made it a bit hard to pin down the error.

EDIT2: Manually triggering the CI run on the fork didn't work; the jobs fail on compiling gdal and other things.

It should be available. I'd still like to figure this out because it might point to a deeper issue, not necessarily in your PR.

It failed again, so the driver is available. It could be that the metadata doesn't have .gpkg.zip. Is there an easy way to explore this? Maybe a Docker image of the CI? I looked at the metadata from my Python gdal library to test things.

The CI runs on the ghcr.io/osgeo/gdal:ubuntu-full-X.Y.Z images, you should be able to run Python in those to check.

Found the problem; refer to the comment on the pull request. In retrospect, I was thinking that if GitHub hadn't cancelled the other tests and had run all of them, we'd have known it was a version problem, because I saw the old ones were the ones that failed. But I didn't say anything. And it turned out to be true lol.