wpoa/JATS-to-Mediawiki

Image size differences

Daniel-Mietchen opened this issue · 11 comments

Compare https://commons.wikimedia.org/wiki/File:An-updated-checklist-of-aquatic-plants-of-Myanmar-and-Thailand-biodiversity_data_journal-2-e1019-g001.jpg and https://commons.wikimedia.org/wiki/File:Collecting_aquatic_plants_in_a_lake_in_Myanmar.jpg .

The former is a smaller version of the latter. The smaller version is available from
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3964724/figure/F351027/ ,
the larger one after another click, but hardly downloadable this way, whereas it downloads fine from the journal site (that's where I had gotten it).

How can we make sure - using information in the XML - that we grab the largest version of the file?
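As a starting point, here is a minimal sketch of what the XML gives us for each figure. It assumes the usual JATS layout, where a `<fig>` contains a `<graphic>` whose `xlink:href` attribute names the image file. The `figure_graphics` helper and the `SAMPLE` document are made up for illustration and are not taken from the converter's code.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xlink:href attribute used by JATS <graphic>.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def figure_graphics(jats_xml):
    """Return (figure id, graphic href) pairs found in a JATS document."""
    root = ET.fromstring(jats_xml)
    pairs = []
    for fig in root.iter("fig"):
        for graphic in fig.iter("graphic"):
            pairs.append((fig.get("id"), graphic.get(XLINK_HREF)))
    return pairs


# Hypothetical, heavily trimmed article used only to exercise the helper.
SAMPLE = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <fig id="F1">
      <graphic xlink:href="example-g001"/>
    </fig>
  </body>
</article>"""

print(figure_graphics(SAMPLE))  # [('F1', 'example-g001')]
```

In this sample the href only identifies the file. It says nothing about which resolutions exist, so picking the largest version would need information from somewhere else.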

I just thought of a hack around this problem in theory, but I'm not sure exactly how to implement it, or if it's feasible. JPEG EXIF metadata captures the camera model, the time and date the picture was taken, etc. Even if the picture changes size, the EXIF metadata remains. That means we could potentially spot duplicates where the EXIF metadata overlaps: pictures that were taken in the same second on the same camera.

Potential pitfalls

  • Will not work on images that have had their EXIF metadata stripped.
  • Moderately complicated to maintain a real-time database of the metadata of OAMI's uploads.
    • Would be easier if OAMI published this information.
    • Technically possible to watch OAMI's contributions, download the images, and then build a private database of the EXIF metadata.
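The matching step could look roughly like the sketch below. It assumes the EXIF fields have already been read into plain dicts by some other tool, and that `Make`, `Model` and `DateTimeOriginal` are enough to identify a shot. The function names and the sample file names are hypothetical.

```python
from collections import defaultdict


def capture_key(exif):
    """Build a (make, model, timestamp) key, or None if any field is missing."""
    fields = (exif.get("Make"), exif.get("Model"), exif.get("DateTimeOriginal"))
    return fields if all(fields) else None


def find_probable_duplicates(images):
    """Group file names whose EXIF capture keys are identical.

    `images` maps a file name to a dict of EXIF tag names and values.
    Images with stripped or incomplete EXIF data are skipped.
    """
    groups = defaultdict(list)
    for name, exif in images.items():
        key = capture_key(exif)
        if key is not None:
            groups[key].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up metadata: two sizes of one photo, plus one image with no EXIF left.
images = {
    "small.jpg": {"Make": "Canon", "Model": "EOS 5D",
                  "DateTimeOriginal": "2013:05:01 10:15:30"},
    "large.jpg": {"Make": "Canon", "Model": "EOS 5D",
                  "DateTimeOriginal": "2013:05:01 10:15:30"},
    "stripped.jpg": {},
}

print(find_probable_duplicates(images))  # [['large.jpg', 'small.jpg']]
```

Images without usable EXIF data simply drop out of the comparison, which is the first pitfall listed above.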

I will ask.

Big problem here is that basically no publisher keeps EXIF metadata intact, and while some (notably PLOS) do embed metadata in their images (typically via XMP rather than EXIF, I believe), it's about the publication, not the capturing.

There is no way to get these easily from the PMC site. Please write to the help desk to ask that they add this feature -- I think it's a good idea to provide full-res images of OA content (but the request should come through the help desk).

Ok, I wrote to help desk about this.

any response from the help desk?

No, did not receive a response.

Max Klein
http://notconfusing.com/

On Sat, Jul 26, 2014 at 11:37 AM, Matt Senate notifications@github.com
wrote:

any response from the help desk?



For now, this can be skipped until this new feature is supported in PMC via the request to the help desk.

Not critical if some images are lower-resolution, as they can be updated manually or programmatically later.

Just got this response:

Dear Colleague,

My sincere apologies for missing your message when you sent it in.  I will take a look at this and respond.

Best regards,

Monica Romiti
PubMed Central
Contactor

so more waiting

And another nice response:

I have consulted with PubMed Central coordination staff and PMC will not be able to provide this feature in the near future. The high resolution images are in archival storage and are not easily accessible. It will take a significant change in the PMC architecture to provide this service and we don't have the resources to do that now. At this point we can't predict when it might be feasible to do.

Best regards,

Monica Romiti

Given the response and the expectation of no timely development, let's consider this a fringe case and move on.

Lower-resolution images should be replaced either manually by editors, using versions sourced from publishers, authors, or other users, or automatically in the future once new PMC features expose full-resolution versions.
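For the automatic case, a replacement check might look something like the sketch below. It assumes we already know the pixel dimensions of the existing upload and of the candidate file. The `should_replace` name and the 1% aspect-ratio tolerance are illustrative choices, not anything PMC or the converter defines.

```python
def should_replace(existing, candidate, tolerance=0.01):
    """Decide whether `candidate` looks like a larger version of `existing`.

    Both arguments are (width, height) tuples in pixels. The candidate must
    be strictly larger in both dimensions and keep roughly the same aspect
    ratio, so a differently cropped image is not treated as an upgrade.
    """
    ew, eh = existing
    cw, ch = candidate
    if cw <= ew or ch <= eh:
        return False
    existing_ratio = ew / eh
    candidate_ratio = cw / ch
    return abs(candidate_ratio - existing_ratio) <= tolerance * existing_ratio


print(should_replace((600, 400), (3000, 2000)))  # True
print(should_replace((600, 400), (3000, 1000)))  # False
```

A size check like this only filters candidates. Confirming that two files really show the same picture would still need a human look or some image comparison.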

Closing