cschwem2er/imgrec

Feature Request 👍

TheGincgoGreco opened this issue · 1 comment

Hello Carsten,

Hats off to you for your fantastic imgrec package! 💯 While I am only a novice programmer, I can say this package is by far the most robust and thorough interface I have encountered between R and the Vision API.

I am currently working on a project that highlights the utility of (and my need for) a couple more features to make imgrec truly comprehensive and extensible. Kindly overlook the amateur nature of my code below; naturally, I don't presume to submit it as a formal suggestion, let alone a pull request.

1. Distinct max_res for Each Feature

While Vision presents some hurdles for scalability, I would love to take advantage of its flexibility to obtain annotations in different volumes for different features, as seen in Google's demo for the Vision API:

{
  "requests": [
    {
      "image": {
        "source": {
          "imageUri": "https://media.istockphoto.com/photos/tropical-paradise-landscape-picture-id1033545162"
        }
      },
      "features": [
        {
          "type": "LABEL_DETECTION",
          "maxResults": 10
        },
        {
          "type": "SAFE_SEARCH_DETECTION",
          "maxResults": 5
        }
      ]
    }
  ]
}

Would it be possible to generalize imgrec::get_annotations, such that max_res is a numeric vector containing one maxResults value for each selected feature?

get_annotations <- function(images, features, max_res = rep(10, length(features)), mode) {
  ⋮
  if (!is.numeric(max_res) || any(max_res %% 1 != 0) || any(max_res <= 0) ||
      length(max_res) != length(features)) {
    stop('"max_res" only accepts a vector of positive integers corresponding to the selections in "features".')
  }
  ⋮
}

I suppose this would require a modification to build_features:

build_features <- function(features, max_res = rep(10, length(features))) {
  ⋮
  for (feat in seq_along(features)) {
    feature <- list(type = .imgrec$feature_table[[features[feat]]])
    if (!features[feat] %in% c('text')) {
      feature$maxResults <- max_res[feat]
  ⋮
}

Perhaps this could be further modified to accommodate Inf for practically unlimited volumes of responses.
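For illustration, a call under this proposed interface might look as follows, mirroring the JSON request above: 10 label annotations and 5 safe-search annotations for one remote image. The short feature names 'label' and 'safe_search' and mode = 'url' are my assumptions about how imgrec would name them.

results <- get_annotations(
  images   = 'https://media.istockphoto.com/photos/tropical-paradise-landscape-picture-id1033545162',
  features = c('label', 'safe_search'),
  max_res  = c(10, 5),  # one maxResults value per selected feature
  mode     = 'url'
)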

2. Authorization via Service Account

Some R interfaces with the Google Cloud Platform (GCP) permit API authorization via a service account, whose credentials are downloaded in JSON format. For example, the bigrquery package has

bq_auth(
  ⋮
  path = NULL,
  scopes = c("https://www.googleapis.com/auth/bigquery", "https://www.googleapis.com/auth/cloud-platform"),
  ⋮
)

This permits convenient parameterization: one can update the credentials without touching any code by simply overwriting the JSON file (stored locally at path). Furthermore, service accounts can limit activity by the R client to only those scopes assigned by an authorized user within GCP.

Would an imgrec::gvision_auth function be possible, with analogous parameters to load local credential files for GCP service accounts?
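Below is a minimal sketch of what I imagine such a function could look like, assuming the gargle package handles the service-account token exchange; the function name, its parameters, and the idea of stashing the token in the internal .imgrec environment are purely hypothetical on my part.

gvision_auth <- function(path,
                         scopes = 'https://www.googleapis.com/auth/cloud-platform') {
  if (!is.character(path) || !file.exists(path)) {
    stop('"path" must point to a service account JSON key file.')
  }
  token <- gargle::credentials_service_account(scopes = scopes, path = path)
  if (is.null(token)) {
    stop('Could not obtain a token from the supplied service account credentials.')
  }
  # keep the token for subsequent Vision API requests (hypothetical storage)
  assign('token', token, envir = .imgrec)
  invisible(token)
}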

3. Batch Size as a Parameter

In my own project, I managed to cobble together a few functions to (reversibly) mutate .imgrec$img_per_req in the imgrec environment:

library(imgrec)

# Get the namespace environment of the "imgrec" package.
imgrec_get_environment <- function() {
  return(environment(fun = imgrec::get_annotations))
}

# Get the internal parameter environment (".imgrec") of the "imgrec" package.
imgrec_get_parameters <- function() {
  return(imgrec_get_environment()$.imgrec)
}

# Make a static copy of the parameters, disassociated from the changeable environment.
DEFAULT_IMGREC_PARAMETERS <- as.environment(as.list(imgrec_get_parameters()))

# Mutates the batch size among the "imgrec" parameters.
imgrec_set_batch_size <- function(size = DEFAULT_IMGREC_PARAMETERS$img_per_req) {
  if (is.numeric(size) && size %in% 1:DEFAULT_IMGREC_PARAMETERS$img_per_req) {
    assign(x = "img_per_req", value = size, envir = imgrec_get_parameters())
  } else {
    warning("\"size\" should be an integer between 1 and ", DEFAULT_IMGREC_PARAMETERS$img_per_req, "!")
  }
}

# Resets the batch size, among the "imgrec" parameters, to its value when the package was loaded.
imgrec_reset_batch_size <- function() {
  imgrec_set_batch_size(DEFAULT_IMGREC_PARAMETERS$img_per_req)
}
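Here is roughly how I use these helpers in my own scripts, with my_images standing in for a character vector of local image paths:

imgrec_set_batch_size(4)    # temporarily send at most 4 images per request
# ... run get_annotations() on my_images here ...
imgrec_reset_batch_size()   # restore the batch size the package started with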

Yet while I need such functionality for my project, I am extremely nervous about tampering with the environments of packages built by programmers far more experienced than me. Could you possibly add a batch_size parameter to imgrec::get_annotations?

get_annotations <- function(images, features, max_res, mode, batch_size = .imgrec$img_per_req) {
  ⋮
  if (!(batch_size %in% 1:.imgrec$img_per_req)) {
    stop('"batch_size" accepts only positive integers up through ', .imgrec$img_per_req, '.')
  }
  
  # build chunks for multiple images
  image_chunks <- build_chunks(images, batch_size)
  ⋮

I suppose it would require a modification to build_chunks.

build_chunks <- function(images, batch_size) {
  # build request chunks for multiple images
  split(images, ceiling(seq_along(images)/batch_size))
}
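To illustrate the chunking, splitting a dozen dummy file names with batch_size = 5 would yield chunks of 5, 5, and 2 images (the grouping follows directly from ceiling()):

chunks <- build_chunks(paste0('img_', 1:12, '.jpg'), batch_size = 5)
lengths(chunks)
#> 1 2 3
#> 5 5 2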

To clarify, I do understand that Vision caps batches at 16 images apiece, and that smaller batch sizes are less efficient under quotas on Vision calls. Until the Vision developers can make larger batch sizes more efficient, I doubt this third feature would have much significance. However, when those developers eventually do so, the batch_size parameter would allow imgrec users to independently adapt their code, without waiting for those Vision updates to be reflected in the imgrec parameters.

Anyway, thank you for your consideration!

Best Regards — Greg

Thank you so much for the kudos and the detailed suggestions, Greg! Would you be interested in trying to code this up via a pull request? I'm afraid I currently do not have the time to implement this on my own, but the features would certainly improve the package.