cloudyr/aws.ec2

Resurrect the Package

I'm putting together a checklist of what needs to be done before I consider this ready to return to CRAN:

  • Update DESCRIPTION to add maintainer & clean up
  • Verify ec2HTTP.R
  • Update the behavior of dry_run in ec2HTTP.
  • Make sure ec2HTTP behaves as desired. What do we want to do with the "version" parameter?
    I added a note that "version" isn't validated (for future-proofing), and I'll leave it at that. The only people calling ec2HTTP directly should be people who want to do something very specific.
  • Consider moving the bit to make the list names syntactic into ec2HTTP. Will it be useful for all/most actions? Leaving it in the one function that needs it until I find a second case.
  • describe_instances
  • start_instances
  • stop_instances
  • Either remove tests/old_functions or find a permanent home for it.
  • Update README.rmd
  • devtools::release checks
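
For the ec2HTTP items above, here is a minimal sketch of the two behaviors under discussion: passing "version" through without validation, and keeping the syntactic-names step in a small helper. Both function names (build_ec2_query, syntactic_names) and the default version string are assumptions for illustration, not the package's actual code; the only AWS detail relied on is that the EC2 Query API accepts Action, Version, and DryRun parameters.

```r
# Hypothetical helper (not part of aws.ec2): assemble the query
# parameters for a single EC2 API action.
build_ec2_query <- function(action, query = list(), dry_run = FALSE,
                            version = "2016-11-15") {
  stopifnot(is.character(action), length(action) == 1L)
  # `version` is passed through as-is (deliberately not validated), so a
  # caller can request a newer API version without a package update.
  params <- c(list(Action = action, Version = version), query)
  if (isTRUE(dry_run)) {
    # DryRun asks EC2 to check permissions without performing the action.
    params$DryRun <- "true"
  }
  params
}

# Hypothetical helper: make the names of a parsed response syntactic so
# elements can be reached with `$` without backticks.
syntactic_names <- function(x) {
  if (!is.null(names(x))) {
    names(x) <- make.names(names(x), unique = TRUE)
  }
  x
}

q <- build_ec2_query("DescribeInstances", dry_run = TRUE)
parsed <- syntactic_names(list(`instance-id` = "i-123", `private ip` = "10.0.0.1"))
names(parsed)
```

Keeping syntactic_names separate means it can be called from the one function that needs it today and moved into ec2HTTP later if a second case turns up.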

Here's what I think the MVP for this package should include functionality-wise:

  • Launch / create-new instances (run_instances())
    • doing this
  • Get a list of running instances (or a single instance) with basic attributes (describe_instances())
    • Unfortunately (probably because of the API?) this is currently spread out over a few functions including:
      • describe_instances()
      • instance_status()
      • get_instance_public_ip()
    • From a user's perspective it feels like there should be one call that would return all of these things (even if that doesn't match the API exactly). Maybe we start out matching the API and then add wrappers on top to return more user-friendly objects?
  • Start / stop instances (start_instances(), stop_instances())
  • Terminate instances (terminate_instances())
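
To make the "one call that returns all of these things" idea concrete, here is a rough sketch of a wrapper layered over API-shaped functions. The wrapper name instance_summary is hypothetical, and the three fake_* functions are stand-ins with made-up return values; the real return shapes of describe_instances(), instance_status(), and get_instance_public_ip() are not shown in this thread, so treat the field names as assumptions.

```r
# Stand-ins for the three package functions. Their return values here
# are invented for illustration only.
fake_describe  <- function(id) list(instance_id = id, type = "t2.micro")
fake_status    <- function(id) "running"
fake_public_ip <- function(id) "203.0.113.10"

# Hypothetical wrapper: one call, one flat row per instance. The
# lower-level functions are passed in so the API-matching layer stays
# untouched and the wrapper can be tested without AWS credentials.
instance_summary <- function(id,
                             describe = fake_describe,
                             status = fake_status,
                             public_ip = fake_public_ip) {
  info <- describe(id)
  data.frame(
    instance_id = id,
    type = info$type,
    status = status(id),
    public_ip = public_ip(id),
    stringsAsFactors = FALSE
  )
}

# Summarise several instances into a single data frame.
summary_df <- do.call(rbind, lapply(c("i-aaa", "i-bbb"), instance_summary))
summary_df
```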

I think the next tier down is functions for managing VPCs and security groups, e.g.,

  • create_keypair()
  • create_sgroup()
  • authorize_ingress(), authorize_egress()

In general I think it's a good principle that for every create_*() function we have a matching describe_*() and a matching terminate_*(), stop_*(), or delete_*().
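
This pairing principle could be checked mechanically, for example in a unit test. The sketch below assumes the prefix naming used by functions such as create_keypair() and allows the matching function to use a plural resource name; check_lifecycle is a hypothetical helper, and the input vector is example data rather than the package's actual export list.

```r
# Hypothetical helper: for every create_<resource> name, report whether a
# describe_<resource>[s] and a delete_/terminate_/stop_<resource>[s] exist.
check_lifecycle <- function(fn_names) {
  creators  <- grep("^create_", fn_names, value = TRUE)
  resources <- sub("^create_", "", creators)
  has_any <- function(prefixes, resource) {
    candidates <- c(paste0(prefixes, resource), paste0(prefixes, resource, "s"))
    any(candidates %in% fn_names)
  }
  data.frame(
    resource = resources,
    has_describe = vapply(resources, function(r) has_any("describe_", r),
                          logical(1), USE.NAMES = FALSE),
    has_remove = vapply(resources,
                        function(r) has_any(c("delete_", "terminate_", "stop_"), r),
                        logical(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Example input only; in a real test this could come from
# getNamespaceExports() for the package.
example_fns <- c("create_keypair", "describe_keypairs", "delete_keypair",
                 "create_sgroup", "describe_sgroups")
report <- check_lifecycle(example_fns)
report
```

In this example the sgroup row has has_remove set to FALSE, which is the kind of gap the principle is meant to catch.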

I would say everything else in the package falls into a third tier. I'd rather have a package that supports the core use cases well than one that extensively covers every possible thing you can do with AWS. If you really need to tightly manage EBS volumes, images, security groups, and VPCs, there are better tools out there (e.g., Terraform & friends).

I think the use case we should focus on is the small data science team that wants to script setting up ad hoc computing instances, which I think the above covers well. Anything this library doesn't do, they can always do from the AWS CLI (which is really good!) or from the admin panel, so I am not very concerned about "completeness". I want to identify the most common workflows and support those really well.

I completely agree on the target for a baseline!

I'm going to start with the separate-by-API-action functions, and then we can figure out wrappers for them.

I can also imagine building in wrappers for things like launching an instance using the RStudio Server GPU snapshot, etc. Basically, I think once we have it buttoned down, if we do anything using the marketplace, we should consider adding it as its own function :D

Updating the checklist to reflect your MVP suggestions.

I did describe_account_attributes simply because that file was first (alphabetically) in the original project, but I don't think it hurts to have that baked.