PRX/Porter

Build FTP task

Closed this issue · 4 comments

Followup to #81

  • Create a Docker image based on amazonlinux that has Ruby installed
  • Create a Ruby script to handle the FTP transfer
  • Capture debug/error information from the FTP transfer, and write it to S3
  • Add a post-FTP state to recall the information from S3 and add it to the job execution state
  • Make sure linting/etc is set up for Ruby, since we haven't had any Ruby code in the project yet

@kookster Here are examples of other files being written to the artifact bucket:

The second one, which uses event.TaskIteratorIndex, is probably the better model, since it allows multiple FTP tasks in a single job without conflicts. (The ingest function only ever gets called once per job, so it doesn't need the index.) I don't think there's a need for a prefix like the one the transcription lambda uses.
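To make the "no conflicts" point concrete, here is a minimal Ruby sketch of how the FTP script could derive its object key from the task index. The method name `ftp_result_key` and the key layout are hypothetical (the real file name isn't settled in this thread); the only idea taken from above is that the key includes the task iterator index.

```ruby
# Hypothetical helper: builds the artifact-bucket key for one FTP task's result file.
# The "copy/ftp-result-N.json" layout is an assumption, not the project's actual convention.
def ftp_result_key(execution_id, task_iterator_index)
  "#{execution_id}/copy/ftp-result-#{task_iterator_index}.json"
end

# Two FTP tasks in the same job get distinct keys because their indexes differ.
first_key  = ftp_result_key('example-execution-id', 0)
second_key = ftp_result_key('example-execution-id', 1)
```

Because the index is part of the key, a second FTP task in the same job never overwrites the first task's result.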

@kookster Added some code to pull a JSON file from S3 and handle the results. The file name and the status and message keys are just made up, so feel free to format those however makes the most sense in the FTP code.

https://github.com/PRX/Porter/blob/feat/ftp_copy/lambdas/FtpCopyTaskOutputLambdaFunction/index.js#L28
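For the Ruby side, a rough sketch of producing that JSON could look like the following. It assumes the made-up `status` and `message` keys mentioned above; the `success`/`error` values and the method name `capture_transfer_result` are also just placeholders, and the actual FTP call and S3 upload are left out.

```ruby
require 'json'

# Hypothetical wrapper: runs the transfer block and returns a JSON string
# describing the outcome. Any StandardError raised by the block is captured
# instead of propagating, so the result can still be written to S3.
def capture_transfer_result
  yield
  JSON.generate('status' => 'success', 'message' => 'Transfer completed')
rescue StandardError => e
  JSON.generate('status' => 'error', 'message' => "#{e.class}: #{e.message}")
end

# Usage: the block would contain the real FTP transfer.
ok_json     = capture_transfer_result { :transferred }
failed_json = capture_transfer_result { raise IOError, 'connection reset' }
```

Catching the exception in the script means the post-FTP state always finds a file to read, whether the transfer worked or not.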

@kookster The prototype task input I made up looked like this:

{
    "Type": "Copy",
    "Mode": "FTP",
    "URL": "ftp://foo:bar@example.com"
}
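As a sketch of how the Ruby script might read that input, the credentials and host can be pulled out of the URL with the standard `uri` library. The method name `ftp_connection_params` is hypothetical, and this assumes the task arrives as a hash with string keys.

```ruby
require 'uri'

# Hypothetical helper: extracts connection details from the task's URL.
# Percent-encoded characters in the credentials are not decoded here.
def ftp_connection_params(task)
  uri = URI.parse(task.fetch('URL'))
  raise ArgumentError, "expected an ftp:// URL, got #{uri.scheme.inspect}" unless uri.scheme == 'ftp'

  { host: uri.host, port: uri.port, user: uri.user, password: uri.password }
end

params = ftp_connection_params(
  { 'Type' => 'Copy', 'Mode' => 'FTP', 'URL' => 'ftp://foo:bar@example.com' }
)
```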

I'm guessing we'll need some way of indicating active vs. passive. Below are some variants I thought of:

{
    "Type": "Copy",
    "Mode": "FTP/Active",
    "URL": "ftp://foo:bar@example.com"
}

{
    "Type": "Copy",
    "Mode": "FTP",
    "Connection": "Active",
    "URL": "ftp://foo:bar@example.com"
}

{
    "Type": "Copy",
    "Mode": "FTP",
    "ConnectionMode": "Active",
    "URL": "ftp://foo:bar@example.com"
}

The Connection property is my favorite. FTP/Active also makes sense in the JSON, but it requires additional logic in the state machine definition, which feels unnecessary. ConnectionMode reads as confusing to me when it sits next to Mode.

Actually, that's not true. FTP/Active and FTP/Passive can be detected in the state machine definition with a wildcard, so it's not more complex than handling any sort of FTP copy task.

The explicit Connection property is still my favorite, though.
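If we go with the explicit Connection property, the Ruby script could resolve it to a boolean along these lines. This is only a sketch: the method name `passive_mode?` is hypothetical, and defaulting to passive when the property is missing is my assumption, not something decided here.

```ruby
# Hypothetical helper: maps the task's Connection property to a passive-mode flag.
# Assumption: a task without a Connection property falls back to passive.
def passive_mode?(task)
  case task.fetch('Connection', 'Passive').to_s.downcase
  when 'passive' then true
  when 'active' then false
  else raise ArgumentError, "unknown Connection value: #{task['Connection'].inspect}"
  end
end

active_task = { 'Type' => 'Copy', 'Mode' => 'FTP', 'Connection' => 'Active',
                'URL' => 'ftp://foo:bar@example.com' }
use_passive = passive_mode?(active_task)
```

The resulting flag could then be assigned to `passive=` on a `Net::FTP` instance before opening the data connection.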