graphile/crystal

How to handle file upload?

Closed this issue · 17 comments

Is there a way to handle file upload?

Currently, no.

@danielbuechele I know you've some experience with file uploads and GraphQL; would this be something we could add to our HTTP handler pretty easily? Does it require support from the GraphQL schema or is it purely a transport concern?

https://medium.com/@danielbuechele/file-uploads-with-graphql-and-apollo-5502bbf3941e

amizz commented

Create a simple REST api/microservice for uploading file. Save the file location in your database. Then you can use GraphQL to query the database after that.

Another alternative would be to upload direct to S3 (or similar) then post just the S3 result to the server.

Perhaps create a binary type (e.g. bytea), and if the POST request has the header Content-Type: multipart/form-data, retrieve the file, convert it to a base64 string, and store it in the database?
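A minimal sketch of that bytea/base64 idea, using only Node's Buffer (no Postgres involved here; the encoded string is just what you would write to a bytea/text column):

```javascript
// Hypothetical sketch: encode uploaded bytes as base64 on the way into
// the database, decode them again on the way out.
const original = Buffer.from('example file contents');

const encoded = original.toString('base64');    // value you'd store
const decoded = Buffer.from(encoded, 'base64'); // value you'd serve back

// The round-trip is lossless, but base64 emits 4 output bytes for every
// 3 input bytes, so stored payloads grow by roughly a third.
console.log(decoded.equals(original));
console.log(encoded.length === 4 * Math.ceil(original.length / 3));
```

That size overhead is one reason the comments below steer away from base64 for larger files.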

Currently I use the multer and multer-s3 combo to create a file-upload endpoint in the backend, then I send the result to the GraphQL server.

Create a simple REST api/microservice for uploading file. Save the file location in your database. Then you can use GraphQL to query the database after that.

In my understanding, this will make uploading a file and mutations two separate actions that are more difficult to put into a single transaction, no?

Well, I would not advise storing files in the database anyway (it’ll make backups and restorations a lot slower), so you have to deal with this issue either way. You could upload the file to a staging area that gets automatically wiped, and then transactionally move it to a different area.

You could also upload the file as a base64-encoded string via a field on a plugin's custom resolver.

For example, a plugin called "UploadFile" and a mutation like:

mutation {
  uploadFile(input: {file: "base64....string....here....=="}) {
    url
  }
}

The plugin could take care of uploading it to S3 and storing the URL in the database.
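A sketch of what the resolver behind such an "UploadFile" plugin might do. Note that `makeUploadFileResolver` and `uploadToS3` are hypothetical names, not part of any PostGraphile API; the uploader is injected so the base64 handling can be tested in isolation:

```javascript
// Hypothetical resolver factory for the "UploadFile" plugin idea above.
// `uploadToS3` would be implemented separately (e.g. with the AWS SDK).
function makeUploadFileResolver(uploadToS3) {
  return async function uploadFile(_parent, { input }) {
    // The mutation receives the file as a base64 string...
    const bytes = Buffer.from(input.file, 'base64');
    // ...the plugin stores the raw bytes and keeps only the URL,
    // which could then be written to a Postgres column.
    const url = await uploadToS3(bytes);
    return { url };
  };
}

// Usage with a stand-in uploader:
const resolve = makeUploadFileResolver(
  async (bytes) => `https://example-bucket.s3.amazonaws.com/${bytes.length}-bytes`
);
```

In a real plugin the resolver would be wired into the schema (e.g. via graphile-utils), but the shape of the logic stays the same.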

The approach I described in my post could also be applied to this project. Using base64 would work, but it isn't suitable for larger files: base64 encoding inflates the payload by roughly a third.

I would suggest the following approach as described in my post on medium:

  1. use multipart/form-data
  2. add the file to the post request and give it a unique name
  3. add a middleware to store the file and replace the unique name with the URL for the file

If I have the time to, I'll try to implement an example middleware for this.
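Step 3 above can be sketched as a pure function: after a (hypothetical) middleware has stored each multipart file part, it walks the mutation's variables and swaps every unique placeholder name for the stored URL before the GraphQL layer sees them. The function and placeholder names here are illustrative, not from any existing middleware:

```javascript
// Recursively replace placeholder names in GraphQL variables with the
// URLs the corresponding multipart file parts were stored at.
// `uploadedUrls` maps placeholder name -> stored URL.
function replacePlaceholders(value, uploadedUrls) {
  if (typeof value === 'string' && uploadedUrls[value]) {
    return uploadedUrls[value];
  }
  if (Array.isArray(value)) {
    return value.map((v) => replacePlaceholders(v, uploadedUrls));
  }
  if (value !== null && typeof value === 'object') {
    const out = {};
    for (const key of Object.keys(value)) {
      out[key] = replacePlaceholders(value[key], uploadedUrls);
    }
    return out;
  }
  return value;
}

// Example: the client referenced its file part as "upload_1".
const rewritten = replacePlaceholders(
  { input: { avatar: 'upload_1', name: 'Alice' } },
  { upload_1: 'https://cdn.example.com/abc123.png' }
);
```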

Personally, my rule is to restrict my GraphQL APIs to CRUD'ing data records in Postgres. Processing & managing file/object blobs is a problem I put under the domain of another system that's designed to handle potentially large data objects. I use Postgres to manage the metadata for those objects.

My projects use React in the browser, so I use Uppy as a component for handling client-side file selection & uploading. On the server side, the uploads are sent to Tus, which enables pausing/resuming uploads and progress events. Tus is configured to send uploads to Minio, which is an S3-compatible object store.

There are a bunch of options to trigger processing of the uploaded files, for example:

  • Clients upload the file to Tus/Minio & get back the id for it, after which they invoke a mutation to notify the server of the file & to perform any necessary processing. I'm not a huge fan of this approach because it delegates the event triggers to the client. I prefer that metadata for the event go with the upload and then leave it to the server to automatically process the upload.
  • Tus supports events that can be configured to invoke a webhook on uploads, which can then implement whatever logic is necessary to handle the upload.
  • Minio also supports Webhook-configurable events, but also supports AMQP, Redis, Kafka, and others. More importantly, it supports Postgres. One of the methods it provides inserts/updates/deletes rows in a specific table to reflect the object changes on the Minio server. From there, it's as simple as setting up TRIGGERs for any further processing that's required.

Anyway, I know a lot of that isn't PostGraphile-specific, but figured it might provide some options. Personally, for PostGraphile, I'd look into pairing it with Minio & configuring event notification into PostgreSQL for it. It's what I'm currently doing and so far I haven't had any issues. Eventually, when subscriptions are a thing, clients can subscribe to notifications that alert them of files that have been uploaded & processed. I'm using RabbitMQ & STOMP for that currently for my browser, Android, & iOS clients.

Thanks for sharing such detail! 🙏

Uppy & Tus look great. I started down the same path but eventually opted for a simpler solution based on @jaydenseric's graphql-multipart-request-spec / apollo-upload-server / apollo-upload-client.

I put together an example app in case anyone's interested: https://github.com/mattbretl/postgraphile-upload-example

@mattbretl Looks good! :)

I haven't taken an in-depth look yet so I might be mistaken here, but if I remember correctly PostGraphile has a pretty small body-size limit on POST requests. I'd imagine that needs to be explicitly bumped up?

For my own case with tus + minio, I have nginx in front of it, so I had to be sure client_max_body_size was set to something large or disabled altogether. Also, I wanted to make sure chunked transfers were implemented properly; otherwise memory allocations become really inefficient. Generally, I prefer to let nginx buffer by default to some preset size and then let node handle it like a block copy.
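For reference, a hedged sketch of the relevant nginx directives (the `/files/` path and `tusd:1080` upstream are assumptions; adjust to your deployment):

```nginx
# Hypothetical nginx location fronting a tus upload endpoint.
location /files/ {
    # 0 disables nginx's request-size check entirely; alternatively set
    # a large explicit limit.
    client_max_body_size 0;
    # Stream request bodies through instead of spooling the whole
    # upload to disk before proxying.
    proxy_request_buffering off;
    proxy_pass http://tusd:1080;
}
```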

@xorander00 The apollo-upload-server middleware handles the multipart request, so there's no need to adjust the body-size limit of PostGraphile. I tested it with a 1GB file; no issues. You'd probably want to set limits in production using nginx (or equivalent), or use the maxFileSize/maxFiles options in apollo-upload-server.

[semi-automated message] We try and keep the open issues to actual issues (bugs, etc); this seems like more of a discussion right now, so I'm closing it but please feel free to keep discussing it below 👍

@mattbretl
Thanks for your example and explanation.
Is it possible to save the file itself, rather than the file path, by returning it from the resolver?