etr/libhttpserver

[BUG] File upload very slow

azertyalex opened this issue · 7 comments


Description

I made a simple HTTP server example that works fine, except when uploading larger files.
Also, get_arg is really slow (request.get_arg("file_name")).
See below for the profile I captured with the Orbit profiler.

Steps to Reproduce

  1. Create simple httpserver
  2. Upload a "larger" file (roughly 20 MB or more; the exact size may vary)

Expected behavior: The upload takes only a couple of seconds.

Actual behavior: The file upload is very slow.

Reproduces how often: Always

Versions

  • Linux pop-os 5.15.15-76051515-generic #202201160435~1642693824~20.04~97db1bb-Ubuntu SMP Fri Jan 21 14: x86_64 x86_64 x86_64 GNU/Linux

  • libhttpserver version: packaged

Additional Information

[Image: Orbit profiler output]

etr commented

Thanks for reporting this. Taking a look at your code would help as well, so please share it if you can.

There seem to be two different issues here:

  1. The file upload itself is slow: I think some of the work in PR #257 should address that. Right now the library does a lot of work to load the file into memory, both in the content and in the args. The PR introduces options to disable this behavior and make uploads faster.

  2. get_arg is slow: I believe this might be because argument retrieval is lazy and reads from the connection arguments even when the post_processor has already preloaded the data. A look at your code would help me understand whether (1) also solves this.
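A minimal sketch of the caching idea behind point 2, assuming the slowdown comes from re-fetching argument data on every call: the arguments are loaded once on first access, and later lookups are answered from a map. `ArgCache` and its loader are hypothetical names for illustration, not libhttpserver's internals.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical illustration, not libhttpserver's actual implementation.
// ArgCache fetches all arguments once, via a caller-supplied loader that
// stands in for an expensive per-connection lookup, and serves every later
// get_arg() call from the cached map.
class ArgCache {
 public:
  using Loader = std::function<std::map<std::string, std::string>()>;

  explicit ArgCache(Loader loader) : loader_(std::move(loader)) {}

  // Returns the value for `key`, or an empty string if it is absent.
  std::string get_arg(const std::string& key) {
    if (!loaded_) {  // first access: load every argument exactly once
      args_ = loader_();
      loaded_ = true;
    }
    auto it = args_.find(key);
    return it == args_.end() ? std::string() : it->second;
  }

 private:
  Loader loader_;
  std::map<std::string, std::string> args_;
  bool loaded_ = false;
};
```

Repeated `get_arg("uploadId")` calls then invoke the loader only once; whether this matches the library's real bottleneck is exactly the open question in point 2.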

I use this to spin up the web server:

void HttpServer::run() {
  // Run the blocking server loop on a detached thread so run() returns
  // immediately.
  t2 = new std::thread([&] {
    webserver ws = create_webserver(9002);

    UploadResource uploadResource;
    UploadDndResource uploadDndResource;
    ClipboardResource clipboardResource;
    DownloadResource downloadResource;
    WaitResource waitResource;
    ws.register_resource("/upload", &uploadResource);
    ws.register_resource("/uploaddnd", &uploadDndResource);
    ws.register_resource("/uploadpaste", &clipboardResource);
    ws.register_resource("/downloads", &downloadResource);
    ws.register_resource("/wait", &waitResource);
    ws.start(true);  // blocking: serves requests until the server is stopped
  });
  t2->detach();
}

And this is my handler:

const std::shared_ptr<http_response> UploadResource::render(
    const http_request& request) {
  // The raw body from request.get_content() is not needed here; copying it
  // into a local would duplicate a large upload in memory once more.
  std::string uploadId = request.get_arg("uploadId");

  UploadPtr upload = UploadManager::instance().getUploadById(uploadId);
  CefString uniquepath = upload->getUniquePath();

  // The uploaded files arrive as request args, held in memory.
  auto formFiles = request.get_args();
  saveFiles(formFiles, uniquepath);
  upload->finish();

  CefRefPtr<CefFileDialogCallback> callback = upload->getCallback();

  // Collect the paths of all files saved into this upload's directory.
  std::vector<CefString> file_paths;
  std::filesystem::path path = std::filesystem::path(uniquepath);
  for (const auto& item : std::filesystem::directory_iterator(path)) {
    file_paths.push_back(CefString(item.path().string()));
  }
  callback->Continue(0, file_paths);  //@todo 0 seems to work all the time

  // string_response(content, status) sends "ok" as the body; the original
  // http_response(200, "ok") would set "ok" as the content type instead.
  return std::make_shared<string_response>("ok", 200);
}

A short comment on the mentioned PR #257

I implemented new file-upload handling because of the slow upload speed in the previous libhttpserver. In particular, I noticed that upload time did not increase linearly with file size, as I would have expected.

I also think that's because the data is stored in memory chunk by chunk, so all the string handling (internal reallocation and concatenation) gets slower and slower as the file grows. On top of that, the data was always stored twice in memory: once in the content and once in the args.
This could also cause problems on embedded systems, where files may be large but RAM is scarce.
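One plausible mechanism for the non-linear slowdown described above, assuming the body is rebuilt on every chunk rather than appended in place, is sketched below. It counts the bytes copied by both strategies; the functions and the model are illustrative, not code taken from libhttpserver.

```cpp
#include <cstddef>
#include <string>

// Illustrative model (an assumption, not libhttpserver's code): a body that
// arrives in `n_chunks` pieces of `chunk_size` bytes is accumulated into one
// std::string, and we count how many bytes are copied along the way.
struct CopyStats {
  std::size_t final_size = 0;
  std::size_t bytes_copied = 0;
};

// Rebuilds the whole string on every chunk (`body = body + chunk`): each step
// copies the existing body plus the chunk, so total copying is quadratic.
CopyStats accumulate_by_rebuilding(std::size_t n_chunks, std::size_t chunk_size) {
  const std::string chunk(chunk_size, 'x');
  std::string body;
  CopyStats stats;
  for (std::size_t i = 0; i < n_chunks; ++i) {
    stats.bytes_copied += body.size() + chunk.size();
    body = body + chunk;  // builds a new string holding both operands
  }
  stats.final_size = body.size();
  return stats;
}

// Appends in place: the chunk is always copied, and the old contents are
// copied only when the buffer must grow. Common standard libraries grow the
// capacity geometrically, so the total stays proportional to the final size.
CopyStats accumulate_by_appending(std::size_t n_chunks, std::size_t chunk_size) {
  const std::string chunk(chunk_size, 'x');
  std::string body;
  CopyStats stats;
  for (std::size_t i = 0; i < n_chunks; ++i) {
    if (body.size() + chunk.size() > body.capacity()) {
      stats.bytes_copied += body.size();  // reallocation moves old contents
    }
    stats.bytes_copied += chunk.size();
    body.append(chunk);
  }
  stats.final_size = body.size();
  return stats;
}
```

For 1,000 chunks of 256 bytes, `accumulate_by_rebuilding` copies 256 · 1000 · 1001 / 2 = 128,128,000 bytes to produce a 256,000-byte body, while `accumulate_by_appending` copies only a small multiple of the final size (the exact figure depends on the standard library's growth policy). Storing a second copy in the args adds to that on top.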

So, in my opinion, the best solution is to let libhttpserver write the files directly to disk as it processes the data, rather than buffering them in memory.
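The disk-first approach suggested above can be sketched in plain C++17: each chunk is written to an already-open file as it arrives, so memory use is bounded by the chunk size instead of the file size. `DiskUploadSink`, `on_chunk`, and the file-naming scheme are hypothetical. In libhttpserver releases that include #257, the equivalent behavior is, as far as I know, configured through `create_webserver` options such as `file_upload_target` and `file_upload_dir`; check the project README for the exact names.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical sketch of "write uploads straight to disk": each chunk the
// request parser delivers is appended to an open file instead of a
// std::string, so memory use stays bounded by the chunk size.
class DiskUploadSink {
 public:
  // Opens a uniquely named file inside `dir`. The naming scheme (timestamp
  // plus a process-wide counter) is an assumption for illustration only.
  explicit DiskUploadSink(const std::filesystem::path& dir)
      : path_(dir / unique_name()), out_(path_, std::ios::binary) {
    if (!out_) throw std::runtime_error("cannot open " + path_.string());
  }

  // Called once per incoming chunk: writes it and keeps nothing in memory.
  void on_chunk(const char* data, std::size_t size) {
    out_.write(data, static_cast<std::streamsize>(size));
    if (!out_) throw std::runtime_error("write failed: " + path_.string());
    bytes_written_ += size;
  }

  // Flushes and closes the file; returns its path for the request handler.
  std::filesystem::path finish() {
    out_.close();
    return path_;
  }

  std::size_t bytes_written() const { return bytes_written_; }

 private:
  static std::string unique_name() {
    static std::atomic<unsigned long> counter{0};
    const auto ticks =
        std::chrono::steady_clock::now().time_since_epoch().count();
    return "upload_" + std::to_string(ticks) + "_" +
           std::to_string(counter++) + ".bin";
  }

  std::filesystem::path path_;
  std::ofstream out_;
  std::size_t bytes_written_ = 0;
};
```

A parser would create one sink per uploaded file, call `on_chunk` from its data callback, and pass the path returned by `finish()` to the request handler, which is roughly what the handler above currently does by hand after get_arg().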

Looking at your code, much of what you do with the retrieved arg is exactly what the PR adds as features.
So you might want to look at the file_upload example in the PR.
Then you can drop get_arg() entirely: libhttpserver will already have written the file to the file system, in a specified directory and with a unique name.

Okay, is there anything I can do right now to fix my issue, or should I wait for #257 to be merged?

etr commented

The PR has just been approved, so you might want to try whether the new options help in this case.

I quickly put in the code and everything is blazing fast now 💯
Thanks guys!

etr commented

Nice to hear; closing this issue then