ex-aws/ex_aws_s3

Streaming by line example skips final chunk

Closed this issue · 0 comments

Environment

  • Elixir & Erlang versions (elixir --version): Erlang/OTP 24 [erts-12.2.1] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1], Elixir 1.13.3 (compiled with Erlang/OTP 24)
  • ExAws version: * ex_aws (Hex package) (mix) locked at 2.3.1 (ex_aws) 90a02490
  • HTTP client version: * hackney (Hex package) (rebar3) locked at 1.18.1 (hackney) a4ecdaff

Current behavior

The Streaming by line example (https://hexdocs.pm/ex_aws_s3/ExAws.S3.html#download_file/4) covers exactly what I want to do, but it does not behave the way I expected. The code is meant to use Stream.chunk_while/4 to read the file data in chunks while emitting it line by line. When I supply a text file as input that looks something like this:

Firstline
Secondline
Thirdline
Fourthline
Fifthline

and I process it using the example plus the following code:

    file_line_count =
      generate_stream("my-bucket", upload_file_path)
      |> Enum.count()

I get 2. Inspecting the stream output, I see two lines emitted by the stream:

  1. Firstline
  2. Secondline\nThirdline\nFourthline\nFifthline\n

It looks like the split operation only happens once on the final chunk, regardless of how many lines it might contain.
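If that diagnosis is right, one possible workaround is to split each chunk on every newline, emit the complete lines as a list, and flatten the result. The following is only a minimal sketch under that assumption: the `LineStream` module and its function names are hypothetical and not part of ex_aws_s3, and it runs on an in-memory list of chunks rather than on a real S3 download. Note that the emitted lines do not keep their trailing "\n".

```elixir
defmodule LineStream do
  # Hypothetical helper (not part of ex_aws_s3): turns a stream of binary
  # chunks into a stream of lines, wherever the chunk boundaries fall.
  def lines(chunks) do
    chunks
    |> Stream.chunk_while("", &split_chunk/2, &flush/1)
    # Each chunk_while element is a list of lines; flatten it.
    |> Stream.flat_map(& &1)
  end

  defp split_chunk(chunk, acc) do
    # Split on every newline. The last piece may be an incomplete line,
    # so it is carried over as the accumulator for the next chunk.
    {complete, [rest]} =
      (acc <> chunk)
      |> String.split("\n")
      |> Enum.split(-1)

    {:cont, complete, rest}
  end

  # Emit whatever is left once the input ends, unless it is empty.
  defp flush(""), do: {:cont, ""}
  defp flush(rest), do: {:cont, [rest], ""}
end

# Simulated chunks: the boundary falls in the middle of "Secondline".
chunks = ["Firstline\nSecond", "line\nThirdline\nFourthline\nFifthline\n"]

chunks |> LineStream.lines() |> Enum.count()
# → 5
```

With a real download, the same `lines/1` call could presumably be applied to the output of `ExAws.stream!/1` in place of the example's own `Stream.chunk_while/4` step, though I have not verified that here.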

Expected behavior

I expect Enum.count/1 to return 5, and I expect each line of the input file to be emitted as its own element of the stream.