intel/qpl

Unexpected Result for software_path/qpl_op_expand/bit width=1

Closed this issue · 6 comments

Hi,

While experimenting with QPL, I observed unexpected behavior (see the reproducible example below) with the following settings:

  • qpl_op_expand
  • input bit width is 1
  • single input byte
  • mask consists of a single 0
  • qpl_ow_8 as output format

Regardless of the practicality of this example, I would expect that the 0 mask bit writes a single (uint8_t)0 to the destination buffer for any sufficiently sized input. Furthermore, since a single byte was written, the job->total_out field should be 1. Instead I observe job->total_out to be 0.

I do not want to rule out that I am doing something wrong, perhaps I am misunderstanding some semantics.


Minimum reproducible example (based on expand_example.cpp)

#include <cstdint>   // for uint8_t, uint32_t
#include <cstdlib>   // for std::malloc, std::free
#include <iostream>
#include <stdexcept> // for runtime_error
#include <vector>

#include "qpl/qpl.h"

void run(uint32_t input_vector_width) {
    // Default to Software Path
    qpl_path_t execution_path = qpl_path_software;

    // Source and output containers
    std::vector<uint8_t> source      = {0b0000'0001};
    std::vector<uint8_t> destination = {0};
    std::vector<uint8_t> reference   = {0};

    qpl_job    *job;
    qpl_status status;
    uint32_t   size                  = 0;

    // Job initialization
    status = qpl_get_job_size(execution_path, &size);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job size getting.");
    }

    job    = (qpl_job *) std::malloc(size);
    status = qpl_init_job(execution_path, job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job initializing.");
    }

    // Performing an operation
    job->next_in_ptr        = source.data();
    job->available_in       = static_cast<uint32_t>(source.size());
    job->next_out_ptr       = destination.data();
    job->available_out      = static_cast<uint32_t>(destination.size());
    job->op                 = qpl_op_expand;
    job->src1_bit_width     = input_vector_width;
    job->src2_bit_width     = 1;
    job->available_src2     = 1;
    job->num_input_elements = 1;
    job->out_bit_width      = qpl_ow_8;
    uint8_t mask            = 0b0000000'0; // mask is single 0
    job->next_src2_ptr      = const_cast<uint8_t *>(&mask);

    status = qpl_execute_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job execution.");
    }

    const auto expand_size = job->total_out;
    if (expand_size != 1) {
        throw std::runtime_error("too few bytes");
    }

    // Freeing resources
    status = qpl_fini_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error acquired during job finalization.");
    }

    std::free(job);

    // Check if everything was alright
    for (size_t i = 0; i < expand_size; i++) {
        if (destination[i] != reference[i]) {
            throw std::runtime_error("Incorrect value was chosen while operation performing.");
        }
    }

    std::cout << "Expand was performed successfully." << std::endl;
}

auto main(int argc, char** argv) -> int {
    run(3); // works
    run(2); // works
    run(1); // fails: total_out is 0, so the size check throws
    return 0;
}

Output:

Expand was performed successfully.
Expand was performed successfully.
terminate called after throwing an instance of 'std::runtime_error'
  what():  too few bytes
Aborted

I did some rudimentary debugging, but I cannot quite figure out why the total_out is not being written properly. Here are some observations:

  • expand kernel (qplc_expand_8u) picked
    • seems to work fine for both 1 and 2 bits
    • debugging showed that 0 is written to intermediate buffer
  • perform_pack
    • (in sources/middle-layer/analytics/output_stream.cpp:45)
    • different pack_index_kernel implementations are picked for 1 and 2 bits
    • 2 bits
      • resolves to qplc_pack_bits_nu -> qplc_pack_8u8u -> qplc_copy_8u
      • copies over 0 correctly
      • previous output stream creation:
        • (in sources/c_api/filter_operations/expand_job.cpp:163)
        • .nominal(false)
    • 1 bit
      • resolves to qplc_pack_index_8u
      • does not advance dst_ptr since src_ptr[i] == 0 -> bytes_written() == 0 -> total_out == 0
      • previous output stream creation:
        • (in sources/c_api/filter_operations/expand_job.cpp:163)
        • .nominal(true)
        • I do not get why the nominality of the output stream depends on the input bit width if the intermediate buffer has a bit width of 8

I am not familiar with QPL's internal structure, but I suppose a fix would include changing the .nominal line to something like job_ptr->out_bit_width == qpl_ow_nom. This should choose the right qplc_copy_8u implementation and advance the pointer properly (untested).

Hey @Jonas-Heinrich, thanks for the super descriptive issue.

I think this is the expected behavior of QPL with these parameters. When the output is nominally a bit vector, choosing qpl_ow_8 (or qpl_ow_16/32) invokes output modification: the output becomes a list of the indices of the 1 bits rather than the bit vector itself.

More information can be found in the Output Modification for Nominal Bit Vector Output section of the QPL documentation.
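To make the index-style output concrete, here is a minimal standalone model of that output modification. It is not QPL code: the function name `indices_of_set_bits` is hypothetical, and the sketch assumes the bit vector is packed least-significant bit first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model, not part of QPL: list the positions of the 1 bits among
// the first `num_bits` bits of a packed bit vector as 8-bit indices.
// Assumption: bit i lives in byte i / 8 at bit position i % 8 (LSB first).
std::vector<uint8_t> indices_of_set_bits(const std::vector<uint8_t>& bits,
                                         std::size_t num_bits) {
    assert(num_bits <= bits.size() * 8); // caller must supply enough bits
    assert(num_bits <= 256);             // every index must fit into uint8_t

    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < num_bits; ++i) {
        const bool is_set = ((bits[i / 8] >> (i % 8)) & 1u) != 0;
        if (is_set) {
            out.push_back(static_cast<uint8_t>(i)); // emit the index, not the bit
        }
    }
    return out;
}
```

Under this model, a bit vector consisting of a single 0 bit produces an empty index list, which matches the observed total_out of 0.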

Sorry if the required information was tucked away in a difficult to find location. I'll work on getting it more easily accessible.

Hi @abdelrahim-hentabli, thank you for the reply!

Even though I had read the entire documentation, including the Output Modification for Nominal Bit Vector Output document you linked and even parts of the architecture documentation, I did not quite connect the dots in this way.

I can imagine why it has to be this way (perhaps the output modification is a separate hardware "module" that runs afterwards and does not care about the semantics of the previous operation), and I accept that this is the intended behavior.


I tried to analyze where my confusion came from (since this was some time ago), and I want to share some notes in the hope of informing documentation improvements:

Output Modification

[...]

Modification When Output is Normally a Bit Vector

[...]

Modification when Output is Normally an Array

If the output of a function is normally an array of elements, then the bit width of the output elements is normally the same as the input bit width; i.e., the output is packed.

(source: architecture document, p. 24)

Furthermore:

If you take the output of expand and perform a select operation on it (with the same bit vector as source-2), then you get back the same data as the original source-1.

(source: QPL expand operation documentation)

After reading the docs, I worked on other QPL-related stuff before turning to the expand operation. Just combining the architecture document with the QPL expand operation documentation, and how it is explained as the inverse of select, would lead me to believe that the output is an array and therefore packed. (This is where my expectation in the original question above comes from.)
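The array reading described above can be written down as a small model. This is only a sketch of the semantics as I understood them, working on already-unpacked elements; `expand_model` and `select_model` are hypothetical names and not QPL functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of expand: one output element per mask bit. A 1 bit
// consumes the next source element, a 0 bit produces a zero element.
std::vector<uint32_t> expand_model(const std::vector<uint32_t>& src,
                                   const std::vector<bool>& mask) {
    std::vector<uint32_t> out;
    out.reserve(mask.size());
    std::size_t next = 0;
    for (bool bit : mask) {
        if (bit) {
            assert(next < src.size()); // enough source elements for the 1 bits
            out.push_back(src[next++]);
        } else {
            out.push_back(0);
        }
    }
    return out;
}

// Hypothetical model of select: keep the elements whose mask bit is 1.
std::vector<uint32_t> select_model(const std::vector<uint32_t>& src,
                                   const std::vector<bool>& mask) {
    assert(src.size() == mask.size()); // one mask bit per element
    std::vector<uint32_t> out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        if (mask[i]) {
            out.push_back(src[i]);
        }
    }
    return out;
}
```

With a mask of a single 0, `expand_model` returns one zero element, which is the one-byte result I originally expected from qpl_ow_8. The round trip through `select_model` only restores the source when the number of 1 bits in the mask equals the number of source elements.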

Reviewing the operations page again, I now realize in hindsight that expand outputs an Array or Bit Vector (implicitly depending on the bit width). The strict definitions of bit vector and array also play into this, but they are in yet another document.

Overall, the documentation is correct and consistent; it's ultimately the scattering of definitions and their interactions that confused me. Just referring to the standalone documentation of an operation is not enough. I don't want you to change the documentation just to suit me, but perhaps this gives some insight into the thinking of somebody who is very interested in this library :)

BTW, the architecture doc specifies a "Force Array Output Modification Support" flag (p25, p31), which would make my program work as I originally expected -- right? However, I cannot seem to find support for it in QPL.

Hi @Jonas-Heinrich, this feature is in the works. I can notify you once it is implemented.

Hi @Jonas-Heinrich, Force Array Output Modification is now supported and available in QPL 1.5.0. Please check out the example at https://github.com/intel/qpl/blob/develop/examples/low-level-api/expand_with_force_array_output_mod_example.cpp and let me know if this resolves your initial request.

Closing, as the feature was completed and there has been no reply for a long time.