Unexpected Result for software_path/qpl_op_expand/bit width=1
Closed this issue · 6 comments
Hi,
While experimenting with QPL, I observed unexpected behavior (see the reproducible example below) with the following settings:
- qpl_op_expand
- input bit width is 1
- single input byte
- mask consists of a single 0
- qpl_ow_8 as output format
Regardless of the practicality of this example, I would expect that the 0 mask bit writes a single (uint8_t)0 to the destination buffer for any sufficiently sized input. Furthermore, since a single byte was written, the job->total_out field should be 1. Instead I observe job->total_out to be 0.
I do not want to rule out that I am doing something wrong, perhaps I am misunderstanding some semantics.
Minimum reproducible example (based on expand_example.cpp):
#include <cstdint>   // for uint8_t, uint32_t
#include <cstdlib>   // for std::malloc, std::free
#include <iostream>
#include <vector>
#include <numeric>
#include <stdexcept> // for runtime_error
#include "qpl/qpl.h"
void run(uint32_t input_vector_width) {
    // Default to Software Path
    qpl_path_t execution_path = qpl_path_software;

    // Source and output containers
    std::vector<uint8_t> source      = {0b0000'0001};
    std::vector<uint8_t> destination = {0};
    std::vector<uint8_t> reference   = {0};

    qpl_job    *job;
    qpl_status status;
    uint32_t   size = 0;

    // Job initialization
    status = qpl_get_job_size(execution_path, &size);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error occurred while getting the job size.");
    }

    job    = (qpl_job *) std::malloc(size);
    status = qpl_init_job(execution_path, job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error occurred during job initialization.");
    }

    // Performing an operation
    job->next_in_ptr        = source.data();
    job->available_in       = static_cast<uint32_t>(source.size());
    job->next_out_ptr       = destination.data();
    job->available_out      = static_cast<uint32_t>(destination.size());
    job->op                 = qpl_op_expand;
    job->src1_bit_width     = input_vector_width;
    job->src2_bit_width     = 1;
    job->available_src2     = 1;
    job->num_input_elements = 1;
    job->out_bit_width      = qpl_ow_8;

    uint8_t mask = 0b0000000'0; // mask is single 0
    job->next_src2_ptr = const_cast<uint8_t *>(&mask);

    status = qpl_execute_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error occurred during job execution.");
    }

    const auto expand_size = job->total_out;
    if (expand_size != 1) {
        throw std::runtime_error("too few bytes");
    }

    // Freeing resources
    status = qpl_fini_job(job);
    if (status != QPL_STS_OK) {
        throw std::runtime_error("An error occurred during job finalization.");
    }
    std::free(job);

    // Check if everything was alright
    for (size_t i = 0; i < expand_size; i++) {
        if (destination[i] != reference[i]) {
            throw std::runtime_error("An incorrect value was produced by the operation.");
        }
    }

    std::cout << "Expand was performed successfully." << std::endl;
}

auto main(int argc, char** argv) -> int {
    run(3); // works
    run(2); // works
    run(1); // fails at assertion
    return 0;
}
Output:
Expand was performed successfully.
Expand was performed successfully.
terminate called after throwing an instance of 'std::runtime_error'
what(): too few bytes
Aborted
I did some rudimentary debugging, but I cannot quite figure out why the total_out is not being written properly. Here are some observations:
- expand kernel (qplc_expand_8u) picked
  - seems to work fine for both 1 and 2 bits
  - debugging showed that 0 is written to intermediate buffer
- perform_pack (in sources/middle-layer/analytics/output_stream.cpp:45)
  - different pack_index_kernel implementations are picked for 1 and 2 bits
  - 2 bits
    - resolves to qplc_pack_bits_nu -> qplc_pack_8u8u -> qplc_copy_8u
    - copies over 0 correctly
    - previous output stream creation (in sources/c_api/filter_operations/expand_job.cpp:163): .nominal(false)
  - 1 bit
    - resolves to qplc_pack_index_8u
    - does not advance dst_ptr since src_ptr[i] == 0 -> bytes_written() == 0 -> total_out == 0
    - previous output stream creation (in sources/c_api/filter_operations/expand_job.cpp:163): .nominal(true)
      - I do not get why the nominality of the output stream depends on the input bit width if the intermediate buffer has a bit width of 8
I am not familiar with QPL's internal structure, but I suppose a fix would include changing the .nominal line to something like job_ptr->out_bit_width == qpl_ow_nom. This should choose the right qplc_copy_8u implementation and advance the pointer properly (untested).
Hey @Jonas-Heinrich, thanks for the super descriptive issue.
I think that this is the expected behavior of QPL with these parameters. Using qpl_ow_8 (or qpl_ow_16/32) on a bit vector invokes output modification, where the output is a list of the indices of the 1s rather than the entire bit vector.
More information can be found in the Output Modification for Nominal Bit Vector Output section of the QPL documentation.
Sorry if the required information was tucked away in a difficult to find location. I'll work on getting it more easily accessible.
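To make the index-list behavior described above concrete, here is a minimal sketch in plain C++ that does not call QPL at all. The helper name indices_of_set_bits is hypothetical, and the LSB-first bit numbering and the 256-bit limit are assumptions made for the illustration, not statements about QPL internals.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not a QPL function: models the index list that replaces
// a nominal bit-vector result when an 8-bit output width is requested.
// Assumption: bits are numbered LSB-first within each byte.
std::vector<uint8_t> indices_of_set_bits(const std::vector<uint8_t>& packed_bits,
                                         std::size_t num_bits) {
    assert(num_bits <= packed_bits.size() * 8); // enough input bits are present
    assert(num_bits <= 256);                    // every index must fit in 8 bits

    std::vector<uint8_t> indices;
    for (std::size_t i = 0; i < num_bits; ++i) {
        const bool bit_is_set = ((packed_bits[i / 8] >> (i % 8)) & 1u) != 0;
        if (bit_is_set) {
            indices.push_back(static_cast<uint8_t>(i)); // one output byte per 1 bit
        }
    }
    return indices; // a 0 bit contributes nothing to the output
}
```

Under this model, a bit vector consisting of a single 0 yields an empty index list, which is consistent with the observed total_out of 0 in the report above.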
Hi @abdelrahim-hentabli, thank you for the reply!
Even though I had read the entire documentation, including the Output Modification for Nominal Bit Vector Output document you linked and even parts of the architecture documentation, I did not quite connect the dots in this way.
I can imagine why it has to be this way (AFAIK perhaps the output modification is a separate hardware "module" afterwards that does not care about the semantics of the previous operation) and accept that this is the intended behavior.
I tried to analyze where my confusion came from (since this is some time ago), and I want to share some notes in the hopes of informing documentation improvements:
Output Modification
[...]
Modification When Output is Normally a Bit Vector
[...]
Modification when Output is Normally an Array
If the output of a function is normally an array of elements, then the bit width of the output elements is normally the same as the input bit width; i.e., the output is packed.
(source: architecture document, p. 24)
Furthermore:
If you take the output of expand and perform a select operation on it (with the same bit vector as source-2), then you get back the same data as the original source-1. source
After reading the docs, I worked on other QPL-related stuff before turning to the expand operation. Combining the architecture document with the QPL expand operation documentation, and the way expand is explained as the inverse of select, led me to believe that the output is an array and therefore packed. That is where my expectation in the original question above comes from.
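The inverse relationship quoted above can be sketched with two small reference models in plain C++. The names expand_model and select_model are hypothetical, nothing here calls QPL, and elements and mask bits are stored one per vector entry purely to keep the logic readable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model, not a QPL function.
// expand: each 1 in the mask consumes the next source element; each 0 emits a 0.
std::vector<uint8_t> expand_model(const std::vector<uint8_t>& source,
                                  const std::vector<bool>& mask) {
    std::vector<uint8_t> out;
    std::size_t next = 0;
    for (bool take : mask) {
        if (take) {
            assert(next < source.size()); // the mask must not request more elements than exist
            out.push_back(source[next++]);
        } else {
            out.push_back(0);
        }
    }
    return out;
}

// Hypothetical reference model, not a QPL function.
// select: keep only the elements whose mask bit is 1.
std::vector<uint8_t> select_model(const std::vector<uint8_t>& input,
                                  const std::vector<bool>& mask) {
    assert(input.size() == mask.size()); // one mask bit per input element
    std::vector<uint8_t> out;
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (mask[i]) {
            out.push_back(input[i]);
        }
    }
    return out;
}
```

In this array-style model, a mask holding a single 0 produces one output element with value 0, which is the result the original question expected. It does not model the index-list output modification that applies when the nominal output is a bit vector.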
If I review the operations page again, I now realize in hindsight that expand outputs an Array or Bit Vector (implicitly depending on the bit width). The strict definitions of bit vector and array are also involved in this, but they are in yet another document.
Overall the documentation is correct and consistent; it's ultimately the scattering of definitions and their interactions that confused me. Just referring to the standalone documentation of an operation is not enough. I don't want you to change the documentation just to suit me, but perhaps this gave some insight into the thinking of somebody who is very interested in this library :)
BTW, the architecture doc specifies a "Force Array Output Modification Support" flag (p25, p31), which would make my program work as I originally expected -- right? However, I cannot seem to find support for it in QPL.
Hi @Jonas-Heinrich, this feature is in the works. I can notify you once it is implemented.
Hi @Jonas-Heinrich, Force Array Output Modification is now supported and available in QPL 1.5.0. Please check out the example at https://github.com/intel/qpl/blob/develop/examples/low-level-api/expand_with_force_array_output_mod_example.cpp and let me know if this resolves your initial request.
Closing, as the feature was completed and there has been no reply in a long time.