suyashkumar/dicom

Read/write of SL tag

Closed this issue ยท 12 comments

Dear all, great work. I have an application using this library to alter a few UIDs. When I load the altered dicom instance using pydicom, pydicom erred with this message:

pydicom.errors.BytesLengthException: Expected total bytes to be an even multiple of bytes per value. Instead received b'e\x01' with length 2 and struct format 'l' which corresponds to bytes per value of 4. This occurred while trying to parse (0019, 1026) according to VR 'SL'. To replace this error with a warning set pydicom.config.convert_wrong_length_to_UN = True.

Here is the tag before and after:
(0019,1026) SL 357 # 4, 1 DegreesOfAzimuth

(0019,1026) SL # 2, 0 DegreesOfAzimuth

It looks like the tag is altered and I can confirm we are not programmatically changing (0019,1026)

Any idea what the issue is?

Thanks

Hi there! At first glance, I wonder if this has to do with this tag being a private tag and the dataset being in an implicit transfer syntax. In this situation, when the dicom parser reads your DICOM and encounters this private tag, it doesn't know what VR to read it in as (because it's a private tag we don't have in our dictionary), so it uses a fallback. I actually have a wip branch to handle this fallback explicitly and correctly, where unknown VRs are defaulted to be read as undefined length SQ (see #220 (comment)).

Let me take a look back at that branch and see if we can get that merged and test it with a unit test to simulate your case.

So by default I don't think the above will address your issue, because it only applies to UN tags with an undefined length VL.

In any case, I'll see if I can take a closer look at this later this week to see what the best way to handle unknown private tags like this might be. Ideally what should happen is we should blindly read the value bytes (maybe as OW) and write them back on the other end if we're reading & writing in implicit mode. One caveat with that is that if the user makes a change such that the write back out is done in explicit mode, we may end up writing the data back out with an UN VR (not that we'd know the right VR anyway, unless the user modified the Element before write back out). Will think on this and prototype something.

EDIT: the default should be read as a string list in cases like yours. As for why the value is removed in your case I'm not sure, nor can I reproduce it with a test trying to simulate it (writing out an element like this and reading it back in).

Here's a branch to take a quick look, where I tried to repo one side of this. I created an element with UN VR with a stringList value type, which is what should be read in by the parser for an unknown tag. Things get written out and read back in as expected. Now I'll need to try to other side of this--that is, what are things read in as for your situation. You can see the next test case which attempts to emulate this by writing out an SL tag, which is later read in (and printed) by the test. The printed value looks like

[
  Tag: (0019,1027)
  Tag Name: 
  VR: VRStringList
  VR Raw: UN
  VL: 4
  Value: [d]
]

Which, given how the reader reads things back in by defaulting to strings for Unknown VR, seems correct to me (that the reader is functioning as we expect).

So in other words, I'm not sure how to best reproduce your case. If you're able to provide any more details or logs from the Go dicom parser to see what this element looks like (if there is no PHI or identifiable health information and you have the license to post) then that would be helpful! If there are any other aspects of your modification program that are relevant, and any WriteOptions you're passing, that would be helpful too.

In this case it looks like the length is indeed defined for the element even though its VR is unknown.
Wouldn't it make more sense to treat a UN element with a known length as a VRBytes value type, rather than a VRStringList Value Type?

From the DICOM spec:

If the Value Field has an Explicit Length then the Value Length Field shall contain a value equal to the length (in bytes) of the Value Field

Sorry if I wasn't clear last night--the default choice of stringlist or bytes aside (stringlist as a default was a legacy choice from a reference implementation, though it's more or less the same as bytes with some caveats, and maybe should be revisited), I can't reproduce this behavior of data being removed in the tag based on tests in the branch above. The example you show shows the VL changing from 4->2 as well, which is puzzling.

I'm guessing you're writing out in an implicit transfer syntax, so VR won't be written out, only Tag, VL, and the serialized value. As far as I can see in my toy examples, Tag, VL, and the serialized data appear to be written out correctly, with no VL changes like you see.

That's why I wanted to see if there was any more relevant context to help repro this case. I presume you're encountering no errors on your Write() call either? Are there any other details of your program or dicom that might be relevant?

No, no errors on the Write() call.

I think you did reproduce the issue with the second test in your branch. It gets back to the issue I mentioned with treating unknown, defined-length VRs as string lists.

To reproduce:

  1. add the following debug statement here and here

if t.String() == "(0019,1027)" { fmt.Printf("string data: %s, raw bytes: %+v\n", str, []byte(str)) }

  1. run make build on the branch you made
  2. you will see that the "raw bytes" slice gets shrunk because of the string treatment, even though the string value looks the same

Ah, I see what's going on--thanks! While the VL itself is not changing (the ValueLength is still written out as 4), trailing zeros (as seen in the string interpretation) are trimmed. I think moving to handling these as bytes (OW OB perhaps) as you suggest and I alluded to earlier in #231 (comment) is probably the best way to go. I will also say, perhaps the string implementation should be modified so it could roundtrip perfectly (that is, if at write stage the VL on the Element is 4, but the content is 1, we should write padded zeros, and look into reversing anything else that needs reversing if possible...)--but regardless, I think treating unknown tag Values as bytes is simple and honestly probably more useful for someone who wants to decode it using some custom logic.

Thanks for raising the issue and helping take a look!

Let me send a PR shortly to take a look at this!

quickly put together #232 which I think should address this issue. Will kick the tires some more before merging, lmk if any thoughts!

Thanks!

Thanks again for raising this issue (please re-open if needed)! Are y'all able to help test these new changes against your original use case? I can also cut a release version to include this change as well, if helpful!

It seems to have worked for the original use case, thanks!

Out of curiosity, why did you decide to interpret the unknown values as OW instead of OB?

Also, this doesn't apply to the use case we've discussed here, but I'm not sure this fix would work for a UN element with no length defined?

Per this comment it sounds like that case calls for parsing the element as a sequence

Glad to hear it worked for the original use case!

To your second comment--that's correct, the changes for UN element with undefined length are on this WIP branch: https://github.com/suyashkumar/dicom/tree/s/read-un-as-sq need clean it up, write some tests, etc. Hope to get that in soon!