ethanbass/chromConverter

Inconsistent scaling in 'Agilent' `.ch` files

Closed this issue · 21 comments

Hi Ethan,

Firstly, chromConverter is awesome! Thanks for all your work on this; it's brilliant to be able to read these proprietary files into R. I've had a slight issue importing my `.ch` files: the intensity scaling doesn't seem to be correct. Opening the file in Agilent MassHunter gives intensities around 2E4, whereas the unscaled values appear to be around 6.66 (using your code for `read_chemstation_ch` and bypassing the data transformation line `data <- data * scaling_factor + intercept`). I've attached a link to a repository with the `.ch` file along with the two data frames showing scaled and unscaled intensities (I can't attach a `.ch` directly here). The usual location of the scaling factor in a 179 `.ch` file (which this is) doesn't appear to be correct for this file. I believe it is usually 0x127c, but in this `.ch` file from our instrument it doesn't appear to be so.
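For readers following along, the transformation being bypassed here is a linear rescale, `data * scaling_factor + intercept`. Below is a minimal Python sketch of reading a scaling factor from a fixed header offset and applying it. It runs on a synthetic buffer rather than a real file. The offset 0x127c is the one quoted above; the big-endian 8-byte float encoding, the zero intercept, and the helper name `read_header_double` are assumptions for illustration, not confirmed details of the format or of chromConverter.

```python
import struct

SCALING_FACTOR_OFFSET = 0x127C  # offset quoted in this thread for 179 files


def read_header_double(buf: bytes, offset: int, byte_order: str = ">") -> float:
    """Read one 8-byte float at `offset`. The byte order is an assumption."""
    return struct.unpack_from(byte_order + "d", buf, offset)[0]


# Synthetic stand-in for a file header, with a known factor written into it.
header = bytearray(0x1800)
struct.pack_into(">d", header, SCALING_FACTOR_OFFSET, 0.00013)

scaling_factor = read_header_double(bytes(header), SCALING_FACTOR_OFFSET)
intercept = 0.0  # assumed for the example

raw = [10000.0, 20000.0]
scaled = [x * scaling_factor + intercept for x in raw]
print(scaling_factor, scaled)
```

If the value read this way looks implausible, the offset or the byte order may simply not apply to that particular file.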

I must admit to being a bit out of my depth with interpreting this though! I'm not all too confident navigating it in hex editors etc. For what it's worth, the relative intensities are all correct (i.e. it's the correct chromatogram, and the peaks all look as they should), but I'm not sure where the scaling factor actually is.

I don't really need to do much analysis for this, all told. It was just to get it into R so I can apply themes to the chromatogram and keep it consistent with other plots I am making. So I could just scale the intensities from 0 to 100 and make it relative. But if you were able to offer any guidance about how to get the scaling correct then I'd be really grateful.
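The 0 to 100 fallback mentioned here is a min-max rescale. A minimal Python sketch follows; the function name `to_relative` and the sample values are made up for illustration.

```python
def to_relative(intensities):
    """Rescale so the minimum maps to 0 and the maximum maps to 100."""
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        raise ValueError("cannot rescale a flat signal")
    return [100.0 * (x - lo) / (hi - lo) for x in intensities]


raw = [6.0, 6.5, 8.0, 7.0]
relative = to_relative(raw)
print(relative)  # [0.0, 25.0, 100.0, 50.0]
```

Because a min-max rescale cancels any positive linear factor and offset, the relative trace comes out the same (up to floating-point rounding) whether or not the file's scaling factor was applied first.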

Cheers!
Tom

Attachments:
https://github.com/TomWarburtonIA/agilent_ch

Hmm, I reran this in a restarted RStudio environment, removing the data scaling (`data <- data` instead of `data <- data * scaling_factor + intercept`), and it now seems to give the correct intensities. There must have been some conflicting values/variables in my environment, as I was running each line of your function consecutively to track it rather than using the `read_chemstation_ch` function itself. I ran a couple of other FID .ch files through just in case, and it seems that for our instrument the values are unscaled. The original I attached earlier was a blank sample, and I wasn't sure if a scaling factor was something dynamically included if the intensities weren't large. Hope that all makes sense.

But all sorted now! Feel free to keep the .ch if it helps at all, I'll leave the repository up.

Hi Tom,

Thanks for checking in about this and providing the test files. I suspect this may be the same issue addressed in #22 -- it seems that MassHunter ignores the scaling factor for some ChemStation files, but when I opened the file in ChemStation it displays the scaled values.

In terms of actual solutions, I could add a "scale" argument to the "read_chemstation_ch" function to toggle the scaling if that would be helpful? I think I would keep the scaled ChemStation values as the default since the data is generated in ChemStation.

Best,
Ethan

Hi Ethan,

Thanks for getting back to me this quickly. Apologies for bringing up something that's already been discussed!

A scale toggle would be really helpful. For the work I'm going through at the moment I just saved a local version without the data scaling operation, but yeah, having it as a parameter directly in the function call would be great. Interesting that MassHunter and ChemStation differ like this; I don't really see the reason for it, to be honest.

Thanks again! :)

Cheers,
Tom

No worries! And yes, it is rather perplexing. I will add the toggle to the next update, probably in the next couple of days. I will update you here.

After taking a closer look at your file, I think you're right that something is off with the scaling factor in your file. The values shouldn't be on the scale of 10^-300. I'm going to put your file in ChemStation and see how it looks in there.
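One way to catch a case like this before it reaches a plot is a crude magnitude check on the factor prior to applying it. The Python sketch below is only an illustration: the name `looks_plausible` and the bounds are arbitrary choices, not anything taken from chromConverter or from Agilent documentation.

```python
import math


def looks_plausible(factor, lo=1e-12, hi=1e12):
    """Crude sanity check on a scaling factor; the bounds are illustrative."""
    return math.isfinite(factor) and lo <= abs(factor) <= hi


for candidate in (0.00013, 1e-300, float("nan")):
    print(candidate, looks_plausible(candidate))
```

A check like this cannot say which value is correct, only that a factor on the order of 10^-300 deserves a second look.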

Yeah, it's odd because it seems to be the standard location for scaling factors in literally any other 179 .ch file I've looked at other than those from our lab. As I say though, the actual values themselves aren't scaled in this file. Now I'm curious what is at the 0x127c location in our files!

These .ch files and .ms files are automatically synced with our integration software, which isn't anything to do with Agilent - could be the reason ours isn't scaled? Although I'm really clutching at straws!

So I tried to open your file in my version of ChemStation (Rev. B.04.02 SP1 [212]) and it's just showing a blank plot, so that doesn't really resolve anything, but it's kind of interesting. Maybe it is applying the scaling factor and getting confused because the values are too small.

Yeah, maybe? I've just tried opening it in OpenChrom and it gives the 'unscaled' values, i.e. ~2E4 (for both importing as .xy and .ocb).

[image: chromatogram as displayed in OpenChrom]

Just out of curiosity, I opened up the attached FID file from #22 in OpenChrom too and it also gave the 'unscaled' values, i.e. what it is showing in MassHunter.

[image: FID file from #22 as displayed in OpenChrom]

Do you have access to other 179 .ch files? I could try opening them in OpenChrom too to see which values it imports. Could it be there is something within the .ch file which informs whether there is scaling or not?

Ya, it's definitely possible... There are still some values in the headers that I don't know what they are.

I have a small collection of 179 files that I've accumulated from various sources. You're welcome to take a look! (https://cornell.box.com/s/yfsiaav9puyug5y39dqekealtujd8gfu). Let me know if you can't access them through the Box link for some reason.

Awesome, cheers! I'll have to take a look tomorrow, it's nearly 10pm here in the UK haha.

Thanks again! 😊

I added a `scale` argument in the new version (v0.6.3) to turn off the scaling for ChemStation CH files.
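Conceptually, the toggle just makes the linear rescale optional. The Python sketch below illustrates the idea only; it is not the package's R implementation, and the function name `convert_intensities` and its signature are hypothetical.

```python
def convert_intensities(raw, scaling_factor, intercept=0.0, scale=True):
    """Apply `raw * scaling_factor + intercept` unless scaling is switched off."""
    if not scale:
        return list(raw)  # return the stored values untouched
    return [x * scaling_factor + intercept for x in raw]


raw = [10000.0, 20000.0]
print(convert_intensities(raw, 0.00013))               # scaled values
print(convert_intensities(raw, 0.00013, scale=False))  # stored values as-is
```

Defaulting to `scale=True` mirrors the preference stated earlier in the thread to keep the scaled ChemStation values as the default.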

So I opened up the FID1A.ch file from 8byte > mustang > rainbow_yellow.D into OpenChrom, and it produced this chromatogram:

[image: FID1A.ch chromatogram in OpenChrom]

This file has the same scaling factor of ~0.00013 as in #22, so it is scaled accordingly in read_chemstation_ch. I also opened it up in our MassHunter Qualitative program, and that gave the same intensities:

[image: FID1A.ch chromatogram in MassHunter Qualitative]

So I have "Asterix". I guess one possibility is that the files from "Mustang" ChemStation are unscaled and my ChemStation is incorrectly interpreting them by applying the scaling factor. You don't by any chance have access to a Mustang ChemStation box you could check on, do you?

Ah, I'm afraid not. I've tried to hunt around in the MassHunter acquisition software but can't find anything about data scaling in the file outputs. Tbh I didn't know there was an Asterix version; I only knew about Mustang (GCMS) and Leonardo (GC). If you have a .ch for a sample you know is scaled using the scaling factor in the .ch file, could you send it over? I can try running it in OpenChrom, and if you have the .D file for it then I can try opening it in MassHunter Qualitative (that opens the MS data from a .D first and then extracts non-MS chromatograms, so it needs the full .D file).

Hi Tom,
I didn't know that the names corresponded to different detectors like that. Is this documented somewhere? As far as I can tell, my Asterix ChemStation has a GC "device" (unless the GC device is actually Mustang or Leonardo; I've never heard of this last one before?!). The reason I think my ChemStation is Asterix is because it says Asterix in the files it generates. Do you know if these code names are also reflected somewhere else where they could be looked up more directly?

I just checked out the intensities of all the ChemStation test files again (I can't open the Open Lab files). My version of ChemStation displays both of the Mustang files with the scaled intensities. Whether that's correct or not I can't say for sure. I noticed that the Asterix files are actually scaled incorrectly by chromConverter compared to how they appear in ChemStation, however the unscaled version is also wrong. Maybe the scaling factor has a different encoding in these files...?

Unfortunately, I already shared with you everything that I have as far as 179 files go. I don't have the full .D folders for most of those test files.

I've attached screenshots of the incorrectly scaled chromatograms from the Asterix files.

nhadler_FID1A

NV_FID1A

NV_FID2B

NV_TCD3C

Ethan

Tbh it was only one of the guys at the lab who told me about Leonardo. I think it's only for GC acquisition, but I'm really not too sure. All of the files from our lab seem to be Mustang.

But I found some possibly cool stuff! If you extract the zip I've attached and load the FID1A.ch into read_chemstation_ch without scaling, you get the data frame which I've attached below ('extracted_RT_and_unscaled_intensities.csv'). However, if you go into the 20240419_00177.D folder, then into AcqData, you can load both FID1.cd and FID1.cg into a hex editor (I use HxD), making sure the byte order is little endian. First go to offset 0x9E in FID1.cd: after the Unicode characters spelling "intensity", you find the number 1. Then, in FID1.cg, you can literally scroll through all of the intensities if you stay on the 4th byte of the row (changing to 8 bytes per row helps, as you can just press the down arrow and go through them all). So I think that the .ch file is probably a composite of the stuff in AcqData. Obviously I can't speak for it all as I only know our files, but it's possible the scaling for our stuff is held in the FID[number].cd file. I checked rainbow_yellow.D too, and it has the same number 1 at offset 0x9E in its FID1.cd file. Interesting stuff!

20240419_00177.D.zip
extracted_RT_and_unscaled_intensities.csv

Yeah, our data is acquired through MassHunter. I won't know without seeing more of these data files, but I think this is probably the cause of the scaling issue when using a MassHunter-acquired chromatogram. I'm not sure if there is a way to discern it specifically from the .ch file, so perhaps it's best left to the user to determine whether the data has been scaled when using read_chemstation_ch. If there isn't anything specific in the .ch file, then the only lookup I can think of off the top of my head would be to load the .D data folder, check whether there is an AcqData folder with an FID[number].cd file, and then do an intensity lookup from there using the above offset. Although I'm not 100% sure that the value at offset 0x9E in the .cd file is actually a scaling value; it just seems a bit coincidental otherwise.
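The lookup described above could be sketched roughly as follows in Python. This is a hedged illustration on a synthetic folder: the function name `peek_cd_files` is hypothetical, the `AcqData/FID*.cd` layout and the 0x9E offset are taken from this thread, and the sketch deliberately returns raw bytes without interpreting them, since it is not established what type of value is stored there.

```python
import tempfile
from pathlib import Path


def peek_cd_files(d_folder, offset=0x9E, n_bytes=8):
    """Return {file name: raw bytes at `offset`} for AcqData/FID*.cd files."""
    acq = Path(d_folder) / "AcqData"
    found = {}
    if not acq.is_dir():
        return found  # no AcqData folder, nothing to look up
    for cd_file in sorted(acq.glob("FID*.cd")):
        data = cd_file.read_bytes()
        found[cd_file.name] = data[offset:offset + n_bytes]
    return found


# Build a synthetic .D folder so the sketch runs without a real data set.
with tempfile.TemporaryDirectory() as tmp:
    d_folder = Path(tmp) / "example.D"
    acq_dir = d_folder / "AcqData"
    acq_dir.mkdir(parents=True)
    blob = bytearray(0x100)
    blob[0x9E] = 1  # mimic the "number 1" seen in the hex editor
    (acq_dir / "FID1.cd").write_bytes(blob)

    result = peek_cd_files(d_folder)
    missing = peek_cd_files(Path(tmp) / "no_such.D")

for name, chunk in result.items():
    print(name, chunk.hex(" "))
```

Returning the raw bytes keeps the interpretation (integer, float, flag) as a separate decision, which seems prudent while the meaning of the value is still a hypothesis.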

Ya, I'm also not 100% convinced though it certainly seems like a good hypothesis!

I think for now I will just leave it up to the user to toggle the scale factor, and maybe add a note to the documentation that CH files generated by MassHunter may be unscaled.

Thanks for looking into this! (and certainly happy to keep the conversation going if you make any further discoveries).

Yeah, it's probably best left up to the user. I think it's fair to assume a certain level of understanding of their .ch files, and if the scaled values seem to be inaccurate then it's just a simple logical argument to change when calling read_chemstation_ch.

No problem! It wouldn't be nearly as difficult if Agilent were at least consistent with data acquisition and documentation methods, but maybe that's the point!

Yes, it would certainly be nice!! I will close this for now, but feel free to reopen if anything comes up in connection with this!
All best,
Ethan