dankelley/dod

add ability to search NCEI database


I've just spent about 2 hours trying to search for some data. Why must this be so hard? I ran across some things that looked useful (e.g., ref 1), but that interface is really quite confusing. When I finally got to a spot where I could request data, it did not give a fixed URL but sent an email with a temporary URL, which makes it useless for reproducible work. The file I got was in a format I've never seen, and it did not match the documentation: it was just a string of numbers and spaces. But I would not make much of that, because there were so many choices to make in the UI that maybe I got what I asked for. (Might it be the format that ODV uses?)

Then I found ref 2, which cannot be viewed in Safari, but is OK in Firefox. It is very nice because

  1. it has a map you can click to get data in Marsden squares
  2. clicking on that map gives URLs, which we can likely reverse-engineer. See for example ref 3.

Plan. I will be exploring the data for a while, but if things look promising, I may return here and write code so dod can download such data. I don't see any further UI to e.g. select by year or whatever, but frankly, my dear, I don't give a damn because in 10 minutes anybody can write code to do such things, and 10 minutes is a lot faster than the 2 hours I've spent so far today trying to find some bloody data.

PS. @richardsc might be interested in these links.

References

  1. https://www.ncei.noaa.gov/products/world-ocean-database
  2. https://www.ncei.noaa.gov/access/world-ocean-database/datawodgeo.html
  3. https://www.ncei.noaa.gov/data/oceans/woa/WOD/GEOGRAPHIC/CTD/OBS/CTDO7306.gz contains high-resolution CTD data for the box from 60 to 70W and 30 to 40N. The first 3 letters designate the sampler, the "O" means at observed levels (I think), and the number, 7306, is a Marsden Square code.
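Since ref 3 suggests the file names can be built by hand, here is a minimal Python sketch of that reverse engineering. It goes beyond what the site shows by assuming that the four-digit code follows the WMO 10-degree square convention: a quadrant digit (1=NE, 3=SE, 5=SW, 7=NW), then the tens digit of latitude, then two digits for the tens of longitude. That assumption at least agrees with 7306 being the box 30 to 40N, 60 to 70W. The helper names and the "O" to "OBS" directory mapping are hypothetical.

```python
# Hypothetical helpers: the URL layout is inferred from the single example in ref 3.
BASE = "https://www.ncei.noaa.gov/data/oceans/woa/WOD/GEOGRAPHIC"


def wmo_square(lon: float, lat: float) -> str:
    """Four-digit 10-degree square code, ASSUMING the WMO convention:
    quadrant digit (1=NE, 3=SE, 5=SW, 7=NW), tens of latitude, tens of longitude.
    Points exactly on a 10-degree line land in the higher box here."""
    if lat >= 0:
        quadrant = 1 if lon >= 0 else 7
    else:
        quadrant = 3 if lon >= 0 else 5
    return f"{quadrant}{int(abs(lat) // 10)}{int(abs(lon) // 10):02d}"


def wod_geographic_url(sampler: str, square: str, level: str = "O") -> str:
    """Build a file URL shaped like ref 3. Only "O" -> "OBS" is known from the
    example; any other level letter raises KeyError rather than guessing."""
    level_dir = {"O": "OBS"}[level]
    return f"{BASE}/{sampler}/{level_dir}/{sampler}{level}{square}.gz"


if __name__ == "__main__":
    square = wmo_square(-65.0, 35.0)  # a point inside 60-70W, 30-40N
    print(square)  # 7306
    print(wod_geographic_url("CTD", square))
```

If the convention holds, selecting a region becomes a loop over squares, which is about the level of "10 minutes of code" that the map interface seems to invite.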

Um, maybe I ought to curb my enthusiasm a bit. Below is a snapshot of an editor looking at the start of the file I downloaded for that box 7306. It looks to be in the same format as the other files I've received. I guess I'll need to dig into that format, or find another source.

Yup, a 2-million-line file of numbers and some letters. It looks as though a line starting with C might identify a cruise, or a station in a cruise.

To test that, I did

grep "^C" CTDO7306 | head
C58847173322249US42073197211 655312667664329817674-699983422240 3110111511041211
C58278573322276US42073197211 7553167014423501674-699667420830 311011151104120111
C44195811500731GB499011973 3 86642201674423235452-603931020 31191115110412911151
C44239811500732GB499011973 3166642143334423163452-636331030 31101115110412011151
C59922073237291US493831973 3172210055330050674-698160425580 31141115110412011151
C57262173322820US430811973 330-664359900674-679830418350 31101115110412011151104
C59745073237389US493931973 4 9331160664304150674-667917425780 311011151104120111
C55898473322984US5267361973 4284421500664304930452-6699415270 311011151104120111
C55934073323158US5114891973 5104421400664301830563-66917415350 31101115110412011
C59796273237844US491641973 6 733119055330150674-672330425910 3110111511041201115

and I see "US" in some lines and "GB" in others, so those look like country codes. But the rest of each line is harder to decode, and there is no point in guessing. I'll spend a few minutes searching for documentation on the format. If there is no documentation, it might still be worth adding something to dod, because maybe some user will know how to read the data. (I imagine ODV can read it, but I don't like ODV.)
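The same check can be sketched in Python: keep the lines that start with C and tally the two capital letters that follow the leading run of digits. The pattern comes only from the ten lines above; reading the letters as a country code is still a guess, and the function name is hypothetical.

```python
import re
from collections import Counter

# Observed in the grep output: "C", a run of digits, then two capital letters.
# Treating those two letters as a country code is a guess, not a documented field.
COUNTRY = re.compile(r"^C\d+([A-Z]{2})")


def country_codes(lines):
    """Tally the two-letter code found on each line that starts with "C"."""
    counts = Counter()
    for line in lines:
        match = COUNTRY.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "C58847173322249US42073197211 655312667664329817674-699983422240 3110111511041211",
    "C44195811500731GB499011973 3 86642201674423235452-603931020 31191115110412911151",
    "C59922073237291US493831973 3172210055330050674-698160425580 31141115110412011151",
]
print(country_codes(sample))  # Counter({'US': 2, 'GB': 1})
```

To scan the whole file without unpacking it first, `gzip.open("CTDO7306.gz", "rt")` can be passed as the `lines` argument, since a text-mode file iterates line by line.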

[Screenshot 2024-02-24 at 8:24:36 AM]

I see at https://www.nodc.noaa.gov/OC5/WOD/wod_programs.html that they provide some programs to read the data. I tried

  • wodC.c -- cannot compile with gcc
  • wodtodepthmatrix.c -- cannot compile with gcc
  • wodASC.f -- cannot compile with gfortran

Lots of jumping around on websites here. I am looking for station 61, located at 36 deg 40.03 min N, 70 deg 49.49 min W. At https://joa_old.cchdo.io/data/reid/Atlantic/entire I scrolled down to something in that lon-lat range and got to item

274. A1982EVO.TXT (Endeavor) | JOA: natl.1982.EV.70W.joa
     A section of 16 stations from 35.003 degrees North to 40.001 degrees North, along 70W. All have temperature, salinity, oxygen. Most have phosphate, silicate. None have nitrate. All stations have full depth casts.
     TXT: https://joa_old.cchdo.io/data_files/reid/nodc_sd2/A1982EVO.TXT
     JOA: https://joa_old.cchdo.io/data_files/reid/joa/natl.1982.EV.70W.joa
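To confirm that station 61 falls inside that section, its degree-minute position has to become decimal degrees. Here is a small sketch using the usual sign convention (S and W negative). The function name is hypothetical.

```python
def deg_min_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees plus decimal minutes to signed decimal degrees (S and W negative)."""
    value = degrees + minutes / 60.0
    return -value if hemisphere in ("S", "W") else value


lat = deg_min_to_decimal(36, 40.03, "N")
lon = deg_min_to_decimal(70, 49.49, "W")
print(f"{lat:.4f} {lon:.4f}")  # 36.6672 -70.8248

# The JOA entry describes stations from 35.003N to 40.001N along 70W.
print(35.003 <= lat <= 40.001)  # True
```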

That looks promising. I clicked the TXT link (https://joa_old.cchdo.io/data_files/reid/nodc_sd2/A1982EVO.TXT) and got a text file. Searching down, I see

0 3253080005819  730760603N36400W0705964820823   EV    042304252      00154252 1
00002199------088     61                           9    9    02000020          2
00015  266293 361453           904672               9    9    9   9   9   9    3
00102  213813 366743           904672               9    9    9   9   9   9    3
00208  186563 365663           904732               9    9    9   9   9   9    3
00312  180943 365313           904822               9    9    9   9   9   9    3
00399  177863 364963           904672               9    9    9   9   9   9    3
00584  158843 361613           904072               9    9    9   9   9   9    3
00784  114913 354913           903462               9    9    9   9   9   9    3
00945  073003 350843           903912               9    9    9   9   9   9    3
01170  049143 350023           905442               9    9    9   9   9   9    3
01391  043733 349773           905842               9    9    9   9   9   9    3
01574  041573 349953           905912               9    9    9   9   9   9    3
01876  039163 349793           906062               9    9    9   9   9   9    3
02177  036983 349753           906062               9    9    9   9   9   9    3
02476  034113 349663           906092               9    9    9   9   9   9    3
02773  031233 349633           906102               9    9    9   9   9   9    3
03066  028293 349403           906152               9    9    9   9   9   9    3
03450  024463 349173           906212               9    9    9   9   9   9    3
03850  022713 349033           906162               9    9    9   9   9   9    3
04117  022223 348983           906072               9    9    9   9   9   9    3
04252  022203 348983           906092               9    9    9   9   9   9    3

I'm just guessing here, but the lines

00015  266293 361453           904672               9    9    9   9   9   9    3
00102  213813 366743           904672               9    9    9   9   9   9    3

might mean depths of 15 and 102 m (or decibars, whatever), temperatures of 26.6293 and 21.3813 (guessing at the decimal place), and salinities of 36.1453 and 36.6743.

From other spots in the file, I think 9 is a missing-value code. I don't know what the 04672 is. But I'm going to try a plot of the data under my assumption.
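Here is a minimal Python sketch of that assumption, run on the two header lines and the first two data lines of the excerpt above. Everything in it comes from eyeballing the file, not from a format document:

* depth is the first field;
* temperature and salinity carry four implied decimals;
* 9 means missing;
* only lines whose last field is 3 are kept, because in this excerpt the header records end in 1 and 2 while the data records end in 3.

The function and constant names are hypothetical. Splitting on whitespace works for these lines, but a real reader would probably need fixed column positions, since a blank field would shift every field after it.

```python
MISSING = "9"  # guessed missing-value code


def scaled(field):
    """Apply the guessed four implied decimals, mapping the missing code to None."""
    return None if field == MISSING else int(field) / 10_000


def parse_profile(text):
    """Return (depth, temperature, salinity) tuples under the guesses described above."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[-1] != "3":
            continue  # skip blank lines and the header records (ending in 1 and 2)
        rows.append((int(fields[0]), scaled(fields[1]), scaled(fields[2])))
    return rows


excerpt = """\
0 3253080005819  730760603N36400W0705964820823   EV    042304252      00154252 1
00002199------088     61                           9    9    02000020          2
00015  266293 361453           904672               9    9    9   9   9   9    3
00102  213813 366743           904672               9    9    9   9   9   9    3
"""

for depth, temperature, salinity in parse_profile(excerpt):
    print(depth, temperature, salinity)
```

Feeding the full station block through `parse_profile` would give the depth, temperature, and salinity columns needed for the trial plot.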

PS to anybody reading this ... I know I am going far afield. When/if I sort some things out, I will likely make a blog posting about this. But, for now, it's a bit easier doing copy/paste into this spot.

I'm going to write some code now because I may see how to at least get some trial data. I'll return to this issue early in the week.

A video on the code-writing process is at https://youtu.be/zLPF9hvIGDk

This is almost a year old, and I've not been interested enough to actually accomplish the goal (as stated in the subject line), so I'll close this as not-interesting-enough-to-do.