robertoostenveld/bids-tools

CTF's <dataset>.infods file contains the date and time of data acquisition


I came across this while removing the dates and timestamps from the *.res4 files of the data that Kristijan and I are going to share with the rest of the world.
I was surprised that ft_read_header still returned the date of acquisition. After running ctf_remove_datetime, hdr.orig.res4 is nicely stripped, of course, but hdr.orig.infods still contains this information.

To reproduce:

cd /project/3011020.13/bids/sub-V1020/meg
hdr = ft_read_header('sub-V1020_task-visual_meg.ds');
hdr.orig.infods(41)   % this entry still shows the acquisition date

Alternatively, you can open the sub-V1020_task-visual_meg.infods file directly in a text viewer.
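To check which timestamps remain in a given .infods file without starting MATLAB, a small Python sketch like the one below could list them. It assumes, based only on the .infods dump later in this thread and not on CTF documentation, that a timestamp is stored as 14 ASCII digits (YYYYMMDDhhmmss) following a tag whose name ends in DATETIME, separated from it by a few non-digit bytes. The names `find_datetimes` and `find_datetimes_in_file` are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# ASSUMPTION (inferred from the .infods dump in this thread, not from CTF
# documentation): a timestamp is stored as 14 ASCII digits (YYYYMMDDhhmmss)
# after a tag ending in "DATETIME", separated from the tag by a few
# non-digit bytes (type and length fields). Empty entries have no digits.
DATETIME_PATTERN = re.compile(rb"(_[A-Z_]*DATETIME)[^0-9]{0,12}?([0-9]{14})")


def find_datetimes(data: bytes) -> list[tuple[str, str]]:
    """Return (tag, timestamp) pairs for every non-empty *DATETIME entry."""
    return [
        (m.group(1).decode("ascii"), m.group(2).decode("ascii"))
        for m in DATETIME_PATTERN.finditer(data)
    ]


def find_datetimes_in_file(path: str | Path) -> list[tuple[str, str]]:
    """Read a binary .infods file and list the timestamps it contains."""
    return find_datetimes(Path(path).read_bytes())
```

For a file like the one dumped below, this should report the _DATASET_CREATORDATETIME and _DATASET_LASTMODIFIEDDATETIME entries, while empty DATETIME tags are skipped.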

I assume that the .infods files are usually not removed from the data before sharing, so I would be game to write some voodoo shell script that scrubs the dates from the .infods file.
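Instead of a shell script, the same idea can be sketched in Python. The key point is to overwrite each timestamp with a placeholder of exactly the same length, so that no length field or offset elsewhere in the binary file has to change and the file can be edited in place. This assumes the tag/value layout seen in the .infods dump in this thread; `scrub_datetimes` and the placeholder value are hypothetical choices, not part of any CTF or FieldTrip API.

```python
import re

# ASSUMPTION (inferred from the .infods dump in this thread): each timestamp
# is 14 ASCII digits following a tag that ends in "DATETIME".
DATETIME_PATTERN = re.compile(rb"(_[A-Z_]*DATETIME[^0-9]{0,12}?)([0-9]{14})")

# Hypothetical placeholder value. It must have the same length as the
# original so that no length prefix or file offset needs to be updated.
PLACEHOLDER = b"19000101000000"


def scrub_datetimes(data: bytes, placeholder: bytes = PLACEHOLDER) -> bytes:
    """Overwrite every *DATETIME value with a same-length placeholder."""
    if len(placeholder) != 14 or not placeholder.isdigit():
        raise ValueError("placeholder must be exactly 14 ASCII digits")
    # Keep the tag and the bytes between tag and value; swap only the digits.
    return DATETIME_PATTERN.sub(lambda m: m.group(1) + placeholder, data)
```

Applied to a single file (after making a backup), this would be `path.write_bytes(scrub_datetimes(path.read_bytes()))` with `path` a `pathlib.Path`.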

Should we continue with these scripts, or use https://www.fieldtriptoolbox.org/faq/how_can_i_anonymize_a_ctf_dataset/#using-matlab?

I recently used the latter for a Parkinson MEG dataset.

It is this dataset: https://data.donders.ru.nl/collections/di/dccn/DSC_3018009.04_857. I checked one .infods file and it looked like this:

WS1_
_PATIENT_INFO
  WS1_
  _PATIENT_UID
  _PATIENT_NAME_FIRST
  _PATIENT_NAME_MIDDLE
  _PATIENT_NAME_LAST
  _PATIENT_ID                     NOT FOR CLINICAL USE
  _PATIENT_BIRTHDATE
  _PATIENT_SEX
  _PATIENT_PACS_NAME
  _PATIENT_PACS_UID
  _PATIENT_INSTITUTE              NOT FOR CLINICAL USE
  EndOfParameters
_PROCEDURE_INFO
  WS1_
  _PROCEDURE_VERSION
  _PROCEDURE_UID
  _PROCEDURE_ACCESSIONNUMBER
  _PROCEDURE_TITLE
  _PROCEDURE_SITE
  _PROCEDURE_STATUS
  _PROCEDURE_TYPE
  _PROCEDURE_STARTEDDATETIME
  _PROCEDURE_CLOSEDDATETIME
  _PROCEDURE_COMMENTS             writeCTFds  NOT FOR CLINICAL USE
  _PROCEDURE_LOCATION
  _PROCEDURE_ISINDB
  EndOfParameters
_DATASET_INFO
  WS1_
  _DATASET_VERSION
  _DATASET_UID
  _DATASET_PATIENTUID
  _DATASET_PROCEDUREUID
  _DATASET_STATUS                 writeCTFds  NOT FOR CLINICAL USE
  _DATASET_RPFILE
  _DATASET_PROCSTEPTITLE          run title  NOT FOR CLINICAL USE
  _DATASET_PROCSTEPPROTOCOL
  _DATASET_PROCSTEPDESCRIPTION
  _DATASET_COLLECTIONDATETIME
  _DATASET_COLLECTIONSOFTWARE     writeCTFds
  _DATASET_CREATORDATETIME        20210409140701
  _DATASET_CREATORSOFTWARE        writeCTFds
  _DATASET_KEYWORDS
  _DATASET_COMMENTS               NOT FOR CLINICAL USE
  _DATASET_OPERATORNAME
  _DATASET_LASTMODIFIEDDATETIME   20210409140701
  _DATASET_NOMINALHCPOSITIONS
  _DATASET_COEFSFILENAME
  _DATASET_SENSORSFILENAME
  _DATASET_SYSTEM
  _DATASET_SYSTEMTYPE
  _DATASET_LOWERBANDWIDTH
  _DATASET_UPPERBANDWIDTH
  _DATASET_ISINDB
  _DATASET_HZ_MODE
  _DATASET_FITERRORTOLERANCE
  _DATASET_MOTIONTOLERANCE
  _DATASET_MAXHEADMOTION
  _DATASET_MAXHEADMOTIONTRIAL
  _DATASET_MAXHEADMOTIONCOIL      3
  _DATASET_CROSSTALKENABLED
  EndOfParameters
EndOfParameters

I think the referenced strategy is better. For my current use case, however, it feels like overkill, because it requires creating a full copy of the data (i.e., updating the descriptor files in place does not seem possible without hacking the code).

Also, from reading writeCTFds.m, it seems that the hz.ds/hz2.ds subdirectories are not included in the output (although I am not sure whether that would be a problem).

It also seems that the *.acq files may contain run_date and run_time.

writeCTFds uses writeCPersist to write the .acq and .infods files. Since this is a separate function (i.e., not a subfunction of writeCTFds), it should be possible to overwrite these metadata files without rewriting the whole data directory.

OK, I have written a prototype function (inspired by the function that @robertoostenveld referred to above) that rewrites the files in the *.ds directory that contain dates and times, i.e., the .res4, .acq and .infods files, without creating a full copy of the binary data. Currently, my prototype moves the originals to *.res4_old etc., but once we are sure that it works correctly, I think the originals can simply be overwritten. Would it be an idea to use this code to update the referenced web page, and/or to make this part of the standard bidsification procedure in data2bids?
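The prototype itself is not included in this thread. As a rough illustration of the workflow it describes (move each metadata file to *_old, then write a scrubbed version under the original name, never touching the large binary data), here is a hypothetical Python sketch. `rewrite_ds_metadata` and the `scrub` callback are made-up names; producing correct scrubbed bytes for each file type (.res4, .acq, .infods) is left entirely to the callback, since the three formats differ.

```python
from __future__ import annotations

from collections.abc import Callable
from pathlib import Path

# The metadata files named in this thread as containing dates and times.
METADATA_SUFFIXES = (".res4", ".acq", ".infods")


def rewrite_ds_metadata(ds_dir: str | Path,
                        scrub: Callable[[bytes], bytes]) -> list[Path]:
    """Rewrite the metadata files of a CTF .ds directory (hypothetical helper).

    Each original is first moved to <name>_old (e.g. x.res4 -> x.res4_old),
    then a scrubbed copy is written under the original name. Files with other
    suffixes (such as the .meg4 data) are never read or copied, and
    subdirectories such as hz.ds are not visited.
    """
    ds_dir = Path(ds_dir)
    rewritten = []
    for suffix in METADATA_SUFFIXES:
        for path in sorted(ds_dir.glob(f"*{suffix}")):
            backup = path.with_name(path.name + "_old")
            if backup.exists():
                # Refuse to clobber an earlier backup, which may be the only
                # remaining copy of the true original.
                raise FileExistsError(backup)
            path.rename(backup)
            path.write_bytes(scrub(backup.read_bytes()))
            rewritten.append(path)
    return rewritten
```

Refusing to overwrite an existing *_old file means that an accidental second run cannot destroy the only copy of the original; once the scrubbing is trusted, the backup step could be dropped.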

@KristijanArmeni I will soon do a full sweep of the Sherlock data to scrub the date and time (and the operator name :) ) from it.