antonior92/automatic-ecg-diagnosis

Does the order of `exam_id` correspond to the order of `tracings` in the full dataset?

rdyan0053 opened this issue · 3 comments

Hi,
I want to use the full dataset for training, but I only need the AF (atrial fibrillation) records.

I want to use the following code to extract the AF records and save them in MATLAB .mat format, but I'm not sure whether the order of exam_id matches the order of tracings.
Here is my code:

```python
import h5py
import numpy as np
import pandas as pd
from scipy.io import savemat

path_to_file = '../code15/exams_part0.hdf5'
with h5py.File(path_to_file, 'r') as f:
    traces_ids = np.array(f['exam_id'])    # (20001,)
    traces = np.array(f['tracings'])       # (20001, 4096, 12)
df = pd.read_csv('../code15/exams.csv')
for trace_id, trace in zip(traces_ids, traces):  # I'm not sure whether `exam_id` and `tracings` order correspond
    # use the trace_id to get the label
    res = df[df['exam_id'] == trace_id]
    # res is a DataFrame; assumes pandas parsed the AF column as bool
    if len(res) == 1 and res['AF'].iloc[0]:
        # save the AF trace to a .mat file (hypothetical file name and key)
        savemat(f'af_{trace_id}.mat', {'tracing': trace})
```

You should reorder df according to traces_ids to make sure they match. See the pandas functions `reindex` and `set_index`.
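A minimal sketch of that suggestion, assuming exams.csv has a unique `exam_id` column and a boolean `AF` column; the helper name `align_labels` and the toy data are hypothetical. `set_index` makes exam_id the row label, and `reindex` returns the rows in the order given by traces_ids, so row i of the result describes traces[i]. IDs that do not appear in the CSV come back as all-NaN rows.

```python
import numpy as np
import pandas as pd


def align_labels(df: pd.DataFrame, traces_ids: np.ndarray) -> pd.DataFrame:
    """Return df rows reordered so that row i matches traces_ids[i].

    Assumes exam_id is unique in df; reindex raises on duplicate labels.
    IDs missing from df produce rows filled with NaN.
    """
    return df.set_index('exam_id').reindex(traces_ids)


# Hypothetical toy data standing in for exams.csv and the HDF5 exam_id array.
df = pd.DataFrame({'exam_id': [10, 20, 30], 'AF': [False, True, True]})
traces_ids = np.array([30, 10, 20, 0])  # 0: an ID with no matching CSV row

aligned = align_labels(df, traces_ids)
# eq(True) maps NaN (ID not in the CSV) to False, i.e. "not AF".
af_mask = aligned['AF'].eq(True).to_numpy()
print(af_mask)  # [ True False  True False]
```

With the real arrays, `traces[af_mask]` would then keep only the tracings whose aligned label is AF.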

Yes, traces_ids corresponds to exam_id in df.
But I'm not sure whether there is a one-to-one correspondence between traces_ids and traces.

Sorry, I didn't describe the problem clearly. Let me explain again.
When I debug this code:

```python
import h5py
import numpy as np

sub_data_path_to_file = '../code15/exams_part0.hdf5'
f = h5py.File(sub_data_path_to_file, 'r')
traces_ids = np.array(f['exam_id'])    # (20001,)
traces = np.array(f['tracings'])       # (20001, 4096, 12)
```

The values of traces_ids and traces are shown below:
[screenshot: debugger view of traces_ids and traces]
Here, I am not sure whether traces_ids[0] refers to traces[0], traces_ids[1] to traces[1], traces_ids[2] to traces[2], and so on.

I know the dataset includes a file exams.csv, and that traces_ids refers to exam_id in that CSV file. But from the CSV I can only find which HDF5 file a record belongs to. For example, if traces_ids[0] = 3158243, I can tell that the record is in exams_part0.hdf5, but I can't tell which index of the 20001-record traces array it corresponds to.

Hi @rdyan0053, thanks for the clarification; now I understand your question.

> Here, I am not sure whether traces_ids[0] refers to traces[0], traces_ids[1] to traces[1], traces_ids[2] to traces[2], ...

Yes, exactly.
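Because the correspondence is positional (exam_id[i] labels tracings[i]), finding the trace for a given exam ID reduces to finding that ID's position in the exam_id array. A minimal sketch, using small in-memory arrays in place of the HDF5 datasets; `trace_for_exam` is a hypothetical helper, and the length and uniqueness checks are added assumptions rather than something stated in this thread.

```python
import numpy as np


def trace_for_exam(exam_ids: np.ndarray, tracings: np.ndarray, exam_id: int) -> np.ndarray:
    """Return the tracing stored at the same position as exam_id."""
    if len(exam_ids) != len(tracings):
        raise ValueError('exam_id and tracings must have the same length')
    # Positions where this ID occurs; exactly one is expected.
    positions = np.flatnonzero(exam_ids == exam_id)
    if positions.size != 1:
        raise KeyError(f'expected one match for exam_id {exam_id}, found {positions.size}')
    return tracings[positions[0]]


# Toy stand-ins: 3 exams, each a (4, 2) "tracing" instead of (4096, 12).
exam_ids = np.array([3158243, 42, 7])
tracings = np.arange(3 * 4 * 2).reshape(3, 4, 2)

t = trace_for_exam(exam_ids, tracings, 42)
print(t[0])  # [8 9]
```

With the real file, passing `f['tracings']` (the h5py dataset) instead of a fully loaded array should let h5py read only the selected row from disk.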

> traces_ids refers to exam_id in the CSV file

Yes, you can match traces_ids with exam_id.
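Matching IDs directly is an alternative to reindexing: collect the exam IDs that the CSV marks as AF, then build a boolean mask over traces_ids with `np.isin` and use it to select the matching tracings. A minimal sketch, again assuming a boolean `AF` column; `select_af_traces` and the toy arrays are hypothetical, and saving to .mat with `scipy.io.savemat` is shown only as a comment with a hypothetical file name and keys.

```python
import numpy as np
import pandas as pd


def select_af_traces(df: pd.DataFrame, traces_ids: np.ndarray, traces: np.ndarray):
    """Return (ids, traces) for the exams whose AF label is True.

    Relies on the positional correspondence: traces_ids[i] labels traces[i].
    IDs absent from df are simply not selected.
    """
    af_ids = df.loc[df['AF'].eq(True), 'exam_id'].to_numpy()
    mask = np.isin(traces_ids, af_ids)
    return traces_ids[mask], traces[mask]


# Toy stand-ins for exams.csv and one HDF5 part.
df = pd.DataFrame({'exam_id': [10, 20, 30], 'AF': [False, True, True]})
traces_ids = np.array([30, 10, 20, 0])
traces = np.arange(4 * 3 * 2).reshape(4, 3, 2)

ids, af_traces = select_af_traces(df, traces_ids, traces)
print(ids)              # [30 20]
print(af_traces.shape)  # (2, 3, 2)

# To write them out (hypothetical file name and variable keys):
# from scipy.io import savemat
# savemat('af_part0.mat', {'exam_id': ids, 'tracings': af_traces})
```

The mask keeps the HDF5 order, so `ids[k]` still labels `af_traces[k]` after selection.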