Does the order of `exam_id` correspond to the order of `tracings` in the full dataset?
rdyan0053 opened this issue · 3 comments
Hi,
I want to use the full dataset for training, but I only need the data of the AF category.
So I want to use the following code to extract the data of the AF category and convert it into the mat format. But I'm not sure whether the order of `exam_id` and `tracings` corresponds.
Here is my code:
import h5py
import numpy as np
import pandas as pd

path_to_file = '../code15/exams_part0.hdf5'
f = h5py.File(path_to_file, 'r')
traces_ids = np.array(f['exam_id'])  # (20001,)
traces = np.array(f['tracings'])  # (20001, 4096, 12)
df = pd.read_csv('../code15/exams.csv')
for i, (trace_id, trace) in enumerate(zip(traces_ids, traces)):  # I'm not sure whether `exam_id` and `tracings` order correspond
    print(trace_id, trace)
    # use the trace_id to get the label
    res = df[df['exam_id'] == trace_id]
    # compare a scalar rather than a whole Series; works whether AF was parsed as bool or as text
    if not res.empty and str(res['AF'].iloc[0]).upper() == 'TRUE':
        # save af file to mat file
        pass
You should reorder `df` according to `traces_ids` to make sure they match. See the pandas functions `reindex` / `set_index`.
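A minimal sketch of that reordering, using tiny synthetic stand-ins for `df` and `traces_ids` rather than the real files. The helper name `align_labels` is hypothetical, and the sketch assumes `exam_id` values are unique in the CSV and that `AF` is parsed as a boolean column:

```python
import numpy as np
import pandas as pd

def align_labels(df: pd.DataFrame, traces_ids: np.ndarray) -> pd.DataFrame:
    """Reorder df so that row i describes traces_ids[i] (hypothetical helper).

    Assumes exam_id is unique in df. IDs that are missing from df would
    come back as rows filled with NaN.
    """
    return df.set_index('exam_id').reindex(traces_ids)

# Synthetic stand-ins: the CSV covers more exams, in a different order,
# than the single HDF5 part does.
df = pd.DataFrame({'exam_id': [30, 10, 20, 40],
                   'AF': [False, True, False, True]})
traces_ids = np.array([10, 20, 30])

aligned = align_labels(df, traces_ids)
# Boolean mask in the same order as traces_ids.
af_mask = aligned['AF'].to_numpy(dtype=bool)
```

With the labels aligned this way, `af_mask` could be used to index any array that shares the order of `traces_ids`.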
Yeah, `traces_ids` corresponds to the `exam_id` in `df`. But I'm not sure if there is a one-to-one correspondence between `traces_ids` and `traces`.
I'm sorry I didn't describe the problem clearly. I want to explain again.
When I debug the code:
sub_data_path_to_file = '../code15/exams_part0.hdf5'
f = h5py.File(sub_data_path_to_file, 'r')
traces_ids = np.array(f['exam_id']) # (20001,)
traces = np.array(f['tracings']) # (20001,4096,12)
The values of `traces_ids` and `traces` are as follows:
Here, I am not sure
whether traces_ids[0] refers to traces[0],
whether traces_ids[1] refers to traces[1],
whether traces_ids[2] refers to traces[2],
......
I know there is a file `exams.csv` in this dataset, and that the values in `traces_ids` refer to `exam_id` in that CSV file. But from the CSV I can only find which HDF5 file a record belongs to. For example, if `traces_ids[0]=3158243`, I can tell that this record belongs to `exams_part0.hdf5`, but I cannot tell which index of `traces`, among its 20001 records, it refers to.
Hi @rdyan0053, thanks for the clarification, now I understand your question.
> Here, I am not sure whether the trace_ids[0] refers to the traces[0], whether the trace_ids[1] refers to the traces[1], whether the trace_ids[2] refers to the traces[2]

Yes, exactly.

> The traces_id is refer to exam_id in the csv file

Yes, you can match trace_id with exam_id.
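Given that confirmation, the AF tracings can be selected with a single boolean mask instead of a per-row loop. The following is only a sketch on synthetic arrays: `select_af` is a hypothetical helper, and it assumes the `AF` column is parsed as boolean by `pd.read_csv` (if it were text such as 'TRUE', it would need converting first).

```python
import numpy as np
import pandas as pd

def select_af(traces_ids: np.ndarray, traces: np.ndarray, df: pd.DataFrame):
    """Return (ids, tracings) for exams labelled AF (hypothetical helper).

    Relies on traces_ids[i] describing traces[i], as confirmed above.
    Assumes df['AF'] holds booleans.
    """
    af_ids = df.loc[df['AF'].astype(bool), 'exam_id'].to_numpy()
    mask = np.isin(traces_ids, af_ids)  # True where the exam is labelled AF
    return traces_ids[mask], traces[mask]

# Synthetic stand-ins with the same layout as the real arrays:
# ids have shape (n,), tracings have shape (n, samples, leads).
traces_ids = np.array([3, 1, 2])
traces = np.arange(3 * 4 * 2).reshape(3, 4, 2)
df = pd.DataFrame({'exam_id': [1, 2, 3, 4],
                   'AF': [True, False, True, False]})

af_ids, af_traces = select_af(traces_ids, traces, df)
```

The selected arrays could then be written to a mat file, for example with `scipy.io.savemat`, and a single record's position can be looked up with `np.flatnonzero(traces_ids == some_id)`.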