n0obcoder/NIH-Chest-X-Rays-Multi-Label-Image-Classification-In-Pytorch

ValueError: a must be non-empty

ljk1072911239 opened this issue · 13 comments

When I run the code, why does the error below occur? How can I solve it? Thanks!
File "/home/a504/linjinke/NIH-Chest-X-Rays-Multi-Label-Image-Classification-In-Pytorch-master/datasets.py", line 111, in choose_the_indices
for i in tqdm(list(np.random.choice(range(length),length, replace = False))):
File "mtrand.pyx", line 1126, in mtrand.RandomState.choice
ValueError: a must be non-empty

Hi @ljk1072911239 ! :)
Thanks for reaching out to me. Could you first check whether the variable length is non-zero? If it is, try printing
np.random.choice(range(length), length, replace = False)
and kindly share the output with me.
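For context, np.random.choice raises exactly this error whenever the population it samples from is empty, so length being 0 would reproduce it. A minimal standalone sketch (outside the repo's code, with a hypothetical length value):

import numpy as np

length = 0  # what choose_the_indices would see if the dataframe were empty
print('length:', length)
# numpy raises ValueError: a must be non-empty whenever range(length)
# has zero elements, matching the traceback above
np.random.choice(range(length), length, replace=False)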

I think there is some path mismatch of the dataset. Have you downloaded the 42 GB dataset? If yes, pls try printing self.df.shape in class XRaysTrainDataset(Dataset) after line 24 in datasets.py.
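Something like this (a sketch; the exact spot is right after self.df is built, around line 24 in this repo's datasets.py):

# inside XRaysTrainDataset.__init__, right after self.df is created
print('self.df.shape:', self.df.shape)
# a zero row count here would mean the csv was never read properly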

Pls let me know if this solves your problem
Thanks :)

Ok, try this. In line 72 of datasets.py, there is a variable named train_val_list.
Make sure that this list is not empty. If it is empty, you need to further debug the get_train_val_df function defined at line 69 of datasets.py.
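A quick sanity check could look like this (a sketch using the names mentioned above; adjust to where the list is built in your copy):

# around line 72 of datasets.py, right after train_val_list is built
print('len(train_val_list):', len(train_val_list))
print('first entries:', train_val_list[:3])
# an empty list here means the upstream function returned nothing to sample from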

Pls let me know the outcomes once you have tried out these things :)

@ljk1072911239 pls let me know if you have been able to remove the bug, so that I can close the issue
Thanks :)

Sorry, I cannot figure it out. The only change I made is in get_train_val_list, where I replaced:
f = open(os.path.join('data', 'NIH Chest X-rays', 'train_val_list.txt'), 'r')
with:
f = open(os.path.join('./data/sample/', 'sample_labels.csv'), 'r')

I do not know why its output is:
Overwriting stage to 1, as the model training is being done from scratch
TRAINING THE MODEL FROM SCRATCH

data/sample/sample_labels.csv found: True
self.df.shape: (5606, 2)

train_val_df.pickle: loaded
self.train_val_df.shape: (0, 0)

Sampling the huuuge training dataset
Traceback (most recent call last):
File "/home/a504/linjinke/NIH-Chest-X-Rays-Multi-Label-Image-Classification-In-Pytorch-master/main.py", line 68, in
XRayTrain_dataset = XRaysTrainDataset(data_dir, transform = config.transform)
File "/home/a504/linjinke/NIH-Chest-X-Rays-Multi-Label-Image-Classification-In-Pytorch-master/datasets.py", line 47, in init
self.the_chosen, self.all_classes, self.all_classes_dict = self.choose_the_indices()
File "/home/a504/linjinke/NIH-Chest-X-Rays-Multi-Label-Image-Classification-In-Pytorch-master/datasets.py", line 111, in choose_the_indices
for i in tqdm(list(np.random.choice(range(length),length, replace = False))):
File "mtrand.pyx", line 1126, in mtrand.RandomState.choice
ValueError: a must be non-empty

@ljk1072911239 See, the problem is that you are somehow not able to read the csv properly. That's why self.train_val_df.shape is (0, 0).

Make sure that the path you are providing is correct. You can do that by using the os.path.exists function.
Try this...
os.path.exists( os.path.join('./data/sample/', 'sample_labels.csv') )

If everything is correct, True should be returned.

It is True.
And I ran this code:
f = open(os.path.join('./data/sample/', 'sample_labels.csv'), 'r')
print('f', f)
print(os.path.exists(os.path.join('./data/sample/', 'sample_labels.csv')))
train_val_list = str.split(f.read(), '\n')
print( train_val_list)

The output is:
f <_io.TextIOWrapper name='./data/sample/sample_labels.csv' mode='r' encoding='UTF-8'>
True
['Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImageWidth,OriginalImageHeight,OriginalImagePixelSpacing_x,OriginalImagePixelSpacing_y', '00000013_005.png,Emphysema|Infiltration|Pleural_Thickening|Pneumothorax,005,00000013,060Y,M,AP,3056,2544,0.139,0.139', '00000013_026.png,Cardiomegaly|Emphysema,026,00000013,057Y,M,AP,2500,2048,0.168,0.168', '00000017_001.png,No Finding,001,00000017,077Y,M,AP,2500,2048,0.168,0.168', '00000030_001.png,Atelectasis,001,00000030,079Y,M,PA,2992,2991,0.143,0.143',

and so on.

Process finished with exit code 0

pls print
os.path.join(config.pkl_dir_path, config.train_val_df_pkl_path)
and delete the pickle file at the printed path...
Then try running the code again. One thing is for sure: there is some issue with reading the csv. That's why self.train_val_df.shape is (0, 0).
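To spell that out: the empty dataframe has been cached, so the (0, 0) pickle gets loaded on every run and the csv is never re-read. A sketch of the cleanup (assuming config is importable the same way main.py imports it):

import os
import config  # the repo's config module, as used by main.py

stale_pickle = os.path.join(config.pkl_dir_path, config.train_val_df_pkl_path)
print(stale_pickle)  # e.g. pickles/train_val_df.pickle
if os.path.exists(stale_pickle):
    os.remove(stale_pickle)  # force the dataframe to be rebuilt from the csv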

Also, train_val_list is supposed to contain the image file names, like ['00000013_026.png' , '00000017_001.png', ....]

Your train_val_list doesn't seem to be working fine. Look into that pls

print(os.path.join(config.pkl_dir_path, config.train_val_df_pkl_path))
The result is "pickles/train_val_df.pickle"

I know where the problem is.
The code:
def get_train_val_list(self):
    # f = open(os.path.join('data', 'NIH Chest X-rays', 'train_val_list.txt'), 'r')
    f = open(os.path.join('./data/sample/', 'sample_labels.csv'), 'r')
    train_val_list = str.split(f.read(), '\n')
    return train_val_list
Its output is:
['Image Index,Finding Labels,Follow-up #,Patient ID,Patient Age,Patient Gender,View Position,OriginalImageWidth,OriginalImageHeight,OriginalImagePixelSpacing_x,OriginalImagePixelSpacing_y', '00000013_005.png,Emphysema|Infiltration|Pleural_Thickening|Pneumothorax,005,00000013,060Y,M,AP,3056,2544,0.139,0.139', .....

But as you said, its output needs to look like:
['00000013_026.png' , '00000017_001.png', ....]

So, I changed the code to:
def get_train_val_list(self):
    # note: this needs import csv at the top of datasets.py
    with open(os.path.join('./data/sample/', 'sample_labels.csv'), 'r') as f:
        reader = csv.reader(f)
        train_val_list = [row[0] for row in reader]  # get the first column
    del train_val_list[0]  # delete the first element: 'Image Index' (the header)
    return train_val_list

The cause of the issue may be that your train_val_list.txt file contains only the image names, while my csv file contains other information on each line.
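For reference, an equivalent version using pandas (my assumption that pandas is available, since the repo already builds its dataframes with it; the 'Image Index' column name comes from the header printed earlier):

import os
import pandas as pd

def get_train_val_list(self):
    # read the sample csv and keep only the first column (the image file names)
    df = pd.read_csv(os.path.join('./data/sample/', 'sample_labels.csv'))
    return df['Image Index'].tolist()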

Now it can train. You can close the issue. Thanks a lot.

@ljk1072911239
Glad to know that your model is finally getting trained! :D Let me know if you have any other issues, and don't forget to post about the score that you manage to get or any improvements to the model that you might make.
Thanks a lot for reaching out to me :)