using the dataset
DenisSouth opened this issue · 6 comments
I downloaded this dataset: https://www.kaggle.com/sophatvathana/casia-dataset
but it has no description and a strange folder tree:
├───CASIA1
│   ├───Au
│   ├───ela
│   └───Sp
├───CASIA2
│   ├───Au
│   └───Tp
└───__MACOSX
    ├───CASIA1
    │   ├───Au
    │   └───Sp
    └───CASIA2
        ├───Au
        └───Tp
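A quick way to make sense of a tree like the one above is to count the image files in each folder. The sketch below is only an illustration: the `__MACOSX` folder is normally a by-product of zipping on macOS (it holds `._*` metadata files rather than real images), so it is skipped here. The function name `count_images`, the extension list, and the example root path are assumptions, not part of the dataset or the repository.

```python
import os
from collections import Counter

# Assumed list of image extensions; adjust to what the archive really contains.
IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.tif', '.bmp')

def count_images(root, skip=('__MACOSX',)):
    """Return {sub-folder relative to root: number of image files in it}."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped folders in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        n = sum(
            1 for name in filenames
            if name.lower().endswith(IMAGE_EXTS) and not name.startswith('._')
        )
        if n:
            counts[os.path.relpath(dirpath, root)] = n
    return counts

# Example usage (hypothetical path):
# for folder, n in sorted(count_images('casia_dataset').items()):
#     print(folder, n)
```

Folders that contain no image files are left out of the result, so a non-empty entry for a folder means it holds data worth looking at.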
Which one should I use for training and which for testing? Which folder holds the original pictures and which the modified ones?
I also know the CSV format:
file_name,1 or 0 (fake or real image)
Example for a real image:
'datasets/train/real/Au_ani_00001.jpg',0
but I have no idea which folder I should use as the source...
I appreciate your great work, and I want to reproduce it myself :-)
=========================================
So, I made this:
I uploaded the zip to Google Drive,
unzipped it to '/content/gdrive/My Drive/casia_dataset/',
and generated the CSV in Google Colab with the following code.
Is it right?
import os

path_orig = '/content/gdrive/My Drive/casia_dataset/CASIA2/Au/'   # authentic
path_modif = '/content/gdrive/My Drive/casia_dataset/CASIA2/Tp/'  # tampered

strings = []
for file in os.listdir(path_orig):
    try:
        if file.endswith('jpg'):
            # keep only files larger than 10000 bytes
            if os.stat(path_orig + file).st_size > 10000:
                line = path_orig + file + ',1\n'
                strings.append(line)
    except OSError:
        # report files that could not be inspected
        print(path_orig + file)
for file in os.listdir(path_modif):
    try:
        if file.endswith('jpg'):
            if os.stat(path_modif + file).st_size > 10000:
                line = path_modif + file + ',0\n'
                strings.append(line)
    except OSError:
        print(path_modif + file)

# append mode: remove an old dataset.csv first to avoid duplicate rows
with open('/content/gdrive/My Drive/casia_dataset/dataset.csv', 'a') as f:
    f.writelines(strings)
Yup, I think that is correct.
For the datasets, I think Au stands for Authentic, while Tp stands for Tampered. Hope this helps.
If you have already solved this issue, please close it :). Thank you very much.
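One detail worth double-checking: the format quoted at the top of the thread gives a real image the label 0 (`'datasets/train/real/Au_ani_00001.jpg',0`), whereas the posted script writes 1 for `Au` and 0 for `Tp`. The sketch below follows the quoted format (0 = authentic, 1 = tampered) and keeps the mapping in one place so it can be swapped if the training code expects the opposite. The names `LABELS`, `build_rows`, and `write_csv` are hypothetical, and unlike the posted script the extension check here is case-insensitive.

```python
import csv
from pathlib import Path

# Assumed label convention, taken from the format quoted in the thread:
# 0 = real (Au, authentic), 1 = fake (Tp, tampered). Swap if needed.
LABELS = {'Au': 0, 'Tp': 1}
MIN_SIZE = 10000  # bytes; same threshold as the posted script

def build_rows(casia2_dir, labels=LABELS, min_size=MIN_SIZE):
    """Collect (path, label) pairs for .jpg files larger than min_size."""
    rows = []
    for folder, label in labels.items():
        for path in sorted(Path(casia2_dir, folder).iterdir()):
            if path.suffix.lower() == '.jpg' and path.stat().st_size > min_size:
                rows.append((str(path), label))
    return rows

def write_csv(rows, csv_path):
    """Write rows as 'path,label' lines, overwriting any existing file."""
    with open(csv_path, 'w', newline='') as f:
        csv.writer(f, lineterminator='\n').writerows(rows)

# Example usage (paths as in the thread):
# rows = build_rows('/content/gdrive/My Drive/casia_dataset/CASIA2')
# write_csv(rows, '/content/gdrive/My Drive/casia_dataset/dataset.csv')
```

Opening the file in `'w'` mode rather than `'a'` means that running the script twice does not duplicate every row.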
What are the images with Sp?
Au is the authentic pictures.
Tp is the tampered pictures.
Make the CSV for training:
import os

path_orig = 'casia/CASIA2/Au/'   # Authentic
path_modif = 'casia/CASIA2/Tp/'  # Tampered

strings = []
for file in os.listdir(path_orig):
    # keep only .jpg files larger than 10000 bytes
    if file.endswith('jpg'):
        if os.stat(path_orig + file).st_size > 10000:
            line = path_orig + file + ',1\n'
            strings.append(line)
for file in os.listdir(path_modif):
    if file.endswith('jpg'):
        if os.stat(path_modif + file).st_size > 10000:
            line = path_modif + file + ',0\n'
            strings.append(line)

# append mode: remove an old dataset.csv first to avoid duplicate rows
with open('casia/dataset.csv', 'a') as f:
    f.writelines(strings)
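The thread does not say how to divide the images into training and test sets. One possible approach, offered purely as an assumption, is to shuffle the lines of the generated CSV with a fixed seed and hold out a fraction for testing. The helper name `split_lines`, the 80/20 ratio, and the output file names in the usage comment are all hypothetical.

```python
import random

def split_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle CSV lines reproducibly and return (train_lines, test_lines)."""
    lines = list(lines)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(lines)   # fixed seed gives the same split each run
    n_test = int(len(lines) * test_fraction)
    return lines[n_test:], lines[:n_test]

# Example usage (hypothetical output names):
# with open('casia/dataset.csv') as f:
#     train, test = split_lines(f.readlines())
# with open('casia/train.csv', 'w') as f:
#     f.writelines(train)
# with open('casia/test.csv', 'w') as f:
#     f.writelines(test)
```

A plain random split ignores the class balance and any relationship between a tampered image and the authentic image it was made from, so it may need refining for a serious evaluation.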
@DenisSouth What are the images with Sp? What kind of images are they?
https://www.kaggle.com/sophatvathana/casia-dataset
They are modified JPG images.