agusgun/FakeImageDetector

using the dataset

DenisSouth opened this issue · 6 comments

I downloaded this dataset: https://www.kaggle.com/sophatvathana/casia-dataset
but it has no description and a strange folder tree:

├───CASIA1
│ ├───Au
│ ├───ela
│ └───Sp
├───CASIA2
│ ├───Au
│ └───Tp
└───__MACOSX
    ├───CASIA1
    │ ├───Au
    │ └───Sp
    └───CASIA2
      ├───Au
      └───Tp

Which one should I use for training and which for testing? Which folder holds the original pictures and which the modified ones?
I also know the CSV format:

file_name,1 or 0 (fake or real image)
example for real image:
'datasets/train/real/Au_ani_00001.jpg',0

But I have no idea which folder I should use as the source...
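For reference, here is a minimal sketch of reading rows in the format described above (a single-quoted path, a comma, then a 0/1 label). The function name `parse_rows` is hypothetical and not part of the repository; it only illustrates the row layout.

```python
import csv
import io


def parse_rows(text):
    """Parse lines of the form 'path',label into (path, int_label) tuples.

    Hypothetical helper: it only illustrates the CSV layout from this thread.
    """
    # quotechar="'" lets the csv module strip the single quotes around the path.
    reader = csv.reader(io.StringIO(text), quotechar="'")
    rows = []
    for record in reader:
        if not record:  # skip blank lines
            continue
        path, label = record
        rows.append((path, int(label)))
    return rows


sample = "'datasets/train/real/Au_ani_00001.jpg',0\n"
print(parse_rows(sample))
```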

I appreciate your great work, and I want to reproduce it myself :- )

=========================================
So, I did this:

I uploaded the zip to Google Drive,
unzipped it to '/content/gdrive/My Drive/casia_dataset/',
and generated the CSV in Google Colab with the following code.

Is it right?

import os

path_orig = '/content/gdrive/My Drive/casia_dataset/CASIA2/Au/'   # authentic
path_modif = '/content/gdrive/My Drive/casia_dataset/CASIA2/Tp/'  # tampered

strings = []

# Authentic images get label 1; files of 10000 bytes or less are skipped.
for file in os.listdir(path_orig):
    try:
        if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
            strings.append(path_orig + file + ',1\n')
    except OSError:
        # Print the path of any file that could not be inspected.
        print(path_orig + file)

# Tampered images get label 0, with the same size filter.
for file in os.listdir(path_modif):
    try:
        if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
            strings.append(path_modif + file + ',0\n')
    except OSError:
        print(path_modif + file)

# Open the CSV once; mode 'a' appends, so rerunning adds duplicate rows.
with open('/content/gdrive/My Drive/casia_dataset/dataset.csv', 'a') as f:
    f.writelines(strings)

Yup, I think that is correct.

For the datasets, I think Au stands for Authentic, while Tp stands for Tampered. I hope this helps.

If you have already solved this issue, please close it :). Thank you very much.

What are the images with Sp??


Au: authentic pictures
Tp: tampered pictures
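As a small illustration of this naming, the category can also be read from a file name. This sketch assumes that files carry the same prefix as their folder, as in the example `Au_ani_00001.jpg` earlier in the thread; that is an assumption about the dataset, and `category_from_name` is a hypothetical helper.

```python
import os

# Prefix-to-category mapping taken from the thread: Au = authentic, Tp = tampered.
PREFIX_TO_CATEGORY = {"Au": "authentic", "Tp": "tampered"}


def category_from_name(path):
    """Return 'authentic', 'tampered', or None for an unrecognized prefix.

    Assumes (not verified against the dataset) that each file name starts
    with its folder's prefix followed by an underscore.
    """
    file_name = os.path.basename(path)
    prefix = file_name.split("_", 1)[0]
    return PREFIX_TO_CATEGORY.get(prefix)


print(category_from_name("casia/CASIA2/Au/Au_ani_00001.jpg"))
```

Returning a category name rather than 0 or 1 keeps the helper independent of whichever numeric label convention the training code expects.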

Make the CSV for training:

import os

path_orig = 'casia/CASIA2/Au/'   # Authentic
path_modif = 'casia/CASIA2/Tp/'  # Tampered

strings = []

# Authentic images get label 1; files of 10000 bytes or less are skipped.
for file in os.listdir(path_orig):
    if file.endswith('jpg') and os.stat(path_orig + file).st_size > 10000:
        strings.append(path_orig + file + ',1\n')

# Tampered images get label 0, with the same size filter.
for file in os.listdir(path_modif):
    if file.endswith('jpg') and os.stat(path_modif + file).st_size > 10000:
        strings.append(path_modif + file + ',0\n')

# Open the CSV once; mode 'a' appends, so rerunning adds duplicate rows.
with open('casia/dataset.csv', 'a') as f:
    f.writelines(strings)
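The thread does not say which images belong in the training set and which in the test set. One possible approach, shown here purely as an assumption, is a seeded random split of the generated rows. The function name `split_rows`, the 80/20 ratio, and the seed are all hypothetical choices, not something the repository prescribes.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=0):
    """Shuffle rows reproducibly and return (train_rows, test_rows).

    Hypothetical helper: the split ratio and seed are illustrative defaults.
    """
    shuffled = list(rows)  # copy so the caller's list is left untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Example rows in the same "path,label" form as the CSV lines above.
rows = [f"casia/CASIA2/Au/img{i}.jpg,1\n" for i in range(10)]
train_rows, test_rows = split_rows(rows)
print(len(train_rows), len(test_rows))
```

Note that the labels written above (1 for Au, 0 for Tp) differ from the CSV format described at the start of the thread, where 0 marks a real image, so it is worth checking which convention the training code expects.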

@DenisSouth What are the images with Sp ?? what kind of images are they?


https://www.kaggle.com/sophatvathana/casia-dataset


They are modified JPG images.