using the dataset

Question


DenisSouth opened this issue 6 years ago · 6 comments

I downloaded this dataset https://www.kaggle.com/sophatvathana/casia-dataset
but it has no description and a strange folder tree:

├───CASIA1
│   ├───Au
│   ├───ela
│   └───Sp
├───CASIA2
│   ├───Au
│   └───Tp
└───__MACOSX
    ├───CASIA1
    │   ├───Au
    │   └───Sp
    └───CASIA2
        ├───Au
        └───Tp

Which folder should I use for training, and which for testing? Which one contains the original pictures, and which the modified ones?
I also know the CSV format:

file_name,1 or 0 (fake or real image)
Example for a real image:
'datasets/train/real/Au_ani_00001.jpg',0

but I have no idea which folder I should use as the source...
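The `__MACOSX` branch of the tree looks like the metadata folder that macOS's built-in archiver adds to zip files; it typically holds `._` AppleDouble files rather than real images. A minimal sketch, assuming you want to collect image paths from the unzipped folder while ignoring that metadata; the helper name `list_images` and the extension set are hypothetical choices, so adjust them to the formats you actually find:

```python
import os

def list_images(root, exts=('.jpg', '.jpeg', '.tif', '.bmp')):
    """Return sorted image paths under root, skipping macOS zip metadata."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune __MACOSX in place so os.walk never descends into it
        dirnames[:] = [d for d in dirnames if d != '__MACOSX']
        for name in filenames:
            # AppleDouble metadata files start with '._' and are not images
            if name.startswith('._'):
                continue
            # Extension set is an assumption; compare case-insensitively
            if name.lower().endswith(exts):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

For example, `list_images('/content/gdrive/My Drive/casia_dataset/CASIA2/Au')` would list only the real files in that folder.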

I appreciate your great work, and I want to reproduce it myself :-)
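Nothing in the folder tree marks which images are for training and which for testing, so one option is to build a single CSV and split it yourself. A minimal sketch of a seeded, per-class (stratified) split, assuming `(path, label)` rows and an 80/20 ratio; `split_rows`, the ratio, and the seed are illustrative choices, not something the dataset prescribes:

```python
import random
from collections import defaultdict

def split_rows(rows, test_fraction=0.2, seed=0):
    """Split (path, label) rows into train/test lists, keeping class ratios."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = sorted(by_label[label])  # sort first so the shuffle is deterministic
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test
```

Splitting per class keeps the authentic/tampered ratio roughly the same in both parts, and the fixed seed makes the split repeatable across Colab sessions.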

=========================================
So, here is what I did.

I uploaded the zip to Google Drive, unzipped it to '/content/gdrive/My Drive/casia_dataset/', and generated the CSV in Google Colab with the following code.

Is it right?

import os

path_orig = '/content/gdrive/My Drive/casia_dataset/CASIA2/Au/'   # authentic images
path_modif = '/content/gdrive/My Drive/casia_dataset/CASIA2/Tp/'  # tampered images
csv_path = '/content/gdrive/My Drive/casia_dataset/dataset.csv'

strings = []

# Label convention from the CSV format above: 0 = real (authentic), 1 = fake (tampered)
for path, label in ((path_orig, 0), (path_modif, 1)):
    for file in os.listdir(path):
        full_path = os.path.join(path, file)
        try:
            # keep only .jpg files larger than 10,000 bytes
            if file.endswith('jpg') and os.stat(full_path).st_size > 10000:
                strings.append(full_path + ',' + str(label) + '\n')
        except OSError:
            # print files that could not be read instead of stopping the script
            print(full_path)

# open the CSV once in 'w' mode so re-running the script does not duplicate rows
with open(csv_path, 'w') as f:
    f.writelines(strings)
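Whichever way the CSV was written, it is worth checking that both classes are present and that no row appears twice (for example, from re-running a script that appends to the file). A minimal sketch, assuming unquoted `path,label` rows like the ones the script writes; `summarize_csv` is a hypothetical helper:

```python
import csv
from collections import Counter

def summarize_csv(csv_path):
    """Return (rows per label, number of duplicated rows) for a path,label CSV."""
    with open(csv_path, newline='') as f:
        # skip blank lines, which csv.reader returns as empty lists
        rows = [tuple(row) for row in csv.reader(f) if row]
    per_label = Counter(label for _, label in rows)
    duplicates = len(rows) - len(set(rows))
    return per_label, duplicates

# Example: counts, dupes = summarize_csv('/content/gdrive/My Drive/casia_dataset/dataset.csv')
```

Labels come back as strings ('0', '1'), since the check only counts them.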

Answer 1 · 2019-03-08T08:29:17.000Z

Yup, I think that is correct.

For the datasets, I think Au stands for Authentic meanwhile Tp stands for Tampered. Hope this will help.

If you already solved this issue, please close it :). Thank you very much.
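The example file name in the question (`Au_ani_00001.jpg`) carries the folder prefix, so the label can also be derived from the name rather than the folder. A minimal sketch, assuming tampered files start with `Tp_` the same way authentic ones start with `Au_`, and using the 0 = real, 1 = fake convention from the question; `label_from_name` and `PREFIX_LABELS` are hypothetical names:

```python
import os

# Assumed mapping: file-name prefix -> label (0 = authentic/real, 1 = tampered/fake)
PREFIX_LABELS = {'Au': 0, 'Tp': 1}

def label_from_name(path):
    """Return 0 or 1 based on the 'Au_'/'Tp_' prefix of the file name."""
    prefix = os.path.basename(path).split('_', 1)[0]
    if prefix not in PREFIX_LABELS:
        raise ValueError(f'unknown prefix in {path!r}')
    return PREFIX_LABELS[prefix]

print(label_from_name('datasets/train/real/Au_ani_00001.jpg'))  # 0
```

Raising on an unknown prefix makes files from other folders (such as CASIA1's `Sp`) fail loudly instead of being silently mislabeled.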

Answer 2 · 2019-03-08T16:12:25.000Z

Sure :). I have already added the LICENSE too.

On Fri, Mar 8, 2019 at 3:35 PM DenisSouth ***@***.***> wrote: Thanks. May I fork it, change it, and add an MIT license?

-- Agus Gunawan, 13515143, Sekolah Teknik Elektro dan Informatika, Institut Teknologi Bandung

Answer 3 · 2019-12-03T11:44:59.000Z

What are the images in the Sp folder?

Answer 4 · 2019-12-03T13:24:30.000Z

> What are the images with Sp??

Au: authentic pictures
Tp: tampered pictures

To make the CSV for training:

import os

path_orig = 'casia/CASIA2/Au/'   # Authentic
path_modif = 'casia/CASIA2/Tp/'  # Tampered
csv_path = 'casia/dataset.csv'

strings = []

# Label convention from the question: 0 = real (authentic), 1 = fake (tampered)
for path, label in ((path_orig, 0), (path_modif, 1)):
    for file in os.listdir(path):
        full_path = os.path.join(path, file)
        # keep only .jpg files larger than 10,000 bytes
        if file.endswith('jpg') and os.stat(full_path).st_size > 10000:
            strings.append(full_path + ',' + str(label) + '\n')

# open the CSV once in 'w' mode so re-running the script does not duplicate rows
with open(csv_path, 'w') as f:
    f.writelines(strings)
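The scripts in this thread write unquoted paths, while the example row in the question wraps the path in single quotes. A minimal sketch of a reader that accepts both, assuming the 0 = real, 1 = fake convention from the question; `read_labels` and the `Tp_example.jpg` file name are hypothetical:

```python
import csv
import io

def read_labels(lines):
    """Parse 'file_name,label' rows into (path, int label) pairs."""
    rows = []
    # quotechar="'" strips single quotes around a path, as in the question's
    # example row; unquoted paths pass through unchanged
    for row in csv.reader(lines, quotechar="'"):
        if not row:  # skip blank lines
            continue
        path, label = row
        value = int(label)
        if value not in (0, 1):
            raise ValueError(f'unexpected label {label!r} for {path}')
        rows.append((path, value))
    return rows

example = io.StringIO(
    "'datasets/train/real/Au_ani_00001.jpg',0\n"
    "casia/CASIA2/Tp/Tp_example.jpg,1\n"
)
print(read_labels(example))
# [('datasets/train/real/Au_ani_00001.jpg', 0), ('casia/CASIA2/Tp/Tp_example.jpg', 1)]
```

Paths with spaces such as `My Drive` need no quoting, because only commas separate fields.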

Answer 5 · 2019-12-03T14:00:13.000Z

@DenisSouth What are the images in Sp? What kind of images are they?

Answer 6 · 2019-12-03T14:11:04.000Z

> @DenisSouth What are the images with Sp ?? what kind of images are they?

https://www.kaggle.com/sophatvathana/casia-dataset

They are modified JPEG images.