
Fashion-Gen: The Generative Fashion Dataset and Challenge


Abstract

  • 293,008 high-definition (1360 x 1360 pixels) fashion images > introduces the dataset
    • These images come as pairs:
      • each image is paired with "item descriptions provided by professional stylists".
      • Each item is photographed from a variety of angles.
  • The paper provides two baseline results: ProGAN and StackGAN
    • high-resolution image generation
    • fashion images generated conditioned on a given text description
  • The gist of this work: release a good fashion-related dataset, explain how good it is and what format/information it contains, show that the two existing algorithms above work well on it, and provide baselines for the set, thereby contributing a resource that other researchers can adopt and build on.

Contribution

  • Detailed statistics on the dataset.
  • Comparison with existing datasets.
  • Introduction of a text-to-image generation challenge > competition criteria & evaluation process
  • High-resolution image generation results with ProGAN
  • Text-to-image translation results with StackGAN-v1 / StackGAN-v2

Our Fashion Dataset

  • The dataset consists of 293,008 images (260,480 images for training, 32,528 for validation, 32,528 for test), which is larger than other available datasets for the task of text to image translation.
    • We provide full HD images photographed under consistent studio conditions. There are no other datasets with comparable resolution and consistent photographing condition.
  • All fashion items are photographed from 1 to 6 different angles depending on the category of the item. To our knowledge, this is the first dataset of this scale consisting of multiple angles of each item.
  • Each product belongs to a main category and a more fine-grained category (i.e., subcategory). There are 48 main categories and 121 fine-grained categories in the dataset. The name and density of each category are plotted in Figure 2. Table 3 presents the number of images by category and subcategory.
  • Each fashion item is paired with paragraph-length descriptive captions sourced from experts (professional designers). The distribution of the length of descriptions is presented in Figure 4.
  • For each item, we also provide metadata such as stylist-recommended matching items, the fashion season, the designer, and the brand. We also provide the distribution of colors extracted from the text descriptions, presented in Figure 3. (A sketch of one way to represent an item's fields follows this list.)
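Not part of the paper, just a reading aid: a minimal sketch of how a single item in this dataset could be represented, based only on the fields described above. The class name, field names, and example values are assumptions, not the dataset's actual schema or file format.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical container for one Fashion-Gen item, mirroring the fields the
# paper describes: multiple photos per item, an expert-written description,
# a main category, a fine-grained subcategory, and extra metadata.
@dataclass
class FashionGenItem:
    item_id: str
    image_paths: List[str]            # 1 to 6 photos, one per angle
    description: str                  # paragraph-length expert caption
    category: str                     # one of 48 main categories
    subcategory: str                  # one of 121 fine-grained categories
    season: Optional[str] = None      # fashion season metadata
    designer: Optional[str] = None
    brand: Optional[str] = None
    matched_items: List[str] = field(default_factory=list)  # stylist-recommended matches

# Example with made-up values, for illustration only:
example = FashionGenItem(
    item_id="000001",
    image_paths=["000001_front.jpg", "000001_back.jpg"],
    description="A slim-fit cotton shirt with ...",
    category="SHIRTS",
    subcategory="SHIRT",
)
```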

Our Challenge

  • the task of text-to-image synthesis
  • Information such as pose or category is provided
  • An Inception Score is provided - since category information is available, this presumably comes from classifying the generated images (a minimal sketch of the score computation follows this list)
    • "We provide a framework that enables researchers to easily compare the performance of their models with an evaluation metric based on an Inception Score (Salimans et al., 2016)."
    • The scoring model was trained on the training set to classify images into the categories presented in Figure 2.
  • For the final challenge evaluation, scores on the test set from this trained model are provided, and these appear to be what is used for ranking.
  • Since this metric has some limitations, they also provide a human evaluation, as outlined below
  • Human Evaluation setup
    • Inception scores do not take into account the correspondence between a given image and its text description.
    • So that aspect is evaluated by humans instead.
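A reading aid rather than the challenge's official code: the sketch below is the standard Inception Score computation from Salimans et al. (2016), assuming you already have a matrix of per-image class probabilities from a classifier trained on the dataset's categories (as the framework above provides); the function and variable names are my own.

```python
import numpy as np

def inception_score(probs: np.ndarray, n_splits: int = 10):
    """Standard Inception Score (Salimans et al., 2016).

    probs: (N, C) array of per-image class probabilities p(y|x) produced by a
           classifier trained on the dataset's categories.
    Returns the mean and standard deviation of the score over n_splits splits.
    """
    scores = []
    for chunk in np.array_split(probs, n_splits):
        p_y = chunk.mean(axis=0, keepdims=True)           # marginal p(y) over this split
        kl = chunk * (np.log(chunk + 1e-12) - np.log(p_y + 1e-12))
        scores.append(np.exp(kl.sum(axis=1).mean()))      # exp(E_x[KL(p(y|x) || p(y))])
    return float(np.mean(scores)), float(np.std(scores))

# Example with random probabilities (illustration only):
rng = np.random.default_rng(0)
logits = rng.normal(size=(1000, 48))                      # e.g. 48 main categories
fake_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(inception_score(fake_probs))
```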

Experiments with the Dataset

  • Generating high-resolution images using P-GANs (meaning ProGAN)
  • Text-to-Image synthesis
    • As mentioned above, tested with StackGAN-v1 / StackGAN-v2 (a sketch of the text-conditioning idea follows below)
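For reference, my own sketch rather than the authors' code: the mechanism that lets StackGAN condition image generation on a text description is Conditioning Augmentation, where the text embedding parameterizes a Gaussian from which a conditioning vector is sampled and fed to the generator together with the noise vector. A minimal PyTorch version follows; the embedding and latent sizes are placeholder values.

```python
import torch
import torch.nn as nn

class ConditioningAugmentation(nn.Module):
    """StackGAN-style conditioning: map a text embedding to the mean and
    log-variance of a Gaussian, then sample a conditioning vector from it
    via the reparameterization trick. Sizes here are illustrative only."""

    def __init__(self, text_dim: int = 1024, cond_dim: int = 128):
        super().__init__()
        self.fc = nn.Linear(text_dim, cond_dim * 2)

    def forward(self, text_embedding: torch.Tensor):
        mu, logvar = self.fc(text_embedding).chunk(2, dim=1)
        std = torch.exp(0.5 * logvar)
        c = mu + std * torch.randn_like(std)   # sampled conditioning vector
        return c, mu, logvar                   # mu/logvar feed a KL regularizer

# The generator would then consume the noise z concatenated with c:
ca = ConditioningAugmentation()
text_emb = torch.randn(4, 1024)                # batch of 4 text embeddings
z = torch.randn(4, 100)                        # noise vectors
c, mu, logvar = ca(text_emb)
generator_input = torch.cat([z, c], dim=1)     # shape (4, 228)
print(generator_input.shape)
```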