🎨 Adding Art to Everyday Life, 3D Exhibition

🥇Selected as an outstanding team in the DSL 23-2 Modeling Project🥇

Come visit the metaverse exhibition built from our project results!
↪️🖼️ Metaverse exhibition

🏛️ Curator Information

Data Science Lab (data science society at Yonsei University), 9th & 10th cohorts,
Team CV_B,

김서진 박서연 윤형진 임선민

🎞️ Presentation video → to be added later
📚 Presentation slides

🔊 Project Introduction

Welcome to the CV_B Special Exhibition, presented at the Data Science Lab 23-2 Modeling Project!

This exhibition is special for two reasons:
first, it reinterprets everyday scenes in the styles of painters;
second, it renders them in 3D to give them a sense of depth.

The exhibition follows the artwork generation pipeline and is organized into the following two corners.

  1. Meeting the classics through AdaIN
  2. Depth brought to life through NeRF

We have prepared a commentary below to help visitors follow along, so let's take a tour together!

✒️ 1. Overall Pipeline

We use two models:

  • AdaIN (Adaptive Instance Normalization), a style transfer model
  • NeRF (Neural Radiance Fields), a model that generates 3D views

With them, we built a single pipeline that applies a painter's style to an existing NeRF dataset and renders the result in 3D.

🗂️ 2. Dataset

  1. NeRF LLFF (Local Light Field Fusion)

    • Front-facing captures taken from a fixed distance
    • High resolution
    • 8 scenes
  2. Best Artworks of All Time

🖥️ 3. Model

1) AdaIN

AdaIN, our style transfer model, stands for Adaptive Instance Normalization.
As the word "adaptive" suggests, it can transfer styles from images it was never trained on.

It replaces the learned parameters $\beta, \gamma$ of the earlier CIN (Conditional Instance Normalization)
with statistics computed from the style image: the shift $\beta$ becomes the style mean and the scale $\gamma$ becomes the style standard deviation (the square root of the variance). This

  1. reduces the number of parameters that must be trained, and
  2. yields a model that can be applied to arbitrary styles.
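
The substitution described above can be sketched for a single feature channel. This is a minimal pure-Python illustration, not the project's code: the function name `adain_channel`, the `eps` stabilizer, and the toy values are assumptions, and a real implementation would apply the same operation per channel to encoder feature maps (for example as PyTorch tensors).

```python
import math


def adain_channel(content, style, eps=1e-5):
    """Apply AdaIN to one feature channel given as flat lists of floats.

    The content channel is normalized with its own mean and std, then
    rescaled by the style channel's std and shifted by the style mean.
    """
    def mean_std(values):
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        return mean, math.sqrt(var + eps)  # eps avoids division by zero

    c_mean, c_std = mean_std(content)
    s_mean, s_std = mean_std(style)
    return [s_std * (v - c_mean) / c_std + s_mean for v in content]


# Toy values (hypothetical) standing in for encoder activations.
content_channel = [0.0, 1.0, 2.0, 3.0]
style_channel = [10.0, 10.5, 11.0, 11.5]
stylized = adain_channel(content_channel, style_channel)
```

After the call, `stylized` keeps the ordering of the content values but takes on the mean and (approximately) the variance of the style channel. No style-specific weights are learned, which is why arbitrary style images can be used.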

🎯Q. Why did we choose AdaIN for style transfer?

→ The ultimate goal of this project is to combine style transfer with NeRF into a model that
generates 3D views of stylized images end to end.

  1. Because NeRF takes a long time to run, we kept the style transfer stage as simple as possible
    (we chose AdaIN because it works on arbitrary images and is not as heavy as a GAN).
  2. AdaIN is in fact what performs the core styling inside style-based GANs such as StyleGAN.

2) NeRF

NeRF is a 3D view synthesis model built as an MLP of 9 FC layers.
View synthesis is a technique that takes discrete images captured from a few viewpoints, infers how the scene looks from unseen viewpoints,
and thereby lets the images form a continuous sequence.

Given only 2D images from some viewpoints, it can generate the images for the remaining viewpoints,
so putting them all together gives the effect of seeing the object in three dimensions.

  • input: position of the object (x, y, z) and viewing direction ($\theta, \phi$)
    In our model...,
    • position (x, y, z): values from the stylized images
    • direction ($\theta, \phi$): values from the original NeRF dataset (LLFF)
  • output: RGB value and density (the inverse of transparency) of the object (at the view we want to generate)
  1. Cast a ray toward the object from the view we want to generate
  2. Sample several points along the ray → predict the output at each point
    • First 8 FC layers: take only the position (x, y, z) and predict density
    • Last 1 FC layer: concatenates the direction ($\theta, \phi$) and predicts RGB
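
Steps 1 and 2 above can be illustrated by casting one ray and sampling points along it. The names `sample_points_on_ray`, `near`, `far`, and `n_samples` are hypothetical, and the evenly spaced depths are a simplifying assumption; NeRF implementations commonly use stratified random sampling instead. Each sampled point would then be passed to the MLP.

```python
def sample_points_on_ray(origin, direction, near, far, n_samples):
    """Return (depths, points) for evenly spaced samples on one ray.

    Each point is origin + t * direction for a depth t in [near, far].
    Requires n_samples >= 2.
    """
    step = (far - near) / (n_samples - 1)
    depths = [near + i * step for i in range(n_samples)]
    points = [
        tuple(o + t * d for o, d in zip(origin, direction))
        for t in depths
    ]
    return depths, points


# Hypothetical camera at the origin looking along the -z axis.
depths, points = sample_points_on_ray(
    origin=(0.0, 0.0, 0.0),
    direction=(0.0, 0.0, -1.0),
    near=2.0,
    far=6.0,
    n_samples=5,
)
```

Here the five samples lie at depths 2 through 6 along the viewing axis. In a full model, the density and RGB predicted at each of these points are what the volume rendering step later merges into one pixel.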

(1) Positional Encoding
Maps the low-dimensional 5D input to a higher-dimensional space so that high-frequency information is preserved
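
As a rough sketch of this mapping, the sinusoidal encoding from the NeRF paper can be written as below. The function names and the sample coordinates are assumptions for illustration. The value `num_freqs=10` follows the paper's setting for positions, and the project's own settings may differ.

```python
import math


def positional_encoding(p, num_freqs):
    """Encode scalar p as [sin(2^k*pi*p), cos(2^k*pi*p)] for k < num_freqs."""
    encoded = []
    for k in range(num_freqs):
        angle = (2.0 ** k) * math.pi * p
        encoded.extend([math.sin(angle), math.cos(angle)])
    return encoded


def encode_vector(vec, num_freqs):
    """Encode every coordinate of vec and concatenate the results."""
    out = []
    for p in vec:
        out.extend(positional_encoding(p, num_freqs))
    return out


# A hypothetical 3D position becomes a 3 * 2 * 10 = 60-dimensional feature.
xyz_features = encode_vector((0.5, -0.25, 0.0), num_freqs=10)
```

The higher frequencies make nearby positions produce clearly different features, which helps the MLP represent fine detail that it tends to smooth over when given raw coordinates.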

(2) Volume Rendering
Merges the model's outputs, the RGB and density values at the sample points along one ray, into a single pixel
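
The merge can be sketched with the standard discrete volume rendering sum, in which each sample contributes its color weighted by its own opacity and by the light that survived the samples in front of it. The function name `render_ray` and the toy inputs are assumptions, and this is not the project's implementation.

```python
import math


def render_ray(rgbs, densities, deltas):
    """Composite per-sample colors along one ray into a single pixel color.

    rgbs: list of (r, g, b) tuples, ordered from near to far.
    densities: list of density (sigma) values, one per sample.
    deltas: distance covered by each sample's segment along the ray.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for rgb, sigma, delta in zip(rgbs, densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for ch in range(3):
            pixel[ch] += weight * rgb[ch]
        transmittance *= 1.0 - alpha
    return tuple(pixel)


# Toy ray: empty space, then an opaque red surface hiding a blue one.
pixel = render_ray(
    rgbs=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    densities=[0.0, 1000.0, 1000.0],
    deltas=[1.0, 1.0, 1.0],
)
```

In this toy case the first sample has zero density and contributes nothing, the red sample absorbs all remaining light, and the blue sample behind it is occluded, so the pixel comes out red.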

🖼️ 4. Result

1) Final Output

Converting the .mp4 files to .gif for upload unavoidably degraded the image quality,
so please view the originals at the metaverse exhibition!

  • Successful Output

input output
  • Failure

input output

2) Limitations and Future Works

  • Limitations

    1. Failed to use COLMAP because of library installation problems
      We could not obtain camera parameters directly with COLMAP, so we used the positional encoding variables of the existing dataset.

    2. Model size
      The model and dataset of the official TensorFlow-based code were large, so we rebuilt the model on a PyTorch version and trained it on a server.

  • Future Works

    1. Work to complete the end-to-end model is in progress
    2. Improve the model by using more advanced versions of NeRF
    3. Plan to train on images we capture ourselves as well

🏃 How to Run?

(To be added once the end-to-end work is finished)

python run.py --config configs.txt

📂 File description

  • main (the file that is actually run)
    • main.py
  • model (files defining the model's internal structure)
    • encoder.py
    • decoder.py
  • data (data used, or files that generate data) (This is an example! Feel free to complete it according to each team's project file structure.)