[TRN] Temporal Relational Reasoning in Videos


Problems at the time
- Motion features were extracted with optical flow, which lowered overall system efficiency
- 3D convolutions process dense frames, so they require a large amount of computation
Solution and key features
- Multi-scale temporal input
- Easy to apply to 2D CNN-based models
  - Performance improves further when applied to a two-stream (spatial & temporal) model
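
For reference, the multi-scale idea can be written as a sum of per-scale relation terms. This follows the formulation as I recall it from the TRN paper, so treat the notation as an assumption rather than something taken from these notes: $f_i$ is the feature of the $i$-th sampled frame, $g_\theta$ and $h_\phi$ are small MLPs (each scale would have its own), and $N$ is the largest scale.

```latex
\begin{align}
T_2(V) &= h_\phi\Big(\sum_{i<j} g_\theta(f_i, f_j)\Big) \\
T_3(V) &= h'_\phi\Big(\sum_{i<j<k} g'_\theta(f_i, f_j, f_k)\Big) \\
MT_N(V) &= T_2(V) + T_3(V) + \dots + T_N(V)
\end{align}
```

The ordering constraints $i<j$ and $i<j<k$ are what keep the frames in temporal order within each relation.
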
How it works
- Input
  - 2, 3, …, 𝑁 frames from the video clip are fed in temporal order (𝑁 = 2 to 8)
  - Only 𝐾 (= 3) relations are evaluated (2/3/4-frames, 3/4/5-frames, …)
- BN-Inception is used as the backbone model
- Feature map → 2-layer MLP (256 units) → 1-layer MLP (units = number of classes)
- Global average pooling layer
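
The input and head steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes 𝐾 = 3 means "sample up to three ordered frame tuples per scale", and that the frames of a tuple are concatenated before the MLP. The names `sample_relation_tuples` and `head_layer_sizes`, and the `feature_dim` and `num_classes` values, are hypothetical.

```python
import itertools
import random


def sample_relation_tuples(num_frames=8, k=3, seed=0):
    """Return {scale: list of frame-index tuples} for scales 2..num_frames.

    Assumption: K = 3 is read as "keep at most k tuples per scale".
    """
    rng = random.Random(seed)
    relations = {}
    for scale in range(2, num_frames + 1):
        # combinations() yields indices in increasing order,
        # so every tuple keeps its frames in temporal order.
        candidates = list(itertools.combinations(range(num_frames), scale))
        relations[scale] = rng.sample(candidates, min(k, len(candidates)))
    return relations


def head_layer_sizes(scale, feature_dim, num_classes, hidden=256):
    """Layer widths for one scale: 2-layer MLP (256 units) -> 1-layer classifier.

    Assumption: the per-frame features of a tuple are concatenated,
    so the input width is scale * feature_dim.
    """
    return [scale * feature_dim, hidden, hidden, num_classes]


if __name__ == "__main__":
    relations = sample_relation_tuples(num_frames=8, k=3)
    for scale, tuples in relations.items():
        # feature_dim=256 and num_classes=10 are placeholder values.
        print(scale, tuples, head_layer_sizes(scale, 256, 10))
```

The largest scale has only one possible tuple (all frames), so it yields a single relation regardless of `k`.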
