stuarteiffert/RNN-for-Human-Activity-Recognition-using-2D-Pose-Input

Would the model work on videos where there are multiple actions and these are to be temporally detected?

burhanusman opened this issue · 1 comments

Hi,
Nice project.
I was wondering would the model perform well on videos where an agent performs multiple actions throughout the video. Especially when these actions might have an inherent order (like receiving the bill and paying cash).

I am also confused with output. Does the model output one label for a set of 32 frames?
Thanks a lot!

Hi,
Sorry I missed your comment. Yes the output is currently a single label for 32 frames, although the input doesn't necessarily have to be 32 frames long.
To approach your problem, you could have the model constantly running, starting a new sequence every n frames (ie overlapping windows of frames) and outputting a sequence of the most likely actions it is observing. However, in order to include any high level knowledge, such as inherent action order, you would need to use this model as part of a higher order system, such as a Hidden Markov Model or Markov Chain, where the system learns or is told prior how various states, or 'actions' are related