Action Recognition from Video Using Deep Learning Models in Python

Action recognition is a computer vision problem in which the goal is to recognize the human actions performed in a video clip.

Automatically recognizing human actions from video is useful for many applications, such as surveillance and video search, and it is an active topic in computer vision. A model that takes a video as input and recognizes the actions performed by the people in it would therefore be very valuable. The most popular approach to action recognition from video is deep learning: neural network models with many layers that learn important features using the gradient descent algorithm. Gradient descent is an iterative algorithm that reduces the error over many iterations. The whole process is automatic, except for creating a labeled dataset.
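To make the idea of "reducing the error over many iterations" concrete, here is a minimal sketch of gradient descent in plain Python. It is not how a deep learning library trains a video model; it fits a single weight to toy data, and the data, learning rate, and step count are illustrative assumptions.

```python
# Minimal gradient descent sketch: fit y = w * x to toy data by
# repeatedly stepping against the gradient of the mean squared error.
# The data, learning rate, and step count are illustrative assumptions.

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=100):
    w = 0.0  # arbitrary starting weight
    errors = []
    for _ in range(steps):
        errors.append(mse(w, xs, ys))                 # record the current error
        w -= learning_rate * mse_gradient(w, xs, ys)  # move downhill
    return w, errors

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by y = 2 * x
w, errors = gradient_descent(xs, ys)
print(f"learned w = {w:.4f}")
print(f"first error = {errors[0]:.4f}, last error = {errors[-1]:.6f}")
```

A deep network follows the same loop, only with millions of weights and with the gradients computed automatically by the library.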

Several factors make action recognition from video difficult, such as occlusion (the view of some object is blocked), inter-class similarity (two or more different classes look alike), and camera motion or changes in camera viewpoint.

Human Actions

Humans perform many different kinds of actions, from small movements to group activities. Some of these actions are labeled, while others usually go unmentioned.
Human actions can be grouped into several types:
Gestures
Gestures are small movements of a body part, such as waving an arm or a leg.
Actions
An action is a combination of gestures that accomplishes something, such as walking, running, or dancing.
Human-Object Interaction
Video clips in which a person interacts with an object, such as opening a door or moving a chair.
Interaction Between Two People
Video clips in which one person interacts with another, such as shaking hands, hugging, or fighting.
Group Activities
Video clips in which a group of people performs an activity together, such as playing a football match.

Real-World Applications of Action Recognition

Action recognition from video is in high demand in many areas, such as surveillance, entertainment, content-based video retrieval, human-computer interaction, and robotics.
Video Surveillance
Around the world, many people have installed surveillance cameras around their homes or businesses, but constantly monitoring these cameras is not easy because a human cannot stay focused on video feeds all the time. As a result, the footage is normally used only to check the facts after an incident has occurred. An automatic monitoring system would make surveillance easier and considerably more efficient.
Content-Based Video Retrieval
Many platforms host billions of videos, with several thousand more uploaded every day, but these videos can only be searched by title, tags, description, or the user who uploaded them. Sometimes this metadata is misleading, or the title is clickbait. If we could understand what is actually in the videos, we could improve search by basing it on their content.
Entertainment
Action recognition can be used in gaming to take input from the player, or to transfer a person's whole-body movement into a game, for example in dance or sports games. This can make gaming much more interactive.
Human-Computer Interaction
Action recognition can serve as an input to a computer or to computer-driven technology such as robots or self-driving cars. Robots can observe us and help when needed, or detect a person's intention and act accordingly.




Colab Notebook

https://colab.research.google.com/drive/18D1hWvHAonuUm-G3OFEiDQ7B5hR4eqjH?usp=sharing

Deep Learning Introduction

What is deep learning?

Deep learning is a popular field of machine learning that is based on neural networks.

What does "deep learning" mean?

Deep learning means "learning using deep neural networks". Here the word "deep" means that the neural network has many layers.

Examples of Deep Learning Models

  • Convolutional neural networks (CNNs), which are commonly used for image classification
  • Long short-term memory (LSTM) networks, which are commonly used for natural language processing
  • Autoencoders
  • Generative adversarial networks (GANs)
  • and many more

Deep Learning Tools

You can use Python to create and train deep learning models, and several libraries are available for this, such as Keras and PyTorch.

In deep learning, the convolution layer is very popular and useful for image classification. A convolution layer uses filters of a size you define to extract features from an image. The filter values are random at the beginning and are then updated during training using gradient descent.
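The following is a minimal sketch, in plain Python, of what a single convolution filter computes. It is not the Keras or PyTorch implementation; the function name `conv2d`, the toy image, and both filters are assumptions made for illustration. The hand-picked filter shows how a filter responds to a vertical edge, and the random filter shows the kind of starting values that training would later adjust.

```python
import random

def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and
    return the sum of element-wise products at each position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output

# Toy 4x4 grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 2x2 filter that responds to a dark-to-bright vertical edge.
edge_kernel = [[-1, 1],
               [-1, 1]]
feature_map = conv2d(image, edge_kernel)
print(feature_map)  # the middle column holds the strongest response

# Before training, a layer's filter values start out random like this.
random.seed(0)
random_kernel = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
random_map = conv2d(image, random_kernel)
```

In a real library the same operation runs over many filters and color channels at once, and the filter values are learned rather than chosen by hand.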

When creating a deep learning model for images, you typically stack several convolution layers, and a max pooling layer is normally placed after a convolution layer.
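As a rough illustration of what a max pooling layer does to the output of a convolution layer, here is a plain-Python sketch. The function name `max_pool2d`, the 2x2 window, and the toy feature map are assumptions for the example, not part of any library API.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each
    size x size window. Leftover rows or columns are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * size + di][j * size + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Toy 4x4 feature map, standing in for the output of a convolution layer.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]
pooled = max_pool2d(feature_map)
print(pooled)  # [[4, 2], [2, 7]]
```

Pooling halves the height and width here while keeping the strongest responses, which is why it is commonly placed after convolution layers to shrink the data passed to later layers.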

Deep Learning Use Cases

Deep learning is used in many different cases, especially where it is very hard to explain how we do something, such as image recognition, language translation, language understanding, and bioinformatics.

In these tasks it is not easy to explain how the work is actually done, because our brain does most of it automatically.

Recently, deep learning has also been used to generate images and to perform complex tasks such as playing games and making decisions.
