Activity recognition

Visual sensors that incorporate color and depth information, such as the Kinect, allow more accurate automatic action recognition and fuel many emerging applications such as interactive education[3] and smart environments.

Statistical modeling has therefore been the main thrust in this direction, applied in layers: recognition is performed at several intermediate levels, and the results are connected to one another.

Furthermore, at the highest level, a major concern is to infer the overall goal or subgoals of an agent from its activity sequences through a mixture of logical and statistical reasoning.

All actions and plans are uniformly referred to as goals, and a recognizer's knowledge is represented by a set of first-order statements called an event hierarchy.[15]

Kautz's general framework for plan recognition has exponential worst-case time complexity in the size of the input hierarchy.

Probability theory and statistical learning models have more recently been applied to activity recognition to reason about actions, plans, and goals under uncertainty.

Using sensor data as input, Hodges and Pollack designed machine-learning-based systems for identifying individuals as they perform routine daily activities such as making coffee.[21][22][23]

Some of these works infer user transportation modes from readings of radio-frequency identifiers (RFID) and global positioning systems (GPS).[25][26][27][28]

Discriminative models such as conditional random fields (CRFs) are also commonly applied and give good performance in activity recognition.

At each stage of the hierarchy, the most distinctive and descriptive features are learned efficiently through data mining (the Apriori rule).
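
As an illustration of Apriori-style mining over discretized sensor features, here is a minimal sketch; the windowed feature labels and the support threshold are hypothetical:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: find feature sets that co-occur in at least
    min_support fraction of the observations."""
    n = len(transactions)
    # Start with frequent single items.
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support]
    frequent = list(current)
    k = 2
    while current:
        # Candidate k-item sets are unions of frequent (k-1)-item sets.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support]
        frequent.extend(current)
        k += 1
    return frequent

# Toy example: each "transaction" is the set of discretized features
# active in one time window (hypothetical labels).
windows = [{"low_speed", "hand_up"}, {"low_speed", "hand_up", "sitting"},
           {"low_speed", "sitting"}, {"hand_up", "sitting"}]
print(apriori(windows, min_support=0.5))
```

Each surviving itemset can then serve as a distinctive, descriptive feature at the corresponding level of the hierarchy.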

Vision-based activity recognition has found many applications such as human-computer interaction, user interface design, robot learning, and surveillance, among others.

Researchers have attempted a number of methods such as optical flow, Kalman filtering, Hidden Markov models, etc., under different modalities such as single camera, stereo, and infrared.
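
For example, decoding the most likely sequence of hidden action states under a hidden Markov model reduces to the Viterbi algorithm; the following is a minimal sketch with hypothetical states, observations, and probabilities:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for a discrete-observation HMM.
    pi: initial probs (S,), A: transitions (S, S), B: emissions (S, O)."""
    S, T = len(pi), len(obs)
    logp = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    logp[0] = np.log(pi) + np.log(B[:, obs[0]])
    for t in range(1, T):
        scores = logp[t - 1][:, None] + np.log(A)  # (prev state, next state)
        back[t] = scores.argmax(axis=0)
        logp[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(logp[-1].argmax())]
    for t in range(T - 1, 0, -1):                  # backtrace
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Hypothetical two-state model: 0 = "walking", 1 = "sitting";
# observations are coarse motion levels: 0 = "still", 1 = "moving".
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2], [0.3, 0.7]])
B = np.array([[0.1, 0.9], [0.8, 0.2]])
print(viterbi([1, 1, 0, 0], pi, A, B))  # -> [0, 0, 1, 1]
```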

Sensory information from these depth cameras has been used to generate real-time skeleton models of humans in different body postures.[39][40]

With the recent emergence of deep learning, RGB-video-based activity recognition has seen rapid development.[44]

Despite the remarkable progress of vision-based activity recognition, its use in most real-world visual surveillance applications remains a distant aspiration.

This capability relies not only on acquired knowledge, but also on the ability to extract information relevant to a given context and to reason logically.

This method entails structuring activities hierarchically, creating a framework that represents connections and interdependencies among various actions.

Techniques such as dynamic Markov networks, CNNs, and LSTMs are often employed to exploit the semantic correlations between consecutive video frames.
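
One common instantiation is a CNN encoder applied to each frame followed by an LSTM over the frame sequence; the PyTorch sketch below uses illustrative layer sizes rather than any specific published architecture:

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Per-frame CNN features fed to an LSTM that models temporal context."""
    def __init__(self, num_classes, feat_dim=128, hidden=256):
        super().__init__()
        self.cnn = nn.Sequential(                    # tiny illustrative encoder
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, video):                        # video: (B, T, 3, H, W)
        B, T = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1))        # (B*T, feat_dim)
        out, _ = self.lstm(feats.view(B, T, -1))     # (B, T, hidden)
        return self.head(out[:, -1])                 # classify from last step

logits = CNNLSTM(num_classes=10)(torch.randn(2, 16, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 10])
```

The LSTM's final hidden state summarizes the clip, trading per-frame detail for temporal context across consecutive frames.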

Geometric fine-grained features such as object bounding boxes and human poses facilitate activity recognition with graph neural networks.
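
As a sketch of the idea, a single graph-convolution step can propagate features between skeleton joints that are connected in the pose graph; the joint layout, adjacency, and dimensions below are illustrative:

```python
import torch

def gcn_layer(X, A, W):
    """One graph-convolution step: average each joint's neighbourhood
    (including itself), then apply a shared linear map.
    X: (J, F) joint features, A: (J, J) adjacency, W: (F, F_out)."""
    A_hat = A + torch.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(dim=1, keepdim=True)
    return torch.relu((A_hat / deg) @ X @ W)   # row-normalized propagation

# Hypothetical 5-joint skeleton: head-neck, neck-l_hand, neck-r_hand, neck-hip
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]
A = torch.zeros(5, 5)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
X = torch.randn(5, 2)            # e.g. 2-D joint coordinates as features
W = torch.randn(2, 8)
print(gcn_layer(X, A, W).shape)  # torch.Size([5, 8])
```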

When activity recognition is performed indoors and in cities using the widely available Wi-Fi signals and 802.11 access points, there is much noise and uncertainty.[53]

One of the primary ideas behind Wi-Fi activity recognition is that the signal interacts with the human body during transmission, which causes reflection, diffraction, and scattering.

By calculating the Doppler shift of the received signal, the pattern of the movement can be determined, which in turn identifies the human activity.
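
The underlying relation is the standard one between Doppler shift and path-length change: writing the reflected path length as $\ell(t)$ and the carrier wavelength as $\lambda$ (roughly 12.5 cm at 2.4 GHz), the shift is approximately

$$ f_D(t) = -\frac{1}{\lambda}\,\frac{d\ell(t)}{dt}, $$

so motion that shortens the reflected path produces a positive shift, and the time-varying shift traces out the movement pattern.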

The Fresnel zone was initially used to study the interference and diffraction of light, and was later used to construct wireless signal transmission models.
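
Concretely, the radius of the $n$-th Fresnel zone at a point located at distances $d_1$ from the transmitter and $d_2$ from the receiver is

$$ r_n = \sqrt{\frac{n \lambda d_1 d_2}{d_1 + d_2}}, $$

where $\lambda$ is the wavelength; as a body part crosses successive zone boundaries, the reflected and direct paths alternate between constructive and destructive interference, producing the received-signal fluctuations exploited for sensing.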

Automatic monitoring of human activities can provide home-based rehabilitation for people suffering from traumatic brain injuries.