UCF-ARG Dataset
The UCF-ARG (University of Central Florida-Aerial camera, Rooftop camera and Ground camera) dataset is a multiview human action dataset. It consists of 10 actions performed by 12 actors and recorded from three viewpoints: a ground camera, a rooftop camera at a height of 100 feet, and an aerial camera mounted on the payload platform of a 13’ Kingfisher Aerostat helium balloon, as illustrated in the figure. The 10 actions are Boxing, Carrying, Clapping, Digging, Jogging, Open-Close Trunk, Running, Throwing, Walking and Waving. Every action except Open-Close Trunk is performed 4 times by each actor in different directions. Open-Close Trunk is performed only 3 times, i.e. on 3 cars parked in different directions. The actions are captured with a high-definition camcorder (Sanyo Xacti FH1A) at 1920 x 1080 resolution and 60 fps (frames per second).
Actions | Number of actors | Number of instances per actor | Total videos per action in each camera | Aerial camera videos | Ground camera videos | Rooftop camera videos | Total videos per action in all 3 cameras |
---|---|---|---|---|---|---|---|
Boxing | 12 | 4 | 48 | yes | yes | yes | 144 |
Carrying | 12 | 4 | 48 | yes | yes | yes | 144 |
Clapping | 12 | 4 | 48 | yes | yes | yes | 144 |
Digging | 12 | 4 | 48 | yes | yes | yes | 144 |
Jogging | 12 | 4 | 48 | yes | yes | yes | 144 |
Open-close trunk | 12 | 3 | 36 | yes | yes | yes | 108 |
Running | 12 | 4 | 48 | yes | yes | yes | 144 |
Throwing | 12 | 4 | 48 | yes | yes | yes | 144 |
Walking | 12 | 4 | 48 | yes | yes | yes | 144 |
Waving | 12 | 4 | 48 | yes | yes | yes | 144 |
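The two total columns in the table follow from simple multiplication: actors times instances per actor gives the videos per camera, and that figure times the three cameras gives the overall total. A minimal sketch of the arithmetic is shown here; the function name `video_counts` is hypothetical and used only for illustration.

```python
def video_counts(actors, instances_per_actor, cameras=3):
    """Return (videos per camera, videos across all cameras) for one action."""
    per_camera = actors * instances_per_actor
    return per_camera, per_camera * cameras


# Example: an action performed 4 times by each of the 12 actors.
per_camera, all_cameras = video_counts(actors=12, instances_per_actor=4)
print(per_camera, all_cameras)  # 48 144
```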
Download all actions from aerial camera - resolution 960 x 540 and 30 fps (2.65 GB)
Download all actions from rooftop camera - resolution 960 x 540 and 30 fps (2.17 GB)
Download all actions from ground camera - resolution 960 x 540 and 30 fps (2.24 GB)
UCF-ARG Evaluation Set
The evaluation set has approximately 3 minutes of video captured using the aerial, rooftop and ground cameras. At any given instant, the number of actors in the camera view can vary from 4 to 8; the actors are free to perform any of the 10 actions and can change the action being performed at any time. The sequences from the aerial camera are annotated in the VIPER format for evaluation. Note: the evaluation set videos are annotated at 1920 x 1080 resolution and 60 fps. The annotations may include actions that do not belong to the dataset, such as gesturing, standing, picking-up and tennis swing, as well as combinations of actions such as throwing while walking. We are aware of some missing annotations; please inform us of any serious annotation errors. Videos in "mpg" format are provided, which can be used to visualize the annotations in VIPER. For questions regarding this dataset, please contact Kishore Reddy.
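For readers who want to work with the annotations outside the VIPER tool, the following is a rough sketch of how per-frame bounding boxes could be pulled out of a VIPER-style XML file with the Python standard library. Everything in the embedded sample is an assumption: the descriptor name `PERSON`, the attribute name `Location`, the file name and the namespace URIs are modelled on the general shape of VIPER XML and are not taken from this dataset's annotation files, so they should be checked against the real files before use. The code matches elements by local tag name so that it does not depend on the exact namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: names, namespaces and values are illustrative only.
SAMPLE = """<viper xmlns="http://lamp.cfar.umd.edu/viper#"
       xmlns:data="http://lamp.cfar.umd.edu/viperdata#">
  <data>
    <sourcefile filename="sequence1_aerial.mpg">
      <object name="PERSON" id="0" framespan="1:3">
        <attribute name="Location">
          <data:bbox framespan="1:2" x="100" y="200" width="40" height="90"/>
          <data:bbox framespan="3:3" x="104" y="201" width="40" height="90"/>
        </attribute>
      </object>
    </sourcefile>
  </data>
</viper>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_framespan(text):
    """Turn 'start:end start:end ...' into a list of (start, end) integer pairs."""
    spans = []
    for part in text.split():
        start, end = part.split(":")
        spans.append((int(start), int(end)))
    return spans


def boxes_per_frame(xml_text):
    """Map frame number -> list of (object id, (x, y, width, height))."""
    root = ET.fromstring(xml_text)
    boxes = {}
    for obj in root.iter():
        if local_name(obj.tag) != "object":
            continue
        obj_id = obj.get("id")
        for el in obj.iter():
            if local_name(el.tag) != "bbox":
                continue
            box = tuple(int(el.get(k)) for k in ("x", "y", "width", "height"))
            # A box is valid for every frame in each inclusive span.
            for start, end in parse_framespan(el.get("framespan")):
                for frame in range(start, end + 1):
                    boxes.setdefault(frame, []).append((obj_id, box))
    return boxes


boxes = boxes_per_frame(SAMPLE)
```

Because the annotations are made at 1920 x 1080 and 60 fps, frame numbers and pixel coordinates read this way refer to the full-resolution evaluation videos.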
Sequence number | Video length | Maximum number of actors | Aerial camera video | Ground camera video | Rooftop camera video | Aerial camera VIPER annotation | Ground camera VIPER annotation | Rooftop camera VIPER annotation |
---|---|---|---|---|---|---|---|---|
Sequence 1 | 2:05 min | 8 | yes | yes | yes | yes | no | no |
Sequence 2 | 0:48 min | 7 | yes | yes | yes | yes | no | no |
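As a quick check on the figures above, the two sequence lengths can be converted to seconds and to approximate frame counts at the stated 60 fps. This is only a sketch: the helper name `parse_length` is hypothetical, and because the lengths in the table are rounded to the second, the frame counts are estimates rather than exact values.

```python
def parse_length(text):
    """Convert an 'M:SS min' string such as '2:05 min' to seconds."""
    clock = text.split()[0]
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


FPS = 60  # frame rate stated for the evaluation set
lengths = {"Sequence 1": "2:05 min", "Sequence 2": "0:48 min"}

seconds = {name: parse_length(t) for name, t in lengths.items()}
approx_frames = {name: s * FPS for name, s in seconds.items()}
total_seconds = sum(seconds.values())

print(seconds)        # {'Sequence 1': 125, 'Sequence 2': 48}
print(approx_frames)  # {'Sequence 1': 7500, 'Sequence 2': 2880}
print(total_seconds)  # 173, i.e. 2:53, roughly 3 minutes
```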
Download both sequences of the evaluation set (resolution 1920 x 1080 and 60 fps, ~500 MB)
The details of our aerial platform setup can be found at [1].
References:
[1] http://spie.org/x41092.xml?ArticleID=x41092
Screenshots of walking action from the three cameras
Screenshots of different actions in all the three cameras