UCF-ARG Dataset



UCF-ARG (University of Central Florida-Aerial camera, Rooftop camera and Ground camera) is a multiview human action dataset. It consists of 10 actions performed by 12 actors, recorded from a ground camera, a rooftop camera at a height of 100 feet, and an aerial camera mounted on the payload platform of a 13-foot Kingfisher Aerostat helium balloon, as illustrated in the figure. The 10 actions are Boxing, Carrying, Clapping, Digging, Jogging, Open-Close Trunk, Running, Throwing, Walking and Waving. Except for Open-Close Trunk, each action is performed 4 times by each actor, in different directions. Open-Close Trunk is performed only 3 times, i.e. on 3 cars parked in different directions. The actions are captured with a high-definition camcorder (Sanyo Xacti FH1A) at 1920 x 1080 resolution and 60 frames per second (fps).


Action            Actors  Instances per actor  Videos per camera  Aerial video  Ground video  Rooftop video  Total videos (all 3 cameras)
Boxing            12      4                    48                 yes           yes           yes            144
Carrying          12      4                    48                 yes           yes           yes            144
Clapping          12      4                    48                 yes           yes           yes            144
Digging           12      4                    48                 yes           yes           yes            144
Jogging           12      4                    48                 yes           yes           yes            144
Open-Close Trunk  12      3                    36                 yes           yes           yes            108
Running           12      4                    48                 yes           yes           yes            144
Throwing          12      4                    48                 yes           yes           yes            144
Walking           12      4                    48                 yes           yes           yes            144
Waving            12      4                    48                 yes           yes           yes            144
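The video counts follow directly from the description: actors x instances per actor gives the videos per camera, and multiplying by the 3 cameras gives the per-action total. A minimal sketch of that arithmetic, taking the instance counts from the prose description above (4 per actor for every action, 3 for Open-Close Trunk); the constant and function names are our own, not part of any released tooling.

```python
# Instance counts per actor, from the dataset description:
# every action is performed 4 times, except Open-Close Trunk (3 cars).
ACTORS = 12
CAMERAS = ("aerial", "ground", "rooftop")
INSTANCES_PER_ACTOR = {
    "Boxing": 4, "Carrying": 4, "Clapping": 4, "Digging": 4, "Jogging": 4,
    "Open-Close Trunk": 3, "Running": 4, "Throwing": 4, "Walking": 4, "Waving": 4,
}


def videos_per_camera(action: str) -> int:
    """Number of videos of one action recorded by a single camera."""
    return ACTORS * INSTANCES_PER_ACTOR[action]


def videos_all_cameras(action: str) -> int:
    """Number of videos of one action summed over all three cameras."""
    return videos_per_camera(action) * len(CAMERAS)


# Dataset-wide totals implied by the description.
per_camera_total = sum(videos_per_camera(a) for a in INSTANCES_PER_ACTOR)
dataset_total = per_camera_total * len(CAMERAS)

if __name__ == "__main__":
    for action in INSTANCES_PER_ACTOR:
        print(f"{action:<18}{videos_per_camera(action):>4}{videos_all_cameras(action):>6}")
    print("per camera:", per_camera_total)   # 9 * 48 + 36 = 468
    print("all cameras:", dataset_total)     # 468 * 3 = 1404
```

Counting this way is a quick check when splitting the downloaded archives by action or camera: each camera's folder should hold the same number of clips per action.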



Download all actions from the aerial camera (960 x 540, 30 fps, 2.65 GB)
Download all actions from the rooftop camera (960 x 540, 30 fps, 2.17 GB)
Download all actions from the ground camera (960 x 540, 30 fps, 2.24 GB)

UCF-ARG Evaluation Set



The evaluation set contains approximately 3 minutes of video captured with the aerial, rooftop and ground cameras. At any given moment, between 4 and 8 actors are in the camera view; they are free to perform any of the 10 actions and can switch actions at any time. The sequences from the aerial camera are annotated in the VIPER format for evaluation.

Note: the evaluation set videos are annotated at 1920 x 1080 resolution and 60 fps. The annotations may include actions that do not belong to the dataset, such as gesturing, standing, picking up and tennis swing, as well as combinations of actions such as throwing while walking. Please inform us of any serious annotation errors; we are aware of some missing annotations. Videos in "mpg" format are provided and can be used to visualize the annotations in VIPER.

For questions regarding this dataset, please contact Kishore Reddy.
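Because the evaluation annotations are at 1920 x 1080 and 60 fps while the downloadable action videos are at 960 x 540 and 30 fps, annotations must be rescaled if the evaluation video is processed at the training resolution. A minimal sketch, assuming each annotation can be reduced to a per-frame bounding box (0-based frame index, top-left corner, width, height) and that the 30 fps video keeps every second source frame; the `Box` type and `rescale_box` function are hypothetical and are not part of the VIPER format.

```python
from dataclasses import dataclass

# Annotation format (evaluation set) and target format (action videos).
SRC_W, SRC_H, SRC_FPS = 1920, 1080, 60
DST_W, DST_H, DST_FPS = 960, 540, 30


@dataclass(frozen=True)
class Box:
    """Hypothetical per-frame bounding box: 0-based frame, top-left corner, size in pixels."""
    frame: int
    x: float
    y: float
    width: float
    height: float


def rescale_box(box: Box) -> Box:
    """Map a box annotated at 1920x1080 / 60 fps onto 960x540 / 30 fps video.

    Assumes the 30 fps video keeps every second source frame, so source
    frames 2k and 2k + 1 both map to target frame k.
    """
    sx = DST_W / SRC_W            # horizontal scale factor (0.5)
    sy = DST_H / SRC_H            # vertical scale factor (0.5)
    step = SRC_FPS // DST_FPS     # source frames per target frame (2)
    return Box(
        frame=box.frame // step,
        x=box.x * sx,
        y=box.y * sy,
        width=box.width * sx,
        height=box.height * sy,
    )
```

For example, `rescale_box(Box(120, 800, 400, 60, 180))` gives `Box(frame=60, x=400.0, y=200.0, width=30.0, height=90.0)`. If the annotation tool numbers frames from 1 rather than 0, subtract 1 before rescaling.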

Sequence    Video length  Max actors  Aerial video  Ground video  Rooftop video  Aerial annotation  Ground annotation  Rooftop annotation
Sequence 1  2:05 min      8           yes           yes           yes            yes                no                 no
Sequence 2  0:48 min      7           yes           yes           yes            yes                no                 no
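The video lengths above translate directly into frame counts at the 60 fps annotation rate, which helps when checking that an annotation file covers a whole sequence. A small sketch; the "m:ss min" input format follows the table, and the function names are our own. Since the lengths are rounded to the second, the frame counts are approximate.

```python
def length_to_seconds(length: str) -> int:
    """Convert a table entry such as '2:05 min' to whole seconds."""
    minutes, seconds = length.split()[0].split(":")
    return int(minutes) * 60 + int(seconds)


def frame_count(length: str, fps: int = 60) -> int:
    """Approximate number of frames in a sequence of the given length."""
    return length_to_seconds(length) * fps


SEQUENCES = {"Sequence 1": "2:05 min", "Sequence 2": "0:48 min"}

for name, length in SEQUENCES.items():
    print(name, length_to_seconds(length), "s,", frame_count(length), "frames at 60 fps")
# Sequence 1 125 s, 7500 frames at 60 fps
# Sequence 2 48 s, 2880 frames at 60 fps
```

Together the two sequences last 173 seconds (2:53), consistent with the "approximately 3 minutes" stated for the evaluation set.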

Download both evaluation set sequences (1920 x 1080, 60 fps, ~500 MB)


The details of our aerial platform setup can be found at [1].



References:



[1] http://spie.org/x41092.xml?ArticleID=x41092

Screenshots of the walking action from the three cameras

Screenshots of different actions from all three cameras