UCF-ARG Dataset

The UCF-ARG (University of Central Florida Aerial camera, Rooftop camera and Ground camera) dataset is a multiview human action dataset. It consists of 10 actions performed by 12 actors, recorded from a ground camera, a rooftop camera at a height of 100 feet, and an aerial camera mounted on the payload platform of a 13’ Kingfisher Aerostat helium balloon, as illustrated in the figure. The 10 actions are Boxing, Carrying, Clapping, Digging, Jogging, Open-Close Trunk, Running, Throwing, Walking and Waving. Every action except Open-Close Trunk is performed 4 times by each actor in different directions. Open-Close Trunk is performed only 3 times, on 3 cars parked in different directions. The actions were captured with a high-definition camcorder (Sanyo Xacti FH1A) at 1920 x 1080 and 60 fps (frames per second).

Action	Number of actors	Instances per actor	Videos per action per camera	Aerial camera	Ground camera	Rooftop camera	Total videos per action (all 3 cameras)
Boxing	12	4	48	yes	yes	yes	144
Carrying	12	4	48	yes	yes	yes	144
Clapping	12	4	48	yes	yes	yes	144
Digging	12	4	48	yes	yes	yes	144
Jogging	12	4	48	yes	yes	yes	144
Open-Close Trunk	12	3	36	yes	yes	yes	108
Running	12	4	48	yes	yes	yes	144
Throwing	12	4	48	yes	yes	yes	144
Walking	12	4	48	yes	yes	yes	144
Waving	12	4	48	yes	yes	yes	144
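
The per-action counts follow directly from the description above: 12 actors, 4 instances per actor (3 for Open-Close Trunk), and 3 cameras. The short sketch below reproduces that arithmetic; the function and constant names are illustrative only and are not part of the dataset distribution.

```python
# Sketch of the video-count arithmetic described in the dataset text.
# All names here are hypothetical helpers, not part of UCF-ARG itself.
ACTORS = 12
CAMERAS = ("aerial", "ground", "rooftop")
ACTIONS = [
    "Boxing", "Carrying", "Clapping", "Digging", "Jogging",
    "Open-Close Trunk", "Running", "Throwing", "Walking", "Waving",
]


def instances_per_actor(action):
    # Open-Close Trunk is performed 3 times per actor, every other action 4 times.
    return 3 if action == "Open-Close Trunk" else 4


def video_counts(action):
    """Return (videos per camera, videos across all cameras) for one action."""
    per_camera = ACTORS * instances_per_actor(action)
    return per_camera, per_camera * len(CAMERAS)


for action in ACTIONS:
    per_camera, total = video_counts(action)
    print(f"{action}: {per_camera} per camera, {total} across all cameras")
```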

Download all actions from aerial camera - resolution 960 x 540 and 30 fps (2.65 GB)
Download all actions from rooftop camera - resolution 960 x 540 and 30 fps (2.17 GB)
Download all actions from ground camera - resolution 960 x 540 and 30 fps (2.24 GB)

UCF-ARG Evaluation Set

The evaluation set has approximately 3 minutes of video captured using the aerial, rooftop and ground cameras. At any given instant the number of actors in the camera view can vary from 4 to 8. The actors are free to perform any of the 10 actions and can change the action being performed at any time. The sequences from the aerial camera are annotated in the VIPER format for evaluation. Note: the evaluation set videos are annotated at 1920 x 1080 resolution and 60 fps. The annotations may include actions that do not belong to the dataset, such as gesturing, standing, picking-up and tennis swing, as well as combinations of actions such as throwing while walking. Please inform us of any serious annotation errors. We are aware of some missing annotations. Videos in "mpg" format are provided, which can be used to visualize the annotations in VIPER. For questions regarding this dataset, please contact Kishore Reddy.

Sequence number	Video length	Maximum number of actors	Video (aerial camera)	Video (ground camera)	Video (rooftop camera)	VIPER annotation (aerial camera)	VIPER annotation (ground camera)	VIPER annotation (rooftop camera)
Sequence 1	2:05 min	8	yes	yes	yes	yes	no	no
Sequence 2	0:48 min	7	yes	yes	yes	yes	no	no

Download both sequences of the evaluation set (resolution 1920 x 1080 and 60 fps, ~500 MB)
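
For readers who want to read the aerial annotations without the VIPER tool, the following is a minimal sketch of how bounding boxes might be extracted with Python's standard library. It assumes the files use the usual ViPER XML layout, with `object` elements in the `http://lamp.cfar.umd.edu/viper#` namespace and `bbox` elements in the `http://lamp.cfar.umd.edu/viperdata#` namespace. The object name, attribute name and file name in the sample are hypothetical, and the actual descriptor names in the UCF-ARG files may differ.

```python
import xml.etree.ElementTree as ET

# Namespaces of the usual ViPER XML layout (assumed for these files).
VIPER_NS = "http://lamp.cfar.umd.edu/viper#"
VIPER_DATA_NS = "http://lamp.cfar.umd.edu/viperdata#"

# Hypothetical sample; object, attribute and file names are placeholders.
SAMPLE = """<viper xmlns="http://lamp.cfar.umd.edu/viper#"
       xmlns:data="http://lamp.cfar.umd.edu/viperdata#">
  <data>
    <sourcefile filename="example.mpg">
      <object name="PERSON" id="0" framespan="1:2">
        <attribute name="Location">
          <data:bbox framespan="1:2" x="10" y="20" width="30" height="40"/>
        </attribute>
      </object>
    </sourcefile>
  </data>
</viper>"""


def parse_framespan(span):
    """Turn a framespan such as '1:5 9:12' into [(1, 5), (9, 12)]."""
    return [tuple(int(n) for n in part.split(":")) for part in span.split()]


def read_boxes(xml_text):
    """Collect one record per bounding box and frame range."""
    root = ET.fromstring(xml_text)
    records = []
    for obj in root.iter(f"{{{VIPER_NS}}}object"):
        for bbox in obj.iter(f"{{{VIPER_DATA_NS}}}bbox"):
            for first, last in parse_framespan(bbox.get("framespan")):
                records.append({
                    "object": obj.get("name"),
                    "id": obj.get("id"),
                    "first_frame": first,
                    "last_frame": last,
                    "x": int(bbox.get("x")),
                    "y": int(bbox.get("y")),
                    "width": int(bbox.get("width")),
                    "height": int(bbox.get("height")),
                })
    return records


boxes = read_boxes(SAMPLE)
```

Frame numbers and pixel coordinates read this way refer to the 1920 x 1080, 60 fps evaluation videos, as stated in the note above.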

The details of our aerial platform setup can be found at [1].

References:

[1] http://spie.org/x41092.xml?ArticleID=x41092

Screenshots of walking action from the three cameras

Screenshots of different actions in all the three cameras