Collective activity recognition is an important subtask of human action recognition, yet the existing datasets for it are limited. In this paper, we look into this issue and introduce the ``Collective Sports (C-Sports)'' dataset, a novel benchmark dataset for multi-task recognition of both collective activities and sports categories. Various state-of-the-art techniques are evaluated on this dataset, together with multi-task variants that demonstrate increased performance. The experimental results show that, while the sports categories of the videos are inferred accurately, there is still room for improvement in collective activity recognition, especially in generalizing to previously unseen sports categories. To evaluate this ability, we introduce a novel evaluation protocol called unseen sports, in which training and testing are carried out on disjoint sets of sports categories. The relatively lower recognition performance under this protocol indicates that the recognition models tend to be influenced by the surrounding context rather than focusing on the essence of the collective activities. We believe that the C-Sports dataset will stir further interest in this research direction.
The C-Sports dataset contains 11 sports categories and five collective activity categories. The sports categories are American football, basketball, dodgeball, football, handball, hurling, ice hockey, lacrosse, rugby, volleyball and water polo, whereas the five collective activities are gathering, dismissal, passing, attack and wandering. Gathering can be defined as people approaching each other for a specific purpose. Dismissal is the separation of people in different directions after gathering. Passing is the act of passing items, such as balls or hockey pucks, between players, whereas attack is the movement of team players towards a specific goal. Wandering, on the other hand, can be defined as the free movement of team players.
Sample video frames for the sports classes are given in Fig. 1, and those for the collective activity classes in Fig. 2. Each video in the dataset has two labels: one indicating the sports category and the other indicating the class of the ongoing collective activity.
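To make the two-label structure and the unseen sports protocol more concrete, the following Python sketch models each video as a record carrying one sports label and one collective activity label. It then splits a list of such records so that the training and test sets cover disjoint sports categories. The record layout (`Clip`), the label spellings, the function name `unseen_sports_split` and the choice of held-out sports are all hypothetical illustrations. They are not the dataset's actual file format or its official split.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Category names as listed in the text; the string spellings are hypothetical.
SPORTS = (
    "american_football", "basketball", "dodgeball", "football", "handball",
    "hurling", "ice_hockey", "lacrosse", "rugby", "volleyball", "water_polo",
)
ACTIVITIES = ("gathering", "dismissal", "passing", "attack", "wandering")


@dataclass(frozen=True)
class Clip:
    """One video with its two labels (hypothetical record layout)."""
    video_id: str
    sport: str      # sports category label, one of SPORTS
    activity: str   # collective activity label, one of ACTIVITIES

    def __post_init__(self) -> None:
        # Reject labels outside the 11 sports / 5 activity categories.
        if self.sport not in SPORTS:
            raise ValueError(f"unknown sport: {self.sport}")
        if self.activity not in ACTIVITIES:
            raise ValueError(f"unknown activity: {self.activity}")


def unseen_sports_split(
    clips: Sequence[Clip], test_sports: Sequence[str]
) -> Tuple[List[Clip], List[Clip]]:
    """Split clips so that training and test sports categories are disjoint.

    Every clip whose sport is in `test_sports` goes to the test set; all other
    clips go to the training set. The held-out sports are an assumption here.
    """
    held_out = set(test_sports)
    unknown = held_out - set(SPORTS)
    if unknown:
        raise ValueError(f"unknown sports: {sorted(unknown)}")
    train = [c for c in clips if c.sport not in held_out]
    test = [c for c in clips if c.sport in held_out]
    return train, test


if __name__ == "__main__":
    clips = [
        Clip("v001", "basketball", "attack"),
        Clip("v002", "rugby", "gathering"),
        Clip("v003", "volleyball", "passing"),
        Clip("v004", "basketball", "dismissal"),
    ]
    train, test = unseen_sports_split(clips, test_sports=["rugby", "volleyball"])
    print([c.video_id for c in train])  # ['v001', 'v004']
    print([c.video_id for c in test])   # ['v002', 'v003']
```

Holding out whole sports categories, rather than sampling videos at random, means that a model must recognize an activity such as passing in a sport it never saw during training. Sports context alone cannot supply the answer in that setting.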