Abstract
Popular object category recognition systems currently represent images as bags of visual words. Such systems can be improved in two ways using the visual bits framework, which is conventionally learned through optimization. First, instead of representing each image feature by a single visual word, each feature is represented by a sequence of visual bits. Second, instead of separating codebook generation from classifier training, the two are unified in a single framework. We propose a new way to learn visual bits by direct feature selection, which avoids the complicated optimization. Our results confirm that visual bits outperform the bag-of-words model on object category recognition.
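To make the contrast concrete, the sketch below (not the paper's actual learning procedure) compares the standard bag-of-words quantization step with a visual-bits encoding; the random projections and thresholds are hypothetical placeholders standing in for whatever the proposed direct feature selection would choose.

```python
import numpy as np

def bag_of_words_assignment(descriptor, codebook):
    """Classic BoW step: map a local descriptor to the index of its
    nearest codeword, i.e. a single visual word."""
    distances = np.linalg.norm(codebook - descriptor, axis=1)
    return int(np.argmin(distances))

def visual_bits(descriptor, projections, thresholds):
    """Illustrative visual-bits step: map the same descriptor to a
    sequence of binary values, one per (projection, threshold) pair.
    These parameters are placeholders for what a learning procedure
    such as direct feature selection would produce."""
    return (projections @ descriptor > thresholds).astype(np.uint8)

# Toy usage with random data standing in for SIFT-like descriptors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(256, 128))     # 256 visual words
descriptor = rng.normal(size=128)          # one local feature
projections = rng.normal(size=(32, 128))   # 32 visual bits
thresholds = rng.normal(size=32)

word = bag_of_words_assignment(descriptor, codebook)     # single integer index
bits = visual_bits(descriptor, projections, thresholds)  # 32-element binary code
```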