Part-based R-CNNs for Fine-grained Category Detection

Methods for pose-normalized representations have been proposed, but generally presume bounding box annotations at test time due to the difficulty of object detection.

The recent success of convolutional networks, like [27], on the ImageNet Challenge [23] has inspired further work on applying deep convolutional features to related image classification [14] and detection tasks [21].

A limitation of the earlier part-based methods is their reliance on weak features (usually HOG [12]).

Part-based R-CNNs

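Strongly supervised learning:
Training images carry bounding box annotations for the whole object and for each of its parts. Given these part annotations, at training time all objects and each of their parts are initially treated as independent object categories, and a one-versus-all linear SVM is trained for each.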

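IoU >= 0.7 with a ground-truth object or part box: the region proposal is a positive for that object or part.

IoU <= 0.3 with every ground-truth box: the region proposal is a negative.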

Thus, two kinds of detector weights are learnt: whole-object (“root”) SVM weights and one set of SVM weights per part.
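The IoU labeling rule above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the function names `iou` and `label_proposal` are hypothetical, boxes are assumed to be `(x1, y1, x2, y2)` tuples, the rule is applied to a single ground-truth box for one detector, and treating proposals between the two thresholds as "ignore" is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the intersection are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposal(proposal, gt_box, pos_thresh=0.7, neg_thresh=0.3):
    """Label one region proposal against one ground-truth box.

    Proposals between the two thresholds are returned as "ignore";
    that middle case is an assumption of this sketch.
    """
    overlap = iou(proposal, gt_box)
    if overlap >= pos_thresh:
        return "positive"
    if overlap <= neg_thresh:
        return "negative"
    return "ignore"
```

Each labeled set (one for the root, one per part) would then feed its own one-versus-all linear SVM.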

Geometric constraints

  • Box constraints: ensure that each part lies inside the root box.
  • Geometric constraints: constraints over the layout of the parts.
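To make the box constraint concrete, here is a rough sketch that picks a root box and one box per part so that every part lies inside the root. Several things are assumptions for illustration: the names `inside` and `best_configuration`, the `slack` parameter, the `(box, score)` candidate format, and the choice to rank configurations by the sum of raw detector scores. The paper's exact scoring function is not reproduced in these notes, and the layout-based geometric term is omitted.

```python
def inside(part_box, root_box, slack=0.0):
    """True if part_box lies within root_box, allowing `slack` pixels of overhang."""
    return (part_box[0] >= root_box[0] - slack and
            part_box[1] >= root_box[1] - slack and
            part_box[2] <= root_box[2] + slack and
            part_box[3] <= root_box[3] + slack)


def best_configuration(root_candidates, part_candidates, slack=0.0):
    """Pick the highest-scoring root and part boxes under the box constraint.

    root_candidates: list of (box, score) pairs for the whole object.
    part_candidates: one list of (box, score) pairs per part.
    Returns (total_score, root_box, part_boxes), or None if no root
    admits a valid box for every part.
    """
    best = None
    for root_box, root_score in root_candidates:
        total = root_score
        chosen = []
        for candidates in part_candidates:
            # Keep only the part proposals that satisfy the box constraint.
            valid = [c for c in candidates if inside(c[0], root_box, slack)]
            if not valid:
                chosen = None
                break
            box, score = max(valid, key=lambda c: c[1])
            total += score
            chosen.append(box)
        if chosen is not None and (best is None or total > best[0]):
            best = (total, root_box, chosen)
    return best
```

Because the constraint is checked per root candidate, a high-scoring part proposal that falls outside every plausible root box is never selected.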

Fine-grained categorization

For a new test image, we apply the whole and part detectors with the geometric scoring function to get detected part locations and use the features for prediction. If a particular part i was not detected anywhere in the test image (due to all proposals falling below the part detector’s threshold, set to achieve high recall), we set its features φ(x_i) = 0 (the zero vector).
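The zero-vector fallback for undetected parts might look like the sketch below. It assumes the final representation is a plain concatenation of the root feature and the per-part features, and that a missing part is passed as `None`; the function name `pose_normalized_features` and the `feature_dim` argument are hypothetical.

```python
def pose_normalized_features(root_feature, part_features, feature_dim):
    """Concatenate root and part features into one vector.

    part_features holds one entry per part, in a fixed order; an entry of
    None means the part was not detected and is replaced by a zero vector
    of length feature_dim so the output length stays constant.
    """
    combined = list(root_feature)
    for feat in part_features:
        if feat is None:
            combined.extend([0.0] * feature_dim)
        else:
            combined.extend(feat)
    return combined
```

Keeping the length fixed means the same classifier can be applied whether or not every part was found.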