Methods for pose-normalized representations have been proposed, but generally presume bounding box annotations at test time due to the difficulty of object detection.
The recent success of convolutional networks, like [27], on the ImageNet Challenge[23] has inspired further work on applying deep convolutional features to related image classification [14] and detection tasks [21].
A limitation of these methods is their use of weak features (usually HOG [12]).
part-based RCNNs
strongly supervised learning:
Given these part annotations, at training time all objects and each of their parts are initially treated as independent object categories. (one-versus-all linear SVM)
IoU >= 0.7 Positive
IoU <= 0.3 Negative
Thus, whole-object as “root” SVM weights, part SVM weights are learnt.
Geometric constraints
- Box constraints : 保证part在root box 内。
- Geometric constraints : constraints over the layout
Fine-grained categorization
For a new test image, we apply the whole and part detectors with the geometric scoring function to get detected part locations and use the features for prediction. If a particular part i was not detected anywhere in the test image (due to all proposals falling below the part detector’s threshold, set to achieve high recall), we set its features φ(x i ) = 0 (zero vector).