Rich feature hierarchies for accurate object detection and semantic segmentation

R-CNNs : combine region proposals with CNNs

Imgur

we focused on two problems: localizing objects with a deep network and training a high-capacity model with only a small quantity of annotated detection data.

Pedestrian detection with unsupervised multi-stage feature
learning. (fine-tune, unsupervised pre-train)

supervised pre-training on a large auxiliary dataset (ILSVRC), followed by domain-specific fine-tuning on a small dataset (PASCAL), is an effective paradigm for learning high-capacity CNNs when data is scarce.

Modules:

  • category-independent region proposals(selective search)
  • CNNs
  • linear SVM

intersection-over-union(IoU)

Training:

1) supervised pre-train

2) finetune on specific categories. (IoU > 0.5 as positive)

We start SGD at
a learning rate of 0.001 (1/10th of the initial pre-training
rate),

3) extract features from CNNs, Iou < 0.3 as negatives , ->linear SVM

Imgur

Feature

Histograms of sparse codes for object detection. In CVPR, 2013
DPM