深度学习与像素级标注

  • semantic image segmentataion with deep con-volutional nets and fully connected crfs
1
2
3
1. There are two technical hurdles in the application of DCNNs to image labeling tasks: signal down-sampling, and spatial ‘insensitivity’ (invariance).
2. The main difference between our model and other state-of-the-art models is the combination of pixel-level CRFs and DCNN-based ‘unary terms’.
3. DCNN score maps can reliably predict the presence and rough position of objects in an image but are less well suited for pin-pointing their exact outline.
  • Parsing Natural Scenes and Natural Language with Recursive Neural Networks
1
1. Deep Learning in vision applications can find lower dimensional representations for fixed size input images which are useful for classification
  • Deep Human Parsing with Active Template Regression
1
2
3
Farabet et al. [9] trained a multiscale convolutional network from raw pixels to extract dense features for assigning the label to each pixel. However, multiple complex post-processing methods were required for accurate prediction. The recurrent convolutional neural network [25] was proposed to speed up scene parsing and achieved the state-of-theart performance. Girshick et al. [12] also proposed to classify the candidate regions by CNN for semantic segmentation. All of these approaches use the CNNs as local or semi-local classifiers either over superpixels or region hypotheses.

All images in these three datasets contain standing people in frontal/near-frontal view with good visibilities of all body parts.