2015-07-13

深度学习与像素级标注

semantic image segmentataion with deep con-volutional nets and fully connected crfs

1
2
3

1. There are two technical hurdles in the application of DCNNs to image labeling tasks: signal down-sampling, and spatial ‘insensitivity’ (invariance).
2. The main difference between our model and other state-of-the-art models is the combination of pixel-level CRFs and DCNN-based ‘unary terms’.
3. DCNN score maps can reliably predict the presence and rough position of objects in an image but are less well suited for pin-pointing their exact outline.

Parsing Natural Scenes and Natural Language with Recursive Neural Networks

1	1. Deep Learning in vision applications can find lower dimensional representations for fixed size input images which are useful for classification

Deep Human Parsing with Active Template Regression

1
2
3

Farabet et al. [9] trained a multiscale convolutional network from raw pixels to extract dense features for assigning the label to each pixel. However, multiple complex post-processing methods were required for accurate prediction. The recurrent convolutional neural network [25] was proposed to speed up scene parsing and achieved the state-of-theart performance. Girshick et al. [12] also proposed to classify the candidate regions by CNN for semantic segmentation. All of these approaches use the CNNs as local or semi-local classifiers either over superpixels or region hypotheses.

All images in these three datasets contain standing people in frontal/near-frontal view with good visibilities of all body parts.

semantic image segmentataion with deep con-volutional nets and fully connected crfs

Parsing Natural Scenes and Natural Language with Recursive Neural Networks

Deep Human Parsing with Active Template Regression