Published: 09 Oct 2015 Category: deep_learning


Deep Joint Task Learning for Generic Object Extraction

Highly Efficient Forward and Backward Propagation of Convolutional Neural Networks for Pixelwise Classification

Segmentation from Natural Language Expressions

Semantic Object Parsing with Graph LSTM

Fine Hand Segmentation using Convolutional Neural Networks

Feedback Neural Network for Weakly Supervised Geo-Semantic Segmentation

FusionNet: A deep fully residual convolutional neural network for image segmentation in connectomics

A deep learning model integrating FCNNs and CRFs for brain tumor segmentation

Texture segmentation with Fully Convolutional Networks

Fast LIDAR-based Road Detection Using Convolutional Neural Networks

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs

Annotating Object Instances with a Polygon-RNN

Efficient Interactive Annotation of Segmentation Datasets with Polygon-RNN++

Semantic Segmentation via Structured Patch Prediction, Context CRF and Guidance CRF

Distantly Supervised Road Segmentation

Ω-Net: Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks

Ω-Net (Omega-Net): Fully Automatic, Multi-View Cardiac MR Detection, Orientation, and Segmentation with Deep Neural Networks

Superpixel clustering with deep features for unsupervised road segmentation

Learning to Segment Human by Watching YouTube

W-Net: A Deep Model for Fully Unsupervised Image Segmentation

End-to-end detection-segmentation network with ROI convolution

A Foreground Inference Network for Video Surveillance Using Multi-View Receptive Field

Piecewise Flat Embedding for Image Segmentation

A Pyramid CNN for Dense-Leaves Segmentation

Capsules for Object Segmentation

Deep Object Co-Segmentation

Semantic Aware Attention Based Deep Object Co-segmentation

Contextual Hourglass Networks for Segmentation and Density Estimation


U-Net: Convolutional Networks for Biomedical Image Segmentation

UNet++: A Nested U-Net Architecture for Medical Image Segmentation

UNet 3+: A Full-Scale Connected UNet for Medical Image Segmentation

DeepUNet: A Deep Fully Convolutional Network for Pixel-level Sea-Land Segmentation

TernausNet: U-Net with VGG11 Encoder Pre-Trained on ImageNet for Image Segmentation

A Probabilistic U-Net for Segmentation of Ambiguous Images

Deep Dual Pyramid Network for Barcode Segmentation using Barcode-30k Database

Deep Smoke Segmentation

Smoothed Dilated Convolutions for Improved Dense Prediction

DASNet: Reducing Pixel-level Annotations for Instance and Semantic Segmentation

Improving Fast Segmentation With Teacher-student Learning

DSNet: An Efficient CNN for Road Scene Segmentation

Line Segment Detection Using Transformers without Edges

Unified Image Segmentation

K-Net: Towards Unified Image Segmentation

Masked-attention Mask Transformer for Universal Image Segmentation

Mask2Former for Video Instance Segmentation

Foreground Object Segmentation

Pixel Objectness

A Deep Convolutional Neural Network for Background Subtraction

Learning Multi-scale Features for Foreground Segmentation

Learning Deep Representations for Semantic Image Parsing: a Comprehensive Overview

Semantic Segmentation

Fully Convolutional Networks for Semantic Segmentation

From Image-level to Pixel-level Labeling with Convolutional Networks

Feedforward semantic segmentation with zoom-out features


Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation

DeepLab v2

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

DeepLabv2 (ResNet-101)

DeepLab v3

Rethinking Atrous Convolution for Semantic Image Segmentation


Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation


DeeperLab: Single-Shot Image Parser


Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation

Conditional Random Fields as Recurrent Neural Networks

BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation

Efficient piecewise training of deep structured models for semantic segmentation

Learning Deconvolution Network for Semantic Segmentation


SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

SegNet: Pixel-Wise Semantic Labelling Using a Deep Networks

Getting Started with SegNet

ParseNet: Looking Wider to See Better

Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation

Semantic Image Segmentation via Deep Parsing Network

Multi-Scale Context Aggregation by Dilated Convolutions

Instance-aware Semantic Segmentation via Multi-task Network Cascades

Object Segmentation on SpaceNet via Multi-task Network Cascades (MNC)

Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network

Combining the Best of Convolutional Layers and Recurrent Layers: A Hybrid Network for Semantic Segmentation

Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation

ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation

Laplacian Reconstruction and Refinement for Semantic Segmentation

Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation

Natural Scene Image Segmentation Based on Multi-Layer Feature Extraction

Convolutional Random Walk Networks for Semantic Image Segmentation

ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery

Deep Learning Markov Random Field for Semantic Segmentation

Region-based semantic segmentation with end-to-end training

Built-in Foreground/Background Prior for Weakly-Supervised Semantic Segmentation

PixelNet: Towards a General Pixel-level Architecture

Exploiting Depth from Single Monocular Images for Object Detection and Semantic Segmentation

  • intro: IEEE T. Image Processing
  • intro: propose an RGB-D semantic segmentation method which applies a multi-task training scheme: semantic label prediction and depth value regression

PixelNet: Representation of the pixels, by the pixels, and for the pixels

Semantic Segmentation of Earth Observation Data Using Multimodal and Multi-scale Deep Networks

Deep Structured Features for Semantic Segmentation

CNN-aware Binary Map for General Semantic Segmentation

Efficient Convolutional Neural Network with Binary Quantization Layer

Mixed context networks for semantic segmentation

High-Resolution Semantic Labeling with Convolutional Neural Networks

Gated Feedback Refinement Network for Dense Image Labeling

RefineNet: Multi-Path Refinement Networks with Identity Mappings for High-Resolution Semantic Segmentation

RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation

Light-Weight RefineNet for Real-Time Semantic Segmentation

Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes

Semantic Segmentation using Adversarial Networks

Improving Fully Convolution Network for Semantic Segmentation

The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation

Training Bit Fully Convolutional Network for Fast Semantic Segmentation

Classification With an Edge: Improving Semantic Image Segmentation with Boundary Detection

  • intro: “an end-to-end trainable deep convolutional neural network (DCNN) for semantic segmentation with built-in awareness of semantically meaningful boundaries.”

Diverse Sampling for Self-Supervised Learning of Semantic Segmentation

Mining Pixels: Weakly Supervised Semantic Segmentation Using Image Labels

FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

Understanding Convolution for Semantic Segmentation

Label Refinement Network for Coarse-to-Fine Semantic Segmentation

Predicting Deeper into the Future of Semantic Segmentation

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

Guided Perturbations: Self Corrective Behavior in Convolutional Neural Networks

Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade

Large Kernel Matters – Improve Semantic Segmentation by Global Convolutional Network

Loss Max-Pooling for Semantic Image Segmentation

Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation

A Review on Deep Learning Techniques Applied to Semantic Segmentation

Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Feature Forwarding: Exploiting Encoder Representations for Efficient Semantic Segmentation

LinkNet: Exploiting Encoder Representations for Efficient Semantic Segmentation

Pixel Deconvolutional Networks

Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation

Deep Semantic Segmentation for Automated Driving: Taxonomy, Roadmap and Challenges

Semantic Segmentation with Reverse Attention

Stacked Deconvolutional Network for Semantic Segmentation

Learning Dilation Factors for Semantic Segmentation of Street Scenes

A Self-aware Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation

One-Shot Learning for Semantic Segmentation

An Adaptive Sampling Scheme to Efficiently Train Fully Convolutional Networks for Semantic Segmentation

Semantic Segmentation from Limited Training Data

Unsupervised Domain Adaptation for Semantic Segmentation with GANs

Neuron-level Selective Context Aggregation for Scene Segmentation

Road Extraction by Deep Residual U-Net

Mix-and-Match Tuning for Self-Supervised Semantic Segmentation

Error Correction for Dense Semantic Image Labeling

Semantic Segmentation via Highly Fused Convolutional Network with Multiple Soft Cost Functions

RTSeg: Real-time Semantic Segmentation Comparative Study

ShuffleSeg: Real-time Semantic Segmentation Network

Dynamic-structured Semantic Propagation Network

ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation

Context Encoding for Semantic Segmentation

Adaptive Affinity Field for Semantic Segmentation

Predicting Future Instance Segmentations by Forecasting Convolutional Features

Fully Convolutional Adaptation Networks for Semantic Segmentation

  • intro: CVPR 2018, Rank 1 in Segmentation Track of Visual Domain Adaptation Challenge 2017
  • keywords: Fully Convolutional Adaptation Networks (FCAN), Appearance Adaptation Networks (AAN) and Representation Adaptation Networks (RAN)

Learning a Discriminative Feature Network for Semantic Segmentation

Deep Representation Learning for Domain Adaptation of Semantic Image Segmentation

Convolutional CRFs for Semantic Segmentation

ContextNet: Exploring Context and Detail for Semantic Segmentation in Real-time

DifNet: Semantic Segmentation by DiffusionNetworks

Pyramid Attention Network for Semantic Segmentation

Semantic Segmentation with Scarce Data

Attention to Refine through Multi-Scales for Semantic Segmentation

Guided Upsampling Network for Real-Time Semantic Segmentation

Deep Learning for Semantic Segmentation on Minimal Hardware

Future Semantic Segmentation with Convolutional LSTM

BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation

Dual Attention Network for Scene Segmentation

Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations

Efficient Dense Modules of Asymmetric Convolution for Real-Time Semantic Segmentation

Semantic Image Segmentation by Scale-Adaptive Networks

Recurrent Iterative Gating Networks for Semantic Segmentation

CGNet: A Light-weight Context Guided Network for Semantic Segmentation

CCNet: Criss-Cross Attention for Semantic Segmentation

ShelfNet for Real-time Semantic Segmentation

Improving Semantic Segmentation via Video Propagation and Label Relaxation

RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

Fast-SCNN: Fast Semantic Segmentation Network

Structured Knowledge Distillation for Semantic Segmentation

In Defense of Pre-trained ImageNet Architectures for Real-time Semantic Segmentation of Road-driving Images

FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation

Significance-aware Information Bottleneck for Domain Adaptive Semantic Segmentation

GFF: Gated Fully Fusion for Semantic Segmentation

DADA: Depth-aware Domain Adaptation in Semantic Segmentation

DFANet: Deep Feature Aggregation for Real-Time Semantic Segmentation

ESNet: An Efficient Symmetric Network for Real-time Semantic Segmentation

Gated-SCNN: Gated Shape CNNs for Semantic Segmentation

DABNet: Depth-wise Asymmetric Bottleneck for Real-time Semantic Segmentation

Dynamic Graph Message Passing Networks

Squeeze-and-Attention Networks for Semantic Segmentation

Global Aggregation then Local Distribution in Fully Convolutional Networks

Graph-guided Architecture Search for Real-time Semantic Segmentation

Feature Pyramid Encoding Network for Real-time Semantic Segmentation

ACFNet: Attentional Class Feature Network for Semantic Segmentation

Region Mutual Information Loss for Semantic Segmentation

Category Anchor-Guided Unsupervised Domain Adaptation for Semantic Segmentation

Efficacy of Pixel-Level OOD Detection for Semantic Segmentation

Location-aware Upsampling for Semantic Segmentation

FasterSeg: Searching for Faster Real-time Semantic Segmentation

AlignSeg: Feature-Aligned Segmentation Networks

Deep Grouping Model for Unified Perceptual Parsing

Spatial Pyramid Based Graph Reasoning for Semantic Segmentation

Learning Dynamic Routing for Semantic Segmentation

Learning to Predict Context-adaptive Convolution for Semantic Segmentation

Transferring and Regularizing Prediction for Semantic Segmentation

Tensor Low-Rank Reconstruction for Semantic Segmentation

Representative Graph Neural Network

EfficientFCN: Holistically-guided Decoding for Semantic Segmentation

Improving Semantic Segmentation via Decoupled Body and Edge Supervision

Auto Seg-Loss: Searching Metric Surrogates for Semantic Segmentation

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation

Importance-Aware Semantic Segmentation in Self-Driving with Discrete Wasserstein Training

Pixel-Level Cycle Association: A New Perspective for Domain Adaptive Semantic Segmentation

CABiNet: Efficient Context Aggregation Network for Low-Latency Semantic Segmentation

SegBlocks: Block-Based Dynamic Resolution Networks for Real-Time Segmentation

Channel-wise Distillation for Semantic Segmentation

BoxInst: High-Performance Instance Segmentation with Box Annotations

Scaling Semantic Segmentation Beyond 1K Classes on a Single GPU

Cross-Domain Grouping and Alignment for Domain Adaptive Semantic Segmentation

HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation


Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Exploring Cross-Image Pixel Contrast for Semantic Segmentation

Active Boundary Loss for Semantic Segmentation

Learning Statistical Texture for Semantic Segmentation

Cross-Dataset Collaborative Learning for Semantic Segmentation

Vision Transformers for Dense Prediction

InverseForm: A Loss Function for Structured Boundary-Aware Segmentation

Rethinking BiSeNet For Real-time Semantic Segmentation

Segmenter: Transformer for Semantic Segmentation

SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

Per-Pixel Classification is Not All You Need for Semantic Segmentation

A Unified Efficient Pyramid Transformer for Semantic Segmentation

Deep Metric Learning for Open World Semantic Segmentation

Multi-Anchor Active Domain Adaptation for Semantic Segmentation

Generalize then Adapt: Source-Free Domain Adaptive Semantic Segmentation

HRFormer: High-Resolution Transformer for Dense Prediction

Deep Hierarchical Semantic Segmentation

TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation

Mask DINO: Towards A Unified Transformer-based Framework for Object Detection and Segmentation

Instance Segmentation

Simultaneous Detection and Segmentation

Convolutional Feature Masking for Joint Object and Stuff Segmentation

Proposal-free Network for Instance-level Object Segmentation

Hypercolumns for object segmentation and fine-grained localization

SDS using hypercolumns

Learning to decompose for object detection and instance segmentation

Recurrent Instance Segmentation

Instance-sensitive Fully Convolutional Networks

Amodal Instance Segmentation

Bridging Category-level and Instance-level Semantic Image Segmentation

Bottom-up Instance Segmentation using Deep Higher-Order CRFs

DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks

End-to-End Instance Segmentation and Counting with Recurrent Attention

Translation-aware Fully Convolutional Instance Segmentation

Fully Convolutional Instance-aware Semantic Segmentation

InstanceCut: from Edges to Instances with MultiCut

Deep Watershed Transform for Instance Segmentation

Object Detection Free Instance Segmentation With Labeling Transformations

Shape-aware Instance Segmentation

Interpretable Structure-Evolving LSTM

  • intro: CMU & Sun Yat-sen University & National University of Singapore & Adobe Research
  • intro: CVPR 2017 spotlight paper

Mask R-CNN

Faster Training of Mask R-CNN by Focusing on Instance Boundaries

Boundary-preserving Mask R-CNN

Semantic Instance Segmentation via Deep Metric Learning

Pose2Instance: Harnessing Keypoints for Person Instance Segmentation

Pixelwise Instance Segmentation with a Dynamically Instantiated Network

Instance-Level Salient Object Segmentation

MEnet: A Metric Expression Network for Salient Object Segmentation

Semantic Instance Segmentation with a Discriminative Loss Function

SceneCut: Joint Geometric and Object Segmentation for Indoor Scenes

S4 Net: Single Stage Salient-Instance Segmentation

Deep Extreme Cut: From Extreme Points to Object Segmentation

Learning to Segment Every Thing

Recurrent Neural Networks for Semantic Instance Segmentation

MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

Recurrent Pixel Embedding for Instance Grouping

Annotation-Free and One-Shot Learning for Instance Segmentation of Homogeneous Object Clusters

Path Aggregation Network for Instance Segmentation

Learning to Segment via Cut-and-Paste

Learning to Cluster for Proposal-Free Instance Segmentation

Bayesian Semantic Instance Segmentation in Open Set World

TernausNetV2: Fully Convolutional Network for Instance Segmentation

Dynamic Multimodal Instance Segmentation guided by natural language queries

Traits & Transferability of Adversarial Examples against Instance Segmentation & Object Detection

Affinity Derivation and Graph Merge for Instance Segmentation

One-Shot Instance Segmentation

Hybrid Task Cascade for Instance Segmentation

Mask Scoring R-CNN

TensorMask: A Foundation for Dense Object Segmentation

Actor-Critic Instance Segmentation

Instance Segmentation by Jointly Optimizing Spatial Embeddings and Clustering Bandwidth

InstaBoost: Boosting Instance Segmentation via Probability Map Guided Copy-Pasting

SSAP: Single-Shot Instance Segmentation With Affinity Pyramid

YOLACT: Real-time Instance Segmentation

YOLACT++: Better Real-time Instance Segmentation

YolactEdge: Real-time Instance Segmentation on the Edge

PolarMask: Single Shot Instance Segmentation with Polar Representation

PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond

CenterMask : Real-Time Anchor-Free Instance Segmentation

CenterMask: single shot instance segmentation with point representation

Shape-aware Feature Extraction for Instance Segmentation

PolyTransform: Deep Polygon Transformer for Instance Segmentation

EmbedMask: Embedding Coupling for One-stage Instance Segmentation

SAIS: Single-stage Anchor-free Instance Segmentation

SOLO: Segmenting Objects by Locations

SOLOv2: Dynamic, Faster and Stronger

SOLO: A Simple Framework for Instance Segmentation

RDSNet: A New Deep Architecture for Reciprocal Object Detection and Instance Segmentation

BlendMask: Top-Down Meets Bottom-Up for Instance Segmentation

Conditional Convolutions for Instance Segmentation

PointINS: Point-based Instance Segmentation

1st Place Solutions for OpenImage2019 – Object Detection and Instance Segmentation

Mask Encoding for Single Shot Instance Segmentation

The Devil is in Classification: A Simple Framework for Long-tail Instance Segmentation

Deep Variational Instance Segmentation

Mask Point R-CNN

Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation

Seesaw Loss for Long-Tailed Instance Segmentation

Joint COCO and Mapillary Workshop at ICCV 2019: COCO Instance Segmentation Challenge Track

DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation

Robust Instance Segmentation through Reasoning about Multi-Object Occlusion

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

How Shift Equivariance Impacts Metric Learning for Instance Segmentation

FASA: Feature Augmentation and Sampling Adaptation for Long-Tailed Instance Segmentation

Deep Occlusion-Aware Instance Segmentation with Overlapping BiLayers

Sparse Object-level Supervision for Instance Segmentation with Pixel Embeddings

FAPIS: A Few-shot Anchor-free Part-based Instance Segmenter

ISTR: End-to-End Instance Segmentation with Transformers

SOLQ: Segmenting Objects by Learning Queries

1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

Rank & Sort Loss for Object Detection and Instance Segmentation

SOTR: Segmenting Objects with Transformers

FaPN: Feature-aligned Pyramid Network for Dense Image Prediction

Instances as Queries

Mask Transfiner for High-Quality Instance Segmentation

SOIT: Segmenting Objects with Instance-Aware Transformers

ContrastMask: Contrastive Learning to Segment Every Thing

Sparse Instance Activation for Real-Time Instance Segmentation

Human Instance Segmentation

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

Pose2Seg: Detection Free Human Instance Segmentation

Bounding Box Embedding for Single Shot Person Instance Segmentation

Parsing R-CNN for Instance-Level Human Analysis

Graphonomy: Universal Human Parsing via Graph Transfer Learning

Video Instance Segmentation

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation

End-to-End Video Instance Segmentation with Transformers

Spatial Feature Calibration and Temporal Fusion for Effective One-stage Video Instance Segmentation

Tracking Instances as Queries

Video Mask Transfiner for High-Quality Video Instance Segmentation

Panoptic Segmentation

Panoptic Segmentation

Panoptic Segmentation with a Joint Semantic and Instance Segmentation Network

Learning to Fuse Things and Stuff

Attention-guided Unified Network for Panoptic Segmentation

  • intro: CVPR 2019
  • intro: University of Chinese Academy of Sciences & Horizon Robotics, Inc. & The Johns Hopkins University

Panoptic Feature Pyramid Networks

UPSNet: A Unified Panoptic Segmentation Network

Single Network Panoptic Segmentation for Street Scene Understanding

An End-to-End Network for Panoptic Segmentation

Learning Instance Occlusion for Panoptic Segmentation

SpatialFlow: Bridging All Tasks for Panoptic Segmentation

Single-Shot Panoptic Segmentation

SOGNet: Scene Overlap Graph Network for Panoptic Segmentation

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation

PanDA: Panoptic Data Augmentation

Real-Time Panoptic Segmentation from Dense Detections

Bipartite Conditional Random Fields for Panoptic Segmentation

Unifying Training and Inference for Panoptic Segmentation

Towards Bounding-Box Free Panoptic Segmentation

A Benchmark for LiDAR-based Panoptic Segmentation based on KITTI

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation

EPSNet: Efficient Panoptic Segmentation Network with Cross-layer Attention Fusion

Pixel Consensus Voting for Panoptic Segmentation

EfficientPS: Efficient Panoptic Segmentation

Video Panoptic Segmentation

PanoNet: Real-time Panoptic Segmentation through Position-Sensitive Feature Embedding

Robust Vision Challenge 2020 – 1st Place Report for Panoptic Segmentation

Learning Category- and Instance-Aware Pixel Embedding for Fast Panoptic Segmentation

Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation

Scaling Wide Residual Networks for Panoptic Segmentation

Fully Convolutional Networks for Panoptic Segmentation

MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers

Ada-Segment: Automated Multi-loss Adaptation for Panoptic Segmentation

ViP-DeepLab: Learning Visual Perception with Depth-aware Video Panoptic Segmentation

STEP: Segmenting and Tracking Every Pixel

Cross-View Regularization for Domain Adaptive Panoptic Segmentation

Panoptic Segmentation Forecasting

Exemplar-Based Open-Set Panoptic Segmentation Network

Hierarchical Lovász Embeddings for Proposal-free Panoptic Segmentation

Part-aware Panoptic Segmentation

Panoptic SegFormer

Slot-VPS: Object-centric Representation Learning for Video Panoptic Segmentation

  • intro: Samsung Research China - Beijing (SRC-B) & Samsung Advanced Institute of Technology (SAIT) & University of Oxford & The University of Hong Kong

CFNet: Learning Correlation Functions for One-Stage Panoptic Segmentation

Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation

PanopticDepth: A Unified Framework for Depth-aware Panoptic Segmentation

  • intro: CVPR 2022
  • intro: Chinese Academy of Sciences & University of Chinese Academy of Sciences & Horizon Robotics, Inc.

CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation

Uncertainty-aware Panoptic Segmentation

k-means Mask Transformer

Nighttime Segmentation

Nighttime sky/cloud image segmentation

Dark Model Adaptation: Semantic Image Segmentation from Daytime to Nighttime

Semantic Nighttime Image Segmentation with Synthetic Stylized Data, Gradual Adaptation and Uncertainty-Aware Evaluation

Guided Curriculum Model Adaptation and Uncertainty-Aware Evaluation for Semantic Nighttime Image Segmentation

Bi-Mix: Bidirectional Mixing for Domain Adaptive Nighttime Semantic Segmentation

DANNet: A One-Stage Domain Adaptation Network for Unsupervised Nighttime Semantic Segmentation

NightLab: A Dual-level Architecture with Hardness Detection for Segmentation at Night

Face Parsing

Face Parsing via Recurrent Propagation

Face Parsing via a Fully-Convolutional Continuous CRF Neural Network

Face Parsing with RoI Tanh-Warping

End-to-End Face Parsing via Interlinked Convolutional Neural Networks

RoI Tanh-polar Transformer Network for Face Parsing in the Wild

Decoupled Multi-task Learning with Cyclical Self-Regulation for Face Parsing

Specific Segmentation

A CNN Cascade for Landmark Guided Semantic Part Segmentation

End-to-end semantic face segmentation with conditional random fields as convolutional, recurrent and adversarial networks

Boundary-sensitive Network for Portrait Segmentation

Boundary-Aware Network for Fast and High-Accuracy Portrait Segmentation

Beef Cattle Instance Segmentation Using Fully Convolutional Neural Network

Face Mask Extraction in Video Sequence

Segment Proposal

Learning to Segment Object Candidates

Learning to Refine Object Segments

FastMask: Segment Object Multi-scale Candidates in One Shot

Scene Labeling / Scene Parsing

Indoor Semantic Segmentation using depth information

Recurrent Convolutional Neural Networks for Scene Parsing

Learning hierarchical features for scene labeling

Multi-modal unsupervised feature learning for rgb-d scene labeling

Scene Labeling with LSTM Recurrent Neural Networks

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models

“Semantic Segmentation for Scene Understanding: Algorithms and Implementations” tutorial

Semantic Understanding of Scenes through the ADE20K Dataset

Learning Deep Representations for Scene Labeling with Guided Supervision

Learning Deep Representations for Scene Labeling with Semantic Context Guided Supervision

Spatial As Deep: Spatial CNN for Traffic Scene Understanding

Multi-Path Feedback Recurrent Neural Network for Scene Parsing

Scene Labeling using Recurrent Neural Networks with Explicit Long Range Contextual Dependency

FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation


Pyramid Scene Parsing Network

Open Vocabulary Scene Parsing

Deep Contextual Recurrent Residual Networks for Scene Labeling

Fast Scene Understanding for Autonomous Driving

  • intro: Published at “Deep Learning for Vehicle Perception”, workshop at the IEEE Symposium on Intelligent Vehicles 2017

FoveaNet: Perspective-aware Urban Scene Parsing

BlitzNet: A Real-Time Deep Network for Scene Understanding

Semantic Foggy Scene Understanding with Synthetic Data

Scale-adaptive Convolutions for Scene Parsing

Restricted Deformable Convolution based Road Scene Semantic Segmentation Using Surround View Cameras

Dense Recurrent Neural Networks for Scene Labeling

DenseASPP for Semantic Segmentation in Street Scenes

OCNet: Object Context Network for Scene Parsing

PSANet: Point-wise Spatial Attention Network for Scene Parsing

Adaptive Context Network for Scene Parsing

Semantic Flow for Fast and Accurate Scene Parsing

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

S3-Net: A Fast and Lightweight Video Scene Understanding Network by Single-shot Segmentation


MIT Scene Parsing Benchmark

Semantic Understanding of Urban Street Scenes: Benchmark Suite


Large-scale Scene Understanding Challenge

Places2 Challenge

Human Parsing

Human Parsing with Contextualized Convolutional Neural Network

Look into Person: Self-supervised Structure-sensitive Learning and A New Benchmark for Human Parsing

Multiple-Human Parsing in the Wild

Look into Person: Joint Body Parsing & Pose Estimation Network and A New Benchmark

Cross-domain Human Parsing via Adversarial Feature and Label Adaptation

Fusing Hierarchical Convolutional Features for Human Body Segmentation and Clothing Fashion Classification

Understanding Humans in Crowded Scenes: Deep Nested Adversarial Learning and A New Benchmark for Multi-Human Parsing

Macro-Micro Adversarial Network for Human Parsing

Instance-level Human Parsing via Part Grouping Network

Adaptive Temporal Encoding Network for Video Instance-level Human Parsing

Devil in the Details: Towards Accurate Single and Multiple Human Parsing

Cross-Domain Complementary Learning with Synthetic Data for Multi-Person Part Segmentation

Self-Correction for Human Parsing

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

Learning Semantic Neural Tree for Human Parsing

Self-Learning with Rectification Strategy for Human Parsing

Correlating Edge, Pose with Parsing

Affinity-aware Compression and Expansion Network for Human Parsing

Renovating Parsing R-CNN for Accurate Multiple Human Parsing

Progressive One-shot Human Parsing

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

Quality-Aware Network for Human Parsing

End-to-end One-shot Human Parsing

CDGNet: Class Distribution Guided Network for Human Parsing

AIParsing: Anchor-free Instance-level Human Parsing

Joint Detection and Segmentation

Triply Supervised Decoder Networks for Joint Detection and Segmentation

D2Det: Towards High Quality Object Detection and Instance Segmentation

Video Object Segmentation

Fast object segmentation in unconstrained video

Recurrent Fully Convolutional Networks for Video Segmentation

Object Detection, Tracking, and Motion Segmentation for Object-level Video Segmentation

Clockwork Convnets for Video Semantic Segmentation

STFCN: Spatio-Temporal FCN for Semantic Video Segmentation

One-Shot Video Object Segmentation

DAVIS: Densely Annotated VIdeo Segmentation

Video Object Segmentation Without Temporal Information

Convolutional Gated Recurrent Networks for Video Segmentation

Learning Video Object Segmentation from Static Images

Semantic Video Segmentation by Gated Recurrent Flow Propagation

FusionSeg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos

Unsupervised learning from video to detect foreground objects in single images

Semantically-Guided Video Object Segmentation

Learning Video Object Segmentation with Visual Memory

Flow-free Video Object Segmentation

Online Adaptation of Convolutional Neural Networks for Video Object Segmentation

Video Object Segmentation using Tracked Object Proposals

Video Object Segmentation with Re-identification

Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks

MaskRNN: Instance Level Video Object Segmentation

SegFlow: Joint Learning for Video Object Segmentation and Optical Flow

Video Semantic Object Segmentation by Self-Adaptation of DCNN

Learning to Segment Moving Objects

Instance Embedding Transfer to Unsupervised Video Object Segmentation

Efficient Video Object Segmentation via Network Modulation

Video Object Segmentation with Joint Re-identification and Attention-Aware Mask Propagation

Video Object Segmentation with Language Referring Expressions

Dynamic Video Segmentation Network

Low-Latency Video Semantic Segmentation

Blazingly Fast Video Object Segmentation with Pixel-Wise Metric Learning

Unsupervised Video Object Segmentation for Deep Reinforcement Learning

Fast and Accurate Online Video Object Segmentation via Tracking Parts

ReConvNet: Video Object Segmentation with Spatio-Temporal Features Modulation

Deep Spatio-Temporal Random Fields for Efficient Video Segmentation

Fast Video Object Segmentation by Reference-Guided Mask Propagation

PReMVOS: Proposal-generation, Refinement and Merging for Video Object Segmentation

YouTube-VOS: Sequence-to-Sequence Video Object Segmentation

VideoMatch: Matching based Video Object Segmentation

Mask Propagation Network for Video Object Segmentation

Tukey-Inspired Video Object Segmentation

A Generative Appearance Model for End-to-end Video Object Segmentation

Unseen Object Segmentation in Videos via Transferable Representations

FEELVOS: Fast End-to-End Embedding Learning for Video Object Segmentation

RVOS: End-to-End Recurrent Network for Video Object Segmentation

BubbleNets: Learning to Select the Guidance Frame in Video Object Segmentation by Deep Sorting Frames

Fast video object segmentation with Spatio-Temporal GANs

Video Object Segmentation using Space-Time Memory Networks

Spatiotemporal CNN for Video Object Segmentation


Architecture Search of Dynamic Cells for Semantic Video Segmentation

BoLTVOS: Box-Level Tracking for Video Object Segmentation

MAIN: Multi-Attention Instance Network for Video Segmentation

MHP-VOS: Multiple Hypotheses Propagation for Video Object Segmentation

Video Instance Segmentation

OVSNet: Towards One-Pass Real-Time Video Object Segmentation

Proposal, Tracking and Segmentation (PTS): A Cascaded Network for Video Object Segmentation

RANet: Ranking Attention Network for Fast Video Object Segmentation

DMM-Net: Differentiable Mask-Matching Network for Video Object Segmentation

CapsuleVOS: Semi-Supervised Video Object Segmentation Using Capsule Routing

Towards Good Practices for Video Object Segmentation

Anchor Diffusion for Unsupervised Video Object Segmentation

Learning a Spatio-Temporal Embedding for Video Instance Segmentation

Efficient Semantic Video Segmentation with Per-frame Inference

State-Aware Tracker for Real-Time Video Object Segmentation

Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement

SwiftNet: Real-time Video Object Segmentation

SG-Net: Spatial Granularity Network for One-Stage Video Instance Segmentation


DAVIS Challenge on Video Object Segmentation 2017


Deep Image Matting

Fast Deep Matting for Portrait Animation on Mobile Phone

  • intro: ACM Multimedia Conference (MM) 2017
  • intro: requires no user interaction and achieves real-time matting at 15 fps

Real-time deep hair matting on mobile devices

TOM-Net: Learning Transparent Object Matting from a Single Image

Deep Video Portraits

Inductive Guided Filter: Real-time Deep Image Matting with Weakly Annotated Masks on Mobile Devices

Indices Matter: Learning to Index for Deep Image Matting

Disentangled Image Matting

Natural Image Matting via Guided Contextual Attention

F, B, Alpha Matting

Background Matting: The World is Your Green Screen

Hierarchical Opacity Propagation for Image Matting

High-Resolution Deep Image Matting

Learning Affinity-Aware Upsampling for Deep Image Matting

Real-Time High-Resolution Background Matting

Deep Video Matting via Spatio-Temporal Alignment and Aggregation

Trimap-guided Feature Mining and Fusion Network for Natural Image Matting

Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation

MatteFormer: Transformer-Based Image Matting via Prior-Tokens

Referring Image Matting

One-Trimap Video Matting

Trimap-free Matting

Semantic Human Matting

Instance Segmentation based Semantic Matting for Compositing Applications

A Late Fusion CNN for Digital Matting

Attention-Guided Hierarchical Structure Aggregation for Image Matting

Boosting Semantic Human Matting with Coarse Annotations

End-to-end Animal Image Matting

Is a Green Screen Really Necessary for Real-Time Human Matting?

Multi-scale Information Assembly for Image Matting

Salient Image Matting

Mask Guided Matting via Progressive Refinement Network

Privacy-Preserving Portrait Matting

Highly Efficient Natural Image Matting

PP-HumanSeg: Connectivity-Aware Portrait Segmentation with a Large-Scale Teleconferencing Video Dataset

Situational Perception Guided Image Matting

PP-Matting: High-Accuracy Natural Image Matting

3D Segmentation

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

DA-RNN: Semantic Mapping with Data Associated Recurrent Neural Networks

SqueezeSeg: Convolutional Neural Nets with Recurrent CRF for Real-Time Road-Object Segmentation from 3D LiDAR Point Cloud

SEGCloud: Semantic Segmentation of 3D Point Clouds

3D Instance Segmentation via Multi-task Metric Learning

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation

PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

Line Parsing

Fully Convolutional Line Parsing


TF Image Segmentation: Image Segmentation framework

KittiSeg: A KITTI Road Segmentation model implemented in TensorFlow

Semantic Segmentation Architectures Implemented in PyTorch

PyTorch for Semantic Segmentation

LightNet: Light-weight Networks for Semantic Image Segmentation

LightNet++: Boosted Light-weighted Networks for Real-time Semantic Segmentation


Segmentation Results: VOC2012 BETA: Competition “comp6” (train on own data)


Mobile Real-time Video Segmentation

Deep Learning for Natural Image Segmentation Priors

Image Segmentation Using DIGITS 5

Image Segmentation with Tensorflow using CNNs and Conditional Random Fields

Fully Convolutional Networks (FCNs) for Image Segmentation

Image segmentation with Neural Net

A 2017 Guide to Semantic Segmentation with Deep Learning

Tutorials / Talks

A Unified Architecture for Instance and Semantic Segmentation

Deep learning for image segmentation