Image / Video Captioning


Im2Text: Describing Images Using 1 Million Captioned Photographs

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Show and Tell

Show and Tell: A Neural Image Caption Generator

Image caption generation by CNN and LSTM

Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge

Learning a Recurrent Visual Representation for Image Caption Generation

Mind’s Eye: A Recurrent Visual Representation for Image Caption Generation

Deep Visual-Semantic Alignments for Generating Image Descriptions

Deep Captioning with Multimodal Recurrent Neural Networks

Show, Attend and Tell

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention (ICML 2015)

Automatically describing historic photographs

Learning like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images

What value do explicit high level concepts have in vision to language problems?

Aligning where to see and what to tell: image caption with region-based attention and scene factorization

Learning FRAME Models Using CNN Filters for Knowledge Visualization (CVPR 2015)

Generating Images from Captions with Attention

Order-Embeddings of Images and Language

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Expressing an Image Stream with a Sequence of Natural Sentences

Multimodal Pivots for Image Caption Translation

Image Captioning with Deep Bidirectional LSTMs

Encode, Review, and Decode: Reviewer Module for Caption Generation

Review Network for Caption Generation

Attention Correctness in Neural Image Captioning

Image Caption Generation with Text-Conditional Semantic Attention

DeepDiary: Automatic Caption Generation for Lifelogging Image Streams

phi-LSTM: A Phrase-based Hierarchical LSTM Model for Image Captioning

Captioning Images with Diverse Objects

Learning to generalize to new compositions in image understanding

Generating captions without looking beyond objects

SPICE: Semantic Propositional Image Caption Evaluation

Boosting Image Captioning with Attributes

Bootstrap, Review, Decode: Using Out-of-Domain Textual Data to Improve Image Captioning

A Hierarchical Approach for Generating Descriptive Image Paragraphs

Dense Captioning with Joint Inference and Visual Context

Optimization of image description metrics using policy gradient methods

Areas of Attention for Image Captioning

Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Recurrent Image Captioner: Describing Images with Spatial-Invariant Transformation and Attention Filtering

Recurrent Highway Networks with Language CNN for Image Captioning

Top-down Visual Saliency Guided by Captions

MAT: A Multimodal Attentive Translator for Image Captioning

Deep Reinforcement Learning-based Image Captioning with Embedding Reward

Attend to You: Personalized Image Captioning with Context Sequence Memory Networks

Punny Captions: Witty Wordplay in Image Descriptions

Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner

Actor-Critic Sequence Training for Image Captioning

  • intro: Queen Mary University of London & Yang’s Accounting Consultancy Ltd
  • keywords: actor-critic reinforcement learning
  • arxiv:
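The actor-critic idea behind this line of work can be shown on a toy problem: the actor is a softmax policy over a tiny vocabulary, the critic is a scalar estimate of expected reward (standing in for a sentence-level metric like CIDEr), and the advantage (reward minus critic value) scales the policy gradient. A minimal sketch, with made-up sizes and a single-token "caption":

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 5
TARGET = 3                 # toy stand-in for "the word the metric rewards"
logits = np.zeros(VOCAB)   # actor parameters: a single softmax policy
value = 0.0                # critic: scalar estimate of expected reward
lr_actor, lr_critic = 0.5, 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(300):
    p = softmax(logits)
    a = rng.choice(VOCAB, p=p)          # actor samples a "word"
    r = 1.0 if a == TARGET else 0.0     # sentence-level reward stand-in (e.g. CIDEr)
    advantage = r - value               # critic baseline reduces gradient variance
    grad = -p
    grad[a] += 1.0                      # d log p(a) / d logits
    logits += lr_actor * advantage * grad
    value += lr_critic * (r - value)    # move critic toward observed reward

print(softmax(logits).argmax(), round(value, 2))
```

The real papers apply this per decoding step over full captions; only the advantage-weighted log-likelihood update is the same.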

What is the Role of Recurrent Neural Networks (RNNs) in an Image Caption Generator?

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning

Self-Guiding Multimodal LSTM - when we do not have a perfect training dataset for image captioning

Contrastive Learning for Image Captioning

Object Descriptions

Generation and Comprehension of Unambiguous Object Descriptions

Video Captioning / Description

Jointly Modeling Deep Video and Compositional Text to Bridge Vision and Language in a Unified Framework

Translating Videos to Natural Language Using Deep Recurrent Neural Networks

Describing Videos by Exploiting Temporal Structure

SA-tensorflow: Soft attention mechanism for video caption generation

Sequence to Sequence – Video to Text

Jointly Modeling Embedding and Translation to Bridge Video and Language

Video Description using Bidirectional Recurrent Neural Networks

Bidirectional Long-Short Term Memory for Video Description

3 Ways to Subtitle and Caption Your Videos Automatically Using Artificial Intelligence

Frame- and Segment-Level Features and Candidate Pool Evaluation for Video Caption Generation

Grounding and Generation of Natural Language Descriptions for Images and Videos

Video Captioning and Retrieval Models with Semantic Attention

  • intro: Winner of three (fill-in-the-blank, multiple-choice test, and movie retrieval) out of four tasks of the LSMDC 2016 Challenge (Workshop in ECCV 2016)
  • arxiv:

Spatio-Temporal Attention Models for Grounded Video Captioning

Video and Language: Bridging Video and Language with Deep Learning

Recurrent Memory Addressing for describing videos

Video Captioning with Transferred Semantic Attributes

Adaptive Feature Abstraction for Translating Video to Language

Semantic Compositional Networks for Visual Captioning

Hierarchical Boundary-Aware Neural Encoder for Video Captioning

Attention-Based Multimodal Fusion for Video Description

Weakly Supervised Dense Video Captioning

Generating Descriptions with Grounded and Co-Referenced People

Multi-Task Video Captioning with Video and Entailment Generation

Dense-Captioning Events in Videos

Hierarchical LSTM with Adjusted Temporal Attention for Video Captioning

Reinforced Video Captioning with Entailment Rewards

End-to-end Concept Word Detection for Video Captioning, Retrieval, and Question Answering

From Deterministic to Generative: Multi-Modal Stochastic RNNs for Video Captioning


Learning CNN-LSTM Architectures for Image Caption Generation: An implementation of CNN-LSTM image caption generator architecture that achieves close to state-of-the-art results on the MSCOCO dataset.
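The CNN-LSTM decoding loop behind implementations like this one can be sketched in a few lines: a CNN feature vector is fed to an LSTM as the first input, then words are generated greedily, each predicted word fed back in until an end token. A minimal sketch with random (untrained) weights and made-up sizes; `<start>`/`<end>` token indices are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT, HID, VOCAB = 8, 16, 10  # toy sizes; real models use e.g. 2048 / 512 / 10k
END = 0                       # assumed index of the <end> token

# Randomly initialised weights stand in for a trained model.
W_img = rng.normal(0, 0.1, (HID, FEAT))   # projects the CNN feature to embedding space
W_emb = rng.normal(0, 0.1, (VOCAB, HID))  # word embeddings
W_x = rng.normal(0, 0.1, (4 * HID, HID))  # LSTM input weights (i, f, o, g stacked)
W_h = rng.normal(0, 0.1, (4 * HID, HID))  # LSTM recurrent weights
W_out = rng.normal(0, 0.1, (VOCAB, HID))  # hidden state -> vocabulary logits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    z = W_x @ x + W_h @ h
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def greedy_caption(feature, max_len=10):
    """'Show and Tell' style decoding: feed the image once, then words."""
    h = np.zeros(HID)
    c = np.zeros(HID)
    h, c = lstm_step(W_img @ feature, h, c)  # image feature as the first input
    tokens = []
    x = W_emb[1]                             # assume token 1 is <start>
    for _ in range(max_len):
        h, c = lstm_step(x, h, c)
        tok = int(np.argmax(W_out @ h))      # greedy argmax over the vocabulary
        tokens.append(tok)
        if tok == END:
            break
        x = W_emb[tok]                       # feed the predicted word back in
    return tokens

caption = greedy_caption(rng.normal(size=FEAT))
print(caption)
```

Trained systems replace greedy argmax with beam search and learn all weight matrices end-to-end on MSCOCO.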

screengrab-caption: an openframeworks app that live-captions your desktop screen with a neural net


CaptionBot (Microsoft)


Captioning Novel Objects in Images

Published: 09 Oct 2015

Deep Learning and Autonomous Driving


(Toronto) CSC2541: Visual Perception for Autonomous Driving, Winter 2016

(MIT) 6.S094: Deep Learning for Self-Driving Cars

How to Land An Autonomous Vehicle Job: Coursework


An Empirical Evaluation of Deep Learning on Highway Driving


DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving

End to End Learning for Self-Driving Cars

End-to-End Deep Learning for Self-Driving Cars

Can we unify monocular detectors for autonomous driving by using the pixel-wise semantic segmentation of CNNs?

BRAIN4CARS: Cabin Sensing for Safe and Personalized Driving

Brain4Cars: Sensory-Fusion Recurrent Neural Models for Driver Activity Anticipation

Brain4Cars: Car That Knows Before You Do via Sensory-Fusion Deep Learning Architecture

Car that Knows Before You Do: Anticipating Maneuvers via Learning Temporal Driving Models

Recurrent Neural Networks for Driver Activity Anticipation via Sensory-Fusion Architecture

Long-term Planning by Short-term Prediction

Learning a Driving Simulator

Comma.ai open-sources the data it used for its first successful driverless trips

Autonomous driving challenge: To Infer the property of a dynamic object based on its motion pattern using recurrent neural network

Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving

Learning from Maps: Visual Common Sense for Autonomous Driving

SAD-GAN: Synthetic Autonomous Driving using Generative Adversarial Networks

  • intro: Accepted at the Deep Learning for Action and Interaction Workshop, 30th Conference on Neural Information Processing Systems (NIPS 2016)
  • arxiv:

MultiNet: Real-time Joint Semantic Reasoning for Autonomous Driving

Interpretable Learning for Self-Driving Cars by Visualizing Causal Attention

Virtual to Real Reinforcement Learning for Autonomous Driving

Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

Deep Reinforcement Learning framework for Autonomous Driving

Systematic Testing of Convolutional Neural Networks for Autonomous Driving

MODNet: Moving Object Detection Network with Motion and Appearance for Autonomous Driving


Caffe-Autopilot: Car autopilot software that uses C++, BVLC Caffe, OpenCV, and SFML

Self Driving Car Demo

Autoware: Open-source software for urban autonomous driving

Open Sourcing 223GB of Driving Data

Machine Learning for RC Cars

Self Driving (Toy) Ferrari

Lane Finding Project for Self-Driving Car ND

Instructions on how to get your development environment ready for Udacity Self Driving Car (SDC) Challenges

DeepDrive: self-driving car AI

DeepDrive setup: Run a self-driving car simulator from the comfort of your own PC

DeepTesla: End-to-End Learning from Human and Autopilot Driving


Self-driving cars: How far away are we REALLY from autonomous cars? (7 Aug 2015)

Practice makes perfect: Driverless cars will learn from their mistakes (9 Oct 2015)

Eyes on the Road: How Autonomous Cars Understand What They’re Seeing

Human-in-the-loop deep learning will help drive autonomous cars

Using reinforcement learning in Python to teach a virtual car to avoid obstacles

Autonomous RC car using Raspberry Pi and Neural Networks

The Road Ahead: Autonomous Vehicles Startup Ecosystem

Deep Driving - A revolutionary AI technique is about to transform the self-driving car

Visualizations for regressing wheel steering angles in self driving cars with Keras


Acceleration and Model Compression



Data Science Resources



Data Mining Resources



Recognition, Detection, Segmentation and Tracking

Classification / Recognition



HOG: Histogram of Oriented Gradients


Discrete Optimization Resources

Constraint Programming

Published: 01 Oct 2015