Debidatta Dwibedi
Research
I want to build intelligent agents that interact with our world in useful ways.
My research lies at the intersection of machine learning, computer vision and robotics. Presently, I am working on imitation learning from videos and investigating the role time can play in learning better visual models.
Publications
Temporal Cycle-Consistency Learning
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman
Computer Vision and Pattern Recognition (CVPR) 2019
Self-supervised representation learning based on temporal alignment for fine-grained video understanding tasks.
paper | abstract | bibtex | project | poster
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos. The method trains a network using temporal cycle consistency (TCC), a differentiable cycle-consistency loss that can be used to find correspondences across time in multiple videos. The resulting per-frame embeddings can be used to align videos by simply matching frames using the nearest-neighbors in the learned embedding space.
To evaluate the power of the embeddings, we densely label the Pouring and Penn Action video datasets for action phases. We show that (i) the learned embeddings enable few-shot classification of these action phases, significantly reducing the supervised training requirements; and (ii) TCC is complementary to other methods of self-supervised learning in videos, such as Shuffle and Learn and Time-Contrastive Networks. The embeddings are also used for a number of applications based on alignment (dense temporal correspondence) between video pairs, including transfer of metadata of synchronized modalities between videos (sounds, temporal semantic labels), synchronized playback of multiple videos, and anomaly detection.
@InProceedings{Dwibedi_2019_CVPR,
author = {Dwibedi, Debidatta and Aytar, Yusuf and Tompson, Jonathan and Sermanet, Pierre and Zisserman, Andrew},
title = {Temporal Cycle-Consistency Learning},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}
}
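The core idea is compact enough to sketch. Below is a minimal NumPy illustration of the cycle-back classification variant of the TCC loss, not the authors' released code: a frame of one video is soft-matched into a second video and matched back, and the loss is the cross-entropy of landing on the frame it started from. The embeddings U and V are hypothetical stand-ins for per-frame network outputs.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def tcc_cycle_back_loss(U, V, i):
        # U: (N, D) per-frame embeddings of video 1; V: (M, D) of video 2.
        # Soft nearest neighbor of U[i] among the frames of V.
        alpha = softmax(-np.sum((V - U[i]) ** 2, axis=1))
        v_tilde = alpha @ V
        # Cycle back: a distribution over the frames of video 1.
        beta = softmax(-np.sum((U - v_tilde) ** 2, axis=1))
        # Cross-entropy against the one-hot label at the starting frame i.
        return -np.log(beta[i] + 1e-8)

    # Toy example: two short "videos" with random 16-d embeddings.
    rng = np.random.default_rng(0)
    U, V = rng.normal(size=(20, 16)), rng.normal(size=(24, 16))
    loss = np.mean([tcc_cycle_back_loss(U, V, i) for i in range(len(U))])

In training, this loss is minimized over the embedding network's parameters; at test time the same embeddings align videos by nearest-neighbor matching.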
Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning
Ilya Kostrikov, Kumar Krishna Agrawal, Debidatta Dwibedi, Sergey Levine and Jonathan Tompson
International Conference on Learning Representations (ICLR) 2019
Sample efficient imitation learning using off-policy updates and proper handling of terminal states.
paper | abstract | bibtex | code
We identify two issues with the family of algorithms based on the Adversarial Imitation Learning framework. The first is an implicit bias present in the reward functions used in these algorithms. While these biases might work well for some environments, they can lead to sub-optimal behavior in others. Second, even though these algorithms can learn from few expert demonstrations, for many real-world applications they require a prohibitively large number of interactions with the environment in order to imitate the expert. To address these issues, we propose a new algorithm called Discriminator-Actor-Critic that uses off-policy Reinforcement Learning to reduce policy-environment interaction sample complexity by an average factor of 10. Furthermore, since our reward function is designed to be unbiased, we can apply our algorithm to many problems without making any task-specific adjustments.
@inproceedings{kostrikov2018discriminatoractorcritic,
title={Discriminator-Actor-Critic: Addressing Sample Inefficiency and Reward Bias in Adversarial Imitation Learning},
author={Ilya Kostrikov and Kumar Krishna Agrawal and Debidatta Dwibedi and Sergey Levine and Jonathan Tompson},
booktitle={International Conference on Learning Representations},
year={2019},
url={https://openreview.net/forum?id=Hk4fpoA5Km},
}
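Two ingredients of the method are easy to sketch, with the caveat that this is a NumPy paraphrase rather than the released code. First, a reward of the form log D - log(1 - D), which for a sigmoid discriminator equals its logit, avoids both the survival bias of -log(1 - D) and the termination bias of log D. Second, episode ends are routed into explicit absorbing states so that termination carries learned reward instead of an implicit zero. The transition tuple format below is a hypothetical simplification.

    import numpy as np

    def dac_reward(d_logit):
        # For D = sigmoid(l): log D - log(1 - D) == l, so using the logit
        # directly gives the unbiased reward and is numerically stable.
        return d_logit

    def wrap_absorbing(transitions, absorbing_state, null_action):
        # transitions: list of (s, a, s_next, done). Replace each episode
        # end with a transition into an absorbing state that loops on
        # itself, so the discriminator also scores termination.
        out = []
        for s, a, s_next, done in transitions:
            if done:
                out.append((s, a, absorbing_state, False))
                out.append((absorbing_state, null_action, absorbing_state, False))
            else:
                out.append((s, a, s_next, done))
        return out

In the paper the absorbing state is implemented as an extra indicator dimension appended to every state; the sketch above only conveys the bookkeeping.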
Learning Actionable Representations from Visual Observations
Debidatta Dwibedi, Jonathan Tompson, Corey Lynch and Pierre Sermanet
International Conference on Intelligent Robots and Systems (IROS) 2018
Control agents from pixels by learning self-supervised representations from videos.
paper | abstract | bibtex | project
In this work we explore a new approach for robots to teach themselves about the world simply by observing it. In particular we investigate the effectiveness of learning task-agnostic representations for continuous control tasks. We extend Time-Contrastive Networks (TCN), which learn from visual observations, by embedding multiple frames jointly in the embedding space as opposed to a single frame. We show that by doing so we can encode both position and velocity attributes significantly more accurately. We test the usefulness of this self-supervised approach in a reinforcement learning setting, and show that representations learned by agents watching themselves take random actions, or watching other agents perform tasks successfully, can enable the learning of continuous control policies with algorithms like Proximal Policy Optimization (PPO) using only the learned embeddings as input. We also demonstrate significant improvements on the real-world Pouring dataset, with a relative error reduction of 39.4% for motion attributes and 11.1% for static attributes compared to the single-frame baseline.
@inproceedings{dwibedi2018learning,
author = {Dwibedi, Debidatta and Tompson, Jonathan and Lynch, Corey and Sermanet, Pierre},
title = {Learning Actionable Representations from Visual Observations},
booktitle = {2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
pages = {1577--1584},
year = {2018},
organization = {IEEE},
url = {https://arxiv.org/abs/1808.00928}
}
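A minimal sketch of the two ideas, assuming plain NumPy arrays: stack n strided frames so that a jointly learned embedding can capture velocity as well as position, and train with a time-contrastive triplet loss that pulls temporally nearby embeddings together and pushes distant ones apart. Both functions are illustrative stand-ins rather than the paper's exact formulation.

    import numpy as np

    def multi_frame_windows(frames, n, stride):
        # frames: (T, ...) array. Returns (T - (n-1)*stride, n, ...) windows
        # of n frames taken at the given temporal stride.
        idx = np.arange(0, n * stride, stride)
        return np.stack([frames[i + idx] for i in range(len(frames) - idx[-1])])

    def time_contrastive_triplet(z, anchor, pos_radius, margin=1.0):
        # z: (T, D) window embeddings. Frames near the anchor in time are
        # positives; a frame far away in time serves as the negative.
        pos = (anchor + pos_radius) % len(z)
        neg = (anchor + 5 * pos_radius) % len(z)
        d_pos = np.sum((z[anchor] - z[pos]) ** 2)
        d_neg = np.sum((z[anchor] - z[neg]) ** 2)
        return max(0.0, margin + d_pos - d_neg)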
Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection
Debidatta Dwibedi, Ishan Misra and Martial Hebert
International Conference on Computer Vision (ICCV) 2017
Generate synthetic data for detecting objects in scenes.
paper | abstract | bibtex | code | poster
A major impediment in rapidly deploying object detection models for instance detection is the lack of large annotated datasets. For example, finding a large labeled dataset containing instances in a particular kitchen is unlikely. Each new environment with new instances requires expensive data collection and annotation. In this paper, we propose a simple approach to generate large annotated instance datasets with minimal effort. Our key insight is that ensuring only patch-level realism provides enough training signal for current object detector models. We automatically 'cut' object instances and 'paste' them on random backgrounds. A naive way to do this produces pixel artifacts that lead to poor performance for trained models. We show how to make detectors ignore these artifacts during training and generate data that gives competitive performance on real data. Our method outperforms existing synthesis approaches and when combined with real images improves relative performance by more than 21% on benchmark datasets. In a cross-domain setting, our synthetic data combined with just 10% real data outperforms models trained on all real data.
@InProceedings{Dwibedi_2017_ICCV,
author = {Dwibedi, Debidatta and Misra, Ishan and Hebert, Martial},
title = {Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection},
booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
month = {Oct},
year = {2017}
}
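The synthesis step itself is short; the care is in the blending. Here is a rough NumPy/SciPy sketch, not the released code, of pasting a segmented object with a feathered alpha mask, which softens the paste boundary and yields a bounding-box annotation for free. The paper goes further and varies the blending mode per example so the detector learns to ignore boundary artifacts altogether.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def paste_with_blending(bg, obj, mask, top, left, sigma=2.0):
        # bg: (H, W, 3) background in [0, 1]; obj: (h, w, 3) object crop;
        # mask: (h, w) binary segmentation. Assumes the paste fits inside bg.
        alpha = gaussian_filter(mask.astype(np.float32), sigma)[..., None]
        h, w = mask.shape
        region = bg[top:top + h, left:left + w]
        out = bg.copy()
        out[top:top + h, left:left + w] = alpha * obj + (1 - alpha) * region
        # The paste location doubles as a free bounding-box label.
        box = (left, top, left + w, top + h)
        return out, box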
Deep Cuboid Detection: Beyond 2D Bounding Boxes
Debidatta Dwibedi, Tomasz Malisiewicz, Vijay Badrinarayanan and Andrew Rabinovich
arXiv preprint, 2016
Cuboid detector using deep learning: finds cuboids in scenes and localizes their corners.
paper | abstract | bibtex
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to classical approaches which fit a 3D model from low-level cues like corners, edges, and vanishing points, we propose an end-to-end deep learning system to detect cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize cuboids with a 2D bounding box, and simultaneously localize the cuboid's corners, effectively producing a 3D interpretation of box-like objects. We refine keypoints by pooling convolutional features iteratively, improving the baseline method significantly. Our deep learning cuboid detector is trained in an end-to-end fashion and is suitable for real-time applications in augmented reality (AR) and robotics.
@article{dwibedi2016deep,
title={Deep cuboid detection: Beyond 2d bounding boxes},
author={Dwibedi, Debidatta and Malisiewicz, Tomasz and Badrinarayanan, Vijay and Rabinovich, Andrew},
journal={arXiv preprint arXiv:1611.10010},
year={2016}
}
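The iterative refinement step can be sketched as follows, assuming a generic convolutional feature map and a hypothetical learned offset regressor standing in for the paper's refinement head: features are re-pooled around each current corner estimate and a small offset is regressed, for a few iterations.

    import numpy as np

    def refine_corners(feature_map, corners, regressor, n_iters=3, crop=7):
        # feature_map: (H, W, C) conv features; corners: float (8, 2) array
        # of (x, y) estimates; regressor: (crop, crop, C) patch -> (dx, dy).
        for _ in range(n_iters):
            for k, (x, y) in enumerate(corners):
                x0 = int(np.clip(x - crop // 2, 0, feature_map.shape[1] - crop))
                y0 = int(np.clip(y - crop // 2, 0, feature_map.shape[0] - crop))
                patch = feature_map[y0:y0 + crop, x0:x0 + crop]
                corners[k] = corners[k] + regressor(patch)
        return corners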
Characterizing Predicate Arity and Spatial Structure for Inductive Learning of Game Rules
Debidatta Dwibedi and Amitabha Mukerjee
ECCV Workshop on Computer Vision + Ontology Applied Cross-Disciplinary Technologies, 2014
Represent videos as dynamic graphs and learn the rules of games by observing people play them in Kinect videos.
paper | abstract | bibtex | videos
Where do the predicates in a game ontology come from? We use RGBD vision to learn a) the spatial structure of a board, and b) the number of parameters in a move or transition. These are used to define state-transition predicates for a logical description of each game state. Given a set of videos for a game, we use an improved 3D multi-object tracking to obtain the positions of each piece in games such as 4-peg solitaire or Towers of Hanoi. The spatial positions occupied by pieces over the entire game is clustered, revealing the structure of the board. Each frame is represented as a Semantic Graph with edges encoding spatial relations between pieces. Changes in the graphs between game states reveal the structure of a “move”. Knowledge from spatial structure and semantic graphs is mapped to FOL descriptions of the moves and used in an Inductive Logic framework to infer the valid moves and other rules of the game. Discovered predicate structures and induced rules are demonstrated for several games with varying board layouts and move structures.
@inproceedings{dwibedi2014characterizing,
title={Characterizing Predicate Arity and Spatial Structure for Inductive Learning of Game Rules},
author={Dwibedi, Debidatta and Mukerjee, Amitabha},
booktitle={European Conference on Computer Vision},
pages={323--338},
year={2014},
organization={Springer}
}
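A rough sketch of the symbolic core, assuming tracked piece positions are already available: clustering every position observed over a game recovers the board's cells, and diffing consecutive symbolic states yields the arguments of a move predicate that can be handed to an inductive logic programming system. Function names and the state format are hypothetical.

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def board_cells(piece_positions, n_cells):
        # piece_positions: (N, d) positions pooled over the whole game.
        # Clustering them recovers the board's discrete cells.
        centers, labels = kmeans2(piece_positions, n_cells, minit='++')
        return centers, labels

    def move_predicate(state_before, state_after):
        # States map piece -> cell index. The diff of two consecutive states
        # gives the arguments of a move, e.g. move(piece, from_cell, to_cell).
        for piece, cell in state_before.items():
            if state_after.get(piece) != cell:
                return ('move', piece, cell, state_after.get(piece))
        return None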
Miscellaneous
Some other unpublished work: