
Bhairav Mehta
Co-Founder, CEO at CharacterQuilt. AI/ML, Robotics, Entrepreneur
Writing
Three Ways to Fix The Negative Pretraining Effect
January 1, 2021
Investigates the effects of pretraining on the plasticity of neural networks, finding that different pretraining trajectories induce invariances that can either help or hinder plasticity in multi-task learning scenarios.
A User's Guide to Calibrating Robotics Simulators
January 1, 2020
Explores current methods in machine learning system identification, presenting a user's guide on when and where to use each, and introducing the SIPE benchmark for testing and comparing algorithms.
Bisimulation-Inducing Graph Neural Networks
January 1, 2020
Demonstrates that bisimulation relations and metrics can be induced by graph neural networks, establishing an equivalence between the original formulation of bisimulation on MDPs and the L2 distance induced by a particular type of GNN embedding.
Generating Automatic Curricula via Self-Supervised Active Domain Randomization
January 1, 2020
Demonstrates that agents trained through self-play in the ADR framework significantly outperform uniform domain randomization in both simulated and real-world transfer, even without explicit rewards.
Curriculum in gradient-based meta-reinforcement learning
January 1, 2020
Active Domain Randomization
January 1, 2019
Symbolic Regression for Interpretable Offline Reinforcement Learning
Describes ISRL, a new paradigm that uses symbolic regression to extract interpretable symbolic reward functions directly from noisy data, allowing human experimenters to recover reward functions for offline reinforcement learning.
Active Domain Randomization and Safety-Critical Few-Shot Learning
A follow-up to ADR, this research shows that adaptive simulators can be learned within the maximum-entropy RL framework, allowing ADR's learned randomization distributions to serve as a strong, meaningful prior in a domain randomization setting.