EASI: Evolutionary Adversarial Simulator Identification for Sim-to-Real Transfer

Nanjing University
NeurIPS 2024

Abstract

Reinforcement Learning (RL) controllers have demonstrated remarkable performance in complex robot control tasks. However, the reality gap often leads to poor performance when policies trained in simulation are deployed directly on real robots. Previous sim-to-real algorithms such as Domain Randomization (DR) require domain-specific expertise and suffer from issues such as reduced control performance and high training costs. In this work, we introduce Evolutionary Adversarial Simulator Identification (EASI), a novel approach that combines a Generative Adversarial Network (GAN) with an Evolutionary Strategy (ES) to address sim-to-real challenges. Specifically, we frame sim-to-real as a search problem, where ES acts as a generator in adversarial competition with a neural network discriminator, aiming to find physical parameter distributions that make the state transitions in simulation and reality as similar as possible. The discriminator serves as the fitness function, guiding the evolution of the physical parameter distributions. EASI features simplicity, low cost, and high fidelity, enabling the construction of a more realistic simulator with minimal requirements for real-world data, thus aiding the transfer of policies trained in simulation to the real world. We demonstrate the performance of EASI in both sim-to-sim and sim-to-real tasks, showing superior performance compared to existing sim-to-real algorithms.

EASI

Our goal is to find a parameter distribution for the simulator (e.g., a Gaussian distribution) that makes the simulator as similar as possible to the real world. The distance between simulation and reality is measured by a discriminator $D(\mathbf{s}, \mathbf{a}, \mathbf{s}')$, which is trained by: $$ \mathop{\max}\limits_{D} \mathbb{E}_{d^{\mathcal{M}}(\mathbf{s}, \mathbf{a}, \mathbf{s}')}[D(\mathbf{s}, \mathbf{a}, \mathbf{s}')] - \mathbb{E}_{d^{\mathcal{B}}(\mathbf{s}, \mathbf{a}, \mathbf{s}')}[D(\mathbf{s}, \mathbf{a}, \mathbf{s}')]. $$
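As a rough illustration of the objective above, the two expectations can be estimated by averaging the discriminator's output over a batch of transitions drawn from each distribution. The sketch below is a minimal Monte Carlo estimate under stated assumptions, not the authors' implementation. The function name `critic_objective`, the hand-written `toy_critic`, and the two small batches are all hypothetical, and the code makes no claim about which of $d^{\mathcal{M}}$ and $d^{\mathcal{B}}$ holds the real-world data.

```python
from statistics import fmean


def critic_objective(D, batch_m, batch_b):
    """Monte Carlo estimate of E_{d^M}[D(s, a, s')] - E_{d^B}[D(s, a, s')].

    batch_m and batch_b are lists of (s, a, s_next) tuples sampled from
    d^M and d^B respectively. D is any callable scoring a transition.
    """
    score_m = fmean(D(s, a, s_next) for s, a, s_next in batch_m)
    score_b = fmean(D(s, a, s_next) for s, a, s_next in batch_b)
    return score_m - score_b


# Hypothetical stand-in for a trained discriminator: it scores a transition
# higher the closer it is to the toy dynamics s' = s + 2 * a.
def toy_critic(s, a, s_next):
    return -abs((s_next - s) - 2.0 * a)


# Illustrative scalar transitions (made up for this sketch).
batch_m = [(0.0, 1.0, 2.0), (1.0, 0.5, 2.0)]  # follow s' = s + 2 * a
batch_b = [(0.0, 1.0, 1.0), (1.0, 0.5, 1.5)]  # follow s' = s + 1 * a

gap = critic_objective(toy_critic, batch_m, batch_b)
print(gap)
```

In an actual training loop, `D` would be a neural network whose weights are updated to increase this quantity. A larger gap means the discriminator separates the two sets of transitions more easily.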

Schematic overview of EASI. ES acts as a generator in adversarial competition with a neural network discriminator that distinguishes simulated state transitions from real ones. The discriminator serves as the fitness function, guiding the evolution of the physical parameter distributions.
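The generator side of this loop can be sketched as a simple evolutionary search over one physical parameter, with a score function acting as fitness. The example below uses an elite-selection ES (cross-entropy-method style), which may differ from the ES variant the authors use. To stay short it also replaces the trained discriminator with a fixed, hypothetical `stand_in_fitness`, and it omits the alternating discriminator updates that make the real procedure adversarial. All names and hyperparameters are assumptions for illustration.

```python
import random


def evolve(fitness, mean, std, generations=30, population=64,
           elite_frac=0.25, seed=0):
    """Evolve a 1-D Gaussian parameter distribution toward high fitness."""
    rng = random.Random(seed)
    n_elite = max(2, int(population * elite_frac))
    for _ in range(generations):
        # Sample candidate physical parameters from the current distribution.
        candidates = [rng.gauss(mean, std) for _ in range(population)]
        # Keep the candidates that the fitness function scores highest.
        elites = sorted(candidates, key=fitness, reverse=True)[:n_elite]
        # Refit the Gaussian to the elites; floor std to avoid collapse.
        mean = sum(elites) / n_elite
        var = sum((c - mean) ** 2 for c in elites) / n_elite
        std = max(var ** 0.5, 1e-3)
    return mean, std


# Hypothetical stand-in for the discriminator score: it peaks when the
# simulator parameter equals an assumed "real" value of 2.0.
def stand_in_fitness(theta):
    return -(theta - 2.0) ** 2


final_mean, final_std = evolve(stand_in_fitness, mean=0.5, std=1.0)
print(final_mean, final_std)
```

In the full method, the fitness of a candidate would come from the discriminator's scores on transitions collected in a simulator configured with that candidate, and the discriminator would be retrained as the parameter distribution evolves.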

Sim-to-real policy transfer

Introduction to EASI's sim-to-real experiments on the quadruped robot, Unitree Go2.

Sim-to-sim policy transfer

Evolution of the parameters. WD means the target parameter lies within the initial parameter distribution; OOD means the target parameter lies outside the initial parameter distribution.

Policy performance in the pseudo-real environment over the course of training. The x-axis shows the number of policy optimization steps taken during training in simulation, and the y-axis shows the policy's performance when tested in the pseudo-real environment.