EASI: Evolutionary Adversarial Simulator Identification for Sim-to-Real Transfer

Nanjing University
NeurIPS 2024

Abstract

Reinforcement Learning (RL) controllers have demonstrated remarkable performance in complex robot control tasks. However, the reality gap often leads to poor performance when policies trained in simulation are deployed directly onto real robots. Previous sim-to-real algorithms such as Domain Randomization (DR) require domain-specific expertise and suffer from issues such as reduced control performance and high training costs. In this work, we introduce Evolutionary Adversarial Simulator Identification (EASI), a novel approach that combines a Generative Adversarial Network (GAN) with an Evolutionary Strategy (ES) to address sim-to-real challenges. Specifically, we formulate sim-to-real transfer as a search problem, in which ES acts as a generator in adversarial competition with a neural network discriminator, aiming to find physical parameter distributions that make the state transitions in simulation and reality as similar as possible. The discriminator serves as the fitness function, guiding the evolution of the physical parameter distributions. EASI is simple, low-cost, and high-fidelity, enabling the construction of a more realistic simulator with minimal real-world data and thus aiding the transfer of simulation-trained policies to the real world. We demonstrate EASI in both sim-to-sim and sim-to-real tasks, where it outperforms existing sim-to-real algorithms.

EASI

Our goal is to find a parameter distribution for the simulator (e.g., a Gaussian distribution) that makes the simulator as similar as possible to the real world. The distance between simulation and reality is measured by a discriminator $D(\mathbf{s}, \mathbf{a}, \mathbf{s}')$, which is trained by: $$ \mathop{\max}\limits_{D} \mathbb{E}_{d^{\mathcal{M}}(\mathbf{s}, \mathbf{a}, \mathbf{s}')}[D(\mathbf{s}, \mathbf{a}, \mathbf{s}')] - \mathbb{E}_{d^{\mathcal{B}}(\mathbf{s}, \mathbf{a}, \mathbf{s}')}[D(\mathbf{s}, \mathbf{a}, \mathbf{s}')]. $$
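As a rough illustration, the objective above can be estimated on batches of $(\mathbf{s}, \mathbf{a}, \mathbf{s}')$ transitions and maximized by gradient ascent. This is a minimal sketch under assumptions not stated in the source: the critic is linear in a hypothetical feature map (raw entries plus their squares), the Lipschitz constraint needed to keep this difference-of-expectations objective bounded is enforced by WGAN-style weight clipping, and the two batches come from a toy 1-D system that merely stands in for samples from $d^{\mathcal{M}}$ and $d^{\mathcal{B}}$. All function names are hypothetical.

```python
import numpy as np

def features(x):
    # Hypothetical feature map: raw (s, a, s') entries plus their squares.
    return np.concatenate([x, x ** 2], axis=1)

def critic_objective(w, trans_m, trans_b):
    # Empirical E_{d^M}[D] - E_{d^B}[D] with D(x) = features(x) @ w.
    return float(features(trans_m).mean(axis=0) @ w - features(trans_b).mean(axis=0) @ w)

def train_critic(trans_m, trans_b, steps=200, lr=0.05, clip=0.1):
    w = np.zeros(2 * trans_m.shape[1])
    # For a critic that is linear in the features, the gradient of the objective
    # is constant: the difference of the two batches' mean feature vectors.
    grad = features(trans_m).mean(axis=0) - features(trans_b).mean(axis=0)
    for _ in range(steps):
        # Gradient ascent followed by weight clipping (assumed Lipschitz control).
        w = np.clip(w + lr * grad, -clip, clip)
    return w

def toy_transitions(theta, n, rng):
    # Toy 1-D system (assumption): s' = s + theta * a + small noise.
    s = rng.normal(size=n)
    a = rng.uniform(-1.0, 1.0, size=n)
    s_next = s + theta * a + 0.01 * rng.normal(size=n)
    return np.stack([s, a, s_next], axis=1)

rng = np.random.default_rng(0)
trans_m = toy_transitions(theta=1.0, n=2000, rng=rng)
trans_b = toy_transitions(theta=2.0, n=2000, rng=rng)
w = train_critic(trans_m, trans_b)
print(critic_objective(w, trans_m, trans_b))  # positive: the critic separates the batches
```

When both batches come from the same dynamics, the gradient vanishes and the objective stays at zero; the larger the objective the trained critic reaches, the farther apart the two transition distributions are.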

Schematic overview of EASI. ES acts as a generator in adversarial competition with a neural network discriminator that distinguishes simulated state transitions from real ones. The discriminator serves as the fitness function, guiding the evolution of the physical parameter distributions.
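The evolutionary half of this loop can be sketched as an evolution strategy over a diagonal Gaussian distribution of physical parameters. For brevity this sketch uses a simple elite-selection (cross-entropy-method-style) ES; the ES variant used by EASI may differ. `es_step`, `toy_fitness`, and `TRUE_PARAMS` are hypothetical names, and `toy_fitness` only stands in for the discriminator score: it rewards candidates close to a hidden parameter vector.

```python
import numpy as np

def es_step(mu, sigma, fitness_fn, rng, pop_size=64, elite_frac=0.25, min_sigma=1e-3):
    """One generation of an elite-selection Gaussian ES (CEM-style).

    mu, sigma: mean and std of a diagonal Gaussian over physical parameters.
    fitness_fn: maps a (pop_size, dim) array of candidates to (pop_size,) scores,
        higher is better (in EASI this role is played by the discriminator).
    """
    # Sample a population of candidate parameter vectors.
    pop = mu + sigma * rng.standard_normal((pop_size, mu.size))
    scores = fitness_fn(pop)
    n_elite = max(1, int(pop_size * elite_frac))
    elite = pop[np.argsort(scores)[-n_elite:]]  # highest-scoring candidates
    # Refit the Gaussian to the elites; floor sigma to avoid full collapse.
    return elite.mean(axis=0), np.maximum(elite.std(axis=0), min_sigma)

# Hypothetical stand-in for the discriminator score.
TRUE_PARAMS = np.array([2.0, 0.5])

def toy_fitness(pop):
    return -np.sum((pop - TRUE_PARAMS) ** 2, axis=1)

rng = np.random.default_rng(0)
mu, sigma = np.array([1.5, 0.8]), np.array([0.5, 0.5])
for _ in range(50):
    mu, sigma = es_step(mu, sigma, toy_fitness, rng)
print(mu, sigma)  # mu approaches TRUE_PARAMS while sigma shrinks
```

In the setting described above, `fitness_fn` would instead roll out the simulator with each candidate parameter vector and score the resulting $(\mathbf{s}, \mathbf{a}, \mathbf{s}')$ transitions with the discriminator, which is itself retrained as the distribution evolves, so the target parameters are never observed directly.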

Sim-to-real policy transfer

After parameter search with EASI, the simulation becomes more realistic. In this experiment, we use the same initial policy and speed command (v = 1.4 m/s) to control the robot in both the simulated and the real environment, and plot the frequency spectrum of the robot's joint movements. When trained in the EASI-optimized simulator, the robot follows speed commands more accurately.
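A joint-movement frequency spectrum like the one described here can be computed with a real-valued FFT of each joint-position trace. This is a minimal sketch: the 50 Hz sampling rate, the function name `amplitude_spectrum`, and the synthetic sine trace are assumptions for illustration, not data or settings from the experiment.

```python
import numpy as np

def amplitude_spectrum(joint_pos, control_hz):
    """One-sided amplitude spectrum of a joint-position trace.

    joint_pos: 1-D array sampled at control_hz (sampling rate is an assumption).
    Returns (frequencies in Hz, amplitudes in the trace's units).
    """
    x = np.asarray(joint_pos, dtype=float)
    x = x - x.mean()  # remove the DC offset (nominal joint angle)
    amps = np.abs(np.fft.rfft(x)) * 2.0 / x.size
    freqs = np.fft.rfftfreq(x.size, d=1.0 / control_hz)
    return freqs, amps

# Synthetic example (not robot data): a 2 Hz oscillation sampled at 50 Hz for 10 s.
t = np.arange(500) / 50.0
trace = 0.8 + 0.3 * np.sin(2 * np.pi * 2.0 * t)
freqs, amps = amplitude_spectrum(trace, control_hz=50)
print(freqs[np.argmax(amps)])  # 2.0
```

Comparing such spectra for the same joint in simulation and on the real robot gives a compact view of whether the dominant gait frequencies and their amplitudes match.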

Introduction to EASI's sim-to-real experiments on the Unitree Go2 quadruped robot.

Sim-to-sim policy transfer

Evolution of the parameters. WD denotes a target parameter within the initial parameter distribution; OOD denotes a target parameter outside the initial parameter distribution.

Performance of the policy in the pseudo-real environment over the course of training. The x-axis shows the number of policy optimization steps during training in simulation; the y-axis shows the policy's performance when tested in the pseudo-real environment.