Robust Robot Walker:
Learning Agile Locomotion over Tiny Traps

In Submission

Abstract

Quadruped robots must exhibit robust walking capabilities in practical applications. In this work, we propose a novel approach that enables quadruped robots to pass various small obstacles, or "tiny traps". Existing methods often rely on exteroceptive sensors, which can be unreliable for detecting such tiny traps. To overcome this limitation, our approach focuses solely on proprioceptive inputs. We introduce a two-stage training framework incorporating a contact encoder and a classification head to learn implicit representations of different traps. Additionally, we design a set of tailored reward functions to improve both the stability of training and the ease of deployment for goal-tracking tasks. To benefit further research, we design a new benchmark for tiny trap crossing. Extensive experiments in both simulation and real-world settings validate the effectiveness and robustness of our method.

Method

Proprioception-Based Two-Stage Training Framework

We introduce a two-stage training framework that relies solely on proprioception, enabling a robust policy that successfully passes tiny traps in both simulation and real-world environments. In the first stage, the robot has access to all privileged information, and we adopt explicit-implicit dual-state learning. We also introduce a classification head that guides the policy to learn the connection between the contact force distribution and the trap category. In the second stage, only the goal command and proprioception can be observed. We initialize the weights of the estimator and the low-level RNN from the first stage, and use Probability Annealing Selection to gradually adapt the policy to inaccurate estimates while limiting degradation of the Oracle policy's performance.
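The Probability Annealing Selection step can be sketched in a few lines. The linear schedule and the `p_min` floor below are assumptions for illustration; the page does not specify the actual annealing curve:

```python
import random

def pas_select(oracle_state, estimated_state, step, total_steps, p_min=0.0):
    """Probability Annealing Selection (sketch, assumed linear schedule).

    Early in stage two the policy mostly sees the oracle dual-state; the
    probability of using it anneals toward p_min, so the policy gradually
    adapts to the estimator's (possibly inaccurate) output.
    """
    p_oracle = max(p_min, 1.0 - step / total_steps)  # assumed linear anneal
    return oracle_state if random.random() < p_oracle else estimated_state
```

At `step == 0` the oracle state is always selected, and once the schedule reaches its floor (here zero) only the estimated state is used.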

[Figure: pipeline]

Explicit-Implicit Dual-State Estimation Paradigm

In the first stage of training, the contact force is encoded by a contact encoder into an implicit latent, which is concatenated with the explicit privileged state to form the dual-state. In the second stage, the dual-state is predicted by the estimator. Without the explicit-implicit dual-state estimation paradigm, the robot easily misjudges its current state, leading to an ineliminable sim-to-real gap. We employ t-SNE visualization to illustrate the noise-robust latent features learned through this paradigm.
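A minimal sketch of how the dual-state might be assembled, assuming hypothetical dimensions and a stand-in linear contact encoder (the actual encoder is a trained network whose architecture is not given here):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions; the page does not specify them.
N_FEET, FORCE_DIM, LATENT_DIM, PRIV_DIM = 4, 3, 8, 16

# Stand-in contact encoder: a single random linear map from per-foot
# contact forces to an implicit latent (a trained network in practice).
W = rng.standard_normal((N_FEET * FORCE_DIM, LATENT_DIM))

def dual_state(contact_forces, privileged_state):
    """Stage-one dual-state: implicit contact latent + explicit privileged state."""
    latent = np.tanh(contact_forces.reshape(-1) @ W)   # implicit part
    return np.concatenate([latent, privileged_state])  # explicit-implicit dual-state

forces = rng.standard_normal((N_FEET, FORCE_DIM))
priv = rng.standard_normal(PRIV_DIM)
print(dual_state(forces, priv).shape)  # (24,) = LATENT_DIM + PRIV_DIM
```

In stage two, an estimator is trained to predict this concatenated vector from proprioceptive history alone.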

[Figure: boolean] Without the explicit-implicit dual-state estimation paradigm, the robot easily misjudges its current state, leading to an ineliminable sim-to-real gap.
[Figure: t-sne] Our method, with the classification head and contact forces, yields a more separable and continuous encoding, meaning our policy can identify and react to traps more effectively.

Goal Tracking

We redefine the task as goal tracking rather than velocity tracking, and incorporate carefully designed dense reward functions and fake goal commands. This approach achieves approximately omnidirectional movement without motion capture or additional localization techniques in the real world, significantly improving training stability and adaptability across environments.
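The fake-goal idea can be illustrated with a small helper. The goal format (a relative 2D position) and the fixed lookahead distance are assumptions for illustration, not the paper's exact command layout:

```python
import math

def fake_goal(direction_rad, distance=1.0):
    """Place a 'fake' goal at a fixed distance along the desired heading
    (sketch). Tracking this relative goal makes the robot move in
    `direction_rad` without any real localization of the goal point.
    """
    return (distance * math.cos(direction_rad),
            distance * math.sin(direction_rad))

print(fake_goal(0.0))  # (1.0, 0.0): a goal straight ahead
```

Steering left, right, or sideways then amounts to rotating `direction_rad`, which is how a goal-tracking policy can emulate velocity commands at deployment time.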

[Figure: vel_test] Theoretically, we can select any yaw velocity magnitude and linear velocity direction in real deployment by setting appropriate fake goal commands.

Real-World Experiment

Bar

The robot learns to step its front legs back after contacting the bar. Even when the hind legs are intentionally trapped, the robot can still detect the collision and lift its hind legs across the bar. In addition, unlike in simulation, the real bar is somewhat elastic, which further demonstrates the generalization ability of our policy.

[Figure: final-bar]

Pit

The robot learns to support its body with the other three legs when one leg steps into a void, lifting the dangling leg out of the pit. Additionally, the robot learns to forcefully kick its legs to climb out when multiple legs are stuck in the pit.

[Figure: final-pit]

Pole

The robot learns to sidestep to the left or right after colliding with a pole, clearing it by a certain distance before moving forward.

[Figure: final-pole]

New Benchmark

We design a new Tiny Trap Benchmark in simulation. The benchmark consists of a 5 m × 60 m runway with three types of traps evenly distributed along the path. The traps include 10 bars with heights ranging from 0.05 m to 0.2 m, 50 randomly placed poles, and 10 pits with widths ranging from 0.05 m to 0.2 m. For each experiment, 1,000 robots are deployed, starting from the left side of the runway and passing through all the traps to reach the right side. We refer to this as the "Mix" benchmark. Additionally, there are separate "Bar," "Pit," and "Pole" benchmarks, each focusing on one specific type of trap but with triple the number of traps.
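The Mix benchmark layout described above can be sketched as follows; the trap counts and size ranges come from the text, while the exact placement rules (spacing along the runway, lateral positions) are assumptions:

```python
import random

def make_mix_benchmark(seed=0):
    """Sketch of the Mix benchmark: a 5 m x 60 m runway with 10 bars,
    10 pits (sizes 0.05-0.2 m), and 50 randomly placed poles.
    Each trap is (kind, position, size-or-lateral-offset).
    """
    rng = random.Random(seed)
    traps = []
    for i in range(10):   # bars: height 0.05-0.2 m, spread along x (assumed spacing)
        traps.append(("bar", 2.0 + i * 6.0, rng.uniform(0.05, 0.2)))
    for i in range(10):   # pits: width 0.05-0.2 m (assumed spacing)
        traps.append(("pit", 5.0 + i * 6.0, rng.uniform(0.05, 0.2)))
    for _ in range(50):   # poles: random (x, y) on the 60 m x 5 m runway
        traps.append(("pole", rng.uniform(0.0, 60.0), rng.uniform(0.0, 5.0)))
    return traps

print(len(make_mix_benchmark()))  # 70 traps total
```

The single-trap "Bar," "Pit," and "Pole" benchmarks would follow the same pattern with triple the count of one trap type.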

[Figure: benchmark]

BibTeX

@article{zhu2024robust,
  title={Robust Robot Walker: Learning Agile Locomotion over Tiny Traps},
  author={Zhu, Shaoting and Huang, Runhan and Mou, Linzhan and Zhao, Hang},
  journal={arXiv preprint arXiv:2409.07409},
  year={2024}
}