Learning to Design Soft Hands using Reward Models

Xueqian Bai Nicklas Hansen Adabhav Singh Michael T. Tolley Yan Duan Pieter Abbeel Xiaolong Wang Sha Yi

UC San Diego, Amazon FAR (Frontier AI & Robotics)

Abstract

Soft robotic hands promise compliant, safe interaction with objects and environments. However, designing soft hands that are both compliant and functional across diverse use cases remains challenging. Although co-designing hardware and control better couples morphology to behavior, the resulting search space is high-dimensional, and even simulation-based evaluation is computationally expensive. In this paper, we propose a Cross-Entropy Method with Reward Model (CEM-RM) framework that efficiently optimizes tendon-driven soft robotic hands under a teleoperation control policy, reducing design evaluations by more than half compared to pure optimization while learning a distribution of optimized hand designs from pre-collected teleoperation data. We derive a design space for a soft robotic hand composed of flexural soft fingers and implement parallelized training in simulation. The optimized hands are then 3D-printed and deployed in the real world using both teleoperation data and real-time teleoperation. Experiments in both simulation and on hardware demonstrate that our optimized design significantly outperforms baseline hands in grasping success rate across a diverse set of challenging objects.

Soft Robot Hand Design Space

We optimize over a richer design space than in our previous work. Left: the optimized parameters include segment and flexure lengths, tendon waypoint distributions, and segment thicknesses. Right: a three-finger soft hand (base at the top, fingers pointing downward), where finger orientations and mounting positions are also treated as design parameters.
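To make the parameters above concrete, here is a minimal sketch of how such a design space might be encoded as a flat vector for optimization. The field names, dimensions, and units are illustrative assumptions, not the paper's exact parameterization.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class FingerDesign:
    # Illustrative design-space encoding (names/shapes are assumptions).
    segment_lengths: np.ndarray    # (n_segments,) rigid segment lengths [m]
    flexure_lengths: np.ndarray    # (n_segments - 1,) soft flexure lengths [m]
    segment_thickness: np.ndarray  # (n_segments,) segment thicknesses [m]
    tendon_waypoints: np.ndarray   # (n_waypoints,) tendon routing heights along the finger
    mount_position: np.ndarray     # (2,) mounting offset on the palm [m]
    orientation: float             # finger yaw about its mount [rad]

def to_vector(f: FingerDesign) -> np.ndarray:
    """Flatten one finger's design into a vector for CEM-style optimization."""
    return np.concatenate([
        f.segment_lengths, f.flexure_lengths, f.segment_thickness,
        f.tendon_waypoints, f.mount_position, [f.orientation],
    ])
```

A full hand design would concatenate one such vector per finger; the optimizer then treats the result as a single search point.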

Cross-Entropy Method with Reward Models

System overview. We first collect multiple teleoperation control datasets for each object; these are randomly sampled during optimization. The design action distribution is optimized in the CEM loop, with evaluations from both simulation and a co-trained reward model, and ultimately converges to the optimal soft hand design.
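A minimal sketch of the hybrid loop this overview describes: each CEM iteration scores a subset of sampled designs in simulation and the remainder with a reward model fit on all simulated evaluations so far. The function names, the fixed split strategy, and all hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def cem_rm(mu, sigma, simulate, reward_model, iters=20,
           pop=64, elite_frac=0.1, sim_frac=0.25):
    """Hybrid CEM-RM sketch. `simulate` maps a batch of design vectors to
    rewards; `reward_model` is any regressor with fit/predict."""
    sim_X, sim_y = [], []
    for _ in range(iters):
        designs = np.random.randn(pop, mu.size) * sigma + mu
        n_sim = max(1, int(sim_frac * pop))
        rewards = np.empty(pop)
        # Expensive ground-truth evaluation for a subset of designs.
        rewards[:n_sim] = simulate(designs[:n_sim])
        sim_X.append(designs[:n_sim]); sim_y.append(rewards[:n_sim])
        # Co-train the reward model on all simulated evaluations so far.
        reward_model.fit(np.concatenate(sim_X), np.concatenate(sim_y))
        # Cheap surrogate evaluation for the remaining designs.
        rewards[n_sim:] = reward_model.predict(designs[n_sim:])
        # Standard CEM update on the elite set.
        elite = designs[np.argsort(rewards)[-int(elite_frac * pop):]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu, sigma
```

Any regressor exposing `fit`/`predict` (e.g., a small MLP or gradient-boosted trees) could serve as the reward model in this sketch; the `sim_frac` split is what reduces the number of expensive simulation evaluations.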

Teleoperation Data Collection

Human hand poses captured by the Meta Quest 3 are converted into soft-hand motion control commands for real-time teleoperation. The grasping pose, prismatic joint displacement, and tendon motion are collected and augmented in simulation.
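A rough sketch of one plausible retargeting step, mapping a tracked pinch aperture from the headset's hand tracking to a tendon displacement command. The linear mapping, limits, and names below are assumptions for illustration, not the paper's actual retargeting.

```python
import numpy as np

def retarget(thumb_tip, finger_tip, open_dist=0.12, closed_dist=0.02,
             max_tendon=0.04):
    """Map a tracked human pinch aperture [m] to a tendon pull command [m].
    All constants here are illustrative assumptions."""
    aperture = np.linalg.norm(np.asarray(thumb_tip) - np.asarray(finger_tip))
    closure = np.clip((open_dist - aperture) / (open_dist - closed_dist),
                      0.0, 1.0)
    return closure * max_tendon  # fully open -> 0, fully pinched -> max pull
```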

Autonomous Deployment

* All videos show real-time execution.

Side-by-side videos compare the Uniform Design (left) with the ✨Optimized Design✨ (right) grasping the following objects:

* Spoon (31g)
* Bunny (108g)
* Tomato sauce can (362g)
* Clamp (37g)
* Bowl (98g)
* Tomato soup can (346g)

Hardware Setup

Optimized soft robot hand. We built our final design with optimized fingers, 3D-printed finger holders, an xArm mount, geared racks, and four servo motors.

Interesting findings from our soft hand design. The optimized fingertips are thicker, providing better frictional contact and power grasps. Lower tendon routing at the fingertip reduces curling, enabling more effective precision grasps of small objects.

Simulation Results for Optimization Methods

Elite reward and loss versus the number of environment interactions, comparing CEM, CEM with a reward model, and random sampling. Our hybrid CEM-RM converges faster, using significantly fewer environment interactions while achieving final performance comparable to pure CEM.

Simulation comparison of the uniform design, the best random-sampling design, and the optimized design in NVIDIA Warp.
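The parallelized evaluation in NVIDIA Warp can be pictured with a toy kernel: one GPU thread scores one simulated environment. The reward metric, array names, and sizes below are illustrative assumptions, not the paper's actual simulation.

```python
import numpy as np
import warp as wp

wp.init()

@wp.kernel
def grasp_reward(heights: wp.array(dtype=float),
                 target: float,
                 rewards: wp.array(dtype=float)):
    # One thread per parallel environment: reward is higher the closer
    # the object is lifted to the target height (a stand-in metric).
    tid = wp.tid()
    rewards[tid] = -wp.abs(heights[tid] - target)

n_envs = 1024
heights = wp.array(np.random.rand(n_envs).astype(np.float32), dtype=float)
rewards = wp.zeros(n_envs, dtype=float)
wp.launch(grasp_reward, dim=n_envs, inputs=[heights, 0.2, rewards])
print(rewards.numpy()[:5])
```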

BibTeX
@misc{bai2025learningdesignsofthands,
      title={Learning to Design Soft Hands using Reward Models},
      author={Xueqian Bai and Nicklas Hansen and Adabhav Singh and Michael T. Tolley and Yan Duan and Pieter Abbeel and Xiaolong Wang and Sha Yi},
      year={2025},
}