Model Card for ProSoRo-MVAE

Project Page | Paper | GitHub

Xudong Han¹, Ning Guo², Ronghan Xu¹, Fang Wan¹, Chaoyang Song¹
¹ Southern University of Science and Technology, ² Shanghai Jiao Tong University

Table of Contents

  • Model Description
  • Intended Use
  • Training Data
  • Citation

Model Description

Proprioceptive Soft Robot (ProSoRo) is a soft robotic system that uses a miniature vision sensor to track an internal marker within the robot's deformable structure. By monitoring the motion of this single anchor point relative to a fixed boundary, we capture critical information about the robot's overall deformation state while significantly reducing sensing complexity. To harness the full potential of this anchor-based approach, we developed a multi-modal proprioception learning framework built around a multi-modal variational autoencoder (MVAE) that aligns a ProSoRo's motion, force, and shape into a unified representation conditioned on the anchored observation. The framework involves three stages:

ProSoRo Framework

  • Material identification: Recognizing the impracticality of collecting extensive physical datasets for soft robots, we leverage finite element analysis (FEA) simulations to generate high-quality training data. We begin by measuring the material's stress-strain curve with a standard uniaxial tension test to obtain the best-fitting material model. We then apply an evolution strategy to optimize the material parameters by comparing the force computed by FEA against the ground truth measured in a physical experiment under the same anchor-point motion (a minimal sketch of this loop follows the list). More details can be found in EVOMIA.
  • Latent proprioceptive learning: The simulation dataset was generated with the optimized material parameters and provides motion in $[D_x, D_y, D_z, R_x, R_y, R_z]^\mathrm{T}$, force in $[F_x, F_y, F_z, T_x, T_y, T_z]^\mathrm{T}$, and shape as node displacements $[n_x, n_y, n_z]_{3n}^\mathrm{T}$ as training inputs. To learn these modalities for explicit proprioception, we developed a multi-modal variational autoencoder (MVAE) that encodes the ProSoRo's proprioception as latent codes. Three modal latent codes are produced by dedicated motion, force, and shape encoders, and a shared code fuses information from all three modalities by minimizing the discrepancies among the three codes (see the architecture sketch after this list). The shared code thus provides explicit proprioception in the latent space, which we call latent proprioception, and can be decoded back into any of the three modalities by the corresponding decoders.
  • Cross-modal inference: In real-world deployments, a modality such as shape can be estimated from the latent proprioception instead of being measured directly, which is usually infeasible in real-time robotic interactions. At this stage, we visually capture the ProSoRo's anchor point as the MVAE's input and estimate the force and shape modalities from the latent knowledge learned on simulation data. We found our latent proprioception framework to be a versatile solution for soft robotic interaction.
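The snippet below is a minimal sketch of the material-identification loop described in the first stage, not the EVOMIA implementation itself: `run_fea` is a hypothetical stand-in for the FEA call, and the population size, elite count, and step-size schedule are illustrative placeholders.

# Minimal (mu, lambda)-style evolution-strategy sketch for material identification.
# `run_fea` is a hypothetical placeholder; it should return the simulated 6D
# anchor-point force for the given material parameters under the same prescribed
# anchor motion as the physical experiment.
import numpy as np

def run_fea(params: np.ndarray, motion: np.ndarray) -> np.ndarray:
    raise NotImplementedError("replace with a call to the FEA solver")

def fitness(params, motions, measured_forces):
    # Sum of squared force errors between simulation and the physical ground truth.
    sim = np.stack([run_fea(params, m) for m in motions])
    return float(np.sum((sim - measured_forces) ** 2))

def evolve(init_params, motions, measured_forces, pop=16, elite=4, sigma=0.1, gens=50):
    mean = np.asarray(init_params, dtype=float)
    for _ in range(gens):
        candidates = mean + sigma * np.random.randn(pop, mean.size)
        scores = [fitness(p, motions, measured_forces) for p in candidates]
        best = candidates[np.argsort(scores)[:elite]]
        mean = best.mean(axis=0)   # move the search mean toward the elites
        sigma *= 0.95              # slowly anneal the step size
    return mean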
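To make the second and third stages concrete, the following is an illustrative PyTorch sketch of the three-encoder MVAE idea, under stated assumptions: the layer sizes, latent dimension, fusion rule, and loss terms are placeholders chosen for readability, not the released architecture.

# Illustrative sketch of a three-encoder MVAE with a shared latent code.
import torch
import torch.nn as nn

LATENT = 16  # placeholder latent dimension

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

class ProSoRoMVAESketch(nn.Module):
    def __init__(self, shape_dim):
        super().__init__()
        # One encoder/decoder pair per modality: motion (6D), force (6D), shape (3n).
        self.enc = nn.ModuleDict({
            "motion": mlp(6, 2 * LATENT),
            "force": mlp(6, 2 * LATENT),
            "shape": mlp(shape_dim, 2 * LATENT),
        })
        self.dec = nn.ModuleDict({
            "motion": mlp(LATENT, 6),
            "force": mlp(LATENT, 6),
            "shape": mlp(LATENT, shape_dim),
        })

    def encode(self, name, x):
        mu, logvar = self.enc[name](x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return z, mu

    def forward(self, motion, force, shape):
        codes = {k: self.encode(k, v)[1]
                 for k, v in {"motion": motion, "force": force, "shape": shape}.items()}
        # Shared code: fuse the three modal codes (here simply their mean).
        shared = torch.stack(list(codes.values())).mean(dim=0)
        # Reconstruct every modality from the shared code.
        recon = {k: self.dec[k](shared) for k in self.dec}
        # Alignment term pulls each modal code toward the shared code.
        align = sum(((c - shared) ** 2).mean() for c in codes.values())
        return recon, shared, align

At deployment, only the motion encoder is exercised: the code inferred from the tracked anchor point is passed to the force and shape decoders, which is the cross-modal path described in the third stage.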

Within the latent code, we identify key morphing primitives that correspond to fundamental deformation modes. By systematically varying these latent components, we can generate a spectrum of deformation behaviors, offering a novel perspective on soft robotic systems' intrinsic dimensionality and controllability. This understanding enhances the interpretability of the latent code and facilitates the development of more sophisticated control strategies and advanced human-robot interfaces.

Latent Morphing Primitives
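As a concrete illustration of these morphing primitives, the sketch below varies one latent component at a time and decodes the resulting shapes; it assumes a trained model with a shape decoder as in the hypothetical sketch above, and the traversal range and latent size are placeholders.

# Sketch of a latent traversal: perturb a single latent component and decode the
# corresponding shape to visualize one morphing primitive.
import torch

def traverse(model, dim, values=(-2.0, -1.0, 0.0, 1.0, 2.0), latent_size=16):
    shapes = []
    with torch.no_grad():
        for v in values:
            z = torch.zeros(1, latent_size)
            z[0, dim] = v                         # vary one latent component
            shapes.append(model.dec["shape"](z))  # decoded node displacements
    return torch.cat(shapes)                      # one deformed shape per value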

Intended Use

This model is intended for researchers and practitioners in soft robotics who are interested in developing proprioceptive capabilities for soft robotic systems. See the project page for more details.

To load the model:

# Example code to load the safetensors weights
import torch
from transformers import AutoModel

model = AutoModel.from_pretrained("asRobotics/prosoro-mvae", prosoro_type="cylinder")
x = torch.zeros((1, 6))  # Example input: batch size of 1, 6D motion
output = model(x)

Or to load the ONNX version:

# Example code to load onnx
import onnxruntime as ort
import numpy as np
from huggingface_hub import hf_hub_download

onnx_model_path = hf_hub_download(repo_id="asRobotics/prosoro-mvae", filename="cylinder/model.onnx")
ort_session = ort.InferenceSession(onnx_model_path)
x = np.zeros((1, 6)).astype(np.float32)  # Example input: batch size of 1, 6D motion
outputs = ort_session.run(None, {"motion": x})

Training Data

The model was trained on the ProSoRo-100K dataset, which contains 100,000 samples of simulated data for various shapes of ProSoRos.

Citation

If you use this model in your research, please cite the following paper:

@article{han2025anchoring,
    title={Anchoring Morphological Representations Unlocks Latent Proprioception in Soft Robots},
    author={Han, Xudong and Guo, Ning and Xu, Ronghan and Wan, Fang and Song, Chaoyang},
    journal={Advanced Intelligent Systems},
    volume={0},
    pages={0-0},
    year={2025}
}
