🦾 Diffusion Policy for Aloha Insertion (200k Steps)

🎯 Research Purpose

Important Note: This model was trained primarily for academic comparison: it evaluates the performance difference between Diffusion Policy and ACT under identical training conditions (using the lerobot/aloha_sim_insertion_human dataset). It is a benchmark experiment designed to analyze how well each algorithm learns a complex 3D manipulation task under limited computational resources (Batch Size=8). It is not intended to be a highly successful practical model.

Summary: This model represents a benchmark experiment for Diffusion Policy on the challenging Aloha Insertion task (Simulated). It was trained using the LeRobot framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.

  • 🧩 Task: Aloha Insertion (Simulated, 3D)
  • 🧠 Algorithm: Diffusion Policy (DDPM)
  • 🔄 Training Steps: 200,000
  • 🎓 Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)

🔬 Benchmark Results (vs ACT)

This experiment highlights how difficult the Aloha Insertion task is for generative policies under limited compute (Batch Size=8). The ACT baseline achieved a 2% success rate (1/50). The Diffusion Policy learned stable trajectories but did not complete the final insertion alignment in any of its 50 evaluation episodes.

📊 Evaluation Metrics (50 Episodes)

| Metric | Value | Comparison to ACT Baseline | Status |
| --- | --- | --- | --- |
| Success Rate | 0.0% | Slightly Lower (ACT: 2.0%) | 📉 |
| Avg Max Reward | 0.10 | Partial Success (Grasping achieved) | 🚧 |
| Avg Sum Reward | 8.20 | Stable Trajectories | ✅ |
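
As a rough illustration of how the three metrics above relate to raw rollouts, the following sketch aggregates per-step rewards over a set of episodes. The function name `summarize_episodes`, the toy reward lists, and the `success_reward=4` threshold are assumptions made for illustration; they are not taken from the evaluation script used for this model.

```python
from statistics import mean


def summarize_episodes(episode_rewards, success_reward=4):
    """Aggregate per-step rewards from several evaluation episodes.

    episode_rewards: list of lists, one inner list of per-step rewards per episode.
    success_reward: ASSUMPTION, the reward value treated as a completed insertion.
    """
    max_rewards = [max(rewards) for rewards in episode_rewards]
    sum_rewards = [sum(rewards) for rewards in episode_rewards]
    successes = [m >= success_reward for m in max_rewards]
    return {
        "success_rate_pct": 100.0 * sum(successes) / len(episode_rewards),
        "avg_max_reward": mean(max_rewards),
        "avg_sum_reward": mean(sum_rewards),
    }


# Hypothetical toy data: three short episodes, only the last one reaches the success reward.
toy_episodes = [[0, 0, 1, 1], [0, 0, 0, 0], [0, 1, 2, 4]]
summary = summarize_episodes(toy_episodes)
print(summary)
```

Read this way, an average max reward of 0.10 over 50 episodes means the per-episode maxima sum to 5, so only a handful of episodes earned any reward at all.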

Note: The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results indicate that under low batch-size constraints (Batch Size=8), ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
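
When comparing 0/50 against 1/50, it can help to look at the uncertainty that 50 episodes leave in each success rate. The sketch below computes a Wilson score interval for both counts. The 95% level (z=1.96) and the choice of the Wilson interval are assumptions made here for illustration, not part of the original evaluation.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    # Clamp to [0, 1] to guard against tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


dp_low, dp_high = wilson_interval(0, 50)    # Diffusion Policy: 0 successes in 50 episodes
act_low, act_high = wilson_interval(1, 50)  # ACT baseline: 1 success in 50 episodes
intervals_overlap = dp_high >= act_low and act_high >= dp_low

print(f"Diffusion Policy: [{dp_low:.3f}, {dp_high:.3f}]")
print(f"ACT:              [{act_low:.3f}, {act_high:.3f}]")
print(f"Intervals overlap: {intervals_overlap}")
```

If the two intervals overlap, the gap between the two success rates is within what 50 episodes can resolve, which is consistent with describing the Diffusion Policy result as only slightly lower.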


⚙️ Model Details

| Parameter | Description |
| --- | --- |
| Architecture | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| Input | 3 Camera Views (Top, Left, Right) |
| Prediction Horizon | 16 steps |
| Observation History | 2 steps |
| Action Steps | 8 steps |
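
The three step counts interact in a receding-horizon fashion: the policy predicts 16 actions, but only 8 of them are executed before it re-plans from fresh observations. The sketch below shows one common indexing convention, in which the executed chunk starts at the action aligned with the most recent observation. That convention and the helper name `executed_action_indices` are assumptions for illustration; the 50 fps figure is the `env.fps` value from the training config.

```python
def executed_action_indices(horizon=16, n_obs_steps=2, n_action_steps=8):
    """Return which indices of the predicted action sequence get executed.

    ASSUMPTION: the predicted sequence starts at the oldest observation, so the
    first executable action sits at index n_obs_steps - 1.
    """
    start = n_obs_steps - 1
    end = start + n_action_steps
    if end > horizon:
        raise ValueError("n_action_steps does not fit inside the prediction horizon")
    return list(range(start, end))


indices = executed_action_indices()
fps = 50                                   # env.fps in the training config
seconds_per_chunk = len(indices) / fps     # wall-clock time covered by one executed chunk

print(indices)
print(f"{seconds_per_chunk:.2f} s of motion per policy inference")
```

Under this convention, the remaining predicted steps are discarded and regenerated at the next inference, which trades extra compute for faster reaction to new observations.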

🔧 Training Configuration

For reproducibility, here are the key parameters used during the training session.

  • Source: Configuration adapted from CSCSX/LeRobotTutorial-CN.
  • Batch Size: 8 (Limited by 8GB VRAM)
  • Optimizer: AdamW (lr=1e-4)
  • Scheduler: Cosine with warmup
  • Vision: ResNet18 with GroupNorm (Cropped to 420x560)
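
To make the "Cosine with warmup" schedule concrete, here is a minimal sketch of the learning rate as a function of the training step. It assumes a linear warmup followed by a single half-cosine decay to zero, with the 500 warmup steps and 200,000 total steps from the YAML config. The helper `lr_at_step` is hypothetical and may differ in detail from the scheduler implementation used during training.

```python
import math


def lr_at_step(step, base_lr=1e-4, warmup_steps=500, total_steps=200_000):
    """Learning rate under linear warmup followed by half-cosine decay (assumed shape)."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 250, 500, 100_000, 200_000):
    print(f"step {step:>7}: lr = {lr_at_step(step):.2e}")
```

With these settings the warmup covers only the first 0.25% of training, so almost the entire run is spent on the cosine decay.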

Original Training Command (My Resume Mode)

python -m lerobot.scripts.lerobot_train \
  --config_path diffusion_aloha.yaml \
  --env.type aloha \
  --env.task AlohaInsertion-v0 \
  --dataset.repo_id lerobot/aloha_sim_insertion_human \
  --wandb.enable true \
  --job_name DP_Aloha_Insertion \
  --policy.repo_id Lemon-03/DP_Aloha_Insertion_test

diffusion_aloha.yaml

📄 Full diffusion_aloha.yaml used for training
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000            # Original file states 200k steps (Aloha is difficult to train)
eval_freq: 20000         # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8            # ⚠️ Crucial: Aloha requires a small batch size, otherwise 8GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8          # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, using specific crop dimensions here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null  # Original config specifies not to load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  #grad_clip_norm: 10
  
  scheduler_name: cosine
  scheduler_warmup_steps: 500

  use_amp: true
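
The noise scheduler settings in the config above select the `squaredcos_cap_v2` beta schedule with 100 training timesteps. The sketch below reproduces that schedule in plain Python, following the squared-cosine formulation with an offset of 0.008 and a cap of 0.999. The function name and the standalone form are illustrative assumptions, not the code path used in training. To my understanding, `beta_start` and `beta_end` describe the endpoints of linear-type schedules and do not enter this formula.

```python
import math


def squaredcos_cap_v2_betas(num_train_timesteps=100, max_beta=0.999):
    """Betas derived from a squared-cosine cumulative-alpha curve, capped at max_beta."""

    def alpha_bar(t):
        # Cumulative signal level at normalized time t in [0, 1].
        return math.cos((t + 0.008) / 1.008 * math.pi / 2) ** 2

    betas = []
    for i in range(num_train_timesteps):
        t1 = i / num_train_timesteps
        t2 = (i + 1) / num_train_timesteps
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return betas


betas = squaredcos_cap_v2_betas()

# Running product of (1 - beta): the fraction of signal variance left at each timestep.
alphas_cumprod = []
running = 1.0
for beta in betas:
    running *= 1 - beta
    alphas_cumprod.append(running)

print(f"first beta: {betas[0]:.5f}, last beta: {betas[-1]:.3f}")
print(f"signal left after final step: {alphas_cumprod[-1]:.2e}")
```

The cumulative product falls slowly at first and close to zero by the last timestep, so the final training timestep corresponds to nearly pure noise.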

🚀 Evaluate (My Evaluation Mode)

To evaluate this model locally, run the following command:

python -m lerobot.scripts.lerobot_eval \
  --policy.type diffusion \
  --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
  --eval.n_episodes 50 \
  --eval.batch_size 8 \
  --env.type aloha \
  --env.task AlohaInsertion-v0