# 🦾 Diffusion Policy for Aloha Insertion (200k Steps)
## 🎯 Research Purpose
Important Note: This model was trained primarily for academic comparison: evaluating the performance difference between Diffusion Policy and ACT under identical training conditions (using the lerobot/aloha_sim_insertion_human dataset). It is a benchmark experiment designed to analyze how different algorithms learn complex 3D manipulation tasks under limited computational resources (batch size = 8), not an attempt to train a highly successful practical model.
Summary: This model represents a benchmark experiment for Diffusion Policy on the challenging Aloha Insertion task (Simulated). It was trained using the LeRobot framework to evaluate the algorithm's performance on complex, high-dimensional 3D manipulation tasks compared to baseline methods.
- 🧩 Task: Aloha Insertion (Simulated, 3D)
- 🧠 Algorithm: Diffusion Policy (DDPM)
- 📈 Training Steps: 200,000
- 🎓 Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)
## 🔬 Benchmark Results (vs ACT)
This experiment highlights the significant difficulty of the Aloha Insertion task for generative policies under limited compute (batch size = 8). While the ACT baseline achieved a 2% success rate (1/50), the Diffusion Policy produced stable trajectories and reached the grasping stage in some episodes, but struggled with the final insertion alignment.
### 📊 Evaluation Metrics (50 Episodes)
| Metric | Value | Comparison to ACT Baseline | Status |
|---|---|---|---|
| Success Rate | 0.0% | Slightly Lower (ACT: 2.0%) | 📉 |
| Avg Max Reward | 0.10 | Partial Success (Grasping achieved) | 🚧 |
| Avg Sum Reward | 8.20 | Stable Trajectories | ✅ |
Note: The Aloha Insertion task involves high-dimensional inputs (3 cameras) and precise 3D spatial reasoning. The results suggest that under this low batch-size constraint, ACT's deterministic policy may converge faster than Diffusion Policy, which likely requires longer training or larger batches for this specific domain.
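For reference, the aggregates above can be reproduced from per-episode reward traces roughly as follows. This is a minimal sketch, not the LeRobot evaluation code; the dummy traces and the assumption that gym-aloha's staged insertion reward returns 4 on full success are illustrative only.

```python
# Minimal sketch of how the three aggregates relate to per-episode reward traces.
# The traces below are dummies; the real evaluation uses 50 episodes.
import numpy as np

episode_rewards = [
    np.array([0.0, 1.0, 2.0, 2.0]),  # an episode that reaches the grasping stage
    np.array([0.0, 0.0, 0.0, 0.0]),  # an episode with no progress
]
SUCCESS_REWARD = 4  # assumption: gym-aloha's staged reward returns 4 on full insertion

success_rate   = np.mean([r.max() >= SUCCESS_REWARD for r in episode_rewards])
avg_max_reward = np.mean([r.max() for r in episode_rewards])
avg_sum_reward = np.mean([r.sum() for r in episode_rewards])
print(success_rate, avg_max_reward, avg_sum_reward)  # 0.0 1.0 2.5 for these dummies
```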
## ⚙️ Model Details
| Parameter | Description |
|---|---|
| Architecture | ResNet18 (Vision Backbone) + U-Net (Diffusion Head) |
| Input | 3 Camera Views (Top, Left, Right) |
| Prediction Horizon | 16 steps |
| Observation History | 2 steps |
| Action Steps | 8 steps |
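Concretely, these settings mean the policy conditions on the last 2 observation steps, denoises a 16-step action sequence, and executes only the first 8 actions before re-planning. The shape-level sketch below illustrates this; the dummy tensors, the 14-dimensional Aloha action space, and the 480x640 camera resolution are assumptions for illustration, not the LeRobot API.

```python
# Shape-level illustration of n_obs_steps=2, horizon=16, n_action_steps=8.
# Dummy tensors only; this is not how LeRobot structures its batches.
import torch

n_obs_steps, horizon, n_action_steps = 2, 16, 8
action_dim = 14           # assumed bimanual Aloha action space: 2 arms x (6 joints + 1 gripper)
img_h, img_w = 480, 640   # assumed raw render size, later cropped to 420x560

# Policy input: the last 2 observation steps from each of the 3 cameras plus robot state.
images = torch.zeros(1, n_obs_steps, 3, 3, img_h, img_w)  # (batch, obs steps, cameras, C, H, W)
state = torch.zeros(1, n_obs_steps, action_dim)

# Diffusion head output: a denoised 16-step action sequence ...
action_chunk = torch.zeros(1, horizon, action_dim)
# ... of which only the first 8 steps are executed before the policy re-plans.
executed = action_chunk[:, :n_action_steps]
print(executed.shape)  # torch.Size([1, 8, 14])
```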
## 🔧 Training Configuration
For reproducibility, here are the key parameters used during the training session.
- Source: Configuration adapted from CSCSX/LeRobotTutorial-CN.
- Batch Size: 8 (Limited by 8GB VRAM)
- Optimizer: AdamW (lr=1e-4)
- Scheduler: Cosine with warmup (see the sketch after this list)
- Vision: ResNet18 with GroupNorm (Cropped to 420x560)
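The optimizer and learning-rate schedule can be reproduced outside LeRobot roughly as follows. This is a minimal PyTorch sketch using `get_cosine_schedule_with_warmup` from the diffusers library as a stand-in for LeRobot's internal scheduler, which may differ in details; the placeholder model is not the actual policy network.

```python
# Sketch of the AdamW + cosine-with-warmup setup listed above.
# The Linear module is a placeholder for the diffusion policy network.
import torch
from diffusers.optimization import get_cosine_schedule_with_warmup

model = torch.nn.Linear(10, 10)  # placeholder

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=1e-4,            # optimizer_lr
    weight_decay=1e-6,  # optimizer_weight_decay
)
lr_scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=500,        # scheduler_warmup_steps
    num_training_steps=200_000,  # steps
)
# In the training loop, call lr_scheduler.step() after each optimizer.step().
```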
### Original Training Command (Resume Mode)
```bash
python -m lerobot.scripts.lerobot_train \
    --config_path diffusion_aloha.yaml \
    --env.type aloha \
    --env.task AlohaInsertion-v0 \
    --dataset.repo_id lerobot/aloha_sim_insertion_human \
    --wandb.enable true \
    --job_name DP_Aloha_Insertion \
    --policy.repo_id Lemon-03/DP_Aloha_Insertion_test
```

### diffusion_aloha.yaml
The full `diffusion_aloha.yaml` used for training:
```yaml
# @package _global_

# Random seed
seed: 100000
job_name: Diffusion-Aloha-Insertion

# Training parameters
steps: 200000      # The original tutorial config uses 200k steps (Aloha is difficult to train)
eval_freq: 20000   # Slightly increased frequency to monitor progress
save_freq: 20000
log_freq: 200
batch_size: 8      # ⚠️ Crucial: Aloha requires a small batch size, otherwise 8GB VRAM is insufficient

# Dataset
dataset:
  repo_id: lerobot/aloha_sim_insertion_human

# Evaluation settings
eval:
  n_episodes: 50
  batch_size: 8    # Keep consistent with training

# Environment settings
env:
  type: aloha
  task: AlohaInsertion-v0
  fps: 50

# Policy configuration
policy:
  type: diffusion

  # --- Vision processing ---
  vision_backbone: resnet18
  # Aloha images are rectangular, so task-specific crop dimensions are used here
  crop_shape: [420, 560]
  crop_is_random: true
  pretrained_backbone_weights: null   # The original config does not load pretrained weights
  use_group_norm: true
  spatial_softmax_num_keypoints: 32

  # --- Diffusion core architecture (U-Net) ---
  down_dims: [512, 1024, 2048]
  kernel_size: 5
  n_groups: 8
  diffusion_step_embed_dim: 128
  use_film_scale_modulation: true

  # --- Action prediction parameters ---
  n_action_steps: 8
  n_obs_steps: 2
  horizon: 16

  # --- Noise scheduler (DDPM) ---
  noise_scheduler_type: DDPM
  num_train_timesteps: 100
  num_inference_timesteps: 100
  beta_schedule: squaredcos_cap_v2
  beta_start: 0.0001
  beta_end: 0.02
  prediction_type: epsilon
  clip_sample: true
  clip_sample_range: 1.0

  # --- Optimizer ---
  optimizer_lr: 1e-4
  optimizer_weight_decay: 1e-6
  # grad_clip_norm: 10
  scheduler_name: cosine
  scheduler_warmup_steps: 500
  use_amp: true
```
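The noise-scheduler fields above correspond to the DDPM scheduler from the diffusers library, which LeRobot builds on. The sketch below shows the epsilon-prediction training objective with these exact hyperparameters; the action tensors are dummies and the conditioning U-Net is only indicated in a comment.

```python
# Sketch of what the DDPM settings above mean during training.
# Dummy action tensors; the conditioning U-Net is omitted.
import torch
from diffusers import DDPMScheduler

scheduler = DDPMScheduler(
    num_train_timesteps=100,
    beta_schedule="squaredcos_cap_v2",
    beta_start=0.0001,
    beta_end=0.02,
    prediction_type="epsilon",  # the network predicts the added noise, not the clean actions
    clip_sample=True,
    clip_sample_range=1.0,
)

# One training step on a dummy (batch, horizon, action_dim) action chunk.
actions = torch.randn(8, 16, 14)  # batch_size=8, horizon=16, 14-dim Aloha actions (assumed)
noise = torch.randn_like(actions)
timesteps = torch.randint(0, scheduler.config.num_train_timesteps, (actions.shape[0],))
noisy_actions = scheduler.add_noise(actions, noise, timesteps)

# The U-Net head would be trained to recover `noise` from `noisy_actions`,
# conditioned on vision/state features, e.g.:
#   loss = F.mse_loss(unet(noisy_actions, timesteps, cond), noise)
```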
## 🚀 Evaluation
To evaluate this model locally, run the following command:
```bash
python -m lerobot.scripts.lerobot_eval \
    --policy.type diffusion \
    --policy.pretrained_path Lemon-03/DP_Aloha_Insertion_test \
    --eval.n_episodes 50 \
    --eval.batch_size 8 \
    --env.type aloha \
    --env.task AlohaInsertion-v0
```
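The checkpoint can also be loaded directly in Python. The import path below is an assumption based on recent LeRobot releases (older releases exposed the class under `lerobot.common.policies.diffusion.modeling_diffusion`); adjust it to the installed version:

```python
# Hedged sketch: load the checkpoint in Python instead of through the CLI.
# The module path is an assumption and differs across LeRobot versions.
import torch
from lerobot.policies.diffusion.modeling_diffusion import DiffusionPolicy

policy = DiffusionPolicy.from_pretrained("Lemon-03/DP_Aloha_Insertion_test")
policy.eval()
policy.to("cuda" if torch.cuda.is_available() else "cpu")
```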