A2C Agent for PandaReachDense-v3

This repository contains a trained Advantage Actor-Critic (A2C) agent for the PandaReachDense-v3 robotics environment from Panda-Gym.

The agent was trained using:

Stable-Baselines3
Gymnasium
Panda-Gym

Environment

The task is to control a Franka Emika Panda robotic arm so that its end effector reaches a target position in 3D space. A new target is sampled at random at the start of each episode.

Environment:

PandaReachDense-v3

Training Details

Algorithm:

A2C (Advantage Actor-Critic)
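A2C trains an actor (the policy) with an advantage estimate and a critic (the value function) with a return target. The sketch below shows one common way of computing both from a short rollout: bootstrapped n-step returns, with the advantage defined as the return minus the critic's value estimate. The helper `n_step_advantages`, the discount values and the toy numbers are illustrative assumptions, not the settings used to train this agent. As we understand it, Stable-Baselines3's A2C computes advantages with GAE, and its default `gae_lambda` of 1.0 reduces to this same bootstrapped return.

```python
from typing import List, Tuple


def n_step_advantages(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: bootstrapped n-step returns and advantages for one rollout.

    rewards[t] and values[t] belong to step t; dones[t] is True if the episode
    ended at step t. last_value is the critic's estimate for the state reached
    after the final step of the rollout.
    """
    returns = [0.0] * len(rewards)
    running = last_value
    # Walk backwards so each return reuses the one computed for the next step
    for t in reversed(range(len(rewards))):
        if dones[t]:
            running = 0.0  # do not bootstrap across an episode boundary
        running = rewards[t] + gamma * running
        returns[t] = running
    # Advantage: how much better the observed return was than the critic predicted
    advantages = [ret - v for ret, v in zip(returns, values)]
    return returns, advantages


returns, advantages = n_step_advantages(
    [1.0, 1.0], [0.5, 0.5], [False, False], last_value=0.0, gamma=0.5
)
print(returns, advantages)
```

The backward pass keeps the computation linear in the rollout length, and resetting the running return on `dones[t]` prevents value from one episode leaking into the previous one.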

Observation Space:

Continuous, provided as a dictionary with observation, achieved_goal and desired_goal entries
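In PandaReachDense-v3 the observation is a dictionary of vectors (observation, achieved_goal, desired_goal) rather than one flat vector, which is why Stable-Baselines3 needs its `MultiInputPolicy` for this environment. Conceptually, such a policy flattens each entry and concatenates them before the first network layer. The plain-Python sketch below illustrates that idea only; the helper name, the sorted key order and the example values and sizes are assumptions, not read from the environment or from SB3's internals.

```python
from typing import Dict, List


def flatten_goal_observation(obs: Dict[str, List[float]]) -> List[float]:
    """Hypothetical helper: concatenate dictionary entries into one flat vector.

    Keys are visited in sorted order so the layout is identical on every call.
    """
    flat: List[float] = []
    for key in sorted(obs):
        flat.extend(float(x) for x in obs[key])
    return flat


# Illustrative values only; the entry sizes here are assumptions
example = {
    "observation": [0.1, 0.2, 0.3, 0.0, 0.0, 0.0],
    "achieved_goal": [0.1, 0.2, 0.3],
    "desired_goal": [0.05, -0.1, 0.2],
}
print(flatten_goal_observation(example))
```

A fixed key order matters: the network's input weights are tied to positions in the vector, so the same entry must always land in the same slots.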

Action Space:

Continuous: a 3-dimensional end-effector displacement command, with each component in [-1, 1]
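For continuous actions, an A2C actor usually outputs the mean and standard deviation of a diagonal Gaussian. A stochastic action is a sample from that distribution, while a deterministic action (what `deterministic=True` requests in the usage example) is its mean; either way the result is clipped to the action bounds. The stdlib sketch below assumes that setup; `select_action`, the [-1, 1] bounds and the numbers are illustrative.

```python
import random
from typing import List, Optional


def select_action(
    mean: List[float],
    std: List[float],
    deterministic: bool,
    low: float = -1.0,
    high: float = 1.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Hypothetical helper: pick an action from a diagonal Gaussian policy.

    deterministic=True returns the mean; otherwise each dimension is sampled
    independently. The result is clipped to [low, high] in both cases.
    """
    rng = rng or random.Random()
    if deterministic:
        raw = list(mean)
    else:
        raw = [rng.gauss(m, s) for m, s in zip(mean, std)]
    return [min(max(a, low), high) for a in raw]


# A 3-dimensional action, matching the end-effector displacement command
print(select_action([0.2, -1.5, 0.7], [0.1, 0.1, 0.1], deterministic=True))
```

Using the mean at evaluation time removes sampling noise, which usually gives smoother arm motion than sampled actions.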

Reward Type:

Dense: at every step the agent receives the negative Euclidean distance between the end effector and the target
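The dense reward, the negative distance between the end effector and the target, can be sketched as follows. The 0.05 m success threshold mirrors panda-gym's default for the Reach task as we understand it; treat it, and the helper names, as assumptions rather than values read from this model's configuration.

```python
import math
from typing import Sequence

# Assumed success threshold in metres (panda-gym's Reach default, to our understanding)
DISTANCE_THRESHOLD = 0.05


def dense_reward(achieved_goal: Sequence[float], desired_goal: Sequence[float]) -> float:
    """Negative Euclidean distance between the end effector and the target."""
    return -math.dist(achieved_goal, desired_goal)


def is_success(achieved_goal: Sequence[float], desired_goal: Sequence[float]) -> bool:
    """Hypothetical helper: the target counts as reached within the threshold."""
    return math.dist(achieved_goal, desired_goal) < DISTANCE_THRESHOLD


print(dense_reward([0.0, 0.0, 0.0], [0.0, 0.0, 0.5]))
print(is_success([0.0, 0.0, 0.0], [0.0, 0.0, 0.01]))
```

Because the reward is never positive, an episode return closer to zero means the arm stayed closer to the target on average.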

Evaluation Reward:

Mean Reward: -17.94 +/- 6.03 (self-reported)
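A figure such as -17.94 +/- 6.03 is typically the mean and standard deviation of the total reward over several evaluation episodes; Stable-Baselines3's `evaluate_policy` reports these two numbers. The sketch below shows that arithmetic. The episode returns in it are made-up placeholders, not this agent's results, and the use of the population standard deviation is an assumption.

```python
import statistics
from typing import List, Tuple


def summarize_returns(episode_returns: List[float]) -> Tuple[float, float]:
    """Mean and population standard deviation of per-episode returns."""
    return statistics.fmean(episode_returns), statistics.pstdev(episode_returns)


# Placeholder returns for illustration only, not measured results
mean, std = summarize_returns([-10.0, -20.0, -30.0])
print(f"Mean Reward: {mean:.2f} +/- {std:.2f}")
```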

Usage

Install dependencies:

pip install stable-baselines3 gymnasium panda-gym huggingface_sb3

Load the model and run it in the environment:

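import gymnasium as gym
import panda_gym  # registers the Panda environments with Gymnasium
from stable_baselines3 import A2C
from huggingface_sb3 import load_from_hub

# Download the trained checkpoint from the Hugging Face Hub
repo_id = "nirmanpatel/a2c-PandaReachDense-v3"
filename = "a2c-PandaReachDense-v3.zip"

checkpoint = load_from_hub(
    repo_id=repo_id,
    filename=filename,
)

env = gym.make("PandaReachDense-v3")

# Load the A2C policy from the downloaded .zip file
model = A2C.load(checkpoint)

obs, info = env.reset()

for _ in range(1000):
    # deterministic=True uses the policy's mean action instead of sampling
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)

    # Start a new episode when the current one ends
    if terminated or truncated:
        obs, info = env.reset()

env.close()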
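The loop above steps the environment but does not record how well the agent performs. Below is a minimal sketch of collecting per-episode returns, which could then be summarized as a mean and standard deviation. `run_episodes` is a hypothetical helper: it accepts any Gymnasium-style environment (`reset()` returning `(obs, info)`, `step()` returning five values) and any callable that maps an observation to an action. The toy environment exists only so the sketch runs without panda-gym installed.

```python
from typing import Any, Callable, List


def run_episodes(env: Any, policy: Callable[[Any], Any], n_episodes: int) -> List[float]:
    """Hypothetical helper: total reward of each of n_episodes complete episodes."""
    returns: List[float] = []
    for _ in range(n_episodes):
        obs, info = env.reset()
        total, done = 0.0, False
        while not done:
            obs, reward, terminated, truncated, info = env.step(policy(obs))
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return returns


class ToyEnv:
    """Stand-in with the Gymnasium reset/step signatures; truncates after 3 steps."""

    def __init__(self) -> None:
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0, {}

    def step(self, action):
        self.t += 1
        # Reward mimics a negative distance: -|action|
        return 0.0, -abs(action), False, self.t >= 3, {}


print(run_episodes(ToyEnv(), lambda obs: 0.5, n_episodes=2))
```

With the model loaded as in the usage example, `run_episodes(env, lambda o: model.predict(o, deterministic=True)[0], n_episodes=10)` would return the per-episode returns on the real task.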

Notes

This project demonstrates:

Reinforcement Learning for robotics
Continuous control using A2C
Gymnasium-compatible RL pipelines
Hugging Face model deployment

Author

Created by Nirman Patel
