p3nGu1nZz committed on
Commit e5844fa · verified · 1 Parent(s): 83bab82

Update README.md

Files changed (1):
  1. README.md +142 -141
README.md CHANGED

---
license: mit
---

# Tau LLM Unity ML Agents Project

Welcome to the Tau LLM Unity ML Agents Project repository! This project focuses on training reinforcement learning agents using Unity ML-Agents and the PPO algorithm. Our goal is to optimize the performance of the agents through various configurations and training runs.

## Project Overview

This repository contains the code and configurations for training agents in a Unity environment using the Proximal Policy Optimization (PPO) algorithm. The agents are designed to learn and adapt to their environment, improving their performance over time.

### Key Features

- **Reinforcement Learning**: Utilizes the PPO algorithm for training agents.
- **Unity ML-Agents**: Integrates with Unity ML-Agents for a seamless training experience.
- **Custom Reward Functions**: Implements gradient-based reward functions for nuanced feedback (see the sketch after this list).
- **Memory Networks**: Incorporates memory networks to handle temporal dependencies.
- **TensorBoard Integration**: Monitors training progress and performance using TensorBoard.

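The gradient-based rewards themselves live in the agent's C# code inside the Unity project; purely as an illustration of the idea, here is a minimal Python sketch of a distance-based gradient reward. The function name, scale, and clipping range are assumptions for this example, not code from the repository:

```python
def gradient_reward(prev_distance: float, distance: float, scale: float = 1.0) -> float:
    """Reward the agent in proportion to how much closer it moved to the target,
    giving a smooth gradient instead of a sparse success/failure signal."""
    delta = scale * (prev_distance - distance)  # positive when the agent closed the gap
    return max(-1.0, min(1.0, delta))           # clip so one step cannot dominate the return

# Example: moving from 4.0 units away to 3.2 units away earns roughly +0.8 reward.
print(gradient_reward(4.0, 3.2))
```
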
## Configuration

Below is the configuration used for training the agents:

```yaml
behaviors:
  TauAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 256
      buffer_size: 4096
      learning_rate: 0.00003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 10
      learning_rate_schedule: linear
    network_settings:
      normalize: true
      hidden_units: 256
      num_layers: 4
      vis_encode_type: simple
      memory:
        memory_size: 256
        sequence_length: 256
        num_layers: 4
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      curiosity:
        gamma: 0.995
        strength: 0.1
        network_settings:
          normalize: true
          hidden_units: 256
          num_layers: 4
        learning_rate: 0.00003
    keep_checkpoints: 10
    checkpoint_interval: 100000
    threaded: true
    max_steps: 3000000
    time_horizon: 256
    summary_freq: 10000
```

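A quick way to sanity-check edits to this file before launching a run is to load it with PyYAML, which is pulled in by the ML-Agents Python dependencies. The snippet below is only a sketch and assumes the config is saved as `config/tau_agent_ppo_c.yaml`, the path used by the training command later in this README:

```python
import yaml

# Load the trainer configuration shown above.
with open("config/tau_agent_ppo_c.yaml") as f:
    config = yaml.safe_load(f)

hp = config["behaviors"]["TauAgent"]["hyperparameters"]

# ML-Agents expects buffer_size to be a multiple of batch_size so PPO can
# slice the buffer into whole minibatches each epoch.
assert hp["buffer_size"] % hp["batch_size"] == 0, "buffer_size must be a multiple of batch_size"

print(f"batch_size={hp['batch_size']}, buffer_size={hp['buffer_size']}, lr={hp['learning_rate']}")
```
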
## Model Naming Convention

The models in this repository follow the naming convention `Tau_<series>_<max_steps>`, making it easy to identify the series and the number of training steps for each model.

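For scripting, the convention is simple enough to split on underscores. The helper below is hypothetical (it is not part of the repository) and just shows how a name such as `Tau_B1_10M` breaks down:

```python
# Hypothetical helper, not shipped with the repository: splits a model name
# that follows the Tau_<series>_<max_steps> convention into its parts.
def parse_model_name(name: str) -> tuple[str, str]:
    prefix, series, max_steps = name.split("_", 2)
    if prefix != "Tau":
        raise ValueError(f"unexpected prefix in {name!r}")
    return series, max_steps

print(parse_model_name("Tau_B1_10M"))  # -> ('B1', '10M')
```
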
## Getting Started

### Prerequisites

- Unity 6
- Unity ML-Agents Toolkit
- Python 3.10.11
- PyTorch
- Transformers

### Installation

1. Clone the repository:
   ```bash
   git clone https://github.com/p3nGu1nZz/Tau.git
   cd Tau\MLAgentsProject
   ```

2. Install the required Python packages:
   ```bash
   pip install -r requirements.txt
   ```

3. Open the Unity project:
   - Launch Unity Hub and open the project folder.

### Training the Agent

To start training the agent, run the following command:
```bash
mlagents-learn .\config\tau_agent_ppo_c.yaml --run-id=tau_agent_ppo_A0 --env .\Build --torch-device cuda --timeout-wait 300 --force
```
Note: The preferred way to train is against a standalone player build: create a new build in the `Build` directory, which is what the `--env .\Build` argument above points to.

### Monitoring Training

You can monitor the training progress using TensorBoard:
```bash
tensorboard --logdir results
```

## Results

The training results, including the average reward and cumulative reward, can be visualized using TensorBoard. The graphs below show the performance of the agent over time:

![Average Reward](chart_tau_B1_10M_a.png)
![Average Reward](chart_tau_B1_10M_b.png)
![Average Reward](chart_tau_B1_10M_c.png)

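If you want the raw curves rather than the TensorBoard UI, the event files ML-Agents writes under `results/<run-id>/<behavior-name>` can also be read programmatically. A minimal sketch, assuming the run id from the training command above and the `tensorboard` package that comes with the ML-Agents install:

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

# Point the accumulator at the behavior's summary directory for the run.
acc = EventAccumulator("results/tau_agent_ppo_A0/TauAgent")
acc.Reload()

# "Environment/Cumulative Reward" is the scalar tag ML-Agents uses for episode reward.
for event in acc.Scalars("Environment/Cumulative Reward"):
    print(event.step, event.value)
```
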
## Citation

If you use this project in your research, please cite it as follows:

```bibtex
@misc{Tau,
  author = {K. Rawson},
  title = {Tau LLM Unity ML Agents Project},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/p3nGu1nZz/Tau}},
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Unity ML-Agents Toolkit
- TensorFlow and PyTorch communities
- Hugging Face for hosting the model repository