Spaces: zeekay/zen-training

Hanzo Dev committed on Nov 5

Commit 435c70a

1 Parent(s): 333f111

Add 8 more top datasets: Magicoder, AgentInstruct, ToolBench, OpenOrca, etc

Files changed (2)

README.md CHANGED

@@ -44,8 +44,19 @@ Train any Zen model with any dataset combination from HuggingFace. Everything ru
 **Function Calling:**
 - xLAM 60k (Salesforce high-quality function calling)
+**Coding:**
+- Magicoder-OSS-Instruct (75k code samples)
+- CodeFeedback-Filtered (157k code instructions)
+- Evol-Instruct-Code (80k evolved code complexity)
+**Advanced Agentic:**
+- AgentInstruct (1M agent trajectories from Microsoft)
+- ToolBench (16k tool use examples)
+- WebArena (2k web navigation tasks)
 **Instruction Tuning:**
 - Alpaca (52k instruction samples)
+- OpenOrca (4.2M reasoning-focused instructions)
 ## 🚀 How to Use

app.py CHANGED

@@ -114,12 +114,51 @@ DATASETS = {
             "size": "60k samples"
         },
     },
+    "Coding Datasets": {
+        "Magicoder-OSS-Instruct": {
+            "hf_id": "ise-uiuc/Magicoder-OSS-Instruct-75K",
+            "config": None,
+            "size": "75k code samples"
+        },
+        "CodeFeedback-Filtered": {
+            "hf_id": "m-a-p/CodeFeedback-Filtered-Instruction",
+            "config": None,
+            "size": "157k code samples"
+        },
+        "Evol-Instruct-Code": {
+            "hf_id": "nickrosh/Evol-Instruct-Code-80k-v1",
+            "config": None,
+            "size": "80k evolved code"
+        },
+    },
+    "Advanced Agentic": {
+        "AgentInstruct": {
+            "hf_id": "microsoft/orca-agentinstruct-1M-v1",
+            "config": None,
+            "size": "1M agent samples"
+        },
+        "ToolBench": {
+            "hf_id": "ToolBench/ToolBench",
+            "config": None,
+            "size": "16k tool use"
+        },
+        "WebArena": {
+            "hf_id": "neulab/agent-data-collection",
+            "config": "nnetnav-wa",
+            "size": "~2k web agent"
+        },
+    },
     "Instruction Tuning": {
         "Alpaca": {
             "hf_id": "tatsu-lab/alpaca",
             "config": None,
             "size": "52k samples"
         },
+        "OpenOrca": {
+            "hf_id": "Open-Orca/OpenOrca",
+            "config": None,
+            "size": "4.2M reasoning"
+        },
     }
 }
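
The diff only shows the registry itself, not how app.py consumes it. As a minimal sketch of one way such a nested `DATASETS` mapping could be flattened and passed to the Hugging Face `datasets` loader: the names `DatasetSpec`, `iter_specs`, and `load_spec` are hypothetical and not part of this commit, and the `"train"` split and streaming mode are assumptions rather than anything the diff confirms.

```python
from typing import Iterator, NamedTuple, Optional

# Two entries copied from the registry in this commit, for illustration only.
DATASETS = {
    "Advanced Agentic": {
        "WebArena": {
            "hf_id": "neulab/agent-data-collection",
            "config": "nnetnav-wa",
            "size": "~2k web agent",
        },
    },
    "Instruction Tuning": {
        "OpenOrca": {
            "hf_id": "Open-Orca/OpenOrca",
            "config": None,
            "size": "4.2M reasoning",
        },
    },
}


class DatasetSpec(NamedTuple):
    """Hypothetical flat record for one registry entry."""
    category: str
    name: str
    hf_id: str
    config: Optional[str]


def iter_specs(registry: dict) -> Iterator[DatasetSpec]:
    """Flatten the category -> name -> metadata mapping into specs."""
    for category, entries in registry.items():
        for name, meta in entries.items():
            yield DatasetSpec(category, name, meta["hf_id"], meta["config"])


def load_spec(spec: DatasetSpec, split: str = "train", streaming: bool = True):
    """Load one entry; needs the `datasets` package and network access.

    The split name is an assumption; individual datasets may use others.
    """
    from datasets import load_dataset  # lazy import so the rest runs offline

    # `name` selects the dataset configuration; None means the default one.
    return load_dataset(spec.hf_id, name=spec.config, split=split, streaming=streaming)


specs = list(iter_specs(DATASETS))
for spec in specs:
    print(f"{spec.category} / {spec.name}: {spec.hf_id} (config={spec.config})")
```

Keeping `config` as an explicit `None` for most entries, as the commit does, means every entry can go through the same call path; only entries such as WebArena, which point at a named configuration of a larger repository, need a non-default value.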