---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.645
      name: Accuracy
    - type: f1
      value: 0.6365
      name: F1 Score
    - type: precision
      value: 0.6394
      name: Precision
    - type: recall
      value: 0.645
      name: Recall
---
# Chinese Financial Sentiment Analysis Model (Crypto Focus)
## Model Description
This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.
## Training Data
- **Size**: 1,000 manually annotated Chinese financial news articles
- **Source**: Cryptocurrency-related news and tweets
- **Annotation**: AI-assisted labeling with manual correction
- **Distribution**:
  - Positive: 420 (42.0%)
  - Neutral: 420 (42.0%)
  - Negative: 160 (16.0%)
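Since the Negative class accounts for only 16% of the data, one common mitigation is inverse-frequency class weighting. The sketch below computes such weights from the counts listed above. It is illustrative only: the card does not say whether class weighting was used in training, and the `class_weights` helper is a hypothetical name.

```python
# Class counts taken from the distribution listed above.
counts = {"positive": 420, "neutral": 420, "negative": 160}


def class_weights(counts):
    """Inverse-frequency weights: total / (num_classes * class_count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * c) for label, c in counts.items()}


weights = class_weights(counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Weights of this form could be passed to a weighted cross-entropy loss so that mistakes on the rarer class count for more.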
## Performance Metrics
Performance on a 200-sample test set:

| Metric | Value |
|-----------|--------|
| Accuracy | 64.50% |
| F1 Score | 63.65% |
| Precision | 63.94% |
| Recall | 64.50% |
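Accuracy and recall are identical in the table, which is what support-weighted averaging produces. The card does not state the averaging method, so treat this as an inference. The sketch below computes weighted metrics from scratch on made-up toy labels (not the model's test data); `weighted_metrics` is a hypothetical helper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        prec = tp / predicted if predicted else 0.0
        rec = tp / count
        f = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        weight = count / n  # each class contributes in proportion to its support
        precision += weight * prec
        recall += weight * rec
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not the model's test data.
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With support weighting, the recall terms sum to the fraction of correctly classified samples, so weighted recall always equals accuracy, while precision and F1 can differ.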
## Usage
### Quick Start
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Analyze text ("Bitcoin breaks $100,000, setting a new all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()
print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```
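### Batch Processing
```python
import torch

# Reuses `tokenizer` and `model` loaded in Quick Start.
texts = [
    "币安获得阿布扎比监管授权",    # Binance obtains Abu Dhabi regulatory authorization
    "以太坊完成Fusaka升级",        # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元"  # An exchange loses $1 million in an attack
]

# Pad to the longest text in the batch so the inputs form one tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
for text, pred in zip(texts, predicted_classes):
    print(f"{text} -> {labels[pred.item()]}")
```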
## Training Configuration
- **Base Model**: yiyanghkust/finbert-tone-chinese
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 128
- **Device**: NVIDIA GeForce RTX 3060 Laptop GPU
- **Training Time**: ~5 minutes
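As a rough sketch, the hyperparameters above can be collected under the parameter names used by `transformers.TrainingArguments`, and the number of optimizer steps estimated from them. The training-set size of 800 is an assumption (1,000 labeled examples minus the 200-sample test set); the card does not state the actual split. The estimate also assumes a single GPU and no gradient accumulation.

```python
import math

# Hyperparameters as listed above. The keys follow parameter names of
# transformers.TrainingArguments; max_length is a tokenizer argument instead.
training_kwargs = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 16,
    "learning_rate": 2e-5,
}
max_length = 128

# Assumption: 800 training examples (1,000 labeled minus the 200-sample
# test set). The card does not state the actual split.
num_train_examples = 800

steps_per_epoch = math.ceil(
    num_train_examples / training_kwargs["per_device_train_batch_size"]
)
total_steps = steps_per_epoch * training_kwargs["num_train_epochs"]
print(f"{steps_per_epoch} steps per epoch, {total_steps} optimizer steps in total")
```

The dictionary could then be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`, with an output directory of your choosing.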
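## Use Cases
- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media opinion monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supporting reference for investment decisions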
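## Limitations
- ⚠️ Targets cryptocurrency-related financial news; performance on other financial domains may be poor
- ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model is not a substitute for human judgment; treat its output as a reference only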
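Given these limitations, one way to keep a human in the loop is to flag low-confidence predictions for manual review. The sketch below is a minimal illustration, not part of the model: `classify_with_threshold` and the `0.6` cutoff are hypothetical, and the logits are made-up values standing in for rows of `outputs.logits`.

```python
import math

# Label order as used in the Usage section.
LABELS = ["positive", "neutral", "negative"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_threshold(logits, threshold=0.6):
    """Return (label, confidence, needs_review).

    `threshold` is a hypothetical cutoff; tune it on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    confidence = probs[best]
    return LABELS[best], confidence, confidence < threshold


# Made-up logits for illustration; real values would come from
# outputs.logits[i].tolist().
print(classify_with_threshold([2.5, 0.1, -1.0]))
print(classify_with_threshold([0.4, 0.3, 0.2]))
```

Predictions flagged with `needs_review` can be routed to a person instead of being acted on automatically.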
## License
Apache-2.0
## Citation
If you use this model, please cite:
```bibtex
@misc{watchtower-sentiment-2025,
title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
author={Onefly},
year={2025},
howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
}
```
## Base Model
This model is fine-tuned from:
- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!
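## Changelog
### v2.0 (2025-12-09)
- ✅ Expanded the training data to 1,000 items
- ✅ Corrected annotation errors to improve data quality
- ✅ Rebalanced the class distribution to improve model balance
- ✅ Improved F1 score by about 2 percentage points (0.6165 → 0.6365)
### v1.0 (Initial Release)
- Initial version based on 500 annotated items
## Contact
For questions or suggestions, feel free to open an issue or PR.

---
**Maintainer**: Onefly

**Last Updated**: 2025-12-09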