---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.645
      name: Accuracy
    - type: f1
      value: 0.6365
      name: F1 Score
    - type: precision
      value: 0.6394
      name: Precision
    - type: recall
      value: 0.645
      name: Recall
---

# Chinese Financial Sentiment Analysis Model (Crypto Focus)

## Model Description

This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.

## Training Data

- **Size**: 1,000 manually annotated Chinese financial news items
- **Source**: Cryptocurrency-related news and tweets
- **Annotation**: AI-assisted labeling with manual correction
- **Distribution**:
  - Positive: 420 (42.0%)
  - Neutral: 420 (42.0%)
  - Negative: 160 (16.0%)

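The class counts above are imbalanced, with Negative at only 16%. One common way to compensate during fine-tuning is inverse-frequency class weighting. The sketch below computes such weights from the published counts. It is illustrative only: this card does not state that class weighting was used, and `inverse_frequency_weights` is a hypothetical helper name.

```python
def inverse_frequency_weights(counts):
    """Return weight_c = N / (K * n_c) for each class (hypothetical helper)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Class counts taken from the distribution listed above.
counts = {"positive": 420, "neutral": 420, "negative": 160}
weights = inverse_frequency_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.4f}")
```

Weights like these could be passed to a weighted cross-entropy loss (for example, the `weight` argument of `torch.nn.CrossEntropyLoss`), in the same label order the model uses.
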
## Performance Metrics

Performance on a test set of 200 samples:

| Metric | Value |
|--------|-------|
| Accuracy | 64.50% |
| F1 Score | 63.65% |
| Precision | 63.94% |
| Recall | 64.50% |

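The reported accuracy and recall are identical (64.50%), which is what support-weighted averaging produces: weighted recall always equals accuracy. The card does not state the averaging mode, so treat that reading as an assumption. The sketch below computes weighted precision, recall, and F1 in plain Python on a small made-up label set (not the model's test data) to show the relationship.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Support-weighted precision, recall, and F1, plus accuracy."""
    support = Counter(y_true)
    total = len(y_true)
    precision = recall = f1 = 0.0
    for label, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        p_c = tp / n_pred if n_pred else 0.0
        r_c = tp / n_true
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_true / total  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up labels for illustration only; 0=positive, 1=neutral, 2=negative.
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 0, 2, 1]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

In this toy example, weighted recall and accuracy both come out to 5/8, while precision and F1 differ from them.
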
## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Analyze text ("Bitcoin breaks $100,000 to a new all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```

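The Quick Start hardcodes the label order as `['positive', 'neutral', 'negative']`. Transformers checkpoints also carry an `id2label` mapping in `model.config`, which falls back to generic names such as `LABEL_0` when no label names were set. The helper below is a hedged sketch (the name `resolve_labels` is hypothetical) that prefers the checkpoint's mapping when it is meaningful and otherwise falls back to the order documented in this card.

```python
def resolve_labels(id2label, fallback):
    """Pick class names from a config-style id2label dict, else use fallback.

    `id2label` is expected to look like model.config.id2label, for example
    {0: "positive", 1: "neutral", 2: "negative"}.
    """
    if not id2label or len(id2label) != len(fallback):
        return list(fallback)
    names = [str(id2label[i]) for i in sorted(id2label)]
    # Generic placeholder names carry no information, so use the fallback.
    if all(name.upper().startswith("LABEL_") for name in names):
        return list(fallback)
    return [name.lower() for name in names]


documented_order = ["positive", "neutral", "negative"]

# Generic mapping, as produced when no label names were configured.
print(resolve_labels({0: "LABEL_0", 1: "LABEL_1", 2: "LABEL_2"}, documented_order))
# Explicit mapping, shown here with hypothetical values.
print(resolve_labels({0: "Negative", 1: "Neutral", 2: "Positive"}, documented_order))
```

With the Quick Start objects in scope, this would be called as `labels = resolve_labels(model.config.id2label, documented_order)`.
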
### Batch Processing

```python
# Reuses `torch`, `tokenizer`, and `model` from the Quick Start example.
texts = [
    "币安获得阿布扎比监管授权",      # Binance obtains regulatory authorization in Abu Dhabi
    "以太坊完成Fusaka升级",          # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元"    # An exchange is attacked and loses $1 million
]

# Pad to the longest text so the batch forms one tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
# Convert the tensor to plain ints before indexing the label list
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")
```

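Tokenizing a very large list in one call pads every text to the longest one and runs the whole set through the model at once. A minimal sketch of splitting the input into fixed-size chunks follows; `chunked` and the chunk size of 16 are illustrative choices, not part of the original card.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"新闻标题 {i}" for i in range(35)]  # placeholder headlines
batches = list(chunked(texts, 16))
print([len(batch) for batch in batches])
```

Each chunk can then be passed through the tokenizer and model exactly as in the batch example above, and the per-chunk predictions concatenated.
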
## Training Configuration

- **Base Model**: yiyanghkust/finbert-tone-chinese
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 128
- **Device**: NVIDIA GeForce RTX 3060 Laptop GPU
- **Training Time**: ~5 minutes

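For a sense of scale, the number of optimizer steps implied by these settings can be estimated. The training-set size is not stated in this card; the sketch assumes 800 training samples (the 1,000 annotated items minus the 200 test samples), which is a guess, and it ignores gradient accumulation.

```python
import math

EPOCHS = 5
BATCH_SIZE = 16
ASSUMED_TRAIN_SAMPLES = 800  # assumption: 1,000 annotated items minus 200 test items

# The last batch of an epoch may be smaller, hence the ceiling.
steps_per_epoch = math.ceil(ASSUMED_TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

A run of this size is small, which is in line with the reported training time of about 5 minutes on a laptop GPU.
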
## Use Cases

- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media opinion monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supplementary reference for investment decisions

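For the market sentiment indicator use case, per-item predictions need to be aggregated into a single number. One simple option, shown as a sketch rather than a method this card prescribes, is a net sentiment score: the share of positive items minus the share of negative items. The function name and the sample labels are illustrative.

```python
from collections import Counter


def net_sentiment(labels):
    """Return (positive - negative) / total, in the range [-1, 1]."""
    if not labels:
        return 0.0
    counts = Counter(labels)
    return (counts["positive"] - counts["negative"]) / len(labels)


# Illustrative predictions for one time window.
window = ["positive", "neutral", "positive", "negative", "neutral"]
print(f"net sentiment: {net_sentiment(window):+.2f}")
```

Neutral items dilute the score toward zero, so a window dominated by neutral news reads as weak sentiment in either direction.
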
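## Limitations

- ⚠️ Targets financial news in the cryptocurrency domain; performance may be worse in other financial domains
- ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model is no substitute for human judgment; treat its output as a reference only
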
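Two of the limitations above, short inputs and modest overall accuracy, can be handled downstream by flagging predictions for human review. The sketch below is one hedged way to do that. The 10-character cutoff comes from the short-text limitation listed above, while the 0.6 confidence threshold and the function name `needs_review` are arbitrary illustrative choices.

```python
MIN_CHARS = 10         # from the short-text limitation above
MIN_CONFIDENCE = 0.6   # arbitrary illustrative threshold


def needs_review(text, confidence):
    """Return True when a prediction should be checked by a person."""
    too_short = len(text.strip()) < MIN_CHARS
    low_confidence = confidence < MIN_CONFIDENCE
    return too_short or low_confidence


print(needs_review("比特币大涨", 0.91))                    # short text
print(needs_review("比特币突破10万美元创历史新高", 0.45))    # low confidence
print(needs_review("比特币突破10万美元创历史新高", 0.88))    # passes both checks
```

The `confidence` argument is the softmax probability of the predicted class, as computed in the Quick Start example.
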
## License

Apache-2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{watchtower-sentiment-2025,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
}
```

## Base Model

This model is fine-tuned from:

- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!

## Changelog

### v2.0 (2025-12-09)

- ✅ Expanded the training data to 1,000 samples
- ✅ Corrected annotation errors to improve data quality
- ✅ Rebalanced the class distribution to improve model balance
- ✅ F1 score improved by about 2 percentage points (0.6165 → 0.6365)

### v1.0 (Initial Release)

- Initial version based on 500 annotated samples

## Contact

Questions and suggestions are welcome; please open an issue or PR.

---

**Maintainer**: Onefly

**Last Updated**: 2025-12-09