---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.645
      name: Accuracy
    - type: f1
      value: 0.6365
      name: F1 Score
    - type: precision
      value: 0.6394
      name: Precision
    - type: recall
      value: 0.645
      name: Recall
---

# Chinese Financial Sentiment Analysis Model (Crypto Focus)

## Model Description

This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.

## Training Data

- **Size**: 1000 manually annotated Chinese financial news items
- **Source**: Cryptocurrency-related news and tweets
- **Annotation**: AI-assisted labeling with manual correction
- **Distribution**:
  - Positive: 420 samples (42.0%)
  - Neutral: 420 samples (42.0%)
  - Negative: 160 samples (16.0%)
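The distribution above is imbalanced: the Negative class makes up only 16% of the data. The card does not say whether any class weighting was used during fine-tuning, so the following is only a minimal sketch of one common option, inverse-frequency weights, computed from the counts listed above. The function name `inverse_frequency_weights` is hypothetical.

```python
# Class counts taken from the distribution listed in this card.
counts = {"positive": 420, "neutral": 420, "negative": 160}


def inverse_frequency_weights(counts):
    """Return weights w_c = N / (K * n_c), so rarer classes weigh more."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


weights = inverse_frequency_weights(counts)
for label, w in weights.items():
    share = counts[label] / sum(counts.values())
    print(f"{label}: share={share:.1%}, weight={w:.3f}")
```

Weights like these could be passed, in label-index order, as the `weight` argument of `torch.nn.CrossEntropyLoss`. Whether that would help this particular model is untested here.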

## Performance Metrics

Performance on a 200-sample test set:

| Metric | Value |
|--------|-------|
| Accuracy | 64.50% |
| F1 Score | 63.65% |
| Precision | 63.94% |
| Recall | 64.50% |
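The card does not state how precision, recall, and F1 are averaged across the three classes. The reported recall (64.50%) equals the reported accuracy (64.50%), which is what support-weighted averaging always produces, so weighted averaging is a plausible reading. That is an assumption, not something the card confirms. The sketch below computes support-weighted metrics in plain Python on toy labels (not the model's test data) to show the relationship.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # each class counts in proportion to its support
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels for illustration only; not the model's test data.
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]
metrics = weighted_metrics(y_true, y_pred)
print(metrics)
```

With weighted averaging, recall reduces to the number of correct predictions divided by the number of samples, so it matches accuracy by construction. Precision and F1 can still differ from it, as they do in the table.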

## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Analyze text ("Bitcoin breaks $100,000, setting a new all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```

### Batch Processing

```python
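# Reuses `tokenizer`, `model`, and the `torch` import from the Quick Start example
texts = [
    "币安获得阿布扎比监管授权",    # Binance obtains regulatory authorization in Abu Dhabi
    "以太坊完成Fusaka升级",        # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元"  # An exchange was attacked and lost $1 million
]

# Pad to the longest text so the batch forms a single tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
# Convert the tensor to plain ints before indexing the label list
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")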
```
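The examples above always return the top class, however confident the model is. Given the reported test accuracy of 64.50%, it may be useful to flag low-confidence predictions for manual review. The following is a minimal sketch of that idea in plain Python. The function name `label_with_threshold`, the `"uncertain"` label, and the 0.6 threshold are all hypothetical choices, not part of the model.

```python
# Same label order as the usage examples in this card.
LABELS = ["positive", "neutral", "negative"]


def label_with_threshold(probabilities, threshold=0.6):
    """Map each probability row to (label, confidence).

    Rows whose top probability is below `threshold` are labeled 'uncertain'.
    """
    results = []
    for row in probabilities:
        confidence = max(row)
        index = row.index(confidence)
        label = LABELS[index] if confidence >= threshold else "uncertain"
        results.append((label, confidence))
    return results


# Hypothetical probability rows, shaped like `predictions.tolist()`.
example = [[0.82, 0.12, 0.06], [0.40, 0.35, 0.25]]
results = label_with_threshold(example)
print(results)
```

In practice the input would be `predictions.tolist()` from the batch example, and the threshold would need tuning on held-out data.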

## Training Configuration

- **Base Model**: yiyanghkust/finbert-tone-chinese
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 128
- **Device**: NVIDIA GeForce RTX 3060 Laptop GPU
- **Training Time**: ~5 minutes
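To give a sense of scale for the settings above, the sketch below records them in a small dataclass and estimates the number of optimizer steps. The `TrainingConfig` class is hypothetical, and the training-set size of 800 is an assumption: the card does not say how the data was split. It also assumes no gradient accumulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as listed in this card (the class itself is hypothetical)."""
    base_model: str = "yiyanghkust/finbert-tone-chinese"
    epochs: int = 5
    batch_size: int = 16
    learning_rate: float = 2e-5
    max_length: int = 128

    def total_steps(self, num_train_samples: int) -> int:
        """Optimizer steps for a full run, keeping the last partial batch."""
        return math.ceil(num_train_samples / self.batch_size) * self.epochs


config = TrainingConfig()
# ASSUMPTION: 800 training samples (hypothetical; the card does not give the split).
steps = config.total_steps(800)
print(steps)  # 250
```

A run of a few hundred steps at sequence length 128 is small, which fits the reported training time of about 5 minutes on a laptop GPU.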

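## Use Cases

- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media sentiment monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supplementary reference for investment decisions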

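## Limitations

- ⚠️ Targets financial news in the cryptocurrency domain; performance may be poor in other financial domains
- ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model is not a substitute for human judgment; treat its output as a reference only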

## License

Apache-2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{watchtower-sentiment-2025,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
}
```

## Base Model

This model is fine-tuned from:
- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!

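## Changelog

### v2.0 (2025-12-09)
- ✅ Expanded the training data to 1000 samples
- ✅ Corrected annotation errors to improve data quality
- ✅ Adjusted the class distribution to improve model balance
- ✅ F1 score improved by about 2 percentage points (0.6165 → 0.6365)

### v1.0 (Initial Release)
- Initial version based on 500 annotated samples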

## Contact

For questions or suggestions, feel free to open an issue or PR.

---

**Maintainer**: Onefly

**Last Updated**: 2025-12-09