---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.645
      name: Accuracy
    - type: f1
      value: 0.6365
      name: F1 Score
    - type: precision
      value: 0.6394
      name: Precision
    - type: recall
      value: 0.645
      name: Recall
---

# Chinese Financial Sentiment Analysis Model (Crypto Focus)

## Model Description

This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.

## Training Data

- Size: 1,000 manually annotated Chinese financial news items
- Source: Cryptocurrency-related news and tweets
- Annotation: AI-assisted labeling with manual correction
- Distribution:
  - Positive: 420 (42.0%)
  - Neutral: 420 (42.0%)
  - Negative: 160 (16.0%)
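Because the negative class makes up only 16% of the data, one common mitigation is to weight the loss by inverse class frequency. The card does not say whether any weighting was used in training, so the sketch below only shows how such weights could be derived from the counts listed above. The function name `balanced_class_weights` is hypothetical, and the formula `total / (num_classes * count)` is the widely used "balanced" heuristic, assumed here rather than taken from the author's training setup.

```python
def balanced_class_weights(counts):
    """Return inverse-frequency weights: total / (num_classes * count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


# Class counts as reported in this card.
counts = {"positive": 420, "neutral": 420, "negative": 160}
weights = balanced_class_weights(counts)

for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

Weights like these could, for example, be passed as the `weight` argument of `torch.nn.CrossEntropyLoss`, ordered to match the model's label indices.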

## Performance Metrics

Performance on a 200-sample test set:

| Metric | Value |
|--------|-------|
| Accuracy | 64.50% |
| F1 Score | 63.65% |
| Precision | 63.94% |
| Recall | 64.50% |
## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Text to analyze ("Bitcoin breaks $100,000 to hit an all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```
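The label list in the snippet above is hard-coded. Fine-tuned checkpoints usually store their index-to-label mapping in the model config (`model.config.id2label` in `transformers`), so it may be safer to read the order from there than to assume it. A minimal sketch in plain Python follows; `decode_prediction` is a hypothetical helper, and the example mapping simply mirrors the order given in this card.

```python
import math


def decode_prediction(logits, id2label):
    """Softmax over one row of logits; return (label, confidence)."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]


# Assumed mapping, copied from the label order in this card.
# With a loaded model, pass model.config.id2label instead.
id2label = {0: "positive", 1: "neutral", 2: "negative"}

label, confidence = decode_prediction([2.0, 0.5, -1.0], id2label)
print(f"Sentiment: {label}, confidence: {confidence:.4f}")
```

With the Quick Start objects in scope, the call would look like `decode_prediction(outputs.logits[0].tolist(), model.config.id2label)`. If the config only contains generic names such as `LABEL_0`, the mapping was not saved with the checkpoint and the order in this card is the only reference.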

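### Batch Processing

```python
# Reuses `torch`, `tokenizer`, and `model` from the Quick Start example
texts = [
    "币安获得阿布扎比监管授权",     # Binance receives regulatory authorization in Abu Dhabi
    "以太坊完成Fusaka升级",         # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元",  # An exchange is attacked and loses $1 million
]

# Pad to the longest text so the batch forms a rectangular tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True,
                   max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")
```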
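For larger inputs, tokenizing everything in one call can exhaust memory. One option is to split the texts into fixed-size chunks and run the batch code above once per chunk. The helper name `chunked` and the chunk size of 16 are illustrative assumptions; 16 happens to match the training batch size in this card, but nothing requires it at inference time.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


texts = [f"text {i}" for i in range(40)]  # placeholder inputs
batches = list(chunked(texts, 16))
print([len(batch) for batch in batches])
```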

## Training Configuration

- Base Model: yiyanghkust/finbert-tone-chinese
- Epochs: 5
- Batch Size: 16
- Learning Rate: 2e-5
- Max Length: 128
- Device: NVIDIA GeForce RTX 3060 Laptop GPU
- Training Time: ~5 minutes
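The hyperparameters above can be collected in one place for reproducibility. The sketch below stores them in a dataclass and estimates the number of optimizer steps. The class name `FinetuneConfig`, the helper `total_steps`, and the 800-example training set size are assumptions: the card gives 1,000 annotated examples and a 200-sample test set but does not say how the data was split.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hyperparameters as listed in this card."""
    base_model: str = "yiyanghkust/finbert-tone-chinese"
    epochs: int = 5
    batch_size: int = 16
    learning_rate: float = 2e-5
    max_length: int = 128


def total_steps(config, num_train_examples):
    """Optimizer steps, assuming one step per batch and no gradient accumulation."""
    steps_per_epoch = math.ceil(num_train_examples / config.batch_size)
    return steps_per_epoch * config.epochs


config = FinetuneConfig()
# 800 is a hypothetical training-set size, not a figure from the card.
print(total_steps(config, 800))
```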

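## Use Cases

- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media sentiment monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supplementary reference for investment decisions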

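## Limitations

- ⚠️ Targets financial news in the cryptocurrency domain; performance may be poor in other financial domains
- ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model does not replace human judgment; treat its output as a reference only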
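Since the model's output is meant as a reference only and short texts are less reliable, a downstream pipeline might route low-confidence or very short inputs to manual review. In the sketch below, `needs_review` is a hypothetical helper, the 10-character cutoff comes from the limitation above, and the 0.6 confidence threshold is an arbitrary placeholder that would need tuning on held-out data. The confidence values and the second example text are made up for illustration.

```python
def needs_review(text, confidence, min_chars=10, min_confidence=0.6):
    """Return True when a prediction should be double-checked by a person."""
    too_short = len(text.strip()) < min_chars
    low_confidence = confidence < min_confidence
    return too_short or low_confidence


# (text, confidence) pairs; the confidences are invented for this example.
examples = [
    ("比特币突破10万美元创历史新高", 0.91),  # long enough, confident
    ("币价大跌", 0.88),                      # short text ("coin price plunges")
    ("以太坊完成Fusaka升级", 0.52),          # low confidence
]
flags = [needs_review(text, conf) for text, conf in examples]
print(flags)
```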

## License

Apache-2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{watchtower-sentiment-2025,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
}
```

## Base Model

This model is fine-tuned from:
- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!

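## Changelog

### v2.0 (2025-12-09)
- ✅ Expanded the training data to 1,000 examples
- ✅ Corrected annotation errors to improve data quality
- ✅ Adjusted the class distribution to improve model balance
- ✅ F1 score improved by about 2 percentage points (0.6165 → 0.6365)

### v1.0 (Initial Release)
- Initial version based on 500 annotated examples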

## Contact

For questions or suggestions, please open an issue or PR.

---

Maintainer: Onefly

Last Updated: 2025-12-09