---
language: zh
license: apache-2.0
tags:
- sentiment-analysis
- chinese
- finance
- finbert
- crypto
- text-classification
- news
datasets:
- custom
metrics:
- accuracy
- f1
- precision
- recall
model-index:
- name: Chinese Financial Sentiment Analysis (Crypto)
  results:
  - task:
      type: text-classification
      name: Sentiment Analysis
    metrics:
    - type: accuracy
      value: 0.645
      name: Accuracy
    - type: f1
      value: 0.6365
      name: F1 Score
    - type: precision
      value: 0.6394
      name: Precision
    - type: recall
      value: 0.645
      name: Recall
---

# Chinese Financial Sentiment Analysis Model (Crypto Focus)

## Model Description

This model is fine-tuned from `yiyanghkust/finbert-tone-chinese` for sentiment analysis of Chinese cryptocurrency-related news and social media content. It classifies text into three sentiment categories: Positive, Neutral, and Negative.
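To make the three-way classification concrete, the following is a minimal sketch of how a row of raw model scores (logits) turns into a label and a confidence value. It uses only the standard library so it runs without downloading the model. The label order is the one given in this card's Quick Start and should be treated as an assumption; the logits are made-up values, and the helper names `softmax` and `decode` are illustrative rather than part of any library.

```python
import math

# Label order as given in this card's Quick Start; treat it as an assumption
# and prefer model.config.id2label when the loaded model provides it.
LABELS = ["positive", "neutral", "negative"]


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, labels=LABELS):
    """Return (label, confidence) for one row of logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the top probability
    return labels[best], probs[best]


# Hypothetical logits for a single headline (not real model output).
example_logits = [2.1, 0.3, -1.2]
label, confidence = decode(example_logits)
print(label, round(confidence, 4))
```

The confidence is simply the softmax probability of the winning class, so it reflects how far the top logit is from the others, not a calibrated probability of being correct.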
## Training Data

- **Size**: 1,000 manually annotated Chinese financial news items
- **Source**: Cryptocurrency-related news and tweets
- **Annotation**: AI-assisted labeling with manual correction
- **Distribution**:
  - Positive: 420 (42.0%)
  - Neutral: 420 (42.0%)
  - Negative: 160 (16.0%)

## Performance Metrics

Performance on 200 test samples:

| Metric | Value |
|--------|-------|
| Accuracy | 64.50% |
| F1 Score | 63.65% |
| Precision | 63.94% |
| Recall | 64.50% |

## Usage

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "LocalOptimum/chinese-crypto-sentiment"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Text to analyze ("Bitcoin breaks USD 100,000, hitting an all-time high")
text = "比特币突破10万美元创历史新高"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Predict without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1).item()

# Map the class index to a label (order as given in this card)
labels = ['positive', 'neutral', 'negative']
sentiment = labels[predicted_class]
confidence = predictions[0][predicted_class].item()

print(f"Sentiment: {sentiment}")
print(f"Confidence: {confidence:.4f}")
```

### Batch Processing

```python
# Reuses `torch`, `tokenizer`, and `model` from the Quick Start above
texts = [
    "币安获得阿布扎比监管授权",     # Binance obtains regulatory authorization in Abu Dhabi
    "以太坊完成Fusaka升级",         # Ethereum completes the Fusaka upgrade
    "某交易所遭攻击损失100万美元",  # An exchange loses USD 1 million in an attack
]

# Pad to the longest text so the batch forms a single tensor
inputs = tokenizer(texts, return_tensors="pt", truncation=True, max_length=128, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
    predicted_classes = torch.argmax(predictions, dim=-1)

labels = ['positive', 'neutral', 'negative']
# Convert the tensor to plain ints before indexing the label list
for text, pred in zip(texts, predicted_classes.tolist()):
    print(f"{text} -> {labels[pred]}")
```

## Training Configuration

- **Base Model**: yiyanghkust/finbert-tone-chinese
- **Epochs**: 5
- **Batch Size**: 16
- **Learning Rate**: 2e-5
- **Max Length**: 128
- **Device**: NVIDIA GeForce RTX 3060 Laptop GPU
- **Training Time**: ~5 minutes

## Use Cases

- ✅ Sentiment analysis of cryptocurrency news
- ✅ Social media sentiment monitoring
- ✅ Financial market sentiment indicators
- ✅ Real-time news sentiment tracking
- ✅ Supplementary reference for investment decisions

## Limitations

- ⚠️ Trained mainly on financial news from the cryptocurrency domain; performance may be poor in other financial domains
- ⚠️ Negative samples are relatively scarce (16%), so the model may be less sensitive to negative sentiment
- ⚠️ Accuracy may drop on short texts (fewer than 10 characters)
- ⚠️ Supports Simplified Chinese only
- ⚠️ The model is not a substitute for human judgment; its outputs are for reference only

## License

Apache-2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{watchtower-sentiment-2025,
  title={Chinese Financial Sentiment Analysis Model (Crypto Focus)},
  author={Onefly},
  year={2025},
  howpublished={\url{https://huggingface.co/YOUR_USERNAME/sentiment-finetuned-1000}},
  note={Fine-tuned from yiyanghkust/finbert-tone-chinese}
}
```

## Base Model

This model is fine-tuned from:

- [yiyanghkust/finbert-tone-chinese](https://huggingface.co/yiyanghkust/finbert-tone-chinese)

Thanks to the original authors for their contribution!

## Changelog

### v2.0 (2025-12-09)

- ✅ Expanded the training data to 1,000 samples
- ✅ Corrected annotation errors to improve data quality
- ✅ Adjusted the class distribution to improve model balance
- ✅ F1 score improved by about 2 percentage points (0.6165 → 0.6365)

### v1.0 (Initial Release)

- Initial version based on 500 annotated samples

## Contact

For questions or suggestions, please open an issue or PR.

---

**Maintainer**: Onefly

**Last Updated**: 2025-12-09
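
## Note on the Reported Metrics

The card does not say how precision, recall, and F1 were averaged across the three classes. The reported recall equals the reported accuracy (0.645), which is consistent with support-weighted averaging, because weighted recall always reduces to accuracy. The following is a minimal sketch under that assumption, not the evaluation script used for this model. The function name `weighted_metrics` and the toy labels are illustrative only.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for cls, count in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        predicted = sum(p == cls for p in y_pred)
        p_cls = tp / predicted if predicted else 0.0
        r_cls = tp / count
        f_cls = 2 * p_cls * r_cls / (p_cls + r_cls) if p_cls + r_cls else 0.0
        weight = count / n  # each class weighted by its share of true labels
        precision += weight * p_cls
        recall += weight * r_cls
        f1 += weight * f_cls
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only; these are not the model's test data.
y_true = ["positive", "positive", "neutral", "neutral", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "positive"]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

With an imbalanced test set, weighted averages are dominated by the larger classes, so per-class scores for the Negative class are worth checking separately.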