---
language: ko
license: apache-2.0
tags:
- text-classification
- korean
- emotion-classification
- sentiment-analysis
datasets:
- custom
metrics:
- accuracy
- f1
widget:
- text: "오늘 정말 기분이 좋아!"
---

# Mindcast Topic Classifier

## Model Description

This model classifies the topic of Korean text. It was fine-tuned efficiently with LoRA (Low-Rank Adaptation), and the adapter was then merged into the base model for release. A hedged sketch of the fine-tuning and merge setup is included at the end of this card.

**Training Date**: 2025-12-12

## Performance

### Test Set Results

| Metric | Score |
|---|---|
| **Accuracy** | **0.5583** |
| **F1 Score (Macro)** | **0.1024** |
| **F1 Score (Weighted)** | **0.4001** |

Note that on this test set the model predicts only the majority class (사회): every other class has zero recall, which is why the macro F1 is far below the accuracy.

### Confusion Matrix

Rows are true labels and columns are predicted labels, in the label order of the classification report below.

```
[[67  0  0  0  0  0  0]
 [24  0  0  0  0  0  0]
 [ 6  0  0  0  0  0  0]
 [15  0  0  0  0  0  0]
 [ 6  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0]
 [ 1  0  0  0  0  0  0]]
```

### Detailed Classification Report

```
              precision    recall  f1-score   support

        사회     0.5583    1.0000    0.7166        67
        정치     0.0000    0.0000    0.0000        24
      생활문화     0.0000    0.0000    0.0000         6
        세계     0.0000    0.0000    0.0000        15
        경제     0.0000    0.0000    0.0000         6
       IT과학     0.0000    0.0000    0.0000         1
       스포츠     0.0000    0.0000    0.0000         1

   micro avg     0.5583    0.5583    0.5583       120
   macro avg     0.0798    0.1429    0.1024       120
weighted avg     0.3117    0.5583    0.4001       120
```

## Training Details

### Hyperparameters

| Hyperparameter | Value |
|---|---|
| Base Model | `klue/roberta-base` |
| Batch Size | 64 |
| Epochs | 1 |
| Learning Rate | 0.0001 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.01 |
| LoRA r | 8 |
| LoRA alpha | 16 |
| LoRA dropout | 0.05 |

### Training Data

- **Train samples**: 970
- **Valid samples**: 108
- **Test samples**: 120
- **Number of labels**: 7
- **Labels**: 사회 (society), 정치 (politics), 생활문화 (life & culture), 세계 (world), 경제 (economy), IT과학 (IT/science), 스포츠 (sports)

## Usage

### Installation

```bash
pip install transformers torch
```

### Quick Start

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load model
model_name = "merrybabyxmas/mindcast-topic-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Create pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Predict
text = "오늘 날씨가 정말 좋네요"
result = classifier(text)
print(result)
```

A sketch that returns scores for all seven topics at once is included at the end of this card.

## Model Architecture

- **Base Model**: klue/roberta-base
- **Task**: Sequence Classification
- **Number of Labels**: 7

## Citation

If you use this model, please cite:

```bibtex
@misc{mindcast-model,
  author = {Mindcast Team},
  title = {Mindcast Topic Classifier},
  year = {2025},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/merrybabyxmas/mindcast-emotion-sc-only}},
}
```

## Contact

For questions or feedback, please open an issue on the model repository.

---

*This model card was automatically generated.*
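
## Appendix: Training Configuration Sketch

The original training script is not published. The sketch below is a minimal reconstruction of the LoRA fine-tune and merge workflow described in Model Description, using only the hyperparameters listed in this card. The target modules, dataset loading, column names, and output paths are assumptions for illustration, not the team's actual code.

```python
# Hedged reconstruction of the LoRA fine-tune + merge described in this card.
# Base model, label set, and hyperparameters come from the tables above; the
# rest (target modules, placeholder data, output paths) is assumed.
from datasets import Dataset
from peft import LoraConfig, TaskType, get_peft_model
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

BASE = "klue/roberta-base"
LABELS = ["사회", "정치", "생활문화", "세계", "경제", "IT과학", "스포츠"]

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForSequenceClassification.from_pretrained(
    BASE,
    num_labels=len(LABELS),
    id2label=dict(enumerate(LABELS)),
    label2id={label: i for i, label in enumerate(LABELS)},
)

# LoRA adapter with the r / alpha / dropout values from the hyperparameter table.
# target_modules is an assumption; the card does not say which projections were adapted.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query", "value"],
)
model = get_peft_model(model, lora_config)

# Placeholder data; the real 970/108 train/valid splits are not published.
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_ds = Dataset.from_dict({"text": ["예시 문장입니다."], "label": [0]})
valid_ds = Dataset.from_dict({"text": ["또 다른 예시입니다."], "label": [1]})
train_ds = train_ds.map(tokenize, batched=True).remove_columns(["text"])
valid_ds = valid_ds.map(tokenize, batched=True).remove_columns(["text"])

# Optimization settings taken directly from the hyperparameter table.
args = TrainingArguments(
    output_dir="mindcast-topic-lora",
    per_device_train_batch_size=64,
    num_train_epochs=1,
    learning_rate=1e-4,
    warmup_ratio=0.1,
    weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=valid_ds,
    tokenizer=tokenizer,
)
trainer.train()

# Merge the LoRA weights back into the base model so the published checkpoint
# loads with plain AutoModelForSequenceClassification (as in Quick Start above).
merged = model.merge_and_unload()
merged.save_pretrained("mindcast-topic-classifier-merged")
tokenizer.save_pretrained("mindcast-topic-classifier-merged")
```

Because the adapter is merged before saving, the Quick Start example above does not require `peft` at inference time.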
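
## Appendix: Scoring All Topics

The Quick Start pipeline returns only the top label. The following minimal sketch prints a probability for every class, assuming the uploaded config carries the `id2label` mapping for the seven topics; the input sentence is a made-up example.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "merrybabyxmas/mindcast-topic-classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Hypothetical input sentence; any Korean text works.
text = "정부가 새로운 경제 정책을 발표했다"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the 7 topic logits, sorted from most to least likely.
probs = torch.softmax(logits, dim=-1).squeeze(0)
for idx, p in sorted(enumerate(probs.tolist()), key=lambda x: -x[1]):
    # id2label is assumed to be present in the config; otherwise ids follow the
    # label order fixed at training time (see Training Data above).
    print(f"{model.config.id2label[idx]}: {p:.3f}")
```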