---
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget: []
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
- NLBSE/nlbse25-code-comment-classification
language:
- en
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# Python comment classifier

This is a [SetFit](https://github.com/huggingface/setfit) model for classifying Python code comments.

The model was trained with few-shot learning, which involves two steps:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned model.
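
To make step 1 more concrete, the sketch below shows one simple way labeled comments can be turned into contrastive pairs: two comments with the same label form a positive pair (target 1.0), and two comments with different labels form a negative pair (target 0.0). This is an illustration only. `make_contrastive_pairs`, the example comments, and the label names are hypothetical, a single label per comment is assumed, and the sampler used by SetFit itself may work differently.

```python
from itertools import combinations
from typing import Hashable, List, Sequence, Tuple


def make_contrastive_pairs(
    texts: Sequence[str], labels: Sequence[Hashable]
) -> List[Tuple[str, str, float]]:
    """Build every unordered pair of texts with a similarity target.

    Hypothetical helper: target is 1.0 when both texts share a label,
    0.0 otherwise. Assumes one label per text.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    pairs = []
    for (t1, y1), (t2, y2) in combinations(zip(texts, labels), 2):
        pairs.append((t1, t2, 1.0 if y1 == y2 else 0.0))
    return pairs


# Made-up comments and label names, for illustration only
examples = [
    "Returns the sorted list.",
    "Returns a copy.",
    "TODO: handle empty input.",
]
labels = ["summary", "summary", "todo"]

pairs = make_contrastive_pairs(examples, labels)
for first, second, target in pairs:
    print(f"{target:.1f}  {first!r} / {second!r}")
```

Pairs like these are what a contrastive objective consumes when fine-tuning the sentence encoder; the classification head in step 2 is then fitted on the embeddings that the fine-tuned encoder produces.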

## Model Description

- **Model Type:** SetFit
- **Classification head:** [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

## Sources

- **Repository:** [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
- **Paper:** [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
- **Dataset:** [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

## How to use it

First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")
# Run inference
preds = model("This function sorts a list of numbers.")
```
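
To classify several comments at once, the model can be called on a list of strings instead of a single string. The sketch below wraps that in a small helper. `load_classifier` and `classify_comments` are hypothetical names, not part of SetFit, and the shape of each prediction (a single label or a vector of labels) depends on how the classification head was trained, so it is left untyped here.

```python
from typing import Any, Callable, List, Sequence, Tuple


def load_classifier(model_id: str = "fabiancpl/nlbse25_python"):
    """Download the model from the Hub (needs network access and `setfit`)."""
    # Imported lazily so that classify_comments has no hard dependency on setfit
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(model_id)


def classify_comments(
    model: Callable[[List[str]], Sequence[Any]],
    comments: Sequence[str],
) -> List[Tuple[str, Any]]:
    """Run one batched call and pair every comment with its prediction."""
    comments = list(comments)
    if not comments:
        return []
    preds = model(comments)
    return list(zip(comments, preds))


# Usage (commented out because it downloads the model):
# model = load_classifier()
# comments = [
#     "This function sorts a list of numbers.",
#     "TODO: handle the empty-list case.",
# ]
# for comment, pred in classify_comments(model, comments):
#     print(comment, "->", pred)
```

Batching the comments into one call avoids re-entering the model for every string, and returning `(comment, prediction)` tuples keeps each result next to the text it belongs to.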

## Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}}
```