---
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget: []
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
- NLBSE/nlbse25-code-comment-classification
language:
- en
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# Python comment classifier

This is a [SetFit](https://github.com/huggingface/setfit) model for classifying Python code comments.

The model was trained with few-shot learning in two steps:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned model.

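Step 1 works on pairs of sentences built from the few labeled examples: two comments that share a label form a positive pair, and two comments with different labels form a negative pair. The sketch below illustrates that idea in plain Python for the single-label case. It is a simplified illustration, not SetFit's actual pair sampler; the function name `make_contrastive_pairs`, the toy comments, and the label names are hypothetical.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label and 0.0 otherwise.
    Hypothetical helper for illustration only.
    """
    pairs = []
    examples = list(zip(texts, labels))
    for (text_a, label_a), (text_b, label_b) in combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy data: the label names are placeholders, not the model's real classes.
texts = [
    "Return the sum of two numbers.",
    "Compute the mean of a list.",
    "TODO: handle empty input.",
    "FIXME: this breaks on Windows.",
]
labels = ["summary", "summary", "note", "note"]

pairs = make_contrastive_pairs(texts, labels)
```

A contrastive objective then pulls the embeddings of positive pairs together and pushes negative pairs apart, which is what makes the features useful for the classification head in step 2.
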
## Model Description

- **Model Type:** SetFit
- **Classification head:** [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

## Sources

- **Repository:** [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
- **Paper:** [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
- **Dataset:** [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

## How to use it

First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")

# Run inference
preds = model("This function sorts a list of numbers.")
```

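To classify several comments at once, a list of strings can be passed to the model's `predict` method instead of a single string. The sketch below wraps that call in a small helper that pairs each comment with its prediction. The helper names `classify_comments` and `load_model` are hypothetical, and the shape of each prediction depends on how the model was trained, which this card does not specify.

```python
def classify_comments(model, comments):
    """Pair each comment with the prediction the model returns for it.

    `model` is any object with a `predict(list_of_str)` method,
    such as a loaded SetFitModel. Hypothetical helper for illustration.
    """
    comments = list(comments)
    if not comments:
        return []
    predictions = model.predict(comments)
    return list(zip(comments, predictions))


def load_model(model_id="fabiancpl/nlbse25_python"):
    """Load the classifier from the Hub (downloads on first use)."""
    # Imported here so that classify_comments works without setfit installed.
    from setfit import SetFitModel

    return SetFitModel.from_pretrained(model_id)
```

A typical call would be `classify_comments(load_model(), ["Sort the list in place.", "TODO: remove this hack."])`, which returns one `(comment, prediction)` tuple per input.
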
## Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}
}
```