---
tags:
  - setfit
  - sentence-transformers
  - text-classification
  - generated_from_setfit_trainer
widget: []
metrics:
  - accuracy
  - f1
  - precision
  - recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
  - NLBSE/nlbse25-code-comment-classification
language:
  - en
base_model:
  - sentence-transformers/all-MiniLM-L6-v2
---

Python comment classifier

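This is a SetFit model for classifying Python code comments. It uses sentence-transformers/all-MiniLM-L6-v2 as its base model and was trained on the NLBSE/nlbse25-code-comment-classification dataset.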

The model was trained with SetFit's few-shot learning approach, which involves two steps:

1. Fine-tuning a Sentence Transformer with contrastive learning.
2. Training a classification head on features from the fine-tuned model.
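The first step can be pictured with a small, hypothetical sketch: labeled comments are turned into sentence pairs, where two comments with the same label form a positive pair (target similarity 1.0) and comments with different labels form a negative pair (target 0.0). Targets like these suit a cosine-similarity loss, which SetFit uses by default. The helper `make_contrastive_pairs` and the labels below are illustrative assumptions, and the `setfit` library's own pair generation and sampling differ from this simplified version.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Hypothetical helper (not part of the setfit API).

    Returns (text_a, text_b, target) triples: target is 1.0 when both
    texts share a label (positive pair) and 0.0 otherwise (negative pair).
    """
    pairs = []
    for (t_a, l_a), (t_b, l_b) in combinations(zip(texts, labels), 2):
        pairs.append((t_a, t_b, 1.0 if l_a == l_b else 0.0))
    return pairs


texts = [
    "Returns the sorted list.",
    "Returns the number of items.",
    "TODO: handle empty input.",
]
labels = ["summary", "summary", "development_notes"]  # hypothetical labels

for a, b, target in make_contrastive_pairs(texts, labels):
    print(target, "|", a, "<->", b)
```

The Sentence Transformer is then fine-tuned so that the cosine similarity of each pair's embeddings moves towards its target.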

Model Description

Model Type: SetFit
Sentence Transformer body: sentence-transformers/all-MiniLM-L6-v2
Classification head: scikit-learn RandomForestClassifier
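The second step treats the fine-tuned Sentence Transformer as a feature extractor and fits the classification head on its embeddings. The sketch below fits a scikit-learn `RandomForestClassifier` on made-up 3-dimensional vectors that stand in for real embeddings (all-MiniLM-L6-v2 produces 384-dimensional vectors). The vectors, the labels and `n_estimators=50` are assumptions for illustration, not the settings used to train this model.

```python
from sklearn.ensemble import RandomForestClassifier

# Toy vectors standing in for sentence embeddings (hypothetical data).
X_train = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
y_train = ["summary", "summary", "usage", "usage"]  # hypothetical labels

# Fit the random forest head on the (toy) embeddings.
head = RandomForestClassifier(n_estimators=50, random_state=0)
head.fit(X_train, y_train)

# Classify a new "embedding" that lies close to the "summary" examples.
print(head.predict([[0.85, 0.15, 0.05]]))
```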

Sources

Repository: GitHub
Paper: Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification
Dataset: NLBSE/nlbse25-code-comment-classification (Hugging Face)

How to use it

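First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download the model from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")

# Run inference on a single comment
preds = model("This function sorts a list of numbers.")
print(preds)

# Classify several comments in one call
batch_preds = model.predict(["Sorts the list in place.", "TODO: add type hints."])
print(batch_preds)
```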
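To compare predictions against labeled comments using the metrics listed in the metadata (accuracy, precision, recall and F1), a minimal standard-library sketch for a single category follows. It assumes gold labels and predictions for that category are available as 0/1 lists; how this model encodes its outputs is not documented here, so inspect the result of `model.predict` first. The function `binary_metrics` and the example data are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Hypothetical helper: accuracy, precision, recall and F1 for one 0/1-encoded category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold labels and predictions for a single category.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))
```

For several categories, the same function can be applied to each category's column and the scores averaged.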

Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}
}
```