---
tags:
- setfit
- sentence-transformers
- text-classification
- generated_from_setfit_trainer
widget: []
metrics:
- accuracy
- f1
- precision
- recall
pipeline_tag: text-classification
library_name: setfit
inference: true
license: mit
datasets:
- NLBSE/nlbse25-code-comment-classification
language:
- en
base_model:
- sentence-transformers/all-MiniLM-L6-v2
---

# Python comment classifier

This is a [SetFit](https://github.com/huggingface/setfit) model for classifying Python code comments. It was trained with few-shot learning, which involves two steps:

1. Fine-tuning a [Sentence Transformer](https://www.sbert.net) with contrastive learning.
2. Training a classification head with features from the fine-tuned model.

## Model Description

- **Model Type:** SetFit
- **Classification head:** [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)

## Sources

- **Repository:** [GitHub](https://github.com/fabiancpl/sbert-comment-classification/)
- **Paper:** [Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification](https://ieeexplore.ieee.org/document/11029440)
- **Dataset:** [HF Dataset](https://huggingface.co/datasets/NLBSE/nlbse25-code-comment-classification)

## How to use it

First, install the dependencies:

```bash
pip install setfit scikit-learn
```

Then, load the model and run inference:

```python
from setfit import SetFitModel

# Download from the 🤗 Hub
model = SetFitModel.from_pretrained("fabiancpl/nlbse25_python")

# Run inference
preds = model("This function sorts a list of numbers.")
```

## Cite as

```bibtex
@inproceedings{11029440,
  author={Peña, Fabian C. and Herbold, Steffen},
  booktitle={2025 IEEE/ACM International Workshop on Natural Language-Based Software Engineering (NLBSE)},
  title={Evaluating the Performance and Efficiency of Sentence-BERT for Code Comment Classification},
  year={2025},
  pages={21-24},
  doi={10.1109/NLBSE66842.2025.00010}}
```
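
## Sketch: contrastive pairs from few-shot examples

The first training step listed at the top of this card, contrastive fine-tuning of the Sentence Transformer, works on pairs of comments rather than on single comments. The following is a minimal sketch of how such pairs can be built from a handful of labeled examples. It is not the SetFit library's implementation, whose pair sampling may differ. The function name `make_contrastive_pairs`, the example comments, and the label names are all hypothetical and chosen for illustration only.

```python
from itertools import combinations


def make_contrastive_pairs(texts, labels):
    """Build (text_a, text_b, target) triples from labeled examples.

    target is 1.0 when both texts share a label (positive pair)
    and 0.0 otherwise (negative pair).
    """
    pairs = []
    for i, j in combinations(range(len(texts)), 2):
        target = 1.0 if labels[i] == labels[j] else 0.0
        pairs.append((texts[i], texts[j], target))
    return pairs


# Hypothetical few-shot examples; the label names are illustrative,
# not the categories used by the dataset.
texts = [
    "Returns the sorted list.",
    "Sorts the input in place and returns None.",
    "TODO: handle empty input.",
    "FIXME: this breaks on large inputs.",
]
labels = ["summary", "summary", "note", "note"]

pairs = make_contrastive_pairs(texts, labels)
positives = [p for p in pairs if p[2] == 1.0]
negatives = [p for p in pairs if p[2] == 0.0]
```

Because every pair of examples yields one training instance, even a few labeled comments produce a comparatively large contrastive training set, which is what makes the few-shot setting workable.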