---
tags:
- text-classification
- scientific-abstract
- multi-label
- sentiment-analysis
- distilbert
datasets:
- SciTopicSentimentDataset
license: apache-2.0
---

# SciTopicSentimentClassifier

## 🔬 Overview

SciTopicSentimentClassifier is a **multi-label classification** model fine-tuned to predict both the **primary scientific topic** and the **underlying sentiment** (high-positive or low-negative) of a research paper's abstract. It is well suited to automated paper categorization, literature-review triage, and scientific trend analysis.

The model was trained on the SciTopicSentimentDataset (a proprietary dataset similar to the generated Dataset 1), which links each abstract to a set of predefined scientific topics and a sentiment label binarized from the dataset's original continuous sentiment score.
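
For illustration, here is a minimal sketch of how such a 12-dimensional multi-hot training target could be assembled from topic assignments and the continuous sentiment score. The 0.5 binarization threshold, the index layout (topics first, then sentiment), and the helper `make_target` are assumptions for this example, not the dataset's documented schema:

```python
# Illustrative only: build a 12-dim multi-hot target from topic indices and a
# continuous sentiment score. The threshold and index layout are assumptions.
NUM_TOPICS = 10
SENTIMENTS = ["High-Positive-Sentiment", "Low-Negative-Sentiment"]

def make_target(topic_ids, sentiment_score):
    target = [0.0] * (NUM_TOPICS + len(SENTIMENTS))
    for t in topic_ids:  # an abstract can carry several topics
        target[t] = 1.0
    # Binarize the continuous score: high-positive vs. low-negative
    target[NUM_TOPICS if sentiment_score >= 0.5 else NUM_TOPICS + 1] = 1.0
    return target

print(make_target([0, 3], 0.92))
# [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0]
```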

## 🧠 Model Architecture

This model is an adaptation of **DistilBERT**, a smaller, faster, and lighter version of BERT.

* **Base Model:** `distilbert-base-uncased`
* **Modification:** A custom classification head is added on top of the DistilBERT encoder output (the final hidden state of the first, [CLS]-position token, since DistilBERT has no pooler), as sketched below.
* **Output Layer:** The final layer is a dense layer with **12 outputs** (10 scientific topics + 2 sentiment classes), followed by a sigmoid activation so that an abstract can belong to multiple topics/sentiments (multi-label prediction).
* **Input:** Tokenized abstract text (up to 512 tokens).
* **Task:** Multi-Label Text Classification.

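A minimal sketch of this architecture, assuming the head is a single linear layer over that first-token state (the class name, layer names, and dropout value here are illustrative, not the checkpoint's exact implementation):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class SciTopicSentimentModel(nn.Module):
    """Sketch: DistilBERT encoder + 12-way multi-label classification head."""

    def __init__(self, num_labels: int = 12):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("distilbert-base-uncased")
        self.dropout = nn.Dropout(0.1)  # illustrative value
        self.classifier = nn.Linear(self.encoder.config.dim, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_state = out.last_hidden_state[:, 0]  # first-token ([CLS]) state
        logits = self.classifier(self.dropout(cls_state))
        return torch.sigmoid(logits)  # independent per-label probabilities
```

For fine-tuning, a similar setup can be obtained by loading the checkpoint with `AutoModelForSequenceClassification` and `problem_type="multi_label_classification"`, which makes `transformers` apply `BCEWithLogitsLoss` when labels are passed.
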
## 🚀 Intended Use

* **Automated Labeling:** Automatically assign relevant topic tags to new scientific publication abstracts.
* **Research Triage:** Quickly filter papers by subject matter and by the perceived 'success' or 'novelty' signaled by the abstract's sentiment.
* **Scientific Landscape Mapping:** Analyze large corpora of papers to track emerging positive/negative trends in specific research areas.
* **Indexing Systems:** Integration into library or repository indexing services.

## ⚠️ Limitations

* **Topic Granularity:** The model is limited to the 10 predefined topics in its training set. It may perform poorly on highly niche or interdisciplinary topics outside this scope.
* **Sentiment Scope:** The sentiment is coarse-grained (high vs. low), derived from the abstract's language (e.g., words like "novel," "significant," "limitations," "challenges"). It does not capture nuanced, human-level emotional sentiment.
* **Language:** Trained exclusively on English abstracts.
* **Max Length:** Input texts longer than 512 tokens are truncated.

## 💻 Example Code

To use the model for prediction:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "your-username/SciTopicSentimentClassifier"  # Replace with actual HuggingFace path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Sample abstract
abstract = "We propose a novel architecture combining convolutional and recurrent neural networks for multi-modal data fusion, demonstrating significant performance gains in complex classification tasks, overcoming prior limitations."

# Preprocess the input (truncated to the 512-token limit)
inputs = tokenizer(abstract, return_tensors="pt", truncation=True, padding=True)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits

# Apply a sigmoid for independent multi-label scores
probs = torch.sigmoid(logits)

# Keep every label whose probability exceeds the 0.5 threshold
labels = model.config.id2label
predictions = []
for i, prob in enumerate(probs[0]):
    if prob > 0.5:
        predictions.append(labels[i])

print(f"Abstract: {abstract[:80]}...")
print(f"Predicted Labels: {predictions}")
# Expected Output: ['Deep Learning/AI', 'High-Positive-Sentiment']
```
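
Note that the 0.5 decision threshold above is a common default for multi-label heads; it can be tuned per label on a validation set to trade precision against recall.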