Shoriful025 committed 3d3f450 (verified; parent 1a88f1a): Create README.md

---
tags:
- text-classification
- scientific-abstract
- multi-label
- sentiment-analysis
- distilbert
datasets:
- SciTopicSentimentDataset
license: apache-2.0
---

# SciTopicSentimentClassifier

## 🔬 Overview

SciTopicSentimentClassifier is a **multi-label classification** model fine-tuned to predict, from a research paper's abstract, both its **scientific topic(s)** and its **overall sentiment** (high-positive or low-negative). It is intended for automated paper categorization, literature-review triage, and scientific trend analysis.

The model was trained on the SciTopicSentimentDataset (a proprietary dataset similar to the generated Dataset 1), which pairs each abstract with one or more predefined scientific topics and a binary sentiment label obtained by binarizing an original continuous sentiment score.
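
Because the model has 12 outputs (10 topics + 2 sentiment classes, see Model Architecture), each training example needs a 12-dimensional target. The plain-Python sketch below shows one plausible encoding: a multi-hot vector over the topic slots followed by a one-hot pair for the binarized sentiment. The slot order, the `(low, high)` ordering of the sentiment pair, and the 0.5 cut-off for binarizing the continuous score are assumptions for illustration only; the model card does not specify them, and `build_target` is a hypothetical helper.

```python
from typing import List, Sequence

NUM_TOPICS = 10      # from the model card: 10 topic outputs
NUM_SENTIMENTS = 2   # from the model card: 2 sentiment outputs


def build_target(topic_ids: Sequence[int], sentiment_score: float,
                 threshold: float = 0.5) -> List[float]:
    """Build a 12-dim target: multi-hot topics, then a one-hot sentiment pair.

    ASSUMPTIONS (not stated in the model card): topic slots come first,
    sentiment slots are ordered (low-negative, high-positive), and
    `threshold` is a hypothetical cut-off for the continuous score.
    """
    target = [0.0] * (NUM_TOPICS + NUM_SENTIMENTS)
    for t in topic_ids:
        if not 0 <= t < NUM_TOPICS:
            raise ValueError(f"topic id {t} out of range")
        target[t] = 1.0  # an abstract may carry several topics
    # Binarize the continuous score into exactly one sentiment slot.
    is_high = sentiment_score >= threshold
    target[NUM_TOPICS + (1 if is_high else 0)] = 1.0
    return target


print(build_target([2, 7], sentiment_score=0.83))
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

Targets of this shape pair naturally with a per-output sigmoid, since each slot is treated as an independent yes/no decision.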

## 🧠 Model Architecture

This model is an adaptation of **DistilBERT**, a smaller, faster, distilled version of BERT.

* **Base Model:** `distilbert-base-uncased`
* **Modification:** A custom classification head is added on top of the hidden state of the first (`[CLS]`) token, which DistilBERT uses as its pooled representation (it has no separate pooler layer).
* **Output Layer:** The final layer is a dense layer with **12 outputs** (10 for scientific topics + 2 for sentiment classes). The model returns raw logits; a sigmoid is applied to each output independently, so an abstract can be assigned several topics at once. Because the two sentiment outputs are also scored independently, both or neither may exceed a given threshold.
* **Input:** Tokenized abstract text (up to 512 tokens).
* **Task:** Multi-label text classification.
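
To make the output layer concrete, here is a plain-Python sketch of how the 12 logits could be decoded: each logit goes through its own sigmoid, topics above a threshold are kept, and the larger of the two sentiment scores is taken so that exactly one sentiment is reported. The index split (topics first, sentiment last), the 0.5 threshold, and the placeholder label names (`topic_0`…`topic_9`, `Low-Negative-Sentiment`) are assumptions; the real mapping lives in `model.config.id2label`. Forcing a single sentiment is one possible design choice, not something the model card prescribes.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode(logits: List[float], id2label: Dict[int, str],
           num_topics: int = 10, threshold: float = 0.5) -> Tuple[List[str], str]:
    """Turn 12 raw logits into (topic labels, single sentiment label).

    ASSUMPTION: indices 0..num_topics-1 are topics and the rest are the two
    sentiment classes; check config.id2label for the actual order.
    """
    probs = [sigmoid(x) for x in logits]  # independent per-label scores
    topics = [id2label[i] for i in range(num_topics) if probs[i] > threshold]
    # Both sentiment sigmoids are independent, so both (or neither) could
    # clear the threshold; taking the larger one yields exactly one answer.
    s_idx = max(range(num_topics, len(probs)), key=lambda i: probs[i])
    return topics, id2label[s_idx]


# Hypothetical label mapping and made-up logits, for illustration only.
labels = {i: f"topic_{i}" for i in range(10)}
labels.update({10: "Low-Negative-Sentiment", 11: "High-Positive-Sentiment"})
logits = [-3.0, 2.1, -1.0, 0.4, -2.5, -4.0, -0.2, -3.3, -1.7, -2.2, -0.8, 1.5]
print(decode(logits, labels))
# (['topic_1', 'topic_3'], 'High-Positive-Sentiment')
```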

## 🚀 Intended Use

* **Automated Labeling:** Automatically assign relevant topic tags to the abstracts of new scientific publications.
* **Research Triage:** Quickly filter papers by subject matter and by the perceived "success" or "novelty" signaled by the abstract's sentiment.
* **Scientific Landscape Mapping:** Analyze large corpora of papers to track emerging positive or negative trends in specific research areas.
* **Indexing Systems:** Integrate automatic topic tagging into library or repository indexing services.
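
A triage filter of the kind listed above can operate directly on the label lists the model produces. This sketch assumes each paper is a dict with hypothetical `title` and `labels` keys, where `labels` holds the thresholded predictions computed in Example Code; `Deep Learning/AI` and `High-Positive-Sentiment` are the label names shown in that example's output, and `triage` is a hypothetical helper.

```python
from typing import Dict, Iterable, List


def triage(papers: Iterable[Dict], topic: str,
           sentiment: str = "High-Positive-Sentiment") -> List[str]:
    """Return titles of papers predicted to have both `topic` and `sentiment`.

    Each paper is assumed to be {"title": str, "labels": List[str]}.
    """
    return [p["title"] for p in papers
            if topic in p["labels"] and sentiment in p["labels"]]


papers = [
    {"title": "Paper A", "labels": ["Deep Learning/AI", "High-Positive-Sentiment"]},
    {"title": "Paper B", "labels": ["Deep Learning/AI"]},          # no sentiment cleared 0.5
    {"title": "Paper C", "labels": ["High-Positive-Sentiment"]},   # no topic cleared 0.5
]
print(triage(papers, "Deep Learning/AI"))  # ['Paper A']
```

Note that with plain 0.5 thresholding a paper can end up with no sentiment label at all (Paper B), so a triage pipeline may want to decide explicitly how to treat such cases.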

## ⚠️ Limitations

* **Topic Granularity:** The model can only predict the 10 predefined topics in its training set. It may perform poorly on highly niche or interdisciplinary topics outside this scope.
* **Sentiment Scope:** Sentiment is coarse-grained (high vs. low) and reflects a metric derived from the abstract's language (e.g., words such as "novel," "significant," "limitations," or "challenges"). It does not capture nuanced, human-level emotional sentiment.
* **Language:** Trained exclusively on English abstracts.
* **Max Length:** Input texts longer than 512 tokens are truncated.
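
Truncation discards everything after the first 512 tokens. One common workaround, not described or evaluated in this model card, is to split a long input into overlapping windows, score each window, and combine the per-label probabilities (here by taking the maximum per label). The sketch below works on plain token-ID lists; `windows` and `aggregate_max` are hypothetical helpers, the 128-token overlap is an arbitrary choice, and the default window of 510 assumes the caller adds `[CLS]` and `[SEP]` to each window. Hugging Face fast tokenizers can produce similar windows via `return_overflowing_tokens=True` together with `stride`.

```python
from typing import List, Sequence


def windows(token_ids: Sequence[int], max_len: int = 510,
            stride: int = 128) -> List[List[int]]:
    """Split token IDs into overlapping windows of at most `max_len` tokens.

    Consecutive windows share `stride` tokens. 510 leaves room for the two
    special tokens within DistilBERT's 512-token limit (assumption).
    """
    if max_len <= stride:
        raise ValueError("max_len must be larger than stride")
    if not token_ids:
        return []
    step = max_len - stride
    # A window starting at or after len - stride would lie entirely inside
    # the previous window's overlap, so the range stops before that point.
    stop = max(len(token_ids) - stride, 1)
    return [list(token_ids[s:s + max_len]) for s in range(0, stop, step)]


def aggregate_max(window_probs: List[List[float]]) -> List[float]:
    """Combine per-window label probabilities by keeping each label's maximum."""
    return [max(column) for column in zip(*window_probs)]


# Usage with fake token IDs and made-up per-window probabilities.
chunks = windows(list(range(1000)))
print([len(c) for c in chunks])                 # [510, 510, 236]
print(aggregate_max([[0.1, 0.9], [0.7, 0.2]]))  # [0.7, 0.9]
```

Max-pooling favors recall (a label present anywhere in the text is kept); averaging is a more conservative alternative.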

## 💻 Example Code

To run multi-label prediction on a single abstract:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer
model_name = "your-username/SciTopicSentimentClassifier"  # Replace with the actual Hugging Face Hub path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout (from_pretrained already does this; kept explicit)

# Sample abstract
abstract = "We propose a novel architecture combining convolutional and recurrent neural networks for multi-modal data fusion, demonstrating significant performance gains in complex classification tasks, overcoming prior limitations."

# Tokenize; anything beyond 512 tokens is truncated
inputs = tokenizer(abstract, return_tensors="pt", truncation=True, max_length=512)

# Run inference without tracking gradients
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, 12)

# Apply an independent sigmoid to each logit to get multi-label probabilities
probs = torch.sigmoid(logits)[0]

# Keep every label whose probability exceeds the threshold
threshold = 0.5
id2label = model.config.id2label
predictions = [id2label[i] for i, p in enumerate(probs.tolist()) if p > threshold]

print(f"Abstract: {abstract[:80]}...")
print(f"Predicted labels: {predictions}")
# Illustrative output: ['Deep Learning/AI', 'High-Positive-Sentiment']
```