- LLM Safety From Within: Detecting Harmful Content with Internal Representations (arXiv:2604.18519)
- Adam's Law: Textual Frequency Law on Large Language Models (arXiv:2604.02176)