SomosNLP

non-profit

https://somosnlp.org/

SomosNLP_

somosnlp

Activity Feed

AI & ML interests

Democratize NLP in Spanish and encourage its application to create social impact 💛

Recent Activity

haritzpuerto authored a paper 10 days ago

Controllable Reasoning Models Are Private Thinkers

suchirsalhan authored a paper 15 days ago

BabyLM Turns 4: Call for Papers for the 2026 BabyLM Workshop

mariagrandury authored a paper 22 days ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data


reddrex

updated a dataset about 2 months ago

somosnlp/LingComp_QA

Viewer • Updated Jan 15 • 1k • 109 • 1

mariagrandury

updated a dataset 6 months ago

somosnlp/recursos-pln-es

Viewer • Updated Sep 18, 2025 • 183 • 73 • 1

mariagrandury

published a dataset 6 months ago

somosnlp/recursos-pln-es

Viewer • Updated Sep 18, 2025 • 183 • 73 • 1

mariagrandury

updated a dataset 6 months ago

somosnlp/recursos-pln-es-models

Viewer • Updated Sep 16, 2025 • 22 • 21

mariagrandury

published a dataset 6 months ago

somosnlp/recursos-pln-es-models

Viewer • Updated Sep 16, 2025 • 22 • 21

mariagrandury

updated a Space 7 months ago

Leaderboard Retos Hackathon SomosNLP 2025

🏆

Leaderboard Retos Hackathon SomosNLP 2025

mariagrandury

published a dataset 9 months ago

somosnlp/babylm-es

Updated Jun 19, 2025 • 14

dvilasuero

posted an update 9 months ago

Post

3360

Super excited to launch Hugging Face Sheets: Spreadsheets meet AI and unstructured data.

A few months ago, we started imagining new ways to build and transform datasets with the latest open-source models.

Today, I'm thrilled to introduce our first step in this direction.

In a nutshell:

📁 Effortlessly run prompts and models over your data.
🌐 Agentic search for accuracy and real-time information.
🖼️ Familiar, minimalistic interface for interacting with data.
🎯 Human feedback 2.0: Your input directly improves generated data.
💯 Access hundreds of open models and leading inference providers.

Go to this space to try it out!

aisheets/sheets

Leave your questions below, we're just getting started!

3 replies

dianags

authored 2 papers 9 months ago

Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset

Paper • 2503.23899 • Published Mar 31, 2025 • 1

Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction

Paper • 2303.14342 • Published Mar 25, 2023

reddrex

in somosnlp/LingComp_QA 10 months ago

How to use the dataset to train my GPT model

#1 opened 10 months ago by

luisaarias

ouhenio

updated a Space 11 months ago

Mapa Blend-es

🌍

Check out the collective progress of blend-es 😊

plaguss

authored a paper about 1 year ago

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

Paper • 2502.02737 • Published Feb 4, 2025 • 256

mariagrandury

updated a collection about 1 year ago

Corpus: Evaluation datasets for ES & LATAM

Collection

Corpus of La Leaderboard, the open LLM leaderboard for ES & LATAM • 56 items • Updated Feb 5, 2025 • 4

tadeodonegana

posted an update about 1 year ago

Post

1229

At RooMix(dot)ai we’re looking for an expert in generative image models for a short consulting gig. Any recommendations?

1 reply

mariagrandury

updated 2 collections about 1 year ago

Pre-trained LMs ES

Collection

Monolingual language models pre-trained on Spanish and related languages. • 21 items • Updated Feb 4, 2025 • 6

Instruction-Tuned Models ES

Collection

Instruction-tuned models in Spanish and other related languages • 8 items • Updated Feb 4, 2025 • 4

dvilasuero

authored a paper over 1 year ago

Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation

Paper • 2412.03304 • Published Dec 4, 2024 • 19

dvilasuero

posted an update over 1 year ago

Post

2794

🌐 Announcing Global-MMLU: an improved open MMLU dataset with evaluation coverage across 42 languages, built with Argilla and the Hugging Face community.

Global-MMLU is the result of months of work with the goal of advancing Multilingual LLM evaluation. It's been an amazing open science effort with collaborators from Cohere For AI, Mila - Quebec Artificial Intelligence Institute, EPFL, Massachusetts Institute of Technology, AI Singapore, National University of Singapore, KAIST, Instituto Superior Técnico, Carnegie Mellon University, CONICET, and University of Buenos Aires.

🏷️ 200+ contributors used Argilla to annotate MMLU questions, flagging those where regional, dialect, or cultural knowledge was required to answer correctly. 85% of the questions required Western-centric knowledge!

Thanks to this annotation process, the open dataset contains two subsets:

1. 🗽 Culturally Agnostic: no specific regional, cultural knowledge is required.
2. ⚖️ Culturally Sensitive: requires dialect, cultural knowledge or geographic knowledge to answer correctly.

Moreover, we provide high-quality translations for 25 of the 42 languages, thanks again to the community and to professional annotators using Argilla on the Hub.

I hope this leads to a better understanding of the limitations and challenges of making open AI useful across many languages.

Dataset: https://huggingface.co/datasets/CohereForAI/Global-MMLU
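The post describes two subsets, Culturally Agnostic and Culturally Sensitive. Below is a minimal sketch of how one might separate them after loading the dataset with the `datasets` library. It assumes each row carries a `cultural_sensitivity_label` field with the values `"CA"` or `"CS"`, and that the Hub repo exposes one configuration per language (for example `"es"`) with a `test` split. Treat both as assumptions to check against the dataset card. The helper names `split_by_sensitivity` and `load_subsets` are hypothetical.

```python
from typing import Iterable, Mapping


def split_by_sensitivity(rows: Iterable[Mapping]) -> dict[str, list[Mapping]]:
    """Partition Global-MMLU rows into culturally agnostic ("CA") and
    culturally sensitive ("CS") subsets.

    Assumption: each row has a "cultural_sensitivity_label" field. Rows
    whose label is missing or is neither "CA" nor "CS" (e.g. questions
    that were never annotated) are skipped.
    """
    subsets: dict[str, list[Mapping]] = {"CA": [], "CS": []}
    for row in rows:
        label = row.get("cultural_sensitivity_label")
        if label in subsets:
            subsets[label].append(row)
    return subsets


def load_subsets(lang: str = "es") -> dict[str, list[Mapping]]:
    """Hypothetical helper: load one language's test split from the Hub
    and split it. Requires `pip install datasets` and network access."""
    # Imported lazily so split_by_sensitivity works without `datasets`.
    from datasets import load_dataset

    test_split = load_dataset("CohereForAI/Global-MMLU", lang, split="test")
    # Iterating a datasets.Dataset yields one dict per row.
    return split_by_sensitivity(test_split)
```

If the Hub schema matches these assumptions, calling `load_subsets("es")` would return the Spanish test questions grouped under `"CA"` and `"CS"`. The partitioning is kept in a pure function so it can be tested or reused on rows from any source.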

dvilasuero

posted an update over 1 year ago

Post

1223

@Jesse-marqo and the Marqo team are killing it on the Hub: top embedding models and datasets!

Here's how to start using their new evaluation dataset for curation and labelling:

1. Deploy Argilla on Spaces: https://huggingface.co/new-space?template=argilla%2Fargilla-template-space
2. Load Marqo/amazon-products-eval with the UI wizard.
3. Start curating!

Team members 312