Data Distribution Adjustment

import pandas as pd
import datasets

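# Load the PubMedQA "pqa_artificial" subset from the Hugging Face Hub
data_source = "qiaojin/PubMedQA"
dataset = datasets.load_dataset(data_source, 'pqa_artificial', streaming=False)

# Convert the train split to a DataFrame and keep only rows labeled "yes" or "no"
train_data = dataset['train'].to_pandas()
binary_data = train_data[train_data["final_decision"].isin(["yes", "no"])]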

# Separate yes and no samples
yes_data = binary_data[binary_data["final_decision"] == "yes"]
no_data = binary_data[binary_data["final_decision"] == "no"]

# Get the size of the minority class
min_size = min(len(yes_data), len(no_data))

# Randomly sample from each class
yes_sampled = yes_data.sample(n=min_size, random_state=42)
no_sampled = no_data.sample(n=min_size, random_state=42)

# Combine into balanced dataset
balanced_data = pd.concat([yes_sampled, no_sampled])

# Shuffle the dataset
balanced_data = balanced_data.sample(frac=1, random_state=42).reset_index(drop=True)

New Label Distribution

final_decision
no     15125
yes    15125
Name: count, dtype: int64
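
The same downsampling steps can be wrapped in a small helper and checked on a toy frame without downloading the dataset. This is only a sketch: the function name `downsample_to_balance` and the toy rows are made up for illustration and are not part of the original preprocessing script. Calling `value_counts()` on the label column is one way to produce a per-class table like the one above.

```python
import pandas as pd


def downsample_to_balance(df: pd.DataFrame, label_col: str, seed: int = 42) -> pd.DataFrame:
    """Downsample every class in `label_col` to the size of the smallest class."""
    # Size of the minority class
    min_size = int(df[label_col].value_counts().min())
    # Draw the same number of rows from each class
    parts = [
        group.sample(n=min_size, random_state=seed)
        for _, group in df.groupby(label_col)
    ]
    # Shuffle so the classes are interleaved rather than stacked
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data (hypothetical): 5 "yes" rows and 2 "no" rows
toy = pd.DataFrame({
    "question": [f"q{i}" for i in range(7)],
    "final_decision": ["yes"] * 5 + ["no"] * 2,
})

balanced_toy = downsample_to_balance(toy, "final_decision")
counts = balanced_toy["final_decision"].value_counts()
print(counts)
```

Using `groupby` keeps the helper independent of the specific label values, so it would also cover a label column with more than two classes.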