JohanBeytell

metadata

license: mit
language:
  - en
metrics:
  - mae
  - r_squared
pipeline_tag: tabular-regression
tags:
  - regression
  - price-prediction

Model Card for Infinitode/IHPPM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

Model Description

OPEN-ARC-IHPP is a CatBoostRegressor model developed as part of Infinitode's OPEN-ARC initiative. It is designed to predict rental prices for houses and other properties in India from features such as house type, location, area, furnishing, and room counts.

Architecture:

CatBoostRegressor: iterations=2500, depth=10, learning_rate=0.045, loss_function="MAE", eval_metric="MAE", random_seed=42, verbose=200.
Framework: CatBoost
Training Setup: Trained for 2,500 iterations on the training portion of the 85/15 train/test split.
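The hyperparameters above map directly onto CatBoost's `CatBoostRegressor` constructor. The sketch below is a minimal, hedged illustration rather than the authors' training script: it assumes the four text columns from the How to Use example (`house_type`, `locality`, `city`, `furnishing`) are passed as categorical features, and that the target is `log1p(rent)`, which is inferred from the `np.expm1` call in that example. The helper name `train_rent_model` is hypothetical.

```python
import numpy as np

# Hyperparameters as listed in the Architecture section.
CATBOOST_PARAMS = {
    "iterations": 2500,
    "depth": 10,
    "learning_rate": 0.045,
    "loss_function": "MAE",
    "eval_metric": "MAE",
    "random_seed": 42,
    "verbose": 200,
}

# Assumption: these text columns are treated as categorical features.
CAT_FEATURES = ["house_type", "locality", "city", "furnishing"]


def train_rent_model(X_train, y_train_rent, X_valid=None, y_valid_rent=None):
    """Hypothetical helper: fit a CatBoostRegressor on log1p-transformed rent."""
    # Imported lazily so the config above can be inspected without CatBoost installed.
    from catboost import CatBoostRegressor

    model = CatBoostRegressor(**CATBOOST_PARAMS)
    eval_set = None
    if X_valid is not None and y_valid_rent is not None:
        # Validation targets get the same assumed log1p transform.
        eval_set = (X_valid, np.log1p(y_valid_rent))
    model.fit(
        X_train,
        np.log1p(y_train_rent),  # assumption: the model is trained on log1p(rent)
        cat_features=CAT_FEATURES,
        eval_set=eval_set,
    )
    return model
```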

Uses

Predicting rental prices for properties in India.
Validating or benchmarking existing rent listings.
Researching the factors that influence rental prices.
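For the validation use case, one simple approach is to compare a listed rent with the model's estimate and flag large gaps. The helper below is hypothetical, and its default tolerance simply reuses the reported test MAE (about 3.86k) as a rough yardstick; it is an assumption, not a calibrated threshold.

```python
def compare_listing(listed_rent: float, predicted_rent: float, tolerance: float = 3860.0) -> dict:
    """Hypothetical helper: compare a listed rent with a model estimate."""
    difference = listed_rent - predicted_rent
    return {
        "difference": difference,
        # Gap relative to the model estimate (NaN if the estimate is zero).
        "relative_difference": difference / predicted_rent if predicted_rent else float("nan"),
        # Flag listings whose absolute gap exceeds the assumed tolerance.
        "flagged": abs(difference) > tolerance,
    }
```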

Limitations

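May produce implausible predictions for inputs with extreme values or values outside the range seen in training (for example, very small areas), since such outliers were removed from the training data.
Predictions can be inaccurate (test MAE is about 3.86k); treat outputs as estimates rather than authoritative valuations.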
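Because outliers were removed before training, one hedged safeguard is to warn when a numeric input falls outside the range seen in the training data. The function name is hypothetical, and `raw_df` is assumed to be the same DataFrame that is passed to `predict_user_rent` in How to Use.

```python
import pandas as pd


def out_of_range_features(sample: dict, raw_df: pd.DataFrame,
                          columns=("area", "beds", "bathrooms", "balconies")) -> list:
    """Hypothetical check: return numeric features whose values fall outside the training range."""
    flagged = []
    for col in columns:
        low, high = raw_df[col].min(), raw_df[col].max()
        # Values outside [min, max] of the training data are more likely to give unreliable output.
        if not (low <= sample[col] <= high):
            flagged.append(col)
    return flagged
```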

Training Data

Dataset: India House Rent Prediction dataset from Kaggle.
Source URL: https://www.kaggle.com/datasets/pranavshinde36/india-house-rent-prediction
Content: House type, locality, city, area, furnishing status, and room details (beds, bathrooms, balconies), along with the target rent value.
Size: 7,691 property listings from India.
Preprocessing: Removed properties with very small areas, extreme rent outliers, and the area_rate column. Also added an "area bucket" feature (area_bucket) to improve performance.
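A minimal sketch of these preprocessing steps follows. The minimum-area cut-off and the rent quantile used to trim outliers are hypothetical values chosen for illustration, since the card does not state them; the column names `area`, `rent`, and `area_rate` are assumptions based on the dataset description. The area bin edges are copied from the How to Use example.

```python
import numpy as np
import pandas as pd

# Same bin edges as the How to Use example.
AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]


def preprocess(df: pd.DataFrame, min_area: float = 100.0, rent_quantile: float = 0.99) -> pd.DataFrame:
    """Sketch of the described preprocessing; both thresholds are illustrative assumptions."""
    out = df.copy()
    # Drop properties with a very small area (cut-off is an assumption).
    out = out[out["area"] >= min_area]
    # Trim extreme rent outliers above an assumed upper quantile.
    out = out[out["rent"] <= out["rent"].quantile(rent_quantile)]
    # Drop the area_rate column if present.
    out = out.drop(columns=["area_rate"], errors="ignore")
    # Add the area bucket feature as a 0-based bin index.
    out = out.assign(area_bucket=np.digitize(out["area"], AREA_BINS) - 1)
    return out.reset_index(drop=True)
```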

Training Procedure

Metrics: MAE, R-squared
Train/Test Split: 85% training, 15% testing.
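An 85/15 split can be reproduced with plain pandas, as sketched below. The card does not say which splitting utility or seed was used, so the seed of 42 (borrowed from the model's `random_seed`) and the helper name are assumptions. The sketch also assumes the DataFrame has a unique index.

```python
import pandas as pd


def split_train_test(df: pd.DataFrame, train_frac: float = 0.85, seed: int = 42):
    """Randomly split rows into train and test sets (assumed 85/15, shuffled)."""
    train = df.sample(frac=train_frac, random_state=seed)
    # Every row not sampled into the training set goes to the test set.
    test = df.drop(train.index)
    return train, test
```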

Evaluation Results

Metric	Value
Testing MAE	3.86k
Testing R-squared	0.9351
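For reference, the two reported metrics follow their standard definitions, sketched below with NumPy. The card does not state whether they were computed on raw rents or on the log-transformed target; an MAE of 3.86k suggests raw rent values, but that is an inference.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)
```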

How to Use

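import numpy as np
import pandas as pd


def predict_user_rent(model, raw_df):
    """Interactively collect feature values and print a rent estimate.

    model:  trained CatBoostRegressor (outputs log1p of rent)
    raw_df: training DataFrame; supplies the category options and dtypes
    """
    print("\n\n========== RENT PREDICTION ASSISTANT ==========\n")
    print("Choose values for each feature below. For categorical variables, pick a number.\n")

    sample = {}

    # Show a numbered menu of known categories and return the chosen value
    def choose_cat(col_name):
        unique_vals = sorted(raw_df[col_name].unique())
        print(f"\n--- {col_name} ---")
        for idx, val in enumerate(unique_vals):
            print(f"{idx + 1}. {val}")
        while True:
            try:
                sel = int(input("Enter your choice number: ")) - 1
            except ValueError:
                print("Please enter a number.")
                continue
            # Reject out-of-range choices (e.g. 0 would otherwise wrap to the last item)
            if 0 <= sel < len(unique_vals):
                return unique_vals[sel]
            print(f"Please enter a number between 1 and {len(unique_vals)}.")

    # Categorical features
    sample["house_type"] = choose_cat("house_type")
    sample["locality"] = choose_cat("locality")
    sample["city"] = choose_cat("city")
    sample["furnishing"] = choose_cat("furnishing")

    # Read a numeric value, re-prompting on invalid input
    def choose_num(col_name):
        while True:
            try:
                return float(input(f"\nEnter value for {col_name}: "))
            except ValueError:
                print("Please enter a numeric value.")

    # Numeric features
    sample["area"] = choose_num("area")
    sample["beds"] = choose_num("beds")
    sample["bathrooms"] = choose_num("bathrooms")
    sample["balconies"] = choose_num("balconies")

    # Area bucket: same bin edges as in training, as a 0-based bin index
    area_bins = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
    area_bucket = int(np.digitize([sample["area"]], area_bins)[0] - 1)
    sample["area_bucket"] = area_bucket

    # Placeholder for rent_psf_bucket: rent is unknown at prediction time,
    # so the area bucket serves as a rough proxy for price density (capped at 19)
    sample["rent_psf_bucket"] = min(area_bucket, 19)

    # Column order follows the insertion order above and must match training
    df_input = pd.DataFrame([sample])

    # Cast categorical columns to the same dtypes used in training
    for col in ["house_type", "locality", "city", "furnishing"]:
        df_input[col] = df_input[col].astype(raw_df[col].dtype)

    # Prediction: expm1 inverts the log1p transform applied to the rent target
    pred_log = model.predict(df_input)[0]
    pred_rent = float(np.expm1(pred_log))

    print("\n===================================")
    print(f"Estimated Rent: ₹ {pred_rent:,.2f}")
    print("===================================\n")

    return pred_rent


# Uncomment to use interactively, where `model` is the trained CatBoostRegressor
# and `df` is the preprocessed training DataFrame:
# predict_user_rent(model, df)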
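The interactive helper reads every value from standard input. For scripted or batch use, a non-interactive variant could build the same features from a dictionary, as sketched below. The function name `predict_rent` is hypothetical; it assumes the `sample` dictionary lists its keys in the same order as the interactive helper (`house_type`, `locality`, `city`, `furnishing`, `area`, `beds`, `bathrooms`, `balconies`) and that the model outputs `log1p(rent)`, as implied by the `np.expm1` call above.

```python
import numpy as np
import pandas as pd

AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
CAT_COLS = ["house_type", "locality", "city", "furnishing"]


def predict_rent(model, raw_df: pd.DataFrame, sample: dict) -> float:
    """Hypothetical non-interactive version of predict_user_rent."""
    row = dict(sample)  # copy so the caller's dict is not modified
    area_bucket = int(np.digitize([row["area"]], AREA_BINS)[0] - 1)
    row["area_bucket"] = area_bucket
    # Same placeholder heuristic as the interactive helper (rent is unknown here).
    row["rent_psf_bucket"] = min(area_bucket, 19)
    df_input = pd.DataFrame([row])
    # Match the categorical dtypes used in training.
    for col in CAT_COLS:
        df_input[col] = df_input[col].astype(raw_df[col].dtype)
    # Assumed log1p target: expm1 converts the prediction back to a rent value.
    return float(np.expm1(model.predict(df_input)[0]))
```

Usage would look like `predict_rent(model, df, {"house_type": ..., "locality": ..., "city": ..., "furnishing": ..., "area": 850.0, "beds": 2.0, "bathrooms": 2.0, "balconies": 1.0})`, with the categorical values taken from the training data.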

Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.