---
license: mit
language:
- en
metrics:
- mae
- r_squared
pipeline_tag: tabular-regression
tags:
- regression
- price-prediction
---

# Model Card for Infinitode/IHPPM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-IHPP is a CatBoostRegressor model developed as part of Infinitode's OPEN-ARC initiative. It predicts rental prices for houses and properties in India from features such as house type, locality, city, area, furnishing, and room counts.

**Architecture**:

- **CatBoostRegressor**: `iterations=2500`, `depth=10`, `learning_rate=0.045`, `loss_function="MAE"`, `eval_metric="MAE"`, `random_seed=42`, `verbose=200`.
- **Framework**: CatBoost
- **Training Setup**: Trained with 2500 iterations on the dataset split.

## Uses

- Predicting rental price points for properties in India.
- Validating or benchmarking existing price points for properties.
- Researching property value and the factors that influence price.

## Limitations

- May generate implausible or inappropriate results when inputs contain extreme outlier values.
- Could provide inaccurate prices; caution is advised when relying on these outputs.

## Training Data

- Dataset: India House Rent Prediction dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/pranavshinde36/india-house-rent-prediction
- Content: House type, locality, city, area, furnishing, and room specifics, along with the target rent value.
- Size: 7691 entries of properties in India.
- Preprocessing: Removed properties with very small areas and extreme rent outliers, and dropped `area_rate`. Also created "area buckets" for better performance.

## Training Procedure

- Metrics: MAE, R-squared
- Train/Test Split: 85% train, 15% test.

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Testing MAE | 3.86k |
| Testing R-squared | 0.9351 |

## How to Use

```python
import numpy as np
import pandas as pd


def predict_user_rent(model, raw_df):
    """Interactively collect feature values and print a rent estimate.

    `model` is the trained CatBoostRegressor and `raw_df` is the
    preprocessed training DataFrame (used for category menus and dtypes).
    """
    print("\n\n========== RENT PREDICTION ASSISTANT ==========\n")
    print("Choose values for each feature below. For categorical vars, pick a number.\n")

    sample = {}

    # Menu: list the known categories and return the one the user picks
    def choose_cat(col_name):
        unique_vals = sorted(raw_df[col_name].unique())
        print(f"\n--- {col_name} ---")
        for idx, val in enumerate(unique_vals):
            print(f"{idx + 1}. {val}")
        sel = int(input("Enter your choice number: ")) - 1
        return unique_vals[sel]

    # Categorical
    sample["house_type"] = choose_cat("house_type")
    sample["locality"] = choose_cat("locality")
    sample["city"] = choose_cat("city")
    sample["furnishing"] = choose_cat("furnishing")

    # Numeric values
    def choose_num(col_name):
        return float(input(f"\nEnter value for {col_name}: "))

    sample["area"] = choose_num("area")
    sample["beds"] = choose_num("beds")
    sample["bathrooms"] = choose_num("bathrooms")
    sample["balconies"] = choose_num("balconies")

    # area bucket
    area_val = sample["area"]
    area_bins = [0, 300, 600, 900, 1200, 2000, 5000, 100000]
    area_bucket = np.digitize([area_val], area_bins)[0] - 1
    sample["area_bucket"] = area_bucket

    # placeholder for rent_psf bucket (we don't know rent yet)
    # so we use area only as a proxy for typical price density
    sample["rent_psf_bucket"] = min(int(area_bucket), 19)

    df_input = pd.DataFrame([sample])

    # Must match training encodings
    for col in ["house_type", "locality", "city", "furnishing"]:
        df_input[col] = df_input[col].astype(raw_df[col].dtype)

    # Prediction: the model outputs a log-scale value, so invert it with expm1
    pred_log = model.predict(df_input)[0]
    pred_rent = np.expm1(pred_log)

    print("\n===================================")
    print(f"Estimated Rent: ₹ {pred_rent:,.2f}")
    print("===================================\n")

    return pred_rent


# Uncomment to use interactively:
# predict_user_rent(model, df)
```

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.
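
## Area Bucketing Sketch

The How to Use snippet derives `area_bucket` with `np.digitize` and converts the model output back to rent with `np.expm1`. The block below is a minimal, dependency-free sketch of those two steps, useful for checking which bucket a given area falls into. The helper names `area_bucket` and `rent_from_log` are hypothetical, and the assumption that the target was trained as `log1p(rent)` is inferred from the `np.expm1` call rather than confirmed by the repository.

```python
import bisect
import math

# Bin edges copied from the How to Use snippet.
AREA_BINS = [0, 300, 600, 900, 1200, 2000, 5000, 100000]


def area_bucket(area):
    """Return the zero-based bucket index for an area value.

    For increasing bin edges, bisect_right(bins, x) - 1 yields the same
    index as np.digitize([x], bins)[0] - 1 (left-inclusive bins).
    """
    return bisect.bisect_right(AREA_BINS, area) - 1


def rent_from_log(pred_log):
    """Invert a log1p-scaled prediction (assumed target transform)."""
    return math.expm1(pred_log)


print(area_bucket(850))   # an area between the 600 and 900 edges
print(area_bucket(1200))  # an area exactly on a bin edge
print(round(rent_from_log(math.log1p(25000))))  # log1p/expm1 round trip
```

Because the bins are left-inclusive, an area that sits exactly on an edge (for example 1200) is assigned to the bucket that starts at that edge, not the one that ends there.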