Optimizing Random Forest Performance for Regional Gender Classification via Bayesian Optimization

Authors

  • Blerina Boçi, Department of Mathematics, Faculty of Information Technology, University “Aleksandër Moisiu”, Durrës, Albania
  • Aurora Simoni, Department of Applied Mathematics, Faculty of Natural Science, University of Tirana, Albania

DOI:

https://doi.org/10.56345/ijrdv12n319

Keywords:

Gender dominance, class imbalance, classification, random forest classifier, Bayesian Optimization, SMOTENC

Abstract

This study investigates the application of Bayesian Optimization (BO) to enhance the performance of a random forest classifier (RFC) for gender-based regional classification in Albania. The model was applied to both the original demographic dataset and an extended simulated dataset that preserved the same structural relationships among demographic variables. The datasets included prefecture-level demographic indicators such as total population, male and female counts, year, and area. A unified machine learning pipeline was implemented, incorporating feature standardization, label encoding, and class imbalance correction through the SMOTENC technique embedded within cross-validation folds. Bayesian Optimization with BayesSearchCV was applied to automatically select the optimal hyperparameters of the random forest classifier, optimizing for the F1-macro score. The performance of the optimized RFC was compared with the default configuration using multiple evaluation metrics, including accuracy, balanced accuracy, F1-macro, precision, recall, and ROC-AUC, all supported by 95% bootstrap confidence intervals. The optimized model consistently outperformed the baseline on both datasets, achieving higher predictive accuracy and more balanced classification performance. Results from the original dataset demonstrated that the BO-tuned model provided significantly improved accuracy and sensitivity across classes, while the simulated dataset yielded even stronger generalization with improved stability and narrower confidence intervals. The comparison between real and simulated data confirmed that Bayesian Optimization effectively adapts model complexity to different data scales while preserving interpretability. These findings highlight the potential of BO as a reliable and automated hyperparameter tuning strategy for improving machine learning (ML) models in demographic and socio-economic studies, particularly when datasets are small, imbalanced, or limited in scope.
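The abstract reports F1-macro together with 95% bootstrap confidence intervals but does not say which bootstrap variant was used. As a rough illustration only, the sketch below assumes a percentile bootstrap over resampled (true label, predicted label) pairs and uses only the Python standard library. The function names `f1_macro` and `bootstrap_ci`, the toy labels "F" and "M", and the settings `n_boot=2000` and `seed=0` are hypothetical and are not taken from the study.

```python
import random
from statistics import mean


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over the labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return mean(scores)


def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed variant) for a metric of paired labels."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample observation indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Hypothetical toy labels, not data from the study.
y_true = ["F", "F", "F", "M", "M", "M", "M", "M"]
y_pred = ["F", "F", "M", "M", "M", "M", "M", "F"]

point = f1_macro(y_true, y_pred)
lo, hi = bootstrap_ci(y_true, y_pred, f1_macro)
print(f"F1-macro = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

With so few observations the interval is wide, which is consistent with the abstract's remark that the larger simulated dataset gave narrower confidence intervals. In practice the same resampling would be applied to held-out predictions from the baseline and the tuned model.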

Published

2025-11-26

How to Cite

Boçi, B., & Simoni, A. (2025). Optimizing Random Forest Performance for Regional Gender Classification via Bayesian Optimization. Interdisciplinary Journal of Research and Development, 12(3), 164. https://doi.org/10.56345/ijrdv12n319
