Optimizing Random Forest Performance for Regional Gender Classification via Bayesian Optimization
DOI:
https://doi.org/10.56345/ijrdv12n319
Keywords:
Gender dominance, class imbalance, classification, random forest classifier, Bayesian Optimization, SMOTENC
Abstract
This study investigates the application of Bayesian Optimization (BO) to enhance the performance of a random forest classifier (RFC) for gender-based regional classification in Albania. The model was applied to both the original demographic dataset and an extended simulated dataset that preserved the same structural relationships among demographic variables. The datasets included prefecture-level demographic indicators such as total population, male and female counts, year, and area. A unified machine learning pipeline was implemented, incorporating feature standardization, label encoding, and class imbalance correction through the SMOTENC technique embedded within cross-validation folds. Bayesian Optimization with BayesSearchCV was applied to automatically select the optimal hyperparameters of the random forest classifier, optimizing for the F1-macro score. The performance of the optimized RFC was compared with the default configuration using multiple evaluation metrics, including accuracy, balanced accuracy, F1-macro, precision, recall, and ROC-AUC, all supported by 95% bootstrap confidence intervals. The optimized model consistently outperformed the baseline on both datasets, achieving higher predictive accuracy and more balanced classification performance. Results from the original dataset demonstrated that the BO-tuned model provided significantly improved accuracy and sensitivity across classes, while the simulated dataset yielded even stronger generalization with improved stability and narrower confidence intervals. The comparison between real and simulated data confirmed that Bayesian Optimization effectively adapts model complexity to different data scales while preserving interpretability. These findings highlight the potential of BO as a reliable and automated hyperparameter tuning strategy for improving machine learning (ML) models in demographic and socio-economic studies, particularly when datasets are small, imbalanced, or limited in scope.
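The abstract reports every metric with a 95% bootstrap confidence interval but does not say how those intervals were computed. As a minimal illustration only, the sketch below assumes a percentile bootstrap over paired (true label, predicted label) observations and applies it to the F1-macro score. The function names `macro_f1` and `bootstrap_ci`, the number of resamples, the seed, and the toy labels "M" and "F" are all hypothetical and are not taken from the study.

```python
import random


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the labels seen in this sample."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def bootstrap_ci(y_true, y_pred, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (assumed method, not the paper's stated one)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        # Resample observation indices with replacement, keeping pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy labels for illustration only (not data from the study).
y_true = ["M", "M", "F", "F", "M", "F", "F", "M", "F", "F"]
y_pred = ["M", "F", "F", "F", "M", "F", "M", "M", "F", "F"]

point = macro_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, macro_f1)
print(f"F1-macro = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With only a handful of observations, as in this toy example, the interval is wide. This is consistent with the abstract's remark that the larger simulated dataset produced narrower confidence intervals.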
License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.