Abstract:
It is essential to use machine learning for predicting products backorder to deal with massive data of SKU. Naturally, real world data is usually imbalanced data which is affect to the efficiency of machine learning. Mistaken predicting products backorder negatively affects customers service level and decrease 10 percent of their revenue. This research has studied adjusting data by Threshold Moving and sampling methods for creating efficient model and high forecast proficiency in minority class model. There are 4 methods for adjusting data including NearMiss-3 for undersampling dataset, One-Sided Selection (OSS) for undersampling dataset, SMOTE for oversampling dataset, and combining OSS and SMOTE dataset. LOGIST, FOREST and XGBoost are used as algorithms and Stratified 5-Fold Cross-Validation is used to prevent overfitting. In this research, AUROC, F1 score and G-Mean are used as the efficiency measurements. The result obtained from this research study is Threshold Moving with the G-Mean metric gives more weight to the minority data group compared to F1 score and provides better results than AUROC. The most effective method is using Threshold Moving with G-Mean metric for the Forest algorithm, achieving an approximate value of 0.8737.