multicollinearity by penalizing large coefficients (Ridge, 2014). The Lasso classifier, in contrast, uses L1 regularization to perform feature selection by shrinking some coefficients to exactly zero, which aids in handling outliers and improves model interpretability (Lasso, n.d.). These models enhance the baseline Logistic Regression by mitigating overfitting and addressing the complexities of the data more effectively. XGBoost was included for its powerful gradient-boosting capabilities, which capture complex patterns and interactions within the data and offer robust performance (XGBoost Documentation — Xgboost 2.1.0 Documentation, 2022). SVM with the kernel trick was selected for its ability to handle non-linear relationships by transforming the input data into a higher-dimensional space, allowing it to separate classes that are not linearly separable in the original feature space (1.4. Support Vector Machines, n.d.). The Bagging Classifier was chosen for its ability to improve predictive performance by bootstrapping the training data and aggregating the predictions of multiple models (BaggingClassifier, n.d.). This ensemble method reduces variance and improves generalization, providing a robust alternative to single-model approaches. The neural network model was employed to capture complex, nonlinear relationships through its layered architecture, enabling it to model intricate patterns in the data (1.17. Neural Network Models (Supervised), n.d.).
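To make this first group of candidates concrete, the following is a minimal sketch of how they might be instantiated with scikit-learn and the xgboost package. The hyperparameter values shown are illustrative assumptions, not the settings used in this study.

```python
# Minimal sketch: instantiating the first group of candidate models.
# All hyperparameter values are illustrative assumptions.
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.neural_network import MLPClassifier
from xgboost import XGBClassifier

models = {
    # L2 (ridge-style) penalty shrinks large coefficients,
    # mitigating multicollinearity.
    "ridge_logreg": LogisticRegression(penalty="l2", C=1.0, max_iter=1000),
    # L1 (lasso-style) penalty drives some coefficients to exactly
    # zero, performing embedded feature selection.
    "lasso_logreg": LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
    # Gradient boosting captures complex patterns and interactions.
    "xgboost": XGBClassifier(n_estimators=200, learning_rate=0.1,
                             eval_metric="logloss"),
    # RBF kernel implicitly maps inputs to a higher-dimensional space
    # where non-linearly-separable classes can be separated.
    "svm_rbf": SVC(kernel="rbf", C=1.0, gamma="scale", probability=True),
    # Bagging bootstraps the training data and aggregates many base
    # models to reduce variance.
    "bagging": BaggingClassifier(n_estimators=100, random_state=42),
    # Feed-forward network with hidden layers for nonlinear patterns.
    "mlp": MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                         random_state=42),
}
```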
AdaBoost, an ensemble technique that combines weak learners into a strong classifier, was also tested. AdaBoost focuses on correcting the errors of previous models, improving overall performance by increasing the weights of incorrectly classified instances (AdaBoostClassifier, n.d.). Quadratic discriminant analysis (QDA) was included for its probabilistic approach to classification, which assumes a different covariance structure for each class (QuadraticDiscriminantAnalysis, n.d.); this can be beneficial for distinguishing between classes with varying distributions. Stochastic Gradient Descent (SGD) was selected for its scalability and efficiency in large-scale optimization problems, providing a flexible approach to high-dimensional data (SGDClassifier, n.d.). Together, these models offer a comprehensive approach to detecting potential fraud, waste, and abuse within the healthcare data, each contributing unique advantages to the overall analysis.
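A companion sketch for these remaining three candidates, extending the `models` dictionary from the sketch above, again with assumed hyperparameters for illustration:

```python
# Minimal sketch (continued): the remaining candidate models, with
# illustrative hyperparameters rather than the study's actual settings.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.linear_model import SGDClassifier

models.update({
    # AdaBoost reweights misclassified instances so each successive
    # weak learner focuses on the errors of its predecessors.
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=42),
    # QDA fits a separate covariance matrix per class, yielding
    # quadratic decision boundaries between classes with differing
    # distributions.
    "qda": QuadraticDiscriminantAnalysis(),
    # SGD optimizes a linear model one sample (or mini-batch) at a
    # time, scaling to large, high-dimensional datasets.
    "sgd": SGDClassifier(loss="log_loss", max_iter=1000, random_state=42),
})
```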
4.4.3 Test design: Training and Validation Datasets
To ensure a robust evaluation of the models, the dataset was split into training, validation, and test sets. The process was conducted in multiple steps to preserve the class distribution of the target variable, "potential_fwa."
1. Initial split into training and temporary sets: The dataset was first divided into 70% training data and a 30% temporary set.
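The sketch below illustrates this stratified splitting procedure with scikit-learn's train_test_split. The file name is hypothetical, and the 50/50 division of the temporary set into validation and test sets is an assumed ratio, since only the 70% training fraction is specified above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical input file, standing in for the prepared analytic dataset.
df = pd.read_csv("healthcare_claims.csv")

X = df.drop(columns=["potential_fwa"])
y = df["potential_fwa"]

# Step 1: 70% training / 30% temporary, stratified to preserve the
# class distribution of "potential_fwa".
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# Step 2 (assumed 50/50 ratio): divide the temporary set into
# validation and test sets, again stratified on the target.
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.50, stratify=y_temp, random_state=42
)
```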