For the KNN recommender, the similarity between two users u and v is measured with the Pearson correlation over the items they have rated:

sim(u, v) = \frac{\sum_i (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_i (r_{u,i} - \bar{r}_u)^2}\,\sqrt{\sum_i (r_{v,i} - \bar{r}_v)^2}}    (3)

where r_{u,i} is the rating user u gives item i and \bar{r}_u is user u's mean rating. Evaluation metrics for KNN include precision@k and recall@k, which measure recommendation relevance, and hit rate, which measures the proportion of hits among the recommended items.
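As a concrete illustration, the sketch below computes these metrics in Python, assuming each user has a ranked list of recommended item IDs and a held-out set of relevant items; the data and function names are hypothetical, and hit rate is implemented under the common per-user definition (at least one relevant item in the top k).

def precision_at_k(recommended, relevant, k):
    # Fraction of the top-k recommended items that are relevant.
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

def recall_at_k(recommended, relevant, k):
    # Fraction of all relevant items that appear in the top-k.
    if not relevant:
        return 0.0
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / len(relevant)

def hit_rate(recommendations, relevants, k):
    # Proportion of users with at least one relevant item in their top-k.
    hits = sum(
        any(item in relevant for item in recommended[:k])
        for recommended, relevant in zip(recommendations, relevants)
    )
    return hits / len(recommendations)

# One user: top-5 recommendations scored against held-out relevant items.
recs = ["m1", "m7", "m3", "m9", "m4"]
rel = {"m3", "m4", "m8"}
print(precision_at_k(recs, rel, k=5))  # 2 hits / 5 recommended = 0.40
print(recall_at_k(recs, rel, k=5))     # 2 hits / 3 relevant items ≈ 0.67
print(hit_rate([recs], [rel], k=5))    # 1.0 (the single user has a hit)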
4.4.2 Selection of modeling techniques for customer segmentation

Among the candidate prediction models, four were used: Random Forest, Decision Tree, XGBoost, and a Neural Network.

Random forest is a powerful and versatile machine learning algorithm that can serve as a classifier, a regression model, or a multiclass classifier. It operates by constructing many decision trees during training and outputting the mode of the classes predicted by the individual trees for each input (Louppe et al., 2014). As an ensemble learning method, it combines the predictions of multiple base learners (decision trees) to improve overall performance. It also uses bagging: multiple subsets of the training data are drawn with replacement, and each tree is trained on a different subset. When splitting a node, random forest considers a random subset of the features rather than all of them. The final prediction is

\hat{y} = \operatorname{mode}\{h_1(x), h_2(x), \ldots, h_n(x)\}    (4)

where h_i(x) is the prediction of the i-th decision tree in the forest and mode denotes the most frequent class among all the trees' predictions.
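A minimal sketch of equation (4) using scikit-learn follows; the synthetic dataset and hyperparameters are illustrative assumptions, not the study's actual configuration. Note that scikit-learn's own predict averages class probabilities across trees (soft voting), so the explicit hard vote below can occasionally differ from it.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic multiclass data standing in for the customer segments.
X, y = make_classification(n_samples=500, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)

# Bagging plus a random feature subset at each split (max_features="sqrt").
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0).fit(X, y)

# Equation (4): each tree votes, and the most frequent class wins.
# Sub-estimators predict encoded class indices, mapped back via classes_.
tree_preds = np.array([tree.predict(X) for tree in forest.estimators_]).astype(int)
votes = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, tree_preds)
y_hat = forest.classes_[votes]

print("training accuracy of the hard vote:", np.mean(y_hat == y))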
A decision tree is a straightforward yet powerful machine learning algorithm and an effective multiclass classifier. Like random forest, it splits the data into subsets based on the values of the input features. At each node, it chooses the best feature to split on according to a specific criterion, such as Gini impurity or entropy (information gain); the splitting is recursive and stops when a stopping criterion is met (Farid et al., 2014). For prediction, each leaf is assigned a class label, determined by the majority class of the samples that reach that leaf. A decision tree is useful because it can capture non-linear patterns in the data, its hyperparameter tuning is easy to track, and the model is easy to interpret: it does not behave like a "black box," unlike models such as neural networks. The Gini criterion is

G_i = 1 - \sum_{k=1}^{K} p_{i,k}^2    (5)

where G_i is the Gini impurity at node i, K is the number of classes, and p_{i,k} is the proportion of samples of class k at node i. The splitting criterion aims to minimize the impurity at each node.
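The sketch below first evaluates equation (5) directly on small label sets and then lets scikit-learn grow a tree with the same Gini criterion; the toy data is hypothetical.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def gini_impurity(labels):
    # Equation (5): G = 1 - sum_k p_k**2 over the samples at a node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 1, 1]))  # 0.5: maximally mixed two-class node
print(gini_impurity([1, 1, 1, 1]))  # 0.0: pure node, nothing left to split

# The same criterion drives the recursive splits inside scikit-learn.
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
tree = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["x"]))  # human-readable split rules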
XGBoost is a highly efficient and flexible machine learning algorithm that can be used for multiclass classification. It is based on boosting: it combines the predictions of many decision trees to form a strong learner. Specifically, it uses gradient boosting, in which each subsequent model is trained to predict the residuals (errors) of the previous models; this minimizes the overall prediction error by focusing each new tree on the mistakes of its predecessors (Chen et al., 2016). Regularization terms inside XGBoost help prevent overfitting, and it runs efficiently on GPUs because it supports parallel processing to speed up training. For multiclass problems it uses a softmax objective function. The objective is

\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)    (6)

where l is the training loss between the true label y_i and the prediction \hat{y}_i, and \Omega(f_k) is the regularization term penalizing the complexity of the k-th tree.
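Finally, a sketch of a multiclass XGBoost model corresponding to equation (6); the hyperparameters and dataset are illustrative assumptions, not the study's actual settings.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier  # pip install xgboost

X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)

model = XGBClassifier(
    objective="multi:softprob",  # softmax-based multiclass loss l in eq. (6)
    n_estimators=200,            # number of boosted trees f_k
    learning_rate=0.1,
    max_depth=4,
    reg_lambda=1.0,              # L2 penalty, part of Omega(f_k) in eq. (6)
    reg_alpha=0.0,               # L1 penalty
    tree_method="hist",          # on recent versions, device="cuda" uses the GPU
)
model.fit(X_train, y_train)
print("test accuracy:", np.mean(model.predict(X_test) == y_test))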