ADS Capstone Chronicles Revised
Figure 4.7.3.1 Downsampled DBSCAN SSC-A Versus CD3 APC-H7
The PCA t-SNE DBSCAN model shown in Figure 4.7.3.3 behaved similarly to the previous t-SNE models, with a silhouette score of 0.3455. Clusters remained poorly separated on the SSC-A versus CD3 APC-H7 plot throughout these tests.
Figure 4.7.3.3 PCA t-SNE DBSCAN SSC-A Versus CD3 APC-H7
The PCA-preprocessed version, seen in Figure 4.7.3.2, produced the highest silhouette score for this plot, 0.5610. It appears to isolate one cluster accurately, although visually a smaller cluster seems to have been grouped with the larger blue one.
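For readers unfamiliar with the metric, the silhouette scores quoted for these plots can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `silhouette_score`, the `noise_label` parameter, and the choice to exclude DBSCAN noise points (label -1) from the average are all assumptions made for illustration, and the toy points are invented. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the mean distance to the nearest other cluster; the point's score is (b - a) / max(a, b).

```python
import math


def silhouette_score(points, labels, noise_label=-1):
    """Mean silhouette coefficient over all non-noise points.

    points: list of equal-length numeric tuples.
    labels: one cluster label per point. Points carrying noise_label are
            skipped (an assumption for this sketch, not the paper's stated rule).
    """
    # Group point indices by cluster label, ignoring noise.
    clusters = {}
    for idx, label in enumerate(labels):
        if label != noise_label:
            clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean Euclidean distance from point i to members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    scores = []
    for label, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = mean_dist(i, members)  # cohesion within the point's own cluster
            b = min(                   # separation from the nearest other cluster
                mean_dist(i, other)
                for other_label, other in clusters.items()
                if other_label != label
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two tight groups far apart, labelled correctly.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
toy_labels = [0, 0, 1, 1]
toy_score = silhouette_score(toy_points, toy_labels)
```

Scores range from -1 to 1. Values near 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest points assigned to the wrong cluster, which is the scale on which the 0.3455 and 0.5610 results above can be read.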
5 Discussion and Evaluation

Upon further inspection of the data, it became apparent that our original hypothesis, that 90% of samples could be accurately classified, was infeasible given the nature of the data set and how the data were constructed. Because labels were not present in the originally sourced data, and because inferring a label from cluster location, cluster axes, and cluster characteristics is rather subjective, the evaluation criteria defaulted to the time required to identify clusters and how well the clusters could be identified by these algorithms.

5.1 Silhouette Scores

The silhouette score, a measure of how cohesive and compact a cluster is relative to how well separated it is from other clusters, was chosen as an objective and deterministic method to assess cluster validity. The human analogue to this would
Figure 4.7.3.2 PCA DBSCAN SSC-A Versus CD3 APC-H7