Experiment with the binary classification models below in Apache Spark, combined with Principal Component Analysis (PCA), and select the most appropriate one based on the area under the ROC curve (AUC):
- Logistic Regression
- Random Forest Classification
- Linear Support Vector Classification
- Gradient Boosted Tree Classification
- Naive Bayes Classification
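
As a rough illustration of the comparison described above, the sketch below builds one `pyspark.ml` pipeline per listed model, each fed PCA-reduced features, scores them with `BinaryClassificationEvaluator` (area under ROC), and picks the highest score. It is not the project's actual code. The function names (`build_pipelines`, `area_under_roc`, `pick_best`), the intermediate column names, the `label` column, the choice of `k`, and the `MinMaxScaler` step are all assumptions made for this example.

```python
def build_pipelines(feature_cols, k=2):
    """Return {model name: Pipeline}; every pipeline applies PCA first.

    Assumes numeric input columns and a binary column named "label".
    k must not exceed the number of input columns.
    """
    # Imported lazily so pick_best() below is usable without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import (
        GBTClassifier, LinearSVC, LogisticRegression,
        NaiveBayes, RandomForestClassifier,
    )
    from pyspark.ml.feature import PCA, MinMaxScaler, VectorAssembler

    shared_stages = [
        VectorAssembler(inputCols=feature_cols, outputCol="raw_features"),
        PCA(k=k, inputCol="raw_features", outputCol="pca_features"),
        # PCA components can be negative, but multinomial NaiveBayes expects
        # non-negative features, so this sketch rescales them to [0, 1].
        MinMaxScaler(inputCol="pca_features", outputCol="features"),
    ]
    # All classifiers use their defaults: featuresCol="features", labelCol="label".
    classifiers = {
        "Logistic Regression": LogisticRegression(),
        "Random Forest": RandomForestClassifier(),
        "Linear SVC": LinearSVC(),
        "Gradient Boosted Trees": GBTClassifier(),
        "Naive Bayes": NaiveBayes(),
    }
    return {name: Pipeline(stages=shared_stages + [clf])
            for name, clf in classifiers.items()}


def area_under_roc(pipelines, train_df, test_df):
    """Fit each pipeline on train_df and return {model name: AUC on test_df}."""
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    # Reads the "rawPrediction" and "label" columns by default.
    evaluator = BinaryClassificationEvaluator(metricName="areaUnderROC")
    return {name: evaluator.evaluate(pipe.fit(train_df).transform(test_df))
            for name, pipe in pipelines.items()}


def pick_best(auc_by_model):
    """Return the (model name, AUC) pair with the highest AUC."""
    return max(auc_by_model.items(), key=lambda item: item[1])
```

With hypothetical DataFrames `train_df` and `test_df` and hypothetical feature columns `x1`, `x2`, `x3`, a call might look like `pick_best(area_under_roc(build_pipelines(["x1", "x2", "x3"], k=2), train_df, test_df))`. Default hyperparameters are used throughout; tuning each model before comparing would likely change the ranking.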
The following package must be installed:
pyspark 2.4.5 py_0