Update ReadMe.md

1fd219ba · Amuthini Kulatheepan · 7c3abcfa · 1fd219ba
Commit 1fd219ba authored Jul 10, 2021 by Amuthini Kulatheepan

Showing with 104 additions and 1 deletion

--- a/ReadMe.md
+++ b/ReadMe.md
+## Implement machine learning models to predict the personality background of job applicants from Twitter tweets
+
 #  Development Tools
 The natural language processing toolkit (NLTK), Pandas, numpy, re, seaborn, matplotlib and sklearn are other Python libraries which  were used for the development process. 
  
@@ -18,4 +20,105 @@ Sklearn library was used to recognize the words appearing in 10% to 70% of the p
 The classification task was divided into 16 classes and further into four binary classification tasks, since each MBTI type is made of four binary classes. Each one of these binary classes represents an aspect of personality according to the MBTI personality model. As a result, four different binary classifiers were trained, whereby each one specializes in one of the aspects of personality. Thus, in this step, a model for each type indicator was built individually. Term Frequency–Inverse Document Frequency (TF–IDF) was performed and MBTI type indicators were binarised. Variable X was used for posts in TF–IDF representation and variable Y was used for the binarised MBTI type indicator.

 #  Developing  Model for the Dataset
- SGDClassifier, XGBoost, and AdaBoost  were used in this step to create the binary classification Models for the four dimesion of personality. MBTI type indicators were trained individually, and the data was then split into training and testing datasets using the train_test_split() function from sklearn library. In total, 70% of the data was used as the training set and 30% of the data was used as the test set. The model was fit onto the training data and the predictions were made for the testing data. After this step, the performance of the  models on the testing dataset during training was evaluated and early stopping was monitored. MBTI type indicators were trained individually and then the data was split into training and testing datasets. The model was fit onto the training data and the predictions were made for the testing data. In this step, the performance of the all the  models on the testing dataset was evaluated again. 
\ No newline at end of file
+ SGDClassifier, XGBoost, and AdaBoost were used in this step to create binary classification models for the four dimensions of personality. Each MBTI type indicator was trained individually. For each indicator, the data was split into training and testing datasets using the train_test_split() function from the sklearn library: 70% of the data was used as the training set and 30% as the test set. The model was fit on the training data and predictions were made for the testing data. The performance of the models on the testing dataset was evaluated during training, with early stopping monitored, and the performance of all the models on the testing dataset was then evaluated again.
+ 
+ 
+#  AdaBoost classifier best performance hyperparameter details
+
+IE: Introversion (I) - Extroversion (E) ...
+Best AUC Score: 0.803667 
+Accuracy: 0.7285539643730353
+[[2229    0]
+ [ 634    0]]
+{'abc__learning_rate': 0.1, 'abc__n_estimators': 500}
+
+NS: Intuition (N) – Sensing (S) ...
+Best AUC Score: 0.6727666369367796
+Accuracy: 0.8046946929265
+[[2431   32]
+ [ 384   16]]
+{'abc__learning_rate': 0.1, 'abc__n_estimators': 300}
+
+FT: Feeling (F) - Thinking (T) ...
+Best AUC Score: 0.75395340936081
+Accuracy: 0.72895568376202319 
+[[1199  355]
+ [ 421  888]]
+{'abc__learning_rate': 0.01, 'abc__n_estimators': 500}
+
+JP: Judging (J) – Perceiving (P) ...
+
+Best AUC Score: 0.6638994402640133
+Accuracy: 0.6521131680055885
+[[ 252  867]
+ [ 129 1615]]
+{'abc__learning_rate': 0.1, 'abc__n_estimators': 500}
+
+
+ 
+#  XGBoost classifier best performance hyperparameter details
+
+IE: Introversion (I) - Extroversion (E) ...
+Best AUC Score: 0.677028682166271
+Accuracy: 0.7785539643730353
+[[2229    0]
+ [ 634    0]]
+{'xgb__n_estimators': 200, 'xgb__max_depth': 6, 'xgb__learning_rate': 0.01, 'xgb__gamma': 0.2, 'xgb__colsample_bytree': 0.1}
+
+NS: Intuition (N) – Sensing (S) ...
+Best AUC Score: 0.6527666346929265
+Accuracy: 0.854697869367796
+[[2431   32]
+ [ 384   16]]
+{'xgb__n_estimators': 150, 'xgb__max_depth': 3, 'xgb__learning_rate': 0.3, 'xgb__gamma': 0.2, 'xgb__colsample_bytree': 0.2}
+
+
+FT: Feeling (F) - Thinking (T) ...
+Best AUC Score: 0.8139538376202319
+Accuracy: 0.728955640936081
+[[1199  355]
+ [ 421  888]]
+{'xgb__n_estimators': 150, 'xgb__max_depth': 4, 'xgb__learning_rate': 0.1, 'xgb__gamma': 0.1, 'xgb__colsample_bytree': 0.1}
+
+JP: Judging (J) – Perceiving (P) ...
+
+Best AUC Score: 0.6638994402640133
+Accuracy: 0.6521131680055885
+[[ 252  867]
+ [ 129 1615]]
+{'xgb__n_estimators': 50, 'xgb__max_depth': 3, 'xgb__learning_rate': 0.1, 'xgb__gamma': 0.0, 'xgb__colsample_bytree': 0.2}
+
+
+#  SGDClassifier best performance hyperparameter details
+
+sgd
+IE: Introversion (I) - Extroversion (E) ...
+Fitting 10 folds for each of 10 candidates, totalling 100 fits
+Best AUC Score: 0.5292548647534668
+Accuracy: 0.7740132727907789
+[[2216    0]
+ [ 647    0]]
+{'sgd__alpha': 0.0009265019438562898, 'sgd__loss': 'modified_huber', 'sgd__penalty': 'l1'}
+
+NS: Intuition (N) – Sensing (S) ...
+Fitting 10 folds for each of 10 candidates, totalling 100 fits
+Best AUC Score: 0.5426211797685075
+Accuracy: 0.857492141110723
+[[2455    1]
+ [ 407    0]]
+{'sgd__alpha': 0.0011441798336083461, 'sgd__loss': 'modified_huber', 'sgd__penalty': 'l2'}
+FT: Feeling (F) - Thinking (T) ...
+Fitting 10 folds for each of 10 candidates, totalling 100 fits
+Best AUC Score: 0.5
+Accuracy: 0.5312609151239958
+[[1521    0]
+ [1342    0]]
+{'sgd__alpha': 0.0019410296620838965, 'sgd__loss': 'hinge', 'sgd__penalty': 'l1'}
+
+JP: Judging (J) – Perceiving (P) ...
+Fitting 10 folds for each of 10 candidates, totalling 100 fits
+Best AUC Score: 0.4989554047081358
+Accuracy: 0.5302130632203982
+[[ 336  814]
+ [ 531 1182]]
+{'sgd__alpha': 0.0004231316730058021, 'sgd__loss': 'modified_huber', 'sgd__penalty': 'none'}
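
The workflow this README change describes (binarised type indicators, a 70/30 split, and one tuned classifier per indicator) might look roughly like the sketch below for the IE indicator. It is only an illustration under several assumptions: the posts are toy stand-ins rather than the project's dataset, the `binarise_mbti` helper and its 0/1 encoding are hypothetical, the `tfidf` pipeline step with `min_df=0.1` and `max_df=0.7` is a guess at how the 10% to 70% word filter was applied, and the search ranges are invented. The `sgd` step name, the AUC scoring, and the 10 folds by 10 candidates are taken from the parameter keys and log lines shown in the diff. The `'none'` penalty is left out of the search because newer sklearn releases spell it differently.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline


def binarise_mbti(mbti_type):
    """Turn a four-letter MBTI type into four 0/1 indicators (IE, NS, FT, JP).

    Hypothetical encoding: 1 for I, N, T, J and 0 for E, S, F, P.
    """
    return [int(letter == positive)
            for letter, positive in zip(mbti_type.upper(), "INTJ")]


# Toy stand-in data (not the project's dataset): 60 short "posts", half per class.
types = ["INTJ", "ENFP"] * 30
posts = [
    "quiet evening reading books alone at home" if t[0] == "I"
    else "loud party dancing with friends downtown tonight"
    for t in types
]

Y = np.array([binarise_mbti(t) for t in types])
y_ie = Y[:, 0]  # first column is the IE indicator

# 70% training / 30% testing, as in the README.
# stratify is an addition here so the small toy split stays balanced.
posts_train, posts_test, y_train, y_test = train_test_split(
    posts, y_ie, test_size=0.3, random_state=0, stratify=y_ie
)

# Assumed pipeline layout: TF-IDF features followed by the "sgd" step.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(min_df=0.1, max_df=0.7)),
    ("sgd", SGDClassifier(random_state=0)),
])

# Illustrative search space; the real ranges are not given in the README.
param_distributions = {
    "sgd__alpha": loguniform(1e-4, 1e-2),
    "sgd__loss": ["hinge", "modified_huber"],
    "sgd__penalty": ["l1", "l2"],
}

# 10 candidates x 10 folds = 100 fits, scored by ROC AUC.
search = RandomizedSearchCV(
    pipeline, param_distributions,
    n_iter=10, cv=10, scoring="roc_auc", random_state=0,
)
search.fit(posts_train, y_train)

y_pred = search.predict(posts_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])

print("Best AUC Score:", search.best_score_)
print("Accuracy:", accuracy)
print(cm)
print(search.best_params_)
```

Repeating the same steps with columns 1 to 3 of `Y` would give the NS, FT, and JP classifiers. Swapping the `sgd` step for an AdaBoost or XGBoost estimator (with step names `abc` or `xgb`) would produce parameter keys shaped like the ones reported above.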