Update ReadMe.md

7c3abcfa · Amuthini Kulatheepan · 1548a796 · 7c3abcfa
Commit 7c3abcfa authored Jul 10, 2021 by Amuthini Kulatheepan

Showing with 1 addition and 1 deletion

ReadMe.md ReadMe.md +1 -1

--- a/ReadMe.md
+++ b/ReadMe.md
@@ -18,4 +18,4 @@ Sklearn library was used to recognize the words appearing in 10% to 70% of the p
 The classification task was divided into 16 classes and further into four binary classification tasks, since each MBTI type is made of four binary classes. Each one of these binary classes represents an aspect of personality according to the MBTI personality model. As a result, four different binary classifiers were trained, whereby each one specializes in one of the aspects of personality. Thus, in this step, a model for each type indicator was built individually. Term Frequency–Inverse Document Frequency (TF–IDF) was performed and MBTI type indicators were binarised. Variable X was used for posts in TF–IDF representation and variable Y was used for the binarised MBTI type indicator.

 #  Developing  Model for the Dataset
- SGDClassifier, XGBoost, and AdaBoost  were used in this step to create the binary classification Models for the four dimesion of personality. MBTI type indicators were trained individually, and the data was then split into training and testing datasets using the train_test_split() function from sklearn library. In total, 70% of the data was used as the training set and 30% of the data was used as the test set. The model was fit onto the training data and the predictions were made for the testing data. After this step, the performance of the  models on the testing dataset during training was evaluated and early stopping was monitored. Following this step, the learning rate in XGBoost should be set to 0.1 or lower, and the addition of more trees will be required for smaller values. Moreover, the depth of trees should be configured in the range of 2 to 8, as there is not much benefit seen with the deeper trees. Furthermore, row sampling should be configured in the range of 30% to 80% of the training dataset. Thus, tree_depth in the created XGBoost was configured and parameters for XGBoost were setup as follow: n_estimators=200 max_depth=2 nthread=8 learning_rate=0.2 MBTI type indicators were trained individually and then the data was split into training and testing datasets. The model was fit onto the training data and the predictions were made for the testing data. In this step, the performance of the XGBoost model on the testing dataset was evaluated again. 
\ No newline at end of file
+ SGDClassifier, XGBoost, and AdaBoost were used in this step to create the binary classification models for the four dimensions of personality. A model was trained for each MBTI type indicator individually, with the data split into training and testing datasets using the train_test_split() function from the sklearn library. In total, 70% of the data was used as the training set and 30% as the test set. Each model was fit on the training data and predictions were made for the testing data. The performance of the models on the testing dataset was evaluated during training, and early stopping was monitored. The same split, fit, and predict procedure was then repeated for each MBTI type indicator, and the performance of all the models on the testing dataset was evaluated again.
\ No newline at end of file
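
The training procedure described in the changed paragraph could be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the toy `posts` and `types` data, the `train_indicator_models` helper, the fixed `seed`, and the use of `stratify` are all assumptions made for the example. XGBoost is left out so that the sketch depends on scikit-learn only; an `xgboost.XGBClassifier` would slot into the same loop.

```python
import itertools

from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# The four MBTI type indicators, each a binary choice between two letters.
MBTI_AXES = ["IE", "NS", "TF", "JP"]

# Hypothetical toy data: two synthetic "posts" for each of the 16 MBTI types.
types = ["".join(p) for p in itertools.product(*MBTI_AXES)] * 2
posts = [
    " ".join(f"tok{letter}" for letter in t) + f" filler{i % 3}"
    for i, t in enumerate(types)
]


def train_indicator_models(posts, types, seed=0):
    """Train one binary classifier per MBTI indicator and per model family.

    Returns a dict mapping (indicator, model name) to test-set accuracy.
    """
    # X: posts in TF-IDF representation (computed before the split here,
    # which is an assumption about the order of steps).
    X = TfidfVectorizer().fit_transform(posts)
    results = {}
    for axis, pair in enumerate(MBTI_AXES):
        # Y: binarised indicator, 1 for the first letter of the pair, else 0.
        y = [1 if t[axis] == pair[0] else 0 for t in types]
        # 70% training / 30% testing. stratify is an assumption that keeps
        # both classes present in the tiny toy training set.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=seed, stratify=y
        )
        models = {
            "SGDClassifier": SGDClassifier(random_state=seed),
            "AdaBoost": AdaBoostClassifier(random_state=seed),
        }
        for name, model in models.items():
            model.fit(X_train, y_train)
            predictions = model.predict(X_test)
            results[(pair, name)] = accuracy_score(y_test, predictions)
    return results


if __name__ == "__main__":
    for key, accuracy in train_indicator_models(posts, types).items():
        print(key, round(accuracy, 3))
```

Training each indicator in its own loop iteration mirrors the "one model per type indicator" design in the README text; the accuracies on this synthetic data say nothing about performance on the real dataset.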