Commit 2d01e32f authored by Mihiranga G.L.V - IT18500790

Merge branch 'master' into IT18500790

parents 130cf6f3 6941d901
@@ -10,3 +10,37 @@
Passengers cannot get information about train stopping stations, facilities, and the duration of the trip. Railway passengers cannot get the exact information they need about the places they plan to visit.
When booking a ticket, passengers cannot choose the seat they want, and there is no process to suggest the most preferred seating place. This may cause personal conflicts between closely seated passengers due to differences in personal interests.
It is difficult to find attractive places located near the trip route, so unpopular attractions are missed by many travelers. This affects both traveling passengers and the tourism industry.
**Individual Research Questions**
***IT18500790***
Some travelers do not know much about trains and attractive destinations to choose from. At present, there is no application to guide tourists to the destinations of their choice, so passengers need to spend more time finding places to visit, wasting time that could be spent on their enjoyment.
***IT18085822***
Passengers cannot get information about train stopping stations, facilities, and the duration of the trip. Railway passengers cannot get the exact information they need about the places they plan to visit.
***IT18001280***
Providing a facility to view available seats and to suggest the most suitable seating place for a particular passenger, together with the option to choose a seating place according to their preference.
***IT18148282***
Suggesting the best places to visit according to the passenger's personal trip plan, using the passenger's relevant information to generate the most suitable suggestions.
**Individual Objectives**
***IT18500790***
Providing the most suitable train plan to the passenger according to their needs by using a machine learning algorithm.
***IT18085822***
A machine learning-based chatbot app to interact with the user 24x7, providing relevant information such as train facilities and details of the places suggested by the trip schedule, in response to user queries.
***IT18001280***
Sequentially predict the most suitable seat for the passenger by using a machine learning algorithm.
***IT18148282***
Suggest the best places to visit for the passenger using a machine learning algorithm by gathering relevant data from the railway passengers.
**Solution**
Passengers can select their train schedule for traveling, but sometimes they miss out on places of their choice. This can be reduced by predicting the trip schedule they want.
Presently, passengers who want information about a particular train have no easy way to get it; the system solves this by introducing a new machine learning-based chatbot app. Users can get information about a specific location through the chatbot application.
The system provides a facility to view available seats and suggests the most suitable seating place for a particular passenger, along with the option to choose a seating place according to their preference (a condensed code sketch follows below).
It also suggests the best places to visit according to the passenger's personal trip plan, using the passenger's relevant information to generate the most suitable suggestions.
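As a rough sketch of the seat-suggestion component: it mirrors the notebook included later in this commit, and assumes the Railway_Passenger_Final.csv file with the columns used there.

```python
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Columns follow Railway_Passenger_Final.csv as used in the notebook below.
data = pd.read_csv("Railway_Passenger_Final.csv")
X = data.drop(["SeatLine"], axis=1)   # Age, Country, Disability, Class, GenderNo, LineNo
y = data["SeatLine"]                  # seat line to suggest

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123)
model = RandomForestClassifier().fit(X_train, y_train)
print("Accuracy:", metrics.accuracy_score(y_test, model.predict(X_test)))
```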
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"id": "b2b62b44",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"from tqdm.notebook import tqdm\n",
"from collections import Counter\n",
"from sklearn import metrics\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.model_selection import train_test_split"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "11016a95",
"metadata": {},
"outputs": [],
"source": [
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "cec68912",
"metadata": {},
"outputs": [],
"source": [
"data = pd.read_csv(\"Railway_Passenger_Final.csv\")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "0870d81a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age</th>\n",
" <th>Country</th>\n",
" <th>Disability</th>\n",
" <th>Class</th>\n",
" <th>GenderNo</th>\n",
" <th>LineNo</th>\n",
" <th>SeatLine</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>38</td>\n",
" <td>169</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>52</td>\n",
" <td>186</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>52</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>33</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7387</th>\n",
" <td>56</td>\n",
" <td>9</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7388</th>\n",
" <td>48</td>\n",
" <td>165</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7389</th>\n",
" <td>16</td>\n",
" <td>77</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7390</th>\n",
" <td>14</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7391</th>\n",
" <td>38</td>\n",
" <td>77</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>9</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>7392 rows × 7 columns</p>\n",
"</div>"
],
"text/plain": [
" Age Country Disability Class GenderNo LineNo SeatLine\n",
"0 38 169 0 1 1 3 6\n",
"1 33 165 0 2 1 1 4\n",
"2 52 186 0 1 2 6 8\n",
"3 52 165 0 2 2 6 8\n",
"4 33 165 0 2 1 1 4\n",
"... ... ... ... ... ... ... ...\n",
"7387 56 9 1 2 2 1 1\n",
"7388 48 165 1 1 1 5 6\n",
"7389 16 77 0 1 1 4 10\n",
"7390 14 165 0 1 1 5 4\n",
"7391 38 77 0 1 1 5 9\n",
"\n",
"[7392 rows x 7 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "dc782f98",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age</th>\n",
" <th>Country</th>\n",
" <th>Disability</th>\n",
" <th>Class</th>\n",
" <th>GenderNo</th>\n",
" <th>LineNo</th>\n",
" <th>SeatLine</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [Age, Country, Disability, Class, GenderNo, LineNo, SeatLine]\n",
"Index: []"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data[pd.isnull(data).any(axis=1)]"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b7d860da",
"metadata": {},
"outputs": [],
"source": [
"Y = data.SeatLine.copy()\n",
"X = data.drop(['SeatLine'], axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "b85ff8d6",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"4 1928\n",
"7 1842\n",
"5 1015\n",
"6 915\n",
"3 742\n",
"8 679\n",
"2 76\n",
"9 68\n",
"10 64\n",
"1 63\n",
"Name: SeatLine, dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data['SeatLine'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "d498a8ce",
"metadata": {},
"outputs": [],
"source": [
"def pearson(X,Y):\n",
" correlation_matrix = np.corrcoef(X,Y)\n",
" return correlation_matrix[0,1]"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "c041cb4c",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-0.0035806135588234236\n",
"-0.009099476534655794\n",
"0.01109999125101634\n",
"0.7410300176783485\n"
]
}
],
"source": [
"print(pearson(X.Age, Y))\n",
"print(pearson(X.Country, Y))\n",
"print(pearson(X.Disability, Y))\n",
"print(pearson(X.GenderNo, Y))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "094a84fa",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"-0.09118144742552847\n",
"-0.627520395988806\n",
"0.0036254166615416767\n",
"0.6412361822996381\n"
]
}
],
"source": [
"print(np.cov(X.Age, Y)[0,1])\n",
"print(np.cov(X.Country, Y)[0,1])\n",
"print(np.cov(X.Disability, Y)[0,1])\n",
"print(np.cov(X.GenderNo, Y)[0,1])"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "c001994a",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Age</th>\n",
" <th>Country</th>\n",
" <th>Disability</th>\n",
" <th>Class</th>\n",
" <th>GenderNo</th>\n",
" <th>LineNo</th>\n",
" <th>SeatLine</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>38</td>\n",
" <td>169</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>33</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>52</td>\n",
" <td>186</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>52</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>6</td>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>33</td>\n",
" <td>165</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Age Country Disability Class GenderNo LineNo SeatLine\n",
"0 38 169 0 1 1 3 6\n",
"1 33 165 0 2 1 1 4\n",
"2 52 186 0 1 2 6 8\n",
"3 52 165 0 2 2 6 8\n",
"4 33 165 0 2 1 1 4"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "0de26781",
"metadata": {},
"outputs": [],
"source": [
"finalFeaturedDataset = data[['Age', 'Country','Disability','Class','GenderNo']]"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0042371f",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.preprocessing import MinMaxScaler\n",
"scaler = MinMaxScaler(feature_range=(0,1)) \n",
"\n",
"#assign scaler to column:\n",
"data = pd.DataFrame(scaler.fit_transform(finalFeaturedDataset), columns=finalFeaturedDataset.columns)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "c5b4937c",
"metadata": {},
"outputs": [],
"source": [
"X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=123)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "bb0a6886",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.svm import SVC, LinearSVC"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "da921c77",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1dad2d9530bd43c7b6abdadac29fdfa4",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#X_train, X_test, Y_train, Y_test,Y_pred\n",
"\n",
"linear_svc = LinearSVC()\n",
"for i in tqdm(range(10)):\n",
" linear_svc.fit(X_train, Y_train)\n",
"\n",
"Y_pred = linear_svc.predict(X_test)\n",
"\n",
"acc_linear_svc = metrics.accuracy_score(Y_test, Y_pred)\n",
"pre_linear_svc = metrics.precision_score(Y_test,Y_pred, average='macro')\n",
"recall_linear_svc = metrics.recall_score(Y_test,Y_pred, average='macro')\n",
"f1_linear_svc = metrics.f1_score(Y_test,Y_pred, average='macro')"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "a79ef29a",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.2738336713995943\n",
"Precision: 0.35832646520146516\n",
"Recall: 0.11236006475404055\n",
"f1-score: 0.06488542557685116\n"
]
}
],
"source": [
"print(\"Accuracy:\", metrics.accuracy_score(Y_test, Y_pred))\n",
"print(\"Precision:\", metrics.precision_score(Y_test, Y_pred,average='macro'))\n",
"print(\"Recall:\", metrics.recall_score(Y_test, Y_pred,average='macro'))\n",
"print(\"f1-score:\", metrics.f1_score(Y_test,Y_pred, average='macro'))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "ee385170",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.neighbors import KNeighborsClassifier"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "a1bb8f06",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "0df9475381644df594ce15d28dfe72a5",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"#X_train, X_test, Y_train, Y_test\n",
"\n",
"knn = KNeighborsClassifier(n_neighbors = 2)\n",
"for i in tqdm(range(10)):\n",
" knn.fit(X_train, Y_train) \n",
"Y_pred = knn.predict(X_test)\n",
"\n",
"\n",
"acc_knn = metrics.accuracy_score(Y_test, Y_pred)\n",
"pre_knn = metrics.precision_score(Y_test,Y_pred, average='macro')\n",
"recall_knn = metrics.recall_score(Y_test,Y_pred, average='macro')\n",
"f1_knn = metrics.f1_score(Y_test,Y_pred, average='macro')"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "f25572c9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.9350912778904665\n",
"Precision: 0.8078615566477673\n",
"Recall: 0.717261850491169\n",
"f1-score: 0.7114544346026009\n"
]
}
],
"source": [
"print(\"Accuracy:\", metrics.accuracy_score(Y_test, Y_pred))\n",
"print(\"Precision:\", metrics.precision_score(Y_test, Y_pred,average='macro'))\n",
"print(\"Recall:\", metrics.recall_score(Y_test, Y_pred,average='macro'))\n",
"print(\"f1-score:\", metrics.f1_score(Y_test,Y_pred, average='macro'))"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "ac6748e8",
"metadata": {},
"outputs": [],
"source": [
"from sklearn.ensemble import RandomForestClassifier"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "2f48f36e",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d2d315141cb1411ebf65727eae04efc9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" 0%| | 0/10 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"rf = RandomForestClassifier()\n",
"for i in tqdm(range(10)):\n",
" rf.fit(X_train,Y_train)\n",
"Y_pred = rf.predict(X_test)\n",
"\n",
"acc_rf = metrics.accuracy_score(Y_test, Y_pred)\n",
"pre_rf = metrics.precision_score(Y_test,Y_pred, average='macro')\n",
"recall_rf = metrics.recall_score(Y_test,Y_pred, average='macro')\n",
"f1_rf = metrics.f1_score(Y_test,Y_pred, average='macro')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "fc1ebd91",
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 0.9445571331981069\n",
"Precision: 0.7413751655908479\n",
"Recall: 0.7385777558448817\n",
"f1-score: 0.7368101892594321\n"
]
}
],
"source": [
"print(\"Accuracy:\", metrics.accuracy_score(Y_test, Y_pred))\n",
"print(\"Precision:\", metrics.precision_score(Y_test, Y_pred,average='macro'))\n",
"print(\"Recall:\", metrics.recall_score(Y_test, Y_pred,average='macro'))\n",
"print(\"f1-score:\", metrics.f1_score(Y_test,Y_pred, average='macro'))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "e7086868",
"metadata": {},
"outputs": [
{
"ename": "NameError",
"evalue": "name 'acc_log' is not defined",
"output_type": "error",
"traceback": [
"\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[1;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[1;32m<ipython-input-24-d117186c3a1d>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m\u001b[0m\n\u001b[0;32m 1\u001b[0m results = pd.DataFrame({\n\u001b[0;32m 2\u001b[0m \u001b[1;34m'Model'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;33m[\u001b[0m\u001b[1;34m'Support Vector Machines'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'KNN'\u001b[0m\u001b[1;33m,\u001b[0m \u001b[1;34m'Logistic Regression'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'Random Forest'\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m \u001b[1;34m'Accuracy'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0macc_linear_svc\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0macc_knn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0macc_log\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0macc_rf\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 4\u001b[0m \u001b[1;34m'Precission'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mpre_linear_svc\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mpre_knn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mpre_log\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mpre_rf\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 5\u001b[0m \u001b[1;34m'Recall'\u001b[0m\u001b[1;33m:\u001b[0m \u001b[1;33m[\u001b[0m\u001b[0mrecall_linear_svc\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mrecall_knn\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mrecall_log\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mrecall_rf\u001b[0m\u001b[1;33m]\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
"\u001b[1;31mNameError\u001b[0m: name 'acc_log' is not defined"
]
}
],
"source": [
"results = pd.DataFrame({\n",
" 'Model': ['Support Vector Machines', 'KNN', 'Random Forest'],\n",
" 'Accuracy': [acc_linear_svc, acc_knn, acc_rf],\n",
" 'Precission': [pre_linear_svc, pre_knn, pre_log, pre_rf],\n",
" 'Recall': [recall_linear_svc, recall_knn, recall_log, recall_rf],\n",
" 'F1-Score': [f1_linear_svc, f1_knn, f1_log, f1_rf]})\n",
"\n",
"result_df = results.sort_values(by='Accuracy', ascending=False)\n",
"result_df = result_df.set_index('Accuracy')\n",
"result_df"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "edd5c2db",
"metadata": {},
"outputs": [],
"source": [
"results= pd.DataFrame({'Model': ['S V M', 'KNN', 'Logistic R','Random Forest'], 'Score': [acc_linear_svc, acc_knn, acc_log, acc_rf ]})\n",
"\n",
"ax = results.plot.bar(x='Model', y='Score', rot=90)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3148b25",
"metadata": {},
"outputs": [],
"source": [
"#Save the Decision Tree strained modelusing pickle\n",
"import pickle\n",
"with open('ab_classifier_Random_forest', 'wb') as picklefile:\n",
" pickle.dump(rf,picklefile)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "698412e2",
"metadata": {},
"outputs": [],
"source": [
"with open('ab_classifier_Random_forest', 'rb') as training_model:\n",
" model6 = pickle.load(training_model)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f5964b95",
"metadata": {},
"outputs": [],
"source": [
"def start_questionnaire():\n",
" my_predictors = []\n",
" parameters=['Age', 'Count','Country','Disability','Class','GenderNo']\n",
" \n",
" print('Input Passenger Information:')\n",
" \n",
" age = input(\"Passenger age: >>> \") \n",
" my_predictors.append(age)\n",
" count = input(\"Passenger Count: >>> \") \n",
" my_predictors.append(count)\n",
" country = input(\"Passenger Country: >>> \") \n",
" my_predictors.append(country)\n",
" disability = input(\"Any Disability: >>> \")\n",
" my_predictors.append(disability)\n",
" classNo = input(\"Choice Class: >>> \")\n",
" my_predictors.append(classNo)\n",
" gender = input(\"Passenger Gender: >>> \")\n",
" my_predictors.append(gender)\n",
" \n",
" my_data = dict(zip(parameters, my_predictors))\n",
" my_df = pd.DataFrame(my_data, index=[0])\n",
" scaler = MinMaxScaler(feature_range=(1,6))\n",
" \n",
" # assign scaler to column:\n",
" my_df_scaled = pd.DataFrame(scaler.fit_transform(my_df), columns=my_df.columns)\n",
" my_y_pred = model6.predict(my_df)\n",
" print('\\n')\n",
" print('Result:')\n",
" print(my_y_pred);"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b1b355f4",
"metadata": {},
"outputs": [],
"source": [
"start_questionnaire()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e622dcf5",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
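Outside the notebook, the Random Forest pickled above (ab_classifier_Random_forest) can be loaded and queried directly. A minimal sketch, assuming that pickle file exists and using the training column order; the example row values are hypothetical:

```python
import pickle

import pandas as pd

# Load the Random Forest pickled by the notebook above.
with open("ab_classifier_Random_forest", "rb") as f:
    model = pickle.load(f)

# One hypothetical passenger; column names and order must match the training frame.
passenger = pd.DataFrame(
    [[38, 169, 0, 1, 1, 3]],
    columns=["Age", "Country", "Disability", "Class", "GenderNo", "LineNo"],
)
print("Suggested seat line:", model.predict(passenger)[0])
```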
import random
import json

import torch

from model import NeuralNet
from nltk_utils import bag_of_words, tokenize

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with open('intents.json', 'r') as json_data:
    intents = json.load(json_data)

FILE = "data.pth"
data = torch.load(FILE)

input_size = data["input_size"]
hidden_size = data["hidden_size"]
output_size = data["output_size"]
all_words = data['all_words']
tags = data['tags']
model_state = data["model_state"]

model = NeuralNet(input_size, hidden_size, output_size).to(device)
model.load_state_dict(model_state)
model.eval()

bot_name = "Sam"
print("Let's chat! (type 'quit' to exit)")
while True:
    # sentence = "do you use credit cards?"
    sentence = input("You: ")
    if sentence == "quit":
        break

    sentence = tokenize(sentence)
    X = bag_of_words(sentence, all_words)
    X = X.reshape(1, X.shape[0])
    X = torch.from_numpy(X).to(device)

    output = model(X)
    _, predicted = torch.max(output, dim=1)

    tag = tags[predicted.item()]

    probs = torch.softmax(output, dim=1)
    prob = probs[0][predicted.item()]
    if prob.item() > 0.75:
        for intent in intents['intents']:
            if tag == intent["tag"]:
                print(f"{bot_name}: {random.choice(intent['responses'])}")
    else:
        print(f"{bot_name}: I do not understand...")
{
"intents": [
{
"tag": "greeting",
"patterns": [
"Hi",
"Hey",
"How are you",
"Is anyone there?",
"Hello",
"Good day"
],
"responses": [
"Hey :-)",
"Hello, thanks for visiting",
"Hi there, what can I do for you?",
"Hi there, how can I help?"
]
},
{
"tag": "goodbye",
"patterns": ["Bye", "See you later", "Goodbye"],
"responses": [
"See you later, thanks for visiting",
"Have a nice day",
"Bye! Come back again soon."
]
},
{
"tag": "thanks",
"patterns": ["Thanks", "Thank you", "That's helpful", "Thank's a lot!"],
"responses": ["Happy to help!", "Any time!", "My pleasure"]
},
{
"tag": "anuradhapura-places",
"patterns": ["what are the places i can visit in anuradhapura?", "what are the places I can see in Anuradhapura?", "what are the locations I can see in Anuradhapura", "Why am i going to anuradhapuraya?","What are the tourist places in anuradhapura?"],
"responses": ["You can see Sigiriya, Ruwanweliseya, Thuparamaya, Isurumuniya and many other historical places in Anuradhapura"]
},
{
"tag": "create-sigiriya",
"patterns": ["who made sigiriya?", "who created sigiriya?", "who built sigiriya?","who built lion rock?"],
"responses": ["Sigiriya was built by King Kashyapa"]
},
{
"tag": "see-sigiriya",
"patterns": ["what can i see in sigiriya?", "why am i go to sigiriya?", "what are the beautiful places in sigiriya?","why am i go to lion rock?"],
"responses": ["You can see ancient ponds and wall art in the Sigiriya"]
},
{
"tag": "important-sigiriya",
"patterns": ["what is the important of the sigiriya?", "tell me about sigiriya?", "why people like sigiriya?","why is the important of the sigiriya for us?", "tell me about lion rock?","what is sigiriya?","what is the special of the sigiriya?"],
"responses": ["Sigiriya is one of the most valuable historical monuments of Sri Lanka. Referred by locals as the Eighth Wonder of the World this ancient palace and fortress complex has significant archaeological importance and attracts thousands of tourists every year. It is probably the most visited tourist destination of Sri Lanka."]
},
{
"tag": "when-sigiriya",
"patterns": ["when create sigiriya?", "when built sigiriya?", "which year made sigiriya"],
"responses": ["Since the 3th century BC Sigiriya was used as a monastery and after eight centuries it was turned into a royal palace"]
},
{
"tag": "old-sigiriya",
"patterns": ["how old sigiriya?", "how many years sigiriya?"],
"responses": ["Archeological excavations have proven that Sigiriya and its surrounding territories were inhabited for more than 4000 years."]
},
{
"tag": "crowd-sigiriya",
"patterns": ["how many peoples comes to the sigiriya in the one day?", "how many crowd visit to the sigiriya in a day?"],
"responses": ["Around 2000 people come to visit Sigiriya daily."]
},
{
"tag": "ticket-sigiriya",
"patterns": ["what is the ticket price of sigiriya", "entrance fee of sigiriya?"],
"responses": ["You should by a ticket which price of US$30 or 4620 LKR for tourists, or 50 LKR for Sri Lankan citizens."]
},
{
"tag": "heritage-sigiriya",
"patterns": ["is sigiriya world heritage?", "when sigiriya become the heritage?"],
"responses": ["Sigiriya is a UNESCO listed World Heritage Site since 1982."]
},
{
"tag": "station-sigiriya",
"patterns": ["what is the nearest railway station to the sigiriya", "how long so far to closest railway station from sigiriya?","where is sigiriya"],
"responses": ["Habarna is the closet railway station to Sigiriya. It's 15km away from Sigiriya."]
},
{
"tag": "heigh-sigiriya",
"patterns": ["what is the height of sigiriya", "what is the peak of sigiriya?","elevation of sigiriya?","elevation of sigiriya"],
"responses": ["1,144 feet (349 metres) above sea level and is some 600 feet (180 metres) above the surrounding plain."]
},
{
"tag": "why-sigiriya",
"patterns": ["why create sigiriya", "what is reason for make sigiriya?","what is the main purpose of sigiriya","why built sigiriya"],
"responses": ["1In India he raised an army with the intention of returning and retaking the throne of Sri Lanka, which he considered to be rightfully his. Expecting the inevitable return of Moggallana, Kashyapa is said to have built his palace on the summit of Sigiriya as a fortress as well as a pleasure palace."]
}
]
}
import torch
import torch.nn as nn


class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.l1 = nn.Linear(input_size, hidden_size)
        self.l2 = nn.Linear(hidden_size, hidden_size)
        self.l3 = nn.Linear(hidden_size, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.l1(x)
        out = self.relu(out)
        out = self.l2(out)
        out = self.relu(out)
        out = self.l3(out)
        # no activation and no softmax at the end
        return out
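A quick shape check for NeuralNet. The sizes below are hypothetical, and softmax is applied outside the model (as chat.py does), since forward returns raw logits:

```python
import torch

from model import NeuralNet

# Hypothetical sizes: 54-word vocabulary, 8 hidden units, 13 intent tags.
net = NeuralNet(input_size=54, hidden_size=8, num_classes=13)
x = torch.rand(1, 54)                 # one bag-of-words vector
logits = net(x)                       # raw scores; no softmax inside the model
print(logits.shape)                   # torch.Size([1, 13])
probs = torch.softmax(logits, dim=1)  # probabilities, as computed in chat.py
print(probs.sum().item())             # ~1.0
```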
import numpy as np
import nltk
# nltk.download('punkt')
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def tokenize(sentence):
    """
    split sentence into array of words/tokens
    a token can be a word or punctuation character, or number
    """
    return nltk.word_tokenize(sentence)


def stem(word):
    """
    stemming = find the root form of the word
    examples:
    words = ["organize", "organizes", "organizing"]
    words = [stem(w) for w in words]
    -> ["organ", "organ", "organ"]
    """
    return stemmer.stem(word.lower())


def bag_of_words(tokenized_sentence, words):
    """
    return bag of words array:
    1 for each known word that exists in the sentence, 0 otherwise
    example:
    sentence = ["hello", "how", "are", "you"]
    words = ["hi", "hello", "I", "you", "bye", "thank", "cool"]
    bog = [ 0 , 1 , 0 , 1 , 0 , 0 , 0]
    """
    # stem each word
    sentence_words = [stem(word) for word in tokenized_sentence]
    # initialize bag with 0 for each word
    bag = np.zeros(len(words), dtype=np.float32)
    for idx, w in enumerate(words):
        if w in sentence_words:
            bag[idx] = 1

    return bag
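These helpers can be tried on their own; a small sketch (the punkt download noted in the file may be needed once, and the vocabulary list here is made up):

```python
from nltk_utils import bag_of_words, stem, tokenize

tokens = tokenize("Who built Sigiriya?")
print(tokens)                        # ['Who', 'built', 'Sigiriya', '?']
print([stem(t) for t in tokens])     # lower-cased, stemmed tokens

vocab = ["who", "built", "ticket", "price"]   # hypothetical all_words list
print(bag_of_words(tokens, vocab))   # [1. 1. 0. 0.]
```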
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
</head>
<body>
</body>
</html>
\ No newline at end of file
import json

import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import Dataset, DataLoader

from nltk_utils import bag_of_words, tokenize, stem
from model import NeuralNet

with open('intents.json', 'r') as f:
    intents = json.load(f)

all_words = []
tags = []
xy = []
for intent in intents['intents']:
    tag = intent['tag']
    # add to tag list
    tags.append(tag)
    for pattern in intent['patterns']:
        # tokenize each word in the sentence
        w = tokenize(pattern)
        # add to our words list
        all_words.extend(w)
        # add to xy pair
        xy.append((w, tag))

ignore_words = ['?', '.', '!']
all_words = [stem(w) for w in all_words if w not in ignore_words]
all_words = sorted(set(all_words))
tags = sorted(set(tags))
print(tags)

X_train = []
y_train = []
for (pattern_sentence, tag) in xy:
    # X: bag of words for each pattern_sentence
    bag = bag_of_words(pattern_sentence, all_words)
    X_train.append(bag)
    # y: PyTorch CrossEntropyLoss needs only class labels, not one-hot
    label = tags.index(tag)
    y_train.append(label)

X_train = np.array(X_train)
y_train = np.array(y_train)


class ChatDataset(Dataset):
    def __init__(self):
        self.n_samples = len(X_train)
        self.x_data = X_train
        self.y_data = y_train

    # support indexing such that dataset[i] can be used to get i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples


num_epochs = 1000
learning_rate = 0.001
batch_size = 8
input_size = len(X_train[0])
hidden_size = 8
output_size = len(tags)
print(input_size, output_size)

dataset = ChatDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=batch_size,
                          shuffle=True,
                          num_workers=0)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = NeuralNet(input_size, hidden_size, output_size).to(device)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

for epoch in range(num_epochs):
    for (words, labels) in train_loader:
        words = words.to(device)
        labels = labels.to(dtype=torch.long).to(device)

        # Forward pass
        outputs = model(words)
        # if y would be one-hot, we must apply
        # labels = torch.max(labels, 1)[1]
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    if (epoch+1) % 100 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')

print(f'final loss: {loss.item():.4f}')

data = {
    "model_state": model.state_dict(),
    "input_size": input_size,
    "hidden_size": hidden_size,
    "output_size": output_size,
    "all_words": all_words,
    "tags": tags
}

FILE = "data.pth"
torch.save(data, FILE)

print(f'training complete. file saved to {FILE}')
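To try the bot end to end, train.py has to run first so that data.pth exists for chat.py to load. A hypothetical driver script, assuming both files sit in the working directory:

```python
import subprocess
import sys

# Train the intent classifier; this writes data.pth.
subprocess.run([sys.executable, "train.py"], check=True)

# Start the interactive chat loop, which loads data.pth and intents.json.
subprocess.run([sys.executable, "chat.py"], check=True)
```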