2021_118- Gazette / 2021-118
Commit 06747cb1, authored Nov 25, 2021 by IT18396164-Silva K.K.S
Clustering code edited
Parent: ee477dd0
Showing 1 changed file with 2 additions and 27 deletions
cgp1/CgpApp/topicclustering.py (+2, -27)
...
...
@@ -24,27 +24,7 @@ import json
class TopicCluster:
    def cluster():
        texts = [
            "Registrar of Births Deaths and MarriagesAdditional Marriages Kandyan",
            "Registrar of Muslim Marriages -Gampaha",
            "Registrar of Births Deaths and Marriages",
            "Registrar of Muslim Marriages -Ratnapura",
            "Registrar of Births Deaths and MarriagesAdditional Marriages Kandyan",
            "Teacher Services 2021 for sinhala,Tamil and English-Kaluthara District",
            "Teacher Services 2021 for sinhala,Tamil and English-Galle District",
            "Teacher Services for sinhala-Ratnapura District",
            "Medical officer preliminary grade i",
            "medical consultant",
            "medical officer grade i",
            "medical officer grade ii",
            "Medical officer preliminary grade ii",
            "Community Development Assistant",
            "Data Processing Assistant -colombo",
            "Community Development Assistant",
            "Data Processing Assistant -ratnapura",
            "Social Development Assistant",
        ]
        #clustering with k-means
        count_vectorizer = CountVectorizer()
        # .fit_transform TOKENIZES and COUNTS
...
...
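The comment in the hunk above says that `fit_transform` tokenizes and counts. As a rough illustration of what that means, here is a minimal standard-library sketch. It is not the code from this commit: `tokenize` and `count_matrix` are hypothetical helper names, and the token pattern (lowercased words of two or more characters) only approximates the default behaviour of `CountVectorizer`.

```python
import re
from collections import Counter

# Assumption: words of 2+ characters, lowercased, similar in spirit to
# CountVectorizer's defaults. Single-character tokens such as "i" are dropped.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(text):
    """Split a title into lowercase word tokens (hypothetical helper)."""
    return TOKEN_RE.findall(text.lower())


def count_matrix(texts):
    """Return (vocabulary, rows), where rows[i][j] counts vocabulary[j] in texts[i]."""
    counts = [Counter(tokenize(t)) for t in texts]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Three of the titles from the removed `texts` list.
sample = ["medical officer grade i", "medical officer grade ii", "medical consultant"]
vocab, rows = count_matrix(sample)
for title, row in zip(sample, rows):
    print(title, row)
```

Each row is one title and each column is one vocabulary term, which is the shape of matrix the clustering step later consumes.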
@@ -109,12 +89,7 @@ class TopicCluster:
        l2_df = pd.DataFrame(X.toarray(), columns=l2_vectorizer.get_feature_names())
        # l2_df
        # Initialize a vectorizer
        vectorizer = TfidfVectorizer(use_idf=True, tokenizer=stemming_tokenizer, stop_words='english')
        X = vectorizer.fit_transform(texts)
        #removing unwanted file
        # distortions = []
        # K = range(1, 8)
        # for k in K:
...
...
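The `TfidfVectorizer(use_idf=True, ...)` call in the hunk above reweights raw counts so that terms shared by many titles contribute less. The sketch below shows that idea with the standard library only. It is an assumption-laden illustration, not the commit's code: `tfidf_rows` is a hypothetical name, it uses the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, and it omits the `stemming_tokenizer` and English stop-word filtering that the original call applies.

```python
import math
import re
from collections import Counter


def tfidf_rows(texts):
    """Return (vocabulary, rows) of L2-normalized TF-IDF weights (hypothetical helper)."""
    # Assumption: simple lowercase word tokens; no stemming, no stop words.
    docs = [Counter(re.findall(r"\b\w\w+\b", t.lower())) for t in texts]
    vocab = sorted(set().union(*docs))
    n = len(docs)
    # Document frequency: in how many titles each term appears.
    df = {term: sum(1 for d in docs if term in d) for term in vocab}
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n) / (1 + df[term])) + 1 for term in vocab}
    rows = []
    for d in docs:
        weights = [d[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows


sample = ["medical officer grade i", "medical officer grade ii", "medical consultant"]
vocab, rows = tfidf_rows(sample)
for term, weight in zip(vocab, rows[2]):
    print(f"{term}: {weight:.3f}")
```

In the third title, "consultant" ends up weighted more heavily than "medical" because "medical" occurs in every sample title, which is the effect that helps k-means separate otherwise similar job titles.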