Sachith Fernando / 2020-101
Commit 82b58cef, authored Nov 02, 2020 by LiniEisha
Reformatting Summary.py
parent d8f6824a
Showing 1 changed file with 5 additions and 4 deletions
LectureSummarizingApp/Summary.py (+5 -4)
import spacy
from spacy.lang.pt.stop_words import STOP_WORDS
from sklearn.feature_extraction.text import CountVectorizer
import pt_core_news_sm
# Reading the file
nlp = pt_core_news_sm.load()
with open("audioToText01.txt", "r", encoding="utf-8") as f:
    text = " ".join(f.readlines())
doc = nlp(text)
#calculating the word frequency
corpus = [sent.text.lower() for sent in doc.sents]
cv = CountVectorizer(stop_words=list(STOP_WORDS))
cv_fit = cv.fit_transform(corpus)
...
@@ -19,6 +18,7 @@ word_list = cv.get_feature_names()
count_list = cv_fit.toarray().sum(axis=0)
word_frequency = dict(zip(word_list, count_list))
val = sorted(word_frequency.values())
higher_word_frequencies = [word for word, freq in word_frequency.items() if freq in val[-3:]]
print("\nWords with higher frequencies: ", higher_word_frequencies)
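The hunk above sums the per-sentence counts into one frequency per word and then keeps the words whose frequency appears among the three largest values. A minimal standard-library sketch of the same idea follows. It is not the code from this commit: `STOP`, `word_counts`, `top_words`, and the sample sentences are hypothetical names and data, and the regular expression only approximates `CountVectorizer`'s default tokenization (words of two or more characters).

```python
from collections import Counter
import re

# Hypothetical stop-word list for illustration; the commit uses spaCy's STOP_WORDS.
STOP = {"the", "a", "is", "of", "and"}

def word_counts(sentences):
    """Count non-stop-word tokens of two or more characters across all sentences."""
    counts = Counter()
    for sentence in sentences:
        for token in re.findall(r"\w\w+", sentence.lower()):
            if token not in STOP:
                counts[token] += 1
    return counts

def top_words(counts, n=3):
    """Mirror `freq in val[-n:]`: keep every word whose count equals one of the n largest values."""
    top_values = sorted(counts.values())[-n:]
    return sorted(word for word, freq in counts.items() if freq in top_values)

# Assumed sample data, not taken from audioToText01.txt.
sentences = [
    "The lecture covers sorting and the lecture covers searching.",
    "Sorting is a topic of the lecture.",
    "Searching graphs is harder.",
]
counts = word_counts(sentences)
print(top_words(counts))
```

With this sample the three largest values are 2, 2 and 3, so four words are returned rather than three. The membership test selects by frequency value, which means ties can lengthen the list.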
...
@@ -27,6 +27,7 @@ higher_frequency = val[-1]
for word in word_frequency.keys():
    word_frequency[word] = (word_frequency[word]/higher_frequency)
#calculating sentence rank and taking top ranked sentences for the summary
sentence_rank = {}
for sent in doc.sents:
    for word in sent:
...
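The diff is cut off inside the sentence-ranking loop, so the rest of the file is not visible here. As a rough illustration of the technique the comment names, the sketch below normalizes counts by the highest count and scores each sentence as the sum of its words' normalized frequencies. Everything in it is an assumption: `normalize`, `rank_sentences`, `summarize`, the use of `heapq.nlargest`, and the sample data are hypothetical and may differ from what Summary.py actually does.

```python
import heapq
import re

def normalize(counts):
    """Divide each count by the largest count so the scores fall in (0, 1]."""
    highest = max(counts.values())
    return {word: freq / highest for word, freq in counts.items()}

def rank_sentences(sentences, weights):
    """Score a sentence as the sum of the normalized weights of its words."""
    ranks = {}
    for sentence in sentences:
        for token in re.findall(r"\w+", sentence.lower()):
            if token in weights:
                ranks[sentence] = ranks.get(sentence, 0.0) + weights[token]
    return ranks

def summarize(sentences, weights, n=1):
    """Return the n highest-ranked sentences, best first."""
    ranks = rank_sentences(sentences, weights)
    return heapq.nlargest(n, ranks, key=ranks.get)

# Assumed sample data for illustration only.
counts = {"lecture": 4, "sorting": 2, "graphs": 1}
weights = normalize(counts)
sentences = [
    "The lecture covers sorting.",
    "Graphs come later.",
    "The lecture ends.",
]
print(summarize(sentences, weights, n=2))
```

Summing weights favors longer sentences; dividing each score by the sentence's word count is one common adjustment, though nothing in the visible diff indicates whether the project does so.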