22_23-J 25
Commit 6ea46ecd
authored Feb 03, 2023 by Ranodya M.J.C IT19987644

add law_ai_qna.py file

parent d9c8009c

Showing 1 changed file with 31 additions and 0 deletions

backend/law_ai_qna.py
0 → 100644
import torch
import numpy as np
import pandas as pd
from torch.utils.data import Dataset
from transformers import TrainingArguments, Trainer
from sklearn.model_selection import train_test_split
from transformers import OpenAIGPTTokenizer, OpenAIGPTModel

data_path = 'data/qna-summarization.xlsx'
df = pd.read_excel(data_path)
Answers = df['Answer'].tolist()
Question = df['Question'].tolist()

tokenizer = OpenAIGPTTokenizer.from_pretrained('openai-gpt')
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
model = OpenAIGPTModel.from_pretrained('openai-gpt')

question_encoding = tokenizer(Question, return_tensors='pt', padding=True, truncation=True)
answer_encoding = tokenizer(Answers, return_tensors='pt', padding=True, truncation=True)

class QnADataset(Dataset):
    def __init__(self, question_encoding, answer_encoding):
        self.question_encoding = question_encoding
        self.answer_encoding = answer_encoding

    def __getitem__(self, idx):
        return self.question_encoding[idx], self.answer_encoding[idx]

    def __len__(self):
        return len(self.question_encoding)

dataset = QnADataset(question_encoding, answer_encoding)
\ No newline at end of file
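
Review note on `QnADataset`: the tokenizer output is a dict-like object keyed by field name (for example `input_ids`, `attention_mask`). Because of that, `len(self.question_encoding)` most likely counts fields rather than samples, and integer indexing with `[idx]` may not return one sample. The sketch below shows one possible per-sample layout. It makes several assumptions: the class name `PairedEncodingDataset`, the `question_`/`answer_` key prefixes, and the toy token ids are all hypothetical and not part of the commit. It is written without `torch` so it runs on its own. PyTorch's `DataLoader` accepts any object that implements `__getitem__` and `__len__`.

```python
class PairedEncodingDataset:
    """Map-style dataset over two dict-like encodings (hypothetical helper)."""

    def __init__(self, question_encoding, answer_encoding):
        self.question_encoding = question_encoding
        self.answer_encoding = answer_encoding

    def __len__(self):
        # Count samples through one field instead of len() of the mapping,
        # which would give the number of fields.
        return len(self.question_encoding["input_ids"])

    def __getitem__(self, idx):
        # Slice every field at the same sample index and merge both sides
        # into one flat dict, prefixing keys so they do not collide.
        item = {f"question_{k}": v[idx] for k, v in self.question_encoding.items()}
        item.update({f"answer_{k}": v[idx] for k, v in self.answer_encoding.items()})
        return item


# Toy stand-ins for tokenizer output (made-up token ids, not real data).
toy_questions = {
    "input_ids": [[5, 6, 0], [7, 8, 9]],
    "attention_mask": [[1, 1, 0], [1, 1, 1]],
}
toy_answers = {
    "input_ids": [[1, 2], [3, 0]],
    "attention_mask": [[1, 1], [1, 0]],
}
toy_dataset = PairedEncodingDataset(toy_questions, toy_answers)
```

The same indexing works when the values are tensors rather than lists, since `v[idx]` then selects one row. Separately, because the commit adds a `[PAD]` token to the tokenizer, the model's embedding matrix probably needs to grow to match, which `model.resize_token_embeddings(len(tokenizer))` is meant for.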