# Learn Joy - Function 3

## Table of Contents
[Function 3 Development](#function-3-development)
  * [1. About Function 3](#1-About-Function-3)
  * [2. Function 3 processing & ML Model Development](#3-Function-3-processing-&-ML-Model-Development-folder)
  * [3. Service Files](#3-Service-Files-folder)
  * [4. Fast API & app file](#4-Fast-API-&-app-file-folder)

-------------------------------------------------
## Function 3 Development

### 1. About Function 3

- This function includes three main parts:
  - 1. Speech to text: convert the speech of children with dyslexia to text.
  - 2. Accuracy: report the accuracy of the pronunciation as a percentage.
  - 3. Text to speech: convert the missing or mispronounced words to speech.


### 2. Function 3 processing & ML Model Development ([Link](http://gitlab.sliit.lk/tmp-2023-24-059/learnjoy-ml/-/blob/IT20643072/lj_function03.ipynb))

- Inside this folder there is one Jupyter notebook dedicated to data preprocessing and model training:

  - `lj_function03.ipynb`: This notebook loads the pretrained speech-to-text and text-to-speech models and returns the accuracy score.

### 3. Service Files ()
- This directory contains one Python (`.py`) file:

  - `lj_functiono3.py`: This service file includes three main functions (speech to text, scoring, text to speech).
    - Input: sound file and text. `def speech_to_text(audio_file)`, `def scoring(words, transcriptions)`, `def text_to_speech(text, return_tensors="pt")`
    - Attached models:
      - Speech to text ([Link](https://drive.google.com/drive/folders/1jvP4lyhkyLbhtv0fUohLHaeQOnjGsttH?usp=drive_link))
        - License: `Apache License` ([Link](https://github.com/huggingface/transformers/blob/3cefac1d974db5e2825a0cb2b842883a628be7a0/src/transformers/models/wav2vec2/processing_wav2vec2.py))
      - Text to speech ([Link](https://drive.google.com/drive/folders/1pWtfsLg4IyvPTjKC-a-0-PEIWbZTyYE_?usp=drive_link))
        - License: `MIT License` ([Link](https://github.com/microsoft/SpeechT5?tab=MIT-1-ov-file#readme))
    - Process: The service takes the recorded speech, converts it to text, compares the transcription with the original text, and returns the pronunciation accuracy together with speech audio for the missing words.
    - Output: sentence and word scores, the missing words, and audio of the missing words.
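The comparison step described above can be sketched as follows. This is a minimal illustration, not the code in `lj_functiono3.py`: the helper names `normalize` and `score_words` are hypothetical, and the word-level rule (an expected word counts as correct if it appears anywhere in the transcription) is an assumption rather than the service's actual scoring method.

```python
import string
from typing import List, Tuple


def normalize(text: str) -> List[str]:
    """Lowercase the text, strip ASCII punctuation, and split it into words."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table).split()


def score_words(expected: str, transcription: str) -> Tuple[float, List[str]]:
    """Return (percentage of expected words heard, list of missing words).

    Assumption: a word is "heard" if it occurs anywhere in the transcription;
    word order and repetition are ignored.
    """
    expected_words = normalize(expected)
    heard = set(normalize(transcription))
    if not expected_words:
        return 0.0, []
    missing = [word for word in expected_words if word not in heard]
    matched = len(expected_words) - len(missing)
    return 100.0 * matched / len(expected_words), missing


# Example: one of six expected words ("sat") is not in the transcription.
score, missing = score_words("The cat sat on the mat.", "the cat sit on the mat")
print(round(score, 2), missing)
```

In a pipeline like the one described, the `missing` list would then be passed to the text-to-speech step so the child can hear the words that were mispronounced.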

---------------------

### 4. Fast API & app file ()
- The `app.py` module includes one FastAPI endpoint for Function 3:
  - 1. Request body: original text (`words`) and audio file (`audio_file`)
      ```python
      @app.post("/function3/STT")
      async def main_fun03(words: str, audio_file: UploadFile = File(...)):
      ```

    - Request URL: `https://Public_or_local_host.app`
    - Response body: 
      ```json
      {
        "Scoring": {
          "final_sent_score": 100,
          "final_word_score": 100,
          "missing_voice2": []
        },
        "audio_file": {
          "path": "C:\\Users\\KAUSH\\AppData\\Local\\Temp\\tmpefzwhc1x.wav",
          "status_code": 200,
          "filename": "speech.wav",
          "media_type": "audio/wav",
          "background": null,
          "raw_headers": [
            [
              "content-type",
              "audio/wav"
            ],
            [
              "content-disposition",
              "attachment; filename=\"speech.wav\""
            ]
          ],
          "_headers": {
            "content-type": "audio/wav",
            "content-disposition": "attachment; filename=\"speech.wav\""
          },
          "stat_result": null
        }
      }
      ```
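
A client-side sketch for the endpoint above, using only the Python standard library. It assumes that `words` is sent as a query parameter (FastAPI's default for a plain `str` argument) and that the response has the shape shown in the example. `build_request_url` and `summarize_response` are hypothetical helper names, and the host is a placeholder.

```python
import json
from urllib.parse import urlencode


def build_request_url(base_url: str, words: str) -> str:
    """Build the POST URL, passing the original text as the `words` query parameter."""
    return base_url.rstrip("/") + "/function3/STT?" + urlencode({"words": words})


def summarize_response(body: str) -> dict:
    """Pull the score fields out of a JSON response shaped like the example above."""
    payload = json.loads(body)
    scoring = payload.get("Scoring", {})
    audio = payload.get("audio_file") or {}
    return {
        "sentence_score": scoring.get("final_sent_score"),
        "word_score": scoring.get("final_word_score"),
        "missing_voice2": scoring.get("missing_voice2", []),
        "audio_filename": audio.get("filename"),
    }


# Placeholder host: replace with the public or local address of the service.
url = build_request_url("http://localhost:8000", "the cat sat")

# Trimmed copy of the example response, used here instead of a live call.
example = (
    '{"Scoring": {"final_sent_score": 100, "final_word_score": 100, '
    '"missing_voice2": []}, "audio_file": {"filename": "speech.wav"}}'
)
summary = summarize_response(example)
print(url)
print(summary)
```

The audio itself would be uploaded in the same POST request as a multipart form field named `audio_file`, using whichever HTTP client the caller prefers.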