Update README.md

54305b85 · Harshana Supun Buddhika Abeysinghe · 3cbcaa6f · 54305b85
Commit 54305b85 authored Oct 07, 2022 by Harshana Supun Buddhika Abeysinghe

Showing with 170 additions and 0 deletions

README.md README.md +170 -0

--- a/README.md
+++ b/README.md
 # 22_23-J 53
+
+**Main Objective**
+
+Our main goal in this project is to detect and analyze social media users who may be experiencing depression or suicidal thoughts, so that they can be directed to treatment. The most reliable way to detect depression and suicidal thoughts is through patient-to-doctor therapy sessions, but these sessions can be expensive and time consuming. Our idea is therefore to detect these conditions from a person's social media posts, pictures, videos, and similar content.
+There is a lot of research on this topic, but most of it uses only text classification. Nowadays people post not only text on social media but also photos, videos, and voice recordings. We therefore expect the system to be more accurate if it uses both text and multimedia analysis to detect depression or suicidal thoughts. Finding a way to detect these conditions through social media is important.
+Text analysis is the most common method used in this type of research. Here we try to extract useful information from user-posted text to identify depression. Specifically, because of the success of the CNN (Convolutional Neural Network) deep learning model in text understanding, we use a CNN together with an LSTM (Long Short-Term Memory) network, trained on a text data set about people with depression. By combining these networks, we expect a more accurate output than from either network alone.
+For the visual analysis part, the structure will be designed with a CNN (Convolutional Neural Network, a type of artificial neural network that is widely used for image/object recognition). For that, we first separate the data set into two parts and then train the model. As data we use images such as profile pictures and images posted by the users. The CNN extracts complex features from the images and classifies them according to whether the user has depression or not.
+
+We have two solutions for the visual analysis part. In the second, the structure combines a CNN (Convolutional Neural Network, a type of artificial neural network that is widely used for image/object recognition) and an LSTM (Long Short-Term Memory, a variety of recurrent neural network (RNN) used in deep learning that is capable of learning long-term dependencies, especially in sequence prediction problems). As before, we first separate the data set into two parts and then train the model, using images such as profile pictures and images posted by the users. Here the CNN extracts complex features from the images and the LSTM performs the classification. We will choose the better of these two solutions for the visual analysis.
+As mentioned above, our main goal is to implement a system that uses both visual and textual features to detect these conditions. As our final step, we will therefore take the results of both the visual and the textual classification and decide whether the user suffers from the conditions mentioned above.
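The final step above, combining the visual and textual results, can be sketched as a simple late fusion of two per-user probabilities. This is only an illustration under stated assumptions: the README does not say how the two results are combined, so the weighted average, the `text_weight` parameter, the `threshold` value, and the function name `fuse_predictions` are all hypothetical.

```python
def fuse_predictions(text_prob, visual_prob, text_weight=0.5, threshold=0.5):
    """Combine a text-based and an image-based probability into one decision.

    Assumption: both classifiers output a probability in [0, 1] that the
    user has the condition. The weighted average is one possible fusion
    rule, not necessarily the one the project will use.
    """
    for p in (text_prob, visual_prob):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    if not 0.0 <= text_weight <= 1.0:
        raise ValueError("text_weight must lie in [0, 1]")

    # Weighted average of the two classifier outputs.
    fused = text_weight * text_prob + (1.0 - text_weight) * visual_prob
    # Flag the user when the fused probability reaches the threshold.
    return fused, fused >= threshold


fused, flagged = fuse_predictions(text_prob=0.8, visual_prob=0.4)
print(f"fused probability: {fused:.2f}, flagged: {flagged}")
```

Keeping the weight as a parameter makes it easy to compare text-only (`text_weight=1.0`), image-only (`text_weight=0.0`), and combined decisions on the same users.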
+
+
+**Research Problem**
+
+
+Depression, as described in the World Health Organization's Comprehensive Mental Health Action Plan 2013-2020 [1], is one of the most common mental disorders. More than 300 million people worldwide suffer from chronic depression. In the modern world people are under a lot of stress for many reasons, for example personal problems, problems at work, or problems in the community or environment they live in. Because of a lack of knowledge and awareness of these conditions, most cases are not diagnosed at an early stage, which can sometimes lead to suicide. If we detect these conditions early, they can be treated [2]. However, more than 70% of people with early depression go untreated and their condition worsens, because most people lack medical knowledge and do not realize the risks of the disease, or are ashamed of it and do not seek medical treatment. In clinical diagnosis, which is the most effective way to diagnose depression, psychologists usually refer to standard diagnostic guidelines, such as the PHQ-2 [3] and PHQ-9 [4] tests (Patient Health Questionnaire), and conduct face-to-face interviews.
+
+Social media occupies an important part of people's daily lives, and through it they share a lot of multimedia content. Since most people use social media these days, it is a good source of information about a person's feelings and mental state. Depression can vary from one person to another, so it is important to identify all of its forms in order to treat them.
+
+
+References
+
+[1] Who.int. 2022. Doing What Matters in Times of Stress. [online] Available at: <https://www.who.int/publications-detail-redirect/9789240003927> [Accessed 10 July 2022].
+
+[2] Marks, M. Artificial Intelligence Based Suicide Prediction. Yale J. Health Policy Law Ethics 2019, Forthcoming.
+
+[3] Hiv.uw.edu. 2022. Patient Health Questionnaire-2 (PHQ-2) - Mental Disorders Screening - National HIV Curriculum. [online] Available at: <https://www.hiv.uw.edu/page/mental-health-screening/phq-2> [Accessed 11 July 2022].
+
+[4] Hiv.uw.edu. 2022. Patient Health Questionnaire-9 (PHQ-9) - Mental Disorders Screening - National HIV Curriculum. [online] Available at: <https://www.hiv.uw.edu/page/mental-health-screening/phq-9> [Accessed 11 July 2022].
+
+**Individual Objectives**
+
+Main Objective:
+Detection of depression and suicidal ideation through social media
+
+Sub Objective 1: Detect depression through text data
+Use a text data set about people with depression, together with CNN and LSTM neural network techniques, for detection.
+
+Sub Objective 2: Detect depression through visual data
+Use a data set of images. We use the CNN algorithm and hope to combine it with the LSTM neural network technique as a hybrid model.
+
+Sub Objective 3: Detect suicidal ideation through text data
+Use a text data set to identify individuals with suicidal ideation, again with CNN and LSTM neural network techniques.
+
+Sub Objective 4: Detect suicidal ideation through visual data
+To increase the accuracy of the model, we hope to use the CNN algorithm and the LSTM neural network technique as a hybrid model to identify suicidal ideation in a data set of images.
+
+
+
+Member 1
+Abeysinghe M.G.H.S.B.: Depression detection through visual data analysis
+Our system offers several features that show how depression detection works and what it finds. We first show a preview of pictures uploaded by some example users; the sample includes both users with depression and other users. The online detection function then processes the selected user's photographs with the trained model to obtain a prediction. Besides recognizing depression, our system can automatically generate depression analysis reports that show, graphically and quantitatively, the criteria used to categorize a user as having depression or not. Such a report helps both specialists and users to understand the user's psychological expression.
+The analysis report illustrates the predicted likelihood for the input user. Using each Reddit post and any associated images posted by the input user in chronological order, we also plot a likelihood curve, which shows the likelihood and trend of the input user having depression. Analyses like these can be used to prevent problems. Because of space restrictions, we keep only one or two of the photographs that each user posts. We find that models using picture attributes alone cannot accurately detect these users, which suggests that combining visual and text features, as in the proposed approach, helps to identify depressed users.
+My role in this project is to analyze visual data (images) to detect whether a user has depression or not. I will use a CNN to extract complex features from the images and to classify them according to whether the user has depression or not.
+Implementation of CNN network:
+For this project we separate our data set into two main parts. The profile images and posted images of the users whose Reddit posts are used for text feature extraction are collected as image set A for visual feature extraction. A set of profile images and posted images from other users, with and without depression, is built as set B for classifier training.
+For all users with depression, a set of their related images, C, can be acquired, and for users without depression a set of images, D, is also collected. We then train a deep convolutional neural network by fine-tuning an ImageNet pre-trained network on the C and D data. Image set A is then fed into this binary classifier to generate user visual features: the feature vectors from the fully connected layer of the trained CNN classifier are extracted as the deep visual feature representation.
+For the classification part, the data from set B will be divided into two parts, which will be used to train and test the classifier. (We might have to collect this data set after these users have been classified as having depression or not through textual classification.)
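Dividing set B into a training part and a testing part could look like the sketch below. It is a minimal illustration, not the project's actual pipeline: the 80/20 ratio, the fixed seed, the file names, and the helper name `split_train_test` are assumptions, and the split is stratified so that both classes appear in both parts (this assumes each class has at least two samples).

```python
import random


def split_train_test(samples, test_fraction=0.2, seed=0):
    """Split (image_path, label) pairs into train and test lists.

    label is 1 for a user with depression and 0 otherwise (assumed encoding).
    The split is done per label so that the class balance is preserved.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Group the samples by their label.
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)

    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Hold out a fraction of each class, at least one sample.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Hypothetical file names standing in for the images of set B.
samples = [(f"user_{i}.jpg", i % 2) for i in range(20)]
train, test = split_train_test(samples)
print(len(train), "training images,", len(test), "test images")
```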
+
+Because I have little practical experience with this kind of project, I have decided to try a second approach as well, and in the end I will implement whichever method is more accurate. The second approach completes the visual analysis using a combination of CNN and LSTM networks: the CNN extracts complex features from the images and the LSTM classifies them.
+Implementation of CNN network:
+To extract visual features, we first collect user-posted images and profile pictures in a single data set called A. Whether a user has depression does not matter for this data.
+
+For all users with depression, a set of their related images, B, can be acquired, and for users without depression a set of images, C, is also collected. We then train a deep convolutional neural network by fine-tuning an ImageNet pre-trained network on the B and C data. Image set A is then fed into this binary classifier to generate user visual features: the feature vectors from the fully connected layer of the trained CNN classifier are extracted as the deep visual feature representation. The main idea behind a CNN is that it can obtain local features from high layer inputs and transfer them to lower layers for more complex features.
+Implementation of LSTM Network:	
+As mentioned previously, I will extract complex features from the photos using the CNN and classify the images using the LSTM network. I will implement the LSTM network once the CNN is finished.
+•	First, we will split the data into two separate data frames, one for training and the other for validation.
+•	There are four main ways to use an LSTM in a model. Since we are predicting future values from a sequence of values, we are creating a many-to-many prediction model and need a slightly more complex encoder-decoder configuration. Both the encoder and the decoder are hidden LSTM layers, with information passed from one to the other via a repeat vector layer.
+•	The repeat vector is needed when the input and output sequences have different lengths. It ensures that the decoder layer receives input of the right shape.
+•	We will add a Bidirectional wrapper to the LSTM layers. It lets the model process the sequence in both directions, which sometimes produces better results.
+•	We also need a Time Distributed wrapper on the output layer to predict outputs for each timestep individually.
+•	Finally, we are going to use unscaled data in this project, because a model trained on it has produced better results than one trained with scaled data.
+
+In this study, a combined method was developed to detect people with depression using their profile pictures and posted images. The architecture combines CNN and LSTM networks: the CNN extracts complex features from the images, and the LSTM is used as a classifier. After analysing the characteristics of the user images, the architecture sorts the images through a fully connected layer to predict whether the user has depression or not.
+
+
+
+
+Member 2
+Madurapperuma P.L.: Suicidal ideation detection through visual data analysis
+Classification is a well-recognized way to detect suicidal ideation from visual data. In classification, we divide the data set into a training set and a testing set. The training set is used to train the model, and the testing set is used to check the accuracy of the trained model. The data set includes two types of people, and for each person we use one or two images that he or she has posted on social media, such as the profile picture and one other posted image. The two types of people are:
+1. People who are not suffering from suicidal ideation.
+2. People who are suffering from suicidal ideation.
+Using the suicidal ideation analysis report that the system generates automatically, we hope to give the reasons why a person has been classified under one of the above categories, along with other points that are important to the relevant parties.
+Regarding accuracy, we realized that using a combination of images and text to detect suicidal ideation produces a more accurate result than using images alone. To detect suicidal ideation from visual data, I am planning to use CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory) networks.
+A CNN is a supervised deep learning model that is most often used in image recognition. It works by taking an image, assigning weights to the different objects in the image, and then distinguishing them from each other. CNNs are specifically designed to process pixel data and, compared with other deep learning algorithms, need very little pre-processing.
+Training a neural network typically consists of two phases:
+1. Forward phase
+2. Backward phase
+In the forward phase, the input is passed completely through the network. In the backward phase, gradients are backpropagated (backprop) and the weights are updated.
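The two training phases can be illustrated on the smallest possible network, a single sigmoid neuron trained with cross-entropy loss. This is a deliberately simplified stand-in for a CNN, meant only to show where the forward and backward phases sit in a training step; the learning rate, the toy input, and the function names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, inputs, target, learning_rate=0.1):
    """Run one forward and one backward phase for a single sigmoid neuron."""
    # Forward phase: pass the input through the network and measure the loss.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    prediction = sigmoid(z)
    loss = -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))

    # Backward phase: for sigmoid with cross-entropy, dLoss/dz is
    # (prediction - target); propagate it to each weight and update.
    error = prediction - target
    new_weights = [w - learning_rate * error * x
                   for w, x in zip(weights, inputs)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias, loss


# Toy example: one input vector whose correct label is 1.
weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(50):
    weights, bias, loss = train_step(weights, bias, [1.0, 2.0], target=1)
    losses.append(loss)
print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

In a real CNN the same two phases apply, but the backward phase propagates gradients through every convolutional and fully connected layer rather than a single neuron.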
+We hope to achieve the face recognition part with a CNN, a sub-field of deep learning, which is a multi-layer network trained to perform a specific task using classification. The network is first trained on pictures from the face database and is then used to identify the face pictures given to it under the given conditions.
+
+
+Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems. This behavior is required in complex problem domains such as machine translation and speech recognition. The LSTM algorithm is well suited to categorizing, analyzing, and predicting time series of uncertain duration.
+We train LSTM networks to classify suicidal ideation, and to test and evaluate the performance of the networks on this task we use images from the face database, which includes profile pictures and user-posted images. For face or non-face classification, we select sample face and non-face templates from the training set. The networks are trained to output 1 if the input indicates suicidal ideation and 0 otherwise. We stop training the LSTM network once it reaches the expected accuracy. We identified that an LSTM can learn the classes properly even with one sample of each class and a reduced feature set.
+
+In the implementation, to predict the output I hope to use either a CNN alone or a combination of CNN and LSTM networks as a hybrid model, and to choose the better model according to the output that each produces separately.
+
+
+Member 3
+Perera M.R.R.: Suicidal ideation detection through textual content
+We try to extract valuable information for suicide diagnosis from user-posted text data. After performing the most suitable preprocessing, we train a model using a Convolutional Neural Network (CNN) and long short-term memory (LSTM) to recognize text posts that mention suicide.
+Text posts about suicide on social media come from people with suicidal thoughts, who occasionally express their emotions in these posts. In these posts we will be able to observe phrases or words that fall into these categories.
+We are attempting to show that LSTM outperforms other machine learning classifiers. We hope that by combining LSTM and CNNs, we will be able to develop a hybrid model that can more accurately identify depression, benefiting people all over the world.
+This research proposes a smart and context-aware deep learning framework based on CNN to effectively identify mental-health-related problems from social media user posts with improved classification accuracy. This study combines various sources of data for an effective analysis of suicide-related data. We used a knowledge distillation scheme to transfer knowledge from a large pretrained CNN to a smaller model, and we examined suicide-related data using LSTM. The results show that our proposed system accurately handles mixed data and improves the performance of mental health classification.
+
+(i)	A new framework is presented to extract a large volume of highly relevant suicide-related data from Reddit. In addition, we implemented a combined cyber-community-group-based labeling and keyword-based data crawling technique, based on the circumplex model of emotion, to identify the desired mental health problem data.
+
+(ii)	A deep neural network-based bidirectional text representation model, that is, CNN, is used to represent mental health textual data while maintaining contextual and semantic connotations. In addition, we proposed a sequence processing model called long short-term memory (LSTM) as a classifier, which effectively maximizes the amount of information accessible to the network by giving the algorithm access to the words that immediately follow and precede a given word in a sentence.
+
+(iii)	We propose a knowledge distillation technique, which is a means of transferring knowledge from a large pretrained model (CNN) to a smaller model to maximize performance and accuracy. We distill the large network (CNN) into a much smaller network for identifying mental-health-related problems, and it performs very well by transferring the required domain knowledge and applying it to a specific healthcare environment.
+
+(iv)	We conducted extensive experiments using principal component analysis (PCA) and different deep learning/ML models, the results of which are compared with other related models. This evaluation plays a key role in addressing the shortcomings of previously applied methods and classification models. The experimental results show that our model performs considerably better than the compared methods and, after extensive hyperparameter optimization, provides adequate accuracy, so that we can assess the mental health situation of the person.
+
+
+ 
+
+Member 4
+Rangana P.W.M.: Depression detection through textual content
+We attempt to acquire useful information from user-posted text data for depression detection. Specifically, we perform the most suitable preprocessing and train the CNN model to identify text posts about depression. Combining the CNN with an LSTM should then give results with a high level of accuracy.
+
+Which text posts are we going to use?
+We are going to look for key words that express feelings such as those listed below, since people sometimes express their feelings through text posts on social media.
+| Physical | Emotional and mental |
+| --- | --- |
+| Tiredness or low energy, even when rested | Persistent sadness, anxiousness or irritability |
+| Restlessness or difficulty concentrating | Loss of interest in friends and activities that they normally enjoy |
+| Difficulty in carrying out daily activities | Withdrawal from others and loneliness |
+| Changes in appetite or sleep patterns | Feelings of worthlessness, hopelessness or guilt |
+| Aches or pains that have no obvious cause | Taking risks they wouldn’t normally take |
+
+We are trying to show that LSTM performs better than other machine learning classifiers. We hope that by merging LSTM and CNNs we can create a hybrid model that identifies depression more accurately, benefiting people all across the world.
+Step 01 - Select the data set (a Reddit social media data set, where users express their opinions via text posts)
+Step 02 - Preprocess the data set
+
+Data collected from the social media platform contain errors or useless text, which makes semantic analysis difficult. First, as the data set we are using is free from emojis, there is no need for emoji processing. Second, stop words are removed using NLTK in Python, which provides a list of stop words. Stemming is then applied to the remaining words, creating stems by removing the suffixes or prefixes attached to each word. In this study, we use a Snowball stemmer, which differs from a Porter stemmer in that it supports stemming in multiple languages.
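The order of the preprocessing steps (tokenize, remove stop words, stem) can be sketched in plain Python as below. Note that this is not the NLTK pipeline described above: the tiny `STOP_WORDS` set and the `naive_stem` suffix stripper are simplified, hypothetical stand-ins for NLTK's stop word list and Snowball stemmer, used here only so that the example runs without any downloads.

```python
import re

# Stand-in for NLTK's English stop word list (assumption: a tiny subset).
STOP_WORDS = {"a", "an", "and", "the", "i", "is", "am", "are",
              "to", "of", "in", "it", "my", "so"}

# Stand-in for a Snowball stemmer: strip one common suffix, nothing more.
SUFFIXES = ("ing", "ness", "ed", "ly", "s")


def naive_stem(word):
    """Remove the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if suffix == "s" and word.endswith("ss"):
            continue  # keep words such as "worthless" intact
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("I am feeling worthless and the nights are lonely")
print(tokens)
```

A real stemmer applies many more rules than this, so the stems it produces will differ from the ones shown here.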
+
+
+Step 03 - Divide the data set into a training data set and a testing data set
+Step 04 - Train the CNN and LSTM as a hybrid model
+
+CNNs can identify informative phrases in a sentence regardless of their position. Recurrent neural networks can obtain context information, but the order of words leads to bias. Text analysis based on a convolutional neural network (CNN) can obtain important features of the text through pooling, but it has difficulty obtaining contextual information, which an LSTM can provide. Combining a CNN with an LSTM should therefore give results with a high level of accuracy.
+
+Step 05 - Implement a user-friendly system. We show the user whether he or she has depression or not, and the output of our trained model is used to indicate the level of his or her depression.