What is Cyberbullying?

Cyberbullying is any type of bullying or harassment carried out through an electronic medium.


Why is the cyberbullying trend worrisome?

  • A recent study by Child Rights and You, a non-governmental organisation, found that nearly 9.2% of 630 children surveyed in the Delhi–National Capital Region reported experiencing cyberbullying. [scroll.in]
  • Data released by the National Crime Records Bureau showed that cases of cyberstalking or bullying of women and children increased by 36%, from 542 in 2017 to 739 in 2018. [scroll.in]
  • 25 percent of students who are cyberbullied turn to self-harm to cope. [pandasecurity.com]
  • A separate study found that young adults who experience cyberbullying are twice as likely to self-harm and exhibit suicidal behaviour. [pandasecurity.com]


How can AI techniques help combat cyberbullying?

  • NLP techniques can be used to determine the tone of speech and detect specific sentiments such as bullying and hate speech.
  • NLP algorithms have an advantage over parental-control software and keyword-spotting blockers in that they can be trained to recognise subtle and sarcastic comments.
  • Using classical machine learning for NLP usually requires more time for training and for hand-crafting features; deep learning techniques can instead be used to improve accuracy.
  • Deep learning techniques such as RNNs, CNNs, and LSTMs are also useful because slurs and insults are often misspelled, intentionally or not, and such variants are better detected by deep learning models than by classical machine learning algorithms.
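To see why plain keyword spotting falls short, here is a toy sketch (the blocklist words and function name are made up for illustration). A verbatim word filter catches an exact insult but misses an intentionally misspelled one, which is exactly the gap trained models aim to close.

```python
# Toy illustration of the keyword-spotting limitation described above.
BLOCKLIST = {"idiot", "loser"}  # hypothetical blocked words

def keyword_flag(text: str) -> bool:
    # Flags a message only if a blocked word appears verbatim.
    return any(word in BLOCKLIST for word in text.lower().split())

print(keyword_flag("you are an idiot"))   # caught by the filter
print(keyword_flag("you are an id1ot"))   # missed: intentional misspelling
```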


Current work in industry:

Many platforms already use NLP and deep learning techniques to combat cyberbullying, for example:

  • In June 2016, Facebook introduced DeepText as “a deep learning-based text understanding engine that can understand with near-human accuracy the textual content of several thousand posts per second.”
  • Twitter also uses AI technology to spot spam and recognize negative interactions.


A case study to understand deep learning better:

Let us explore a sample case study to understand NLP with deep learning better. For this we have taken a Kaggle Twitter hate speech dataset.

This dataset contains around 30K training tweets labelled 1 or 0, where 1 corresponds to hate speech.


From the data distribution below we can see that only about 7% of the data is classified as hate comments. This imbalance would normally warrant data-balancing techniques; for the purpose of this case study, however, we'll continue with the current distribution.
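Checking the label distribution takes one line with pandas. The tiny DataFrame below stands in for the real Kaggle file (which in the notebook would be loaded with something like `pd.read_csv("train.csv")` and has `label` and `tweet` columns; both names are assumptions here):

```python
import pandas as pd

# Tiny stand-in for the ~30K-row Kaggle training file.
train = pd.DataFrame({
    "tweet": ["have a nice day", "you people are the worst", "love this"],
    "label": [0, 1, 0],
})

# Fraction of each class; on the full dataset label 1 is only ~7% of rows.
dist = train["label"].value_counts(normalize=True)
print(dist)
```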



Data Processing

Before proceeding we must clean the data and prepare it for the model.

Our text pre-processing will include the following steps:

  1. removing special characters
  2. converting all letters to lower case
  3. removing stop words
  4. lemmatization (lemmatization reduces each word to its dictionary base form, using surrounding context to determine the word's part of speech)
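The four steps above can be sketched as follows. This is a minimal stand-alone version: the stop-word set is a small sample, and the lemmatization step is only indicated in a comment, since the notebook itself would use NLTK's stop-word list and `WordNetLemmatizer` (an assumption on my part):

```python
import re

# Tiny sample stop-word set; a real pipeline would use a full list (e.g. NLTK's).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "and", "of"}

def clean(tweet: str) -> str:
    tweet = re.sub(r"[^a-zA-Z\s]", " ", tweet)           # 1. remove special characters
    tokens = tweet.lower().split()                       # 2. convert to lower case
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3. remove stop words
    # 4. lemmatization would go here, e.g. WordNetLemmatizer().lemmatize(t)
    return " ".join(tokens)

print(clean("The user @troll123 is SO rude!!!"))  # -> "user troll so rude"
```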


Let’s now make the train-test split (20% test).
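A minimal sketch of the split with scikit-learn (the variable names `texts` and `labels` are placeholders for the cleaned tweets and their 0/1 tags). Stratifying keeps the hate/non-hate ratio similar in both splits, which matters with a class this rare:

```python
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the cleaned tweets and labels.
texts = ["good day", "awful people", "nice work", "hate this", "great job",
         "so rude", "well done", "terrible take", "love it", "fine post"]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 0]

# 20% held out for testing; stratify preserves the class ratio in each split.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42)

print(len(X_train), len(X_test))
```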


Model Building

Let’s set the hyperparameters:


Tokenization is a method used to break raw text into smaller units called tokens (these can be words, sentences, characters, or subwords). These tokens help the model understand context and form the input for an NLP model.
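The idea can be written out by hand in a few lines. The notebook would use Keras utilities (`Tokenizer` and `pad_sequences`) for this, but the hand-rolled version below shows the same mechanics: build a word-to-integer vocabulary, map each tweet to a sequence of ids, and pad to a fixed length. `VOCAB_SIZE` and `MAX_LEN` are assumed hyperparameter names:

```python
VOCAB_SIZE = 10000   # keep only the most frequent words (unused in this tiny demo)
MAX_LEN = 5          # pad/truncate every tweet to this many tokens

corpus = ["you are great", "you are so rude", "great job"]

# Build a word -> integer index from the corpus (0 is reserved for padding).
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab) + 1)

def encode(sentence: str) -> list:
    # Map words to ids (0 for out-of-vocabulary), then right-pad with zeros.
    ids = [vocab.get(w, 0) for w in sentence.split()][:MAX_LEN]
    return ids + [0] * (MAX_LEN - len(ids))

print(encode("you are so great"))  # -> [1, 2, 4, 3, 0]
```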


LSTM Model

LSTMs are a special kind of RNN, capable of learning long-term dependencies. With an LSTM we can use a sequence of multiple words to identify which class a text belongs to, which is quite helpful for NLP.

We'll build a sequential model and add various layers to it:

  • The first layer is the Embedding layer, which uses 100-length vectors to represent each word. Word embeddings provide a dense representation of words and their relative meanings.
  • The second layer is an LSTM layer with 16 neurons.
  • The third layer is an LSTM layer with 6 neurons.
  • The Dense layer is the output layer, with 2 cells representing the 2 categories in this case. The activation function is sigmoid for binary classification.
  • Finally, we'll use the Adam optimizer and binary_crossentropy. Adam handles sparse gradients and noisy problems well, and binary_crossentropy is used as the loss function since this problem has binary outputs.

(We should experiment with different layers and hyperparameters to train the model and get the best result.)

A sample model with the above layers can be seen below:
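A minimal Keras sketch of those layers (layer sizes taken from the bullets above; `VOCAB_SIZE` and `MAX_LEN` are assumed values). Note that the first LSTM needs `return_sequences=True` so its full output sequence can feed the second LSTM:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

VOCAB_SIZE = 10000   # assumed tokenizer vocabulary size
MAX_LEN = 50         # assumed padded tweet length

model = Sequential([
    Embedding(VOCAB_SIZE, 100),       # 100-length word embedding vectors
    LSTM(16, return_sequences=True),  # first LSTM layer, 16 neurons
    LSTM(6),                          # second LSTM layer, 6 neurons
    Dense(2, activation="sigmoid"),   # output layer: 2 cells, sigmoid
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

model.build(input_shape=(None, MAX_LEN))
model.summary()
```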



Conclusion

As shown in this case study, we are able to achieve a decent accuracy of ~95% with the LSTM. With more hyperparameter tuning, or with a different neural network architecture, we could achieve even better results.

Deep learning is finding increased use in the field of NLP due to its versatility.

For the complete code, you can refer to my Kaggle notebook here.

cover image @freepik.com

 


DISCLAIMER

The information provided on this page has been procured through secondary sources. In case you would like to suggest any update, please write to us at support.ai@mail.nasscom.in