Template Credit: Adapted from a template made available by Dr. Jason Brownlee of Machine Learning Mastery.
SUMMARY: This project aims to construct a text classification model using a neural network and document the end-to-end steps using a template. The IMDB Movie Sentiment dataset presents a binary classification problem, where we attempt to predict one of two possible outcomes.
INTRODUCTION: This dataset contains 50,000 movie reviews extracted from IMDB. The researchers have annotated the reviews with labels (0 = negative, 1 = positive) to indicate each review’s sentiment.
In iteration Take1, we created a bag-of-words model to perform binary classification (positive or negative) for the reviews. Because of memory capacity constraints, the Part A script focused on building the model with the training and validation datasets. Part B focused on testing the model with the training and test datasets.
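To illustrate what a bag-of-words representation looks like, here is a minimal sketch in plain Python. It is not the Take1 script: the helper names (build_vocabulary, to_bag_of_words), the whitespace tokenization, and the two sample reviews are assumptions made for illustration only.

```python
from collections import Counter


def build_vocabulary(texts, max_words=1000):
    """Map the most frequent tokens to column indices (hypothetical helper)."""
    counts = Counter(token for text in texts for token in text.lower().split())
    return {word: idx for idx, (word, _) in enumerate(counts.most_common(max_words))}


def to_bag_of_words(text, vocabulary):
    """Return a count vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocabulary)
    for token in text.lower().split():
        idx = vocabulary.get(token)
        if idx is not None:
            vector[idx] += 1
    return vector


# Made-up sample reviews, not rows from the IMDB dataset.
reviews = ["a great great movie", "a dull movie"]
vocab = build_vocabulary(reviews)
vectors = [to_bag_of_words(review, vocab) for review in reviews]
```

Each review becomes a fixed-length vector of word counts, which a dense neural network can take as input. Word order is discarded, which is one motivation for trying a word-embedding model next.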
In this Take2 iteration, we will create a word-embedding model to perform binary classification for the reviews.
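As a rough sketch of the idea behind a word-embedding classifier, the plain-Python example below maps tokens to integer IDs, looks up a vector for each ID, averages those vectors, and passes the result through a single sigmoid unit. This is not the project's TensorFlow code. The tiny vocabulary, the embedding size, the average-pooling architecture, and the random (untrained) weights are all assumptions for illustration; in a real model the embeddings and weights are learned during training.

```python
import math
import random


def encode(text, vocabulary, max_len=6):
    """Convert text to a fixed-length list of IDs (0 = padding, 1 = unknown)."""
    ids = [vocabulary.get(token, 1) for token in text.lower().split()][:max_len]
    return ids + [0] * (max_len - len(ids))


def predict(ids, embeddings, weights, bias):
    """Average the embeddings of non-padding tokens, then apply a sigmoid unit."""
    rows = [embeddings[i] for i in ids if i != 0]
    dim = len(weights)
    if rows:
        pooled = [sum(row[d] for row in rows) / len(rows) for d in range(dim)]
    else:
        pooled = [0.0] * dim
    score = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical vocabulary: IDs 0 and 1 are reserved for padding and unknown tokens.
vocabulary = {"great": 2, "dull": 3, "movie": 4}
embedding_dim = 4

random.seed(0)  # untrained, random parameters purely for demonstration
embeddings = [[random.uniform(-1, 1) for _ in range(embedding_dim)] for _ in range(5)]
weights = [random.uniform(-1, 1) for _ in range(embedding_dim)]
bias = 0.0

ids = encode("A great movie", vocabulary)
probability = predict(ids, embeddings, weights, bias)  # value between 0 and 1
```

Unlike the bag-of-words vector, the input here is a sequence of word IDs, and the dense vectors attached to each ID are parameters the model can adjust while training.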
ANALYSIS: In iteration Take1, the preliminary model achieved an accuracy score of 88.80% on the validation dataset after ten epochs. Furthermore, the final model processed the test dataset with an accuracy of 89.48%.
In this Take2 iteration, the preliminary model achieved an average accuracy score of 88.40% on the validation dataset after ten epochs. Furthermore, the final model processed the test dataset with an accuracy of 89.66%.
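To put the two iterations side by side, the short sketch below computes the change in accuracy from Take1 to Take2 in percentage points. The accuracy figures are the ones reported above; the dictionary layout and the delta helper are just illustrative choices.

```python
# Accuracy figures (in percent) as reported in the analysis above.
results = {
    "Take1": {"validation": 88.80, "test": 89.48},
    "Take2": {"validation": 88.40, "test": 89.66},
}


def delta(split):
    """Change from Take1 to Take2 in percentage points for the given split."""
    return round(results["Take2"][split] - results["Take1"][split], 2)


validation_delta = delta("validation")  # -0.4
test_delta = delta("test")              # 0.18
```

The word-embedding model scored 0.40 points lower on validation and 0.18 points higher on the test dataset, so the two approaches performed at a similar level on this dataset.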
CONCLUSION: In this iteration, the word-embedding TensorFlow model appeared to be suitable for modeling this dataset. We should consider experimenting with TensorFlow for further modeling.
Dataset Used: IMDB Movie Sentiment
Dataset ML Model: Binary class text classification with text-oriented features
Dataset Reference: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format
One potential source of performance benchmarks: https://www.kaggle.com/columbine/imdb-dataset-sentiment-analysis-in-csv-format
The HTML-formatted report can be found on GitHub.