Improving Text Classification using Deep Neural Networks and Word Embeddings


Natural Language Processing is a flourishing area of Data Science in both academia and industry. This paper examines sentiment classification on text data using automated machine learning approaches. It delves into recently introduced embedding techniques and machine learning model pipelines. The models explored in this paper consist of a simple Naive Bayes classifier as well as several deep neural models, including a Gated Recurrent Unit (GRU) and a Long Short-Term Memory (LSTM) architecture. Several different forms of numerical representations of words (word embeddings) are introduced to supplement the neural models in an attempt to build a solid sentiment predictor.
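The Naive Bayes baseline described above can be sketched as a short pipeline. This is a minimal, illustrative sketch using scikit-learn; the toy corpus and labels below are assumptions for demonstration, not the paper's dataset.

```python
# Hedged sketch of a Naive Bayes sentiment baseline: bag-of-words
# counts feeding a multinomial Naive Bayes classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (assumed): 1 = positive sentiment, 0 = negative.
texts = [
    "a wonderful, uplifting film",
    "great acting and a great story",
    "dull, boring, and far too long",
    "a terrible waste of time",
]
labels = [1, 1, 0, 0]

# CountVectorizer turns each review into token counts, which
# MultinomialNB uses to estimate per-class word likelihoods.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

print(model.predict(["a great and wonderful story"])[0])
```

The neural models in the paper replace the count features with learned word embeddings, but the surrounding fit/predict workflow is analogous.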

Overall, the neural methods outperformed Naive Bayes. Furthermore, the neural methods performed very similarly to one another; the word embeddings and suggested feature extraction yielded very little gain, leading to the conclusion that, in the case of these experiments, a simple model would be preferred. Future work toward a stronger model could optimize hyperparameters with grid search, or acquire a larger variety of data for transfer learning.
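The grid search suggested as future work can be sketched with scikit-learn's GridSearchCV. The corpus, labels, and parameter grid below are illustrative assumptions, not values from the paper's experiments.

```python
# Hedged sketch of hyperparameter tuning via grid search over a
# text-classification pipeline (vectorizer n-gram range and the
# Naive Bayes smoothing parameter alpha).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy data (assumed): 1 = positive sentiment, 0 = negative.
texts = [
    "a wonderful, uplifting film", "great acting and a great story",
    "loved every minute of it", "an instant classic",
    "dull, boring, and far too long", "a terrible waste of time",
    "poorly written and badly acted", "i want those two hours back",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([("vect", CountVectorizer()), ("clf", MultinomialNB())])

# Cross-validated search over a small, assumed parameter grid.
grid = GridSearchCV(
    pipeline,
    param_grid={
        "vect__ngram_range": [(1, 1), (1, 2)],
        "clf__alpha": [0.1, 1.0],
    },
    cv=2,
)
grid.fit(texts, labels)
print(grid.best_params_)
```

The same pattern extends to neural models, where the grid would instead cover choices such as embedding dimension, hidden size, or learning rate.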
