Arabic Sentiment Analysis: For Arabic Tweets
Abstract. This term paper introduces an Arabic social sentiment analysis dataset collected with a Twitter crawler. It consists of 2,000 labelled tweets (1,000 positive and 1,000 negative) on mixed topics, namely politics and the arts. The tweets contain opinions written in both Modern Standard Arabic (MSA) and the Jordanian dialect, and each selected tweet conveys some kind of feeling (positive or negative).
The paper also proposes a methodology for Arabic social sentiment analysis of tweets that uses natural language processing (NLP) and three supervised machine learning approaches: Support Vector Machines (SVMs), Naïve Bayes (NB), and K-Nearest Neighbor (KNN).
1. Introduction:
The evolution of Web 2.0 technology has produced a huge amount of raw data by allowing users to post their comments, reviews, and opinions on the World Wide Web. The amount of data is increasing rapidly, particularly with the use of social media applications such as Myspace, Tumblr, Pinterest, Instagram, and Twitter. Twitter is a very popular social platform that allows users to share their emotions and actions and to take part in trending discussions. Processing this huge amount of data and extracting knowledge from it can be a daunting task. Many kinds of important information can be extracted from users' tweets, such as events, services, trends, viral news, product reviews, and opinions on particular issues.
2. Related Work
Identifying user sentiment in tweets is a relatively recent task in natural language processing. It has received considerable attention lately because of the growth of social media applications and of their user base. Only a few Arabic sentiment datasets have been gathered. Abdul-Mageed et al. (2014) presented the SAMAR system, which performs subjectivity and sentiment analysis for Arabic social media; they collected their dataset from different domains, such as Wikipedia talk pages, Twitter, and Arabic forums. Aly and Atiya (2013) presented LABR, a dataset based on book reviews extracted from Goodreads. Rushdi-Saleh et al. (2011) proposed an Arabic corpus of reviews of more than 400 movies, collected from various websites.
3. Methodology:
There are many sentiment classification approaches. Machine learning approaches include Bayesian networks, Naïve Bayes classification, maximum entropy, neural networks, and Support Vector Machines. Lexicon-based approaches include the dictionary-based approach, the novel machine learning approach, the corpus-based approach, and ensemble approaches. Each method has its own advantages and limitations. The advantage of machine learning is the ability to adapt and to train models for specific purposes and contexts; its limitation is that a trained model often does not carry over to new data. The advantage of the lexicon-based method is its wide coverage of terms; its limitation is the finite number of words in the lexicon. In this paper, NB, KNN, and SVM are used.
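As a rough illustration of two of the three classifiers used in this paper, the sketch below implements a multinomial Naïve Bayes classifier with add-one smoothing and a similarity-weighted k-nearest-neighbor classifier in plain Python. It is a minimal sketch and not the implementation used in this study: the whitespace tokenizer, the Jaccard similarity measure, the names `NaiveBayes` and `knn_predict`, and the four toy tweets are all assumptions made for the example, and the tweets are invented rather than taken from the dataset. An SVM is not sketched here, since it would normally come from a machine learning library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split on whitespace (assumption: no Arabic-specific normalization)."""
    return text.split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def knn_predict(train_texts, train_labels, text, k=3):
    """k-NN over token sets; each neighbor votes with its Jaccard similarity."""
    query = set(tokenize(text))
    scored = []
    for doc, label in zip(train_texts, train_labels):
        tokens = set(tokenize(doc))
        union = query | tokens
        sim = len(query & tokens) / len(union) if union else 0.0
        scored.append((sim, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Invented toy tweets for illustration only (not from the dataset).
texts = ["الفيلم رائع جدا", "أحببت هذا العرض", "الخدمة سيئة جدا", "هذا القرار سيء"]
labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("الفيلم رائع"))
print(knn_predict(texts, labels, "الخدمة سيئة"))
```

With only four training examples the predictions are driven entirely by word overlap, which also illustrates the limitation noted above: a model trained on one small sample has little to say about tweets whose vocabulary it has never seen.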