A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures

Abstract

Text mining is an important field in information retrieval: it organizes the large number of text documents available on the internet to make retrieval easier and more efficient. Text classification automatically assigns a category to a new or unseen document based on the content of the document itself. In text classification, text preprocessing is a fundamental step toward obtaining better results, and Arabic text processing depends on stemming algorithms to achieve high accuracy. This research compares two stemming algorithms, a stem-based approach (Snowball light stemmer) and a root-based approach (Shereen Khoja stemmer), using three similarity measures: Euclidean distance, cosine similarity, and Pearson correlation distance. The research uses an Arabic Wikipedia dataset and TF-IDF as the weighting scheme to construct the vector space model that represents the weights of the selected text features. For evaluation, the research applies overall accuracy, average recall, average precision, and average F1 measure to assess the classified text documents. The document collection is divided into training and test documents in three experiments, with training/test splits of 85%/15%, 80%/20%, and 90%/10%, respectively. The results show that the overall accuracy of the Shereen Khoja stemmer is better than that of the Snowball stemmer in all experiments, except for cosine similarity in the first experiment and Euclidean distance in the third experiment, where the Snowball stemmer achieves better accuracy.
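
To make the vector space model and the three similarity measures named in the abstract concrete, the following is a minimal Python sketch, not the study's implementation. It assumes documents are already stemmed and tokenised, and it assumes a plain TF-IDF variant (raw term count times log(N/df)), since the abstract does not state which variant was used. The Pearson correlation distance is taken here as one minus the Pearson correlation coefficient, which is a common definition but also an assumption. The function names and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build TF-IDF vectors for a list of tokenised documents.

    Assumption: tf is the raw term count and idf is log(N / df).
    Returns the sorted vocabulary and one weight list per document.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pearson_distance(u, v):
    """One minus the Pearson correlation coefficient (assumed definition).

    The correlation is computed as the cosine similarity of the
    mean-centred vectors.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centred_u = [a - mean_u for a in u]
    centred_v = [b - mean_v for b in v]
    return 1.0 - cosine_similarity(centred_u, centred_v)


if __name__ == "__main__":
    # Hypothetical, already-stemmed token lists standing in for documents.
    docs = [["ktb", "drs", "ktb"], ["ktb", "elm"], ["ryd", "lab"]]
    vocab, vectors = tfidf_vectors(docs)
    print(vocab)
    print(euclidean_distance(vectors[0], vectors[1]))
    print(cosine_similarity(vectors[0], vectors[1]))
    print(pearson_distance(vectors[0], vectors[1]))
```

Note that Euclidean and Pearson distances are smaller for more similar documents, while cosine similarity is larger, so a classifier built on these measures has to rank neighbours in the matching direction.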

APA

Ali, M (2021). A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures. Afribary. Retrieved from https://afribary.com/works/a-comparative-study-for-two-stemming-algorithms-for-arabic-wikipedia-documents-classification-based-on-similarity-measures

MLA 8th

Ali, Mohamed "A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures" Afribary. Afribary, 19 May. 2021, https://afribary.com/works/a-comparative-study-for-two-stemming-algorithms-for-arabic-wikipedia-documents-classification-based-on-similarity-measures. Accessed 02 May. 2024.

MLA7

Ali, Mohamed. "A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures". Afribary, Afribary, 19 May. 2021. Web. 02 May. 2024. <https://afribary.com/works/a-comparative-study-for-two-stemming-algorithms-for-arabic-wikipedia-documents-classification-based-on-similarity-measures>.

Chicago

Ali, Mohamed. "A Comparative Study for Two Stemming Algorithms for Arabic Wikipedia Documents Classification Based on Similarity Measures" Afribary (2021). Accessed May 02, 2024. https://afribary.com/works/a-comparative-study-for-two-stemming-algorithms-for-arabic-wikipedia-documents-classification-based-on-similarity-measures