An Automatic Question Answering System For The Arabic Quran

Abstract

This thesis investigates question answering systems in general and then applies them to a specific domain: the Arabic language and the Holy Quran. A corpus of questions and answers from the Albagarah and Alfatihah chapters was built and stored in three file formats: an Excel worksheet, text files, and a database table. The corpus was collected from many resources and validated by Islamic scholars from Gabrah college. The thesis contains six chapters. Chapter one defines the problem statement, objectives, motivation, and research methodology. Chapter two gives a general introduction to, and history of, question answering systems, natural language processing, and corpora. Chapter three reviews related work on Arabic language processing, the Holy Quran, corpora, and question answering systems. Chapter four covers the special characteristics and challenges of processing the Arabic language of the Holy Quran. Chapter five presents the methodology, methods, and experiments. Six sets of experiments were carried out. The first uses a baseline NLP question answering system that removes stop words, diacritics, and special symbols. The second uses Lucene indexing and the تفاعيل pattern. The third uses indexing and the فياعيل, فعاعيل, and تفاعيل patterns. The fourth uses a dynamic corpus built from real user questions. The fifth uses exaggeration formula patterns. The sixth uses singular, dual, and plural patterns. Finally, the results are presented and discussed. The experiments showed that removing stop words and diacritics improved the search results, and that the new patterns added value to the question answering system, improving both recall and precision.
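As an illustration of the baseline preprocessing step named in the first experiment (removing stop words, diacritics, and special symbols), a minimal Python sketch might look like the following. It is not the thesis implementation: the function names `strip_marks` and `normalize`, the tiny `STOPWORDS` set, and the choice to treat every Unicode combining mark as a diacritic are all assumptions made for this example.

```python
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, treated here as noise

# Illustrative stop-word list only (an assumption, not the list used in the thesis).
STOPWORDS = {"من", "في", "على", "إلى", "عن", "ما", "هو", "هي"}


def strip_marks(text):
    """Drop combining marks (Unicode categories Mn/Mc/Me) and tatweel.

    Arabic diacritics such as fatha, damma, kasra, shadda, and sukun
    are combining marks, so this removes them and keeps the base letters.
    """
    return "".join(
        ch
        for ch in text
        if not unicodedata.category(ch).startswith("M") and ch != TATWEEL
    )


def normalize(text, stopwords=STOPWORDS):
    """Return the tokens of `text` without diacritics, symbols, or stop words.

    Any character that is not a letter (punctuation, digits, verse
    markers, and so on) is replaced by a space before tokenizing.
    """
    bare = strip_marks(text)
    cleaned = "".join(
        ch if unicodedata.category(ch).startswith("L") else " " for ch in bare
    )
    return [tok for tok in cleaned.split() if tok not in stopwords]


if __name__ == "__main__":
    print(normalize("مَا هُوَ الصِّرَاطُ الْمُسْتَقِيمُ؟"))
```

Applying the same normalization to both the user question and the indexed text is what would let a diacritized verse match an undiacritized query; a real system would need a much fuller stop-word list than the one assumed here.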