OSACT 2016 : The 2nd Workshop on Arabic Corpora and Processing Tools (2016 Theme: Social Media)
Link: http://www.kacstac.org.sa/osact2/index.html

Call For Papers
In the NLP and CL communities, Arabic is considered relatively resource-poor compared to English. This situation was thought to be the reason for the limited number of corpus-based studies in Arabic. However, recent years have witnessed the emergence of new, largely free Modern Standard Arabic (MSA) corpora and, to a lesser extent, Arabic processing tools. Over the past few years, the use of Arabic in social media has increased dramatically, leading to an abundance of Arabic content that is formal or informal, MSA or dialectal, and written in Arabic script or Arabizi. Other phenomena include the use of emoticons, abbreviated words, decorations, etc. Despite the abundance of such content, there is a severe shortage of annotated corpora and processing tools tailored to it.
Available Arabic corpora can be divided into two groups. The first group contains large Arabic texts that are designed and constructed primarily for Arabic linguistic and NLP research, and that can be useful for a variety of tasks such as language modeling. These corpora are diverse in the genres they cover, and their sizes range from one million words to billions of words. The second group contains corpora that were designed primarily for specific Arabic NLP tasks such as text classification, clustering, POS tagging, etc., and they typically contain annotations at the clitic, word, sentence, paragraph, or document level. Most of the currently available corpora in this group are composed of newspaper articles and range in size from tens of thousands of words to millions of words.

Annotated corpora derived from social media continue to be limited, and processing tools for such corpora are lacking. The required tools include corpus exploration tools that provide word/stem frequencies, concordances, collocations, etc., and processing tools for tasks such as tokenization, normalization, word segmentation, morphological analysis, and part-of-speech tagging. Having proper exploration and processing tools can open the door to a variety of applications such as machine translation, opinion mining, text classification, and a variety of social applications.

Topics of interest
This half-day workshop aims to encourage researchers and developers to foster the utilization of freely available Arabic corpora, including social media corpora, and open source Arabic language processing tools, to help highlight the drawbacks of these resources, and to discuss techniques and approaches for improving them. The workshop topics include but are not limited to:

Corpora:
- Surveying and critiquing the design of freely available Arabic corpora, their associated tools, and standalone Arabic corpus processing tools.
- Making new annotated corpora available for NLP applications such as named entity recognition, machine translation, part-of-speech tagging, sentiment analysis, text classification, and language learning.
- Evaluating the use of crowdsourcing platforms (e.g., Mechanical Turk, CrowdFlower) for Arabic data annotation.

Tools and Technologies:
- Language education, e.g., L1 and L2.
- Language modeling and word embeddings.
- Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, parsing, and diacritization.
- Sentiment analysis, dialect identification, and text classification.
- Dialect translation.

Social Applications:
- Trend analysis and opinion mining.
- Measuring polarization and opinion shift.
- Religious and ideological discourse.

Important Dates
Submission deadline: 10 February 2016
Notification of acceptance: 10 March 2016
Final submission of manuscripts: 21 March 2016
Workshop date: Tuesday, 24 May 2016 (Morning session)

Submissions
The language of the workshop is English, and submissions should follow the LREC 2016 paper submission instructions. All papers will be peer reviewed, possibly by three independent referees. Papers must be submitted electronically in PDF format to the START system. When submitting a paper from the START page, authors will be asked to provide essential information about resources (in a broad sense, i.e. also technologies, standards, evaluation kits, etc.) that have been used for the work described in the paper or are a new result of their research. Moreover, ELRA encourages all LREC authors to share the described LRs (data, tools, services, etc.) to enable their reuse and the replicability of experiments, including evaluation ones.
Distinguished papers, after further revisions, will be considered for publication in a special issue of the Journal of King Saud University - Computer and Information Sciences: http://ees.elsevier.com/jksu-cis/default.asp
Note: The LREC Proceedings have been accepted for inclusion in the Thomson Reuters Conference Proceedings Citation Index (CPCI).

Organising Committee
- Hend Al-Khalifa, King Saud University, KSA
- Abdulmohsen Al-Thubaity, King Abdul Aziz City for Science and Technology, KSA
- Walid Magdy, Qatar Computing Research Institute, Qatar
- Kareem Darwish, Qatar Computing Research Institute, Qatar

Program Committee
- Abdulrhman Almuhareb, KACST, KSA
- Abdullah Alfaifi, Imam University, KSA
- Abeer ALDayel, King Saud University, KSA
- Areeb AlOwisheq, Imam University, KSA
- Auhood Alfaries, King Saud University, KSA
- Hamdy Mubarak, Qatar Computing Research Institute, Qatar
- Hazem Hajj, American University of Beirut, Lebanon
- Hind Al-Otaibi, King Saud University, KSA
- Houda Bouamor, Carnegie Mellon University, Qatar
- Kemal Oflazer, Carnegie Mellon University, Qatar
- Khurshid Ahmad, Trinity College Dublin, Ireland
- Maha Alrabiah, Imam University, KSA
- Mohammad Alkanhal, KACST, KSA
- Mohsen Rashwan, Cairo University, Egypt
- Mona Diab, George Washington University, US
- Muhammad M. Abdul-Mageed, Indiana University, US
- Nizar Habash, New York University Abu Dhabi, UAE
- Nora Al-Twairesh, King Saud University, KSA
- Nouf Al-Shenaifi, King Saud University, KSA
- Stephan Vogel, Qatar Computing Research Institute, Qatar
- Tamer Elsayed, Qatar University, Qatar
- Wajdi Zaghouani, Carnegie Mellon University in Qatar, Qatar