posted by organizer: iwan

OSACT3 2018 : The 3rd Workshop on Open-Source Arabic Corpora and Processing Tools


Link: http://edinburghnlp.inf.ed.ac.uk/workshops/OSACT3/
 
When May 7, 2018 - May 12, 2018
Where Miyazaki (Japan)
Submission Deadline Jan 15, 2018
Notification Due Feb 15, 2018
Final Version Due Feb 25, 2018
Categories    NLP   computational linguistics   arabic nlp   natural language processing
 

Call For Papers

Given the success of the first and second workshops on Open-Source Arabic Corpora and Corpora Processing Tools (OSACT) at LREC 2014 and LREC 2016, the third workshop aims to encourage researchers and practitioners of Arabic language technologies, including computational linguistics (CL), natural language processing (NLP), and information retrieval (IR), to share and discuss their research efforts, corpora, and tools. The workshop will also pay special attention to the wide variety of initiatives for the creation, use, and evaluation of Arabic as a type of Asian Language Resources and Technologies, which is one of the LREC 2018 hot topics. In addition to the general topics of CL, NLP, and IR, the workshop will place special emphasis on a new Arabic Data Challenge track.

Data Challenge Track
This year, we are introducing ArabicWeb16, a new Web dataset that is suitable for many research projects. ArabicWeb16 is a public Web crawl of 150M Arabic Web pages, crawled over the month of January 2016, with high coverage of dialectal Arabic (about 21%) as well as Modern Standard Arabic (MSA). One goal of the workshop is to define shared challenges using this dataset. We encourage submissions describing experiments for research tasks on the dataset. These include (but are not limited to) training word embeddings, deduplication, cross-dialect search, question answering, dialect detection, knowledge-base population, entity search, blog search, text classification, and spam detection. Further details, including instructions on how to obtain the dataset, can be found at https://sites.google.com/view/arabicweb16.

Topics of interest

* Corpora

Surveying and critiquing the design of available Arabic corpora and their associated processing tools.
Making new annotated corpora available for NLP and IR applications such as named entity recognition, machine translation, sentiment analysis, text classification, and language learning.
Evaluating the use of crowdsourcing platforms for Arabic data annotation.

* Tools and Technologies

Language education, e.g., L1 and L2.
Language modeling and word embeddings.
Tokenization, normalization, word segmentation, morphological analysis, part-of-speech tagging, etc.
Sentiment analysis, dialect identification, and text classification.
Dialect translation.

* ArabicWeb16 Data Challenge

Language modeling, word embeddings.
Dialect detection, cross-dialect search.
Entity search, blog search, deduplication, spam detection.
Question answering, knowledge-base population.
Text classification.
