ScaNLP 2013 : Workshop on Scalability in Natural Language Processing
Link: https://sites.google.com/site/scanlp2013/

Call For Papers
****************************************************************
First Call for Papers

Workshop on Scalability in Natural Language Processing
https://sites.google.com/site/scanlp2013/

Full-day workshop in conjunction with RANLP 2013

Deadline: 3 July 2013, 23:59 Hawaii Time
****************************************************************

This workshop, held in conjunction with RANLP 2013, aims to introduce contemporary work and to discuss novel methods for natural language processing at a large scale, and to explore how the resulting technology and methods can be reused in applications both on the Web and in the physical world.

DESCRIPTION

For a processing approach to be scalable, it should be able to take on large volumes of data, work through them at high speed, and smoothly adapt to changes in these needs. We discuss this in the context of NLP, with particular focus on the core tasks of resource creation, discourse processing, and evaluation.

Now is a particularly important time to develop scalable methods in our field. Big data is here, and the benefits of getting through it effectively remain to be harvested by the pioneers. Huge datasets are becoming available: Google Books contains 155 billion tokens, over which only shallow surveys have been conducted; the new Common Crawl web corpus contains over 60 terabytes of text and metadata. But size alone is not a driver for scalable methods - the rapid text content creation we see every day presents masses of data we are not yet equipped to handle. For example, Twitter alone is responsible for 500 million microtexts every day; the publicly-visible Wordpress.org holds a part of the 2 million blog documents we create every 24 hours.

As well as big text data becoming prolific, demand for this data is also high. The fast, un-curated nature of microtext has been shown to be of value in stock valuation by multiple researchers. User location and movement analysis enables powerful search and analysis modes, such as computational journalism and personalisation. Sentiment detection informs corporations, governance and political activities. Media monitoring requires extracting and co-referring entities and events from thousands of outlets in real time. And finally, the emerging field of deep learning places but one core demand in all its guises: large amounts of data. The pressures from all these applications create a demand for NLP that can be done quickly and broadly.

There is more demand than ever for scalable natural language processing. Many organisations are interested in the potential results as big data becomes better defined and data-intensive approaches to computational linguistics reach production-level performance. Enormous quantities of data, from user input to news archives, are being mined using more powerful and computationally demanding techniques. The organisation, variety, integrity and public availability of the resulting resources will have a major impact on how we continue to do science. Newly introduced data-intensive approaches to computational linguistics continue to thrive on input volume; we need scalable technology to handle the next order of magnitude in corpus sizes and, given the nature of language, to continue data-intensive advances in our field.

============================================================================
TOPICS OF INTEREST

With regard to Scalable NLP, we aim to encourage discussion regarding three key areas of natural language processing: resource creation; processing of discourse; and evaluation:

-- General scalability issues
-- Application approaches
-- Performance limits
-- Flexible resource creation
-- Parallelising annotation
-- Handling huge corpora
-- Crowdsourcing for corpus creation
-- Decomposing resource creation tasks
-- Rapid or realtime annotation quality assessment
-- Running NLP in the cloud
-- Privacy issues
-- NLP application optimisation / parallelisation
-- Scalable machine learning for NLP
-- High performance computing for NLP
-- Rapid evaluation
-- On-line learning for NLP
-- Reinforcement learning
-- Iterative and ensemble learning
-- Hypothesis generation

In addition to the invited talk and presentations, the workshop will include a 30-minute hands-on demonstration slot with participants doing NLP in the cloud using GATECloud, possibly including social media processing using GATE TwitIE (supported and funded by the organisers).

============================================================================
IMPORTANT DATES

Submission deadline: 5 July 2013
Notification of acceptance: 2 August 2013
Camera-ready copies due: 16 August 2013
Workshop date: 12/13 September 2013

============================================================================
SUBMISSION

Submission is via EasyChair: https://www.easychair.org/conferences/?conf=scanlp2013

All submissions must be in PDF format and must follow the RANLP template (http://lml.bas.bg/ranlp2013/submissions.php#styles).

Multiple submission policy: We welcome papers that are under review for other venues, but, in the event of multiple acceptances, authors are requested to notify us and to choose as soon as possible the meeting at which to present and publish the work - we cannot accept for publication or presentation work that will be (or has been) published elsewhere.

Reviewing: Reviewing will be blind. No information identifying the authors should be in the paper: this includes not only the authors' names and affiliations, but also self-references that reveal authors' identities; for example, "We have previously shown (Smith 1999)" should be changed to "Smith (1999) has previously shown".

Paper length and presentation: We invite long (8 pages) and short (4 pages) papers. Accepted short papers will be presented either as short oral presentations or as posters.

============================================================================
ORGANIZERS

Leon Derczynski, University of Sheffield, UK
Kalina Bontcheva, University of Sheffield, UK
Bin Yang, Aarhus University, Denmark
Valentin Tablan, University of Sheffield, UK
Arno Scharl, MODUL University Vienna, Austria
Thierry Declerck, DFKI, Germany

============================================================================
PROGRAMME COMMITTEE:

Galia Angelova, Bulgarian Academy of Sciences, Bulgaria
Srikanta Bedathur, Indraprastha Institute of Information Technology, India
Kai-wei Chang, University of Illinois Urbana-Champaign, USA
Freddy Chong-Tat Chua, Singapore Management University, Singapore
Hamish Cunningham, University of Sheffield, UK
David Martins de Matos, L2F INESC ID, Portugal
Ted Dunning, MapR Technologies, USA
Chris Dyer, Carnegie Mellon University, USA
Rainer Gemulla, Max Planck Institut für Informatik, Germany
Amit Goyal, University of Maryland, USA
Christian S. Jensen, Aarhus University, Denmark
Vinh Ngoc Khuc, Ohio State University, USA
Oleksandr Kolomiyets, KU Leuven, Belgium
Hector Llorens, Nuance, Spain
Barry Norton, Ontotext, UK
Miles Osborne, University of Edinburgh, UK
Weining Qian, East China Normal University, China
Alan Ritter, University of Washington, USA
Matthew Rowe, Lancaster University, UK
Marta Sabou, MODUL University Vienna, Austria
Sina Samangooei, University of Southampton, UK
Sebastian Schelter, TU Berlin / Apache Software Foundation, Germany
Darius Sidlauskas, Aarhus University, Denmark
Marc Spaniol, Max Planck Institut für Informatik, Germany
Andreas Vlachos, University of Cambridge, UK

============================================================================
SUPPORT

The ScaNLP workshop is partially supported by GATE, the EU FP7 projects TrendMiner (http://www.trendminer-project.eu/) and AnnoMarket (https://annomarket.eu/), and the CHIST-ERA uComp (http://www.ucomp.eu/) project.