Frontiers in Linguistically Annotated Corpora 2006

A Merged Workshop with

7th International Workshop on Linguistically Interpreted Corpora (LINC-2006)


Frontiers in Corpus Annotation III

Coling/ACL 2006

Sydney Convention and Exhibition Centre

Sydney, Australia

July 22, 2006


Large linguistically interpreted corpora play an increasingly important role in machine learning, evaluation, psycholinguistics and theoretical linguistics. Many research groups are engaged in creating corpus resources annotated with morphological, syntactic, semantic, discourse and other linguistic information for a variety of languages. In the tradition of previous LINC and Frontiers workshops, we aim to bring these activities together in order to identify and disseminate best practice in the development and use of linguistically interpreted corpora.


The goals of the workshop are two-fold: (1) to exchange and propagate research results on the annotation, conversion and exploitation of corpora, taking into account different applications and theoretical investigations in language technology and linguistic research; and (2) to work towards a consensus on issues crucial to the advancement of corpus annotation. In particular, we would like to focus on questions such as:

  • How can a system developer take advantage of the multitude of annotation efforts with completely different underlying assumptions, annotation schemata, etc.?
  • How might one merge different annotations of the same data into a single unified representation?
  • How can closely related schemes be applied across languages?

Working Groups

There will be two invited "working group" presentations. Each working group consists of researchers with the express purpose of laying out the dimensions of a crucial problem facing the field of corpus annotation, particularly problems involving merging annotation and extending annotation to new languages, genres and modalities. There are currently two working groups:

  • Annotation Compatibility: A roadmap of the compatibility of current annotation schemes with each other.
  • Low-density Languages: A discussion of low-density languages and the problems associated with them.

We will attempt to lay out clearly and precisely the assumptions on such topics held by members of the annotation community. In doing so, we hope both to (1) lay the foundations for the meaningful integration of annotation resources and (2) assess the limitations of integrated approaches. See here for the progress of each working group.

Student Award

Václav Novák was chosen as the recipient of the Innovative Student Annotation Award. Congratulations, Václav!

Target Audience

Anyone interested in creating or using existing and future annotated corpora. This includes annotators, lexicographers, system developers and those designing evaluation tasks for the NLP community.


09.00 - 09.10 Opening remarks
09.10 - 09.30 Challenges for annotating images for sense disambiguation
Cecilia Ovesdotter Alm, Nicolas Loeff and David A. Forsyth
09.30 - 10.00 A Semi-Automatic Method for Annotating a Biomedical Proposition Bank
Wen-Chi Chou, Richard Tzong-Han Tsai, Ying-Shan Su, Wei Ku, Ting-Yi Sung and Wen-Lian Hsu
10.00 - 10.30 How and Where do People Fail with Time: Temporal Reference Mapping Annotation by Chinese and English Bilinguals
Yang Ye and Steven Abney
10.30 - 11.00 Coffee Break
11.00 - 11.30 Probing the space of grammatical variation: induction of cross-lingual grammatical constraints from treebanks
Felice Dell'Orletta, Alessandro Lenci, Simonetta Montemagni and Vito Pirrelli
11.30 - 12.00 Low-density Languages Working Group presentation
12.00 - 12.30 Annotation Compatibility Working Group presentation
12.30 - 14.00 Lunch
14.00 - 14.30 Manual Annotation of Opinion Categories in Meetings
Swapna Somasundaran, Janyce Wiebe, Paul Hoffmann and Diane Litman
14.30 - 15.00 The Hinoki Sensebank — A Large-Scale Word Sense Tagged Corpus of Japanese —
Takaaki Tanaka, Francis Bond and Sanae Fujita
15.00 - 15.30 Issues in Synchronizing the English Treebank and PropBank
Olga Babko-Malaya, Ann Bies, Ann Taylor, Szuting Yi, Martha Palmer, Mitch Marcus, Seth Kulick and Libin Shen
15.30 - 16.00 Coffee Break
16.00 - 16.30 On Distance between Deep Syntax and Semantic Representation
Václav Novák
16.30 - 17.30 Discussion
17.30 - 17.40 Closing Remarks
Alternate Papers
Corpus annotation by generation
Elke Teich, John A. Bateman and Richard Eckart
Constructing an English Valency Lexicon
Jiri Semecky and Silvie Cinkova


All papers will be presented in English.

Workshop Chairs

Adam Meyers
New York University, USA

Shigeko Nariyama
University of Melbourne, Australia

Timothy Baldwin
University of Melbourne, Australia

Francis Bond
NTT Communication Science Laboratories, Japan

Programme Committee

Lars Ahrenberg (Linköpings Universitet)
Kathy Baker (U.S. Dept. of Defense)
Steven Bird (University of Melbourne)
Alex Chengyu Fang (City University of Hong Kong)
David Farwell (Computing Research Laboratory, New Mexico State University)
Chuck Fillmore (International Computer Science Institute, Berkeley)
Anette Frank (DFKI)
John Fry (SRI International)
Eva Hajicova (Center for Computational Linguistics, Charles University, Prague)
Erhard W. Hinrichs (University of Tübingen)
Ed Hovy (Information Sciences Institute)
Baden Hughes (University of Melbourne)
Emi Izumi (NICT)
Tsai Jia-Lin (Tung Nan Institute of Technology)
Aravind Joshi (University of Pennsylvania, Philadelphia)
Sergei Nirenburg (University of Maryland, Baltimore County)
Stephan Oepen (University of Oslo)
Boyan A. Onyshkevych (U.S. Dept. of Defense)
Kyonghee Paik (KLI)
Martha Palmer (University of Colorado)
Gerald Penn (University of Toronto)
Manfred Pinkal (DFKI)
Massimo Poesio (University of Essex)
James Pustejovsky (Brandeis University)
Owen Rambow (Columbia University)
Peter Rossen Skadhauge (Copenhagen Business School)
Beth Sundheim (SPAWAR Systems Center)
Janyce Wiebe (University of Pittsburgh)
Nianwen Xue (University of Pennsylvania)
