Associate Professor and ARC Future Fellow
Dept of Computing and Information Systems
The University of Melbourne
Victoria 3010, Australia
| Office: | Room 8.21, Level 8, Doug McDonell Building |
| Tel: | +61 3 8344 1363 |
| Fax: | +61 3 9349 4596 |
Prospective research students/interns, please read this before contacting
me.
Major Projects
Present
- Information access through web-scale question-answer pair
finding, ranking and matching (ARC Future Fellowship,
2013–2016)
- Principles, Practice, and Pragmatics of Measurement in Experimental
Computer Science (ARC Discovery Project, 2011–2013)
- Talking about Place — Tapping Human Knowledge to Enrich
National Spatial Data Sets (ARC Linkage Project,
2011–2014)
- NICTA
biomedical text mining (with Cavedon, Zobel, Moffat et al.)
Past
- OLE (ARC Discovery Project, with Bird: Online Linguistic Exploration: Deeper,
Faster, Broader Language Documentation, 2009–2011)
- Kubadji (ARC Discovery Project, with
Zukerman, Sonenberg, Balbo and Bird: Personalised Content Delivery
for Assisted Navigation of Information Rich, Physical Environments such
as a Museum, 2007–2010)
- Web-scale Language Identification: All Languages Great and Small
(Google Research Award, 2008–2009)
- Multilingual Unsupervised Parse Selection (Microsoft Research Asia Research
Award, 2009–2010)
- Web User Forum Text Analysis (Microsoft Research Asia Research
Award, 2008–2009)
- Information Delivery from Segmented Textual Data Streams (ARC
Discovery Project: 2006–2008)
- Scalable Language Understanding for Japanese (joint research project
with NTT Communication Science Labs., 2006–2008)
- Interactive Information Discovery and Delivery (NICTA project, with
Cavedon, Stokes, Bird, Moffat, et al., 2005–2007)
- An Intelligent Search Infrastructure for Language Resources on the
Web (ARC e-Research Special Research Initiative, with Bird and Hughes, 2006)
- Feature-rich Word Sense Disambiguation and Unknown Word
Bootstrapping (joint research project with NTT Communication Science
Labs., 2004–2006)
Publications
See my publications page for a
reasonably up-to-date list of my papers (with links to most of
them). My
Google Scholar profile is also a reasonably accurate snapshot of my
publication output.
Resources
Online systems
- FOKS: an intelligent dictionary
interface for Japanese, intended to help learners of Japanese look up
unknown words without having to dust off their kanji dictionary
[developed in collaboration with Lars Yencken and Slaven Bilac]
- Kanji Tester: an adaptive
Japanese learning environment, specifically targeted at those swotting
for the JLPT 3 and 4 exams [developed in collaboration with Lars Yencken]
- SimSearch: a visual kanji
search interface, based on similarity with known kanji [developed in
collaboration with Lars Yencken]
- ILIAD Linux Search:
Troubleshoot Linux problems via our in-house search engine
Software
- On-line
Topic Modeller: implementation of an on-line topic modeller for trend
analysis [developed in collaboration with Jey Han Lau]
- langid.py:
fast, accurate standalone language identification toolkit [developed in
collaboration with Marco Lui]
- SiteScraper:
automatically scrapes data from websites based on a handful of sample
URLs and strings of interest [developed in collaboration with Richard
Penman and David Martinez]
- Hydrat:
Python library for text categorisation/language identification
[developed in collaboration with Marco Lui]
- Malay tokeniser/lemmatiser:
lex/perl tools for tokenising and lemmatising Malay text
Datasets
- Lexical
normalisation dictionary (described in Han et al., 2012)
- Multi-domain
language identification dataset (from Lui and Baldwin, 2011)
- Topic
label dataset (described in Lau et al., 2011)
- Lexical
normalisation dataset (described in Han and Baldwin, 2011)
- Multilingual
language identification dataset (as used in the ALTA-2010
Shared Task, and described in Baldwin
and Lui, 2010)
- Web
user forum thread and post structure dataset (described in Kim et al., 2010 and
Wang
et al., 2010)
- Topic
coherence topics and human judgements (described in Newman et al., 2012)
- Language
identification dataset (described in Baldwin and Lui, 2010)
- Case
and punctuation restoration dataset (described in Baldwin and Joseph, 2009)
- Satire
document collection (described in Burfoot and
Baldwin, 2009)
- Tagalog
predicate-argument parsing dataset (described in Mistica and
Baldwin, 2009)
- Pooled kanji
similarity dataset (described in Yencken and Baldwin,
2008)
- Noun-noun
compound semantic relations (described in Kim and
Baldwin, 2008)
- Compound
nominalisation interpretation (described in Nicholson
and Baldwin, 2008)
- Deep
lexical acquisition of English verb-particle constructions (described
in Baldwin,
2008)
- Parsing and WSD dataset (described in Agirre et
al., 2008); email me for access details
- Kanji
similarity dataset (described in Yencken
and Baldwin, 2006)
- Japanese
grapheme-phoneme alignment data (described in Baldwin and
Tanaka, 1999)
Miscellaneous
Teaching
Present
- COMP10001 Foundations of Computing (Semester 1, 2013)
Past
Staff
Present
- Paul Cook (McKenzie Postdoctoral Fellow)
- Yvette Graham (Research Fellow)
Past
- Rebecca Dridan (Research Fellow working on OLE 2009–2011)
- Gintarė Grigonytė (Visiting Research Fellow 2011)
- Su Nam Kim
(Research Fellow working on LangID and ILIAD 2009–2010)
- Patrick Ye (Research Fellow working on Kubadji 2009–2010)
- David Martinez (Research Fellow working on ILIAD 2007–2009)
- Marco Lui (Research Assistant working on ILIAD and LangID 2009–2010)
- Richard Penman (Research Assistant working on ILIAD 2008–2009)
- Shlomo Berkovsky
(Research Fellow 2007–2008)
- Kapil Gupta (Research Fellow 2009)
Students
Present
- Jim Breen (PhD
student; co-supervised with Francis Bond)
- Clint Burford (PhD student; co-supervised with Steven Bird)
- Andrew Chester (MSc(CS) student; co-supervised with Tony Wirth)
- Richard Fothergill (PhD student)
- Spandana Gella (MSc(CS) student; co-supervised with Paul Cook)
- Bo Han (PhD student; co-supervised with Paul Cook)
- Jey Han Lau (PhD student; co-supervised with Dave Newman)
- Ned Letcher (MPhil student; co-supervised with Emily Bender)
- Marco Lui (PhD student)
- Meladel Mistica (PhD student; external supervisor)
- Michael Niemann
(PhD student; external supervisor)
- Bahar Salehi (MPhil student; co-supervised with Paul Cook and Su
Nam Kim)
- Li Wang (PhD student; co-supervised with Su Nam Kim)
- Willy Yap (PhD student; co-supervised with Tara McIntosh)
Past
- Jared Willett (MSc(CS) student; co-supervised with David Martinez
and Angus Webb)
- Luke Parkinson (MSc(CS) student; co-supervised with Paul Cook)
- Matej Korvas (MSc(CS) student)
- Igor Tytyk (MSc(CS) student)
- Andrew MacKinlay (completed PhD 2012)
- Karl Grieser (completed PhD 2012)
- Ned Letcher (completed BSc(Hons) 2010)
- Lars Yencken
(completed PhD 2010)
- Marco Lui (completed BCS(Hons) 2009)
- Ben White (completed MIT 2009)
- Li Wang (completed MIT 2009)
- Patrick Ye
(completed PhD 2009)
- Meladel Mistica (completed PGDip 2008)
- Lejoe Kuriakose (completed MEDC 2008)
- Paul Joseph (completed MSSE 2008)
- Su Nam Kim (submitted PhD 2008)
- Michael Yang (completed BCS(Hons) 2007)
- Sumukh Ghodke (completed MSSE 2007)
- Phil Blunsom (completed PhD 2007)
- Edward Ivanovic (completed Masters 2007)
- Aidan Furlan (completed BCS(Hons) 2006)
- Karl Grieser (completed BSc(Hons) 2006)
- Rebecca Dridan (completed Masters 2006)
- Jeremy Nicholson (completed BCS(Hons) 2005)
- Andrew MacKinlay (completed BSc(Hons) 2005)
Interested in pursuing language technology research at the University
of Melbourne? Contact me directly, making sure to include a CV and
description of your research interests.
UniMelb Administration
Past
- Deputy Head of Department (2011–2012)
- Teaching Committee chair (2010–2012)
- School of Engineering Education Committee (2010–2012)
- CIS Faculty of Science liaison (Science APC/PGPC: 2005–2011)
- CSSE Research Committee (2008–2011)
- CSSE Postgraduate Coursework Programmes Committee chair
(2007–2009)
- School of Engineering IT Advisory Group (2009)
- Honours/PGDip Coordinator (2005–2007)
- Publications Liaison (2005–2007)
- Member of Science Tools working group, Faculty of Science (2007)
Professional Research Activities
Present
- Founding co-editor of CSLI Publications Series on Japanese
Computational Linguistics
- Editorial board of Language Resources and Evaluation (2010–)
- Editorial board of Transactions of the Association for
Computational Linguistics (2012–)
- Executive member of the Australasian Language Technology Association
(2011–2012)
- Information Officer for ACL
SIGLEX (Special Interest Group on the Lexicon) (2010–)
- Programme co-chair for *SEM 2013
- Programme co-chair for EMNLP 2013
- Programme committee for ACL 2013
Past (highlights)
Random Miscellanea
In the news:
In a moment of weakness, I signed up for LinkedIn.
For the trivia lovers, here is my (almost certainly outdated) full CV.