Stephen Grimes, Ph.D.


I am a lead software application developer at the Linguistic Data Consortium at the University of Pennsylvania in Philadelphia.




At the Linguistic Data Consortium, I direct efforts to create software in support of linguistic corpora annotation and production. Our language data are used by our partners at tech companies and research universities as training data for machine learning systems that support technologies such as automatic speech recognition, machine translation, language and speaker identification, character and handwriting recognition, and many other language technology applications.

I am the lead engineer on MADCAT, an Arabic handwriting recognition project. I oversee all word alignment corpora at LDC, primarily Arabic-English and Chinese-English data in support of the BOLT and GALE programs. I also work on the DEFT (Deep Exploration and Filtering of Text) project, where we build sample data for named entity recognition and information extraction projects and evaluations, and I periodically contribute to the TAC-KBP (Text Analysis Conference-Knowledge Base Population) project. Finally, I oversee delivery of all data from LDC and coordinate a team that validates our data to ensure it is error-free.

