CenDiPede
An implementation of the Corpus-Derived Profiles framework in Java
Welcome to cendipede.org! This is intended to become the primary distribution point for the
program CenDiPede, which is an implementation of the Corpus-Derived Profiles framework. The
framework and the program are both part of my doctoral project in linguistics at Boston
University. There are some slide shows and screencast demos below. If you find this
information interesting, feel free to link to this site or send the link to others. If you
have any questions, please do not hesitate to contact me.
— Gregory Garretson
Status (February 2010)
Both the CDP framework and the program CenDiPede are under development. When CenDiPede
has reached public beta stage, it will become possible to download the program from right
here. This is expected to happen at some point in 2010. If you're dying to try it
before then, feel free to contact me. Otherwise, please check back!
Brief FAQ
- What is the CDP framework?
- The Corpus-Derived Profiles (CDP) framework is a conceptual framework for studying
syntagmatic word relations in corpora (large text databases). The underlying idea is that
the meaning of a word can really only be understood when we look at how it is used
together with other words. The more specific idea is that every word has a set of
behaviors with regard to its co-occurrence with other words in the text; information about
these behaviors is collected in a lexical profile, or CDP. Every word (that is, every type,
not every token) has exactly one profile with respect to a given corpus. In practice, these
profiles are generated by a computer program.
- What is CenDiPede?
- CenDiPede is an implementation of the CDP framework that I have written in Java. The
framework is designed to be implemented computationally, and CenDiPede is the first
implementation of the framework (though if you want to write another, feel free; the
framework is described sufficiently in my thesis that someone else could implement it in
another language, such as Python). CenDiPede, when it is finished, will be made freely
available (both the binaries and the source code) on this website.
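To make the idea of a lexical profile from the FAQ above a little more concrete, here is a minimal Java sketch. It is not CenDiPede's code and does not reproduce the CDP framework's actual categories. It only illustrates the general principle that each word type gets exactly one profile per corpus, built from the words it co-occurs with. The class name ProfileSketch, the method buildProfiles, the fixed window size, and the naive tokenization are all assumptions made for this illustration.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this class is not part of CenDiPede.
class ProfileSketch {

    /**
     * Builds one profile per word type. A profile here is simply a map from
     * neighboring word types to the number of times they occur within
     * `window` tokens to the left or right of the profiled word.
     */
    static Map<String, Map<String, Integer>> buildProfiles(String text, int window) {
        // Naive tokenization (an assumption): lowercase, split on non-letters.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        Map<String, Map<String, Integer>> profiles = new TreeMap<>();

        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            // All tokens of the same type share a single profile.
            Map<String, Integer> profile =
                    profiles.computeIfAbsent(tokens[i], k -> new TreeMap<>());

            int from = Math.max(0, i - window);
            int to = Math.min(tokens.length - 1, i + window);
            for (int j = from; j <= to; j++) {
                if (j == i || tokens[j].isEmpty()) {
                    continue;
                }
                profile.merge(tokens[j], 1, Integer::sum);
            }
        }
        return profiles;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> profiles =
                buildProfiles("the cat sat on the mat", 1);
        // Print each type followed by its co-occurrence counts.
        profiles.forEach((type, profile) -> System.out.println(type + " -> " + profile));
    }
}
```

In this toy corpus the word "the" occurs twice as a token, but the sketch produces only one profile for it, which merges the neighbors of both occurrences. A real implementation would need proper tokenization and a much richer notion of word behavior than raw window counts.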
Information on CenDiPede and the CDP Framework
At present, there are three resources online that describe the framework and the
software:
- Slide show: 30-slide presentation on Corpus-Derived Profiles
- These are the slides from a presentation given at the ICAME conference in Lancaster,
England in May 2009. They give an overview of both the framework and the software, and then
go into some depth on the question of how to define collocation in corpus studies.
Here is the complete reference:
Garretson, Gregory. 2009. Introducing Corpus-derived profiles: A
framework for analysing word relations. Conference presentation given at ICAME 30: the
30th conference of the International Computer Archive of Modern and Medieval
English, Lancaster, England, May 27-31, 2009.
- Slide show: 40-slide presentation on Corpus-Derived Profiles
- These are the slides from a similar presentation given at Stockholm University in
March 2009. They contain a lot more information about the theoretical background and slightly
more information about the CDP framework. They also spend more time on the question of
collocation, though some categories that were developed later are absent from the
discussion. Here is the complete reference:
Garretson, Gregory. 2009. Corpus-Derived Profiles: A new
framework for the analysis of word relations in corpora. Seminar given in series
Semantics seminars, Stockholm University, March 31, 2009.
- Online Demo: A screencast in 4 parts
- This is an online demo that was made in March 2009. The software I used to make the
screencast allows only five-minute recordings, and this demo is just under 20 minutes, so
I split it into four parts:
- Demo 1, part 1
- Demo 1, part 2
- Demo 1, part 3
- Demo 1, part 4

This site © 2009 Gregory Garretson