Welcome to cendipede.org! This is intended to become the primary distribution point for the program CenDiPede, which is an implementation of the Corpus-Derived Profiles framework. The framework and the program are both part of my doctoral project in linguistics at Boston University. There are some slide shows and screencast demos below. If you find this information interesting, feel free to link to this site or send the link to others. If you have any questions, please do not hesitate to contact me.

     — Gregory Garretson

Status (February 2010)

Both the CDP framework and the program CenDiPede are under development. When CenDiPede has reached public beta stage, it will become possible to download the program from right here. This is expected to happen at some point in 2010. If you're dying to try it before then, feel free to contact me. Otherwise, please check back!

Brief FAQ

What is the CDP framework?
The Corpus-Derived Profiles (CDP) framework is a conceptual framework for studying syntagmatic word relations in corpora (large text databases). The underlying idea is that the meaning of a word can really only be understood when we look at how it is used together with other words. More specifically, every word has a set of behaviors with regard to its co-occurrence with other words in the text; information about these behaviors is collected in a lexical profile, or CDP. Every word (that is, every type, not every token) has exactly one profile with respect to a given corpus. In practice, these profiles are generated by a computer program.
What is CenDiPede?
CenDiPede is an implementation of the CDP framework that I have written in Java. The framework is designed to be implemented computationally, and CenDiPede is the first implementation. If you want to write another, feel free: the framework is described in enough detail in my thesis that someone else could implement it in another language, such as Python. When it is finished, CenDiPede will be made freely available on this website, as both binaries and source code.
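
To make the idea of a lexical profile a little more concrete, here is a minimal sketch in Java. It is not CenDiPede's code and does not follow the framework's actual definitions; it simply assumes that "co-occurrence" means appearing within a fixed window of tokens, which is only one of several possible definitions. The class name ProfileSketch, the method buildProfiles, and the window parameter are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ProfileSketch {

    // Hypothetical helper (not part of CenDiPede): for each word type, count
    // the word types that occur within `window` tokens to its left or right.
    // Assumption: co-occurrence is defined by a simple fixed-size window.
    static Map<String, Map<String, Integer>> buildProfiles(List<String> tokens, int window) {
        Map<String, Map<String, Integer>> profiles = new TreeMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            // One profile per type: every token of the same type shares it.
            Map<String, Integer> profile =
                profiles.computeIfAbsent(tokens.get(i), k -> new TreeMap<>());
            int start = Math.max(0, i - window);
            int end = Math.min(tokens.size() - 1, i + window);
            for (int j = start; j <= end; j++) {
                if (j != i) {
                    // Add one to the count for this neighboring word type.
                    profile.merge(tokens.get(j), 1, Integer::sum);
                }
            }
        }
        return profiles;
    }

    public static void main(String[] args) {
        // Toy "corpus" of six tokens; "the" occurs twice but gets one profile.
        List<String> corpus = Arrays.asList("the", "cat", "sat", "on", "the", "mat");
        Map<String, Map<String, Integer>> profiles = buildProfiles(corpus, 1);
        System.out.println(profiles.get("the"));
    }
}
```

Note that the two tokens of "the" in the toy corpus feed into a single profile, which is the type-versus-token distinction mentioned above. A real profile would of course record much richer information than raw neighbor counts.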

Information on CenDiPede and the CDP Framework

At present, there are three resources online that describe the framework and the software:

Slide show: 30-slide presentation on Corpus-Derived Profiles
These are the slides from a presentation given at the ICAME conference in Lancaster, England in May 2009. The presentation gives an overview of both the framework and the software, and then goes into some depth on the question of how to define collocation in corpus studies. Here is the complete reference:

Garretson, Gregory. 2009. Introducing Corpus-derived profiles: A framework for analysing word relations. Conference presentation given at ICAME 30: the 30th conference of the International Computer Archive of Modern and Medieval English, Lancaster, England, May 27-31, 2009.

Slide show: 40-slide presentation on Corpus-Derived Profiles
These are the slides from a similar presentation given at Stockholm University in March 2009. This presentation has a lot more information about the theoretical background and slightly more information about the CDP framework. It also spends more time on the question of collocation, though some categories that were developed later are absent from the discussion. Here is the complete reference:

Garretson, Gregory. 2009. Corpus-Derived Profiles: A new framework for the analysis of word relations in corpora. Seminar given in the series Semantics seminars, Stockholm University, March 31, 2009.

Online Demo: A screencast in 4 parts
This is an online demo that was made in March 2009. The software I used to make the screencast allows only five-minute recordings, and the demo runs just under 20 minutes, so I split it into four parts:
Demo 1, part 1
Demo 1, part 2
Demo 1, part 3
Demo 1, part 4