Thursday, January 12, 2006

Profiling VOA's Special English II

80% of the vocabulary in VOA articles falls consistently within the 1000 most frequent words (K1), so VOA's "Special English" is truly simple. I profiled the last four months of economics and business articles from VOA [list of articles].

Can you write an article with the depth of the Economist in Simple English, though? Concordancing could show which techniques can be used to keep grammar and vocab simple. I'll have to rewrite an Economist article in "Special English".

I've noticed that the profiling software does use a lemmatiser or stemmer (to strip grammatical inflections), as any good search engine would, before it counts and lists unique words. For example:

airline_[1] airlines_[5]
campaign_[2] campaigned_[2] campaigns
cancel_[3] cancelled_[1] (not cancellation_[1] though, which is in the same word family)
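
To make the grouping concrete, here is a rough sketch of how inflected forms could be folded under a headword before counting. It is not how the profiling software actually works; the suffix list and the function names (headword, group_by_headword) are my own assumptions, and a real lemmatiser would use a dictionary rather than bare suffix rules.

```python
from collections import defaultdict

# Assumed, deliberately tiny list of inflectional endings (not the profiler's rules).
INFLECTIONS = ("led", "ed", "s")

def headword(word, vocabulary):
    """Strip one inflectional ending if what remains is itself an attested word."""
    for suffix in INFLECTIONS:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in vocabulary:
            return stem
    return word

def group_by_headword(counts):
    """Collect each word form, with its count, under its headword."""
    families = defaultdict(dict)
    for word, n in counts.items():
        families[headword(word, counts)][word] = n
    return dict(families)

# Counts copied from the profiler output above.
counts = {"airline": 1, "airlines": 5, "cancel": 3, "cancelled": 1, "cancellation": 1}
families = group_by_headword(counts)

for head, members in families.items():
    print(head, members)
```

Because only inflectional endings are stripped, cancellation stays on its own line, which matches what the profiler output shows.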

It seems that the measure of unique words that a student has to deal with in a text (the type-token ratio) should not depend on grammatical inflection. Should it depend on the part of speech? For example, the word family:

employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]

maps to the one unique counted word:


Will have to do a little bit of research.
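
As a starting point, here is a small sketch of how the type-token ratio changes depending on what counts as a "type": the raw word form, the lemma (inflections removed), or the whole word family. The counts are the ones listed above; the LEMMA and FAMILY mappings are hand-written assumptions for illustration, and the family label "EMPLOY" is just a placeholder of mine.

```python
# Counts copied from the word family listed above.
counts = {"employed": 6, "employees": 8, "employer": 1, "employers": 5, "employment": 3}

# Assumed groupings, written by hand for this example only.
LEMMA = {
    "employed": "employ",
    "employees": "employee",
    "employer": "employer",
    "employers": "employer",
    "employment": "employment",
}
FAMILY = {word: "EMPLOY" for word in counts}  # placeholder label for the whole family

def type_token_ratio(counts, key=lambda word: word):
    """Distinct types (after mapping each word through key) divided by total tokens."""
    tokens = sum(counts.values())
    types = len({key(word) for word in counts})
    return types / tokens

by_form = type_token_ratio(counts)                # every spelling is its own type
by_lemma = type_token_ratio(counts, LEMMA.get)    # inflections collapsed
by_family = type_token_ratio(counts, FAMILY.get)  # whole family collapsed

print(by_form, by_lemma, by_family)
```

With these 23 tokens the ratio is 5/23 by word form, 4/23 by lemma and 1/23 by family, so the choice of counting unit matters a lot for how "hard" a text looks.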

