Sunday, January 08, 2006

Simple English Vocabulary Profile

I finally tested the vocabulary profiler on one of Wikipedia's Simple English pages. The first relatively complete one I could find was the page on human rights.

Most of the simple words used on the page were caught by the profiler's lists. The words that weren't caught were reasonable new words you'd have to master to read about human rights, words that have something to do with philosophy, proper nouns, or mistakes:

"abuse_[1] abuses_[5] american_[1] asylum_[1]
biologists_[1] condemn_[1] covenant_[2] covenants_[2]
disability_[1] enlightenment_[1] etc_[1] european_[2]
france_[2] french_[1] georg_[1] hegel_[1] innocent_[1]
jail_[1] john_[2] locke_[1] nationality_[1] numbersome_[1]
organism_[1] organisms_[2] protest_[1] rightsbecause_[1]
scriptures_[1] stuart_[1]"

One thing the page does lack is concrete examples of human rights abuses, which seems pretty important because it's the very non-abstract cruelty of these acts that makes them so reprehensible.

Just for comparison, here is the main non-simplified Wikipedia page for Human Rights, and here are the profile stats showing which lists caught the words on each page:

Simple: (K1: 88%, K2: 4%, AWL: 4%, Off-List: 4%)
Not Simple: (K1: 77%, K2: 4%, AWL: 11%, Off-List: 8%)

Not really very different! At least as far as the vocab is concerned.
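
For anyone who wants to try this kind of comparison themselves, here is a rough sketch of how a profile like the one above could be computed. It is not the profiler I used, just an illustration of the idea: the function name `profile` is made up, and the word sets are tiny placeholders standing in for the real K1, K2 and AWL lists. It also matches exact word forms only, so anything the real lists would catch by grouping related forms together is not handled here.

```python
import re
from collections import Counter

# Placeholder lists for illustration only; the real K1, K2 and AWL
# lists are far longer and would normally be loaded from files.
WORD_LISTS = {
    "K1": {"the", "a", "of", "all", "people", "have", "rights"},
    "K2": {"punish"},
    "AWL": {"legal"},
}


def profile(text):
    """Return (percent of tokens caught by each list, counts of off-list words)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    caught = Counter()
    off_list = Counter()
    for tok in tokens:
        # Check the lists in order and credit the first one containing the word.
        for name, words in WORD_LISTS.items():
            if tok in words:
                caught[name] += 1
                break
        else:
            # No list contained the word.
            caught["Off-List"] += 1
            off_list[tok] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    names = [*WORD_LISTS, "Off-List"]
    percents = {name: round(100 * caught[name] / total) for name in names}
    return percents, off_list
```

Running `profile` on the text of both pages and putting the two percentage dictionaries side by side gives the same kind of comparison as the stats above, and the second return value is the raw material for an off-list word listing like the one earlier in this post.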

If the "Simple English" in Wikipedia is not much simpler than the authentic real-life English in the main Wikipedia, should we really be investing time in these articles? Or maybe we have to rethink exactly what we mean by "Simple English" and then measure and control simplicity.
