Sunday, January 29, 2006

My Economics EAP Reading
Conference Paper

A conference paper I presented at the international TESOL conference at Mae Fah Luang University in Chiang Rai, Thailand, in 2005.

Instead of just looking for simple word collocations in a corpus of economics texts, the paper advocates looking for semantic-syntactic patterns like TREND-CAUSES-TREND. Furthermore, the paper suggests how these patterns can be used in the teaching of large lecture classes of L2 students. It is based on my experience of teaching economics in such an environment.

Friday, January 27, 2006

Counting Eskimo Words for Snow

Does language influence or determine thought? Will I lose my own cultural identity if I start using another language too much? Consider these injunctions that a language teacher might face:

1. Teach culturally neutral International Business English.
2. Refrain from any cultural distortions of language.
3. Keep your language pure and neutral.
4. Don't be a Linguistic Imperialist.

Implicit in these injunctions is the assumption that you actually can suppress the cultural component of your language in non-trivial uses of language. Maybe cookbook, phrasebook language of the kind that a computer program like Eliza could be programmed to generate can be culturally neutral:

A: "Excuse me sir, where is the bathroom?"
B: "Down the hall to the right."

But at higher levels, language seems to determine thought, and thought language. There is at least strong feedback between the two. Linguistic determinism (the strong form of the Sapir-Whorf Hypothesis) is controversial, though:

"Among the most frequently cited examples of linguistic determinism is Whorf's study of the language of the Inuit, who have multiple words for snow. He argues that this modifies the world view of the Inuit, creating a different mode of existence for them than, for instance, a speaker of English. The notion that Arctic people have a large number of words for snow has been shown to be false by linguist Geoffrey Pullum; in an essay titled The great Eskimo vocabulary hoax, he tracks down the origin of the story, ultimately attributing it largely to Whorf. More to the point is the triviality of this observation. The fact that wine fanciers have a rich vocabulary to speak about the tastes they find in wines is not thought of as evidence that their minds work differently, only that they know more than the average person about wine. English-speaking skiers may also have a rich vocabulary for snow."

"Wine literacy" certainly affected the thought and lifestyle of the wine aficionado main character of the 2004 movie Sideways [Review]. A computer geek, or any kind of geek, certainly acquires a vocabulary and a way of using language and expressing themselves that affects their thought and life. It is a feedback process, though, and geeks and the wine literate are extreme cases.

Controversial statement: When you learn a word, learn everything about it, even the cultural hooks that people hang it on in their mind.

The word lists I've been looking at recently treat the words in word families as separate words. For example:

1. produce (v - action)
2. producer (n - actor)
3. production (n - activity)
4. productive (adj - applied to actor)
5. productivity (n - 4 nominalized)
6. product (n - object of action)

Maybe the whole word family, and perhaps even its frequent collocates, should count as what it means to know or learn a new word. If you make enough links or handles for the student to hang the word on in their mind, culture is inevitably going to intrude. In my experience, teaching words in a set like this also ties grammar to vocabulary building. Vocabulary, grammar, and culture are all tied together. Try to separate them and you'll get something inhuman.

(Stray thought: Isolating and counting words or units of meaning must be even more difficult in agglutinative languages like Eskimo, Sanskrit, or Turkish. See the morphological typology of languages: agglutinative, building words from particles, vs. isolating or analytic, where each particle is a word.)

Wiktionary word frequency lists

Wikipedia's dictionary, Wiktionary, has word frequency lists calculated from Project Gutenberg texts. A random sample:

"language that's House los individual South mon meant food wide now formed"

reveals several problems. There is no lemmatization ("that's", "formed"). "mon" is either an abbreviation for "Monday" or a mistake (e.g. a typo or a word split at a line break). "los" is either Spanish, perhaps from "Los Angeles", or a mistake. Capitalized and uncapitalized forms are apparently counted separately.

If a realistic type/token ratio is to be calculated that shows how many unique words the reader was exposed to, you probably have to go even further and count word families.
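Here is a rough sketch of what counting by word family does to the type/token ratio. The family table and the sample sentence are made up for illustration; a real table would have to come from a published word-family list.

```python
import re

# Hypothetical word-family table: surface form -> family headword.
FAMILIES = {
    "producer": "produce",
    "producers": "produce",
    "production": "produce",
    "productive": "produce",
    "productivity": "produce",
    "raised": "raise",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(tokens, families=None):
    """Return unique types divided by total tokens.

    If a family table is given, each token is first replaced by its headword.
    """
    if not tokens:
        return 0.0
    if families is not None:
        tokens = [families.get(t, t) for t in tokens]
    return len(set(tokens)) / len(tokens)

# Invented sample sentence, not taken from any corpus.
sample = "The producer raised production. Productive producers raise productivity."
tokens = tokenize(sample)
surface_ttr = type_token_ratio(tokens)           # every surface form is its own type
family_ttr = type_token_ratio(tokens, FAMILIES)  # forms collapsed into families
```

Lowercasing in the tokenizer also takes care of the capitalized vs. uncapitalized problem, at the cost of merging proper nouns with common words.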

(Note: Chinese, an analytic language, makes this a lot easier, since the breaks between characters mark morpheme boundaries and so help define exactly what is to be counted.)

Monday, January 16, 2006

Content Based Instruction:
A generalization of the communicative approach?

To me, CBI or "learning a language through content" is a generalization of the common English teaching idea of communicative information gaps that need to be bridged in speaking activities.

Nowadays, I like to stay away from the mechanical "information" gap activity, transforming it into more of a "meaning" gap activity where the information gap comes from student projects or writing activities. My favorite is the marketing plan, where students design a Product, its Pricing and Promotion, and its distribution (Place), which are called the 4 P's in marketing. It has several opportunities for information or meaning gap exchanges and for roleplaying. Marketing research via surveys and focus group sessions are two instances with opportunities for question formulation.

The hostility towards TESOL and the communicative approach shown by some more experienced teachers first struck me as unusual. One even called TESOL the "Hello, How are you?, I love you!" school of education. I now realize that the mainstream English teaching community (i.e. "TESOL") is, in fact, rather insulated, often resistant to outside intellectual influence and often unable to connect to broader-based educational research.

Studying higher-order ideas like CBI can help us do our lower level day to day work such as lesson planning more creatively and efficiently.

This introductory chapter from a recent book is a good introduction and overview of the idea of CBI (Content Based Instruction), learning a language by learning something else besides the language, using the language, albeit in simplified forms at first. Rather an ambitious task, wouldn't you say?

Table 1 in this book outlines the differences in applying CBI to novice and more advanced learners. An outline syllabus for novice learners is also given.

The table of "Kumar's Macro-Strategies" also provides a nice set of guidelines.

The bibliography is also very up-to-date.

Friday, January 13, 2006

Moving up the journalism value chain

Having just studied Michael Porter's value chain idea a bit and the notion of "moving up the value chain", the question arises how journalistic writing can move up the value chain using the web. Jakob Nielsen addresses this:

"On the Web, the inverted pyramid becomes even more important since we know from several user studies that users don't scroll...writers can link to old articles instead of having to summarize background information in every [article]...[it] is possible to link to full background materials and to construct digests of links to multiple treatments of an issue."

This all assumes that reliable, relevant, and concise background information is available online, like it usually is at Wikipedia. In fact, "pedia" in general could be broken out as a concept, defined as a topically indexed online source of background information. Nielsen continues:

"...the Web is a linking medium and we know from hypertext theory that writing for interlinked information spaces is different than writing linear flows of text. In fact, George Landow, a Professor of English literature, coined the phrases rhetoric of departure and rhetoric of arrival to indicate the need for both ends of the link to give users some understanding of where they can go as well as why the arrival page is of relevance to them."

The History of Journalism's
Inverted Pyramid

"Writing from the Top Down: Pros and Cons of the Inverted Pyramid" is a great little critical history of journalism's conventional pattern of writing:

"The conventions of the inverted pyramid require the reporter to summarize the story, to get to the heart, to the point, to sum up quickly and concisely the answer to the question: What's the news?"

This can break up the chronology of a narrative:

"The inverted pyramid, its critics say, is the anti-story. It tells the story backward and is at odds with the storytelling tradition that features a beginning, middle, and end."

Important background information can often fall off the end of the article if there is not enough space. Despite these failings, the inverted pyramid remains the pillar of western journalism, but will this change?

Thursday, January 12, 2006

Profiling VOA's Special English II

80% of VOA vocab is consistently within the first 1000 words (K1), so the VOA "Special English" is truly simple. I profiled the last four months of economic-business articles from VOA [list of articles].

Can you write an article with the depth of the Economist in Simple English, though? Concordancing could tell what techniques can be used to keep grammar and vocab simple. I'll have to rewrite an Economist article in "Special English".

I've been noticing that the profiling software does use a lemmatiser or stemmer (which strips grammatical inflections), like any good search engine would, before it counts and lists unique words. For example:

airline_[1] airlines_[5]
campaign_[2] campaigned_[2] campaigns
cancel_[3] cancelled_[1] (not cancellation_[1] though, which is in the same word family)

It seems that the measure of unique words that a student has to deal with in a text (type-token ratio) should not depend on grammatical inflection. Should it depend on the part of speech? For example the word family:

employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]

maps to the one unique counted word:


Will have to do a little bit of research.
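As a first stab at that research, here's a minimal sketch that collapses the profiler's "word_[count]" entries into one family and adds up the tokens. The mapping table is hypothetical, and picking "employ" as the headword is my own assumption for the example.

```python
import re
from collections import defaultdict

# Hypothetical family table; "employ" as headword is an assumption.
FAMILY_OF = {
    "employed": "employ",
    "employees": "employ",
    "employer": "employ",
    "employers": "employ",
    "employment": "employ",
}

# Matches profiler-style entries such as "employed_[6]".
ENTRY = re.compile(r"([a-z]+)_\[(\d+)\]")

def family_counts(profiler_line, family_of):
    """Sum the token counts of 'word_[n]' entries by family headword."""
    totals = defaultdict(int)
    for word, count in ENTRY.findall(profiler_line):
        # Words missing from the table are kept as their own family.
        totals[family_of.get(word, word)] += int(count)
    return dict(totals)

line = "employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]"
counts = family_counts(line, FAMILY_OF)
```

Counted this way the five listed forms become a single type, which ignores part of speech entirely. Whether that is the right call is exactly the open question.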

Tuesday, January 10, 2006

Profiling VOA's Special English

The Voice Of America (VOA) has been using its own version of simplified English since 1959 and their archives are available to the public. I could only find one economics-business related article quickly though: "Simple English: American Agriculture: Shrinking but More Productive" [Profile]

The most interesting recent economic-business article I could find on the whole VOA site was about China buying into African oil, but it doesn't say it was written in special English: "China Oil Giant Reaches Deal to Buy Major Stake in Nigerian Oil Field". [Profile]

Comparing the profiles of the two articles:
American agriculture, simplified: (80,4,8,8) vs.
China oil, not simplified: (64,4,8,24)
where each profile is (first 1000 words, second 1000 words, academic word list, off-list), in percent
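For anyone wondering how such a four-number profile comes about, here is a minimal sketch. It is not the profiler's actual code. The three word lists and the token sequence are tiny toy stand-ins I made up; the real lists hold around a thousand word families each.

```python
def profile(tokens, k1, k2, awl):
    """Return the percentage of tokens in each band: (K1, K2, AWL, off-list)."""
    counts = [0, 0, 0, 0]
    for token in tokens:
        word = token.lower()
        if word in k1:
            counts[0] += 1
        elif word in k2:
            counts[1] += 1
        elif word in awl:
            counts[2] += 1
        else:
            counts[3] += 1  # not caught by any list
    total = len(tokens)
    if total == 0:
        return (0.0, 0.0, 0.0, 0.0)
    return tuple(100 * c / total for c in counts)

# Toy stand-ins for the real lists (assumptions for illustration only).
K1 = {"the", "farm", "is", "more", "than", "before"}
K2 = {"crop"}
AWL = {"output"}

# Invented token sequence, not a real sentence from any article.
tokens = "the farm is more than before the crop output nigeria".split()
result = profile(tokens, K1, K2, AWL)
```

Each token lands in the first list that contains it, so the four percentages always add up to 100.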

Clearly, the VOA Special English article uses a lot more simple vocabulary, but note that its 80% is near the 77% of the Wikipedia article I profiled. A more detailed study is obviously needed, both to characterize these different forms of simplified writing lexically and grammatically, and maybe also to get some objective computer-based measurement of how quickly students can read and understand them.

Sunday, January 08, 2006

Simple English Vocabulary Profile

Finally tested the Vocabulary profiler on one of Wikipedia's Simple English pages. The first one that I could find that was relatively complete was the one on human rights.

Most of the simple words used on the page were caught by the profiler's lists. The words that weren't caught were either reasonable new words you'd have to master to read about human rights, have something to do with philosophy, or are proper nouns or mistakes:

"abuse_[1] abuses_[5] american_[1] asylum_[1]
biologists_[1] condemn_[1] covenant_[2] covenants_[2]
disability_[1] enlightenment_[1] etc_[1] european_[2]
france_[2] french_[1] georg_[1] hegel_[1] innocent_[1]
jail_[1] john_[2] locke_[1] nationality_[1] numbersome_[1]
organism_[1] organisms_[2] protest_[1] rightsbecause_[1]
scriptures_[1] stuart_[1]"

One thing the page does lack is any concrete example of human rights abuses, which seems pretty important because it's the very non-abstract cruelty of these acts that makes them so reprehensible.

Just for comparison, here is the main non-simplified Wikipedia page for Human Rights and here are the profile stats on which lists the words are caught by:

Simple: (K1: 88%, K2: 4%, AWL: 4%, Off-List: 4%)
Not Simple: (K1: 77%, K2: 4%, AWL: 11%, Off-List: 8%)

Not really very different! At least as far as the vocab is concerned.
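To put a number on "not very different", a small sketch that subtracts the two profiles band by band, using the percentages quoted above. The function name is just something I picked for the example.

```python
BANDS = ("K1", "K2", "AWL", "Off-List")

# Percentages quoted above for the two Human Rights pages.
simple = (88, 4, 4, 4)
not_simple = (77, 4, 11, 8)

def band_differences(a, b):
    """Return {band: a - b} in percentage points for two profiles."""
    return {band: x - y for band, x, y in zip(BANDS, a, b)}

diffs = band_differences(simple, not_simple)
```

The Simple English page gains 11 points in the first thousand words and gives them back in the academic and off-list bands, while the second thousand stays the same.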

If the "Simple English" in Wikipedia is not much simpler than the authentic real-life English in the main Wikipedia, should we really be investing time in these articles? Or maybe we have to rethink exactly what we mean by "Simple English" and then measure and control simplicity.

Thursday, January 05, 2006

Wikipedia and the rise of Participatory Journalism

Rules like NPOV (neutral point of view) that Wikipedia established have made it a reliable place to get information on the internet, which is often a very unreliable place to get good information.

Wikipedia is being used in Hong Kong as a tool to teach journalism and how to write "in a fair and balanced manner for an international audience. By collaborating online with others, students can interact with each other when writing, and receive advice and corrections from complete strangers around the world within minutes of making contributions. With students for which English is a second language, this provides a highly interactive experience for learning copy editing and grammar usage."

Wikipedia could also become an important repository of simplified texts for English language learners and a way of disseminating the practice of extensive reading of simplified texts advocated by experts such as Krashen, Richard Day, and Nation. Most newspaper articles need the sort of additional background information that Wikipedia can provide.

This essay also comments on the rise of the Chinese version of Wikipedia which is still behind Esperanto in terms of content. Hopefully, one day there'll be a simplified Chinese Wikipedia too for language learning content that goes beyond the traditional checking into a hotel, a trip to the post office, ordering food, friends having a banal conversation, etc.

Simplified English Texts

How can I measure how simple a text is? One way is to count unique words. Simple metrics like the Flesch readability formula only provide a very rough rule of thumb. What about comparing a text with similar texts that you already know are simple?
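For reference, here's a minimal sketch of the Flesch Reading Ease score, 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter is a crude vowel-group heuristic of my own, not part of the formula, which is one reason the result is only a rough rule of thumb. Both sample texts are invented.

```python
import re

def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels (minimum 1)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

# Invented sample texts for comparison.
easy = flesch_reading_ease("The cat sat on the mat. The dog ran.")
hard = flesch_reading_ease(
    "Governmental agricultural subsidies distort international commodity markets."
)
```

The score only looks at sentence length and syllable counts, so it knows nothing about whether the words themselves are common or rare. That is the gap the vocabulary profiles are meant to fill.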

Texts from graded readers like the Oxford Bookworms series provide a nice baseline for comparison, but they are copyrighted. Maybe articles in the Simple English Wikipedia could be used, although when I took a look there weren't many articles yet, and some people were writing their articles in Ogden's Basic English, which sometimes actually distorts the English language. Not a good idea.

The vocab profiler can be used to do the comparison. Start with a corpus of simplified texts and compare the profile on these simplified texts with authentic texts from newspapers.

Anyway, simplified vs. authentic texts is a very murky area. What is simplified? Don't you lose information with simplified texts? Next, I have to create profiles for some simplified texts and compare them with the authentic text profiles I already have.

Tuesday, January 03, 2006

Vocabulary Profiling II

I extracted 10 words from a newspaper article to focus on in a vocab lesson. Here's the vocab profile I'm working from. The topic is "The Police" and the most fruitful place to look for new words to teach was among the words that were not caught by a list. I cooked up this TV dinner of a lesson based on a test prep book.

Sunday, January 01, 2006

Vocabulary Profiling

Just used Nation's vocabulary profiler on a newspaper text [original text, results]. The profiler is supposed to show you how difficult the vocabulary in a text is. If you were writing one of those simplified vocabulary graded readers like Oxford Bookworms that only uses let's say 1000 words, you could use this software to keep on track and control difficulty. This particular version at The Compleat Lexical Tutor also color codes the text to help you.

Now I have a complicated printout to interpret, ouch! Most of the AWL words are not the sort of words I would define for my students, too easy. Maybe a domain specific list, e.g. for economics, should be used too. The profiler allows you to add vocab lists. Here are the words that weren't in the lists:

"baht baht baht baht baht baht baht baht bangkok cane chakramon chakramon chakramon csb csb csb embarrassing ex fertiliser freight frustration hike hike hoarding inflation kilogramme kilogramme longstanding pesticides phasukvanich plaguing policymaking provinces quit reportedly retail retail shortages skyrocketing smuggling tackle tackle wholesalers"

I've defined the expressions "hike prices" and "hoard" for students recently. I'd define "tackle" too. Detecting common collocations would be a nice add-on feature.
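The simplest version of such an add-on would just count adjacent word pairs and keep the ones that repeat. A minimal sketch, assuming the text is already tokenized. The sample sentence is invented, and real collocation detection would want a statistical measure rather than raw counts.

```python
from collections import Counter

def bigram_counts(tokens):
    """Count adjacent word pairs in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))

def frequent_bigrams(tokens, min_count=2):
    """Return bigrams seen at least min_count times, most frequent first."""
    counts = bigram_counts(tokens)
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

# Invented sample text built around the "hike prices" expression.
text = "traders hike prices when shortages hit and shops hike prices again"
pairs = frequent_bigrams(text.split())
```

On a single newspaper article most pairs occur only once, so this would only start to pay off on a larger corpus.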

Special file formats for lessons

The Guardian uses a special format that is easy to read in emails. A short SMS message could provide everything necessary to improvise a lesson. (For a spoof on minimalist teaching see "The Ten Rules".) In some environments, even in today's technologically sophisticated world, computers are unavailable or too much of a hassle to use.

Each file format can make different aspects of using a lesson easier. PDF files make printing out worksheets easier. An interactive self-correcting online elearning activity can be used at any time without a teacher. Flash makes certain features of these activities easier, like drag and drop. HTML pages are easy to read online.

There is no reason a program can't be written to reformat lessons into several different convenient formats.
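A minimal sketch of the idea: keep the lesson in one neutral structure and write a small renderer per format. The Lesson class, its fields, and the sample lesson are hypothetical; only plain text and HTML are shown, since PDF or Flash output would need outside tools.

```python
import html
from dataclasses import dataclass

@dataclass
class Lesson:
    """Hypothetical neutral representation of a vocabulary lesson."""
    title: str
    words: list  # vocabulary items to teach

def to_plain_text(lesson):
    """Render the lesson as plain text, short enough for an email."""
    lines = [lesson.title, ""]
    lines += [f"{i}. {w}" for i, w in enumerate(lesson.words, start=1)]
    return "\n".join(lines)

def to_html(lesson):
    """Render the lesson as an HTML fragment, escaping special characters."""
    items = "".join(f"<li>{html.escape(w)}</li>" for w in lesson.words)
    return f"<h1>{html.escape(lesson.title)}</h1><ol>{items}</ol>"

# Sample lesson made up for illustration.
lesson = Lesson("Sample lesson", ["hike", "hoard", "tackle"])
plain = to_plain_text(lesson)
page = to_html(lesson)
```

Adding another output format then means adding one more function, without touching the lesson content itself.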