Thursday, July 27, 2006

The daily news is meaningless without history:
Rich contexts for vocabulary learning

There is an interesting thread on the role of journalism over at Brad DeLong's Semi-Daily Journal. Here's the quote that attracted my attention:

"...instead of hoping that clever/informed readers will see through the kabuki to the facts, and leaving the less sophisticated readers to flounder about in disinformation, journalists should in fact make those value judgements plain and call a spade a spade...leaving utter nonsense unchallenged except by a partisan source, and failing to provide the necessary context."

I find this interesting because you can't really learn vocabulary without a rich context. So I succumbed to temptation and made the following comment:

HISTORY has to be written in parallel with the news, also known as background info or rich context. Wikipedia's pretty good for this.

Working in the education department of a newspaper, I have to repurpose articles for educational purposes. The lack of context and background information makes this very difficult for many articles. For example, today there is an article on the auctioning off of assets seized by banks after the 1997 economic crisis in Thailand, but the list of facts is really meaningless without background information.

Sometimes it is opaque rent-seeking relations with powerful people that make the context unwritable. Informed sources have told me that this is the case in the sugar industry.

Empowering knowledge-seeking readers, not the ones who just want to look in the mirror, should be the goal, but the newsreaders have to become **good critical readers of history** to do this.

Thursday, July 06, 2006

Good English listening practice for economics majors

For students in Thai, Korean, or Japanese universities studying economics, UC Berkeley professor of economics Brad DeLong does a series of videocasts called the Morning Coffee Videocast.

Today's videocast, A Primer on the Federal Reserve, is particularly helpful for learning how to listen to people speaking about economics and also how to speak about economics yourself.

Tuesday, February 07, 2006

Overcoming student plagiarism

Western teachers in Asia often have to contend with plagiarism. Besides wondering to what extent the very notion of plagiarism may be part of western culture and somewhat foreign to other cultures, we often wonder what simple or creative steps can be taken to eliminate it.

Extremely broad definitions of plagiarism can be an impediment to the free flow of information and knowledge to the developing world. Wikipedia never cites sources, but its much less successful predecessor Nupedia obsessively cited sources and also had a rigorous regime of peer review. One could argue that Wikipedia has pretty much redefined plagiarism. Here is Wikipedia's definition of plagiarism:

"Plagiarism is the use of another person’s work (this could be his or her words, products or ideas) for personal advantage, without proper acknowledgement of the original work."

Did I just commit an act of plagiarism by not citing the source? No, because there are links to the source in the quote and above the quote and I mentioned the source which is pretty easy to find with a Google search: "plagiarism wikipedia". I've even seen whole legitimate history books written by highly respected professors without any citations at all. Sometimes merely commenting that any scholar who is familiar with the subject will know the source is deemed enough. David Wyatt's A Short History of Thailand is one good example [my review].

It's better to give students an easy and explicit way to cite sources they use. Teach students to quote and paraphrase texts and to cite sources with an easy citation system like "(Smith, 1977, 123)" with simple bibliography entries like "Smith, John (1977) The Meaning of Life (New York: Profundity Press)". Then insist they use it all the time without exception. Soon citation of sources will become habit.

Another approach is to teach students how to imitate and adapt texts without plagiarizing. Copying verbatim without thought won't lead to language acquisition, but reflective adaptation of individual sentences, using perhaps select subject-object or adjective-noun collocations, is essential for language students to acquire language patterns for future reuse in freer speech and writing. Example sentences from learner's dictionaries and language corpora can be mined for patterns to re-use. To play it safe, always have students provide the citation for the source that was imitated or adapted.

At another level, the notion of inter-textuality is a rich source of ideas for teaching students legitimate ways to appropriate texts. One definition of inter-textuality from the definitions on the web reads: "When a media text makes reference to another text that, on the surface, appears to be unique and distinct".

Doesn't this sound a lot like plagiarism? As Daniel Chandler's Semiotics for Beginners observes: "Gerard Genette proposed the term 'transtextuality' as a more inclusive term than 'intertextuality' (Genette 1997). He listed five subtypes:"

"intertextuality: quotation, plagiarism, allusion;

paratextuality: the relation between a text and its 'paratext' - that which surrounds the main body of the text - such as titles, headings, prefaces, epigraphs, dedications, acknowledgements, footnotes, illustrations, dust jackets, etc.;

architextuality: designation of a text as part of a genre or genres (Genette refers to designation by the text itself, but this could also be applied to its framing by readers);

metatextuality: explicit or implicit critical commentary of one text on another text (metatextuality can be hard to distinguish from the following category);

hypotextuality (Genette's term was hypertextuality): the relation between a text and a preceding 'hypotext' - a text or genre on which it is based but which it transforms, modifies, elaborates or extends (including parody, spoof, sequel, translation)."

Saturday, February 04, 2006

Value-added journalism:
quotes as easy background

Background information for events in the news.

Is this the most important value-added feature that the web provides to journalism?

Writing good and comprehensive background articles takes time. For more general topics you can usually find background articles at Wikipedia. The Economist also has "Backgrounder" sections and Country Briefings.

If you have no time, quotes from other sources are a good way to begin creating a "Backgrounder". That's what I see some people doing at Wikipedia, like this article on Web Mashups (see the quotes section). Wikipedia on Quotes

Blog Rhetoric: Organization

This article covers part of blogging rhetoric. Rhetoric is the study of how to communicate effectively and persuasively in writing or speech [web definitions].

Organization is an important component in most rubrics for assessing student writing. A specific length is usually part of the writing assignment specifications.

Here are the seven basic blog posting formats:

1. Link-only (few words, a bookmark like del.icio.us)
2. Link blurb (2 lines-few paragraphs, maybe an extract)
3. Brief remark (1-3 short paragraphs)
4. List
5. Short article (under 500-700 words)
6. Long article (700+ words)
7. Series postings (500-1000 words each)

"Some formats work best for commentary or explanation, others for alerts and references, etc."

The "brief remark" is "a blog posting that generally is just 1-3 short paragraphs long. It can contain virtually any kind of content: an observation on current events, an idea, an event announcement, a question for readers, an anecdote, a joke, a description, etc."

Of course, Jakob Nielsen must be ranked as the number one web rhetorician.

Thursday, February 02, 2006

Intrinsic Motivation
= Interesting and Relevant Content

University classes are a captive audience.

Business people studying in their free time are not captive.

If the content is not engaging and relevant, business people won't come to class. Eventually, if they are bored they won't come to school at all. What to do?

Add a little intrinsic motivation and we can often capture this demanding audience:

"People who are intrinsically motivated work on tasks because they find them enjoyable...choosing to do an activity for no compelling reason...[the activity] occurs for its own sake...requires no external supports or reinforcements."

Sometimes the preparation of language teaching material is a search for good content, for relevant and up-to-date news, the same thing that drives journalists.

Outsourcing blogs: My favorite

There are several blogs devoted to outsourcing where you can find the latest relevant developments in this area. My favorite blogs are this Globalization of Services blog and this Business Process Outsourcing (BPO) blog.

Teachers have an advantage over textbook companies here. Textbook companies avoid using and providing materials in computer-readable form out of the fear of being copied. With Google and knowledge of their students, any teacher can find the right fit between students and content with a little experimentation.

Wednesday, February 01, 2006

The Theory Behind
Content-Based Instruction

This short paper has a nice concise statement of why content based instruction (CBI) is more effective than just teaching a language:

"One of the achievements of cognitive science is the confirmation of the dual nature of cognition given in the dictionary definition: all human intellectual activities, such as thinking, communicating, problem solving, and learning, require both processes and content (knowledge). This implies that attempting to raise people's cognitive abilities to high levels simply by improving processes such as "reading," "writing," "critical thinking" is nearly futile. To perform these processes well requires high levels of content knowledge on which the processes can operate."

Computer programs that understand natural language need databases of commonsense content knowledge like Cyc to function.

Humans, like computers, need these databases, and they acquire them by osmosis through extensive reading and automatic word recognition:

"To efficiently read and comprehend, the decoding aspect of reading must become automatic, that is, performed without conscious attention. This can only be accomplished by hours and hours of practice in reading. This is one of the reasons why adults who leave literacy programs having completed just 50 to 100 or so hours of instruction do not make much improvement in general reading comprehension: they have not automated the decoding process. A second reason is that, to markedly improve reading comprehension, one must develop a large body of knowledge in long term memory relevant to what is being read. Like skills, the development of large bodies of knowledge takes a long time."

The article ends by suggesting that language training be combined with job training.

Language tests that even native speakers can't answer correctly

Are tests that even native speakers of a language might not answer correctly really legitimate? This gap-fill quiz for business vocabulary is from this quiz repository. I only scored 16 out of 20, and I have a master's in economics. Does that make me incompetent and shameless? Hardly.

Part of the problem is probably English teachers without subject-specific knowledge asking either irrelevant questions or questions with either multiple or no answers. Another problem may be United Kingdom-specific language. The problems were:

"allowance money" vs. "pocket money" ("pocket money" is quite a general word)

"socialist economy" vs. a "mixed economy" ("mixed" pretty vacuous here)

"gold reserves" vs. "gold reserve" ("is" forces a usage that I'm not familiar with)

"multi-use ticket" vs. "season ticket" (no reference to summer, winter, ...)

Nonetheless, such quizzes still have value as a sort of meeting of minds between teacher and student and it's probably better that a lot of time is not wasted writing air-tight valid questions. It is important that students are not assessed with questions like this. Sometimes this type of question actually penalizes the best students in the class. I've thrown away test questions like this that were given in exams and confused very competent students.

The bottom line: focus on content and not language itself, i.e. Content Based Instruction (CBI). Setting learning objectives for content that is expressed with vocabulary is more effective and natural than making the vocabulary itself an objective.

(London Chamber of Commerce and Industry (LCCI) International Qualifications exam curricula seem to be a good basis for content-based instruction).

Sunday, January 29, 2006

My Economics EAP Reading
Conference Paper

A conference paper I presented at the international TESOL conference at Mae Fah Luang University in Chiang Rai, Thailand, in 2005.

Instead of just looking for simple word collocations in a corpus of economics texts, the paper advocates looking for semantic-syntactic patterns like TREND-CAUSES-TREND. Furthermore, the paper suggests how these patterns can be used in the teaching of large lecture classes of L2 students. It is based on my experience of teaching economics in such an environment.

Friday, January 27, 2006

Counting Eskimo Words for Snow

Does language influence or determine thought? Will I lose my own cultural identity if I start using another language too much? Consider these injunctions that a language teacher might face:

1. Teach culturally neutral International Business English.
2. Refrain from any cultural distortions of language.
3. Keep your language pure and neutral.
4. Don't be a Linguistic Imperialist.

Implicit in these injunctions is that you actually can suppress the cultural component of your language in non-trivial uses of language. Maybe the cookbook, phrasebook language that a computer program like Eliza can be programmed to generate can be culturally neutral:

A: "Excuse me sir, where is the bathroom?"
B: "Down the hall to the right."

But at higher levels, language seems to determine thought, and thought language; at the very least there is strong feedback between the two. Linguistic determinism (the strong form of the Sapir-Whorf Hypothesis) is controversial though:

"Among the most frequently cited examples of linguistic determinism is Whorf's study of the language of the Inuit, who have multiple words for snow. He argues that this modifies the world view of the Inuit, creating a different mode of existence for them than, for instance, a speaker of English. The notion that Arctic people have a large number of words for snow has been shown to be false by linguist Geoffrey Pullum; in an essay titled The great Eskimo vocabulary hoax, he tracks down the origin of the story, ultimately attributing it largely to Whorf. More to the point is the triviality of this observation. The fact that wine fanciers have a rich vocabulary to speak about the tastes they find in wines is not thought of as evidence that their minds work differently, only that they know more than the average person about wine. English-speaking skiers may also have a rich vocabulary for snow."

"Wine literacy" certainly affected the thought and lifestyle of the wine aficionado main character of the 2004 movie Sideways [Review]. A computer geek or any kind of geek certainly acquires a vocabulary and way of using language and expressing themselves that affects their thought and life. It is a feedback process though, and geeks and the wine literate are extreme cases.

Controversial statement: When you learn a word, learn everything about it, even the cultural hooks that people hang it on in their mind.

The word lists I've been looking at recently treat the words in word families as separate words. For example:

1. produce (v - action)
2. producer (n - actor)
3. production (n - activity)
4. productive (adj - applied to actor)
5. productivity (n - 4 nominalized)
6. product (n - object of action)

Maybe knowing or learning a new word should be taken to mean learning the whole word family, and perhaps even its frequent collocates. If you make enough links or handles for the student to hang the word on in their mind, culture is inevitably going to intrude. In my experience, teaching words in a set like this also ties grammar to vocabulary building. Vocabulary-Grammar-Culture all tied together. Try to separate them and you'll get something inhuman.

(Stray thought: Isolating and counting words or units of meaning must be even more difficult in agglutinative languages like Eskimo, Sanskrit, or Turkish. See the morphological typology of languages: agglutinative, building words from particles, vs. isolating or analytic, where each particle is a word.)

Wiktionary word frequency lists

Wikipedia's dictionary has word frequency lists calculated from Project Gutenberg texts. Taking a random sample:

"language that's House los individual South mon meant food wide now formed"

This reveals several problems. There is no lemmatization ("that's", "formed"). "mon" is either an abbreviation for "Monday" or a mistake (e.g. typo, word split at line break, etc.). "los" is either Spanish, perhaps from "Los Angeles", or a mistake. Capitalized and uncapitalized forms are apparently counted separately.

If a realistic type/token ratio is to be calculated that shows how many unique words the reader was exposed to, you probably have to go even further and count word families.
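As a rough illustration of that point, here is a minimal Python sketch of a type/token ratio computed twice: once with every word form counted separately, and once with forms folded into word families. The `FAMILIES` table and the sample sentence are invented for the example; a real count would need a published word-family list.

```python
import re
from collections import Counter

# Hypothetical, hand-made family table: maps a word form to a family headword.
# A real study would use a published word-family list instead.
FAMILIES = {
    "produces": "produce",
    "producer": "produce",
    "production": "produce",
    "productive": "produce",
    "productivity": "produce",
    "product": "produce",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens (keeping internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def type_token_ratio(text, families=None):
    """Return (types, tokens, ratio), optionally folding forms into families."""
    tokens = tokenize(text)
    if families:
        tokens = [families.get(t, t) for t in tokens]
    counts = Counter(tokens)
    ratio = len(counts) / len(tokens) if tokens else 0.0
    return len(counts), len(tokens), ratio

sample = "The producer said production rose. The product is productive."
print(type_token_ratio(sample))            # word forms counted separately
print(type_token_ratio(sample, FAMILIES))  # forms folded into one family
```

Lowercasing before counting also removes the capitalized/uncapitalized split mentioned above, though it would wrongly merge proper nouns with ordinary words.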

(Note: Chinese makes this a lot easier. Since Chinese is an analytic language, the breaks between characters mark morpheme boundaries, so it is much easier to define exactly what is to be counted.)

Monday, January 16, 2006

Content Based Instruction:
A generalization of the communicative approach?

To me, CBI or "learning a language through content" is a generalization of the common English teaching idea of communicative information gaps that need to be bridged in speaking activities.

Nowadays, I like to stay away from the mechanical "information" gap activity, transforming it into more of a "meaning" gap activity where the information gap comes from student projects or writing activities. My favorite is the marketing plan, where students design a Product along with its Pricing, Promotion, and distribution (Place), which are called the 4 P's in marketing. It has several opportunities for information or meaning gap exchanges and roleplaying. Marketing research via surveys and focus group sessions are two instances with opportunities for question formulation.

The hostility towards TESOL and the communicative approach shown by some more experienced teachers first struck me as unusual. One even called TESOL the "Hello, How are you?, I love you!" school of education. I now realize that the mainstream English teaching community (i.e. "TESOL") is, in fact, rather insulated, often resistant to outside intellectual influence and often unable to connect to broader-based educational research.

Studying higher-order ideas like CBI can help us do our lower level day to day work such as lesson planning more creatively and efficiently.

This introductory chapter from a recent book is a good introduction and overview of the idea of CBI (Content Based Instruction), learning a language by learning something else besides the language, using the language, albeit in simplified forms at first. Rather an ambitious task, wouldn't you say?

Table 1 in this book outlines the differences in applying CBI to novice and more advanced learners. An outline syllabus for novice learners is also given.

The table of "Kumar's Macro-Strategies" also provides a nice set of guidelines.

The bibliography is also very up-to-date.

Friday, January 13, 2006

Moving up the journalism value chain

Having just studied Michael Porter's value chain idea a bit and the notion of "moving up the value chain", I wonder how journalistic writing can move up the value chain using the web. Jakob Nielsen addresses this:

"On the Web, the inverted pyramid becomes even more important since we know from several user studies that users don't scroll...writers can link to old articles instead of having to summarize background information in every [article]...[it] is possible to link to full background materials and to construct digests of links to multiple treatments of an issue."

This all assumes that reliable, relevant, and concise background information is available online, as it usually is at Wikipedia. In fact, "pedia" in general could be broken out as a concept, defined as a topically indexed online source of background information. Nielsen continues:

"...the Web is a linking medium and we know from hypertext theory that writing for interlinked information spaces is different than writing linear flows of text. In fact, George Landow, a Professor of English literature, coined the phrases rhetoric of departure and rhetoric of arrival to indicate the need for both ends of the link to give users some understanding of where they can go as well as why the arrival page is of relevance to them."

The History of Journalism's
Inverted Pyramid

"Writing from the Top Down: Pros and Cons of the Inverted Pyramid" is a great little critical history of journalism's conventional pattern of writing:

"The conventions of the inverted pyramid require the reporter to summarize the story, to get to the heart, to the point, to sum up quickly and concisely the answer to the question: What's the news?"

This can split narratives unchronologically:

"The inverted pyramid, its critics say, is the anti-story. It tells the story backward and is at odds with the storytelling tradition that features a beginning, middle, and end."

Important background information can often fall off the end of the article if there is not enough space. Despite these failings, the inverted pyramid remains the pillar of western journalism, but will this change?

Thursday, January 12, 2006

Profiling VOA's Special English II

80% of VOA vocab is consistently within the first 1000 words (K1), so the VOA "Special English" is truly simple. I profiled the last four months of economic-business articles from VOA [list of articles].

Can you write an article with the depth of the Economist with Simple English though? Concordancing could reveal which techniques can be used to keep grammar and vocab simple. I'll have to rewrite an Economist article in "Special English".

I've been noticing that the profiling software does use a lemmatiser or stemmer (eliminate grammatical inflections), like any good search engine would, before it counts and lists unique words. For example:

airline_[1] airlines_[5]
campaign_[2] campaigned_[2] campaigns
cancel_[3] cancelled_[1] (not cancellation_[1] though, which is in the same word family)

It seems that the measure of unique words that a student has to deal with in a text (type-token ratio) should not depend on grammatical inflection. Should it depend on the part of speech? For example the word family:

employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]

maps to the one unique counted word:


Will have to do a little bit of research.
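To make the question concrete, here is a small Python sketch that strips only grammatical inflections, using a crude suffix heuristic of my own (not whatever the profiling software actually does). On the counts quoted above it leaves four separate entries, which suggests that merging them into one word family needs derivational knowledge as well.

```python
from collections import defaultdict

def naive_lemma(word):
    """Strip a few common inflectional endings (a rough heuristic, not a real lemmatiser)."""
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    elif w.endswith("ed") and len(w) > 4:
        w = w[:-2]
        # Undo consonant doubling, e.g. "cancell" -> "cancel".
        # This over-corrects words such as "called".
        if len(w) > 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
            w = w[:-1]
    return w

def group_by_lemma(counts):
    """Merge {form: count} into {lemma: {form: count}}."""
    groups = defaultdict(dict)
    for form, n in counts.items():
        groups[naive_lemma(form)][form] = n
    return dict(groups)

# Counts copied from the profiler output quoted above.
counts = {"employed": 6, "employees": 8, "employer": 1, "employers": 5, "employment": 3}
for lemma, forms in group_by_lemma(counts).items():
    print(lemma, forms)
```

The heuristic keeps "cancellation" apart from "cancel" for the same reason: "-ation" and "-ment" change the part of speech, so they are not inflections.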

Tuesday, January 10, 2006

Profiling VOA's Special English

The Voice Of America (VOA) has been using its own version of simplified English since 1959 and their archives are available to the public. I could only find one economics-business related article quickly though: "Simple English: American Agriculture: Shrinking but More Productive" [Profile]

The most interesting recent economic-business article I could find on the whole VOA site was about China buying into African oil, but it doesn't say it was written in special English: "China Oil Giant Reaches Deal to Buy Major Stake in Nigerian Oil Field". [Profile]

Comparing the profiles of the two articles:
China oil, not simplified: (80,4,8,8) vs.
American agriculture, simplified: (64,4,8,24)
where (first1000words,second1000,academic,off-list)

Clearly, the VOA special English article routinely uses a lot more simple vocabulary, but note that 80% is near the 77% of the Wikipedia article I profiled. A more detailed study is obviously needed, characterizing these different forms of simplified writing both lexically and grammatically, and maybe also some objective computer-based measurement of how quickly students can read and understand these different kinds of writing.

Sunday, January 08, 2006

Simple English Vocabulary Profile

Finally tested the Vocabulary profiler on one of Wikipedia's Simple English pages. The first one that I could find that was relatively complete was the one on human rights.

Most of the simple words used on the page were caught by the profiler's lists. The words that weren't caught were either reasonable new words you'd have to master to read about human rights, have something to do with philosophy, or are proper nouns or mistakes:

"abuse_[1] abuses_[5] american_[1] asylum_[1]
biologists_[1] condemn_[1] covenant_[2] covenants_[2]
disability_[1] enlightenment_[1] etc_[1] european_[2]
france_[2] french_[1] georg_[1] hegel_[1] innocent_[1]
jail_[1] john_[2] locke_[1] nationality_[1] numbersome_[1]
organism_[1] organisms_[2] protest_[1] rightsbecause_[1]
scriptures_[1] stuart_[1]"

One thing the page does lack is concrete examples of human rights abuses, which seems pretty important because it's the very non-abstract cruelty of these acts that makes them so reprehensible.

Just for comparison, here is the main non-simplified Wikipedia page for Human Rights and here are the profile stats on which lists the words are caught by:

Simple: (K1: 88%, K2: 4%, AWL: 4%, Off-List: 4%)
Not Simple: (K1: 77%, K2: 4%, AWL: 11%, Off-List: 8%)

Not really very different! At least as far as the vocab is concerned.

If the "Simple English" in Wikipedia is not much simpler than the authentic real-life English in the main Wikipedia, should we really be investing time in these articles? Or maybe we have to rethink exactly what we mean by "Simple English" and then measure and control simplicity.

Thursday, January 05, 2006

Wikipedia and the rise of Participatory Journalism

Rules like NPOV (neutral point of view) that Wikipedia established have made Wikipedia a reliable place to get information on the internet, which is often a very unreliable place to get good information.

Wikipedia is being used in Hong Kong as a tool to teach journalism and how to write "in a fair and balanced manner for an international audience. By collaborating online with others, students can interact with each other when writing, and receive advice and corrections from complete strangers around the world within minutes of making contributions. With students for which English is a second language, this provides a highly interactive experience for learning copy editing and grammar usage."

Wikipedia could also become an important repository of simplified texts for English language learners and for disseminating the practice of extensive reading of simplified texts advocated by experts such as Krashen, Richard Day, and Nation. Most newspaper articles need the sort of additional background information that Wikipedia can provide.

This essay also comments on the rise of the Chinese version of Wikipedia which is still behind Esperanto in terms of content. Hopefully, one day there'll be a simplified Chinese Wikipedia too for language learning content that goes beyond the traditional checking into a hotel, a trip to the post office, ordering food, friends having a banal conversation, etc.

Simplified English Texts

How can I measure how simple a text is? One way is to count unique words. Simple metrics like the Flesch readability formula only provide a very rough rule of thumb. What about comparing a text with similar texts that you already know are simple?
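For reference, the Flesch Reading Ease score is 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The Python sketch below applies it with a deliberately crude syllable counter (runs of vowels), which is my own shortcut and one reason the result should only be read as a rule of thumb.

```python
import re

def count_syllables(word):
    """Estimate syllables as runs of vowels (a crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / len(sentences)) - 84.6 * (syllables / len(words))

# Higher scores mean easier text.
print(round(flesch_reading_ease("The cat sat. The dog ran."), 2))
```

The formula only sees sentence length and word length, so it cannot tell a familiar long word from a rare short one, which is where word lists come in.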

Texts from graded readers like the Oxford Bookworms series provide a nice baseline for comparison, but they are copyrighted. Maybe articles in the Simple English Wikipedia could be used, although when I took a look there weren't many articles yet, and some people were writing their articles with Ogden's Basic English, which actually distorts the English language sometimes, not a good idea.

The vocab profiler can be used to do the comparison. Start with a corpus of simplified texts and compare the profile on these simplified texts with authentic texts from newspapers.
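The comparison could look something like the Python sketch below. It is not the profiler's actual code: the `profile` function is my own guess at the basic idea (assign each token to the first list that contains it, otherwise "off-list"), and the tiny `BANDS` lists and both sample sentences are invented for illustration.

```python
import re
from collections import Counter

def profile(text, bands):
    """Return the percentage of tokens falling in each band, plus 'off-list'.

    bands: ordered mapping of band name -> set of lowercase words.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    tally = Counter()
    for tok in tokens:
        for name, words in bands.items():
            if tok in words:
                tally[name] += 1
                break
        else:
            # No band claimed the token.
            tally["off-list"] += 1
    total = len(tokens) or 1
    return {name: 100 * tally[name] / total for name in list(bands) + ["off-list"]}

# Toy word lists with invented membership; real K1/K2/AWL lists are far larger.
BANDS = {
    "K1": {"the", "price", "of", "is", "a", "in", "and", "up", "went"},
    "K2": {"sugar"},
    "AWL": {"policy"},
}

simple = "The price of sugar went up."
authentic = "Sugar hoarding is plaguing policy in Bangkok."
print(profile(simple, BANDS))
print(profile(authentic, BANDS))
```

Running the same function over a whole corpus of simplified texts and a corpus of newspaper texts would give two profiles to set side by side.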

Anyway, simplified vs. authentic texts is a very murky area. What is simplified? Don't you lose information with simplified texts? Next, I have to create profiles for some simplified texts and compare them with the authentic text profiles I already have.

Tuesday, January 03, 2006

Vocabulary Profiling II

I extracted 10 words from a newspaper article to focus on in a vocab lesson. Here's the vocab profile I'm working from. The topic is "The Police" and the most fruitful place to look for new words to teach was among the words that were not caught by a list. I cooked up this TV dinner of a lesson based on a test prep book.

Sunday, January 01, 2006

Vocabulary Profiling

Just used Nation's vocabulary profiler on a newspaper text [original text, results]. The profiler is supposed to show you how difficult the vocabulary in a text is. If you were writing one of those simplified vocabulary graded readers like Oxford Bookworms that only use, let's say, 1000 words, you could use this software to keep on track and control difficulty. This particular version at The Compleat Lexical Tutor also color codes the text to help you.

Now I have a complicated printout to interpret, ouch! Most of the AWL words are not the sort of words I would define for my students, too easy. Maybe a domain specific list, e.g. for economics, should be used too. The profiler allows you to add vocab lists. Here are the words that weren't in the lists:

"baht baht baht baht baht baht baht baht bangkok cane chakramon chakramon chakramon csb csb csb embarrassing ex fertiliser freight frustration hike hike hoarding inflation kilogramme kilogramme longstanding pesticides phasukvanich plaguing policymaking provinces quit reportedly retail retail shortages skyrocketing smuggling tackle tackle wholesalers"

I've defined the expressions "hike prices" and "hoard" for students recently. I'd define "tackle" too. Detecting common collocations would be a nice add-on feature.
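A very simple version of such an add-on might just count adjacent word pairs that repeat. The Python sketch below does that; the function name, the threshold, and the sample text are made up, and raw frequency is only a stand-in for a proper association measure.

```python
import re
from collections import Counter

def frequent_bigrams(text, min_count=2):
    """Count adjacent word pairs within sentences and keep the repeated ones."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        # Pair each token with its right-hand neighbour.
        pairs.update(zip(tokens, tokens[1:]))
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]

text = ("Wholesalers hike prices when shortages appear. "
        "Retailers hike prices too. The government will tackle hoarding.")
print(frequent_bigrams(text))
```

On a real article, pairs of function words like "of the" would dominate, so a stop-word filter or a comparison against expected frequencies would probably be needed.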

Special file formats for lessons

The Guardian uses a special format that is easy to read in emails. A short SMS message could provide everything necessary to improvise a lesson. (For a spoof on minimalist teaching see "The Ten Rules".) In some environments, even in today's technologically sophisticated world, computers are unavailable or too much of a hassle to use.

Each file format can make different aspects of using a lesson easier. PDF files make printing out worksheets easier. An interactive self-correcting online elearning activity can be used at any time without a teacher. Flash makes certain features of these activities easier, like drag and drop. HTML pages are easy to read online.

There is no reason a program can't be written to reformat lessons in several different convenient formats.
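As a sketch of what such a program might look like, the Python below keeps a lesson in one neutral structure and renders it two ways. The `Lesson` class, its fields, and the sample lesson are all hypothetical; a PDF or SMS renderer would be further functions over the same structure.

```python
from dataclasses import dataclass, field
from html import escape

@dataclass
class Lesson:
    """A hypothetical minimal lesson: a title, a text, and vocabulary items."""
    title: str
    text: str
    vocab: dict = field(default_factory=dict)  # word -> definition

def to_plain_text(lesson):
    """Render the lesson for email or print-outs."""
    lines = [lesson.title.upper(), "", lesson.text, "", "Vocabulary:"]
    lines += [f"- {word}: {meaning}" for word, meaning in lesson.vocab.items()]
    return "\n".join(lines)

def to_html(lesson):
    """Render the same lesson as a small HTML fragment."""
    items = "".join(
        f"<li><b>{escape(w)}</b>: {escape(m)}</li>" for w, m in lesson.vocab.items()
    )
    return (f"<h1>{escape(lesson.title)}</h1>"
            f"<p>{escape(lesson.text)}</p><ul>{items}</ul>")

lesson = Lesson("Sugar prices", "Wholesalers hike prices.", {"hike": "raise sharply"})
print(to_plain_text(lesson))
print(to_html(lesson))
```

Keeping the content separate from the rendering means a new format only costs one more function.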