Friday, January 27, 2006

Counting Eskimo Words for Snow

Does language influence or determine thought? Will I lose my own cultural identity if I start using another language too much? Consider these injunctions that a language teacher might face:

1. Teach culturally neutral International Business English.
2. Refrain from any cultural distortions of language.
3. Keep your language pure and neutral.
4. Don't be a Linguistic Imperialist.

Implicit in these injunctions is the assumption that you can actually suppress the cultural component of your language in non-trivial uses of language. Maybe the cookbook, phrasebook language that a computer program like Eliza can be programmed to generate can be culturally neutral:

A: "Excuse me sir, where is the bathroom?"
B: "Down the hall to the right."

But at higher levels, language seems to determine thought, and thought language; there is at least strong feedback between the two. Linguistic determinism (the strong form of the Sapir-Whorf Hypothesis) is controversial, though:

"Among the most frequently cited examples of linguistic determinism is Whorf's study of the language of the Inuit, who have multiple words for snow. He argues that this modifies the world view of the Inuit, creating a different mode of existence for them than, for instance, a speaker of English. The notion that Arctic people have a large number of words for snow has been shown to be false by linguist Geoffrey Pullum; in an essay titled The great Eskimo vocabulary hoax, he tracks down the origin of the story, ultimately attributing it largely to Whorf. More to the point is the triviality of this observation. The fact that wine fanciers have a rich vocabulary to speak about the tastes they find in wines is not thought of as evidence that their minds work differently, only that they know more than the average person about wine. English-speaking skiers may also have a rich vocabulary for snow."

"Wine literacy" certainly affected the thought and lifestyle of the wine aficionado main character of the 2004 movie Sideways [Review]. A computer geek, or any kind of geek, certainly acquires a vocabulary and a way of using language that affects their thought and life. It is a feedback process, though, and geeks and the wine literate are extreme cases.

Controversial statement: When you learn a word, learn everything about it, even the cultural hooks that people hang it on in their mind.

The word lists I've been looking at recently treat the words in word families as separate words. For example:

1. produce (v - action)
2. producer (n - actor)
3. production (n - activity)
4. productive (adj - applied to actor)
5. productivity (n - 4 nominalized)
6. product (n - object of action)

Maybe knowing or learning a new word should be taken to mean knowing the whole word family, and perhaps even its frequent collocates. If you make enough links or handles for the student to hang the word on in their mind, culture is inevitably going to intrude. In my experience, teaching words in a set like this also ties grammar to vocabulary building. Vocabulary, grammar, and culture are all tied together. Try to separate them and you'll get something inhuman.

(Stray thought: Isolating and counting words or units of meaning must be even more difficult in agglutinative languages like Eskimo, Sanskrit, or Turkish. See the morphological typology of languages: agglutinative, building words from particles, vs. isolating or analytic, where each particle is a word.)

Wiktionary word frequency lists

Wikipedia's dictionary has word frequency lists calculated from Project Gutenberg texts. Taking a random sample:

"language that's House los individual South mon meant food wide now formed"

This sample reveals several problems. There is no lemmatization ("that's", "formed"). "mon" is either an abbreviation for "Monday" or a mistake (e.g. a typo or a word split at a line break). "los" is either Spanish, perhaps from "Los Angeles", or a mistake. Capitalized and uncapitalized forms are apparently counted separately.

If a realistic type/token ratio is to be calculated that shows how many unique words the reader was exposed to, you probably have to go even further and count word families.
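To make the difference concrete, here is a minimal sketch that computes a type/token ratio twice: once on raw word forms and once after mapping each form to a word family. The `WORD_FAMILIES` table, the sample sentence, and the function names are all made up for illustration; a real count would need a full family list.

```python
import re

# Hypothetical, hand-made family table: each surface form maps to a headword.
WORD_FAMILIES = {
    "produce": "produce", "producer": "produce", "production": "produce",
    "productive": "produce", "productivity": "produce", "product": "produce",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens, families=None):
    """Return the number of unique types divided by the number of tokens.

    If a family table is given, each token is first replaced by its headword,
    so all members of one word family count as a single type.
    """
    if not tokens:
        return 0.0
    if families is not None:
        tokens = [families.get(t, t) for t in tokens]
    return len(set(tokens)) / len(tokens)

sample = "The producer said production of the product was productive"
tokens = tokenize(sample)
raw_ttr = type_token_ratio(tokens)                    # 8 types over 9 tokens
family_ttr = type_token_ratio(tokens, WORD_FAMILIES)  # 5 types over 9 tokens
print(round(raw_ttr, 2), round(family_ttr, 2))
```

Lowercasing before counting also removes the capitalized/uncapitalized split noted above, at the cost of merging proper nouns with common words.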

(Note: Chinese characters make this a lot easier. Since Chinese is an analytic language, the morpheme breaks between characters make it much simpler to define exactly what is to be counted.)

Monday, January 16, 2006

Content Based Instruction:
A generalization of the communicative approach?

To me, CBI or "learning a language through content" is a generalization of the common English teaching idea of communicative information gaps that need to be bridged in speaking activities.

Nowadays, I like to stay away from the mechanical "information" gap activity, transforming it into more of a "meaning" gap activity where the information gap comes from student projects or writing activities. My favorite is the marketing plan, where students design a product along with its pricing, promotion, and distribution (place), which are called the 4 P's in marketing. It has several opportunities for information or meaning gap exchanges and roleplaying; marketing research via surveys and focus group sessions are two instances with opportunities for question formulation.

The hostility towards TESOL and the communicative approach shown by some more experienced teachers first struck me as unusual. One even called TESOL the "Hello, How are you?, I love you!" school of education. I now realize that the mainstream English teaching community (i.e. "TESOL") is, in fact, rather insulated, often resistant to outside intellectual influence and often unable to connect to broader-based educational research.

Studying higher-order ideas like CBI can help us do our lower level day to day work such as lesson planning more creatively and efficiently.

This introductory chapter from a recent book is a good introduction and overview of the idea of CBI (Content Based Instruction), learning a language by learning something else besides the language, using the language, albeit in simplified forms at first. Rather an ambitious task, wouldn't you say?

Table 1 in this book outlines the differences in applying CBI to novice and more advanced learners. An outline syllabus for novice learners is also given.

The table of "Kumar's Macro-Strategies" also provides a nice set of guidelines.

The bibliography is also very up-to-date.

Friday, January 13, 2006

Moving up the journalism value chain

Having just studied Michael Porter's value chain idea a bit and the notion of "moving up the value chain", the question arises of how journalistic writing can move up the value chain using the web. Jakob Nielsen addresses this:

"On the Web, the inverted pyramid becomes even more important since we know from several user studies that users don't scroll...writers can link to old articles instead of having to summarize background information in every article...it is possible to link to full background materials and to construct digests of links to multiple treatments of an issue."

This all assumes that reliable, relevant, and concise background information is available online, as it usually is at Wikipedia. In fact, "pedia" in general could be broken out as a concept, defined as a topically indexed online source of background information. Nielsen continues:

"...the Web is a linking medium and we know from hypertext theory that writing for interlinked information spaces is different than writing linear flows of text. In fact, George Landow, a Professor of English literature, coined the phrases rhetoric of departure and rhetoric of arrival to indicate the need for both ends of the link to give users some understanding of where they can go as well as why the arrival page is of relevance to them."

The History of Journalism's
Inverted Pyramid

"Writing from the Top Down: Pros and Cons of the Inverted Pyramid" is a great little critical history of journalism's conventional pattern of writing:

"The conventions of the inverted pyramid require the reporter to summarize the story, to get to the heart, to the point, to sum up quickly and concisely the answer to the question: What's the news?"

This can break up the chronology of a narrative:

"The inverted pyramid, its critics say, is the anti-story. It tells the story backward and is at odds with the storytelling tradition that features a beginning, middle, and end."

Important background information can often fall off the end of the article if there is not enough space. Despite these failings, the inverted pyramid remains the pillar of western journalism, but will this change?

Thursday, January 12, 2006

Profiling VOA's Special English II

80% of VOA vocab is consistently within the first 1000 words (K1), so the VOA "Special English" is truly simple. I profiled the last four months of economic-business articles from VOA [list of articles].

Can you write an article with the depth of the Economist in Simple English, though? Concordancing could tell what techniques can be used to keep grammar and vocab simple. I'll have to rewrite an Economist article in "Special English".

I've been noticing that the profiling software does use a lemmatiser or stemmer (which eliminates grammatical inflections), like any good search engine would, before it counts and lists unique words. For example:

airline_[1] airlines_[5]
campaign_[2] campaigned_[2] campaigns
cancel_[3] cancelled_[1] (not cancellation_[1] though, which is in the same word family)

It seems that the measure of unique words that a student has to deal with in a text (type-token ratio) should not depend on grammatical inflection. Should it depend on the part of speech? For example the word family:

employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]

maps to the one unique counted word:

employ_[23]
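The arithmetic behind that mapping can be sketched in a few lines. The counts are the ones quoted above; the `FAMILY_OF` table and the function names are hypothetical and written by hand for this one family, since I don't know how the profiler stores its families internally.

```python
import re
from collections import Counter

# The line of profiler output quoted above.
PROFILER_LINE = "employed_[6] employees_[8] employer_[1] employers_[5] employment_[3]"

# Hypothetical family table, written by hand for this illustration.
FAMILY_OF = {
    "employed": "employ", "employees": "employ", "employer": "employ",
    "employers": "employ", "employment": "employ",
}

def parse_counts(line):
    """Turn 'word_[n] word_[n] ...' into a {word: n} dictionary."""
    return {word: int(n) for word, n in re.findall(r"([a-z]+)_\[(\d+)\]", line)}

def collapse_to_families(counts, family_of):
    """Add up the counts of all forms that share a headword."""
    totals = Counter()
    for form, n in counts.items():
        totals[family_of.get(form, form)] += n
    return totals

family_counts = collapse_to_families(parse_counts(PROFILER_LINE), FAMILY_OF)
print(family_counts)  # Counter({'employ': 23})
```

Counting by part of speech instead would only need a different table, for example one that sends "employer" and "employers" to one headword and "employed" to another.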

Will have to do a little bit of research.

Tuesday, January 10, 2006

Profiling VOA's Special English

The Voice Of America (VOA) has been using its own version of simplified English since 1959 and their archives are available to the public. I could only find one economics-business related article quickly though: "Simple English: American Agriculture: Shrinking but More Productive" [Profile]

The most interesting recent economic-business article I could find on the whole VOA site was about China buying into African oil, but it doesn't say it was written in special English: "China Oil Giant Reaches Deal to Buy Major Stake in Nigerian Oil Field". [Profile]

Comparing the profiles of the two articles:
China oil, not simplified: (80,4,8,8) vs.
American agriculture, simplified: (64,4,8,24)
where (first1000words,second1000,academic,off-list)

Clearly, the VOA Special English article routinely uses a lot more simple vocabulary, but note that 80% is near the 77% of the Wikipedia article I profiled. A more detailed study is obviously needed, one that characterizes these different forms of simplified writing both lexically and grammatically, and maybe also some objective computer-based measurement of how quickly students can read and understand them.
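For anyone wondering where a four-number profile like the ones above comes from, here is a rough sketch of the idea. It is not the profiler's actual code: the three word lists are tiny made-up stand-ins for the real first-1000, second-1000, and academic lists, the sample sentence is invented, and no lemmatising is done.

```python
import re

# Toy stand-ins for the real word lists (assumption: invented for this sketch).
K1 = {"is", "to", "buy", "a", "of", "oil", "the", "and"}
K2 = {"giant", "stake"}
AWL = {"major"}

def profile(text, k1=K1, k2=K2, awl=AWL):
    """Return (first1000, second1000, academic, off-list) as percentages of tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return (0.0, 0.0, 0.0, 0.0)
    counts = [0, 0, 0, 0]
    for token in tokens:
        if token in k1:
            counts[0] += 1
        elif token in k2:
            counts[1] += 1
        elif token in awl:
            counts[2] += 1
        else:
            counts[3] += 1  # not caught by any list
    return tuple(100 * n / len(tokens) for n in counts)

result = profile("Giant is to buy a major stake of Nigerian oil")
print(result)  # (60.0, 20.0, 10.0, 10.0)
```

Each token is checked against the lists in order, so a word that appeared on two lists would be credited to the easier one.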

Sunday, January 08, 2006

Simple English Vocabulary Profile

Finally tested the Vocabulary profiler on one of Wikipedia's Simple English pages. The first one that I could find that was relatively complete was the one on human rights.

Most of the simple words used on the page were caught by the profiler's lists. The words that weren't caught were either reasonable new words you'd have to master to read about human rights, have something to do with philosophy, or are proper nouns or mistakes:

"abuse_[1] abuses_[5] american_[1] asylum_[1]
biologists_[1] condemn_[1] covenant_[2] covenants_[2]
disability_[1] enlightenment_[1] etc_[1] european_[2]
france_[2] french_[1] georg_[1] hegel_[1] innocent_[1]
jail_[1] john_[2] locke_[1] nationality_[1] numbersome_[1]
organism_[1] organisms_[2] protest_[1] rightsbecause_[1]
scriptures_[1] stuart_[1]"

One thing the page does lack is any concrete examples of human rights abuses, which seems pretty important because it's the very non-abstract cruelty of these acts that makes them so reprehensible.

Just for comparison, here is the main non-simplified Wikipedia page for Human Rights and here are the profile stats on which lists the words are caught by:

Simple: (K1:88%,K2:4%,AWL:4%, Off-List:4%)
Not Simple:(K1:77%,K2:4%,AWL:11%,Off-List:8%)

Not really very different! At least as far as the vocab is concerned.

If the "Simple English" in Wikipedia is not much simpler than the authentic real-life English in the main Wikipedia, should we really be investing time in these articles? Or maybe we have to rethink exactly what we mean by "Simple English" and then measure and control simplicity.

Thursday, January 05, 2006

Wikipedia and the rise of Participatory Journalism

Rules like NPOV (neutral point of view) that Wikipedia established have made Wikipedia a reliable place to get information on the internet, which is often a very unreliable place to get good information.

Wikipedia is being used in Hong Kong as a tool to teach journalism and how to write "in a fair and balanced manner for an international audience. By collaborating online with others, students can interact with each other when writing, and receive advice and corrections from complete strangers around the world within minutes of making contributions. With students for which English is a second language, this provides a highly interactive experience for learning copy editing and grammar usage."

Wikipedia could also become an important repository of simplified texts for English language learners and a way of disseminating the practice of extensive reading of simplified texts advocated by experts such as Krashen, Richard Day, and Nation. Most newspaper articles need the sort of additional background information that Wikipedia can provide.

This essay also comments on the rise of the Chinese version of Wikipedia which is still behind Esperanto in terms of content. Hopefully, one day there'll be a simplified Chinese Wikipedia too for language learning content that goes beyond the traditional checking into a hotel, a trip to the post office, ordering food, friends having a banal conversation, etc.

Simplified English Texts

How can I measure how simple a text is? One way is to count unique words. Simple metrics like the Flesch readability formula only provide a very rough rule of thumb. What about comparing a text with similar texts that you already know are simple?
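For reference, the Flesch Reading Ease score is 206.835 minus 1.015 times the average sentence length in words, minus 84.6 times the average number of syllables per word; higher scores mean easier text. The sketch below applies that formula. The syllable counter is my own crude assumption (it just counts groups of vowels), which is part of why the result is only a rough rule of thumb.

```python
import re

def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease from raw counts; higher scores mean easier text."""
    if words == 0 or sentences == 0:
        raise ValueError("need at least one word and one sentence")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def count_syllables(word):
    """Very rough heuristic (an assumption): one syllable per group of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_for_text(text):
    """Count words, sentences, and syllables in the text, then apply the formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)

print(round(flesch_reading_ease(words=100, sentences=5, syllables=150), 3))  # 59.635
```

Notice that the formula only looks at lengths. It knows nothing about which words are used, which is exactly what a vocabulary profile adds.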

Texts from graded readers like the Oxford Bookworms series provide a nice baseline for comparison, but they are copyrighted. Maybe articles in the Simple English Wikipedia could be used, although when I took a look there weren't many articles yet, and some people were writing their articles in Ogden's Basic English, which actually distorts the English language sometimes. Not a good idea.

The vocab profiler can be used to do the comparison. Start with a corpus of simplified texts and compare the profile on these simplified texts with authentic texts from newspapers.

Anyway, simplified vs. authentic texts is a very murky area. What is simplified? Don't you lose information with simplified texts? Next, I have to create profiles for some simplified texts and compare them with the authentic text profiles I already have.

Tuesday, January 03, 2006

Vocabulary Profiling II

I extracted 10 words from a newspaper article to focus on in a vocab lesson. Here's the vocab profile I'm working from. The topic is "The Police" and the most fruitful place to look for new words to teach was among the words that were not caught by a list. I cooked up this TV dinner of a lesson based on a test prep book.

Sunday, January 01, 2006

Vocabulary Profiling

Just used Nation's vocabulary profiler on a newspaper text [original text, results]. The profiler is supposed to show you how difficult the vocabulary in a text is. If you were writing one of those simplified vocabulary graded readers like Oxford Bookworms that only uses let's say 1000 words, you could use this software to keep on track and control difficulty. This particular version at The Compleat Lexical Tutor also color codes the text to help you.

Now I have a complicated printout to interpret, ouch! Most of the AWL words are not the sort of words I would define for my students, too easy. Maybe a domain specific list, e.g. for economics, should be used too. The profiler allows you to add vocab lists. Here are the words that weren't in the lists:

"baht baht baht baht baht baht baht baht bangkok cane chakramon chakramon chakramon csb csb csb embarrassing ex fertiliser freight frustration hike hike hoarding inflation kilogramme kilogramme longstanding pesticides phasukvanich plaguing policymaking provinces quit reportedly retail retail shortages skyrocketing smuggling tackle tackle wholesalers"

I've defined the expressions "hike prices" and "hoard" for students recently. I'd define "tackle" too. Detecting common collocations would be a nice add-on feature.
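A very simple version of collocation detection is just counting adjacent word pairs and keeping the ones that repeat. This is only a sketch of the idea, not a feature of the profiler; the function names and the sample text are made up, and a serious tool would use a statistical measure rather than raw counts.

```python
import re
from collections import Counter

def bigram_counts(text):
    """Count every pair of adjacent words in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    # Note: pairs are also formed across sentence boundaries in this sketch.
    return Counter(zip(tokens, tokens[1:]))

def frequent_collocations(text, min_count=2):
    """Return (pair, count) for word pairs seen at least min_count times."""
    return [(pair, n) for pair, n in bigram_counts(text).most_common() if n >= min_count]

sample = "Traders hike prices when shortages bite. Retailers hike prices too."
pairs = frequent_collocations(sample)
print(pairs)  # [(('hike', 'prices'), 2)]
```

On a real newspaper text most of the repeated pairs would be things like "of the", so a stop word filter would be the obvious next step.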

Special file formats for lessons

The Guardian uses a special format that is easy to read in emails. A short SMS message could provide everything necessary to improvise a lesson. (For a spoof on minimalist teaching see "The Ten Rules") In some environments, even in today's technologically sophisticated world, computers are unavailable or too much of a hassle to use.

Each file format can make different aspects of using a lesson easier. PDF files make printing out worksheets easier. An interactive self-correcting online elearning activity can be used at anytime without a teacher. Flash makes certain features of these activities easier like drag and drop. HTML pages are easy to read online.

There is no reason a program can't be written to reformat lessons in several different convenient formats.
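As a rough sketch of what such a program might look like, the lesson can be kept as plain data and rendered by one small function per format. The lesson content, the field names, and both function names are invented for illustration; PDF or Flash output would need extra tools and is left out.

```python
from html import escape

# Hypothetical lesson stored as plain data, independent of any output format.
lesson = {
    "title": "Price hikes",
    "vocabulary": ["hike", "hoard", "tackle"],
    "questions": ["Why do traders hoard goods?"],
}

def to_plain_text(lesson):
    """Render the lesson as short plain text, suitable for an email."""
    lines = [lesson["title"].upper(), ""]
    lines.append("Vocabulary: " + ", ".join(lesson["vocabulary"]))
    for number, question in enumerate(lesson["questions"], start=1):
        lines.append(f"{number}. {question}")
    return "\n".join(lines)

def to_html(lesson):
    """Render the same lesson as a simple HTML fragment for reading online."""
    words = "".join(f"<li>{escape(w)}</li>" for w in lesson["vocabulary"])
    questions = "".join(f"<li>{escape(q)}</li>" for q in lesson["questions"])
    return f"<h1>{escape(lesson['title'])}</h1><ul>{words}</ul><ol>{questions}</ol>"

print(to_plain_text(lesson))
print(to_html(lesson))
```

Keeping the content separate from the formatting means a new format only costs one new function.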

Saturday, December 31, 2005

Vocabulary Level Tests

These vocab level tests would be useful for student goal setting in self-access. I wonder whether similar tests for more specialized vocabulary, for a specific discipline like economics or a type of newspaper article, like articles on freedom of speech and defamation, let's say, might help with student goal setting as well. It's really nice to see the ideas from Nation's book online like this and ready to use with students.

To test it out, I did the Version A 10,000 vocabulary test absentmindedly, without concentrating much, and just slipped by with 83%. My point here is that these tests are difficult and require concentration even from heavy-reading native speakers. I might have students actually study the words in the test before I had them do the test, and recycle the words with further tests that covered different senses of the word and homonyms. One of these days, I'm going to write tests like this for the Burmese language.

Friday, December 30, 2005

The Compleat Lexical Tutor

This site is "a vast range of resources for both teaching and learning vocabulary and grammar." There are so many useful resources that only reading this extensive review can really do the site justice.

I just want to point out that much of the site supports the ideas in my favorite book, Nation's Learning vocabulary in another language (2001, Cambridge University Press). I'll investigate the mind-boggling functionality of this site and how it relates to teaching English with newspapers in future posts to this blog.

Defamation in the news

Defamation is a topic that is appearing over and over again in Thai newspapers nowadays, so it's nice to see a lesson plan someone has come up with in Great Britain on this topic.

The worksheet is divided into three clear sections (called "stages"). It would be nice if each section clearly indicated what it was about with a heading.

Section one defines defamation, but the actual definition is probably a lot more complex than the one given here. It was in the Thai article I wrote a lesson for recently, at least.

"Cases" would be the best title for stage two. Students read and discuss cases and debate whether they constitute defamation or not. More realistic cases or at least more complex and realistic facts taken from real cases would be nice here.

Stage three involves re-inserting paragraphs that have been taken out of a newspaper article on defamation, a good silent self-access activity. This activity might be easier to do for the student if the article was cut into pieces that could be arranged into different sequences, so the student could check which made better sense. It is certainly impossible to do even for me in the given PDF file. I could see this being a good drag and drop activity online with Javascript too.

Wednesday, December 28, 2005

YAETA
(Yet Another English Teaching Acronym)

Finally found where they hide all the information on English language teaching in Wikipedia. Apparently, the acronyms I already know (ESL, TESOL) are not enough and we need more acronyms to amaze and bewilder our colleagues with. There's a long list at the bottom of the page.

English as an Additional Language (EAL), I guess, is not the same as ENAL (English as Not an Additional Language). Maybe we're all EEL's (English as an Expat Language) because we speak slower and use less vocab.

Many relevant entries are filed more reasonably under language acquisition, but there's some obvious advertising that they don't let into this category, like the Pimsleur language learning system or accelerated language learning (see discussion). To be fair to Pimsleur, Nation cites Pimsleur's "Forgetting Curve" in his classic book on teaching vocabulary (see review and also see Waring's Basic Principles and Practice in Vocabulary Instruction).

Jigsaw reading of newspapers

What is "jigsaw reading" really? It seems to be more of a general principle that can be applied to the teaching of reading, than merely jumbling and reordering texts (the definition given by this crib sheet). I looked in vain for a general definition. Let me hazard some possible definitions: 1) the verbal sharing of newspaper texts, 2) information gap speaking activities where each partner has a related piece of news that they have to paraphrase and share with the other partner. They have to ask the other student questions because: "any one student only has only a portion of the information needed to complete a task."

The general principle of jigsaw reading could make reading long newspaper articles manageable by cutting them into pieces assigned to individual students or groups of students.

A short British Council article suggests that a pair of students explain to each other either articles on a related theme or the two halves of one article.

The longer detailed article at Iteslj linked to above really takes a lot of concentration and acting out to understand but really helps you understand how to use this kind of activity with a class.

Here's a good example of the jigsaw principle used in a broad sort of way with L1 students, each L1 speaker explaining an issue to the other students after they watch a film about animal rights.

Teaching reading:
Important points
in easy to remember form

Here's the perfect little crib sheet on reading to help you prepare reading lessons. Looks like it was originally notes to study for a master's class in TESOL.

Tuesday, December 27, 2005

Javascript for language elearning

These little snippets of Javascript code for common language teaching tasks on the internet have been around forever. They include: true-false, multiple choice, matching, feature of category identification, short answer, self-evaluation, cloze, editing, sentence generation, hypertext, memory-spelling, and timed reading. Here's an interesting paper on a French vocabulary tutor with lots of Javascript code snippets.