Tuesday, November 23, 2010

Check your assumptions at the door

Finishing graduate studies is hard. At the Tampere University computing science department, there are about 100 graduate students. Every year about 5 of them graduate, which means that every PhD thesis requires about 20 student-enrollment years. For every student who makes it, about 2 others fail. What makes me think I can do it despite not working on it full time?

What are my strengths and weaknesses?


My worst weakness is that I plan to keep working while doing graduate studies. This restricts the time available.

My strength is that industry experience has given me a solid programming routine. Any research plan should rely on this strength in order to be realistic. A competitive disadvantage has to be balanced out by a competitive advantage; otherwise the risks grow.

How will I find time and energy?


A few years ago, starting regular exercise increased my energy levels permanently. So far I have poured this extra concentration only into Chinese. However, Chinese is moving from the active study phase to the slow and steady vocabulary build-up phase, which frees up time for graduate studies. Without a clear plan for how to use this time, I'll just waste it and get nothing in return.

The second path is minimizing the amount of work. Both computer-aided language learning and bioinformatics are fields where the important thing is crossing the cultural gap between two disciplines. Compared to pure computing science topics like model checking, relatively modest skill in programming and mathematics is enough. This means that a smaller number of hours can still produce new discoveries.

How will I benefit from graduate studies?


First of all, I don't expect to get a paycheck from the university. Applying for a university position would be a bad choice, since industry salaries are bigger than researcher salaries. Getting a good salary from research requires a teaching position, and you have to prepare for that already while studying by working as a teaching assistant. When I studied, I prepared for industrial work.

Getting a PhD degree will make it possible to apply for new kinds of jobs with higher pay. If I am able to come up with a popular CALL website, it will keep generating small amounts of advertising income for years. Setting up such a site requires a big initial effort but little maintenance effort after that, which creates an economic incentive to make the site high-quality from the start.

Monday, November 22, 2010

Chinese character exercise for N900

Summary: This post introduces Bezca, a CALL software prototype for training Chinese characters. It shows that (1) the technology in Finnish Annotator can be adapted to many different purposes, (2) having a day job does not prevent me from writing CALL software with realistic goals, and (3) context is fairly easy to integrate into any CALL software, as long as you take an uncompromising attitude towards the need for context.

In Bezca, you train Chinese characters by drawing the strokes with a stylus, a finger, or a plectrum. Correctly drawn strokes appear on the screen as you draw them:



If you don't remember the character, you can look at the hint, which also shows the next stroke:



Clicking "Show Examples" displays dictionary words and examples sentences which use the character in question.



In the example browser, if you don't know some word in an example sentence, you can just click on it and enjoy further examples of that word. This way, you can browse examples in an endless chain, Wikipedia style.



Bezca also contains a spaced repetition system. Pressing "This was easy" shows the exercise again in a week; pressing "This was hard" shows it again in a day. After that, the interval grows or shrinks exponentially based on the user's responses.
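The scheduling rule is simple enough to sketch in a few lines. The following Python fragment is only an illustration of the idea, not Bezca's actual code: the one-day and one-week starting intervals come from the description above, while the growth and shrink factors are made-up assumptions.

    from datetime import date, timedelta

    def next_review(interval_days, was_easy, easy_factor=2.0, hard_factor=0.5):
        """Grow or shrink the review interval exponentially.

        interval_days is the current gap in days; easy_factor and
        hard_factor are hypothetical tuning constants, not Bezca's values."""
        factor = easy_factor if was_easy else hard_factor
        new_interval = max(1, round(interval_days * factor))
        return new_interval, date.today() + timedelta(days=new_interval)

    # First response: "This was easy" -> 7 days, "This was hard" -> 1 day.
    interval = 7
    # Later responses adjust the gap exponentially.
    interval, due = next_review(interval, was_easy=True)   # 14 days
    interval, due = next_review(interval, was_easy=False)  # back to 7 days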

In the beginning, the program calibrates the difficulty to the student's skill level. It shows some characters and asks the user to say whether they are suitably difficult or too easy. This way, students can go directly to material which is new to them.

This is a prototype and not yet mature enough to be distributed. It contains only 180 characters, and installing it requires a memory card. I don't currently have any plans to take it further, because demand is likely to be small, the use of CEDICT would make it a copyright violation to charge for it, and it would take a lot of effort to input 1000 characters. However, I'm happy to demonstrate it face-to-face to anyone, especially CALL researchers.

Friday, November 05, 2010

Action points and plan B

1. I've been working on an N900 port of the character drawing exercise. It demonstrates that the FA technology is still valid and can be reused. Spend two weeks finalizing the proof-of-concept prototype.

2. Post an introduction to the N900 port.

3. Write a plea for CALL research partners on this blog.

4. Make a list of CALL researchers by browsing the researcher lists on Finnish university web pages.

5. Write cold-call emails to CALL researchers, tell them about my track record in CALL, and ask if they want to talk about research. Add links to my thesis and this blog.

If step 5 produces no contacts, it means that cold calling doesn't work (what a surprise). In that case I have to use personal contacts to get a research topic for graduate studies in 2011. Usually people don't talk about their work, and that includes researchers. This leaves me with few options.

6. Contact Yoe to ask for a research topic in bioinformatics. This post hints that she might have a suitable topic for someone trained in math and programming. I don't know Yoe, so to avoid cold calling, I would ask Vera and Janka for recommendations. ("We have tracked Simo for years and he is what he claims to be. If you have a suitable research topic in bioinformatics, you'd probably benefit a lot from assigning it to Simo.")

Good: Bioinformatics has a reputation as a down-to-earth and useful branch of applied research. I'll learn new things because of the 'bio' part.

Bad: No earlier track record. Not enough background to evaluate whether the research topic would pass the scrutiny of Hamming's advice.

If 6. fails:

7. Ask The Scientist for an applied research topic in model checking based on this post.

Good: I know The Scientist personally. Also, FA has already made me familiar with finite languages and state machines. For example, I implemented deterministic state machine minimization to make some vocabulary state machines faster to handle, while The Scientist works on algorithms for simplifying nondeterministic state machines.

Bad: When studying, we had a course about DisCo and the temporal logic of actions. While the theory part was a fun trip to a different worldview, the DisCo toolset left a really bad taste in my mouth. It had an "ivory tower" feel: it could never become useful for solving practical problems, no matter how well the researchers reached their research goals.

I'm aware that The Scientist doesn't work with DisCo, and it's just my stupidity that I don't understand the field. After all, he does write more often, for more readers, about a wider range of topics, and so on. But that does not change the fact that it would be insane for me to do research in an area where I don't understand the big picture.

Asking The Scientist for a research topic may lead to a very embarrassing situation where I have to say no even if he gives me everything I ask for.

If 7 fails:

8. Write a plan for 2011 which does not include graduate studies.

Thursday, November 04, 2010

Blueprints for a translation sentence website

Summary: The previous post pointed out a huge gap in CALL: there are no tools to practise writing. This post addresses it by proposing a translation sentence website.

Why translation sentences?


When people start writing in a foreign language, they first form sentences in their fluent language and then translate them piece by piece. This component behavior can be conditioned. Conditioning it lowers the barrier to writing, as the student is already fluent in the syntactic structures and only needs to slot in the phrases from his own domain.

Aretae says that constructivism is the most overlooked aspect of teaching:
Take martial arts for a couple years, and watch how much the Sensei DOESN'T explain...but rather makes you practice until you have an internal representation of the system...then adds 3 words to clarify your mistakes.

Translation sentences are constructivist in the sense that if you have problems with syntax, they teach you syntax. If you have problems with prepositions, they teach you their correct use. If you have problems with word inflection, they improve that area.

Checking correctness


Translation sentences have multiple correct answers, and it takes someone who is fluent in the language to classify answers as correct or wrong. Simple string matching is out of the question. Finnish Annotator had an option to answer flashcards by writing, and string matching was insufficient even for single words. In the end, the site normalized away noun and verb articles (a, an, the, to) and used edit distance to ignore typos in known-language words.
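For concreteness, a check along those lines can be sketched like this. This is my reconstruction from the description above, not Finnish Annotator's actual code; the article list and the typo threshold are illustrative.

    ARTICLES = {"a", "an", "the", "to"}   # articles to normalize away

    def normalize(answer):
        """Lowercase the answer, strip punctuation and articles."""
        words = answer.lower().replace(",", " ").replace(".", " ").split()
        return [w for w in words if w not in ARTICLES]

    def edit_distance(a, b):
        """Plain Levenshtein distance between two strings."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            cur = [i]
            for j, cb in enumerate(b, 1):
                cur.append(min(prev[j] + 1,                  # deletion
                               cur[j - 1] + 1,               # insertion
                               prev[j - 1] + (ca != cb)))    # substitution
            prev = cur
        return prev[-1]

    def word_ok(given, expected, max_typos=1):
        """Accept a known-language word despite a small typo."""
        return edit_distance(given, expected) <= max_typos

    # "an aple" is accepted as an answer to a flashcard expecting "apple".
    assert all(word_ok(g, e) for g, e in zip(normalize("an aple"), normalize("apple")))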

First of all, instead of listing complete correct answers, there should be a word-level regular expression notation for denoting the different options. This avoids a combinatorial explosion when, say, one slot has 3 correct phrases and another has 5.

Since many phrases have synonyms, supporting them lets the site make sentence definitions more complete on the first try. For example, the sentences "There is plenty of snow" and "There is a lot of snow" are both correct. This could be denoted with "synonym:'plenty of'".
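One possible encoding of such word-level patterns, with a synonym shorthand, could look like the sketch below. The notation and the synonym table are my own assumptions; the point is only that the matching itself stays cheap even when slots have many alternatives.

    import re

    # Hypothetical synonym table; in practice it would be curated by the editors.
    SYNONYMS = {"plenty of": ["plenty of", "a lot of", "lots of"]}

    def expand_slot(slot):
        """A slot is a literal phrase, a list of alternatives, or
        ("synonym", key), which refers to the synonym table."""
        if isinstance(slot, tuple) and slot[0] == "synonym":
            return SYNONYMS[slot[1]]
        return slot if isinstance(slot, list) else [slot]

    def matches(pattern, answer):
        """Check an answer against a word-level pattern without expanding
        the full combinatorial set of correct sentences."""
        regex = r"\s+".join("(?:%s)" % "|".join(map(re.escape, expand_slot(s)))
                            for s in pattern)
        return re.fullmatch(regex, answer.strip(), flags=re.IGNORECASE) is not None

    pattern = ["there is", ("synonym", "plenty of"), "snow"]
    assert matches(pattern, "There is a lot of snow")
    assert not matches(pattern, "There is little snow")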

There should also be a list of wrong answers, which contain common errors. This enables high-quality constructivist feedback.

Preparing for the unknown


Nobody can guess all the correct and interestingly wrong answers to a translation sentence. The site should deal with this instead of denying it.

First of all, the site needs a social web interface for fine-tuning the sentences. This means adding new acceptable answers and new explanations for wrong answers. The interface is open to authorized users whose language skill has been verified. The site saves all unclassified answers to translation sentences and sorts them by frequency. If many students give a similar wrong answer, an explanation should be added.

This social aspect also means that it is not enough to solve the technical challenge of building the website. Before a single line of code is written, there must be confirmed support from a steering group of CALL researchers. Writing such a site is a big effort and makes the programmer blind to simple but obvious shortcomings in it. The research group would review the site and criticize away such defects. The group would also kickstart sentence writing and introduce the site to students until it reaches a critical mass of sentences, enough to be useful. The same people would add correct and wrong answers.

Automatic ways to prepare for the unknown


Finnish Annotator used edit distance to ignore typos. This can be applied to translation sentences in two ways. Firstly, when edit distance is used to compare words, correct sentence structure can be verified even if single words are mistyped.

Secondly, sentence-level edit distance can point out added or missing words after identifying the closest correct or wrong answer. Missing words are always errors, since problem authors should mark optional words as optional.

Added words may be errors, but they may also just be a symptom that the sentence is new and not enough variations have been added yet. The site should prepare for this by keeping a list of words which are usually innocent when added. This list may need context restrictions, so that certain added words are known to be innocent only in certain contexts.

Replaced words benefit from similarity comparison. The error is likely to be small if the word is a verb in both the correct answer and the student's answer, especially if the verb is inflected in the same way.
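A word-level diff already produces the added/missing/replaced classification described above, and the typo tolerance from the earlier sketch can be layered on top of it. The sketch below uses Python's difflib on word lists; the list of innocent additions is an invented placeholder.

    import difflib

    # Hypothetical whitelist of words that are usually harmless when added.
    INNOCENT_ADDITIONS = {"well", "then"}

    def classify_diff(student_words, answer_words):
        """Align the student's sentence against the closest known answer and
        report missing, replaced and (possibly innocent) added words."""
        report = []
        matcher = difflib.SequenceMatcher(None, answer_words, student_words)
        for op, a1, a2, b1, b2 in matcher.get_opcodes():
            if op == "delete":                 # word missing from the student's answer
                report.append(("missing", answer_words[a1:a2]))
            elif op == "replace":              # word substituted
                report.append(("replaced", answer_words[a1:a2], student_words[b1:b2]))
            elif op == "insert":               # extra words added by the student
                extra = student_words[b1:b2]
                kind = "innocent" if all(w in INNOCENT_ADDITIONS for w in extra) else "added"
                report.append((kind, extra))
        return report

    answer = "if it rains tomorrow the party is held inside".split()
    student = "if it rains the party is held well inside".split()
    print(classify_diff(student, answer))
    # [('missing', ['tomorrow']), ('innocent', ['well'])]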

In 2006 I browsed some books about syntactic parsing, where a sentence is converted into a parse tree. The methods looked very difficult to implement, because syntax is much more complex than word inflection. For example, the following sentences have the same meaning, but sentence-level parsing is necessary to identify this automatically: "If it rains tomorrow, the party is held inside." and "The party is held inside if it rains tomorrow." Sentence parsing is also useful for rating wrong but unclassified answers: answers which can be parsed into a parse tree are less wrong than sentences which violate syntax.

What kinds of sentences to train?


Simple sentences which deal with one topic at a time. The topic may be a syntactic structure, a preposition, a time phrase, etc. Simplicity leaves less room for unexpected variation and therefore gives better feedback.

Prior art and differences to flashcard programs


I'm not aware of any significant academic prior art since PLATO, a groundbreaking CALL system from the 70s. Unfortunately, the details of PLATO's translation sentence engine are not available. Their technology may be obsolete, but the people who built it faced the same challenges we face today. They were not stupid, and most importantly they established a feedback loop where they improved based on experience, while I'm just speculating theoretically.

The main difference to flashcard programs is that a single translation sentence is "difficult and slow" while a single flashcard is "easy and fast". Therefore Goproblems.com, a site for training tesuji skills in the game of Go, is better prior art than Anki. For goproblems.com you also hear anecdotes of people voluntarily grinding away at it for hours, stopping only when the remaining problems are too easy or too difficult.

Goproblems.com has an automatic rating system for both users and problems. Problems are rated based on how many people get them right, and users are rated based on how many problems they get right. The rating scale is given in kyu/dan levels, so that problems and users end up with comparable ratings.

Item response theory provides the mathematical formulas for implementing such ratings, although I don't know which exact method goproblems.com uses. The previous sections have dealt with methods for distinguishing grave errors from small typos and from errors in the problems themselves. A rating system based on item response theory benefits from having more information than just pass or fail.
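As a concrete illustration, the simplest item response model (the one-parameter Rasch model) predicts the probability of a correct answer from the difference between user ability and problem difficulty, and both ratings can then be nudged toward the observed result. The Elo-style update below is something I chose for brevity, not the method goproblems.com uses; the partial score is where the "mere typo" information would enter.

    import math

    def p_correct(ability, difficulty):
        """Rasch (one-parameter) model: probability of a correct answer."""
        return 1.0 / (1.0 + math.exp(difficulty - ability))

    def update(ability, difficulty, score, k=0.1):
        """Elo-style online update toward the observed score.

        score is 1.0 for a clean pass, 0.0 for a fail, or something in
        between, e.g. 0.8 when the only error was a typo."""
        surprise = score - p_correct(ability, difficulty)
        return ability + k * surprise, difficulty - k * surprise

    ability, difficulty = 0.0, 0.5            # arbitrary starting ratings
    ability, difficulty = update(ability, difficulty, score=0.8)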

Chatbots and communicative language teaching


How could each translation sentence also be meaningful communication? One way is to give the student a communicative task, for example "order a flight ticket to Melbourne", and to write a chatbot to hold the other end of the conversation.

This would explode the number of correct reactions. The user could start with a greeting. An order for a "ticket to Melbourne" could be formulated in tens of different ways.

The details about the time window, passenger class, etc. could be given right away, or the chatbot would have to ask for them. It is no longer enough to just parse inflection and syntax; we need to pay attention to meaning. A hard-coded ontology would be needed for each chatbot, as the sketch below illustrates.
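To make the scale of the problem concrete, even the crudest ticket-ordering bot needs a hand-written ontology of slots and phrasings. The keyword-matching toy below is my own illustration, not a proposal; it shows why each new task means new hard-coded rules.

    import re

    # Hand-coded "ontology" for one single task: ordering a flight ticket.
    SLOTS = {
        "destination": r"\bto\s+(\w+)",
        "travel_class": r"\b(economy|business|first)\b",
        "date": r"\b(today|tomorrow|on \w+ \d+)\b",
    }
    PROMPTS = {
        "destination": "Where would you like to fly?",
        "travel_class": "Which class would you like to travel in?",
        "date": "When would you like to travel?",
    }

    def respond(utterance, filled):
        """Fill any slots found in the utterance, then ask for the next missing one."""
        for slot, pattern in SLOTS.items():
            match = re.search(pattern, utterance, flags=re.IGNORECASE)
            if match:
                filled[slot] = match.group(1)
        for slot in SLOTS:
            if slot not in filled:
                return PROMPTS[slot]
        return "Booking a %(travel_class)s ticket to %(destination)s %(date)s." % filled

    filled = {}
    print(respond("Hello, I'd like a ticket to Melbourne", filled))
    # -> "Which class would you like to travel in?"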

In the last post I said that CLT is nice to have but comes with a hefty price tag. Chatbots are a good example: each one would take a long time to write, their feedback would be of inferior quality compared to translation sentences, and most importantly they would not scale to cover large amounts of material.

They would not be enough to train students to write.