Monday, December 24, 2007

Merry Christmas & Happy New Year

Merry Christmas and Happy New Year to all readers, if any, no matter whether it is winter or summer where you are. :D

Wednesday, December 19, 2007

iocharset, deprecated option but might be of use

Thanks to the various encoding systems that make the world diverse.

First of all, here is an extract from the mount manual page on Debian Linux:

Mount options for ntfs
iocharset=name
    Character set to use when returning file names. Unlike VFAT,
    NTFS suppresses names that contain unconvertible characters.

nls=name
    New name for the option earlier called iocharset.

utf8 Use UTF-8 for converting file names.

Typically, you don't need the iocharset option when you are using a UTF-8 locale. But when you have to transfer some awkwardly named files to another system that has no UTF-8 support (or where you can't turn it on, as on Windows), the iocharset option turns out to be useful.

Recently I had to transfer files from an old PC to a new laptop. The files are on an NTFS partition, but for some reason the drive is used as a secondary HDD on a running Debian Linux machine, which means I cannot unplug it or reboot into Windows. The new laptop runs Windows. The solution is easy: mount the NTFS partition and transfer the files using scp (an scp client has to be installed on the laptop). As my Windows uses codepage 936 (GBK) by default, I mounted the partition as follows
mount -t ntfs -o iocharset=cp936,umask=022 /dev/hdd1 /mnt/ntfspart
and that is all. The command (pscp is an scp client from PuTTY)
pscp -ls someone@someaddr:/some/dir
returned well-formed results, which made me happy.

Note that you might mess up the file names on your NTFS partition when writing data using an incorrect codepage. Please make sure you know what you are doing before striking the enter/return key.
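
To see why the codepage matters, here is a minimal Python sketch. It is not part of the setup above, and the sample file name is an assumption chosen only for illustration. It shows that the same name becomes different bytes under cp936 and UTF-8, and that reading bytes with the wrong charset gives a garbled name.

```python
# Illustrative only: the file name below is a made-up example.
name = "中文.txt"

gbk_bytes = name.encode("cp936")   # what a GBK (codepage 936) system expects
utf8_bytes = name.encode("utf-8")  # what a UTF-8 locale would produce

print(gbk_bytes)
print(utf8_bytes)

# Decoding with the matching charset restores the name.
restored = gbk_bytes.decode("cp936")
print(restored == name)

# Decoding UTF-8 bytes as cp936 yields a different, garbled string.
garbled = utf8_bytes.decode("cp936", errors="replace")
print(garbled)
```

This is roughly what goes wrong when a partition is mounted with a charset that does not match what the receiving side expects.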

Friday, December 7, 2007

Separation of Users and Applications

Service - Separation of Users and Applications

In the distant past, we bought a new computer, installed the operating system and applications, and then started using it.

More recently, we buy computers with bundled applications, many of which are unnecessary or unwelcome, so we uninstall them, install what we really need, and only then start using the machine.

In the future, can we just buy a new computer and start using it right away?

Like it or not, software evolves fast, no slower than CPU power doubles. New features arrive every day, every hour, every minute. However, customers cannot afford the cost of upgrading so often. This is especially true when there is a huge number of users, or when the product is for the general public. Instead of leaving users to deal with complex and boring installations and upgrades, why not let talented computer engineers handle them more efficiently?

The Internet and web services provide a possible way to separate customers from software applications. However, it does not matter whether an application is web-based or not, and it is unlikely that all applications will be web-based in the future. Hardware can only be replaced manually, whereas software, operating systems included, can be upgraded 'forever' over a network connection. Theoretically, a running process can even update itself.

Up to now, no OS can upgrade itself automatically without causing trouble and without human intervention. Debian and Ubuntu are so far the best, and with them you can upgrade your software with ease most of the time, but they are still far from perfect.

Compared to web-based applications, operating systems are far more complicated. One reason is that no matter how complex web pages are, there is only one way to interact with them, i.e. through a browser, while an operating system has to deal with all kinds of devices as well as support various applications. Another reason is that there are established standards for web pages and web services, which keeps things simple and coherent.

Thursday, October 25, 2007

Possible Future of Google Gears

I have not used Google Gears so far, so what I say here might be incorrect.

Google Gears caches information or data on your hard disk so that it can be accessed offline. Not only web pages but also much application data can be cached. It behaves like a proxy server, but one that is local and more powerful.

The difference between online and offline might become indistinguishable, or unimportant. One might continue to work without a network for a certain amount of time. An ordinary web visitor does not send requests every second; instead, it might be several minutes before he or she jumps to the next page. Some types of services require continuous interaction, but many more, like archives or documents, could be viewed offline.

It might even be possible to make use of other people's caches, for example when the server is down while you are consulting a product manual on that website. This would work like a P2P network proxy. There are problems such as privacy, consistency, and reliability, though. Currently, the functionality of Google's cache is quite limited.

Tuesday, October 23, 2007

More about is based on Google Apps. The services used include mail, calendar, documents and Blogger.

Most settings are user-friendly on the dashboard. However, Blogger requires a separate setup. Go to, and create a new blog and navigate to the publishing tab in settings. Follow the instructions there. (Maybe this could be a feature in Google Apps in the future. ;))

Also, it is better to have fixed URLs (instead of redirection) for the mail, calendar, and document services.

Monday, September 10, 2007

Old posts imported

The old posts have now been imported.

There are 3 ways to get notified:
  1. Visit
  2. Feed (atom):
  3. Get mailed: request to join a secret list. ;)
Also, there are 3 ways to post:
  1. Log in to the web pages
  2. Through APIs
  3. Send mails to a secret mail box.

Wednesday, September 5, 2007


If you visited, you might know that many of Google's services come with a web API. With these APIs, one could 'easily' develop a desktop application or a website based on Google's existing services. Thus, Google acts like a web service provider and developers make use of these powerful services.

The web-service-based programming model might replace OS-oriented programming in the near future. Browsers may not replace all applications. However, the operating system will become less and less important as services become independent of OSes.

Wordpress to Blogger

I wrote a small Java program to import posts from Wordpress to Blogger using the Blogger API. The program now seems to work, but due to the "Word Verification" (captcha) problem, the migration could not be done automatically within one day.

It seems that the Blogger API does not currently support per-post captchas, and the limit of 50 posts per day seems a little tight.

When posting with emails, I got the following message:
You have exceeded the the allowable number of posts without solving a captcha.

Thursday, April 26, 2007

Carrot2 and meX Search

Carrot2 is an open source framework for building search clustering engines.

meX Search is an application of Carrot2. meX provides a new look and feel for search engines.

Compared to traditional search engines, meX is visual. meX starts from a search box as usual. However, results are categorized and presented to users as a graph: bubbles connected to the starting search box. Each bubble represents one category and may contain one or more results with hyperlinks. Once a category is clicked, a new search starts with the name of the category as the keyword.

meX uses Flash to present the results, which means that a plugin is necessary. This should not be a problem (unless you are a member of ...) as Flash is widely used on the Internet. By the way, Wallop is another good example of a Flash application.

Friday, April 20, 2007


j down
k up
l right
h left
i insert
a append
w next word
b previous word
ZZ save and quit
g go
gg go to the very beginning
G go to the end
/ search forward
? search backward
o open a new line below
dd delete a line
^ first non-blank character of the line
0 start of the line
$ end of the line
\ word interval
u undo


Thursday, April 12, 2007


What will this web page look like in ten years?

Windows User Tips

  1. Backup! Backup! Backup!

  2. Upgrade to Windows Vista as soon as possible.

  3. Do not log in as an administrator.

Thursday, April 5, 2007

Language Recognition

Here, language recognition refers to distinguishing English from Spanish, etc.

Recognizing languages does not seem to be a big deal, so why do so many translation tools ask users to select the languages?

Online translation websites like Babel Fish and Google Translate both have a language selection box.

One solution, which is quite straightforward, is to recognize the words or characters of different languages. Words that occur with high frequency are very good clues. Differences in encoding also help to recognize languages.
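
The high-frequency-word idea above can be sketched in a few lines of Python. This is only a rough illustration: the word lists are tiny, hand-picked assumptions, and the function name guess_language is made up for this example.

```python
# Tiny, hand-picked lists of very frequent words; a real tool would use far larger lists.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "spanish": {"el", "la", "y", "es", "de", "en", "que", "los"},
}

def guess_language(text):
    """Return the language whose frequent words appear most often, or None."""
    words = text.lower().split()
    scores = {
        lang: sum(1 for w in words if w in common)
        for lang, common in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(guess_language("The cat is in the garden and it is happy"))
print(guess_language("El gato es de la casa y los perros en el campo"))
```

Short inputs, or inputs made up of words shared between languages, would easily fool such a counter, which may be one reason the selection box is still there.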

Friday, March 16, 2007

Real Time Search

When I was reading feeds in my reader, I found that they were not quite up to date. There might be two reasons: either the reader did not fetch the latest feed, or the feed itself had not been updated.

As information explodes, a huge number of bytes are generated day by day, hour by hour. With newspapers, we know what happened yesterday; with TV, we know what happened today; with the Internet, we can know what happened just now. However, all of these are limited to events of public concern, which are not necessarily what we are most interested in.

When search engines came along, we became able to find what we want by typing some keywords. But what a search returns is what used to be: the information we search was crawled by robots days or even months ago.

One approach to the time-critical problem is subscription and notification: every subscriber is notified whenever an event happens. It is better than the "publish and wait till someone sees it" method, but there are still problems. One is that the subscription and notification server can become a bottleneck. Another is that we cannot know what to subscribe to before it happens. The former might be solved with more powerful hardware or smarter software on cooperating computers. For the latter, we can only hope that there is a wizard.

So this is why real time search needs to exist.
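
As a rough illustration of the subscription and notification approach discussed above, here is a minimal in-memory Python sketch. The class name NotificationServer and the topic names are assumptions made for this example, not any real service.

```python
from collections import defaultdict

class NotificationServer:
    """In-memory stand-in for a subscription and notification server."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, event):
        # Every subscriber of the topic is notified as soon as the event happens.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)  # number of subscribers that were notified

inbox = []
server = NotificationServer()
server.subscribe("feeds/example", inbox.append)
delivered = server.publish("feeds/example", "new post")
missed = server.publish("feeds/unknown", "nobody subscribed to this")
print(inbox, delivered, missed)
```

The second publish call reaches no one, which is the "we cannot know what to subscribe to" problem in miniature. All traffic also passes through the single server object, which hints at the bottleneck.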

Wednesday, March 14, 2007

Recovery Oriented Computing

Mr. Patterson of UC Berkeley introduced Recovery Oriented Computing. In the article Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies, an undoable email system was presented. This reminds me of Gmail, the only email service I know of that provides an 'undo' function.

Recovery Oriented Computing is not an entirely new idea; techniques like checkpointing do much the same. Reliability, or dependability, is very important on high performance computers. However, it is hard to implement a fully reliable system that also meets the performance requirements, so checkpointing and rollback becomes an option. Recovery Oriented Computing shows why recovery is necessary and how to implement it.
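
To make the checkpoint and rollback idea concrete, here is a small Python sketch. It is a toy under stated assumptions: the class name CheckpointedState is invented for this example, and the state is a plain dictionary held in memory, whereas real systems checkpoint to stable storage.

```python
import copy

class CheckpointedState:
    """Keeps snapshots of a state dict so that a bad update can be undone."""

    def __init__(self, state):
        self.state = state
        self._checkpoints = []

    def checkpoint(self):
        # Deep copy so that later changes do not leak into the saved snapshot.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self):
        # Restore the most recent snapshot.
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = self._checkpoints.pop()

job = CheckpointedState({"step": 0, "results": []})
job.checkpoint()
job.state["step"] = 1
job.state["results"].append("partial")
job.rollback()  # something went wrong: return to the saved state
print(job.state)
```

The trade-off is visible even here: each checkpoint costs a full copy of the state, which is the performance price paid for being able to undo.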

On the other hand, Gmail is a product based on new technology. Many big ideas, such as tagging, searching instead of sorting, dialogs, integration with IM, and AJAX, were added to improve performance and flexibility and to change the way email is used. Compared to the old plain-text email systems, which I believe are still used on many Unix systems, Gmail has countless advantages.

Either you do things ahead of others, or you do them faster. Theories are often ahead of their time, and they lead technology. Theories, especially in engineering, require the support of experiments, which might be very difficult or costly, so support for them matters. Technology makes use of theory. The sooner you make use of advanced technology, the more you will benefit: of racers starting from the same line, the faster ones reach the finish earlier than the slower ones.

Wednesday, March 7, 2007

Distributed Hash Table

The distributed hash table, or DHT for short, has become more and more popular in P2P networks.

Refer to
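
For a feel of how a DHT decides which peer stores a key, here is a single-process Python sketch of consistent hashing, one common building block. The node names, the class name HashRing, and the choice of SHA-1 are all assumptions made for this example. Real DHTs also route lookups between peers, which is not shown.

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string to a position on the ring (SHA-1 is an arbitrary choice here)."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Sorted (position, node) pairs form the ring.
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key):
        """Return the first node at or after the key's position, wrapping around."""
        positions = [pos for pos, _ in self._ring]
        index = bisect.bisect_left(positions, ring_position(key))
        return self._ring[index % len(self._ring)][1]

ring = HashRing(["node-a", "node-b", "node-c"])
for key in ["alpha", "beta", "gamma"]:
    print(key, "->", ring.node_for(key))
```

A useful property of this layout is that when one node leaves, only the keys it owned move to another node; all other keys stay where they were.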

Friday, January 19, 2007

Event Listener, a Java Pattern

The book, Java Threads, describes 'event listener' as a standard Java pattern. Has anyone heard of it?

Tuesday, January 16, 2007

Web Service Containers

In web services, request and response are the common pattern. However, a stateless web service (should it actually be called static web pages?) cannot provide even something as simple as the visit counter that most websites have (or hide, for whatever reason). You might be familiar with dynamic pages written in PHP, ASP, etc. They are just the front end of a stateful web service or, to be more precise, connectors: they connect customers' requests to containers.
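
The difference between a stateless handler and a stateful service with a counter can be sketched as follows. This is plain Python without any real web server, and the names stateless_handler and CounterService are made up for illustration.

```python
def stateless_handler(request):
    """The response depends only on the request itself."""
    return "Hello, " + request["name"]

class CounterService:
    """A stateful service: it remembers how many requests it has seen."""

    def __init__(self):
        self.visits = 0  # state kept between requests

    def handle(self, request):
        self.visits += 1
        return "Hello, %s. You are visitor number %d." % (request["name"], self.visits)

service = CounterService()
print(stateless_handler({"name": "guest"}))
print(service.handle({"name": "guest"}))
print(service.handle({"name": "guest"}))
```

The same request always gets the same answer from the stateless handler, while the answers from the stateful service change because something has to hold the counter between requests.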

What is the relationship among containers, resources and services?

A service provides an entry point to certain resources, and services and resources all live in containers. A typical container is Tomcat. Running it standalone makes installation easier but blurs the terms: a standalone Tomcat seems to be more than just a container, since it provides at least what a typical web server provides.

The Apache web server (Apache httpd) accepts modules such as the PHP module. Could the PHP module then be called a container? The purpose of PHP is to provide a flexible way to generate dynamic web pages.