Friday, March 16, 2007

Real Time Search

When I was reading feeds in my reader, I found that they were not quite up to date. There might be two reasons: either the reader had not fetched the latest feed, or the feed itself had not been updated.

As information explodes, a huge number of bytes are generated day by day, hour by hour. With newspapers, we know what happened yesterday; with TV, we know what happened today; with the Internet, we can know what happened just now. However, these are all limited to events of public concern, which are not necessarily what we are most interested in.

When search engines came along, we became able to find what we wanted by typing a few keywords. But when we search, we get what was, not what is: the information we search through was crawled by robots days or even months ago.

One approach to the time-critical problem is subscription and notification. In this way, every subscriber is notified whenever an event happens. It is better than the "publish and wait until someone sees it" method. But there are still problems. One is that the subscription and notification server can become a bottleneck. Another is that we cannot know what to subscribe to before it happens. To solve the former, more powerful hardware or smarter software running on cooperating computers might do. For the latter, we could only believe in a wizard.
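As a rough illustration of subscription and notification, here is a minimal in-process sketch. The class name NotificationServer and the topic strings are hypothetical, chosen only for this example; a real service would run over a network. It shows both problems mentioned above: every delivery passes through one loop on one server, and an event on a topic nobody subscribed to in advance reaches no one.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class NotificationServer:
    """Toy broker (hypothetical): subscribers register a callback per topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        # A subscriber must name the topic before any event on it happens.
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, event: str) -> int:
        # Every delivery goes through this single loop, which is why one
        # server becomes a bottleneck as the number of subscribers grows.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)  # number of subscribers notified


server = NotificationServer()
inbox: List[str] = []
server.subscribe("feeds/example", inbox.append)

delivered = server.publish("feeds/example", "new post")
# Nobody knew to subscribe to this topic, so the event is seen by no one.
missed = server.publish("feeds/unknown", "something unexpected")
```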

So here comes the need for real-time search.

Wednesday, March 14, 2007

Recovery Oriented Computing

David Patterson of UC Berkeley introduced Recovery Oriented Computing. The article "Recovery-Oriented Computing (ROC): Motivation, Definition, Techniques, and Case Studies" presents an undoable email system. This reminds me of Gmail, which is the only email service I know of that provides an 'undo' function.

Recovery Oriented Computing is not a new idea. Things like checkpoints do much the same. Reliability, or dependability, is quite important on high performance computers. However, it is hard to implement a fully reliable system that also meets the performance requirements. Checkpointing and rollback is then an option. Recovery Oriented Computing shows why recovery is necessary and how to implement it.
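To make checkpointing and rollback concrete, here is a minimal sketch. It is not the system from the ROC article; the class name CheckpointedState and the mailbox data are assumptions made up for this example. The idea is simply to save a copy of the state before a risky operation and restore it if the operation turns out to be a mistake.

```python
import copy
from typing import Any, List


class CheckpointedState:
    """Toy example (hypothetical): keeps snapshots of a state for rollback."""

    def __init__(self, state: Any) -> None:
        self.state = state
        self._checkpoints: List[Any] = []

    def checkpoint(self) -> None:
        # Deep copy, so later in-place changes do not alter the snapshot.
        self._checkpoints.append(copy.deepcopy(self.state))

    def rollback(self) -> None:
        # Restore the most recent snapshot and discard it.
        if not self._checkpoints:
            raise RuntimeError("no checkpoint to roll back to")
        self.state = self._checkpoints.pop()


mailbox = CheckpointedState({"inbox": ["hello"], "trash": []})

mailbox.checkpoint()
# The "mistake": move the only message to the trash.
mailbox.state["trash"].append(mailbox.state["inbox"].pop())
after_delete = copy.deepcopy(mailbox.state)

# The "undo": go back to the state saved before the mistake.
mailbox.rollback()
```

The cost is visible even in this toy: every checkpoint copies the whole state, which is the kind of overhead that competes with performance requirements.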

On the other hand, Gmail is a product built on new technology. Many big ideas, like tagging, searching instead of sorting, conversation view, integration with IM, and AJAX, were added to improve performance and flexibility and to change the way email is used. Compared to the old plain-text email systems, which I believe are still used on many Unix systems, Gmail has countless advantages.

Either you do things ahead of others, or you do them faster. Theories are often ahead of their time. They lead the technology. Theories, especially in engineering, require the support of experiments. Such experiments might be very difficult or costly, so that support matters. Technology makes use of theory. The sooner you make use of advanced technology, the more you will benefit. Racers who run faster reach the finish line earlier than slower ones who started at the same line.

Wednesday, March 7, 2007

Distributed Hash Table

Distributed hash tables, or DHTs for short, have become more and more popular in P2P networks.

Refer to