Sunday, December 21, 2008

Talk to foreigners (even aliens) in your language seamlessly

Proposal
Build an automatic translation mechanism into instant messaging systems (and other Internet communication systems), as if a simultaneous interpreter were there.
It would be impressive to master three languages, but nobody can master them all. Even if an interpreter who could were available, hiring him/her would be expensive. The Internet lets people communicate easily from every corner of the world, and if you were equally likely to talk to anyone on earth, the other person would most likely speak a foreign language. With an automatic translation system standing by like a simultaneous interpreter, negotiation should be much simpler.

The problem at present is that a machine cannot translate as well as an average interpreter. Languages do not map one-to-one, and machines are not good at analyzing and choosing which words to use and in what order. This is why interpreting is still a career.

Whenever a computer is not good at something, we can try to train it in the hope that it will eventually learn and master it. The training material would come from thousands of millions of people, just as in image recognition, where tags from Internet users help a lot in classifying and recognizing pictures. If, one day, computers could seamlessly translate one language into another, we would be happy that we no longer needed to learn a second language, and equally sad that we would never again be driven to learn any language other than our native one.

Currently, machine translation in an instant messaging system would be no more than experimental, but at least it could provide some information that might be helpful to international talkers. And we could have fun laughing at the silly computers.

Wednesday, December 17, 2008

File names you can NOT create on Windows

Try to create a file named NUL.txt on Windows XP, and you will probably get an error message.

Many characters are not allowed in file names on Windows. These include
< > : " / \ | ? *

Besides, the reserved device names should be avoided. These include
NUL, CON, PRN, AUX, COM1, COM2, ..., LPT1, LPT2, ...

Also, using a device name unintentionally might introduce a security vulnerability.
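
The two lists above can be turned into a quick check before a program creates a file. The following is only a minimal sketch: the function name is hypothetical, and the numeric ranges COM1-COM9 and LPT1-LPT9 are an assumption taken from Microsoft's file naming documentation, since the list above is abbreviated.

```python
import re

# Characters from the list above that Windows rejects in file names.
FORBIDDEN_CHARS = set('<>:"/\\|?*')

# Reserved device names; the 1-9 ranges are an assumption (the post abbreviates them).
RESERVED_NAME = re.compile(r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])", re.IGNORECASE)


def is_safe_windows_name(name):
    """Return True if name avoids the forbidden characters and reserved device names."""
    if not name or any(ch in FORBIDDEN_CHARS for ch in name):
        return False
    # NUL.txt is rejected as well, so compare only the part before the first dot.
    stem = name.split(".", 1)[0].rstrip(" ")
    return RESERVED_NAME.fullmatch(stem) is None


for candidate in ["notes.txt", "NUL.txt", "what?.txt", "com1"]:
    print(candidate, is_safe_windows_name(candidate))
```

This covers only the two rules mentioned in this post. Windows has further restrictions (control characters and trailing dots, for example) that the sketch does not check.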

Saturday, December 13, 2008

Dimensional Modeling vs Interval Tree

The concept of Dimensional Modeling resembles the Interval Tree (Segment Tree). Both speed up queries by storing aggregated information at different levels. The difference is that in Dimensional Modeling the levels are defined by users, while in an Interval Tree the levels are usually defined by a complete binary tree. An Interval Tree usually solves problems in one dimension, but it can also be applied to 2D problems. One dimension is not a limit.
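
To make "storing information at different levels" concrete, here is a small sketch of a sum segment tree. The class name and the sample data are made up for illustration. Each inner node holds the precomputed sum of its two children, much like a coarser level of a dimension holds totals of the finer level below it.

```python
class SegmentTree:
    """Sum segment tree over a fixed list, stored as a binary tree in an array."""

    def __init__(self, values):
        self.n = len(values)
        self.tree = [0] * (2 * self.n)
        self.tree[self.n:] = values              # leaves: the raw data
        for i in range(self.n - 1, 0, -1):       # inner nodes: precomputed sums
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def query(self, lo, hi):
        """Return sum(values[lo:hi]) by combining O(log n) precomputed nodes."""
        total = 0
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:               # lo is a right child: take it and step right
                total += self.tree[lo]
                lo += 1
            if hi & 1:               # hi is exclusive: step left and take that node
                hi -= 1
                total += self.tree[hi]
            lo //= 2                 # move both borders up one level
            hi //= 2
        return total


daily_sales = [5, 3, 8, 6, 2, 7, 4, 1]   # hypothetical data
tree = SegmentTree(daily_sales)
print(tree.query(2, 6))  # 8 + 6 + 2 + 7 = 23
```

The query touches at most two nodes per level instead of every leaf in the range, which is the same trade of storage for query speed that pre-aggregated levels give in Dimensional Modeling.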

Sunday, November 23, 2008

Quotes

Quoted
Quotes are widely used in scripting languages, including (but not limited to) Bash, SQL, Python, Tcl, Perl, PHP, and JavaScript. However, there is no universal rule for quotes. Most languages support the single quote (') or the double quote ("), while others have their own quotes, such as triple quotes (''' or """) or braces ({, }). When one has to code in a combination of different languages, the quotes will definitely confuse programmers, not to mention variable substitution, regular expressions, etc.

Thursday, November 13, 2008

Different outputs in Tcl - Stdout vs Return Value

Both set a {hi} and puts "hi" echo hi in an interactive Tcl shell, so what is the difference between them?

Clarification on "output"
1. standard output
The puts command writes to stdout by default.
% puts "hello"
hello
2. return values
set hi {hello} returns the value hello, which the interactive shell echoes back to the user.
% set hi {hello}
hello

To make things clear, look at this example.
% set ret [puts "hello"]
hello
% puts $ret
(blank)

So the return value of puts is an empty string.

Compound statements
Let's move on to compound statements.
% set a "hello"; set b "hello2"
hello2
This gives a single line of output from `set b`, and a and b are both set.
% puts $a
hello
% puts $b
hello2

Here are two lines of output on stdout.
% puts "hello"; puts "hello2"
hello
hello2

Importance
Why is this important? Suppose we would like to store a calculated value in variable a.
% set a [expr 3+3]
6
% puts $a
6
This is what we expected. However, suppose we wrote another version that prints the result to stdout instead of returning it:
% set a [set b [expr 3+3]; puts $b]
6
% puts $a
(blank)
We get nothing.

Generally speaking, it is not a good idea to print to the screen from many functions, because this makes testing difficult, if not impossible. Usually, an assert compares the returned result with the expected result, while anything written to standard output is ignored. Despite all that, we can still test functions that write to stdout. The idea is to trap stdout. Such an assert for stdout would look like this pseudocode:
procedure assertStdout (expected, command) {
  create out for this instance
  execute command > out
  flush out
  return expected == out
}
One caveat: if multiple assertStdout procedures run at the same time, each must use its own distinct 'out' in order to remain thread-safe.
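
For comparison, the same idea can be sketched in Python with the standard library's contextlib.redirect_stdout. The names assert_stdout and noisy_add are hypothetical, chosen only to mirror the pseudocode and the Tcl example above.

```python
import contextlib
import io


def assert_stdout(expected, func, *args, **kwargs):
    """Run func and report whether the text it printed equals expected."""
    out = io.StringIO()                      # a fresh buffer for this call only
    with contextlib.redirect_stdout(out):    # trap everything written to stdout
        func(*args, **kwargs)
    return out.getvalue() == expected


def noisy_add(a, b):
    # Prints its result instead of returning it, like the second Tcl example.
    print(a + b)


print(assert_stdout("6\n", noisy_add, 3, 3))
```

Each call gets its own buffer, but redirect_stdout swaps the process-wide sys.stdout, so this sketch is not safe when several threads print at the same time.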

The echo-back of Tcl can be useful, but it is easily confused with output printed from within procedures, and a programmer must be able to tell the two apart. Other scripting languages, such as shell, have no such echo-back, but that doesn't make it OK to print from many functions there either. Especially for complicated software that requires thorough testing, writing to standard output or to files all over the place would definitely hamper testing.

Monday, September 29, 2008

Snapshot in UI Testing

If you have experience with JUnit or other testing tools, you probably know the importance of the setup and teardown methods. These two methods help build a testing environment. However, UI testing is a different story. UI tests take a long time, are boring, and fail with random exceptions. What's more, every time an error occurs, whether caused by a typo in the test or by the code under test, the only solution is to start over from the very beginning. This is boring!

If we could take a snapshot at certain points (called save points), we could easily restart from those points. This would save a lot of time otherwise wasted in setting up the environment again. What's more, snapshots help reproduce random errors. If a snapshot were taken just before an error occurred, the error would probably show up again when the test is rerun from that snapshot, which makes it easier to track down and so helps reduce the number of bugs.

It is not too hard to implement such a system, though there are various ways to do it. It can be implemented as a feature of the testing tool, or it can be built on the underlying system. One good candidate is the virtual machine. Most VMMs, including open source ones, offer a snapshot function. Taking a snapshot takes seconds to minutes, depending on the VMM implementation and the size of the virtual machine. We can take multiple snapshots, roll back to a particular one, and even make branches. Although relying on the VMM alone might seem easier, it is harder, but necessary, to integrate the snapshot into the testing tools. Taking a snapshot should be as simple as setting a breakpoint in a debugger.

Sunday, August 31, 2008

Expecting analytical tools for individuals

Analytical tools for websites are everywhere. However, analytical tools for individuals are rare. Google Web History might be one, but it has far fewer features than Google Analytics.

One may wonder why we would need an analytical tool for individuals. The reason is that tracing one's own activities is interesting (as long as it's not privacy related). Just as analytics helps a website, an analytical tool for individuals would benefit the user and could even offer suggestions on time management. Such a tool has to be
  1. automatic; users won't log each activity in a document manually.
  2. complete; a partial result is of little use.
  3. precise; an imprecise log will lead to faulty results.
Of these characteristics, automatic is the most important. Users are usually unwilling to log their activities manually, and they cannot log each activity precisely anyway. However, logging is a required step in time management; without it, analysis would be impossible.

It would be difficult to implement such an analytical tool, especially for fine-grained user activities. There are different operating systems, browsers, etc. Besides, there are hundreds of thousands of activities outside the computer world.

Saturday, July 26, 2008

Seamless mode of virtual machines could be improved.

Thanks to Windows, we are so used to Ctrl+Tab. (What? You are using a device called mouse?)

A virtual machine has advantages over a host machine, though usually because of the user's needs rather than the OS itself. Consider the case where you use Linux as your host OS but would also like the benefits of Windows, and besides, you have a licensed copy. You might be using VirtualBox or a VMware product. Both have a seamless mode that integrates the guest OS into your host OS. This blurs the edge between host and guest, and you might not notice that the applications are running on different machines. I can watch a video in Windows Media Player and update my Ubuntu at the same time.

All seems good until you are a keyboard-only guru. 'Ctrl+Tab' simply won't work when you want to switch between host and guest. Once the focus is captured by the guest, it is not released until the 'host key' is pressed. This is not an ideal 'seamless' mode, in which even gurus would not notice the switch from guest to host.

There are two possible directions for the seamless mode of virtual machines on Linux. One is to make use of workspaces. A workspace is like a separate root window, and the number of workspaces can be changed by the user. A virtual machine could use one entire workspace for the guest OS, which is especially convenient if the guest is a single-workspace operating system. The other is to make guest applications look more like host applications, similar to Wine. However, one might still want to keep the original theme of the guest OS.

A 'seamless mode' is a very good starting point for an improved user experience.
A screen shot of seamless mode of VirtualBox.

Thursday, July 24, 2008

What next? Cloud computing, mashup, or app engine.

Cloud computing, Mashup, and App engine.

It might be interesting to build a 'cloud' instead of building an OS. As time goes by, a TV set is no longer an expensive possession. Although a computer (the hardware plus the many bundled software packages) is not yet as cheap as a TV set, it certainly will be in the near future. Will these clouds eventually become the OSes we are used to now? They might. It was once the era of hardware, followed by the era of software. Now it is the era of the Internet and services.

Thursday, April 17, 2008

os.system() and its return value

When os.system('exit 1') returns 256, you might wonder what is going on with Python. Why isn't it simply 1 (one)? Let's take a look at this function.

The system() function is used frequently when you are interacting with the system or with other scripts.

Trying help(os.system) shows only:
system(command) -> exit_status

Execute the command (a string) in a subshell.

It says nothing about how the exit status is encoded.

This function appears in the tutorial as an example and in library reference:

system(command)
Execute the command (a string) in a subshell. This is implemented by calling the Standard C function system(), and has the same limitations. Changes to posix.environ, sys.stdin, etc. are not reflected in the environment of the executed command.

On Unix, the return value is the exit status of the process encoded in the format specified for wait(). Note that POSIX does not specify the meaning of the return value of the C system() function, so the return value of the Python function is system-dependent.

On Windows, the return value is that returned by the system shell after running command, given by the Windows environment variable COMSPEC: on command.com systems (Windows 95, 98 and ME) this is always 0; on cmd.exe systems (Windows NT, 2000 and XP) this is the exit status of the command run; on systems using a non-native shell, consult your shell documentation.

Availability: Macintosh, Unix, Windows.

wait()

Wait for completion of a child process, and return a tuple containing its pid and exit status indication: a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced. Availability: Macintosh, Unix.
So this explains why os.system('exit 1') returns 256 instead of 1: the exit status 1 sits in the high byte (1 * 256), and the low byte is zero because no signal killed the process. To get the exit status, just shift the value 8 bits to the right.
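
The decoding described in the wait() documentation can be written out as a few bit operations. This is a minimal sketch and decode_wait_status is a hypothetical helper name. On Unix, the os module also offers os.WIFEXITED and os.WEXITSTATUS for the same purpose.

```python
def decode_wait_status(status):
    """Split a 16-bit wait()-style status into (exit_code, signal, core_dumped)."""
    exit_code = (status >> 8) & 0xFF      # high byte: the exit status
    signal = status & 0x7F                # low 7 bits: signal that killed the process
    core_dumped = bool(status & 0x80)     # high bit of the low byte: core file produced
    return exit_code, signal, core_dumped


# 256 is the value returned by os.system('exit 1') on Unix.
print(decode_wait_status(256))  # (1, 0, False)
```

The exit code is only meaningful when the signal number is zero, as the documentation quoted above points out.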

Here is a link talking about the return code of os.system(). http://mail.python.org/pipermail/python-list/2001-March/073147.html

Tuesday, March 11, 2008

Virtualization and Video Card

Virtualization helps make full use of a computer's capacity. (A virtualized host consumes more power than a single server, but much less than a group of separate servers.) Virtualization is a cost-effective solution.

Although virtualization technology has improved greatly, there has been little progress in video virtualization. So far, no released virtual machine provides a modern video card. What they do provide is a slow emulated video card, likely with 8 MB of RAM or even less. This limits the usability of virtual machines. For example, a gamer certainly cannot play his/her favorite 3D game on such a VM. 8 MB of RAM might not even be sufficient to drive a fairly large LCD. Furthermore, much CAD software requires a compatible video card.

There are other possibilities. When the CPU and main memory are fast enough, they could take over the work of the GPU. Things would then be simpler: only an average video card would be necessary.

It would be great for people who require the flexibility of *nix but can't live without games that only have Windows versions. In an ideal world, users would be able to
  • switch between VMs as easily as with a single click;
  • run any application at native speed.

Friday, January 18, 2008

When Parted ruins it, Testdisk rescues it

Note that this article mentions many useful yet dangerous tools. You should NOT use them unless you have to and you know what you are doing.

Yesterday, I was going to make a new Linux LVM partition on a hard disk that already had Debian and Windows installed. I tried parted (GNU Parted) and everything went well until the misleading command

mklabel LABEL-TYPE create a new disklabel (partition table)
which I thought would label the partition, but which actually erases the partition table. This command also gives a warning like
Warning: Partition(s) on /dev/sda are being used.
Ignore/Cancel?
and if you ignore it, you will regret it.

So the partition table was wiped out. The good news, however, was that only the partition table was lost. Parted provides a rescue command, but it was useless to me. Let's just skip the boring part of trying 'rescue 0 20GB' a hundred times. After several search queries, I came across gpart (not GParted). It turned out to be of some use. With

gpart /dev/sda
I found an NTFS partition and a Linux swap partition. However, gpart did not recognize ext3 or lvm2. Then I found that its last release dated back to 2001, so it certainly would not support these newer formats.

Whenever you are about to give up, the savior comes.

Testdisk came in handy. Comparatively, it is user-friendly. Testdisk analyzed the disk and gave a list of the partitions it found. I modified some attributes (boot flag etc.) and wrote the partition table back.

That should have marked the end of the partition table recovery, but things were a little more complicated than expected.

The partitions were not ordered correctly. This was not testdisk's fault, though. Testdisk ordered the partitions by cylinder number, but the partition table on the disk used to be in a different order, because when I installed Debian and made the partitions with partman, the order was determined by the order of my operations. I had to change the partition order back manually. I chose fdisk this time. It never modifies the partition table before the w (write) command.

  1. First, list the partitions using fdisk and record them somewhere.
    fdisk -l /dev/sda    (list by cylinders)
    fdisk -ul /dev/sda   (list by sectors; it will be of use)
    fdisk -u /dev/sda    (edit in sector mode)
  2. Second, delete those partitions not in the expected order with command d

  3. Third, add the primary partitions using the expected numbers (command n).
    Add an extended partition that will contain all logical partitions.
    Add logical partitions from number 5. Note that you can't choose the partition number for logical partitions.

  4. [Very Important] Fourth, verify that the sectors you entered are EXACTLY the same as the listed ones.
    Set the boot flag on the boot partition.
    Set partition types (LVM, NTFS, FAT etc.).
    Make sure that everything is correct and you know what you are doing.

  5. Finally, write back and quit with w. [Dangerous]
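
The check in step 4 is easy to get wrong by eye, so it can help to compare the recorded list against the rebuilt one mechanically. The following is only a sketch: the Partition type, the function name, and all sector numbers are made up for illustration, and the two lists would have to be typed in from the fdisk -ul output.

```python
from typing import NamedTuple


class Partition(NamedTuple):
    number: int   # partition number, e.g. 1 for /dev/sda1
    start: int    # first sector
    end: int      # last sector (inclusive)


def sector_mismatches(recorded, rebuilt):
    """Compare two partition lists by sector range, ignoring partition numbers."""
    recorded_ranges = {(p.start, p.end) for p in recorded}
    rebuilt_ranges = {(p.start, p.end) for p in rebuilt}
    missing = sorted(recorded_ranges - rebuilt_ranges)      # recorded but not rebuilt
    unexpected = sorted(rebuilt_ranges - recorded_ranges)   # rebuilt but never recorded
    return missing, unexpected


# Made-up sector numbers, for illustration only.
recorded = [Partition(1, 63, 40965749), Partition(2, 40965750, 81931499)]
rebuilt = [Partition(2, 63, 40965749), Partition(1, 40965750, 81931490)]

missing, unexpected = sector_mismatches(recorded, rebuilt)
print("missing:", missing)
print("unexpected:", unexpected)
```

Partition numbers are deliberately ignored, because renumbering is the whole point of the procedure. Both lists should come back empty before you write the table with w.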

I verified with testdisk that the partition table was correct.

Then I reinstalled GRUB on the MBR with

grub-install /dev/sda

Now the most exciting part. Reboot...

Everything went well and the system was not affected.
