Tuesday, December 30, 2008

Priority Queue & Heapsort

A priority queue is a queue in which each item has a priority associated with it. The item with the highest priority is at the top of the queue, so the highest priority items are removed from the queue first.

A priority queue can be implemented using a heap, where the heap is stored as a complete binary tree. A complete binary tree is said to have the heap property if the root has higher priority than its children and the subtree under each child also satisfies the heap property.

Now, let's try to understand this in plain English.

A complete binary tree is a tree in which every level is full except possibly the last, which is filled from left to right. So, this is a complete binary tree. Also, it can be seen that each node is always bigger than its child nodes. That is the heap property. And since the root has the highest priority and it is the first element, this is also a priority queue. If you remove the root, you will have to re-arrange the elements so that the new root again has the highest priority.

The benefit of using a heap structure is that inserting a new element and removing the root are each handled in O(log n) time, which is the time taken to re-arrange the elements. When the highest priority element is removed, we put the last element in the tree (the rightmost element on the lowest level) at the root position. This new root is compared with both its children, and if it has lower priority than either of them, it is exchanged with the higher priority child. This goes on till the node resides at a place where its children are of lower priority and its parent is of higher priority. While inserting a new element, we place it at the last available node (remember, the tree should always be a complete tree) and move it up (basically the reverse of the procedure for removing the root).

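A heap of n elements can easily be stored in n sequential locations in an array. With positions numbered from 1, the left child of the node at position k is placed at position 2k, and the right child at 2k+1. (With positions numbered from 0, as in the pseudocode below, the children are at 2k+1 and 2k+2.)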

So, for the above heap, the elements would be stored as

16, 11, 9, 10, 5, 6, 8, 1, 2, 4
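The array layout and the push/pop procedure described above can be sketched in Python. This is only an illustration, not the author's code: the function names `is_max_heap`, `heap_push` and `heap_pop` are made up for this example, and positions are numbered from 0, so the children of position k sit at 2k+1 and 2k+2.

```python
def is_max_heap(a):
    """Return True if every parent is >= its children (0-based positions)."""
    n = len(a)
    for k in range(n):
        for child in (2 * k + 1, 2 * k + 2):
            if child < n and a[k] < a[child]:
                return False
    return True


def heap_push(a, item):
    """Place item at the last position, then move it up past smaller parents."""
    a.append(item)
    child = len(a) - 1
    while child > 0:
        parent = (child - 1) // 2
        if a[parent] >= a[child]:
            break
        a[parent], a[child] = a[child], a[parent]
        child = parent


def heap_pop(a):
    """Remove and return the root; the last element replaces it and sinks down."""
    top = a[0]
    last = a.pop()
    if a:
        a[0] = last
        root, n = 0, len(a)
        while 2 * root + 1 < n:
            child = 2 * root + 1
            # Pick the larger of the two children, if a right child exists.
            if child + 1 < n and a[child] < a[child + 1]:
                child += 1
            if a[root] >= a[child]:
                break
            a[root], a[child] = a[child], a[root]
            root = child
    return top


heap = [16, 11, 9, 10, 5, 6, 8, 1, 2, 4]
print(is_max_heap(heap))
heap_push(heap, 12)
print(heap[0], is_max_heap(heap))
print(heap_pop(heap), heap_pop(heap))
```

Each push or pop walks at most one path between the root and a leaf, which is where the O(log n) cost comes from.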

Let's check out the pseudocode for pushing and popping elements from the heap:

function heapPop(arr, count) // arranges arr into a heap, sifting down from the last parent
  start = floor((count-2)/2);
  while(start >= 0)
    shiftDown(arr, start, count-1);
    start = start-1;

function shiftDown(arr, start, end)
  root = start;
  while root*2+1 <= end // while root has at least 1 child
    child = root*2+1; // left child
    if ( child+1 <= end ) and ( arr[child] < arr[child+1] )
      child = child+1; // right child has the higher priority
    if ( arr[root] < arr[child] )
      swap(arr[root], arr[child]);
      root = child;
    else
      return; // heap property holds, nothing more to do

function heapPush(arr, count) // arranges arr into a heap by inserting one element at a time
  end = 1;
  while (end < count)
    shiftUp(arr, 0, end);
    end = end+1;

function shiftUp(arr, start, end)
  child = end;
  while (child > start)
    parent = floor((child-1)/2);
    if (arr[parent] < arr[child])
      swap(arr[parent], arr[child]);
      child = parent;
    else
      return; // parent already has the higher priority

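function heapSort(arr, count) // input is unsorted array "arr" of "count" elements
  heapPop(arr, count); // arrange the whole array into a heap
  end = count-1;
  while(end > 0)
    swap(arr[end], arr[0]); // move the current maximum to its final place
    end = end-1;
    shiftDown(arr, 0, end); // restore the heap on the remaining elements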
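For reference, here is a runnable Python sketch of heapsort along the same lines as the pseudocode above. The names `sift_down` and `heap_sort` are chosen for this example only, and `end` is treated as the last valid position (inclusive).

```python
def sift_down(a, start, end):
    """Sink a[start] until neither child within a[start..end] is larger."""
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        # Prefer the right child when it exists and is larger.
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort the list in place in ascending order and return it."""
    n = len(a)
    # Build a max-heap, starting from the last parent and working backwards.
    for start in range((n - 2) // 2, -1, -1):
        sift_down(a, start, n - 1)
    # Swap the maximum to the end, then repair the heap on the shorter prefix.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)
    return a


print(heap_sort([4, 10, 16, 1, 9, 2, 11, 5, 8, 6]))
```

Building the heap takes one pass over the parents, and each of the n-1 swaps is followed by a sift of at most O(log n) steps, so the whole sort is O(n log n) and needs no extra array.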

Wednesday, December 24, 2008

Recursive algos

A few recursive algorithms:

Factorial:

function factorial(int n)
  if ( n==0 ) return 1;
  else return n*factorial(n-1);

Fibonacci numbers:

function fibo(int n)
  if( (n==0) || (n==1) ) return 1;
  else return fibo(n-1)+fibo(n-2);
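One thing to keep in mind: the plain recursive version recomputes the same values again and again, so the number of calls grows exponentially with n. A minimal Python sketch that caches results, keeping the same convention as above (fibo(0) and fibo(1) both return 1):

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fibo(n):
    """Fibonacci with fibo(0) == fibo(1) == 1; each value is computed once."""
    if n in (0, 1):
        return 1
    return fibo(n - 1) + fibo(n - 2)


print(fibo(10))
print(fibo(30))
```

With the cache in place each fibo(k) is evaluated only once, so the work is linear in n.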

Greatest common divisor of 2 numbers :

function gcd(int x, int y)
  if(y == 0) return x;
  else return gcd(y, x%y);

Tower of Hanoi: Given 3 pegs, one of which holds a stack of N disks of increasing size, determine the minimum number of steps required to move all the disks to another peg, moving one disk at a time and never placing a larger disk on top of a smaller one.

function hanoi(int n)
  if(n==1) return 1;
  else return 2*hanoi(n-1)+1;
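The recurrence above works out to 2^n - 1 steps. As a rough check, here is a Python sketch that lists the actual moves instead of only counting them. The function name `hanoi_moves` and the peg labels A, B and C are made up for this example.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    # Move n-1 disks out of the way, move the largest disk, then bring them back.
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))


for n in range(1, 6):
    print(n, len(hanoi_moves(n)), 2 ** n - 1)
```

The two recursive calls plus the single move in the middle are exactly the 2*hanoi(n-1)+1 of the counting function.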

Binary search : Search an ordered array of elements by cutting the array in half on each pass

function binary_search(int* data, int tofind, int start, int end)
  int mid = start + (end-start)/2; // no float/double
  if(start > end)
    return -1; // range is empty, element not found
  else if(data[mid] == tofind)
    return mid;
  else if(data[mid] > tofind)
    return binary_search(data, tofind, start, mid-1); // search left half
  else
    return binary_search(data, tofind, mid+1, end); // search right half
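The same search in runnable Python, as a sketch only. The default arguments for `start` and `end` are an addition for convenience and are not part of the pseudocode above.

```python
def binary_search(data, tofind, start=0, end=None):
    """Return the position of tofind in the sorted list data, or -1."""
    if end is None:
        end = len(data) - 1
    if start > end:
        return -1
    mid = start + (end - start) // 2  # integer midpoint
    if data[mid] == tofind:
        return mid
    if data[mid] > tofind:
        return binary_search(data, tofind, start, mid - 1)
    return binary_search(data, tofind, mid + 1, end)


data = [1, 3, 5, 7, 9, 11]
print(binary_search(data, 7))
print(binary_search(data, 4))
```

Each call halves the range still to be searched, so at most about log2(n) calls are made.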

Monday, December 15, 2008

google chrome on linux

So, you visit http://www.google.com/chrome every other day in the hope that a version of chrome for linux is out. And every time you see "For Windows Vista/XP only", you feel jealous of windows users and want to ask the google guys why they came out with "chrome for windows" before "chrome for linux".

Though we still do not have an official version of chrome for linux, I looked around and found 2 ways of installing chrome on linux. Both use wine.

The first way is the complete geeky way of doing it. You can find it here : http://www.myscienceisbetter.info/2008/09/install-google-chrome-on-linux-using-wine.html. But it did not work on my system. It reported some problem with ALSA.

So I looked again and again and found something known as crossover chromium. You can find it here : http://www.codeweavers.com/services/ports/chromium/. It provides pre-compiled binaries for installation on your system.

Basically, the guys at codeweavers took a developer build (build 21) of chrome from google and ported it to linux using wine. It will not update itself. But if you are able to install the "geeky" way, then chrome might update itself. Basically, it is something to feel a bit satisfied with till you get the official "google chrome for linux" version.

Friday, November 28, 2008

postgresql replication using slony-I

As most postgresql users would know, postgresql does not provide any built-in replication solution. There are lots of 3rd party replication products available for postgresql, and Slony is one of them. Slony is a trigger-based replication solution, that is, it uses triggers to push data to the slave. Slony is considered one of the most stable replication solutions for postgresql.

You can download slony from www.slony.info. There are two major versions of slony - slony-I & slony-II. Slony-I is a simple master-slave replication solution, whereas slony-II is an advanced multi-master replication solution. We will go ahead with the simple master-slave replication solution, so we will download Slony-I. The latest version available is Slony-I 1.2.15. Slony-I 2.0 is in RC and should be released soon. But we will go with a stable release - 1.2.15.

Postgresql version being used is 8.3.3. To install slony, simply untar the downloaded file and run
./configure --with-pgconfigdir=<path to pg_config>
sudo make install

I used two machines for setting up replication, and installed postgresql and slony-I on both of them.

master server ip :
slave server ip :

We will be working with the superuser postgres which is used to start and stop the postgresql server.

Quick steps

  • Define environment variables on master & slave. The main purpose is to make our task easier. Let's create an env.sh file containing all the definitions.




    As you can see here, my postgresql is installed in /usr/local/pgsql. I have defined the IP addresses & ports of the master and slave servers. I have used the superuser postgres for replication. And I have defined the master and slave databases to be used for replication. You can replicate between databases with different names on master and slave - just change the names in all the scripts.

  • Create database on master & slave
    On master run
    /usr/local/pgsql/bin/createdb -O $REPLICATIONUSER -h $MASTERHOST -p $MASTERPORT $MASTERDBNAME
    On slave run
    /usr/local/pgsql/bin/createdb -O $REPLICATIONUSER -h $SLAVEHOST -p $SLAVEPORT $SLAVEDBNAME

  • Since slony-I depends on triggers for replication, you will need to install the plpgsql procedural language on the master to generate and run the triggers & stored procedures that push data to the slave.
    /usr/local/pgsql/bin/createlang -h $MASTERHOST -p $MASTERPORT plpgsql $MASTERDBNAME

  • Create the tables that you want to replicate in $MASTERDBNAME on the master, and port the table definitions to the slave. This has to be done manually.

    Dump the tables on master
    /usr/local/pgsql/bin/pg_dump -s -U $MASTERDBA -h $MASTERHOST -p $MASTERPORT $MASTERDBNAME > replmaster.sql

    Import the tables on slave
    /usr/local/pgsql/bin/psql -U $SLAVEDBA -h $SLAVEHOST -p $SLAVEPORT $SLAVEDBNAME < replmaster.sql

  • And now configure the databases for replication. When you install Slony-I, it puts two binaries, slonik and slon, in the pgsql/bin directory. Slonik is the tool used for creating the configuration tables, stored procedures and triggers. All we need to do is create a configuration file and pass it to the slonik tool. Here I am assuming that there are two tables which need to be replicated - parent & child.

    vim replconfig.cnf
    # define the namespace the replication system uses in our example it is
    # replcluster
    cluster name = replcluster;
    # admin conninfo's are used by slonik to connect to the nodes one for each
    # node on each side of the cluster, the syntax is that of PQconnectdb in
    # the C-API
    node 1 admin conninfo = 'dbname=repltestdb host= port=5432 user=postgres';
    node 2 admin conninfo = 'dbname=repltestdb host= port=5432 user=postgres';
    # init the first node. Its id MUST be 1. This creates the schema
    # _$CLUSTERNAME containing all replication system specific database
    # objects.
    init cluster ( id=1, comment = 'Master Node');
    # Add unique keys to table that do not have one.
    # This command adds a bigint column named "_Slony-I_$CLUSTERNAME_rowID" to the table which will have a default value of nextval('_$CLUSTERNAME.s1_rowid_seq') and have UNIQUE & NOT NULL constraints applied on it.
    # table add key (node id = 1, fully qualified name = 'table_name');
    # Slony-I organizes tables into sets. The smallest unit a node can
    # subscribe is a set.
    # you need to have a set add table() for each table you wish to replicate
    create set (id=1, origin=1, comment='parent child table');
    set add table (set id=1, origin=1, id=1, fully qualified name = 'public.parent', comment='parent table');
    set add table (set id=1, origin=1, id=2, fully qualified name = 'public.child', comment='child table');
    # Create the second node (the slave) tell the 2 nodes how to connect to
    # each other and how they should listen for events.
    store node (id=2, comment = 'Slave node');
    store path (server = 1, client = 2, conninfo='dbname=repltestdb host= port=5432 user=postgres');
    store path (server = 2, client = 1, conninfo='dbname=repltestdb host= port=5432 user=postgres');
    store listen (origin=1, provider = 1, receiver =2);
    store listen (origin=2, provider = 2, receiver =1);

    Pass the config file to slonik for creating required triggers & config tables.

    /usr/local/pgsql/bin/slonik replconfig.cnf

  • Let's start the replication daemons on master & slave

    On master run
    /usr/local/pgsql/bin/slon $CLUSTERNAME "dbname=$MASTERDBNAME user=$MASTERDBA host=$MASTERHOST port=$MASTERPORT" > slon.log &

    On slave run
    /usr/local/pgsql/bin/slon $CLUSTERNAME "dbname=$SLAVEDBNAME user=$SLAVEDBA host=$SLAVEHOST port=$SLAVEPORT" > slon.log &

    Check out the output in slon.log files

  • Now everything is set up, and from the slon.log files on master and slave you can see that both servers are trying to sync with each other. But replication is still not under way. To start replication we need to make the slave subscribe to the master. Here is the config file required for this (saved as startrepl.cnf)

    # This defines which namespace the replication system uses
    cluster name = replcluster;
    # connection info for slonik to connect to master & slave
    node 1 admin conninfo = 'dbname=repltestdb host= port=5432 user=postgres';
    node 2 admin conninfo = 'dbname=repltestdb host= port=5432 user=postgres';
    # Node 2 subscribes set 1
    subscribe set ( id = 1, provider = 1, receiver = 2, forward = no);

    Passing this file to slonik will do the trick and replication will start.

    /usr/local/pgsql/bin/slonik startrepl.cnf
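Since replconfig.cnf needs one "set add table" statement per replicated table, those lines can be generated when there are many tables. The following Python sketch only reproduces the statement format used in the config file above. The function name `set_add_table_lines` and its parameters are hypothetical and are not part of slony.

```python
def set_add_table_lines(tables, set_id=1, origin=1, schema="public"):
    """Render one slonik 'set add table' statement per table name.

    Table ids are numbered from 1 in the order the names are given.
    """
    lines = []
    for table_id, name in enumerate(tables, start=1):
        lines.append(
            f"set add table (set id={set_id}, origin={origin}, id={table_id}, "
            f"fully qualified name = '{schema}.{name}', comment='{name} table');"
        )
    return lines


for line in set_add_table_lines(["parent", "child"]):
    print(line)
```

For the parent and child tables this prints the same two statements that were written by hand in replconfig.cnf.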

Now simply make some inserts, updates and deletes on the master and check whether they show up on the slave as well. Since replication is now in full swing, all changes to the master tables should be replicated on the slave.

Please note that new tables & changes to table structures won't be replicated automatically. So whenever a new table is created or an existing table is altered, the changes have to be manually propagated to the slave, and the scripts need to be run again to make the appropriate changes in the triggers and config tables.

Another important thing to note is that postgresql on both master and slave must accept connections from both ip addresses. For this, add the ip addresses to pgsql/data/pg_hba.conf.

For the above replication I added the lines
host all all trust
host all all trust

to the pg_hba.conf file in both master & slave.

Tuesday, November 04, 2008

An auto ride in Delhi

I was coming back from baroda and it was around 7 AM. The train was already 2 hours late and would stop at new delhi railway station in some time. I was ready with my backpack. As soon as the train stopped, I got down and proceeded to the exit (Ajmeri gate side) to catch an auto for noida (my residence).

As soon as I came to the road outside the station, almost half a kilometer from the auto stand, as usual I saw lots of people asking me if I wanted an auto or taxi. The taxi agents came first - yes, I call them agents because they try to catch you first and take a commission from the auto/taxi. The quotes I got for noida were 250/-. I was like WTF!!! I said no to all of them, came out and approached the auto stand. There were tons of autos but only a few were going to noida and their fare was 200/- min. Officially the fare is around 120/-. Then I saw a man who was shouting "noida, noida", and I asked him for a quote. He told me 150/-, I bargained and finally we settled at 130/-. I went to his auto and saw that it was already full. There were two good looking ladies in the back seat and I was supposed to ride with the driver. The "ladies" were paying 250/- for the trip to noida. So the auto was going to make 380/- for this ride. Gosh!!!

I had avoided the prepaid booth because of the long queue there. As we rode out of the stand, a policeman in white uniform stopped us and asked the auto wala for his prepaid receipt. Usually the prepaid booth issues a receipt which I have to give to the auto wala at the destination, and the auto wala presents this receipt to the prepaid booth to collect his payment. Since this auto was a totally self arranged one, we did not have any receipt. So, instead of issuing a receipt for us (a convenience for the passenger), the policeman asked us to get down and asked another couple who had a receipt to board the auto. I was again like WTF!!! After half an hour of bargaining, I get an auto and the policeman simply hands it over to someone else.

Well, so I thought, why not get a receipt from the prepaid booth. I stood in the queue for 15 minutes and got a receipt for 140/- to a place known as ashok nagar, which is on the border of noida & delhi. If you ask why not into noida, to my destination, I would say that the prepaid booth does not provide service to U.P. It was just that I remembered the name of the place and got the receipt. Now, the policeman who was assigning passengers to autos was not visible. I stood for some time, looked around and spotted a crowd. The crowd would move to an auto and the auto would depart with passengers, then the crowd would move to the next auto. I went near "the crowd" and saw that it was a junior policeman surrounded by lots and lots of passengers, all having prepaid receipts. Everyone was trying to get his receipt handled first and the policeman was enjoying the attention he was getting. He would go to an auto, assign it a passenger, scold the auto wala, then talk on his mobile and then move to the next auto - all at his leisure. He had to do this the whole day, and it seemed he was enjoying his work.

Finally, the policeman saw my receipt and assigned me an auto. I jumped in, glad that I was finally going to get home. It was almost 8:30 - 1.5 hours to get an auto. We departed. But my adventure does not end here. As soon as we were out of the auto stand and on the road, the auto wala turned his head to me and asked me where I was going.

Let me try to paint a picture of the auto wala. He is a male, 40ish, with black and white hair. He is very untidy and smelly. It seems that he has not bathed in a few days. He is smoking a bidi (an indian version of a cigarette) from the side of his mouth. And he has an attitude - "I am the king of the road".

So, I told him that I had to go to sector 12 in noida, but that he could leave me at ashok nagar and I would take a manual rickshaw from there, if he wanted. The auto wala said from the side of his mouth (he had the bidi on the other side) that he would drop me at my place - sector 12 noida - if I paid him 100/- extra. I was a bit shocked. After spending 1.5 hours getting an auto, I was still at the same price. I told him that I would pay him 40/- extra. After everything, we finally settled on a deal at 200/-. And I finally relaxed in the back of the auto.

No sooner had I done this than a bus overtook the auto from the wrong side and at a very close distance. I jumped up. The auto wala was cool. I asked him whether he had given an indicator or not. And he showed me his foot, which was sticking out in the direction where we were about to turn. I said "wow...".

Now, I began to notice the way the auto was being driven. We were exactly in the center of the road, in such a manner that no car could overtake us from the right side. If a vehicle had to overtake us, it had to do it from the wrong side. There was a lot of honking behind us. A car was getting frustrated. We were moving at a constant speed of 35, and the car was trying to overtake us from the right side, but the auto would not give way to it. Finally the car overtook us from the wrong side. The car driver glared at the auto driver but the auto driver did not look at him.

We were half way when suddenly the auto started struggling. It coughed and died. I got down and pushed the auto to the side of the road. The auto driver got out a bottle and put some oil in his tank. I asked him what this oil was, and he told me it was kerosene. The auto started and I could see the whitish black and smelly smoke from its exhaust. I got back in and we continued our journey. Now we were climbing a flyover and the auto's speed dropped to 15 kmph. It was barely able to pull me up, but the auto driver was very persistent, so we finally made it to the top.

Again we were in the middle of the road and some car was trying to overtake us. I looked to my side and could see another auto just next to mine. Both autos were at their top speed, i.e. 20 kmph, and both were trying to race. There was a smile on my auto driver's face because he was an inch ahead of the other auto. Behind, I could see cars and buses trying to adjust to this low speed and honking, so that they would be given room to overtake. The flyover ended and so did the race (I think), and my auto driver won.

As soon as we entered noida, we started jumping red lights. Delhi traffic police are a bit strict and you can't jump red lights there, but in noida there are hardly any police at the red lights. Anyway, after saving his auto from 2-3 near collisions, we finally turned and parked the auto at my place. I handed over the prepaid receipt and another 100/- Rs, and the auto driver got out 40/- and paid me back. So, after a delay of 2.5 hours and after spending 200/- Rs, I was finally home.

Thursday, October 23, 2008

Saturday, October 18, 2008

setting up hadoop

Have updated the post with latest hadoop config changes compatible with hadoop 1.1.2

Hadoop provides a distributed file system similar to the google file system. It uses map-reduce to process large amounts of data on a large number of nodes. I will give a brief step by step process to set up hadoop on single and multiple nodes.

First, let's go with a single node:

  • Download hadoop.tar.gz from hadoop.apache.org.

  • You can set up hadoop to run as any user, but it is preferable to set up a separate user for running hadoop.
    sudo addgroup hadoop
    sudo adduser --ingroup hadoop hadoop

  • Untar the hadoop.tar.gz file in the home directory of the user "hadoop"
    [hadoop@linuxbox ~]$ tar -xvzf hadoop.tar.gz

  • Check the version of java - it should be at least java 1.5, preferably java 1.6
    $ java -version
    java version "1.6.0"
    Java(TM) SE Runtime Environment (build 1.6.0-b105)
    Java HotSpot(TM) Server VM (build 1.6.0-b105, mixed mode)

  • Hadoop needs to ssh to the local server. So you need to create keys on the local machine so that ssh does not ask for a password.
    $ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/hadoop/.ssh/id_rsa.
    Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
    The key fingerprint is:
    fb:7a:cf:c5:c0:ec:30:a7:f9:eb:f0:a4:8b:da:6f:88 hadoop@linuxbox

    Now copy the public key to the authorized_keys file, so that ssh does not ask for a password
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    Now check
    $ ssh localhost
    Last login: Sat Oct 18 18:30:57 2008 from localhost

  • Change environment parameters in hadoop-env.sh
    export JAVA_HOME=/path/to/jdk_home_dir

  • Change configuration parameters in hadoop-site.xml. 
In hadoop 1.1.1, hadoop-site.xml has been replaced by 3 files - core-site.xml, hdfs-site.xml and mapred-site.xml

in core-site.xml (property fs.default.name)

<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>

in hdfs-site.xml (property dfs.replication)
<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.</description>

in mapred-site.xml (property mapred.job.tracker)

<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.</description>


  • Format the name node
    $ cd /home/hadoop
    $ ./bin/hadoop namenode -format

    Check the output for errors.

  • Start single node cluster
    $ <HADOOP_INSTALL_DIR>/bin/start-all.sh
    This should start the namenode, datanode, jobtracker and tasktracker - all on one machine.

  • Check whether the nodes are up and running. The output should be approximately like
    $ jps
    28982 JobTracker
    28737 DataNode
    28615 NameNode
    30570 Jps
    29109 TaskTracker
    28870 SecondaryNameNode

  • In case of any error, please check the log files in the <HADOOP_INSTALL_DIR>/logs directory.

  • To stop the node, run
    $ <HADOOP_INSTALL_DIR>/bin/stop-all.sh

We will skip running actual map-reduce tasks on the single node setup and go ahead with a multi-node setup. Once we have 2 machines up and running, we will run some example map-reduce tasks on those nodes. So, let's proceed with the multi-node setup.

For the multi node setup, you should have 2 machines up and running, both with the single node hadoop setup on them. We will refer to the machines as master and slave, and assume that hadoop has been installed under the /home/hadoop directory on both nodes.

  • Firstly stop the single node hadoop running on them.
    $ <HADOOP_INSTALL_DIR>/bin/stop-all.sh

  • Edit /etc/hosts file on both the servers to setup master and slave names. Eg:
    aaa.bbb.ccc.ddd master
    www.xxx.yyy.zzz slave

  • Now, the master should be able to ssh to the slave server without any password, so copy the public key of the master into the authorized_keys file on the slave.
    master]$ ssh-keygen -t rsa
    master ~/.ssh]$ scp id_rsa.pub hadoop@slave:.ssh/
    slave ~/.ssh]$ cat id_rsa.pub >> authorized_keys

    Test the ssh setup
    master]$ ssh master
    master]$ ssh slave
    slave ]$ ssh slave

  • On the master server, change the <HADOOP_INSTALL_DIR>/conf/masters & <HADOOP_INSTALL_DIR>/conf/slaves files to add the master & slave hosts. The files should look like this.

    master ~/hadoop/conf]$ cat masters

    The slaves file contains the hosts (one per line) where the hadoop slave daemons (data nodes and task trackers) will run. In our case we are running the datanode & tasktracker on both machines. In addition, the master server will also run the master related services (namenode). Both master and slave will store data.

    master ~/hadoop/conf]$ cat slaves

  • Now change the configuration (<HADOOP_INSTALL_DIR>/conf/hadoop-site.xml) on all machines (master & slave). Set/change the following variables.

    Specify the host and port of the name node(master server).
    fs.default.name = hdfs://master:54310

    Specify the host and port of the job tracker (map reduce master).
    mapred.job.tracker = master:54311

    Specify the number of machines a single file should be replicated to before it becomes available. It should not be more than the number of slave nodes. In our case it is 2 (master & slave - both act as slaves as well).
    dfs.replication = 2

  • You need to reformat the namenode and recreate the datanodes. Do the following
    master ~/hadoop/hadoop-hadoop]$ rm -rf dfs mapred
    slave ~/hadoop/hadoop-hadoop]$ rm -rf dfs mapred

    Recreate/reformat the name node
    master ~/hadoop] $ ./bin/hadoop namenode -format

  • Start the cluster.
    [master ~/hadoop]$ ./bin/start-dfs.sh
    starting namenode, logging to /home/hadoop/bin/../logs/hadoop-hadoop-namenode-master.out
    master: starting datanode, logging to /home/hadoop/bin/../logs/hadoop-hadoop-datanode-master.out
    slave: starting datanode, logging to /home/hadoop/bin/../logs/hadoop-hadoop-datanode-slave.out
    master: starting secondarynamenode, logging to /home/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-master.out

    Check the processes running on the master node
    [master ~/hadoop]$ jps
    5249 SecondaryNameNode
    5319 Jps
    5117 DataNode
    4995 NameNode

    Check the processes running on slave node
    [slave ~/hadoop]$ jps
    22256 Jps
    22203 DataNode

    Check the logs on the slave for errors. <HADOOP_INSTALL_DIR>/logs/hadoop-hadoop-datanode-slave.log

  • Now start the mapreduce daemons:
    [master ~/hadoop]$ ./bin/start-mapred.sh
    starting jobtracker, logging to /home/hadoop/bin/../logs/hadoop-hadoop-jobtracker-master.out
    slave: starting tasktracker, logging to /home/hadoop/bin/../logs/hadoop-hadoop-tasktracker-slave.out
    master: starting tasktracker, logging to /home/hadoop/bin/../logs/hadoop-hadoop-tasktracker-master.out

    Check the processes on master
    [master ~/hadoop]$ jps
    5249 SecondaryNameNode
    5117 DataNode
    5725 TaskTracker
    5598 JobTracker
    5853 Jps
    4995 NameNode

    And the processes on the slave
    [slave ~/hadoop]$ jps
    22735 TaskTracker
    22856 Jps
    22413 DataNode
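For the three-file layout (core-site.xml, hdfs-site.xml and mapred-site.xml), the multi-node values set earlier can be written out as property entries. The Python sketch below renders them with the standard library; the helper name `render_site_xml` and the `SETTINGS` table are made up for this example, and the values are the ones used in this post (master:54310, master:54311 and a replication of 2), so adjust them for your own hosts.

```python
import xml.etree.ElementTree as ET

# Property values taken from the multi-node steps in this post.
SETTINGS = {
    "core-site.xml": {"fs.default.name": "hdfs://master:54310"},
    "hdfs-site.xml": {"dfs.replication": "2"},
    "mapred-site.xml": {"mapred.job.tracker": "master:54311"},
}


def render_site_xml(properties):
    """Build a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


for filename, properties in SETTINGS.items():
    print(filename)
    print(render_site_xml(properties))
```

The script only prints the XML; copying the output into the conf directory on each machine is left as a manual step.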

To shut down the hadoop cluster, run the following on master

[master ~/hadoop]$ ./bin/stop-mapred.sh # to stop mapreduce daemons
[master ~/hadoop]$ ./bin/stop-dfs.sh # to stop the hdfs daemons

Now, let's put some files on the hdfs and see if we can run some programs

Get the following files on your local filesystem in some test directory on master

[master ~/test]$ wget http://www.gutenberg.org/files/20417/20417-8.txt
[master ~/test]$ wget http://www.gutenberg.org/dirs/etext04/7ldvc10.txt
[master ~/test]$ wget http://www.gutenberg.org/files/4300/4300-8.txt
[master ~/test]$ wget http://www.gutenberg.org/dirs/etext99/advsh12.txt

Populate the files in the hdfs file system

[master ~/hadoop]$ ./bin/hadoop dfs -copyFromLocal ../test/ test

Check the files on the hdfs file system

[master ~/hadoop]$ ./bin/hadoop dfs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2008-10-20 12:37 /user/hadoop/test
[master ~/hadoop]$ ./bin/hadoop dfs -ls test
Found 4 items
-rw-r--r-- 2 hadoop supergroup 674425 2008-10-20 12:37 /user/hadoop/test/20417-8.txt
-rw-r--r-- 2 hadoop supergroup 1573048 2008-10-20 12:37 /user/hadoop/test/4300-8.txt
-rw-r--r-- 2 hadoop supergroup 1423808 2008-10-20 12:37 /user/hadoop/test/7ldvc10.txt
-rw-r--r-- 2 hadoop supergroup 590093 2008-10-20 12:37 /user/hadoop/test/advsh12.txt

Now let's run some test programs. Let's run the wordcount example and collect the output in the test-op directory.

[master ~/hadoop]$ ./bin/hadoop jar hadoop-0.18.1-examples.jar wordcount test test-op
08/10/20 12:49:45 INFO mapred.FileInputFormat: Total input paths to process : 4
08/10/20 12:49:46 INFO mapred.FileInputFormat: Total input paths to process : 4
08/10/20 12:49:46 INFO mapred.JobClient: Running job: job_200810201146_0003
08/10/20 12:49:47 INFO mapred.JobClient: map 0% reduce 0%
08/10/20 12:49:52 INFO mapred.JobClient: map 50% reduce 0%
08/10/20 12:49:56 INFO mapred.JobClient: map 100% reduce 0%
08/10/20 12:50:02 INFO mapred.JobClient: map 100% reduce 16%
08/10/20 12:50:05 INFO mapred.JobClient: Job complete: job_200810201146_0003
08/10/20 12:50:05 INFO mapred.JobClient: Counters: 16
08/10/20 12:50:05 INFO mapred.JobClient: File Systems
08/10/20 12:50:05 INFO mapred.JobClient: HDFS bytes read=4261374
08/10/20 12:50:05 INFO mapred.JobClient: HDFS bytes written=949192
08/10/20 12:50:05 INFO mapred.JobClient: Local bytes read=2044286
08/10/20 12:50:05 INFO mapred.JobClient: Local bytes written=3757882
08/10/20 12:50:05 INFO mapred.JobClient: Job Counters
08/10/20 12:50:05 INFO mapred.JobClient: Launched reduce tasks=1
08/10/20 12:50:05 INFO mapred.JobClient: Launched map tasks=4
08/10/20 12:50:05 INFO mapred.JobClient: Data-local map tasks=4
08/10/20 12:50:05 INFO mapred.JobClient: Map-Reduce Framework
08/10/20 12:50:05 INFO mapred.JobClient: Reduce input groups=88307
08/10/20 12:50:05 INFO mapred.JobClient: Combine output records=205890
08/10/20 12:50:05 INFO mapred.JobClient: Map input records=90949
08/10/20 12:50:05 INFO mapred.JobClient: Reduce output records=88307
08/10/20 12:50:05 INFO mapred.JobClient: Map output bytes=7077676
08/10/20 12:50:05 INFO mapred.JobClient: Map input bytes=4261374
08/10/20 12:50:05 INFO mapred.JobClient: Combine input records=853602
08/10/20 12:50:05 INFO mapred.JobClient: Map output records=736019
08/10/20 12:50:05 INFO mapred.JobClient: Reduce input records=88307

Now, let's check the output.

[master ~/hadoop]$ ./bin/hadoop dfs -ls test-op
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2008-10-20 12:45 /user/hadoop/test-op/_logs
-rw-r--r-- 2 hadoop supergroup 949192 2008-10-20 12:46 /user/hadoop/test-op/part-00000
[master ~/hadoop]$ ./bin/hadoop dfs -copyToLocal test-op/part-00000 test-op-part-00000
[master ~/hadoop]$ head test-op-part-00000
"'A 1
"'About 1
"'Absolute 1
"'Ah!' 2
"'Ah, 2
"'Ample.' 1
"'And 10
"'Arthur!' 1
"'As 1
"'At 1

That's it... We have a live setup of hadoop running on two machines...
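To get a feel for what the wordcount job computes, here is a tiny single-machine Python sketch of the same idea: a map step that emits (word, 1) pairs and a reduce step that adds them up per word. It is only an illustration and not the hadoop example's code. The function names are made up, and splitting on whitespace is an assumption (it would be consistent with the output above, where punctuation stays attached to the words).

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_phase(pairs):
    """Sum the counts of all pairs that share the same word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


def word_count(lines):
    """Run map over every line, then reduce; return (word, count) sorted by word."""
    pairs = []
    for line in lines:
        pairs.extend(map_phase(line))
    return sorted(reduce_phase(pairs).items())


for word, count in word_count(["the cat sat", "the dog sat down"]):
    print(word, count)
```

On the cluster the map calls run in parallel on the nodes holding the file blocks, and the pairs are grouped by word before they reach the reducer. Here everything happens in one process.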

Saturday, October 11, 2008

install latest vlc on ubuntu 8.04

So, you have got ubuntu and you are still playing movies on some 0.8.6 version of vlc. No matter how many times you do an "apt-get update", vlc does not upgrade. The way to do this is:

a) sudo vim /etc/apt/sources.list

b) Add line "deb http://ppa.launchpad.net/c-korn/ubuntu hardy main" and save the file

c) sudo apt-get update

d) sudo apt-get install vlc

Now when you type in vlc, you will see the latest vlc player popping up...

how law and order works in india...

It had been a week since we had got vegetables from the market. For the past 3-4 days we had been surviving only on dishes made out of potatoes and onions. So, on a fine monday evening, while we were driving back home from office, we thought that we should get down at the roadside vegetable market and get some veggies. So, we stopped on the opposite side of the road from where the market was, got down and went to get the veggies. There were lots of cars parked over there and of course it is a very busy road with cars and other vehicles moving up and down. Also, tons of rickshaw-walas were waiting not 100 meters from where I had parked my car.

We are really quick shoppers, so in 15 minutes we had got supplies for almost a week and we decided that we should move back. When i came back, i saw that the window of the right hand back door was not there. Well, it was there, but it was in pieces and my precious laptop bag with all its contents was missing. It took some time for the fact to sink in that my bag was missing. I looked left and right for the f*** who had stolen my bag, but of course, no one could be found. And then i looked at people nearby. I saw a man sitting on a chair 10 meters from my car. I ran to him and asked if he had seen anything. Of course, he had seen nothing. The rickshaw-walas were totally ignorant and empathetic with my situation. "ka jamana aa gaya hai, seesha tod kar bagwa le kar chala gaya" (what times have come, someone broke the glass and went off with the bag). That is how they express their grief.

In india, when things are stolen from you, you have to give up all hope of getting it back. People generally turn blind and deaf when they feel that something wrong is happening in their surrounding. The reason behind this might be the fact that the probability of getting caught and facing a sentence is very low (maybe 0.1 %). People generally try to avoid reporting crimes, because the police are totally un-cooperative. The police try "NOT" to find lost things. I cant figure out if it is their laziness or lack of IQ. It might be both. In foreign countries, say U.S., you dial 911 and within 5-10 minutes, you have cops at your door - trying to help you out. I have not tried dialing 100 here, but i dont think that the police would be at the place in less than an hour.

Well, lets move ahead with the story. So, i called up one of my friends and told him what had happened. He told me that he would be home in an hour and then we can go and maybe report the crime - that is if it is required to claim the insurance. The main things I had lost were my official laptop, possession letter (I had specially taken it out on that day to get it photocopied), RC of my bike, bank and credit-card statements and an almost new cheque book. I went home and made a complete list of things i have lost and passwords that need changing. ( I had stored some passwords on my laptop).

An hour later, me and my friend wrote down an application in english and went to the nearby police "chowki" to get the report registered. The first reaction of the policeman sitting there was that "why was the application in english?", so we wrote it again in Hindi. Imagine the policemen unable to read english. What would happen if a foreign tourist gets robbed? Would he be able to even make these guys understand what has happened? When we handed over the freshly written application to him, he simply put it on the table and went out to get his superior officer. The superior officer read the application and asked us to take him to the place where the incident has happened. I had used my brain a little and had safely put away my car and taken my bike. Mainly because i was not sure if they would ask me to let the car be with them or they would ask more money looking at a long car and thinking that i am a "rich" man.

So, we rode after them on the motorcycle and went to the vegetable market. I showed him the place where i had parked the car from a distance - i dont know why he did not go to the place. When i told him that there were rickshaws and a man sitting nearby, he was quick to ask whether i suspected the man to be the thief. How could i judge? Should he not question the person and check out whether he was right or wrong? Anyways, next he started blaming me for parking my car on the road where there was no parking. But, there is no parking space nearby - i thought. And we came back to his "chowki". He kept the application and told us to check on the status next morning. We asked him if he would give us an FIR. But the reply was "No" - "mai FIR nahi likhta" (I do not write FIRs).

I knew this would happen and it would be difficult to get an FIR out of these guys without giving them some "donation" in return. But i had expected them to ask upfront for a "donation". But no such thing happened, so we waited. We waited for almost 15 minutes, but nothing happened. Then the "daroga" got angry because we were waiting for his response and said "Ab laptop le kar hi jaogay kya? Dekh lo kahi yehi pada ho to le jana." (So you will leave only with the laptop? Have a look, and if it is lying around here, take it.) We did not know what to reply, so we simply went home.

Next day, i went to the police station with 2 of my friends and again repeated the complete story to at least 2 police officers. One of them told us to get our car to show the damage. So, we went and got the car. He then sent some junior havaldar to check out whether the window was broken (that is whether we were telling the truth or not). Then we were asked to consult the SO (head in charge of the police station). Again we repeated the story to him. And showed him the car from a distance. So, he told his junior officers to accept the "application" [Still no FIR]. The junior officer simply accepted the application and stamped it and drew a vertical line (which i believe was his signature). He did not read the contents, nor did he check what language it was written in.

After accepting the application, when we asked for a complaint no, he stopped looking at us and started shifting papers from one box to the same box. I think, he was trying to ignore us - i dont know why? So, we again went to the SO and told him that we wanted some complaint no. And he redirected us to another officer, to whom we again repeated the whole story and showed him the application. He simply said "yeah to angreji mein hai. Isko received kisne kia?" (this is in english, how did this get accepted?) - how was i supposed to know the answer to this question? He took us back to the previous junior officer who had accepted and asked him why did he accept an application in english?. Well, the application was returned back to us and we were asked to rewrite it in hindi and submit it again.

So, we wrote it in hindi and submitted it again. This time the junior officer read it and then stamped it. It was during that time that we came to know that the junior officer was an 8th pass and did not know anything about english. If this is the type of education that a policeman has, then what can we expect out of them. We asked almost everyone about when can we get an FIR or a complaint no, and the response was "kal" (tomorrow). Someone even said that we might get in 2-3 hours, if we are ready to wait.

When i came to the office that day, lots of people came to me and shared their experiences when they had to get an FIR. Some of them had spent months to get an FIR. One very sad case had spent 200/- for the FIR and even after that, there was some mistake in the writing and so he had not yet been able to claim his insurance.

For the next 2 days, i just went and asked whether the FIR was ready and the general response was "kal" - mainly due to work pressure. Well, if the police are so busy writing FIR's who would do the investigation and catch the criminals. I think, they have a tough job to do - trying to write down so many FIR's instead of catching the criminals and reducing the crime rate. Everybody advised me to pay them to get the FIR. But when i offered them, they would say that it is unnecessary and still make me come the next day.

Finally on the 3rd day, dad was here, so he went with me and talked them into writing the FIR. We were again sent to the SO to whom we again repeated the complete story and reminded him about the talk we have had earlier. He again read the application that we had submitted and asked us to get a photocopy of the bill of the laptop. We rushed to the office and got a photocopy of the bill. This time the SO was generous and told us to get the FIR written. When we again went to the junior officer, he asked us to wait and then confirmed from the SO whether he should write the FIR. Again we were told to come after 2 hours. This time dad made an indication of the offer and i think the junior officer got it.

We were again asked to write a fresh application and change the dates accordingly. After some pestering the junior officer finally started writing the FIR. Finally after around 30 minutes the FIR was ready and we got a copy. After i came out, i asked dad whether he had given them the "donation" - because i did not see him giving it to them. Then dad told me that it was given to them when he shook hands with them for the final "thankyou".

The point here is that after being a victim to a crime, you have to be after the police to prove that you are a victim. Forget the option of getting back your stolen stuff. You have to give them some money to get the complaint registered. It is like "Please sir, (beg with folded hands - a 100/- rs note between the hands), i have suffered a great loss, please write my complaint". How can you expect justice to come out of these guys. I still remember the detective serials that I used to watch in my childhood. The actual scenario is worse than that. Policemen not only overlook valuable clues but dont want to find the criminal. If your car is picked up and left in the police station for a week, all you would get back would be the outer body and the seats. The steering, engine etc all would go missing. And if you enquire about it, all they would say is that it was brought here like that only.

The SO had his own style (tashan). He would keep on chewing paan, etc and keep on spitting out the reddish stuff. Specially for him there was a huge bucket to his left side, so that he could keep on spitting until the bucket is full and then it would be taken away.

Welcome to India...

Wednesday, October 01, 2008

road rage

I have driven for hours in delhi. I have driven from noida to gurgaon in very heavy traffic which took me 3 hours to reach. Sometimes, it really becomes troublesome when you have a nature's call and you cannot do anything about it. At most of the places where jams are frequent, you could see vendors selling water and other stuff to eat. But there are no public toilets nearby.

Now-a-days, all cycle walas and rickshaw walas think that they are shahrukh khan. They ride in the middle of the road and try to over take your car, cutting in front of you - as if they are invincible. People on bike also believe that they are GOD. They think, they can jump red lights without getting into any accidents and ride in the middle of the road at a constant speed of 30 kmph - turning a deaf ear to your honking and not allowing you to overtake. And the best part is when you see circus like acts on the road with 3 people on a cycle or 3 people on a small bike. I sometimes feel mercy for the overloaded vehicles. And even think whether the people who had designed the engine of the bike had taken such situations into consideration.

This attitude does lead to some frustration. After driving for an hour to cover a 20 minute distance, trying to protect these people in spite of their own carelessness, you tend to get exhausted. And you also have to protect yourself from the buses which keep on zooming from left and right both in wrong direction and right direction. What amazes me about these buses is how they drive by you with only 2 inches to spare between your vehicle and the bus, at full speed and still it misses you. Aren't the bus drivers extremely talented?

And finally cows & dogs on the road. They would just stand there confused about where to go and what to do. The municipal corporation does not try to remove them from the road. Who cares, people would simply go around them.

And ofcourse, the most famous of all these are the tampos. Yup the green colored tampos which have CNG written on their back and still throw out tons and tons of black smoke. They are heavily overloaded and keep on ferrying people between points A & B. They dont go beyond 20 kmph and they always move in the middle of the road, so that all the traffic stay behind them. They always stop in the middle of the road and just at a crossing, so that they could create jams and make people acknowledge their importance by their honking.

When people encounter these situations, and are unable to arrive at a compromise, they simply blow up. Why should they care for people who do not care for themselves?

If i would have had a jeannie, i would have wished that it taught the people on the road to have more road sense.

Saturday, September 27, 2008

Broadband finally...

Yuppie... i got a broadband finally.

When i was a kid, i had this phone line on which i had a modem. I had to dial a particular no and then wait as the web pages loaded. Downloading songs and videos were a huge pain. I had to queue up the downloads and let my pc run overnight to download a few songs (a few mbs). So, when i grew up, i had this thought that i should be getting my own broadband connection and do tons and tons of downloads and uploads - maybe put my own server with a dedicated ip address & dns entry and run maybe some web services on it.

After some research/enquiry i came to know that out of all the broadbands, airtel was supposed to be the best. So i called up an airtel salu and asked him to put my broadband in place. He came, took some documents and promised that i would get my connection the next day. Next day - nothing happened. The day after next, i called him up and he told me that today the task would be done. The same day some wiring people came and wired up my whole place for phone and internet. The apartment that i am in might be having some pre-arranged setup with airtel, because all the internal wiring was done. All they had to do was put in connectors and connect them to the internal wires which were already in place.

So, all my rooms [3 bedrooms + 1 drawing room] now have 2 extra connectors - for phone and internet. And there are no extra wires to be seen. I did not opt for wifi - the main reason because it is sometimes difficult to configure wifi on linux [I had configured it on my laptop, but it stopped working some time back after ubuntu sent in some updates] - and i did not want my family members to enjoy wifi while i connect to the net using wires. (a bit selfish on me, but thats the way i am).

After the wiring, the waiting started - for the engineers to turn up, install the device and start the broadband connection. Everyday i used to call up the salu, but nothing happened. Then i asked him about the engineers' no and called up the engineer. Next day the guys turned up and it took them 3 hours to get the net up and running - they had to do some setting from their backend which took a lot of time due to lack of coordination.

Anyways, finally I have broadband in place and i have been downloading tons of software on it. Will start downloading music and videos as well. The speed is very good - at least better than what i used to get on reliance aircard.

Wednesday, September 24, 2008



You have two cows in Vijayawada. You hook them to internet and milk them from Hyderabad.

You have two cows. You teach them to cry, "Ammaaaaaaa..." and fall at your feet.

You have two cows. You give one to your son and the other to your nephew.

You have two cows. But you drink goat's milk.

You have two bulls. You adamantly consider them as cows.

You have two cows. You buy Rs. 900 Crore worth of cattlefeed for them.

You have two cows. You throw them into air and catch their milk in your mouth.

You have two cows. You paint them both to get colourful milk.

Softwarism: (Ultimate....)
Client has 2 cows and u need to milk them.
1 .. First prepare a document when to milk them (Project kick off)

2 .. Prepare a document how long you have to milk them (Project plan)

3 .. Then prepare how to milk them (Design)

4 .. Then prepare what other accessories are needed to milk them (Framework)

5 .. Then prepare a 2 dummy cows (sort of toy cows) and show to client the way in which u will milk them (UI Mockups & POC)

6 .. If client is not satisfied then redo from step 2

7 .. You actually start milking them and find that there are few problem with accessories. (Change framework)

8 .. Redo step 4

9 .. At last milk them and send it to onsite. (Coding over)

10. Make sure that cow milks properly ( Testing)

11. Onsite reports that it is not milking there.

12. You break your head and find that onsite is trying to milk from bulls

13. At last onsite milk them and send to client (Testing)

14. Client says the quality of milk is not good. (User Acceptance Test)

15. Offsite then slogs and improves the quality of milk

16. Now the client says that the quality is good but its milking at slow rate (performance issue)

17. Again you slog and send it with good performance.

18. Client is happy???

By this time both the cows have aged and cannot be milked. (The software got old; get ready for the next release and repeat from step 1) !!!!

Monday, August 25, 2008

Thread Executors in java

With Java 1.4, when we used to write threaded applications, we did not have many options for limiting the number of threads or reusing the same thread for the available tasks. I had written applications earlier where each process that needed to be executed freely (or a process which can be executed on a thread) used to create its own thread. And once a thread was created and used, it was simply discarded. New threads were created whenever needed.

With java 1.5, the java.util.concurrent package provides executors (the Executors factory class and the ExecutorService interface) which allow you to create controlled thread pools and execute any number of tasks on a fixed number of threads. If the number of tasks exceeds the number of threads then the extra tasks just wait in a queue for a thread to become available and then start their execution.

A simple threaded server using java 1.4 :

1 import java.io.*;
2 import java.net.*;
3 import java.util.*;
5 public class testPool implements Runnable
6 {
7   private int port;
8   private int active, total;
10   public testPool(int port)
11   {
12     this.port = port;
13     System.out.println("Thread pool constructor - creating thread");
14     new Thread(this).start();
15   }
17   public void run()
18   {
19     try
20     {
21       System.out.println("Thread pool - new thread created");
22       ServerSocket ss = new ServerSocket(port);
23       while(true)
24       {
25         Socket s = ss.accept();
26         total++;
27         new Handler(s);
28       }
29     }catch(Exception ex)
30     {
31       ex.printStackTrace();
32     }
33   }
35   public class Handler implements Runnable
36   {
37     private Socket socket;
38     public Handler(Socket s)
39     {
40       this.socket = s;
41       System.out.println("Handler constructor - creating thread");
42       new Thread(this).start();
43     }
44     public void run()
45     {
46       System.out.println(Thread.currentThread().getName()+" - Handler - new thread created");
47       active++;
48       boolean loop = true;
49       try
50       {
51         BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
52         DataOutputStream out = new DataOutputStream(socket.getOutputStream());
53         while(loop)
54         {
55           String cmd = in.readLine();
56           if(cmd == null || cmd.equals("QUIT")) // readLine() returns null when the client disconnects
57           {
58             loop = false;
59             socket.close();
60           }
61           else if(cmd.equals("INFO"))
62           {
63             System.out.println("Active Connections : "+active);
64             System.out.println("Total Connections : "+total);
65           }
66           else
67           {
68             System.out.println("Command = "+cmd);
69           }
70         }
71       }catch(Exception ex)
72       {
73         ex.printStackTrace();
74       }
75       active--;
76     }
77   }
79   public static void main(String[] args)
80   {
81     System.out.println("Starting Daemon");
82     new testPool(Integer.parseInt(args[0]));
83   }
84 }

Compile and run the program

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created

Now, as you keep on connecting to localhost 5000 port, you can see that new threads are created. Even when the old connections are closed, the same thread is not used again but new threads are created.

so, if i do

telnet localhost 5000

from 3 consoles and issue the "INFO" command from 4th console, the output is

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created
Handler constructor - creating thread
Thread-1 - Handler - new thread created
Handler constructor - creating thread
Thread-2 - Handler - new thread created
Handler constructor - creating thread
Thread-3 - Handler - new thread created
Handler constructor - creating thread
Thread-4 - Handler - new thread created
Active Connections : 1
Total Connections : 4

4 threads are created. In this case, every time you connect to the server, a new thread would be created.

Now, let's modify the code to work on java 1.5 using thread Executors...

Add the following code snippets at or after the following line numbers

4 import java.util.concurrent.*;
10  private int pool;
12  this.pool=2;
22  ExecutorService threadExecutor = Executors.newFixedThreadPool(pool);
27  threadExecutor.execute(new Handler(s));

And comment the following 2 lines

27  new Handler(s);
42  new Thread(this).start();

Now let's run the program again on port 5000 and try connecting to it. We have created a pool of 2 threads here. Observe that:

  • pool-1-thread-1 & pool-1-thread-2 are the two threads created.

  • After two connections to the server, the connections are not rejected, but are queued.

  • If a thread becomes available, a connection from the queue is assigned to the thread for processing

  • The same two threads are used again and again. No new threads are created.

  • Connections which are in queue are not able to process any requests unless they are assigned to a worker thread

Process for testing

console 1: telnet localhost 5000
console 2: telnet localhost 5000
console 3: telnet localhost 5000
console 1: from 1
console 2: from 2
console 3: from 3
console 1: INFO
console 1: 1 quitting
console 1: QUIT

Output for the testing process

jayant@jayantbox:~/myprogs/java$ java -cp . testPool 5000
Starting Daemon
Thread pool constructor - creating thread
Thread pool - new thread created
Handler constructor - creating thread
pool-1-thread-1 - Handler - new thread created
Handler constructor - creating thread
pool-1-thread-2 - Handler - new thread created
Handler constructor - creating thread
Command = from 1
Command = from 2
Active Connections : 2
Total Connections : 3
Command = 1 quitting
pool-1-thread-1 - Handler - new thread created
Command = from 3

As seen, processing of connection 3 starts only after connection 1 is closed by the client.
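The thread reuse seen in the log can also be reproduced without sockets. The following is only a minimal sketch, not the server above: the class name FixedPoolReuseDemo and the helper workerNames are made up for illustration, the 50 ms sleep stands in for handling one connection, and the lambda syntax needs Java 8 or later (on java 1.5 you would pass an anonymous Runnable instead).

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class FixedPoolReuseDemo {

    // Runs `tasks` short jobs on a fixed pool of `poolSize` threads and
    // returns the distinct names of the worker threads that ran them.
    static Set<String> workerNames(int poolSize, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> names = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                names.add(Thread.currentThread().getName());
                try {
                    Thread.sleep(50); // stand-in for handling one connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                done.countDown();
            });
        }
        try {
            done.await(); // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return names;
    }

    public static void main(String[] args) {
        // Six tasks on two threads: only two distinct worker names show up.
        System.out.println("Workers used: " + new TreeSet<>(workerNames(2, 6)));
    }
}
```

Six tasks are submitted but only two worker names are collected, which is the same behaviour as pool-1-thread-1 picking up connection 3 after connection 1 quits.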

The benefits of using a thread executor instead of a normal thread are

a] Overhead of creating and destroying threads is avoided.
b] A queue of tasks is maintained automatically when the no of tasks exceeds the no of threads in the pool. The tasks in queue are automatically executed when the threads become free.
c] A limit (maximum no of threads) could be imposed, thus keeping resource usage under control for better performance.
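Points b] and c] can be checked by looking inside the pool while its threads are busy. This is a rough sketch under a few assumptions: the class PoolQueueDemo and the method snapshot are hypothetical names, the pool is built directly as a ThreadPoolExecutor (core size = max size, unbounded queue, which is how a fixed pool is described) so that its size and queue can be inspected, and the latch simply holds each worker the way an open connection would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolQueueDemo {

    // Submits `tasks` blocking jobs to a pool capped at `poolSize` threads and
    // returns {live worker threads, tasks waiting in the queue}.
    static int[] snapshot(int poolSize, int tasks) {
        // Fixed-size pool with an unbounded queue, built directly so that
        // getPoolSize() and getQueue() are available.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // hold the worker, like an open connection
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The workers are still blocked here, so the extra tasks sit in the queue.
        int[] result = { pool.getPoolSize(), pool.getQueue().size() };
        release.countDown(); // let the running and queued tasks finish
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] s = snapshot(2, 5);
        System.out.println("threads = " + s[0] + ", queued = " + s[1]);
    }
}
```

With a pool of 2 and 5 submitted tasks, the snapshot reports 2 threads and 3 queued tasks: the pool never grows past its limit and the remaining work waits its turn.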

Tuesday, August 05, 2008

Berkeley DB

I had a chance to look into BDB-XML some time back. But i did not find it that good in performance. BDB on the other hand promises fast storage and retrieval of key-value pairs. For the new-comers, BDB is an embedded database API which can be used to create your own database and fire a defined set of queries on it. BDB does not provide you with an SQL interface or even a database server. All you have is a set of APIs using which you can write programs to create your own database, populate data into the database, define your own cache and indexes and then use the database to retrieve values for a particular key using your own programs.

With BDB, there is no sql/query optimizer/query parser layer between you and the database.

There are several flavours of the BDB engine available:

1.] BDB -> the original BDB with c/c++ api. You write programs in c/c++ to create and access your database.
2.] BDB-JAVA -> The java api to BDB. The java api uses JNI in the backend to communicate with the BDB library (written in c/c++).
3.] BDB-JE -> The java edition of the BDB engine. It is a pure java implementation of the BDB engine. It does not use JNI for communication with the system.
4.] BDB-XML -> This is a very sophisticated version of BDB - where you can store XML documents and retrieve documents using any of the keys in the XML Document. You have an XQuery interface where you can fire XML based queries and retrieve results.

The original BDB is of course the fastest.

To start with, we will take a look at the DPL API of BDB-JAVA. DPL stands for Direct Persistence Layer and is used generally for storing and managing java objects in the database. DPL works best with a static database schema and requires java 1.5.

To create a database using DPL, you generally require an entity class and then create/open a database environment and insert the entity object into the entity store in the database environment. Sounds greek, right? Let's see an example

Entity Class

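import com.sleepycat.persist.*;
import com.sleepycat.db.*;
import com.sleepycat.persist.model.*;
import static com.sleepycat.persist.model.Relationship.*;

// Note: the braces and the DPL annotations were missing from this listing and
// have been restored. MANY_TO_ONE is an assumption, chosen because findBySk()
// below returns a list of entities for a single secondary key.
@Entity
public class SimpleEntityClass {

   // Primary key is pKey
   @PrimaryKey
   private String pKey;

   // Secondary key is the sKey
   @SecondaryKey(relate=MANY_TO_ONE)
   private String sKey;

   // DPL needs a default constructor to instantiate the entity when reading
   private SimpleEntityClass() {
   }

   public SimpleEntityClass(String pk, String sk) {
      this.pKey = pk;
      this.sKey = sk;
   }

   public void setpKey(String data) {
      pKey = data;
   }

   public void setsKey(String data) {
      sKey = data;
   }

   public String getpKey() {
      return pKey;
   }

   public String getsKey() {
      return sKey;
   }
}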

Then create a Database Access class which can insert and retrieve Entity objects from the database.

import java.io.*;
import java.util.ArrayList;
import com.sleepycat.db.*;
import com.sleepycat.persist.*;

public class SimpleDA {
   PrimaryIndex<String, SimpleEntityClass> pIdx;
   SecondaryIndex<String, String, SimpleEntityClass> sIdx;

   EntityCursor<SimpleEntityClass> pcursor;
   EntityCursor<SimpleEntityClass> scursor;

   public SimpleDA(EntityStore store) throws Exception {
      pIdx = store.getPrimaryIndex(String.class, SimpleEntityClass.class);
      sIdx = store.getSecondaryIndex(pIdx, String.class, "sKey");
   }

   public void addEntry(String pk, String sk) throws DatabaseException {
      pIdx.put(new SimpleEntityClass(pk, sk));
   }

   public SimpleEntityClass findByPk(String pk) throws DatabaseException {
      SimpleEntityClass found = pIdx.get(pk);
      return found;
   }

   public ArrayList<SimpleEntityClass> findBySk(String sk) throws DatabaseException {
      ArrayList<SimpleEntityClass> ret = new ArrayList<SimpleEntityClass>();
      scursor = sIdx.subIndex(sk).entities();
      try {
         for(SimpleEntityClass sec1 = scursor.first(); sec1!=null; sec1 = scursor.next()) {
            // loop body was missing in the listing; collecting each match is assumed
            ret.add(sec1);
         }
      } finally {
         // cursors must be closed once we are done with them
         scursor.close();
      }
      return ret;
   }
}

Create/open the environment to put and retrieve records from the database

import java.io.*;
import java.util.ArrayList;
import com.sleepycat.db.*;
import com.sleepycat.persist.*;

public class SimpleStore {
   private static File envHome = new File("./bdbjava");
   private Environment env;
   private EntityStore store;
   private SimpleDA sda;

   public void setup() throws DatabaseException {
      // put all config options here.
      EnvironmentConfig envConfig = new EnvironmentConfig();
      envConfig.setCacheSize(536870912); //512 MB
      envConfig.setCacheCount(2); // 2 caches of 256 MB each

      StoreConfig sConfig = new StoreConfig();

      try {
         env = new Environment(envHome, envConfig);
         store = new EntityStore(env, "MyDatabaseName", sConfig);
      }catch(Exception ex) {
         // handler body was missing in the listing; printing the trace is assumed
         ex.printStackTrace();
      }
   }

   public SimpleStore() {
   }

   public void putData(String pk, String sk) throws Exception {
      sda = new SimpleDA(store);
      sda.addEntry(pk, sk);
   }

   public void getDataPk(String pk) throws Exception {
      sda = new SimpleDA(store);
      SimpleEntityClass sec = sda.findByPk(pk);
      System.out.println("pk = "+sec.getpKey()+", sk = "+sec.getsKey());
   }

   public void getDataSk(String sk) throws Exception {
      sda = new SimpleDA(store);
      ArrayList<SimpleEntityClass> data = sda.findBySk(sk);
      for(int x=0; x<data.size(); x++) {
         SimpleEntityClass sec = data.get(x);
         System.out.println("pk = "+sec.getpKey()+", sk = "+sec.getsKey());
      }
   }

   public void closeAll() throws Exception {
      // body was missing in the listing; closing the store and then the
      // environment is assumed
      store.close();
      env.close();
   }

   public static void main(String[] args) {
      SimpleStore ss = new SimpleStore();
      // the rest of main was lost from the listing; it would call setup(),
      // then putData()/getDataPk()/getDataSk(), and finally closeAll()
   }
}



So, now you have your own program for creating entities of the type SimpleEntityClass and storing them in the database in serialized form. These objects can be retrieved using primary or secondary keys.

For the relationship between primary and secondary keys please refer to http://www.oracle.com/technology/documentation/berkeley-db/db/java/com/sleepycat/persist/model/Relationship.html

Since you would be storing complete objects instead of just the required data sets, the database size would be relatively high and that would slow down things a bit.

Friday, August 01, 2008

The Other World...

There is this world where you buy your tickets (be it movie tickets or travel tickets) online. Where you go to the big baazaar and swipe your credit card to make the payment. Where you call up for your car service and the service guys take the car from your place, service it and give it back to you with a bill. You use your credit card for most of the transactions and you rarely deal with cash. You sit in an air-conditioned office and punch a few keys on your computer and you get paid heavily for it. The world of a software related person.

And then there is this other world where everything is done through cash. You rely on the travel agent to get your rail and air tickets. And you get your movie tickets from the movie hall ticket window. You take your car to the service station yourself and stand there and get the service done in front of your eyes. You pay cash all the time and try to avoid getting bills to avoid paying the 12.XX% tax on the total amount. You are not eligible for a credit card cause you dont have any asset to show to the banks.

I had an encounter with the "other world" when i got wood work and painting done in my new flat. I came to know that the margin of profit could be increased from 10% to 60% by simply twisting the quality of materials. I also came to know the fact that the MRP quoted on most materials includes a huge profit margin and that you can actually bargain on the MRP.

When i went to get the first lot of wood i came to know that wood comes from 30Rs/sq ft to more than 100Rs/sq ft. And a increase of 5 Rs/sq ft can make a huge difference in the final cost. Dealing with the workers is a pain. For them you are a fresh,delicious and juicy piece of chicken almost ready for being chopped. And they would love to chop you irrespective of what you think. If you leave them to their will, they would chop you and fight over your pieces. For them even 5 Rs has a meaning. And then there are these people in the local market who only understand cash. If you look at their shop, you wont believe that the shop can have lakhs of rs in cash within itself. There are no guards to guard the shop. And they still have lakhs of cash with them and crores of raw materials just lying there. They dont have a credit card machine cause the bank wont give them any. They dont have credit cards themselves cause they dont have any proof of their huge income. They make lakhs in a month and dont file any income tax returns. They hardly pay any tax. And the fact may be that though they may be serving you, but they may be having more wealth and property than you do.

When i made my first payment of 46000 (cash) to the wood supplier, i filled up my office bag with rupees. I was scared of driving on the road and my hands were literally trembling when i was counting the amount i had to pay. But the shop owner was calm and composed as if it was his daily job. These guys dont know what the internet is or how to use it. All they know is how to make money. It is the other world - the actual world. For us this world is hidden behind a layer of service guys and internet.

Tuesday, July 15, 2008

murphy's law

Murphy's law states that "If anything can go wrong, it will".


I first encountered Murphy's law when I was in school. One of my friends mentioned it and we had a bet that there was no such law. And that friend of mine bought a big thick book on Murphy's law to prove that he was right and that there actually is such a law.

There have been numerous instances throughout my life which are in line with Murphy's law. But the most recent one was the best.

I was coming back from my in-laws' place - Chattarpur to Delhi. The journey is divided into two parts - part 1 from Chattarpur to Jhansi by bus, and part 2 from Jhansi to Delhi by train. Part 1 has to be covered by bus only - and the number of government buses is very low - they are rarely seen. The major bus operators are private ones who want to make the most by filling up as many passengers as possible per trip. The distance from Jhansi to Chattarpur is 135 kms, and it takes approx 3 hours by bus. Imagine standing in a very crowded bus for 3 hours without voluntarily moving a muscle. All muscle movements are involuntary - inspired by the movements of the bus and the pot-holes on the road.

So, the story goes that my train was at 6:11 pm and so, taking a buffer of 2 hours, I left home at 1 pm and caught the 1:30 bus, which should have dropped me at Jhansi by 4:30 - max 5:00 pm - leaving me enough time to go and catch the train. The bus I sat in ran at a very slow speed - stopping at all the petty stops and even on the road to pick up & drop passengers here and there. I realized that we were running very slow when we had covered only half the distance in 2.5 hours. So at 4:00 pm we were still 75 kms away from Jhansi. It seems that the bus driver and the passengers both realized this fact at the same time as well - so the driver started speeding.

I got some hope that we would reach by 5:30 max - still giving me half an hour to catch my train. But then the inevitable happened - a railway crossing and a jam. The train crossed 15 minutes later and it took another 15 minutes to get through the jam. Now I was sure that I would miss my train. But the bus driver started speeding. I think the bus's max speed was 65 kmph, at which it ran most of the way, still stopping here and there to pick up and drop passengers.

I started counting milestones. When I saw that we were 20 kms away from Jhansi and it was 5:30, I knew that I would either miss my train or, as usual, the train would be 15 minutes late and I would be lucky enough to catch it. As soon as I saw the Jhansi border I jumped off the bus and ran looking for an auto. It was 5:50 and I was desperately hoping that the train would be late.

I flagged down an auto which was carrying 4 passengers and asked the driver how much time it would take to go to the station. He told me 40 Rs and 10 minutes - by the watch. I told him I would pay him 50 if he made it in 10 minutes. All 4 passengers were dropped at the nearby bus stop and the race started. The auto-driver, an old guy, tried his best to overtake other vehicles and ride ahead, but it seems that autos are not made to race. They are a steady means of transport and cannot go beyond 30 kmph.

I was getting nervous and at 6:00 I asked him how much more time - he said 2 minutes more. And then out of nowhere we were in front of a red light and there was a traffic cop asking all vehicles to stop. I never expected this to happen. It meant that due to this red light I would miss my train. My mind went back to school and I recalled my friend who had taught me Murphy's law. And I made up my mind that, as per Murphy's law, I wouldn't be able to catch the train. I would miss the train by seconds. But I might as well try to catch it. Maybe my luck would work out and the train would be late.

So, after 5 minutes instead of 2, the auto left me at the station. I handed him 50 Rs and ran, shouting all the way - "bhaiya hatna" / "jane dena". I saw people moving out of my way - looking at me with an expression of "WTF" in their eyes. I ran up the stairs and towards the 5th platform, on which the train generally is. But reaching there, I saw there was no train. I was going to give up when I thought, why not find out by how much time I had missed the train. And so I asked a passerby. And he told me that the train was a bit ahead on the platform.

My heart leaped. And I jumped down the stairs. There was a cop standing there and he looked at me with tons of suspicion in his eyes. I asked him about the bogie no - "C8" - and he realized that I had come late. He told me to go straight ahead and I ran again. It had been ages since I had done any physical exercise, and I was out of breath. I had a heavy bag on my back, and I was sweating like anything. My whole body was aching and asking me to relax & slow down. But I still continued and finally jumped into the C8 coach.

I rushed inside and found my seat. The train whistled and started moving. I looked at my watch and it was 06:12. The trains are generally late by 10-15 minutes. But this time, as per Murphy's law, the train was exactly on time.

If anything can go wrong, it will...

Thursday, June 26, 2008


Yes, I have GTA 4. I have been playing on the Xbox for some time. Halo 3 was good. Awesome visuals - though my TV is not an HDTV and is just 20 inches - a cheap one, around 6 years old. But still, the thrill of playing Halo 3 and watching the visuals was good.

Mass Effect was still better. The amount of information associated with each character. How each character reacts to which situation. The power to choose between two different decisions. It was good in its own way.

And now it is GTA 4. I had played the original GTA, which was a 2D game, on my home PC. But at that time I was in college and the game was a simple demo. So, though I enjoyed it a lot, I still wanted more. But at that time purchasing a game was like throwing money away. So, I simply enjoyed the 15 minute demo version of GTA.

And then this Sunday, I got GTA-IV to run on my Xbox. I was looking for Gears Of War, but it was not available. The shopkeeper instead shoved a GTA-4 under my nose. And I was surprised to see a copy of the most wanted game available so easily. I jumped on it, checked the seals to ensure that it was original and then got it for a hefty 2500/-. That is 2 months' salary for my maid. Or 20-25 movies. Or an 80 GB HDD. And instead I spent that much on a single game DVD.

Anyways, playing GTA is a good experience. In addition to stealing cars and beating up people for money, I can call and fix up meetings and dates with my friend and girl-friend. I can go to strip clubs and play pool. I can go on the internet inside GTA and search for information online. It is like a city where I can do whatever I want.

I have just completed 3-4 missions and am looking forward to more time to spend on it.

A few interesting links:

Missions : http://www.gta4.net/missions/
Official Cheats : http://www.gta4.net/cheats/
Other cheats : http://www.cheatcc.com/xbox360/grandtheftauto4cheatscodes.html

Wednesday, June 18, 2008

QA Tester Versus Developer

How Roshan D'Mello (QA Tester) frustrates developer (Mukesh Thakur)

Roshan D'Mello: Hey Mukesh, there is a bug in your code. Type a text in user name text box and press enter. Beep sound doesn't appear.

Mukesh Thakur: How can that be a bug? There is no requirement that beep sound should come. Anyway, I will assign it to offshore and get it fixed.

After 2 days,

Mukesh Thakur: Roshan, bug is fixed. Please verify.

After another 2 days,

Roshan D'Mello: I have re-opened the bug because sound is not coming in some PCs. Sound is coming in my machine, but my colleague Rajat Choudhry is not getting the sound.

After another 2 days,

Mukesh Thakur: Not a bug. I observed that your friend Rajat Choudhry has old IBM machine. Unlike your DELL machine, IBM machines do not have inbuilt speakers. So, to hear the sound in Rajat Choudhry's machine, please use head phones and then get the bug closed soon.

Another 2 days,

Roshan D'Mello: I have re-opened the bug because sound tone is different across different machines. Sound is coming as 'BEEP' in my machine, but my colleague Rajat Choudhry who is having IBM machine is getting the sound as 'TONG'.

Mukesh Thakur: Not a bug. Get lost man. What can we do for the bug? The two machines are built in such a way that they produce different sounds. Do you expect the developers to rebuild the IBM processors to make them uniform? Please close it.

Another 2 days,

Roshan D'Mello: I have re-opened the bug because intensity of beep sound produced on 2 different DELL machines is different. My machine produces beep sound of intensity 10 decibels whereas my friend's machine produces sound worth 20 decibels. Fix your code to make the sound uniform across all machines.

Another 2 days later,

Mukesh Thakur: Once again it is not a bug. I have noticed that the volume set is different on the two machines. Ensure that volume is same in both the machines before I get mad and then close the bug.

Another 2 days,

Roshan D'Mello: I have re-opened the bug.

Mukesh Thakur: What ?? Why? What more stupid reasons can be there for re-opening?

Roshan D'Mello: Sound intensity is different for machines placed at different locations (different buildings). So, I have re-opened it.

After 2 days,

Mukesh Thakur: I have made some scientists do an acoustical analysis of the two buildings you used for testing. They have observed that the acoustics in the two buildings varies to a large extent. That is why sound intensity is different across the 2 buildings. So, I beg you to please close the bugs.

After 1 year

Roshan D'Mello: I am re-opening the bug. During the year, I requested the clients to arrange architects to build two buildings with the same acoustical features, so that I can test it again. Now, when I tested, I found that the intensity of sound still varies. So, I am re-opening the defect.

Mukesh Thakur: GROWLLLL.....I am really mad now. I am sure that the sound waves of the two buildings are getting distorted due to some background noise or something. Now I need to waste time to prove that it is because of background noise.

Roshan D'Mello: No need for that. We will put the machines and run them in vacuum and see.

Mukesh Thakur: (not alive)

Friday, June 13, 2008

vim configuration basic

Have you ever thought about configuring vim? The same vim that you use for editing your files (you should be very familiar with vi editor if you are using unix-like systems). Yes, the editor is highly configurable.

Here are some of the basic tips for vim - Vi Improved 7.1

The global configuration file resides at /etc/vim/vimrc (on Debian-style systems; some other distributions use /etc/vimrc). The local configuration file resides in your home folder. So if you are logged in as jayant and your home folder is /home/jayant, then your local configuration file would be /home/jayant/.vimrc. If you do not see color in your vi editor, you can add the following to it (note that vim does not allow spaces around the "=" in a set command)

syntax on
set background=dark

I have set the following as my default configuration

jayant@jayantbox:~$ cat .vimrc
set autoindent
set cmdheight=2 "command bar is 2 high
set backspace=indent,eol,start "set backspace function
set hlsearch "highlight searched things
set incsearch "incremental search
set ignorecase "ignore case
set textwidth=0
set autoread "auto read when file is changed from outside
set ruler "show current position
" set nu "show line number
set showmatch "show matching braces
set shiftwidth=2
set tabstop=2
set gfn=Courier\ 12
set t_Co=256
colorscheme oceandeep

This sets the tab width to 2 chars instead of the default 8. The color scheme is changed to oceandeep. You can get color schemes for vim from http://www.vim.org/scripts/script_search_results.php?keywords=&script_type=color+scheme&order_by=creation_date&direction=descending&search=search. Your color schemes have to be put in the <home>/.vim/colors folder. Auto indenting has been turned on, so you don't need to press tab to indent your code.
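If vim complains that it cannot find the color scheme, a quick check is whether the scheme named in .vimrc has a matching file under <home>/.vim/colors. The following is a minimal Python sketch of that check; the helper names find_colorscheme and colorscheme_file are made up for this illustration, and it only looks at the per-user folder (schemes shipped with vim itself live elsewhere, so a missing file here is not always an error).

```python
import re
from pathlib import Path


def find_colorscheme(vimrc_text):
    """Return the name used by the last `colorscheme` line, or None.

    Lines starting with a double quote are vim comments and are skipped
    automatically because they do not start with the command itself.
    """
    name = None
    for line in vimrc_text.splitlines():
        match = re.match(r"\s*colorscheme\s+(\S+)", line)
        if match:
            name = match.group(1)
    return name


def colorscheme_file(home, name):
    """Expected location of a user-installed scheme: <home>/.vim/colors/<name>.vim"""
    return Path(home) / ".vim" / "colors" / f"{name}.vim"


# Example with a fragment of the .vimrc shown above (hypothetical home folder).
sample_vimrc = 'set t_Co=256\n" colorscheme desert\ncolorscheme oceandeep\n'
scheme = find_colorscheme(sample_vimrc)
expected = colorscheme_file("/home/jayant", scheme)
print(scheme, "->", expected, "(exists)" if expected.exists() else "(not found)")
```

To use it on a real setup, read the text of your own .vimrc and pass your own home folder instead of the sample values.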

Check out my vim using the oceandeep color scheme

Wednesday, June 04, 2008

MySQL versus PostgreSQL - part II

My earlier post, mysql versus postgresql, brought me lots of negative comments - that I did not compare the transactional database of pgsql with the transactional engine (InnoDB) of mysql. The main reason why I did not do that was that I had found InnoDB to be very slow as compared to MyISAM.

But after all those comments I ran the benchmarks again using the same scripts and the same technology on the same machine (my laptop), and here are the results. I created a new table in both mysql (using the InnoDB engine) and pgsql. And I disabled binary logging in mysql to speed up insert/update/delete queries. Please refer to the earlier post for the setup information.

The following notation is used:

<operation(select/insert/update/delete)> : <no_of_threads> X <operations_per_thread>
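To make the notation concrete, here is a rough Python sketch of how such a workload can be driven. It is not the script used for these benchmarks; run_workload and parse_group are names invented for this illustration, and the operation callback stands in for whatever code would issue one query against mysql or pgsql.

```python
import threading
import time


def parse_group(text):
    """Turn '2X30000 + 3X20000' or '1 X 100000' into [(2, 30000), (3, 20000)]."""
    groups = []
    for part in text.split("+"):
        no_of_threads, operations_per_thread = part.upper().split("X")
        groups.append((int(no_of_threads), int(operations_per_thread)))
    return groups


def run_workload(spec, operation):
    """Start every thread described by spec, wait for all, and time the run.

    spec maps an operation name ("select", "insert", ...) to a list of
    (no_of_threads, operations_per_thread) pairs. operation(name) is called
    once per operation; in a real benchmark it would run one query.
    Returns (elapsed_seconds, total_thread_count).
    """
    def worker(name, count):
        for _ in range(count):
            operation(name)

    threads = [
        threading.Thread(target=worker, args=(name, ops))
        for name, groups in spec.items()
        for no_of_threads, ops in groups
        for _ in range(no_of_threads)
    ]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.perf_counter() - start, len(threads)


# Dry run with a do-nothing operation, just to show the shape of a spec.
spec = {
    "select": parse_group("2X300 + 3X200"),
    "insert": parse_group("1 X 200"),
}
elapsed, thread_count = run_workload(spec, lambda name: None)
print(f"{thread_count} threads finished in {elapsed:.3f} sec")
```

All threads are started before any is joined, so the measured time is the wall-clock time of the whole mixed workload, which is how the "Time" figures in this post should be read.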

  • Firstly I ran a single thread with inserts, both before and after disabling binary logging in mysql
    Mysql Insert : 1 X 100000
    Time : 65.22 Sec (binary logging enabled)
    Time : 32.62 Sec (binary logging disabled)
    So disabling binary logging in mysql halved the insert time here, and should similarly speed up update/delete queries.
    Pgsql Insert : 1 X 100000
    Time : 53.07 Sec
    Inserts in mysql are very fast.

  • Selects : 2 X 100000
    Mysql time : 30.1 Sec
    Pgsql time : 29.92 Sec
    Both are about the same

  • Updates : 2 X 50000
    Mysql time : 29.38 Sec
    Pgsql time : 36.98 Sec
    Mysql updates are faster

  • Ran 4 threads with different no_of_operations/thread
    Run 1 [Select : 1 X 100000, Insert : 1 X 50000, Update : 1 X 50000, Delete : 1 X 20000]
    Mysql time : 40.86 Sec
    Pgsql time : 45.03 Sec
    Run 2 [Select : 1 X 100000, Insert : 1 X 100000, Update : 1 X 50000, Delete : 1 X 10000]
    Mysql time : 49.91 Sec
    Pgsql time : 63.38 Sec
    Run 3 [Select : 1 X 100000, Insert : 1 X 20000, Update : 1 X 20000, Delete : 1 X 1000]
    Mysql time : 29.83 Sec
    Pgsql time : 29.3 Sec
    It can be seen that increasing the number of insert/update/delete queries hurts the performance of pgsql. Pgsql performs better when the proportion of selects is very high (as in Run 3), whereas mysql-innodb is either faster or within about half a second in all three runs

  • Had 4 runs with different no of threads.
    Run 1: 12 threads [Select : 2X30000 + 3X20000, Insert : 1X20000 + 2X10000, Update : 2X10000, Delete : 2X1000]
    Mysql time : 31.16 Sec
    Pgsql time : 30.46 Sec
    Run 2: 12 threads [Select : 2X50000 + 2X40000 + 1X30000, Insert : 1X20000 + 2X15000, Update : 2X15000, Delete : 2X2000]
    Mysql time : 52.25 Sec
    Pgsql time : 53.03 Sec
    Run 3: 20 Threads [Select : 4X50000 + 4X40000 + 2X30000, Insert : 2X20000 + 3X15000, Update : 2X20000 + 1X15000, Delete : 2X5000]
    Mysql time : 169.81 Sec
    Pgsql time : 136.04 Sec
    Run 4: 30 Threads [Select : 2X50000 + 3X40000 + 3X30000 + 3X20000 + 4X10000, Insert : 1X30000 + 2X20000 + 3X10000, Update : 3X20000 + 3X10000, Delete : 1X10000 + 2X5000]
    Mysql time : 200.25 Sec
    Pgsql time : 156.9 Sec
    So, it can be said that for a small system with less concurrency, mysql would perform better. But as concurrency increases, pgsql would perform better. I also saw that while running the pgsql benchmark, the system load was twice as high as while running the mysql benchmark.
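The crossover is easier to see as a ratio of the two timings. The short Python sketch below only re-does arithmetic on the numbers listed above; the variable and function names are made up for the illustration.

```python
# (label, thread count, mysql seconds, pgsql seconds), copied from the runs above
mixed_runs = [
    ("4 threads, run 1", 4, 40.86, 45.03),
    ("4 threads, run 2", 4, 49.91, 63.38),
    ("4 threads, run 3", 4, 29.83, 29.30),
    ("12 threads, run 1", 12, 31.16, 30.46),
    ("12 threads, run 2", 12, 52.25, 53.03),
    ("20 threads", 20, 169.81, 136.04),
    ("30 threads", 30, 200.25, 156.90),
]


def mysql_to_pgsql_ratio(mysql_seconds, pgsql_seconds):
    """Above 1.0 means pgsql finished faster; below 1.0 means mysql did."""
    return mysql_seconds / pgsql_seconds


for label, threads, mysql_s, pgsql_s in mixed_runs:
    ratio = mysql_to_pgsql_ratio(mysql_s, pgsql_s)
    print(f"{label:18s} mysql/pgsql = {ratio:.2f}")

# Effect of disabling binary logging on the single-thread insert run
binlog_speedup = 65.22 / 32.62
print(f"binary logging off: inserts {binlog_speedup:.2f}x faster")
```

With 4 and 12 threads the ratio stays at or below roughly 1.0, while at 20 and 30 threads it rises to about 1.25 and 1.28, which is the basis for the concurrency remark above. These are single runs on one laptop, so small differences should not be over-read.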

Enabling mysql binary logging for replication would of course add an overhead. Similarly, enabling trigger-based replication in pgsql would be another overhead. The fact that replication in mysql is very closely linked with the database server makes building a high availability system easier, whereas creating slaves using replication in pgsql is not that easy. All available products for replication in pgsql are external - 3rd party software. Still, for a high concurrency system pgsql would be a better choice.