Saturday, December 30, 2006

mysql cluster

The NDB mysql cluster is an in-memory storage engine which offers high availability and data-persistence features. Mysql cluster is available with mysql-5.1 and with mysql5.0-max binaries. This can also be compiled after obtaining the source and the binaries can be used.

We will have a quick start to mysql ndb cluster here...
The basics first :

There are 3 nodes/services necessary to start when you are planning to run a mysql cluster

1. The NDB management node daemon ndb_mgmd.
2. The SQL node - a simple mysql server (mysqld).
3. Data nodes - running NDBD daemon

The best configuration would be to have at least 4 differet machines to run the mysql cluster on. The management node could be any small machine. All the management node does is check the cluster on periodic basis and other management functionalities like stop a node, start another node and stuff like that.

The Data nodes need to be high configuration machines - having lots of RAM. This is because the mysql cluster is a memory based cluster which means that the data resides in memory. The more memory you have the better the engine would perform.

And finally the SQL node should be a good machine since it will be doing some processing also. Though it should not require high amounts of RAM.

Steps to install and get mysql cluster running :

step 1:
install mysql 5.1 (binary) on 3 machines (2 data nodes and 1 sql node).
my.cnf on these nodes can be as below:

[mysqld]
ndbcluster
ndb-connectstring=ip.of.mgmnode

[mysql_cluster]
ndb-connectstring=ip.of.mgmnode

step 2:
untar the binary of mysql-5.1 and copy the files ndb_mgm & ndb_mgmd to /usr/local/bin
There is no need to keep the binary of mysql. Once the ndb_mgm* files are copied to bin directory, the untared binary of mysql can be deleted.

step 3:
configuring the mgm node.
create a directory "mkdir /var/lib/mysql-cluster".
create a file config.ini and incorporate mysql cluster settings in it

config.ini
[NDB DEFAULT] #affects settings of all data nodes
#No of replicas. Must divide evenly into the no of data nodes. So for 2 data nodes the no of replicas can be 1 or 2.
NoOfReplicas=2
# Memory allocated for data storage. Used for storing actual records and indexes. The memory space allocated consists of 32K pages. Once a page is allocated, it cannot be returned to the free pool, except by deleting the table. Default value is 80 MB. Minimum is 1 MB and max is dependent on available system memory.
DataMemory=1024M
# Memory allocated for Index storage (Hash indexes in mysql cluster). Hash indexes are used for primary key, unique indexes. Default value is 18 MB. Minimum is 1 MB and max is dependent on available system memory.
IndexMemory=128M
# Log levels. Used to log messages for various events. By default all loggings are disabled. For our reference we would set the log level to 5 for Startup messages and 10 for Error messages/warnings. Minimum level is 0 (disabled) and maximum is 15.
LogLevelStartup=5
LogLevelError=10

[TCP DEFAULT] # TCP/IP options
# no settings required for normal setup

[NDB_MGMD] # Options for the management process
hostname=ip.of.mgmnode
# Data directory - Directory where output files from the management server will be placed. It includes cluster log files, process output files and daemon's pid file.
datadir=/var/lib/mysql-cluster

[NDBD] # Options for data node "A"
hostname=ip.of.datanode_A
# datadir specifies the directory on data node "A" where trace, log, pid and error log files are placed
datadir=/usr/local/mysql/data # assuming the path on node "A"

[NDBD] # Options for data node "B"
hostname=ip.of.datanode_B
datadir=/usr/local/mysql/data # assuming the path on node "B"

[MYSQLD] # Options for the SQL node
hostname=ip.of.sqlnode

step 4:
Start the cluster. Hang on. The way to start the cluster is to first start the management node. Use the following command on management node machine.

ndb_mgmd -f /var/lib/mysql-cluster/config.ini

now run "ndb_mgm". It will connect to the mgm daemon on port 1186. use the "show" command to list the current settings.

ndb_mgm> show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 (not connected, accepting connect from ip.of.datanode_A)
id=3 (not connected, accepting connect from ip.of.datanode_B)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @ip.of.mgmnode (Version: 5.1.14)

[mysqld(API)] 1 node(s)
id=4 (not connected, accepting connect from ip.of.sqlnode)


Now start the data nodes. On the data nodes run the following command.

./ndbd --initial

The --initial command deletes all log files and clears the data on node machines. Use this parameter only for starting the data node for the first time.

And start the sql node like when you start a mysql server

./bin/mysqld_safe --user=mysql &

And run the "show" command on the mgm node again.

ndb_mgm> show
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=2 @ip.of.datanode_A (Version: 5.1.14, Nodegroup: 0, Master)
id=3 @ip.of.datanode_B (Version: 5.1.14, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @ip.of.mgmnode (Version: 5.1.14)

[mysqld(API)] 1 node(s)
id=4 @ip.of.sqlnode (Version: 5.1.14)



Ok, now your mysql cluster is up and running.

If you need to restart the mysql cluster safely run the following commands

ndb_mgm -e shutdown # shuts down the mysql management and data nodes
ndb_mgmd -f /var/lib/mysql-cluster/config.ini #start the management node
ndbd # run this on the data nodes to start mysql cluster on data nodes
./bin/mysqld_safe --user=mysql & # And also shutdown and restart the sql node - like you shutdown and restart a normal mysql server.

To create a table using NDB cluster specify "ENGINE=NDBCLUSTER" in the create table syntax. Eg:

Create table testndb (id int, val varchar(255) ENGINE=NDBCLUSTER;

One error that i encountered while working with the NDB cluster is "ERROR 1114 (HY000): The table '#sql-15c1_2' is full". After a bit of digging i found the following in the log files on the mgm node.

2006-12-30 12:50:59 [MgmSrvr] INFO -- Node 3: Data usage increased to 80%(13205 32K pages of total 16384)
2006-12-30 12:50:59 [MgmSrvr] INFO -- Node 2: Data usage increased to 80%(13220 32K pages of total 16384)
2006-12-30 12:51:29 [MgmSrvr] INFO -- Node 3: Data usage increased to 90%(14757 32K pages of total 16384)
2006-12-30 12:51:29 [MgmSrvr] INFO -- Node 2: Data usage increased to 90%(14769 32K pages of total 16384)
2006-12-30 12:51:45 [MgmSrvr] INFO -- Node 2: Local checkpoint 12 started. Keep GCI = 168 oldest restorable GCI = 192
2006-12-30 12:52:13 [MgmSrvr] INFO -- Node 3: Data usage decreased to 86%(14112 32K pages of total 16384)
2006-12-30 12:52:13 [MgmSrvr] INFO -- Node 2: Data usage decreased to 86%(14109 32K pages of total 16384)
2006-12-30 12:52:17 [MgmSrvr] INFO -- Node 3: Data usage decreased to 74%(12212 32K pages of total 16384)
2006-12-30 12:52:17 [MgmSrvr] INFO -- Node 2: Data usage decreased to 74%(12207 32K pages of total 16384)
2006-12-30 12:52:33 [MgmSrvr] INFO -- Node 2: Local checkpoint 13 started. Keep GCI = 193 oldest restorable GCI = 192


I had allocated 512MB as the DataMemory, which was getting exhausted. So i increased the parameter to 1024M (1G) and since then the things are running fine.

The only problem i am facing is that I need huge amounts of RAM for storing the NDB tables. And i am looking for an option to reduce RAM usage and use simple data on disk clusters.

References:
Mysql Cluster Configuration
Mysql Cluster Complete documentation

No comments: