Saturday, October 31, 2009

Installing MogileFS for dummies

MogileFS is an open source distributed filesystem created by Danga Interactive for the LiveJournal project. Its features include:

  • No single point of failure

  • Automatic file replication - files are replicated until the replica count specified in the configuration is satisfied

  • Flat namespace - Files are identified by named keys in a flat, global namespace. You can create as many namespaces as you'd like, so multiple applications with potentially conflicting keys can run on the same MogileFS installation.

  • Shared-Nothing - MogileFS doesn't depend on a pricey SAN with shared disks. Every machine maintains its own local disks.

  • No RAID required - RAID doesn't buy you any safety that MogileFS doesn't already provide.

  • Local filesystem agnostic - local disks for MogileFS storage nodes can be formatted with the filesystem of your choice (ext3, xfs, etc.)



Files inside MogileFS cannot be accessed directly. You need a client API to access them, and there are client implementations for several languages:

Perl - http://search.cpan.org/~bradfitz/MogileFS-Client/
Java - http://github.com/eml/java-mogilefs
Ruby - http://seattlerb.rubyforge.org/mogilefs-client/
PHP - http://projects.usrportage.de/index.fcgi/php-mogilefs
Python - http://www.albany.edu/~ja6447/mogilefs.py
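
To make "cannot be accessed directly" a little more concrete: a client asks a tracker where a key lives and gets back HTTP URLs on the storage nodes, which it then fetches. The sketch below only illustrates the parsing half of that exchange. It assumes the tracker answers a get_paths request with a single line of the form "OK paths=N&path1=URL&path2=URL", it does not URL-decode the values, and both the function name parse_get_paths and the sample reply are made up for illustration.

```shell
#!/bin/sh
# Sketch: pull the replica URLs out of a tracker reply to a get_paths request.
# Assumed reply format (one line):  OK paths=2&path1=http://...&path2=http://...
# Values are not URL-decoded here; a real client library handles that.
parse_get_paths() {
    reply=$1
    case $reply in
        "OK "*) args=${reply#"OK "} ;;
        *) echo "tracker returned: $reply" >&2; return 1 ;;
    esac
    # Put one key=value pair per line, then keep only the pathN values.
    printf '%s\n' "$args" | tr '&' '\n' | sed -n 's/^path[0-9][0-9]*=//p'
}

# Sample reply, invented for illustration.
parse_get_paths 'OK paths=2&path1=http://10.0.0.11:7500/dev1/0/000/000/0000000001.fid&path2=http://10.0.0.12:7500/dev2/0/000/000/0000000001.fid'
```

Each printed URL points at one replica of the file. In practice you would use one of the client libraries listed above rather than parse replies by hand.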

We will skip directly to the installation. First you will have to install the following Perl modules, which you can do from the cpan shell. For novice Perl users: simply type cpan at the command prompt (sudo cpan) and then type install <module_name> at the cpan prompt.

IO::AIO
Danga::Socket
Gearman::Server
Gearman::Client
Gearman::Client::Async
Net::Netmask
Perlbal
Sys::Syscall
IO::Stringy

CPAN automatically checks the dependencies and installs the required modules. If you are not interested in the latest release of MogileFS, you can go ahead and install MogileFS from CPAN itself by installing these modules as well.

MogileFS::Server
MogileFS::Utils
MogileFS::Client
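
If typing each install at the cpan prompt gets tedious, the module lists above can be fed to the cpan command in a loop. This is only a sketch: the variable and function names are my own, and it assumes the cpan command-line tool is on your PATH and already configured. By default it only prints the commands it would run; set DRY_RUN=0 to actually install.

```shell
#!/bin/sh
# Sketch: install the modules listed above in one pass.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN=${DRY_RUN:-1}

# Module names copied from the two lists in this post.
PREREQS="IO::AIO Danga::Socket Gearman::Server Gearman::Client Gearman::Client::Async Net::Netmask Perlbal Sys::Syscall IO::Stringy"
MOGILEFS="MogileFS::Server MogileFS::Utils MogileFS::Client"

install_modules() {
    for module in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "cpan -i $module"
        else
            # Stop at the first module that fails to install.
            cpan -i "$module" || { echo "failed: $module" >&2; return 1; }
        fi
    done
}

# The variables are left unquoted on purpose so each name becomes one argument.
install_modules $PREREQS $MOGILEFS
```

If you plan to install MogileFS from svn instead, pass only $PREREQS to install_modules. The very first cpan run may still ask its configuration questions interactively.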

If you prefer the latest version, get the source and install each component individually. With the latest version, the dependencies you installed earlier through cpan may not work, in which case you will need to get the files from www.cpan.org and install them by hand.

To install MogileFS from source, do the following:


  • Check out the latest code from svn
    $ mkdir mogilefs-src
    $ cd mogilefs-src
    $ svn checkout http://code.sixapart.com/svn/mogilefs/trunk

  • Create database mogilefs and grant necessary permissions
    $ mysql
    mysql> CREATE DATABASE mogilefs;
    mysql> GRANT ALL ON mogilefs.* TO 'mogile'@'%';
    mysql> SET PASSWORD FOR 'mogile'@'%' = OLD_PASSWORD( 'sekrit' );
    mysql> FLUSH PRIVILEGES;
    mysql> quit

  • Set up trackers and storage servers. Install the mogile server.
    $ cd <path to mogilefs-src>/trunk/server/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    If you get a MySQL-related connection error during make test, it can be safely ignored, assuming that you have MySQL installed and the Perl MySQL driver (DBD::mysql) configured.
    Now let's install some utilities:
    $ cd <path to mogilefs-src>/trunk/utils/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install
    And the Perl API:
    $ cd <path to mogilefs-src>/trunk/api/perl/
    $ perl Makefile.PL
    $ make
    $ make test
    $ make install

  • Configure the database
    $ ./mogdbsetup --dbhost=mogiledb.yourdomain.com --dbname=mogilefs --dbuser=mogile --dbpass=sekrit

  • Create the configuration files
    $ mkdir /etc/mogilefs/
    $ cp <path to mogilefs-src>/trunk/server/conf/*.conf /etc/mogilefs/
    Edit the configuration files.
    $ vim /etc/mogilefs/mogilefsd.conf

    #Configuration for MogileFS daemon
    db_dsn = DBI:mysql:mogilefs:host=mogiledb.yourdomain.com
    db_user = mogile
    db_pass = sekrit
    listen = 127.0.0.1:7001 # IP:PORT to listen on for mogilefs client requests

    $ vim /etc/mogilefs/mogstored.conf

    #Configuration for storage nodes
    maxconns = 10000
    httplisten = 0.0.0.0:7500
    mgmtlisten = 0.0.0.0:7501
    docroot = /home/mogile/mogdata #where data will be stored

  • Create the user mogile.
    $ adduser mogile

  • Start the storage node
    $ mogstored --daemon

  • Start the tracker node as the mogile user
    $ su - mogile
    $ mogilefsd -c /etc/mogilefs/mogilefsd.conf --daemon

  • Now that we have the tracker and storage node up, let's tell the tracker that a storage node is available.
    $ mogadm --trackers=<tracker_ip>:7001 host add <storage_node_name> --ip=127.0.0.1 --port=7500 --status=alive
    And check that the host is being recognized.
    $ mogadm --trackers=<tracker_ip>:7001 host list

  • Add a device to the storage node where files will be kept.
    $ mogadm --trackers=<tracker_ip>:7001 device add <storage_node_name> 1
    And create the directory for the device (dev1 in our case).
    $ mkdir -p /home/mogile/mogdata/dev1
    Check that the device information is being displayed
    $ mogadm --trackers=<tracker_ip>:7001 device list

  • Now we are up and running. The next step is creating namespaces and adding files.
    We can create domains, and classes within the domains, using the mogadm utility and then use a client API to add files to a class.
    Create a domain:
    $ mogadm --trackers=<tracker_ip>:7001 domain add <domain_name>
    Check that the domain has been added:
    $ mogadm --trackers=<tracker_ip>:7001 domain list
    Create a class in the domain:
    $ mogadm --trackers=<tracker_ip>:7001 class add <domain_name> <class_name>
    Check that the class has been added:
    $ mogadm --trackers=<tracker_ip>:7001 class list
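
The mogadm steps above (host, device, domain, class) can be collected into one small script so you do not have to retype the tracker address each time. This is a sketch, not an official tool: every value below is a placeholder of my choosing, and the mog and register_all function names are made up. By default it only prints the commands; set DRY_RUN=0 to run them against a live tracker.

```shell
#!/bin/sh
# Sketch: the registration steps from this post as one script.
# All values are placeholders (assumptions), not MogileFS defaults.
TRACKER=${TRACKER:-127.0.0.1:7001}
HOST_NAME=${HOST_NAME:-storage1}
HOST_IP=${HOST_IP:-127.0.0.1}
DOMAIN=${DOMAIN:-testdomain}
CLASS=${CLASS:-testclass}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

# Run mogadm against the tracker, or just print the command in dry-run mode.
mog() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mogadm --trackers=$TRACKER $*"
    else
        mogadm "--trackers=$TRACKER" "$@"
    fi
}

# Same order as the steps above; stops at the first command that fails.
register_all() {
    mog host add "$HOST_NAME" "--ip=$HOST_IP" --port=7500 --status=alive &&
    mog device add "$HOST_NAME" 1 &&
    mog domain add "$DOMAIN" &&
    mog class add "$DOMAIN" "$CLASS"
}

register_all
```

The script does not create the device directory, so you still need to run mkdir -p /home/mogile/mogdata/dev1 on the storage node yourself. Afterwards, mog host list and mog device list show what the tracker knows.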



You can use the stats command to see a summary of the status of MogileFS.

$ mogadm --trackers=<tracker_ip>:7001 stats
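
Beyond the stats summary, a quick way to confirm that everything works end to end is to store a file and read it back. The sketch below uses the mogtool utility's inject and extract commands. The domain, class, key and file names are placeholders I made up, and it assumes the domain and class already exist. Like this, it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch: store a file and read it back with mogtool.
# Tracker, domain, class, key and paths are placeholder assumptions.
TRACKER=${TRACKER:-127.0.0.1:7001}
DOMAIN=${DOMAIN:-testdomain}
CLASS=${CLASS:-testclass}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

mogtool_cmd() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mogtool --trackers=$TRACKER --domain=$DOMAIN $*"
    else
        mogtool "--trackers=$TRACKER" "--domain=$DOMAIN" "$@"
    fi
}

# Upload src under key, download it again to out, and compare the two files.
round_trip() {
    src=$1; key=$2; out=$3
    mogtool_cmd "--class=$CLASS" inject "$src" "$key" &&
    mogtool_cmd extract "$key" "$out" &&
    { [ "$DRY_RUN" = 1 ] || cmp "$src" "$out"; }
}

round_trip /etc/hostname smoke-test /tmp/smoke-test.out
```

With DRY_RUN=0, a zero exit status from round_trip means the file came back byte for byte identical.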

That finishes the tutorial for installing MogileFS on a single node. You can easily repeat these steps on multiple servers to create a number of trackers and storage nodes. All that is needed is to add all the storage nodes to the trackers and to point every tracker at the same database.
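
For a multi-node setup, registering the storage nodes is the repetitive part, so here is a rough sketch of it as a loop. The host names, IP addresses and tracker list are invented placeholders, and the function names are my own. It gives each node one device and hands out device ids in sequence, because device ids have to be unique across the whole installation. It only prints the commands unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: register several storage nodes, one device each.
# Tracker addresses, host names and IPs are made-up placeholders.
TRACKERS=${TRACKERS:-10.0.0.1:7001,10.0.0.2:7001}
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, 0 = execute them

# "name ip" pairs, one per line.
STORAGE_NODES="store1 10.0.0.11
store2 10.0.0.12
store3 10.0.0.13"

mog() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mogadm --trackers=$TRACKERS $*"
    else
        # </dev/null keeps mogadm from swallowing the loop's input.
        mogadm "--trackers=$TRACKERS" "$@" </dev/null
    fi
}

add_storage_nodes() {
    devid=1
    echo "$STORAGE_NODES" | while read -r name ip; do
        mog host add "$name" "--ip=$ip" --port=7500 --status=alive || exit 1
        mog device add "$name" "$devid" || exit 1
        devid=$((devid + 1))
    done
}

add_storage_nodes
```

Each storage node still needs its own dev<id> directory under the docroot (dev1 on store1, dev2 on store2, and so on). To have files actually replicated across the nodes, the class needs a replica count above one, for example with mogadm class modify <domain_name> <class_name> --mindevcount=2.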

If you have built up enough redundancy, with more than 3 storage nodes and trackers, none of them should be a single point of failure. The only single point of failure that I could find was the database. You should set up a replication slave for it so that you can fail over when the master goes down.
