High Availability and Scalability of Fedora Commons

If you are trying to run a media-rich web site, it is better to serve media and static content such as audio, video, images, CSS and JS from a server other than your application server.

A CDN is the usual solution for this kind of scaling issue, but CDNs are not cheap. If you do not need to scale across the globe or multiple locations, it is much better and cheaper to have a separate media server farm doing this work for you.

Fedora Commons is an open source digital object repository server which we can use as the media server, its main uses being:

  • Store and manage digital assets (audio and video files, images).
  • Serve images, CSS and other static content to web clients.

Unfortunately Fedora Commons does not yet have a working clustering solution, even though it is a web application running inside a Tomcat server.

So we built a scalable solution using a load balancer fronting a group of Fedora Tomcat servers which share a file system exposed through GlusterFS, an open source clustered storage solution from http://www.gluster.org/

A GlusterFS server is started on 2 separate Amazon EC2 instances to achieve high availability, each with its own volume mounted. This file system is then exposed and mounted through the GlusterFS client on each of the instances where the Fedora Tomcat servers are running. Writing to the file system exposed through the GlusterFS client replicates the data to both GlusterFS server volumes.

Scaling Fedora Commons using a Tomcat cluster and GlusterFS involves:

  • Setting up the GlusterFS server and clients.
  • Setting up the Fedora Tomcat servers.
  • Updating the Fedora code to support clustering.

1. GlusterFS setup

Environment: Ubuntu LTS, GlusterFS 3.0.2, 2 GlusterFS servers for availability, n GlusterFS clients.

Refer to the following links to install GlusterFS

http://www.gluster.com/community/documentation/index.php/GlusterFS_on_Scalr/EC2

http://www.howtoforge.com/high-availability-storage-cluster-with-glusterfs-on-ubuntu-p2

http://www.gluster.com/community/documentation/index.php/Storage_Server_Installation_and_Configuration

After installation, log into the GlusterFS server instances.

Edit the glusterfs config file.

vi /etc/glusterfs/glusterfsd.vol

Make sure you have entries similar to the ones below.
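The server-side entries did not survive in this write-up. A minimal sketch of what /etc/glusterfs/glusterfsd.vol could look like for GlusterFS 3.0, assuming the EBS volume is mounted at /data/export; the brick name, port, username and password must match what the clients use (see the client config further below):

volume posix
type storage/posix
option directory /data/export
end-volume

volume locks
type features/locks
subvolumes posix
end-volume

volume brick1
type performance/io-threads
option thread-count 8
subvolumes locks
end-volume

volume server
type protocol/server
option transport-type tcp
option transport.socket.listen-port 6996
option auth.login.brick1.allow user
option auth.login.user.password pass
subvolumes brick1
end-volume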

Create and mount an EBS volume on each of the GlusterFS server instances:

ec2-create-volume --size 500

This assumes you need 500 GB of data stored and served through GlusterFS; for replication you need 2 volumes of 500 GB each, one per server.

Then attach and mount the newly created volume

ec2-attach-volume vol-id -i <your instance ID> -d /dev/sd<x>

/bin/mount -a
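For mount -a to pick the volume up, it needs a filesystem and an /etc/fstab entry. A rough sketch, assuming the device was attached as /dev/sdf and the export directory from the server volfile is /data/export (adapt both to your setup):

mkfs.ext3 /dev/sdf                                            # format the new EBS volume (first time only)
mkdir -p /data/export                                         # directory exported by glusterfsd.vol
echo "/dev/sdf /data/export ext3 defaults 0 0" >> /etc/fstab  # make the mount persistent
/bin/mount -a                                                 # mounts everything listed in fstab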

Start the GlusterFS server on both instances:

/etc/init.d/glusterfs-server start

Log into the GlusterFS client instances.

Edit the glusterfs client config file

vi /etc/glusterfs/glusterfs.vol

Make sure you have entries similar to the ones below.

volume gfsserver1

type protocol/client

option transport-type tcp

option remote-host ip-gfsserver-1

option transport.socket.nodelay on

option transport.remote-port 6996

option remote-subvolume brick1

option username user

option password pass

end-volume

volume gfsserver2

type protocol/client

option transport-type tcp

option remote-host ip-gfsserver-2

option transport.socket.nodelay on

option transport.remote-port 6996

option remote-subvolume brick1

option username user

option password pass

end-volume

volume mirror-0

type cluster/replicate

subvolumes gfsserver1 gfsserver2

end-volume

volume writebehind

type performance/write-behind

option cache-size 2048MB

subvolumes mirror-0

end-volume

volume readahead

type performance/read-ahead

option page-count 4

subvolumes writebehind

end-volume

volume iocache

type performance/io-cache

option cache-size `grep 'MemTotal' /proc/meminfo | awk '{print $2 * 0.2 / 1024}' | cut -f1 -d.`MB

option cache-timeout 1

subvolumes readahead

end-volume

Start the GlusterFS client on all your Fedora Tomcat server instances:

/usr/sbin/glusterfs <fedora install dir>/data/datastreams

(Start this only after the next 2 steps; the next step explains why this is needed.)

2. Fedora setup

Read through http://www.fedora-commons.org/documentation/3.0b1/userdocs/distribution/installation.html for the basic installation.

3. Update the fedora config and code

Edit the config files to change the data store directory, since we will be mounting a single replicated folder which holds both the datastreams and objects folders. If you try to share the whole data directory, Fedora complains of a sharing violation at runtime, so share only the data folders and leave everything else for the individual Fedora instances to maintain.

vi /<fedora install dir>/server/config/fedora.fcfg

make sure you have similar settings as below.

<param name="object_store_base" value="data/datastreams/objects" isFilePath="true"></param>

<param name="datastream_store_base" value="data/datastreams" isFilePath="true"></param>

Now remove the datastreams and objects folders under /<fedora install dir>/data/.

Start the GlusterFS client to start sharing the folder:

/usr/sbin/glusterfs <fedora install dir>/data/datastreams

Fedora also needs a temp directory when digital objects are uploaded into it. This folder needs to be on the share too, otherwise load-balanced requests will not be able to find the temporarily uploaded file.

So remove the folder /<fedora install dir>/server/management/upload

Create a folder at /<fedora install dir>/data/datastreams/upload

From /<fedora install dir>/server/management/, create a link named upload to this shared folder:

ln -s /<fedora install dir>/data/datastreams/upload/ upload
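Putting these steps together, a rough sketch of the commands to run on each Fedora instance (FEDORA_HOME is just a placeholder for your install dir; run this after the GlusterFS mount is up):

FEDORA_HOME=/<fedora install dir>
rm -rf $FEDORA_HOME/server/management/upload          # drop the instance-local upload dir
mkdir -p $FEDORA_HOME/data/datastreams/upload         # create it once on the shared GlusterFS volume
cd $FEDORA_HOME/server/management
ln -s $FEDORA_HOME/data/datastreams/upload/ upload    # point the old location at the share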

Edit and compile DefaultManagement.java in the Fedora Commons webapp in Tomcat (you may need to download the Fedora sources if you do not already have them):

/<fedora tomcat installation>/webapps/fedora/WEB-INF/classes/fedora/server/management/DefaultManagement.class

In the getTempStream() method, we need to disable the in-memory check for the uploaded file's existence. This is not the best solution but it works well. Comment out the condition as shown below:

//if (m_uploadStartTime.get(internalId) != null)
if (1 == 1) {
    // found... return inputstream
    try {
        return new FileInputStream(new File(m_tempDir, internalId));
    } catch (Exception e) {
        throw new StreamReadException(e.getMessage());
    }
} else {
    ...

Do the same on all Tomcat instances hosting the Fedora webapp and restart them. Now whenever you create a digital object through the load balancer, no matter which server the request goes to, Fedora will work fine and GlusterFS will take care of replication and availability.

We can fire up as many Fedora Tomcats with GlusterFS clients as we want, which lets us auto scale.

Is Mysql Cluster Production Ready?

Setting up a Mysql cluster is fairly easy in a non-cloud environment by following http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-multi-install.html. But when you try to install it on the cloud (Amazon EC2) for a production environment with auto scale, or at least auto heal, it is not as straightforward as it looks, mainly for a few reasons:

  • Cluster nodes (EC2 instances) go down and are replaced by new instances which have new IP addresses.
  • The process of these ndbd nodes (re)joining the cluster is really slow.
  • The start sequence of the mgmt node and api nodes is critical for the cluster to be stable.
  • During adds or restarts of cluster nodes, the overall health of the cluster was never predictable.

If you need a bare minimum HA & scalability solution for your database layer using cloud and clustering technologies, you will have to:

  • Set up the Mysql cluster and place it behind a load balancer of some sort for HA.
  • Try to configure auto heal / auto scale in Amazon for HA and scaling.

This is just a log of all the pain points we saw while installing such a setup using Mysql Cluster GPL 7.1.3 on Ubuntu (x86_64) with Amazon EC2, AMI, ELB, S3 and Auto Scale.

We have an auto-scaled web server (Tomcat) layer connecting to a 2 node Mysql cluster through Amazon's elastic load balancer.

In this setup, we have 2 sql nodes, 2 data nodes and 2 management nodes. The data node and sql node run on the same EC2 instance, because if either of them goes down we treat the whole instance as down and fire up another instance through Amazon auto scale.

The Mysql cluster setup needs the host names of all cluster members in the config.ini file, and each my.cnf file needs the host name of the management server. This is a problem in a virtualized environment like AWS, where we cannot depend on a fixed set of host names or IP addresses. You can get around this issue by running your own DNS server, by using DynDNS, or by playing around with the /etc/hosts file.

Trying out the hosts file approach:

Once the Mysql cluster is up and running, put the host names of all nodes in the cluster into the /etc/hosts file, sync it among all the nodes and place it in an easily accessible location, like an Amazon S3 bucket.

Then write a start-up script on your EC2 instances to do the following (a rough sketch of such a script follows the list):

  1. On initial boot, get the hosts file from S3 (assuming you have the latest version there), update the instance's own IP address in it, upload it back to S3 and call some kind of wake-up script on the other Mysql nodes so they update their own hosts files.
  2. On the Mysql data+api node, once the IP address work is done, start ndbd with the --initial option so that the latest data is synced into it, wait till ndbd is started and then start the Mysql api node.
  3. On reboot, do the same except for updating the hosts file.
  4. On the Mysql mgmt node, have the script start ndb_mgmd.
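A rough sketch of what such a boot script for a data+api node could look like; the S3 bucket, the use of s3cmd, the node name and the wake-up mechanism are all assumptions to be adapted:

#!/bin/bash
# Sketch: refresh /etc/hosts from S3, register this node's new IP, then start ndbd and mysqld.
BUCKET=s3://my-cluster-config                                      # assumed bucket holding the shared hosts file
MYNAME=node1                                                       # this node's host name as used in config.ini
MYIP=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4) # current private IP of this instance

s3cmd get --force $BUCKET/hosts /etc/hosts     # pull the latest shared hosts file
sed -i "/ $MYNAME\$/d" /etc/hosts              # drop any stale entry for this node
echo "$MYIP $MYNAME" >> /etc/hosts             # register the new private IP
s3cmd put /etc/hosts $BUCKET/hosts             # publish it back for the other nodes
# (the other nodes need some wake-up hook here to re-download the file)

/usr/local/mysql/bin/ndbd --initial            # rejoin the cluster and resync data
sleep 60                                       # crude wait; polling ndb_mgm -e show is better
/etc/init.d/mysql.server start                 # start the sql/api node last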

Create AMIs separately for the mgmt node and the data+api node. Once you have those AMIs you can start playing with the auto scale feature at Amazon. To use auto scaling, you need to create a load balancer, create a launch config on the AMI, and finally create an auto scaling group on the launch config.

1. Create an elastic load balancer

elb-create-lb clusterdb-lb --availability-zones us-east-1b --listener "protocol=tcp,lb-port=3306,instance-port=3306"

2. Create the auto scale launch configuration based on the earlier created AMIs

as-create-launch-config clusterdb-lc --image-id ami-id --group <security group name> --instance-type m1.xlarge

3. Create the auto scaling group based on the launch config created earlier and tie it to the load balancer created in step 1. Since min-size and max-size are kept the same, this is not really auto scale but auto heal or auto restore. The following config makes sure we have 2 nodes at all times.

as-create-auto-scaling-group clusterdb-asg --launch-configuration clusterdb-lc --availability-zones us-east-1b --min-size 2 --max-size 2 --load-balancers clusterdb-lb

So that completes our Mysql cluster setup for auto heal/restore, at least in theory. But there are many issues here, related to:

1. IP Address / Host name changes

SSH clients start throwing out error messages because we changed the hosts file.

A host name change on a single Mysql node needs a complete rolling restart of the cluster.

We have not handled synchronizing access to the hosts file when more than one EC2 instance comes up at the same time.

The solution for these issues is probably to use a separate DNS server, which we still need to try.

2. ndbd, ndb_mgm, mysqld lifecycles.

The initial start of a new ndbd node sometimes took more than 30 minutes and was not always consistent, even though our DB had just 150 tables and the total size of the dump was 10 MB.

Adding a new ndbd node did not bring back the cluster health and required manual restarts of the whole cluster.

Adding a new node requires restarting the ndb_mgmds, the ndbds and finally the mysqlds, in that order only.

Mysql Cluster being such a delicate tool, it does not give us enough confidence to use it in production.

Other alternatives for HA and scalability needs:

1. Master-master replication, with one node acting as a hot standby.

2. Use DynDNS or a DNS server.

3. Try Oracle's commercial product, Mysql Cluster Manager.


Connecting to a mysql cluster

Without a load balancer (ELB), you can connect to the Mysql cluster through a datasource configured in your servlet container or in your Spring config. Basically you will need an Apache Commons DBCP datasource or something similar which supports automatic health checks of the connections in your pool.

At a minimum, you need the following in your datasource config:

<property name="validationQuery"> <value>/* ping */</value> </property>

<property name="testOnBorrow"> <value>true</value></property>

If your app does not use a datasource, you can use a connection pool configured in your servlet container with a comma separated list of Mysql hosts. I am using Apache Roller Weblogger, in which I did not find a way to configure a datasource directly in its own configuration, so I have the datasource definition in the conf/context.xml file of my Tomcat, which does the load balancing.

<Resource name="jdbc/rollerDB" auth="Container" type="javax.sql.DataSource"

validationQuery="/* ping */" testOnBorrow="true" testOnReturn="true"

maxActive="100" maxIdle="30" maxWait="10000"

username="xxx" password="xxx" driverClassName="com.mysql.jdbc.Driver"

url="jdbc:mysql://clusternode1,clusternode2:3306/dbschema?autoReconnect=true&amp;loadBalanceBlacklistTimeout=50&amp;autoCommit=true"/>

Of course, if you have a load balancer fronting your Mysql cluster, you will not need the datasource to do any testing of corrupt connections, though it is still recommended.

One issue you need to know about before using an elastic load balancer is the "host blocked" connection error:

Warning: mysql_connect(): Host 'xxxx' is blocked because of many connection errors. Unblock with 'mysqladmin flush-hosts'

The ELB health check keeps pinging the Mysql port on the instances behind it to make sure they are alive. The default value of max_connect_errors in Mysql is 10, so once 10 failed probes accumulate, the Mysql cluster blocks the ELB's IP, which in turn blocks access from all clients.

You need to effectively disable the connection error check by setting the following in my.cnf:

max_connect_errors=999999999

More on this is explained at http://dev.mysql.com/doc/refman/5.0/en/blocked-host.html
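If a host has already been blocked, the error counters can also be cleared at runtime, as the warning itself suggests:

mysqladmin -u root -p flush-hosts    # clears the blocked-host cache on that sql node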

Migrating a Mysql InnoDB or MyISAM database to NDB / Mysql Cluster

Create users and schema

create database clusterdb;

CREATE USER 'user'@'localhost' IDENTIFIED BY 'pass';

GRANT ALL PRIVILEGES ON *.* TO 'user'@'localhost' WITH GRANT OPTION;

GRANT ALL PRIVILEGES ON *.* TO 'user'@'%' WITH GRANT OPTION;

FLUSH PRIVILEGES;

commit;

exit;

In your existing sql dumps, replace all occurrences of InnoDB / MyISAM with ndb. In vi:

:%s/InnoDB/ndb/g

:%s/MyISAM/ndb/g
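The same substitution can also be done non-interactively; a quick sketch using sed on a copy of the dump:

sed -i -e 's/InnoDB/ndb/g' -e 's/MyISAM/ndb/g' backup.sql    # switch every table definition to the ndb engine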

Import the dump to the newly created cluster db

mysql -h host -u user -ppasswd schema < backup.sql

Issues during import:

  1. Some varchars need to be changed into TEXT. This means you also need to change the table mappings in your app (Hibernate mapping files, some code in our dao/service classes), so think about it.

ERROR 1118 (42000) at line 667: Row size too large. The maximum row size for the used table type, not counting BLOBs, is 8052. You have to change some columns to TEXT or BLOBs

2. Roller Weblogger, which we use internally, has prefix keys in its schema. You need to work around that to make it work with ndb.

ERROR 1089 (HY000) at line 1280: Incorrect prefix key; the used key part isn't a string, the used length is longer than the key part, or the storage engine doesn't support unique prefix keys

In the sql dump, change entries like the one below

UNIQUE KEY `ea_name_uq` (`entryid`,`name`(40)),

to

UNIQUE KEY `ea_name_uq` (`entryid`,`name`),

3. ERROR 1005 (HY000) at line 1511: Can't create table 'schema.instruments' (errno: 136)

http://bugs.mysql.com/bug.php?id=28447

4. ERROR 1005 (HY000) at line 2329: Can't create table 'schema.purchases' (errno: 708)

http://lists.mysql.com/cluster/881

For both 3 and 4, add extra params to the [NDBD] section of the config.ini file:

[NDBD]

# IP address of the first storage node

NodeId=3

HostName=domU-12-31-39-0B-79-62.compute-1.internal

DataDir=/usr/local/mysql/data

BackupDataDir=/usr/local/mysql/backup

DataMemory=2048M

MaxNoOfOrderedIndexes =1024

MaxNoOfUniqueHashIndexes = 512

MaxNoOfAttributes=5000


Mysql cluster setup on amazon cloud

Setting up a 6 node Mysql Cluster

Read through http://dev.mysql.com/doc/mysql-cluster-excerpt/5.1/en/mysql-cluster-multi-install.html; there is not much beyond that. This is just a dump of notes, for reference when auto scaling a Mysql cluster.

This setup has 2 mgmt nodes on 2 different boxes and 2 data+api nodes on another 2 boxes.

In brief, you will need to:

  • Get the Mysql mgmt node installed on two boxes.
  • Get the Mysql data+api node installed on another two boxes.
  • Open up the firewall between all cluster members (open up all ports on UDP and TCP).
  • If you are using EC2, you can install on one instance and create an AMI for the other one; this will aid auto scaling too. Choose an EBS store when creating the image.
  • Update the cluster related configuration files on all cluster member nodes.
  • Start all members.
  • Test the configuration.

Mysql mgmt installation.
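The management node steps did not make it into these notes. Roughly, and this is only a sketch based on the standard Mysql Cluster install docs using the same 7.1.3 tarball, the mgmt nodes only need the ndb_mgm* binaries and a config directory:

cd /var/tmp
tar -C /usr/local -xzvf mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23.tar.gz
ln -s /usr/local/mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23/ /usr/local/mysql
cp /usr/local/mysql/bin/ndb_mgm* /usr/local/bin/    # management daemon (ndb_mgmd) and client (ndb_mgm)
chmod +x /usr/local/bin/ndb_mgm*
mkdir -p /var/lib/mysql-cluster                     # holds config.ini and the cluster log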

Mysql api/data node installation.

  • Get another amazon instance with EBS store .
  • wget http://dev.mysql.com/get/Downloads/MySQL-Cluster-7.1/mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23.tar.gz/from/http://ftp.heanet.ie/mirrors/www.mysql.com/
  • mv mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23.tar.gz /var/tmp/
  • groupadd mysql
  • useradd -g mysql mysql
  • cd /var/tmp
  • tar -C /usr/local -xzvf mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23.tar.gz
  • ln -s /usr/local/mysql-cluster-gpl-7.1.3-linux-x86_64-glibc23/ /usr/local/mysql
  • cd /usr/local/mysql
  • ./scripts/mysql_install_db --user=mysql
  • ./bin/mysqld_safe &
  • ./bin/mysqladmin -u root password xxxx
  • ps -A | grep mysql
  • chown -R root .
  • chown -R mysql data
  • chgrp -R mysql .
  • cp support-files/mysql.server /etc/init.d/
  • chmod +x /etc/init.d/mysql.server
  • /etc/init.d/mysql.server stop/start and check if all is well.
  • mkdir -p /usr/local/mysql/backup
  • chmod -R 700 /usr/local/mysql
  • cp /usr/local/mysql/bin/mysqld_safe /usr/local/bin/
  • cp /usr/local/mysql/bin/mysqlmanager /usr/local/bin/
  • cp /usr/local/mysql/bin/mysql /usr/local/bin/

Set up the Amazon firewall.

Open up all ports between cluster members.
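With the classic EC2 API tools this could look roughly like the sketch below, assuming all cluster instances run in a security group named mysql-cluster (group-to-group rules also need your AWS account id):

ec2-authorize mysql-cluster -P tcp -p 0-65535 -o mysql-cluster -u <aws-account-id>   # all TCP between members
ec2-authorize mysql-cluster -P udp -p 0-65535 -o mysql-cluster -u <aws-account-id>   # all UDP between members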


Configure cluster related files.

Log into mgmt servers 1 and 2.

vi /var/lib/mysql-cluster/config.ini

[NDBD DEFAULT]

NoOfReplicas=2

DataMemory=80M

IndexMemory=18M

[MYSQLD DEFAULT]

[NDB_MGMD DEFAULT]

[TCP DEFAULT]

[NDB_MGMD]

# IP address of the management node 1 (this system)

NodeId=1

HostName=mgmt1

DataDir=/var/lib/mysql-cluster

[NDB_MGMD]

# IP address of the management node 2 (this system)

NodeId=2

HostName=mgmt2

DataDir=/var/lib/mysql-cluster

# Section for the storage nodes

[NDBD]

# IP address of the first storage node

NodeId=3

HostName=node1

DataDir=/usr/local/mysql/data

BackupDataDir=/usr/local/mysql/backup

DataMemory=2048M

[NDBD]

# IP address of the second storage node

NodeId=4

HostName=node2

DataDir=/usr/local/mysql/data

BackupDataDir=/usr/local/mysql/backup

DataMemory=2048M

# one [MYSQLD] per storage node

[MYSQLD]

[MYSQLD]

Log into data/sql node servers 1 and 2.

Do the following on both servers:

vi /etc/my.cnf

[mysqld]

datadir=/usr/local/mysql/data

socket=/var/lib/mysql/mysql.sock

ndbcluster

ndb-connectstring=mgmt1,mgmt2

default-storage-engine=NDBCLUSTER

[client]

port=3306

socket=/var/lib/mysql/mysql.sock

[mysql.server]

user=mysql

basedir=/usr/local

[mysql_cluster]

ndb-connectstring=mgmt1,mgmt2

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

Start up Mysql Cluster

Log into mgmt servers 1 and 2.

ndb_mgmd -f /var/lib/mysql-cluster/config.ini --initial (use --initial only for the first time and when config.ini changes)

root@domU-X.X.X1-F2:/var/lib/mysql-cluster# ndb_mgm

-- NDB Cluster -- Management Client --

ndb_mgm> show

Connected to Management Server at: localhost:1186

Cluster Configuration

---------------------

[ndbd(NDB)] 2 node(s)

id=3 (not connected, accepting connect from domU-x.x.x3.compute-1.internal)

id=4 (not connected, accepting connect from domU-x.x.x4.compute-1.internal)

[ndb_mgmd(MGM)] 2 node(s)

id=1 (not connected, accepting connect from domU-x.x.x1.compute-1.internal)

id=2 @domU-x.x.x2 (mysql-5.1.44 ndb-7.1.3)

[mysqld(API)] 2 node(s)

id=5 (not connected, accepting connect from any host)

id=6 (not connected, accepting connect from any host)

Log into both data/sql nodes:

/usr/local/mysql/bin/ndbd

For debugging, use /usr/local/mysql/bin/ndbd -v --foreground

You can also see the debug logs of ndbd in the data folder of your mysql installation:

tail -f /usr/local/mysql/ndbd_x.log

/etc/init.d/mysql.server restart

Now go and check ndb_mgm on the mgmt node; you should see all the members connected.

ndb_mgm> show

Cluster Configuration

---------------------

[ndbd(NDB)] 2 node(s)

id=3 @ip (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0, Master)

id=4 @ip (mysql-5.1.44 ndb-7.1.3, Nodegroup: 0)

[ndb_mgmd(MGM)] 2 node(s)

id=1 @ip (mysql-5.1.44 ndb-7.1.3)

id=2 @ip (mysql-5.1.44 ndb-7.1.3)

[mysqld(API)] 2 node(s)

id=5 @ip (mysql-5.1.44 ndb-7.1.3)

id=6 @ip (mysql-5.1.44 ndb-7.1.3)

ndb_mgm>

Check by creating a table in the test database on node 1.

On node 2, check if that table shows up in the test schema.

Insert a row into this table on node 2.

Test if the data is reflected on node 1.
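For example, a quick smoke test from the shell (host names and credentials are placeholders):

# on node 1: create a clustered table
mysql -h node1 -u user -ppass test -e "CREATE TABLE t1 (i INT) ENGINE=NDBCLUSTER;"
# on node 2: the table should already be visible; insert a row there
mysql -h node2 -u user -ppass test -e "INSERT INTO t1 VALUES (1);"
# back on node 1: the row should come back
mysql -h node1 -u user -ppass test -e "SELECT * FROM t1;"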

NOTE: create users and permissions on both nodes. Databases, tables and data created on one node will be replicated to the other automatically.