Thoughts on Cloud Architecture in Open Source

Recently a friend I met on LinkedIn sent me several questions about Cloud Computing that read like a Request for Proposal (RFP). I am posting my very personal response here to share my thinking.

I could talk day and night about Cloud, clustering/load balancing, data centers and open source. Below are my brief, personal but confident thoughts on my friend's email.


Cloud Computing is a big topic. People have their own definitions depending on their view and objective.

- Open Source / Operating System / Framework
I'm a Unix/Linux guy. Apart from Unix, Linux is the only operating system for Cloud management and the Cloud resource pool, in terms of efficiency, green computing, security, cost and manageability. No Windows, no Mac… Certainly Windows can be one of the computing services in the resource pool, as a kind of computing resource that Infrastructure as a Service (IaaS) provides and delivers.

The Microsoft .NET Framework doesn't make much sense for Cloud at the IaaS level, because Cloud is open; that's a de facto rule in Cloud. Windows isn't royalty-free either, nor does it in fact bring a satisfying ROI.

Most major components in Cloud are supposed to be operating-system independent, like Java, Xen, Apache, MySQL, Linux/Unix, JDBC, Mozilla/Firefox, Eclipse, Tomcat, etc. If a piece of software requires a specific OS as a prerequisite, it doesn't fit in Cloud.

- Cloud Architecture
Several open source projects I recommend:
http://www.opennebula.org > this is being used at http://nebula.nasa.gov/
http://open.eucalyptus.com > this is an open source cloud very much like AWS EC2, which I have personally installed several times for Proofs of Concept (PoC).

On April 23rd, 2009, Ubuntu released 9.04 Server Edition, which includes the OpenNebula and Eucalyptus open source projects and supports Cloud environments >
http://doc.ubuntu.com/ubuntu/serverguide/C/opennebula.html
http://doc.ubuntu.com/ubuntu/serverguide/C/eucalyptus.html

5 cost-efficient, flexible open source resources for Cloud

http://ostatic.com/blog/5-cost-efficient-flexible-open-source-resources-for-cloud…

Cloud hosting & storage toolbox

http://www.webresourcesdepot.com/cloud-hosting-storage-toolbox-options-tools/

- Virtualization and Virtual Machine Manager (VMM)
Virtualization is a key player in Cloud, but it is not all of Cloud. Virtualization technology is the carrier through which computing resources are dynamically managed and delivered in Cloud. In my opinion, its importance has been exaggerated in the market. The keys to virtualization are that it be (1) standard, (2) simple and (3) easy to manage. You can't deploy and manage multiple virtualization technologies within one Cloud environment.

Hyper-V doesn't comply with the rule I mentioned earlier: Cloud is open. VMware's solution is expensive, isn't it? Xen/XenSource is open source and free, and I recommend it. Xen has been widely adopted by Cloud service providers on the Internet. Xen also works with the monitoring systems discussed below.

I have a blog post comparing XenSource and VMware > http://tr.im/mFFa

- Monitor
The monitoring capability determines how precisely and automatically the Cloud can detect what is happening inside itself, and how fine-grained (granular) the computing resources deployed to customers can be.

I recommend Ganglia and Nagios, which are popular in Cloud, data center and server farm/cluster environments and are widely adopted in many enterprises.

developerWorks > http://www.ibm.com/developerworks/opensource/library/l-ganglia-nagios-1/index.html
Nagios homepage > http://www.nagios.org/

- Request Driven Deployment / Provisioning
Request-driven deployment means the end user decides how much computing resource s/he needs from the Cloud. For example, with Amazon Web Services (AWS) Elastic Compute Cloud (EC2) you specify the number of CPUs, the gigabytes of memory and the disk size (priced at http://aws.amazon.com/ec2 ). Very simple. We can learn a lot from the AWS architecture > http://highscalability.com/amazon-architecture
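A minimal sketch of request-driven placement, assuming a simple first-fit rule: the user's request (CPUs, memory, disk) is matched against the free capacity of the hosts in the resource pool. The `ResourceRequest` and `Host` classes, the host names and the first-fit rule are hypothetical illustrations, not how EC2 or any particular cloud actually places instances.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceRequest:
    """What the end user asks the Cloud for (hypothetical fields)."""
    cpus: int
    memory_gb: int
    disk_gb: int


@dataclass
class Host:
    """A physical host in the resource pool and its remaining free capacity."""
    name: str
    free_cpus: int
    free_memory_gb: int
    free_disk_gb: int

    def fits(self, req: ResourceRequest) -> bool:
        return (self.free_cpus >= req.cpus
                and self.free_memory_gb >= req.memory_gb
                and self.free_disk_gb >= req.disk_gb)

    def allocate(self, req: ResourceRequest) -> None:
        # Reserve the requested capacity on this host.
        self.free_cpus -= req.cpus
        self.free_memory_gb -= req.memory_gb
        self.free_disk_gb -= req.disk_gb


def deploy(pool: List[Host], req: ResourceRequest) -> Optional[str]:
    """First-fit placement: reserve capacity on the first host that fits.

    Returns the chosen host's name, or None if no host has enough room.
    """
    for host in pool:
        if host.fits(req):
            host.allocate(req)
            return host.name
    return None
```

For example, with `pool = [Host("host-a", 2, 4, 100), Host("host-b", 8, 32, 500)]`, a request for 4 CPUs, 8 GB of memory and 50 GB of disk lands on `host-b`, because `host-a` has only 2 free CPUs.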

Provisioning means that specific software, such as a JVM, middleware or a database, is installed and configured on demand in an already deployed virtual system.

A scenario: in AWS, for example, you're given a Linux operating system with 2 CPUs and 4 GB of memory. But you can do nothing with a blank OS. You might need Oracle Database 11g, or middleware such as JBoss or perhaps WebSphere. A provisioning function would then install and configure that software on the deployed virtual machine, unattended.

For now, in that case at AWS, you have to install the software yourself, or select an image from the AWS image pool with the software you need preloaded, if someone else has already built and published one. Amazon doesn't have a provisioning service so far. As a result, AWS has to manage thousands of images in its image repository, which is a challenge in itself.
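The provisioning function described above mostly comes down to working out what to install, and in which order, on a VM that is already running. Below is a minimal sketch of that planning step, assuming a hypothetical catalog that records what each package depends on (for example, JBoss and WebSphere need a JVM). The catalog contents and the `install_plan`/`provision` names are illustrative only, and the actual unattended installation is left as a print statement.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical software catalog: package -> packages it depends on.
CATALOG: Dict[str, List[str]] = {
    "jvm": [],
    "jboss": ["jvm"],
    "websphere": ["jvm"],
    "oracle-db-11g": [],
}


def install_plan(requested: Iterable[str],
                 installed: Optional[Set[str]] = None) -> List[str]:
    """Return packages in install order (dependencies first), skipping
    anything already present on the virtual machine."""
    present = set(installed or ())
    plan: List[str] = []
    visiting: Set[str] = set()

    def visit(pkg: str) -> None:
        if pkg in present or pkg in plan:
            return
        if pkg not in CATALOG:
            raise KeyError(f"unknown package: {pkg}")
        if pkg in visiting:
            raise ValueError(f"dependency cycle at {pkg}")
        visiting.add(pkg)
        for dep in CATALOG[pkg]:
            visit(dep)  # depth-first, so dependencies enter the plan first
        visiting.discard(pkg)
        plan.append(pkg)

    for pkg in requested:
        visit(pkg)
    return plan


def provision(vm_name: str, requested: Iterable[str],
              installed: Optional[Set[str]] = None) -> List[str]:
    """Walk the plan unattended; a real system would run installers here."""
    plan = install_plan(requested, installed)
    for pkg in plan:
        print(f"[{vm_name}] installing and configuring {pkg}")
    return plan
```

`provision("vm-01", ["jboss", "oracle-db-11g"])` installs `jvm`, then `jboss`, then `oracle-db-11g`. This also hints at why provisioning would ease the image problem: under the simplifying assumption that every combination of n optional packages is published as its own image, the pool needs 2^n images per base OS (1,024 for just ten packages), whereas a catalog like this one needs only n entries.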

- SLA / Service Policy / Automation
An example of a Service Policy: when an instance reaches 80% disk utilization, a predefined service policy triggers a configured action, namely attaching another 100 GB disk to that instance.

This is done by the Service Policy, not by a system operator or administrator. We can see how this differs from the virtualization mainframes have offered since the 1960s.

In other words, Service Policy = Automation!

This kind of policy can be very specific across many scenarios, and it does require customization effort to fit the requirements. I have some experience with elastic JVM/Java EE computing resources, as I am currently engaged with a banking customer on a Platform as a Service (PaaS) project.
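The disk example above can be sketched as a tiny policy engine: each policy pairs a condition with an action, and a periodic pass (fed by the monitoring system) fires every policy whose condition holds, with no operator involved. The class names, the `add_100g_disk` stub and the single-pass `evaluate` loop are hypothetical; a real engine would also need scheduling, cooldowns and a call into the actual virtualization layer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Instance:
    """A running virtual instance as seen by the monitor."""
    name: str
    disk_used_gb: float
    disk_total_gb: float

    @property
    def disk_utilization(self) -> float:
        return self.disk_used_gb / self.disk_total_gb


@dataclass
class ServicePolicy:
    """A condition on an instance plus the action to run when it holds."""
    name: str
    condition: Callable[[Instance], bool]
    action: Callable[[Instance], None]


def add_100g_disk(instance: Instance) -> None:
    # Hypothetical stand-in for attaching a new 100 GB volume via the VMM.
    instance.disk_total_gb += 100


DISK_POLICY = ServicePolicy(
    name="add-disk-at-80-percent",
    condition=lambda inst: inst.disk_utilization >= 0.80,
    action=add_100g_disk,
)


def evaluate(policies: List[ServicePolicy],
             instances: List[Instance]) -> List[Tuple[str, str]]:
    """One pass of the policy engine; returns (policy, instance) pairs that fired."""
    fired = []
    for inst in instances:
        for policy in policies:
            if policy.condition(inst):
                policy.action(inst)
                fired.append((policy.name, inst.name))
    return fired
```

In practice `evaluate` would run on every monitoring cycle. An instance using 85 GB of a 100 GB disk triggers the policy once; after the extra disk is attached its utilization drops to 42.5% and the policy stops firing.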

- Load balancing (not F5)
Load balancing is a must for large enterprises and high-traffic Internet sites. F5 is popular, but F5 treats each URL (a request from the web) as a token to be dispatched into the web server cluster. In fact, URLs are not equal: F5 doesn't know how much resource the work behind one URL might need from the JVM in the middleware. So F5 can round-robin requests/URLs, but it cannot balance granularly down to the CPU utilization level.
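The point that URLs are not equal can be illustrated by comparing plain round-robin with a utilization-aware rule. The per-URL CPU costs, the server names and the `LeastCpu` rule below are assumptions for illustration; in a real deployment the load figures would come from a monitor such as Ganglia rather than from the simulation's own bookkeeping, and the sketch does not model any F5 product.

```python
import itertools
from typing import Dict, List

# Hypothetical CPU cost (percentage points) one request to each URL puts on a JVM.
REQUEST_COST: Dict[str, int] = {"/home": 5, "/search": 15, "/report": 40}


class RoundRobin:
    """Hands requests to servers in turn, ignoring how loaded they are."""

    def __init__(self, servers: List[str]) -> None:
        self._cycle = itertools.cycle(servers)

    def pick(self, cpu_load: Dict[str, int]) -> str:
        return next(self._cycle)


class LeastCpu:
    """Sends each request to the server currently reporting the lowest CPU load."""

    def pick(self, cpu_load: Dict[str, int]) -> str:
        return min(cpu_load, key=cpu_load.get)


def simulate(balancer, servers: List[str], urls: List[str]) -> Dict[str, int]:
    """Dispatch the URLs and return the resulting CPU load per server."""
    load = {s: 0 for s in servers}
    for url in urls:
        server = balancer.pick(load)
        load[server] += REQUEST_COST[url]
    return load
```

With `servers = ["jvm-1", "jvm-2"]` and `urls = ["/report", "/home"] * 3`, round-robin happens to send every `/report` to `jvm-1`, ending at 120 vs 15, while the least-CPU rule ends at 85 vs 50.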

Cloud Computing at each layer (IaaS, PaaS and SaaS) needs monitoring to know how much resource remains, how much is needed, how much to deliver and how much to reclaim after use.
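The "how much to deliver, how much to reclaim" question can be answered from monitored utilization with simple arithmetic. The sketch below uses a common target-utilization rule (size the pool so that total demand divided by the instance count stays at or below a target, 60% here); the rule, the default target and the function name are assumptions, not something prescribed by any of the layers above.

```python
import math
from typing import List


def scaling_delta(cpu_utilizations: List[float],
                  target: float = 0.60,
                  min_instances: int = 1) -> int:
    """How many instances to deliver (>0) or reclaim (<0) so that average
    CPU utilization lands at or below `target`.

    `cpu_utilizations` holds one monitored reading (0.0-1.0) per running instance.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    current = len(cpu_utilizations)
    demand = sum(cpu_utilizations)  # total work, in "fully busy instances"
    desired = max(min_instances, math.ceil(demand / target))
    return desired - current
```

Three instances at 90% CPU carry about 2.7 instances' worth of work; 2.7 / 0.6 = 4.5, rounded up to 5, so two more instances should be delivered. Four instances at 10% need only one, so three can be reclaimed.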

- Security/ Authentication / LDAP
Security is a big concern in Cloud, as are alerting and auditing, in terms of authentication, authorization, data integrity, encryption/decryption, etc.

LDAP (OpenLDAP) can be used at the Cloud portal, where end users request computing resources from the resource pool, and at the management portal, where the Cloud operator manages them. Once a computing resource is deployed, the end user receives the authority to access the delivered resource and doesn't need LDAP authentication for it.

- Languages
Java is preferred as the primary language
C is used in Eucalyptus Cloud management
Ruby on Rails is used for display in the portal and UI
XML for data transactions, plus Python, etc.

- Approach
You must have a Proof of Concept: build a small system to prove that everything works before going to production.

This entry was posted in cloud, linux, opensource, virtualization.