I recently went through a software architecture evaluation for one of my projects. What follows is a technical summary of the evaluation and the resulting decisions. I’ve also posted a sample application that demonstrates many of the important features.
The goal of the project was to choose the system architecture and software development environment for a large-scale web application. The choices made at this stage will have far-reaching consequences in terms of expenses, staffing and schedule (see platform peril). Some of the key decisions:
- Programming language
- Development frameworks
- Scalability architecture
- Database technology
- Third party applications, tools and components
- Build environment
The project is a large-scale web application that operates in a homogeneous computing environment completely within the operator’s control (e.g. hosted). The application should be able to support tens of millions of unique visitors per month on hundreds of servers. The development team is small and skilled in Java programming with Hibernate and Spring, but could easily switch to Ruby on Rails or PHP if necessary. The key design factors are (in order of importance):
- Low hosting costs = Support a high-traffic web site with a low number of CPUs per million unique visitors per month.
- High developer productivity = Approach the speed of Ruby on Rails. Focus on object-oriented and test-driven development.
- Low complexity = The fewer disparate parts, the better.
- Short learning curve = New developers should only need programming language and web development skills.
- Use stable, popular third-party tools and components = Maximize the available choices of UI widgets, AJAX libraries, etc.
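To make the hosting cost factor concrete, here is a minimal sketch of the "CPUs per million unique visitors per month" metric. The class name, method name and all of the input figures are hypothetical and chosen only for illustration. They are not measurements from this project.

```java
/** Hypothetical helper for the hosting-cost metric described above. */
class HostingCostMetric {

    /** Returns the number of CPUs used per million unique visitors per month. */
    static double cpusPerMillionVisitors(int servers, int cpusPerServer, long monthlyUniqueVisitors) {
        if (monthlyUniqueVisitors <= 0) {
            throw new IllegalArgumentException("monthlyUniqueVisitors must be positive");
        }
        double totalCpus = (double) servers * cpusPerServer;
        double millionsOfVisitors = monthlyUniqueVisitors / 1_000_000.0;
        return totalCpus / millionsOfVisitors;
    }

    public static void main(String[] args) {
        // Made-up inputs: 200 servers, 2 CPUs each, 20 million unique visitors per month.
        double metric = cpusPerMillionVisitors(200, 2, 20_000_000L);
        System.out.println(metric + " CPUs per million unique visitors per month");
    }
}
```

With these made-up inputs the metric works out to 20 CPUs per million unique visitors per month. A lower number means cheaper hosting for the same traffic.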
I chose Java over Ruby on Rails and PHP. The total solution includes: Java, MySQL, Hibernate, Spring, Spring MVC with convention over configuration, Yahoo UI/Ajax, Ant/Ivy, MyEclipse with hot-deploy, JSP page.tag for layout, Memcached (or equivalent) for session management.
Here is a sample application with all of these things working, except for Yahoo UI and memcached.
Java: Java has the lowest hosting costs and the best scalability options, but the solution is more complex. Developer productivity varies drastically depending on which platforms, third-party tools and development environments are chosen. Poor decisions have far-reaching consequences.
Ruby: I like Ruby on Rails a lot. It offers the highest developer productivity and the least complexity. Because Rails is a complete solution, there are far fewer decisions to make and it is much faster to get started. Some people object to Ruby because it is too easy to change the core behavior of the language (e.g. override the methods on Object) and thus make an application unintelligible to any new developer. According to my friend Billy, if you make that “loophole” available, someone is going to take advantage of it, especially on larger projects.
PHP: PHP is popular, scales well, and offers many third-party components. There are a number of MVC (model view controller) frameworks (including Cake and Zend), but none seems to be a de facto standard. Another reported advantage of PHP over other languages is the availability of a large pool of developers. In this case, the job market advantage is minimal because most PHP developers would not qualify for this project. Like Ruby, PHP has a very fast development cycle.
MySQL vs Postgres: I chose MySQL because of its general adoption and because the developers are already experienced with it. Some very large sites use MySQL extensively.
Scalability: The web tier will use load balancing routers (e.g. BIG-IP) configured without sticky allocation, so any HTTP request can be routed to any web server. This means avoiding the default implementation of the Java servlet HttpSession and instead storing session state in something like memcached or an equivalent shared store. The database will be split up into separate instances, segregated by user group and application function.
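Here is a rough sketch of what externalized session state looks like. It is not the code from the sample application. The `SessionStore` interface, the `WebServer` class and the in-memory implementation are all hypothetical names. The in-memory map stands in for memcached so the sketch runs without a cache server. A real deployment would put a memcached client behind the same interface.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical abstraction over a shared cache such as memcached. */
interface SessionStore {
    void put(String sessionId, String key, String value);
    String get(String sessionId, String key);
}

/** In-memory stand-in for the shared cache (assumption: a real one would be remote). */
class InMemorySessionStore implements SessionStore {
    private final Map<String, Map<String, String>> data = new ConcurrentHashMap<>();

    public void put(String sessionId, String key, String value) {
        data.computeIfAbsent(sessionId, id -> new ConcurrentHashMap<>()).put(key, value);
    }

    public String get(String sessionId, String key) {
        Map<String, String> session = data.get(sessionId);
        return session == null ? null : session.get(key);
    }
}

/** A web server that keeps no session state of its own. */
class WebServer {
    private final String name;
    private final SessionStore store;

    WebServer(String name, SessionStore store) {
        this.name = name;
        this.store = store;
    }

    /** Creates a session in the shared store and returns its id (e.g. for a cookie). */
    String login(String user) {
        String sessionId = UUID.randomUUID().toString();
        store.put(sessionId, "user", user);
        return sessionId;
    }

    /** Serves a request using only the session id and the shared store. */
    String greet(String sessionId) {
        String user = store.get(sessionId, "user");
        return user == null ? name + ": please log in" : name + ": hello " + user;
    }
}

class SessionStoreDemo {
    public static void main(String[] args) {
        SessionStore shared = new InMemorySessionStore();
        WebServer web1 = new WebServer("web1", shared);
        WebServer web2 = new WebServer("web2", shared);

        String sessionId = web1.login("alice");    // first request lands on web1
        System.out.println(web2.greet(sessionId)); // next request lands on web2
    }
}
```

The second request is answered by a different server and still finds the session. This is what lets the load balancer run without sticky allocation.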
If you are developing a large Java application, there is a sickening number of choices out there. Here are a few of the technologies I looked at:
Integrated Development Environment (IDE): Evaluated MyEclipse and IntelliJ IDEA. Chose MyEclipse. I’ve been using IDEA for four years. IDEA provides a slightly better coding environment, but MyEclipse has hot-deploy to Tomcat and other tools. The ability to hot-deploy and easily debug webapps makes a huge difference in developer productivity.
Persistence / Object Relational Mapping (ORM): Evaluated EJB3, Hibernate 3 with Annotations. Chose Hibernate with Annotations. The developers already know Hibernate and it works great. In fact, the annotations layer conforms to the EJB3 JPA specification.
Frameworks: Evaluated JBoss Seam/JSF, Spring Framework 2. Chose Spring. Seam looks like a good tool, but it is wrapped up in the EJB3 specification which is designed for heterogeneous enterprise environments. The learning curve seems steep. Spring 2.0 has added some convention over configuration features that reduce the amount of irritating and unproductive XML configuration files. The development team already knows Spring, including all of the good and bad parts.
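To illustrate the idea of convention over configuration, here is a small plain-Java sketch that derives a URL pattern from a controller class name, so no XML mapping entry is needed per controller. It is loosely modeled on the class-name convention that Spring 2.0 offers through `ControllerClassNameHandlerMapping`, but it is not Spring's implementation. The `ControllerNameConvention` class and its method are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: derives a URL pattern from a controller class name. */
class ControllerNameConvention {
    private static final String SUFFIX = "Controller";

    /** Strips the "Controller" suffix, lowercases the rest, and wraps it as "/name*". */
    static String urlPatternFor(String simpleClassName) {
        if (!simpleClassName.endsWith(SUFFIX) || simpleClassName.length() == SUFFIX.length()) {
            throw new IllegalArgumentException("Not a controller name: " + simpleClassName);
        }
        String base = simpleClassName.substring(0, simpleClassName.length() - SUFFIX.length());
        return "/" + base.toLowerCase(Locale.ROOT) + "*";
    }

    public static void main(String[] args) {
        System.out.println(urlPatternFor("WelcomeController"));         // prints /welcome*
        System.out.println(urlPatternFor("AccountSettingsController")); // prints /accountsettings*
    }
}
```

The payoff is that adding a new controller class is enough to expose a new URL, and the naming rule replaces a hand-maintained mapping file.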
Build Tools: Evaluated Ant, Maven + Ant, Ivy + Ant. Chose Ivy + Ant. For big projects, Ant needs some sort of dependency management add-on. Maven has both a loyal following and many detractors. Maven’s integration with Ant and Eclipse is awkward. For our purposes, Maven is overkill. Ivy does a good job of managing dependencies and works with Maven repositories.
View / Page Layout: Evaluated JavaServer Faces, Facelets, Velocity, Freemarker, Struts Tiles, JSP page.tag. Chose JSP page.tag. JSF has a steep learning curve and seems to abstract away a lot of the session management, which could be a problem when it comes to scaling. Velocity and Freemarker provide a nice way of removing some of the JSP irritations, but they don’t work with other tag libraries like displaytag. JSP page.tag is mindlessly simple and much better than Struts Tiles.
Here are a few links that I found useful.
- A great interview with Tim Bray on Ruby on Rails, Java, Groovy, REST, SOAP, RSS/ATOM
- Interesting article on Ning Architecture, Features and business model
- Slashdot thread on scaling
- LiveJournal’s architecture from the 2007 USENIX conference
- Interesting blog about Rails vs Java. It says the claims of complex architecture, slow compiles and Tomcat restarts are false.
- An ironic summary of Java frameworks from 2006
- A nice rundown of all of the Java platform choices
- A tradeoff analysis of EJB3/JBoss Seam vs Spring Framework
- Matt Raible’s presentation Comparing Java Web Frameworks
- Java memcached client
- Andreessen: PHP succeeding where Java isn’t (from 2005).
- This blog from June 06 describes some performance issues with CakePHP ActiveRecord
- Wikipedia’s roundup of PHP MVC Frameworks
- A blog about hiring PHP developers
- An interesting discussion about Ruby on Rails performance by Alex Payne from the Twitter development team. The comments are also very interesting.