On January 20, the U.S. Supreme Court overturned a law that banned political spending by corporations in candidate elections.  As Justice Stevens said in his dissent, this decision makes “corporate speech the same as that of human beings”.  I think it is only a matter of time before corporations begin to assert their unalienable rights as human beings and demand full citizenship.  If all goes as planned, the high court will rule favorably in the precedent-setting “Apple vs. the State of California” and Apple will win the governor’s race in the state elections.

When this happens we can look forward to a new era of fiscal responsibility and visionary leadership for a state that is sadly overwhelmed by failed ideas and big government spending.  Here are some of the insanely great public policy innovations that we can expect from the brilliant engineers at Apple:

  • California is rebranded as iCal.  New logos and graphic look-and-feel standards are instituted.
  • Apple introduces the iCar, the only vehicle to pass California’s tough new vehicle usability standards and thus, the only car available for sale in the state.
  • A two-year contract is now required for California citizenship.
  • Apple launches iTunes Liquor Store.  The Liquor Control Board mandates that all alcoholic beverages must be purchased through iTunes.  Purchases are licensed to a single household, but can be shared with up to five other iTunes users.
  • A slew of trademark violation suits are filed against companies using the name California™ without permission.

I open-sourced Concharto this week.  As a long-time consumer of open source projects, it has been exciting to actually contribute to the movement.

The project page is hosted on Google Code here.

Here’s a summary of what I went through to make this happen.

My Goals

1. The Concharto atlas.  My main goal is to create a comprehensive online atlas of history and happenings, founded on the Wikipedia community process and enabled by modern mapping web services.  I believe contributors of “philanthropic information” generally want to know that their work is not going to enrich some venture-capital-backed startup, so one way to instill confidence is to make the code open source.

2. Someone else could use it.  It would be gratifying to see other Concharto-based communities spring up.

3. Make the code better.  There are so many open source projects out there that the odds of anyone actually contributing to Concharto are pretty low, but it has a few things going for it:

  • The code is stable
  • The code is live
  • It is still an active project

The Process

The process took about 1 month and involved a number of decisions:

  • Choice of an open source hosting platform
  • Choice of a license
  • Code cleanup

Each of these activities involved a lot of analysis and tradeoffs, so I thought I would share my thought process.

Open source hosting platform

I’ve used projects that are hosted on SourceForge, Google Code, Codehaus and a few others.  I really like Google Code’s group list and issue manager, and since good issue management is more important to me than anything else, I chose Google Code.

I currently use Jira (an awesome commercial issue manager) and I could have gotten a Jira open source deal, but Google Code’s simple, snappy combination of groups, wiki and issue manager was unbeatable.

Wikipedia’s open source software hosting page was an invaluable resource.

Choice of a license

I vacillated on this issue for about three months before I finally settled on the Apache License 2.0.  Here again, Wikipedia has lots of good things to say.  The evolution of my decision went something like this:

1. Since Concharto is a web application and not a web library, I looked at respectable web applications like Drupal, WordPress and MediaWiki.  Most of these use the GNU General Public License (GPL), which follows a copyleft model and imposes strict constraints on commercial use of the code.  This seemed just fine to me.

2. After a while I got to worrying about whether I really agree with the core principles behind the GNU license.  I looked at the Apache and MIT licenses, which follow a permissive model.  They seemed a little too permissive since they would allow anyone to take my code and sell it if they wanted to.  I decided that perhaps the Mozilla Public License, which is what Firefox uses, would be a little more restrictive, yet not as constraining as the GPL.

3. Eventually I decided that the chances of any organization actually taking my code and reselling it are pretty low, and even if some company did that, I believe it would be highly beneficial for that organization to be a contributor to the project, so in the end the Concharto web site would ultimately benefit.  I felt that the Mozilla license was too complicated and not that much different from the Apache 2 license.  I noticed that both Google’s Android and Apple’s WebKit are Apache 2 licensed, so that clinched it for me.

Code cleanup

Having privately worked on this code for several years, I had to ask myself whether I really wanted anyone to be able to see it – warts and all. As with any software project there are good and bad parts, and the desire to make the code perfect before open sourcing it was a powerful influence on me.  Finally, I just changed the package naming, added the Apache License 2.0 headers and let her fly.  A cool Eclipse plugin made adding the license headers really easy.


I’ll be demonstrating Concharto at this year’s Where 2.0 conference in San Francisco on May 12. I’ll use events and timelines from the map to illustrate all the important features of Concharto.

I’ll definitely show them “The Play.” As John Madden would say, “he goes left, he goes right, he hits the trombone player, Boom!” I bet I’ll irritate some Stanford grads but, hey, it’s just a demo.


I’ve been teaching a class at the University of Delaware, CISC474 – Advanced Web Development. I’m using a wiki for most communications with the students. It has worked pretty well, though something like Moodle (which my children use for their school) would be even nicer.


Via Making Light

Here is a fascinating presentation about traditional media vs. user-generated content and the “Cognitive Surplus” created by TV. Clay Shirky answers the question that a TV producer asked him about Wikipedia – “Where do they find the time?” The transcript is located here.


This is a technical web development post.

The other night I was up until 1:30 AM fixing a bug in concharto.com. At 10:00 PM I took a look at the latest changes log and noticed hundreds of “reverts” had happened and were continuing to happen, all coming from the same IP address. I panicked! I had to shut the site down temporarily while I fixed the problem and repaired the damage.

The problem was caused by a combination of a badly behaved robot (web spider) and a bug in the “undo revision” code.

1. The bug. Concharto is a geographic wiki. All wikis need to make it easy to revert changes. The changes page has a series of “undo this change” links next to each change. It was originally implemented as a simple link with a query string that looked like this: /edit/undoevent.htm?id=188&toRev=3. The bug is that HTTP GET requests like this one are supposed to be idempotent – i.e. repeated requests (e.g. a user clicking the link over and over) should have the same effect as a single request – but this link actually performed the revert. For instance, a link to add something to a shopping cart is NOT idempotent and is best implemented with a POST method (and some JavaScript). When I coded this feature way back, I did it the lazy way.

2. The Robot. All robots are supposed to follow the instructions in the robots.txt file at the root of your web site. This file provides directives on which links you want robots to follow. Our robots.txt file explicitly prohibits following links to the changes page (which contained the bug). Unfortunately, not all robots follow this advice. This particular robot found its way to the changes page and started following all of the links there, including each “undoevent.htm” link. Ugh!

I had to do three things to recover:

  • Fix the bug. I switched all of the links to invoke a JavaScript submit() method and post the results to the web controller. This has two benefits: robots don’t usually run JavaScript and they don’t usually issue HTTP POSTs.
  • Revert the changes that the robot made. Database backups are a good thing.
  • Block the robot’s IP address.

3. The Lesson. Keep your GET methods idempotent. It is sometimes easier (less coding) to pass state-changing parameters on the query string, but it is a bug – you will probably have to fix it later.
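
To make the lesson concrete, here is a minimal sketch of the POST-only approach, assuming a Spring MVC annotated controller; the class name, view names and service call below are hypothetical, not the actual Concharto code.

// Hypothetical sketch: only an HTTP POST performs the revert, so a crawler
// following GET links can no longer change data.
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.RequestParam;

@Controller
public class UndoEventController {

    // GET may show a confirmation page, but must not modify anything (idempotent).
    @RequestMapping(value = "/edit/undoevent.htm", method = RequestMethod.GET)
    public String confirm(@RequestParam("id") long id,
                          @RequestParam("toRev") int toRev) {
        return "confirmUndo";
    }

    // Only a POST (submitted by the JavaScript form) actually performs the revert.
    @RequestMapping(value = "/edit/undoevent.htm", method = RequestMethod.POST)
    public String undo(@RequestParam("id") long id,
                       @RequestParam("toRev") int toRev) {
        // revertService.revert(id, toRev);  // hypothetical service call
        return "redirect:/changes.htm";
    }
}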


Back in the late ’90s, I dreamed of building an immersive mapping application that would let people travel through time to any place in the past and see what it was like. It was an impractical idea at the time, but things have changed recently and the result is Concharto. Last June, I alluded to the project when I noted that Leo Tolstoy, author of “War and Peace”, proposed applying the scientific method to history, asserting that a complete understanding of an event could be obtained by slicing that event into smaller and smaller pieces, in exactly the same way that a math student performs integral calculus.

While not actually creating a calculus of history, Concharto does attempt to slice history into smaller pieces. There are three recent technological advances that make this possible:

  1. Advanced database software and cheap server hardware have made it easy to search huge repositories of information.
  2. Geographic web services have simplified the task of placing events in a spatial context.
  3. Wikipedia has demonstrated the awesome power of mass collaboration.

Hopefully, Concharto will one day be a comprehensive repository of thin slices of notable events from every place and time.

How can that happen? Concharto is a Geographic Wiki. It looks like Google Maps and works like Wikipedia. It has all of the illustrative power of Google Maps and all of the strengths and weaknesses of Wikipedia.

Unlike virtually all other mapping sites on the internet today, Concharto is not about places – it is about events. Unlike Wikipedia, it is about small discrete bits of information, rather than comprehensive articles.

You can read more about it on the community wiki and in the Concharto blog.

—————————————-

[Embedded map: Expansion of the Inca Empire of South America]

(Updated 4/29/08 to reflect our name change from Time Space Map to Concharto)


This is a follow-up to an earlier post, Platform Peril.

I once worked for an ambitious company that aimed to create a new type of web service. We ultimately succeeded, despite the tale I am about to tell.

Book the First: Reusable Code

Our company had developed some core capabilities in an obscure vertical market which showed some promise to investors. My group made money by applying our special skills to mostly fixed-price software development contracts. It was decided that we should roll up all of our capabilities into a software platform of libraries and specialized data processing tools, which I will call Platform LG. The project goals for LG were:

  1. Speed development of all projects
  2. License LG to our biggest customers for their own internal and external projects
  3. Reduce operational costs by standardizing our internal support infrastructure

These are the standard reasons to invest in reusable code projects. Unfortunately, the project suffered from the standard reasons that such projects get into trouble:

  1. Behind schedule. Big, ambitious reusable code projects are notoriously hard to manage, especially in a dynamic environment where requirements are uncertain. Worse still, the schedule slips are very expensive, because ongoing projects are affected. A lot of people relied on LG.
  2. Last year’s requirements. LG failed to meet the needs of our newest, biggest project. The platform was slow, ran on the wrong operating system and wasn’t sufficiently customizable to support the new requirements.
  3. Packaging the platform for use by our big customers exacerbated the other problems because the effort necessary to productize the code made it harder to respond to new requirements.

Thus, the LG failed to meet two of its three goals. As a result, our biggest project, which I will call Operation Bandwagon, decided to abandon the platform, instead resurrecting some old code and creating custom capabilities tailored exactly to their own needs.

Lesson 1. Platforms are expensive and slow to adapt to new requirements. They are best used in situations where requirements are well understood and relatively constant.

Book the Second: Operation Bandwagon

Operation Bandwagon focused on building an application, not a platform, and so was able to deliver a great many new and innovative features very quickly with a small team. To some people in management, it made the LG look bad. This should have been no surprise, however, since Bandwagon developers weren’t nearly as constrained as the LG team.

Lesson 2. It is easier and cheaper to build custom code than reusable code.

Book the Third: Platform Redux

Within a year, Bandwagon was a smash hit. During that time, the software development organization was split into two competing and antagonistic groups. The Bandwagon core code began to get the petrified feel of a platform. Unlike the LG, however, Bandwagon was originally conceived as a custom application and then retrofitted to act like a platform – and it showed.

Meanwhile, LG had finally come into its own as a stable and capable base on which to build applications. The company now had two competing platforms, complete with release and support organizations. We were paying through the nose for the original schism.

Lesson 3. Pay attention! People love to build platforms, but there should be only one.

Book the Fourth: Unification

It was left to a small band of brave developers to reunite the two warring platforms, a process that took years to accomplish. I wasn’t there to witness the effort, but I’ve heard stories. Some claim that good ultimately won out over evil. Others say that the two were synergistically merged into a new platform that was better than either of the originals.

[Image: Charles Dickens, Project Gutenberg etext]

Dickens’ A Tale of Two Cities is a tragic story centered on the French Revolution. Like its namesake, our story has a bittersweet ending. It was the best of times. We built some truly remarkable software. But it was the worst of times too. If we had been more careful we could have accomplished much, much more.

Epilogue

History seems doomed to repeat itself. Revolutions come and go and so do tech bubbles. Two years later, I found myself enmeshed in platform peril that was weirdly similar to Bandwagon vs LG.


Most software developers understand the relationship between complexity and cost: more complex = more expensive. Unfortunately, many equate code complexity with lines of code, slavishly following design patterns that reduce line counts while actually increasing complexity.

Value Engineering

For years, engineers have noted that the overall cost of a manufactured device is roughly proportional to the number of parts it has. A whole discipline, known as Value Engineering, is dedicated to assigning a cost to each function of a product so that designers and manufacturers can make wise choices about which functions to improve and which to throw away. A classic value engineering exercise is to take a common item like a circuit breaker, tear it apart, and then redesign it with fewer parts. Removing one part from the design can yield dramatic cost savings and big improvements in reliability.


Figure 1 – Cost and complexity for two competing solutions

In Figure 1, two companies are designing competing products with similar market requirements. The company that solves the problem with the fewest parts (Team A) is the big winner. Note that cost often increases exponentially with the complexity of a particular solution.

In many respects, software engineering bears little resemblance to other engineering disciplines, but in this case, there are real parallels. Just as in circuit breakers, more parts means more $, both in initial cost and ongoing maintenance and operations.

Virtual Parts

The analogy is useful because code reduction techniques often increase the number of virtual “parts” while decreasing lines of code. My favorite example is indirection. Many popular design patterns aim to reduce code duplication by introducing levels of indirection (for example, the adapter or dependency injection patterns). If you think of each indirection as a new part, it is easy to see how some designs can have less code yet more parts. I use adapters and injection all of the time, but I also acknowledge that they complicate the design, development, testing and maintenance of the code.
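
As a hypothetical illustration (the names below are made up, not from any real project), here is an adapter that hides the JDK logging API behind an application-specific interface. It removes duplication from calling code, but it adds two new “parts” – the interface and the adapter – that have to be designed, tested and debugged.

import java.util.logging.Level;
import java.util.logging.Logger;

// One extra level of indirection: callers depend on AppLog instead of the JDK API.
interface AppLog {
    void info(String message);
    void error(String message, Throwable cause);
}

// The adapter is a new "part" even though it contains very little code.
class JdkLoggingAdapter implements AppLog {
    private final Logger logger;

    JdkLoggingAdapter(Class<?> owner) {
        this.logger = Logger.getLogger(owner.getName());
    }

    public void info(String message) {
        logger.info(message);
    }

    public void error(String message, Throwable cause) {
        logger.log(Level.SEVERE, message, cause);
    }
}

// Calling code is simpler, but a reader now has one more hop to follow,
// and unit tests need a mock AppLog in addition to the real adapter.
class ExampleService {
    private final AppLog log = new JdkLoggingAdapter(ExampleService.class);

    void doWork() {
        log.info("starting work");
    }
}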

Let’s say you are adding a new feature to an existing project and there are some similarities to other parts of your code. You must decide whether to build a new common module and refactor your existing code to use it, or ignore the existing code and build the new piece without regard to the existing stuff. Many developers, especially dogmatic adherents to agile development methodologies, would blindly choose the former approach, without regard to the costs involved. A better strategy is to choose based on a reasoned tradeoff between cost and benefit. And there are many hidden costs to certain complex design patterns:

  • Clarity. Some code is just too hard to understand. For example, XML-based configuration files can make your code easy to configure and impossible to understand.
  • Unit testing complexity. Multiple levels of indirection require multiple levels of testing. This usually means more support classes, including utility DAOs, mocks, etc.
  • Debugging time. You would think that designers would avoid any architecture that hinders efficient debugging, but many architecture decisions are made without even considering the effect on debugging and deployment.
  • Operational costs. If it is hard to understand the code, then it will almost always be hard to keep running.

Too Many Notes

In the play Amadeus, the Austrian Emperor tells Wolfgang Amadeus Mozart that his opera is too complicated, that it has too many notes, and that he should “take some away.” Mozart, who feels that his opera is perfect, asks the Emperor which notes he would like to take away. It is a moment full of meaning for any creative person. Unfortunately, many creative software developers empathize too much with Mozart, favoring the ornamentation and flash of 18th-century classical music. I believe we should all take the tone-deaf Emperor’s advice and take some away.


I haven’t written in a while because I’ve been heads-down coding – not much to say that hasn’t already been said. I’ve been in CSS hell recently, and I haven’t seen much written about the following compatibility issue with IE 6.  Dreamweaver doesn’t pick up on the problem.

IE 6 doesn’t handle mixed units within a single CSS property. For instance,

#main ul {
padding: .1em .4em .1em 18px;
}

renders in a very unexpected way. The proper method is:

#main ul {
padding: 6px 6px 6px 18px;
}

OR

#main ul {
padding: .1em .4em .1em 1.4em;
}


The Google Geocoder API

Google allows web developers to create “mashup” mapping applications using programming APIs that connect to Google’s mapping servers. One of the APIs is for geocoding – finding the location of a place based on its name. For example: when you want to see a map of “Turners Creek, MD” you first have to geocode the place name to its latitude and longitude coordinates (39.342013, -75.996743).
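
In code, the geocoding step boils down to “place name in, coordinates out.” Here is a purely illustrative sketch – the geocode helper below is hypothetical and simply hard-codes the example above; it is not the actual Google Maps API.

// Illustrative only: a stand-in for a call to a real geocoding service.
public final class GeocodeExample {

    static double[] geocode(String placeName) {
        if ("Turners Creek, MD".equals(placeName)) {
            return new double[] { 39.342013, -75.996743 };  // latitude, longitude
        }
        throw new IllegalArgumentException("no match for: " + placeName);
    }

    public static void main(String[] args) {
        double[] coords = geocode("Turners Creek, MD");
        // A mapping mashup would now center the map on these coordinates.
        System.out.println("lat=" + coords[0] + ", lng=" + coords[1]);
    }
}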

Fun with the Google Geocoder API

I was investigating how the geocoder returns information when an address lookup fails, so I typed in “la la land, MD” (MD = Maryland), expecting the geocoder to fail, but to my great surprise it returned an actual location near Front Royal, MD (you can try this at the Google geocoder API demo page or at any number of Google Maps mashups). For my international readers: when an American says someone is in “la la land” she is suggesting that the person is not grounded in reality, e.g. living in a fantasy land. For the fantasy location “la la land, MD”, the Google API results indicate that the accuracy is at the “country” level, so I would have expected to get the centroid of the US (Google puts this somewhere near Portland, Oregon, which makes sense if you include all 50 states). Nor is the location the centroid of the State of Maryland, which, by a quirk of geography and history, is located in Fairfax, Virginia. This leads me to wonder whether someone actually put “la la land, MD” in the database. “la la land, NY” also has “country” level accuracy. It is located near Muttontown, New York on Long Island, just off Brookville Rd. If you drive by, take a picture and I will post it here.

[Screenshot: “la la land, NY” near Muttontown, New York]

Next I typed in “la la land, PA” and found that it actually returned a location with “street level” accuracy. Oops. Apologies to anyone who lives at “Farm Land Rd Way, Mifflinburg, PA 17844”! Surprisingly, the regular Google Maps fails to geocode that location. It is possible that this could be a parsing error, since “la land, pa” geocodes to an address of “Land Ln, Schnecksville, PA 18078, USA” and “la la land, CA” (California) is at “Garden Land Rd, Los Angeles, CA 90049”.

[Screenshot: “la la land, PA” in Mifflinburg, Pennsylvania]

Google Maps API Geocoder uses TIGER Data?

For US addresses, the Google geocoder API, which is free for no more than 50,000 geocodes per day, behaves differently than the one on maps.google.com. In fact, it behaves a lot like geocoder.us, which uses free US Census Bureau TIGER data. The problem with TIGER data is that it is not nearly as complete or accurate as the NavTeq mapping data that all of the major mapping services use. Here’s an example: use the address “210 south bank, Landenberg, PA” and enter it into maps.google.com, geocoder.us and this demo page (which uses the Google Maps geocoding API). You will see that maps.google.com finds the correct location but both geocoder.us and the Google geocoder API fail. Now type in “1600 Amphitheatre Pky, Mountain View, CA” and you will see that geocoder.us and the Google API both succeed.

It makes sense for Google to use TIGER data for their API partners. After all, accurate geocoding data costs money and the users of the geocoding API aren’t paying anything. And unlike map images, I can think of no obvious way to sell advertising for geocodes.


[Thumbnail: KML 2.2 UML class diagram]

Google provides great documentation for KML, but their diagram leaves much to the imagination. I looked around for a more complete diagram of KML 2.2 that I could use as a quick reference, but couldn’t find one, so I made one myself. I did it using MyEclipse’s UML modeling tool, which has some irritating quirks but for the most part is reliable. The diagram is Java-centric and doesn’t include every object, though most of the important stuff is there.

Click on the image to see the entire diagram.  Here’s the actual MyEclipse UML file. If you make any corrections or enhancements, please send me an update.


After writing my essay about platform peril, I found this great video interview from October 2006 with Tim Bray, one of the inventors of XML and current Director of Web Technologies for Sun Microsystems (here’s his blog). The interview addresses software architecture and development frameworks, and deals with some of the same issues I raised in my earlier post. The topics include Ruby on Rails, Java, Groovy, REST, web services, SOAP, Atom, and static vs. duck typing:

  • He likes Ruby and Ruby on Rails, but thinks Java has performance and IDE advantages because of static typing.
  • He didn’t think (as of October 2006) that Groovy or Grails was stable or had any community. I wonder if this has changed at all recently.
  • He likes JRuby, Jython, etc.
  • He hates SOAP and likes REST or possibly other simpler technologies. Amazon offered APIs for SOAP and plain old XML; 90% of users chose plain old XML.
  • He thinks SOA is marketing fluff.
  • At one point in the conversation, he invoked Joel Spolsky’s term ‘architecture astronautics’ to describe the Web Services specifications (WS-*): “… There’s a certain class of people who want to build big complicated systems from scratch, and it doesn’t work! It’s never worked! …”
  • He thinks RSS is a mess and actively participates in the Atom specification effort, which attempts to solve problems with the various RSS specifications. He has interesting things to say about the benefits of Atom.

I recently went through a software architecture evaluation for one of my projects. What follows is a technical summary of the evaluation and the resulting decisions. I’ve also posted a sample application that demonstrates many of the important features.

Project Goal
The goal of the project was to choose the system architecture and software development environment for a large scale web application. The choices made at this stage will have far-reaching consequences in terms of expenses, staffing and schedule (see platform peril). Some of the key decisions:

  • Programming language
  • Development frameworks
  • Scalability architecture
  • Database technology
  • Third party applications, tools and components
  • Build environment

Some Requirements
The project is a large scale web application that operates in a homogeneous computing environment completely within the operator’s control (e.g. hosted). The application should be able to support tens of millions of unique visitors per month on hundreds of servers. The development team is small and is skilled in Java programming with Hibernate and Spring, but they can easily switch to Ruby on Rails or PHP if necessary. The key design factors are (in order of importance):

  1. Low hosting costs = support a high-traffic web site with a low number of CPUs per million unique visitors per month.
  2. High developer productivity = approach the speed of Ruby on Rails. Focus on object-oriented and test-driven development.
  3. Low complexity = the fewer disparate parts, the better.
  4. Fast learning curve = new developers should only need programming language and web development skills.
  5. Use stable, popular third party tools and components = maximize the available choices of UI widgets, AJAX libraries, etc.

Results
I chose Java over Ruby on Rails and PHP. The total solution includes: Java, MySQL, Hibernate, Spring, Spring MVC with convention over configuration, Yahoo UI/Ajax, Ant/Ivy, MyEclipse with hot-deploy, JSP page.tag for layout, and Memcached (or equivalent) for session management.

Here is a sample application with all of these things working, except for Yahoo UI and memcached.

Details
For languages, I considered PHP, Ruby on Rails and Java. I chose Java.

Java: Java has the lowest hosting costs, and best scalability options, but the solution is more complex. Developer productivity varies drastically depending on which platforms, third party tools and development environments are chosen. Poor decisions have far reaching consequences.

Ruby: I like Ruby on Rails a lot. It offers the highest developer productivity and the least complexity. Because Rails is a complete solution, there are far fewer decisions to make and it is much faster to get started. Some people object to Ruby because it is too easy to change the core behavior of the language (e.g. override the methods on Object), and thus make an application unintelligible to any new developer. According to my friend Billy, if you make that “loophole” available, someone is going to take advantage of it, especially on larger projects.

PHP: PHP is popular, scales well, and offers many third party components. There are a number of MVC (model-view-controller) frameworks (including Cake and Zend), but none seems to be a de facto standard. Another reported advantage of PHP over other languages is the availability of a large pool of developers. In this case, the job market advantage is minimal because most PHP developers would not qualify. Like Ruby, PHP has a very fast development cycle.

MySQL vs Postgres: I chose MySQL because of general adoption and because the developers are already experienced with it. Some very large sites use MySQL extensively.

Scalability: The web tier will use load balancing routers (e.g. BIG-IP) configured without sticky sessions. Any HTTP request can be routed to any web server. This means avoiding the default implementation of the Java servlet HttpSession and instead using something like memcached or equivalent. The database will be split up into separate instances, segregated by user group and application function.
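
Here is a rough sketch of the session idea (the SessionStore interface below is hypothetical; in production it would delegate to a memcached client rather than a local map): session attributes are keyed by session id in a store that every web server can reach, so any server can handle any request.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical abstraction over a shared session cache (memcached or equivalent).
interface SessionStore {
    void put(String sessionId, String key, Object value);
    Object get(String sessionId, String key);
}

// Local stand-in; a real implementation would call a memcached client
// using "sessionId:key" as the cache key.
class InMemorySessionStore implements SessionStore {
    private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

    public void put(String sessionId, String key, Object value) {
        cache.put(sessionId + ":" + key, value);
    }

    public Object get(String sessionId, String key) {
        return cache.get(sessionId + ":" + key);
    }
}

class SessionStoreDemo {
    public static void main(String[] args) {
        SessionStore store = new InMemorySessionStore();
        // Server A handles the login request...
        store.put("abc123", "userId", Long.valueOf(42));
        // ...and server B can handle the next request for the same session.
        System.out.println("userId = " + store.get("abc123", "userId"));
    }
}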

Java Architecture
If you are developing a large Java application, there is a sickening number of choices out there. Here are a few of the technologies I looked at:

Integrated Development Environment (IDE): Evaluated MyEclipse and IntelliJ IDEA. Chose MyEclipse. I’ve been using IDEA for four years. IDEA provides a slightly better coding environment, but MyEclipse has hot-deploy to Tomcat and other tools. The ability to hot-deploy and easily debug webapps makes a huge difference in developer productivity.

Persistence / Object Relational Mapping (ORM): Evaluated EJB3 and Hibernate 3 with Annotations. Chose Hibernate with Annotations. The developers already know Hibernate and it works great. In fact, the annotations layer conforms to the EJB3 JPA specification.
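
For flavor, here is a minimal sketch of the annotation style (a hypothetical entity, not one from the actual project): the mapping lives on the class via JPA annotations, which Hibernate Annotations understands, instead of in a separate .hbm.xml file.

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;

// Hypothetical entity: the table mapping is declared right on the class.
@Entity
public class Visitor {

    @Id
    @GeneratedValue
    private Long id;

    @Column(nullable = false, length = 100)
    private String name;

    public Long getId() { return id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }
}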

Frameworks: Evaluated JBoss Seam/JSF and Spring Framework 2. Chose Spring. Seam looks like a good tool, but it is wrapped up in the EJB3 specification, which is designed for heterogeneous enterprise environments, and the learning curve seems steep. Spring 2.0 has added some convention over configuration features that reduce the amount of irritating and unproductive XML configuration. The development team already knows Spring, including all of the good and bad parts.

Build Tools: Evaluated Ant, Maven + Ant, and Ivy + Ant. Chose Ivy + Ant. For big projects, Ant needs some sort of dependency management add-on. Maven has both a loyal following and many detractors, and its integration with Ant and Eclipse is awkward. For our purposes, Maven is overkill. Ivy does a good job of managing dependencies and works with Maven repositories.

J2EE server: Evaluated Tomcat 5, JBoss, and Jetty. Chose Tomcat for now. Any one of these (and more) will do, but the development team is most familiar with Tomcat.

View / Page Layout: Evaluated JavaServer Faces, Facelets, Velocity, Freemarker, Struts Tiles, and JSP page.tag. Chose JSP page.tag. JSF has a steep learning curve and seems to abstract away a lot of the session management, which could be a problem when it comes to scaling. Velocity or Freemarker provide a nice way of removing some of the JSP irritations, but they don’t work with other tag libraries like displaytag. JSP page.tag is mindlessly simple and much better than Struts Tiles.

References
Here are a few links that I found useful.

Architecture

Programming Language
