I’m interested in how people use Wikipedia, so I analyzed the Top 100 articles in the English Wikipedia for June and July 2007. Some observations:

  1. You cannot extend this analysis by inference to characterize all of Wikipedia, because the top 100 articles represent only the most popular 0.2% of the traffic from around 50 million visitors per month.
  2. 48% of articles are purely popular culture. Top categories include Pokemon, Anime, Movies, TV, Music, but there are also
  3. 14% of articles are biographies. Most of these are related to popular culture, including Princess Diana, pop singers and pro wrestlers.
  4. 11% of articles are voyeuristic. These include the articles on Sex, erotic art, etc.
  5. In the month of June, Science, History and Politics accounted for about 28% of the top 100, but that number dropped to 23% in July. Perhaps this is a reflection of how much Wikipedia is used for school work, since summer vacation starts somewhere in that time frame for many primary school kids.
  6. I filtered out certain articles such as the home page from this analysis. After filtering, the top 100 articles in June accounted for only about 0.2% of the total US traffic to Wikipedia (1,636,000/816,000,000).
  7. Overall, about 70% of the top 100 articles are about popular culture. (This certainly does not mean that 70% of all Wikipedia articles or 70% of all Wikipedia traffic is about popular culture.)
  8. For one sample, I stretched the analysis from the top 100 to the top 167. The share of voyeuristic articles went from 4% to 2.4% and other categories also changed slightly. This indicates that an analysis of the top 10,000 articles may yield different results.
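
The traffic share and the top-167 figure in the list above are simple ratios, and a quick check makes the dilution effect concrete. This is only a sketch: it assumes that the same four voyeuristic articles appear in both the top 100 and the top 167 of that sample, which is one reading of the 4% and 2.4% figures rather than something the data states. The function name is illustrative.

```python
def share_percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole."""
    return 100.0 * part / whole

# Top-100 traffic vs. total US traffic for June (figures from the list above).
top_100_share = share_percent(1_636_000, 816_000_000)

# Assumption: the same 4 voyeuristic articles sit in the top 100 and the top 167.
voyeuristic_top_100 = share_percent(4, 100)
voyeuristic_top_167 = share_percent(4, 167)

print(f"Top 100 share of traffic: {top_100_share:.2f}%")
print(f"Voyeuristic, top 100: {voyeuristic_top_100:.1f}%")
print(f"Voyeuristic, top 167: {voyeuristic_top_167:.1f}%")
```

Under that assumption, widening the sample adds articles from other categories while the voyeuristic count stays fixed, so its percentage shrinks.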

One note about the data: the total article counts for July 07 are sparse for some reason. I worked around this by checking the Top 100 on July 7, July 11 and July 31. The percentage breakdown for July was pretty much the same for all three readings.

Here’s a summary data table:

wikipedia-top-100-06-07.png


Marc Andreessen has a fascinating post titled “Age and the entrepreneur, part 1: Some data”, based on the research of Dean Simonton, a professor of psychology at the University of California, Davis. Among Marc’s many observations is the startling statement that:

Quality of output does not vary by age… which means, of course, that attempting to improve your batting average of hits versus misses is a waste of time as you progress through a creative career. Instead you should just focus on more at-bats — more output. Think about that one.

If this sounds insane to you, Dr. Simonton points out that the periods of Beethoven’s career that had the most hits also had the most misses — works that you never hear. As I am always fond of asking in such circumstances, if Beethoven couldn’t increase his batting average over time, what makes you think you can?

The odds of a hit versus a miss do not increase over time. The periods of one’s career with the most hits will also have the most misses. So maximizing quantity — taking more swings at the bat — is much higher payoff than trying to improve one’s batting average.

This type of Calvinistic determinism is unfortunate for several reasons. First, creative people have much less control over their number of swings at bat than they do over their actions while at bat. If you work for internet startups, it usually takes at least a few years to find out whether you’ve struck out. Three years is more typical. You might choose to only work on small projects to increase your at-bats, but that is not a good strategy because some of the best things take time to create.

Secondly, the study focuses on “outstanding achievement” (see his paper) – people like Beethoven. These people have less room to improve their average because it is already quite high. In pro baseball the top batters average between 30% and 50%. If your average is 48%, there isn’t much room to improve.

Thirdly, the ratio as defined by Dr. Simonton is overly simplistic. I’ve written on this topic before (engineering goodness); the relationship of success to failure is not a single ratio. It is a bell curve:

goodness-graph1.png

The bell curve provides at least two possibilities for improvement: narrow the curve (decrease the standard deviation) and change the average.

goodness-graph3.png

I believe most creative people have a better chance at improving their average than increasing their swings at bat. The bell curve shows the way.
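
To make the two levers concrete, here is a minimal sketch that treats the quality of each piece of work as a draw from a normal distribution and counts anything below a fixed bar as a miss. The normal model, the bar, and every number below are assumptions chosen only for illustration; they do not come from Dr. Simonton's data.

```python
from statistics import NormalDist

MISS_BAR = 40.0  # hypothetical quality level below which work counts as a miss

def miss_rate(mean: float, stdev: float) -> float:
    """Probability that a single piece of work falls below MISS_BAR."""
    return NormalDist(mu=mean, sigma=stdev).cdf(MISS_BAR)

baseline = miss_rate(mean=60.0, stdev=20.0)
narrower = miss_rate(mean=60.0, stdev=10.0)  # same average, tighter curve
shifted = miss_rate(mean=70.0, stdev=20.0)   # same spread, higher average

print(f"baseline: {baseline:.1%}")
print(f"narrower: {narrower:.1%}")
print(f"shifted:  {shifted:.1%}")
```

With these made-up numbers both changes lower the miss rate without taking a single extra swing. Note that narrowing the curve only helps when the bar sits below the average.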


This essay is about the use of platforms within software development organizations.

Platform, Schmatform
Whenever I hear someone talk about building a new software platform for their organization I am instantly skeptical. My reaction has nothing to do with lack of confidence in the speaker. It is just that most platforms are failures. The problem is that the supply of frameworks, platforms and other reusable code projects far exceeds demand. More specifically, there are many, many people who would just love to create the platform that everyone else uses, while the market just wants one platform. The poster child for platform wannabes is Microsoft, which won the first rounds of the desktop operating system game and now owns a huge percentage of that market. Some more recent hopefuls are Facebook, Ning and Google Maps. Geoffrey Moore, in The Gorilla Game, does a great job of explaining the risks and rewards of trying to be a platform.

But not all platforms are created with the goal of being the gorilla. Large companies tend to breed platforms like rabbits and open source frameworks grow like weeds. At one point in 2004, it got so bad that someone created a framework-framework, the Keel Framework. Happily, it didn’t last long (see bile blog’s writeup if you don’t mind strong language). There is so much platform proliferation that simple economic factors do not suffice as an explanation.

Platform Proliferation
I think the ultimate cause of platform proliferation is simply that software developers love to build platforms. For many, building a platform represents the pinnacle of technical achievement. As I’ve said elsewhere on this blog, creative technical people are in this business because they like to make things that are valued by others. What can be more self-affirming than creating something that your peers use to build even better things? And other aspiring platform developers often reinforce this feeling by buying into a particular platform ideology. For instance, there were people who actually used the Keel framework-framework. Go figure.

While in principle this tendency toward platforms is neither good nor bad, in practice it can lead to disastrous results. This is because architectural decisions regarding platform choices stay with you for a long time and are very difficult to reverse. Here are some patterns of failure that are particularly damaging:

  1. You don’t need it. Platforms are expensive and complex. Often the disadvantages of using or creating a re-usable code framework far outweigh the rewards. Unfortunately once you make a decision, it takes a long time to find out whether the decision was a good one or a bad one.
  2. You made it yourself when you should have used someone else’s. This is so common that there is an acronym for it: NIH (Not Invented Here).
  3. You chose the wrong one. Everyone who wants to use a platform has their favorite choice among many. The selection of a platform is based on a number of factors, some of which are only partially related to the business problem at hand. For instance, don’t choose a platform if you can’t find good developers for it.

choose-wisely.jpg
Choose Wisely
In any software development organization, there is a constant tension around where to put your money: in the platform or in the application. Software developers love to make platforms, so there is always plenty of pressure to either use or make a new platform. Yet the selection of a platform is fraught with peril. As the ancient knight said in Indiana Jones and the Last Crusade, “choose wisely”.

Later I will write about specific examples of platform peril from my career.


Call me a data geek, but I can’t help myself. I’ve updated my Wikipedia contributor map based on my recent discoveries.

The problem with my earlier post was that I discounted the contributions of people who don’t make a large number of individual edits but add a lot of content. Aaron Swartz (see link above) suggests that there are different types of contributors. First, there is a small group of contributors who make a lot of edits but don’t add a lot of words. For example, they might revert vandalism, fix grammar, reorganize or categorize. Second, there is a larger group of contributors who don’t make a lot of edits overall, but add a lot of words each time they edit. Aaron believes that this group creates the bulk of Wikipedia content.

Of course both types are critical to the success of Wikipedia and the data below indicates to me that it is more of a continuum than a statistical grouping of contributor types.

Another problem with the original post is that the total number of worldwide visitors per month (according to comScore, May 07) is actually 217 million. I was using the number of US visitors, which is 48 million.

So here’s the update:

Here are some interesting factoids culled from Wikipedia contributor statistics.

Compare the population of world countries to the Wikipedia contributors. In the hierarchy of users the vast majority of visitors to Wikipedia, 217 million of them, are readers; for the most part they don’t edit articles. Next are the Regular Contributors who have contributed more than 10 times ever. There are about 340,000 of those. Next are the 105,000 Active Editors who contribute between 5 and 100 times per month. Finally, there are the roughly 14,000 Very Active Editors who contribute more than 100 times per month.

wikipedia-contributor-math-update.png

So if Wikipedia readers are like China, then the contributors are like Macedonia, Montenegro and Grenada. To extend this analogy to absurd extremes, Macedonia, Montenegro and Grenada do all of the work, have the highest GDP and provide humanitarian aid to China!

Some background math:

The most recent total contributor data on the Wikipedia stats page is from Oct 2006. I applied a 41% growth rate to all numbers to arrive at estimates for May 2007, based on growth in overall traffic as reported by comScore. The ratio of users to contributors is:

  1. Regular Contributors: 217M/338k = 642:1
  2. Active Editors: 217M/105k = 2,055:1
  3. Very Active Editors: 217M/14k = 15,585:1
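
The same arithmetic can be reproduced in a few lines. This sketch uses the rounded figures from the list above, so the last two results come out slightly different from the listed ratios, which presumably were computed from unrounded spreadsheet values; the variable names are illustrative.

```python
READERS = 217_000_000  # worldwide monthly visitors (comScore, May 07)

# Rounded May 2007 estimates taken from the list above.
contributors = {
    "Regular Contributors": 338_000,
    "Active Editors": 105_000,
    "Very Active Editors": 14_000,
}

# Readers per contributor in each group.
ratios = {name: READERS / count for name, count in contributors.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:,.0f}:1")
```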

Here’s the spreadsheet I used to calculate this data. The inspiration to make this map came from the Strangemaps blog.


Earlier I posted a map of Wikipedia contributors comparing the most active users with general Wikipedia users. In his essay Who Writes Wikipedia, Wikipedian Aaron Swartz proposes that the contributions of the “core” group of Wikipedia contributors are significantly overstated. In addition, he suggests that focusing on this small but important portion of the Wikipedia community to the exclusion of the broader group of contributors is a big mistake.

If this is true, then my Wikipedia Contributor Map oversimplifies what is happening (as was pointed out in a reader comment).

Aaron Swartz wrote a script to analyze the entire history of a group of about 200 articles and concluded that while the majority of edits are done by the “core group”, the bulk of the words in the articles was written by a much broader group of contributors who have “…generally had made less than 50 edits…”.

Some quotes:

When you put it all together, the story become clear: an outsider makes one edit to add a chunk of information, then insiders make several edits tweaking and reformatting it. In addition, insiders rack up thousands of edits doing things like changing the name of a category across the entire site — the kind of thing only insiders deeply care about. As a result, insiders account for the vast majority of the edits. But it’s the outsiders who provide nearly all of the content.

And when you think about it, this makes perfect sense. Writing an encyclopedia is hard. To do anywhere near a decent job, you have to know a great deal of information about an incredibly wide variety of subjects. Writing so much text is difficult, but doing all the background research seems impossible.

Here’s another paper that comes to a similar conclusion, with some great charts and graphs.


wikipedia-edits-vs-words.png


Michael Mace, in his post Why is Apple porting its browser to Windows?, predicts a coming battle between companies vying for control of the rich browser interface. He speculates that Apple is porting its Safari web browser to Windows as a kind of Trojan horse in order to get the full Apple OS layer onto Windows systems.

…The war to come. This could set up a brutal competition in software layers, between Adobe Apollo, Microsoft Silverlight, Sun’s revised Java, Firefox’s platform, and Apple. Google fits in there somewhere as well, but it’s not clear if they’ll try to create their own platform or work with several other players…

Adobe is certainly in the running. Adobe already benefits from a huge adoption rate of Flash players, and a lot of new web applications are built to run in Flash. The other day, I got a demo from a friend who is building an application for a new web startup. He built it using OpenLaszlo, an open source development framework for Adobe Flash and native web browser Dynamic HTML. It is clean and snappy and has great interactive animations.

On the other hand, I know others who are building applications exclusively with JavaScript/DHTML, using libraries like script.aculo.us and Prototype, and who intend to stay away from Flash or other “lock-in” frameworks.

I don’t like the idea of one company dominating – I believe it will be far better for everyone if none of the commercial vendors win this war.

– –

By the way, after going years with just Adobe Photoshop, I finally caved in to family pressure and ordered Adobe Creative Suite for my kids (educational license from Academic Superstore). It contains Photoshop, Illustrator, Flash, Dreamweaver and InDesign. Here’s their latest creation:

cosmokitty.png

Cosmokitty Magazine, Summer 07
10 Ways to Get More Food from Your Owner! And how to keep it that way!
How Cutestuff Stays Cute. And you can too!
How does he do it? Durango’s scratching secrets: 20 places to go!
Harmless? 15 fantastic and creative ways to get on the counter and not get caught!


The other night my nephew told me he doesn’t contribute to Wikipedia much. He feels that all of the good articles are taken and he doesn’t want to waste his time editing an article only to have an editor or administrator revert his work because they jealously guard that territory already. I wonder if this is a widespread feeling?

In the life sciences, the “carrying capacity” of a species is the population that an environment can support without significant negative impacts to the given species and its environment. A common example is the white-tailed deer population in the United States. In wild areas, the normal predator-prey interaction keeps deer populations in balance. In areas where people have removed predators, the deer populations can exceed the capacity of the local environment to the point where deer starve. Conversely, when the population drops below a certain point, the population is unable to sustain itself and disappears. Of course, there’s more to it than that. Modeling populations of organisms is a popular and notoriously complex subject of systems theory.

Perhaps certain collaboration “environments” also have a carrying capacity for contributors. This seems especially applicable to collaborations like a Wikipedia article where many people are contributing to a finite set of tasks (as opposed to a social network where there are as many tasks as there are people). If this is so, then there is a threshold beyond which every contributor you add to an Article actually has a negative impact on the Article’s community. Likewise when the population of contributors drops below a certain threshold, the health of the Article’s community suffers. Just as with animal populations, modeling this effect would be a complex task, since the “population” of contributors is very dynamic as is the “environment” in which the contributors operate.
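
As a toy illustration of the two thresholds, here is a minimal sketch of a logistic growth equation with an added minimum viable population (sometimes called an Allee effect). It is a deliberately crude model, not a claim about how deer or Wikipedia contributors actually behave. The growth rate, minimum, capacity and step size are all hypothetical values picked for illustration.

```python
def simulate(population: float, steps: int = 1000, dt: float = 0.1,
             growth: float = 0.5, minimum: float = 20.0,
             capacity: float = 100.0) -> float:
    """Euler-step a logistic model with a minimum viable population."""
    for _ in range(steps):
        # Negative below `minimum` and above `capacity`, positive in between.
        change = (growth * population
                  * (population / minimum - 1.0)
                  * (1.0 - population / capacity))
        population = max(0.0, population + dt * change)
    return population

for start in (10.0, 30.0, 150.0):
    print(f"start {start:>5.0f} -> settles near {simulate(start):.1f}")
```

In this sketch a population that starts below the minimum dies out, while one that starts anywhere above it settles at the carrying capacity, whether it began under or over that level.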


[Update 8-6-07: I've updated the Wikipedia contributor map based on my recent discoveries. Please see this post for a better contributor map.]

Here are some interesting factoids culled from Wikipedia contributor statistics.

Compare the population of world countries to the Wikipedia contributors. In the hierarchy of users the vast majority of visitors to Wikipedia, 48 million of them, are readers; for the most part they don’t edit articles. Next are the regular contributors who contribute between 5 and 100 times per month. There are about 77,000 of those. Finally, there are the 10,000 anchor contributors (I’ve borrowed this phrase from retail marketing) who contribute more than 100 times per month.

wikipedia-contributor-math.png

So if Wikipedia readers are like China, then the regular contributors are like Macedonia and the anchor contributors are like Barbados. To extend this analogy to absurd extremes, Barbados and Macedonia do all of the work, have the highest GDP and provide humanitarian aid to China!

[Update 7-16-07: Here's the spreadsheet I used to calculate this data. The inspiration to make this map came from the Strangemaps blog.]


The Google Earth Blog recently mentioned an article by Michael Jones, Chief Technologist of Google Earth, in the IEEE “Computer Graphics and Applications” magazine. The article can be downloaded here.

Michael quotes from Rudyard Kipling:

I Keep six honest serving-men:
(They taught me all I knew);
Their names are What and Where and When
And How and Why and Who.

The rest of the article is devoted to Google’s vision of “Where”. But I think there is also a hidden meaning in that particular analogy. The poem is from “The Elephant’s Child” in Just So Stories, an allegorical children’s tale about the dangers and rewards of ’satiable curiosity (Kipling’s words). Here’s the full text:

I Keep six honest serving-men:
(They taught me all I knew)
Their names are What and Where and When
And How and Why and Who.
I send them over land and sea,
I send them east and west;
But after they have worked for me,
I give them all a rest.

I let them rest from nine till five.
For I am busy then,
As well as breakfast, lunch, and tea,
For they are hungry men:
But different folk have different views:
I know a person small —
She keeps ten million serving-men,
Who get no rest at all!
She sends ’em abroad on her own affairs,
From the second she opens her eyes —
One million Hows, two million Wheres,
And seven million Whys!

The person small in this case was Rudyard Kipling’s daughter, but we could easily substitute “company large and ambitious” in its place. Perhaps ’satiable curiosity is at the heart of Google’s success.


The Wikipedia article on Neutral Point of View is an official policy statement, but it is not the kind of “policy” that is typically spewed by bureaucratic IT departments, corporate HR groups or local politicians. I find it inspiring.

NPOV policy is summarized as: “All Wikipedia articles and other encyclopedic content must be written from a neutral point of view, representing views fairly, proportionately and without bias.”

The rationale behind the policy is beautifully written and thoroughly argued. Here are some of my favorite passages:

…A solution is that we accept, for the purposes of working on Wikipedia, that “human knowledge” includes all different significant theories on all different topics. We are committed to the goal of representing human knowledge in that sense, surely a well-established meaning of the word “knowledge”. What is “known” changes constantly with the passage of time, and so when we use the word “know,” we often enclose it in so-called scare quotes. Europeans in the Middle Ages “knew” that demons caused diseases; we now “know” otherwise….

…There is another reason to commit ourselves to this policy, that when it is clear to readers that we do not expect them to adopt any particular opinion, this leaves them free to make up their minds for themselves, thus encouraging intellectual independence. Totalitarian governments and dogmatic institutions everywhere might find reason to oppose Wikipedia, if we succeed in adhering to our non-bias policy: the presentation of many competing theories on a wide variety of subjects suggests that we, the editors of Wikipedia, trust readers to form their own opinions. Texts that present multiple viewpoints fairly, without demanding that the reader accept any particular one of them, are liberating. Neutrality subverts dogmatism. Nearly everyone working on Wikipedia can agree this is a good thing…


Recently, Michael Gorman at the Encyclopedia Britannica Blog published a series of screeds against Wikipedia and Web 2.0. Clay and Danah at Many To Many have written some fantastic responses, and even though they are longer than the average blog post, the writing is so luscious that they are worth the effort. Here are my favorites:

Gorman, redux: The Siren Song of the Internet

Knowledge access as a public good



My family and I recently took a road trip from Philadelphia southward. Google Maps wanted to route us through the Washington D.C. beltway, which can be a parking lot at times, so I re-routed us through Frederick, which only added 5 minutes. Google Maps has the ability to add “waypoints”, but it is a hassle and you have to know or guess at intermediate addresses, even if you don’t actually want to go to those places.

You may ask why I haven’t gotten one of those nifty in-car navigation modules – maybe I will write about that another time.

For years I have been irritated that Mapquest doesn’t allow you to add intermediate stops to driving directions. I know they could do it if they wanted to. When we were making the first version of Mapquest in 1994, we already had consumer CD-ROMs with very clean and user-friendly methods, usually involving right-clicking on the map. I suspect that AOL (owner of Mapquest) has done some sort of focus group that indicated that people don’t care about this feature.

[Update 6-30-07: I just realized that Mapquest added a way to add intermediate stops a year ago. I guess I wasn’t paying attention! My apologies to Mapquest.]

Well, someone at Google didn’t get the memo, because yesterday they came out with an awesome way to add waypoints to driving routes, and it is better than what we were doing in 1994. As far as I can tell, this is the first time an online map is easier to use than the desktop applications from 13 years ago. I think it is an important turning point.

Here’s what you do: First, plan your route.

google-map-routing-waypoint1.png

Second, grab the blue “route” line near Washington DC and drag it to Frederick.

google-map-routing-waypoint2.png

Done!


Phil Conrad, after reading my essay about engineering goodness, pointed me to Jeff Atwood’s blog post, “Bridges, Software Engineering, and God”, which nicely demonstrates that software development has very little in common with classical engineering disciplines like Civil, Mechanical or Electrical Engineering.

Atwood concludes that “…Software development is unquestionably a profession, but I don’t think we can learn as much from the fields of mathematics or traditional engineering as is so often assumed…”

I completely agree with him and I share his frustrations with naive comparisons (I am speaking as a mechanical engineer, computer engineer and a software developer).

engineers-software-engineers-2.png

I think one cause of misunderstanding is that people tend to confuse the professions with the professionals. Even though software development and engineering are quite different, the people in those professions are very similar – engineers are just like software developers. I’m thinking in particular about “creative technical people.” You know the type; they are drawn to engineering and computer science because they like to make things. They are caricatured in literature, movies and TV as the typical inventor-geek.

engineers-software-engineers.png

Of course, engineers and software developers vary in how well they fit this “technical creative” stereotype, and some don’t fit it at all. And some of the very best don’t have formal degrees in their fields but were instead irresistibly drawn to their careers from other professions. My children fit the mold, but in different ways: my daughter is more of an engineer and my son is more of a computer person.

Although engineers and software people must necessarily follow a totally different process for building things, their ultimate value to society is measured in terms of what they have created. This ties in nicely with my observations on engineering goodness; no matter what process you follow to create new things, you should measure your progress against some standard, evaluate how well you are doing and follow strategies to improve your record. This could be said of most endeavors, but for engineering and software development the application of the principle is the same.


I ordered Cinema 4D for my kids who currently are using Bryce for 3D modeling. But I clicked on the wrong button

cinema-4d-mistake-3.png

and received the “CINEMA 4D WIN NON CG STUDENT/PROFESSOR” instead of the “CINEMA 4D MAC NON CG STUDENT/PROFESSOR.” In case you missed it, the kids use a Macintosh (a sweet Mac Pro – the quietest computer I’ve ever owned) but I bought the Windows version by accident. I didn’t notice the difference so I opened the box. Even though the software installs fine on the Macintosh, the serial number in the box won’t work. I found out from the Academic Superstore, the reseller, and Maxon, the manufacturer, that they won’t take a return on an open box but I can pay a $100 fine to switch the license from Windows to Macintosh. I must also sign a document promising not to use my Windows serial number. I tried to persuade them to cut me some slack for being honest but careless, but no dice. Luckily, after much groveling on my part, the reseller agreed to take back the open box for only 15% plus shipping. I am proud that I’m only paying a $55 stupidity tax instead of $100!

Obviously Maxon is concerned about software piracy, but it is amazing to me that they haven’t been able to think of a better strategy than this. Software companies have more options than ever for creative distribution of their products, yet they seem to be stuck in the 1990s. It reminds me of the Recording Industry Association of America’s battles over online music, which I hope will end badly for the RIAA.

Cinema 4D is an awesome tool that is used to make movies and TV productions. Here’s one of Jonathan’s latest creations.

cinema-4d-blob2.png
