Wednesday, December 10

Reflection with Generics

I ran into a pretty interesting reflection problem today, and thought I would jot down some notes here. In Java, generics, a concept introduced with Java 5, are implemented pretty much as a compiler trick: the compiler performs the type check, then generates the same byte code as the non-generic code and inserts casts for the generic code automatically. The compiler will generate identical byte code for the following two code samples.

void doSomething(List ids)


void doSomething(List<Long> ids)

* That's also why code like list instanceof List<Long> does not work in Java: this information is not available at runtime.

So now let's imagine you use reflection to look for a method that takes a List<Double> at runtime. You will actually find the method doSomething(List<Long> ids), and you can even invoke it without any problem, since at the byte code level this method really takes a non-generic List. Now the problem arises inside the method: since you already specified the generic parameter, you probably will not check the elements of the list again but will rather use a foreach loop, as shown in the following code sample:

for(Long id : ids){

This code works fine without reflection since the compiler enforces the type check, but when working with reflection you can basically pass in any List at runtime, hence you will get a ClassCastException at the start of this loop. Fortunately the generic type information is not all gone at runtime. It does get compiled into meta information for the class, so using reflection you can retrieve it at runtime. I have to say I am not a big fan of how the generics-related type API was designed in the reflection package; it is very inconvenient to use, to say the least. If you are interested, IBM developerWorks has an excellent article on this topic.
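To make this concrete, here is a minimal Java sketch of the situation described above. The class and method names (ErasureDemo, sum, invokeSum) are made up for illustration, and this is only one way to show the behavior: the method is found through its erased parameter type, the reflective call with a list of Doubles fails inside the loop, and the declared element type can still be read back from the generic signature.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;
import java.util.List;

class ErasureDemo {
    // Hypothetical target method. After erasure its signature is sum(List).
    public static long sum(List<Long> ids) {
        long total = 0;
        for (Long id : ids) { // the compiler inserts a cast to Long here
            total += id;
        }
        return total;
    }

    // Finds sum by its erased parameter type; List<Long> cannot be named here.
    static Method findSum() {
        try {
            return ErasureDemo.class.getMethod("sum", List.class);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Invokes sum reflectively; returns its result, or the exception it threw.
    static Object invokeSum(List<?> anyList) {
        try {
            return findSum().invoke(null, anyList);
        } catch (InvocationTargetException e) {
            return e.getCause();
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the declared element type back from the generic signature metadata.
    static Type declaredElementType() {
        ParameterizedType listType =
                (ParameterizedType) findSum().getGenericParameterTypes()[0];
        return listType.getActualTypeArguments()[0];
    }

    public static void main(String[] args) {
        System.out.println(invokeSum(Arrays.asList(1L, 2L, 3L)));
        System.out.println(invokeSum(Arrays.asList(1.5, 2.5)));
        System.out.println(declaredElementType());
    }
}
```

A caller that wants to be safe could compare declaredElementType() against the elements it is about to pass in before invoking the method.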

In summary, keep in mind that generics are just a compile-time transformation, and be very careful when using reflection with generics.

Wednesday, December 3

Implement Flash socket policy file server

Recently I had to implement a Flash socket policy file server. Here are a few things I discovered while creating the server.

Why socket policy file server?

Since Flash Player 9 you can no longer use the HTTP-based crossdomain.xml file to specify the cross-domain policy for a socket-based server, as part of the ongoing security enhancements from Adobe. Therefore a specialized socket policy file server (by default on port 843 for the master policy) needs to be created to return the policy file(s) for socket-based clients. See Policy file changes in Flash Player 9 and Flash Player 10 for more details.

What's the expectation?

The basic idea behind this file server is fairly simple. The server waits for the policy file request sent directly by the Flash player. The policy request is just the string <policy-file-request/> followed by a NULL byte (00); upon receiving the request the server returns the content of the policy file.

What's the catch?

1. String character set

Adobe's documentation did not mention the encoding charset for the returned string, and based on my experiments the Flash player did not like the UTF-8 encoding in Java, probably because Java's DataOutputStream uses a modified UTF-8 encoding. In the end I found that ISO-8859-1 worked out pretty well with the Flash player, and since your policy file should not contain any non-ASCII characters anyway I think this is a pretty good approach.

2. Write timing

One interesting thing I found: although Adobe's documentation says the request handling should follow this sequence:
  • Receive request
  • Verify request
  • Write policy content
  • Close socket
the Flash player is happy even if you simply write the policy content as soon as the socket is accepted. If you are writing your own file server, though, you should probably stick with Adobe's spec, not only because it makes sense but also because undocumented Flash player behavior might change without notice in a future release.

3. Close your socket

Adobe's documentation mentions that the Flash player will close the socket as soon as it has received the policy file, but based on my experiments the Flash player behaves less consistently if the server does not close the socket right after the content is written.
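The three points above can be put together in a small Java sketch. This is not the server I actually wrote, just a minimal illustration under a few assumptions: the class name, the permissive sample policy, the 3 second read timeout, and the NULL byte appended to the response are all choices made for this sketch, so check them against Adobe's documentation before relying on them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class PolicyFileServer {
    static final String REQUEST = "<policy-file-request/>";
    // Hypothetical, wide-open policy used only for illustration.
    static final String POLICY = "<cross-domain-policy>"
            + "<allow-access-from domain=\"*\" to-ports=\"*\"/>"
            + "</cross-domain-policy>";

    // Verifies the request; returns the bytes to write, or nothing if invalid.
    static byte[] respond(byte[] received) {
        String text = new String(received, StandardCharsets.ISO_8859_1);
        if (!text.equals(REQUEST + "\0")) {
            return new byte[0];
        }
        // Assumption: the response is NULL-terminated like the request.
        return (POLICY + "\0").getBytes(StandardCharsets.ISO_8859_1);
    }

    // Receive request, verify, write policy content, close socket.
    static void serve(Socket socket) throws IOException {
        try (Socket s = socket) {
            s.setSoTimeout(3000); // do not wait forever on silent clients
            InputStream in = s.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int b;
            // Read up to and including the NULL byte, with a length cap.
            while (buf.size() <= REQUEST.length() && (b = in.read()) != -1) {
                buf.write(b);
                if (b == 0) {
                    break;
                }
            }
            OutputStream out = s.getOutputStream();
            out.write(respond(buf.toByteArray()));
            out.flush();
        } // try-with-resources closes the socket right after the write
    }

    public static void main(String[] args) throws IOException {
        // Binding to port 843 usually requires elevated privileges.
        try (ServerSocket server = new ServerSocket(843)) {
            while (true) {
                try {
                    serve(server.accept());
                } catch (IOException e) {
                    // Drop this client and keep serving the others.
                }
            }
        }
    }
}
```

Keeping the verification in a pure function (respond) makes the protocol part easy to unit test without opening a socket.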

Hope this information will help you save some time while implementing your own policy file server.


Thursday, November 20

Exception vs. Error Code - round 1

Many times in my career I have heard the argument that throwing exceptions is too expensive, therefore high performance code should always favor returning error codes over exceptions. While that was probably true on early-day JVMs, I am still not a big fan of it. First of all, in my opinion the biggest drawback of extensive use of error codes is that you lose the most powerful tool for separating your main flow from the exceptional ones. Since the main flow is probably the most valuable, most executed and most read code, you should always strive to make it as easy to identify as possible. As Kent Beck suggested in his Implementation Patterns book:

Expressing all paths equally would result in a bowl of worms, with flags set here and used there and return values with special meanings. Answering the basic question, "What statements are executed?" becomes an exercise in a combination of archaeology and logic. Pick the main flow. Express it clearly. Use exceptions to express other paths.

Now consider the following code sample written using error code:

def input = input()
if(input.errorCode == IO_ERROR)
    return new Result(IO_ERROR, ...)
// could be more error types here

def processResult = process(input)
if(processResult.errorCode == IO_ERROR)
    return new Result(IO_ERROR, ...)
// could be more error types here

def output = output()
if(output.errorCode == IO_ERROR)
    return new Result(IO_ERROR, ...)
// could be more error types here

return new Result(SUCCESS)

If we can rewrite it using Exception, it will look like this:
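A sketch of what the exception-based version might look like, written in Java rather than Groovy. The Result and Code types and the bodies of input(), process() and output() are hypothetical stand-ins so that the example compiles on its own; only the shape of run() matters here.

```java
import java.io.IOException;

class ExceptionFlow {
    enum Code { SUCCESS, IO_ERROR }

    static final class Result {
        final Code code;
        Result(Code code) { this.code = code; }
    }

    // Hypothetical stand-ins for the three steps of the error code sample.
    static String input(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("read failed");
        }
        return "data";
    }

    static String process(String input) throws IOException {
        return input.toUpperCase();
    }

    static void output(String processed) throws IOException {
        System.out.println(processed);
    }

    static Result run(boolean failInput) {
        try {
            // The main flow reads top to bottom with no checks in between.
            String in = input(failInput);
            String processed = process(in);
            output(processed);
            return new Result(Code.SUCCESS);
        } catch (IOException e) {
            // could be more error types here
            return new Result(Code.IO_ERROR);
        }
    }
}
```

All the error handling ends up in one place at the bottom, which is exactly the separation of main flow and exceptional flow that the quote above asks for.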


* The code sample is in Groovy, so the error code version is already somewhat shorter than the Java version would be. The Exception version, however, stays pretty much the same in both Groovy and Java.

As you can see, using exceptions greatly simplifies the main flow and makes it much easier to read, hence cheaper to maintain and easier to improve. But how about the performance argument that we always hear? Well... modern JVMs are highly optimized for exception handling, although the optimization strategy varies between vendors: some strategies focus on optimizing the normal path, others such as EDO focus on the exceptional paths, and the newest ones are adaptive at runtime and smart enough to pick the right strategy for the specific scenario. In a simple test it took 30ms to throw 10,000 exceptions on my 32-bit JDK 1.6_07 with HotSpot 10. Also remember that cleaner code makes it much easier to pinpoint performance bottlenecks and make improvements. In most cases, 90% of the performance slowdown is caused by 3% of the code, and in my experience the culprit was never the exception handling; if it is, then you have a much bigger design issue at hand than mere exception handling.

What if someone argues that their main flow actually triggers the exception path way too often and it is causing significant performance degradation, for example during a DoS attack? My suggestion? Simple! If that's the case, your code is telling you this exception path is actually not exceptional but rather part of the main flow. For example, if your server is expected to withstand a DoS attack then you can't really treat certain corrupted packets or prematurely dropped connections as exceptional cases anymore. In other words, if exception handling is causing you performance problems, you had better rethink how the system is designed in the first place instead of simply replacing exceptions with error codes.

Last but not least, I would like to make it clear that I am NOT suggesting that you should use exceptions as your first choice to control your flow; you should always use sequence, message, condition, iteration, and exception (in this order). Use exceptions to handle only the exceptional cases, but don't simply dismiss them because of some outdated, misunderstood performance concerns.

Wednesday, October 22

Oopsla 2008: Designed as Designer

An essay by Richard P. Gabriel, as controversial as usual but also the most intriguing presentation of the day, at least from my point of view. The essay is almost a follow-up to Fred Brooks' speech at Oopsla 2007, built around the central argument that conceptual integrity arises not (simply) from one mind or a small number of agreeing resonant minds, but from sometimes hidden co-authors and the thing designed itself. A few points I took away from this presentation as well as the interesting Q&A session:

  • "First to the market wins" is merely a myth
  • Worse is Better - ship the rough product earlier to allow users to contribute and generate the real requirements. Perfection is the enemy of good.
  • People are often judged by their rewards instead of their skills, and that's why CEOs of successful (sometimes even failing) companies are usually rewarded the most even when they have little impact on the success.
  • Software is implemented by the compiler and machine; every single line of code is a practice of design
  • The first draft of a design is usually just a collaboration enabler so others can contribute with a certain degree of conceptual integrity
  • By refusing to fully know the world when you design, you open the door to new insights and let the product itself lead you to the truth
  • Facts are not truth. Simply because your customer is describing a certain feature to you does not mean that is what they really want or need.

Oopsla 2008: Community-Based Innovation: from Sports Equipment to Software

This presentation was given by Sonali Shah of the University of Washington Business School. The speaker shared many of her research findings and insights into how community-based innovation is reshaping the business world today, from the sports equipment industry to clothing design and manufacturing to software. One really interesting by-product of this research was the finding that the most committed and innovative long-term open source developers all have a full-time day job, but find that job either not challenging enough or too restrictive (or both) for them to innovate, so they would rather spend their spare time working on something else. In this age of uncertainty, when many economists predict that innovation is the only way for North America to maintain its competitive edge, it's puzzling to see that many companies out there fail to recognize that.

See Clay Shirky's excellent presentation at TED from a different angle on this topic, and my post "What can we learn from the Open Source community" in 2006 on how I think the companies can harvest this kind of passion and innovation.

Oopsla 2008: Social Programming A Pyramid, and possibly other lar

A very interesting presentation given by Mark Lehner on how the ancient Egyptians managed to organize, both physically and socially, the almost humanly impossible effort of building those gigantic pyramids for most of their 3000 years of history. The speaker also drew some analogies, although sometimes a bit far-fetched, between this type of massively labor-intensive undertaking and how software and software projects are organized nowadays. One really interesting point I took from this presentation was that although the social organization contained highly modularized units, the final product of eternity (the pyramids) does not necessarily show any strict modularity but rather simple and straight replication. Which says something, doesn't it? Or maybe we are all trying too hard to find similarity, the proof or disproof of what we do and the way we are doing it.

Monday, September 29

Want Groovy?

Groovy, a dynamic JVM-powered language (JSR-241), has been gaining a lot of momentum and attention recently, especially with Grails, a web application development framework built using Groovy with a design similar to Rails. Some people have even started to consider Groovy a better version of Java, which I personally don't agree with, but I do think that Groovy is better suited for many tasks traditionally performed by Java, such as building DSLs, creating dynamic frameworks, and more.

Despite all the grooviness about Groovy, it is still pretty difficult for any organization to adopt this relatively young technology due to a classic chicken-and-egg dilemma. Before formal adoption in a corporate setting most of us would like to try the language out, but without formal adoption in a real project it's almost impossible to really evaluate and learn the language, so what do we do? One effective way I found to introduce Groovy into your organization is to start by writing your unit tests in Groovy. Because it's just test code there is usually less red tape around it, and since it will never be deployed into a production environment it's typically a lot easier to get approval for trying it out. With the help of Maven and the Groovy plugin it's actually quite easy to add some grooviness to your project.

Step 1 - Add GMaven plugin to your pom.xml





This definition basically tells Maven to use the GMaven plugin to compile all *.groovy files under your standard test/java directory, which essentially allows you to write unit tests in both Java and Groovy.

Step 2 - Add Groovy runtime to your test classpath


Adding this dependency to your dependencies will add the Groovy runtime to your project, and to the Eclipse classpath if you are using the eclipse:eclipse plugin.

Step 3 - Install Groovy Eclipse plugin (Optional)

If you are using Eclipse, you might find it useful to install the Groovy plugin for your IDE. Although this plugin still has a few rough edges, it allows you to run your Groovy-powered unit tests using GUnit, which I found to be a productivity boost.

Groovy! Now you are free to write your unit and integration tests in either Groovy or Java, hence free to try out the language and its features at your own pace. Have fun.

Friday, September 26

New Open Source Project - Hydra Cache

Inspired by our recent experience and events, a few of my friends and I started a new open source project called Hydra, aiming to provide the community with an open source implementation of Amazon Dynamo in Java. The project design is based only on published papers and algorithms in the public domain, mainly Werner's paper on Amazon Dynamo. Currently the project is in its design and prototype stage. If you are interested in this project, check out our Wiki, and if you are interested in contributing as a developer please contact the project admins at Hydra Project Page.

Tuesday, September 16

Consistent Hash based Distributed Data Storage


In an ultra large enterprise application, such as an online e-commerce or online gaming site, the site is dealing with millions of users and thousands of transactions every second. The number of servers, routers, databases, and storage devices needed to handle this kind of traffic makes hardware or network failure the norm instead of the exception. Despite the constant hardware failures in your system, your customers will not tolerate the slightest downtime; the more successful your system is, the more important it becomes to your clients, and the less happy they are when they experience an outage.

Solution -

To solve this challenge we need a highly available, decentralized, and high performance data storage service that shields the application from this harsh reality and complexity. No exception will be thrown when a hardware or network failure occurs, and the application code can always safely assume that the data storage service is available for reads and writes at any given point in time. This data storage service also needs to be incrementally scalable: since downtime is not acceptable, adding new nodes and storage capacity should not require shutting down the entire system, and should only have limited impact on the service and its clients. A bonus side effect of this solution is that the distributed data storage can also act as a distributed cache to reduce the hits on the persistent data storage, such as a relational database.

Design -

- Consistent Hash

In a large online application, the data that requires this kind of high availability can usually be identified by a primary key and stored as binary content, for example user session (session id), shopping cart (cart id), preferences (user id), etc. Due to this nature, a Consistent Hash based distributed storage solution was proposed. The Consistent Hash algorithm was initially introduced in Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web by David Karger et al. in 1997. The key benefit of Consistent Hashing is that hashing any given key will always return the same value even when slots are added or removed. The principle of this design can be illustrated using the following diagram. Imagine a virtual clock that represents the integer values from -2^31 to 2^31-1. Each server (A, B and C) in the storage cluster has a hash value assigned, hence each given key (K1 and K2) can only land somewhere between these server nodes on the clock. The algorithm searches the clock clockwise and picks the first server it encounters as the storage server for the given key, and because the hashing algorithm is consistent, any future put or get operation is guaranteed to be performed on the same node. Moreover, the consistent hash algorithm also limits the impact of adding or removing a node to its neighboring nodes instead of the entire cluster.
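The clock lookup described above can be sketched in a few lines of Java. This is a minimal illustration, not the Hydra implementation: the class name HashRing and its methods are made up for this example, and positions on the clock are passed in explicitly instead of being computed by a real hash function.

```java
import java.util.Map;
import java.util.TreeMap;

class HashRing {
    // Position on the clock -> server name, kept sorted by position.
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addServer(String name, int position) {
        ring.put(position, name);
    }

    void removeServer(int position) {
        ring.remove(position);
    }

    // Walks clockwise from the key's position to the first server found.
    String serverFor(int keyPosition) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Integer, String> next = ring.ceilingEntry(keyPosition);
        // Past the last server the search wraps around to the first one.
        return (next != null ? next : ring.firstEntry()).getValue();
    }
}
```

With servers A, B and C placed at 100, 200 and 300, a key at 150 lands on B and a key at 350 wraps around to A. Removing B only moves the keys that B used to own (they go to C); every other key keeps its server.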

Challenge 1: Non-Uniform Distribution

The first problem we need to solve is that the server hash values are most likely not uniformly distributed on the clock; as a result the server utilization will be skewed, which is hardly an ideal situation. To solve this problem we are planning to borrow the idea discussed in Werner's paper on Amazon Dynamo and create virtual nodes for the servers. When you have enough virtual nodes on the clock, a close to uniform distribution can be achieved.
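Here is a rough Java sketch of the virtual node idea. The class name, the "#i" naming scheme for virtual nodes, and the use of CRC32 as the hash function are all assumptions made for this illustration; a real implementation would likely pick a stronger hash.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

class VirtualNodeRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // CRC32 is only a stand-in hash chosen for this sketch.
    static int hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    // Places the server on the clock many times, once per virtual node.
    void addServer(String name, int virtualNodes) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(name + "#" + i), name);
        }
    }

    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Integer, String> next = ring.ceilingEntry(hash(key));
        return (next != null ? next : ring.firstEntry()).getValue();
    }

    // Counts how many of the generated sample keys land on each server.
    Map<String, Integer> distribution(int keys) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < keys; i++) {
            counts.merge(serverFor("key-" + i), 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling distribution() with one virtual node per server and then with a few hundred per server is a quick way to see how much the skew changes for a given hash function.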

Challenge 2: Availability and Replication

To provide high availability, the stored data needs to be replicated to multiple servers. Based on the algorithm we are employing, in the case of a server failure any data stored on that specific server will automatically become the responsibility of the subjacent server (the next one clockwise), as shown in the following diagram.

Therefore our replication strategy is quite simple: every node replicates the data it stores to its immediate subjacent neighboring server. You can also include more than two servers in the replication group for even higher availability, although for our project I believe a paired server group will provide the desired availability without introducing too much complexity.
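A minimal Java sketch of this replication rule, assuming the same kind of sorted clock as before. The class and method names are hypothetical, and positions are again passed in directly to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ReplicatedRing {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addServer(String name, int position) {
        ring.put(position, name);
    }

    void removeServer(int position) {
        ring.remove(position);
    }

    // Returns up to `copies` distinct servers, walking clockwise from the key.
    // The first entry is the primary, the following ones hold the replicas.
    List<String> serversFor(int keyPosition, int copies) {
        // Clockwise order: servers at or after the key, then wrap around.
        List<String> clockwise =
                new ArrayList<>(ring.tailMap(keyPosition, true).values());
        clockwise.addAll(ring.headMap(keyPosition, false).values());

        List<String> owners = new ArrayList<>();
        for (String server : clockwise) {
            if (owners.size() == copies) {
                break;
            }
            if (!owners.contains(server)) {
                owners.add(server);
            }
        }
        return owners;
    }
}
```

With A, B and C at 100, 200 and 300, a key at 150 is stored on B and replicated to C. If B disappears, the same lookup returns C first, which is the server that already holds the replica.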

Open Source Alternative -

Memcached, an open source Consistent Hash based distributed cache, is built on a similar design but without the high availability replication capability. It expects the application code to be able to recover and restore the cache when a node becomes unavailable, which is usually an acceptable alternative to replication-based availability, at the cost of a performance penalty during an outage and increased code complexity. I usually recommend Memcached over a custom in-house distributed cache system, since it's proven, free, and a lot less work; what more can you ask :-)

Further Improvement -

Although it's currently not in the plan, down the road there might be a need to implement Vector Clock based object versioning and merging to support simultaneous writes on multiple nodes, which is crucial for maintaining data consistency during partial system failure. Currently we are simply planning to employ a "last-write-wins" strategy to resolve conflicts.
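For illustration, a "last-write-wins" rule can be as small as the following Java sketch. The class name and the idea of passing the timestamp in with each write are assumptions for this example; where the timestamp comes from (client clock, coordinator, and so on) is a separate design decision that is not covered here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LastWriteWinsStore {
    static final class Versioned {
        final String value;
        final long timestamp;

        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Versioned> data = new ConcurrentHashMap<>();

    // Keeps whichever write has the newer timestamp; ties keep the stored one.
    // Returns true if the incoming write won.
    boolean put(String key, String value, long timestamp) {
        Versioned incoming = new Versioned(value, timestamp);
        Versioned winner = data.merge(key, incoming,
                (current, candidate) ->
                        candidate.timestamp > current.timestamp ? candidate : current);
        return winner == incoming;
    }

    String get(String key) {
        Versioned stored = data.get(key);
        return stored == null ? null : stored.value;
    }
}
```

Note that the losing write is silently dropped, which is exactly the data loss that vector clock based versioning and merging would avoid.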

Related Readings -

Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web

Amazon Dynamo

Tim's Blog on Consistent Hash

Thursday, September 4

Enable USB in VMWare under Ubuntu 8.04 - Hardy Heron

Just figured out how to enable USB controllers in a VMWare Windows instance under Ubuntu 8.04 - Hardy Heron, and thought it might be useful to share it here on my blog.

If you just upgraded to Hardy Heron, you will notice that USB devices no longer work in your VMWare instance. That's because the Ubuntu development team removed the /proc/bus/usb mount, which is what VMWare depends on for detecting USB devices. Re-enabling it is actually quite simple: just modify the device mount script "sudo vim /etc/init.d/" and uncomment the following part:

# Magic to make /proc/bus/usb work
mkdir -p /dev/bus/usb/.usbfs
domount usbfs "" /dev/bus/usb/.usbfs -obusmode=0700,devmode=0600,listmode=0644
ln -s .usbfs/devices /dev/bus/usb/devices
mount --rbind /dev/bus/usb /proc/bus/usb

Then restart the mount with "sudo /etc/init.d/ start"; now you should be able to add USB devices to your VMWare instance again. Have fun!

Sunday, August 31

Overlay DHTML popup on top of a Flash movie

Just solved a tricky overlay problem, at least a pretty tricky problem for someone like me who does not work with Flash a lot, and thought I would jot down some notes here in my day log.

I ran into an overlay problem today when using the RichFaces Calendar component with Open Flash Chart: for some reason the pop-up date picker overlay generated by the Calendar component kept staying behind the Flash movie used by Open Flash Chart. First I tried playing with the z-index of both the calendar and the Flash div and had no luck. After some research I found out that apparently this is caused by how the browser treats windowed and windowless Flash differently. By default Flash has its window mode set to "window", and as a result the browser will always render it on the top level in its own window, so no matter how you set the z-index the overlay will not be able to go on top of the Flash movie. To fix this issue you need to change the Flash wmode parameter to either "opaque" or "transparent" by adding a new parameter in the object tag:

<param name="wmode" value="transparent"></param>

or wmode attribute to the embed tag:

<embed src="..." wmode="transparent"></embed>

This renders the Flash movie in windowless mode, which allows the browser to treat it as a regular internal layer that can be covered by a DHTML overlay.

Note: Adding wmode only to the object tag will work for IE but not Firefox.

Tuesday, August 12

Java an OS?

For the last several weeks I have had the opportunity to work on three vastly different projects: one built using old school Java + JDBC, one with JSF + Hibernate + Spring, and one using Groovy + Grails. The experience of juggling these three quite different projects every day (one with my day job, the other two with my night job - open source) reminded me of a very interesting question that was asked during James Gosling's talk at JavaOne 2008. A developer from the audience asked him something like "Now Java has grown to a tremendous size with multiple languages supported, it feels more like an OS than a language" (can't really remember the exact question). James' answer was surprisingly straightforward; he said something like "Yes, that's what was intended from the very beginning" (again can't really remember the exact answer :-)

My experience in the past weeks showed me how much more productive and natural you can get while using the same set of APIs and libraries (the OS) but a different language. Even the best framework in Java cannot stack up against the productivity you gain from Grails. Based on my personal estimate, the JSF/Hibernate/Spring stack is probably 3-4 times more productive than the plain Java approach. However productive it is compared to the naked Java approach, the Groovy/Grails combo outperformed it by about another 3-4 times (sometimes even more for CRUD operations), mostly due to the opinionated framework, dynamic method generation, code generation, and various ready-to-go plug-ins. If you look at the end result generated by Grails, it's pretty much as good as a Java application can be: a standard Spring + Hibernate architecture with full JEE compatibility, enhanced with any plug-in you choose to deploy, built-in Ajax support, and fully unit testable with mock objects plus automated integration testability. So if you ask me now whether I will ever start building another Java web application without using Groovy/Grails, I can hardly come up with a scenario where I would go with Java alone. After all, you could implement the most sophisticated web 2.x site right now using C and the CGI API alone, but at what cost?

With the JVM being a general purpose virtual machine or portable OS, we can predict that many special purpose languages and DSLs will be developed down the road in our never-ending quest for the silver bullet. Even today you can already see the trend: people are now using Groovy or Jython for scripting, Grails or JRuby + Rails for web development, Scala for concurrency or library building, and of course a few touches of old school Java here and there. While an exciting landscape is surely ahead of the Java community, it also reminds me a bit of the time when Java was still a newborn baby: people were excited about Java and used it as an easy language to glue C and C++ code together through JNI. Now that it is a full-grown adult, it's time for Java to pass the torch. An exciting time indeed :-)

Thursday, July 17

Project Vital Sign Charting Spreadsheet

Recently I read the article in the ThoughtWorks Anthology called Project Vital Sign, written by Stelios Pantazopoulos. In this article the author proposes a few types of charts that an Agile team can produce, usually through the Team Lead or Iteration Manager, as Information Radiators to improve communication among team members as well as with stakeholders.

After reading the article, I suddenly recalled a conversation I had a while back when working with a PM who was relatively new to the Agile landscape. She asked me a question to which, at the time, I thought the answer was obvious: how can you find out whether an Agile project is in trouble, or on schedule and under budget? My answer was that if you attend every kickoff and retrospective meeting, you can pretty much tell from the story board. She left with a puzzled look on her face; apparently what I thought was obvious was not obvious at all to some folks on the team. If this is the case for a PM who works pretty much every day with the developers in the trenches, then you can imagine the disconnect and difficulty a less technical senior manager would face when he/she tries to find out the status of an Agile project. One of the contributing factors* to Scrum's rapid adoption in larger corporations is the Burn-Down chart it produces, which clearly communicates project status to anyone who would like to know.

But thanks to Stelios' article we can now generate several very useful charts for any Agile project, for the developers as well as anyone else who is interested in the project status, including senior managers. I am planning to use some of these charts in my next project, and created a spreadsheet template based on the suggestions in the article. I have uploaded this template; please feel free to modify and use it in your project, and let me know if it turns out to be useful for you.

* Other factors are Long Iteration (less agility) and a nice title for the PM (Scrum Master) plus certification program, and of course as always better marketing ;-)

Wednesday, July 16

Cygwin and Maven problem

I ran into a pretty nasty problem while running Maven under Cygwin yesterday. Why Cygwin? Because all workstations at my current client's site are Windows based. The problem started when I replaced Windows CMD with Cygwin Bash by adding an Autorun key under 'HKEY_CURRENT_USER\Software\Microsoft\Command Processor' in the registry. After that Maven stopped working, and it took me a while to pinpoint the problem, since all other Java applications ran just fine, including Eclipse, Groovy, and Tomcat.

Finally I gave up on replacing Windows CMD with Bash. In the end, I decided to run all my consoles using an open source program which allows you to have multiple tabs of Cygwin console running on your Windows machine, plus some eye candy. This setup has worked out pretty well for me so far.

Tuesday, July 15


Recently I had a few discussions with different developers regarding server architecture, and to my surprise few really understand what SEDA is and is not, so I decided to jot down some of my thoughts here and hopefully clear things up a bit. SEDA - Staged Event-Driven Architecture - is the de facto industry gold standard for implementing scalable servers. SEDA was first introduced in 2001 by Matt Welsh, David Culler, and Eric Brewer; see the original paper.

The common misunderstanding is that many developers believe "SEDA is a high performance architecture". It is not; in my experience implementing the SEDA model usually means sacrificing 10-20% performance. The main problem that SEDA addresses is graceful degradation in server scalability, not performance. Graceful degradation (also known as well conditioning) is about what happens when your server experiences an extremely high burst, for example 100x or more of the average traffic volume. It is certain that the server will not be able to handle the burst, but instead of becoming non-responsive or simply crashing, ideally we would like the server performance to degrade gracefully, for instance by maintaining the quality of service for existing clients while rejecting all new clients with a user-friendly message. That's where SEDA comes into the picture. Here is a comparison of common server architectures and the problems they have while handling this kind of burst.

1. Thread-based Concurrency (The vanilla one thread per client model)

This model does not scale very well* and will not be able to handle the burst; eventually the server becomes completely unresponsive or crashes due to resource exhaustion.

* When I use the word 'scale' here I mean scaling to tens of thousands of sockets or more. This simple-minded model can scale pretty well on modern operating systems (Linux kernel 2.6+ and Windows NT 4+) and has shown superior performance with a relatively small number of threads (a few thousand), so if your server is never expected to handle tens or even hundreds of thousands of sockets this is actually a pretty good architecture.

2. Bounded Thread Pool

To solve the overcommit problem of model #1, the thread pool was introduced and is now widely used. Since the thread pool has a maximum number of threads configured, it is impossible for the server to create an unlimited number of threads, which is what leads to the resource exhaustion problem in model #1. But this model can introduce a great deal of unfairness during saturation: when all the threads in the pool are busy, all requests are queued up, so the service quality degrades rapidly as soon as the server starts reaching the maximum thread pool size. This degradation is especially fatal for stateless request-and-response based protocols such as HTTP.
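A small Java sketch of this model using java.util.concurrent. The class and method names are made up for the example, and it deliberately uses a bounded queue with an abort policy so the saturation point becomes visible; with an unbounded queue the extra requests would simply pile up and wait, as described above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Submits a burst of blocking tasks; returns how many were rejected.
    static int submitBurst(int threads, int queueSize, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueSize),
                new ThreadPoolExecutor.AbortPolicy());
        // Every task waits on this latch, simulating slow request handling.
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (RejectedExecutionException e) {
                rejected++; // all threads busy and the queue is full
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + submitBurst(2, 3, 10));
    }
}
```

In this sketch the pool can hold at most threads + queueSize requests at once, and anything beyond that is rejected outright rather than served with a friendlier fallback.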

3. Event Driven Concurrency (Async Non-Blocking IO)

Event-driven server design relies on non-blocking IO to process each task as a Finite State Machine (FSM). The thread only works on a task when it receives an event from the scheduler informing it that a certain operation, read or write, is available to be performed. This kind of design is usually implemented with a single thread. Although it adds some programming complexity, this model scales fairly well even with millions of tasks while maintaining consistent throughput. Although massively more scalable, this model still does not address the fundamental problem during a burst: when reaching the saturation point the task processing latency increases exponentially, so this model simply postpones the problem instead of solving it.

4. Staged Event Driven Architecture (SEDA)

To address this problem in the straight event-driven model, SEDA introduces a new concept, the Stage. Instead of processing each task in one go, SEDA breaks the processing into multiple stages, and each stage gets its own dedicated scheduler, event queue, and thread pool. The main benefit of this architecture is that the multiple stages give you multiple measuring points at which to implement request shedding. For example, a SEDA-based web server can implement shedding logic at the second stage, where a dynamic page (JSP/PHP/ASP) would normally be executed: during a burst, second-stage events can be rerouted to an alternative queue that returns simple, static content saying that the server is overloaded, which gives the user friendly feedback while protecting the server from resource exhaustion at the same time. Of course SEDA also provides additional benefits such as dynamic resource allocation and easier code modularity, but the biggest benefit is no doubt the capability for graceful degradation.
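The shedding idea can be sketched with a single stage that owns a bounded event queue. This is only an illustration under my own assumptions: the Stage class, its method names, and the use of plain strings as events are all hypothetical, and the thread pool that would normally drain the queue is left out so the behaviour stays easy to follow.

```java
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

class Stage {
    private final BlockingQueue<String> events;              // this stage's own event queue
    private final Function<String, String> handler;          // normal work, e.g. render a dynamic page
    private final Function<String, String> overloadHandler;  // cheap fallback, e.g. a static "busy" page

    Stage(int capacity, Function<String, String> handler, Function<String, String> overloadHandler) {
        this.events = new ArrayBlockingQueue<String>(capacity);
        this.handler = handler;
        this.overloadHandler = overloadHandler;
    }

    /**
     * Admission control: queue the event if there is room, otherwise shed it
     * to the overload handler. Returns the fallback response when shed.
     */
    Optional<String> submit(String event) {
        if (events.offer(event)) {
            return Optional.empty();
        }
        return Optional.of(overloadHandler.apply(event));
    }

    /** One unit of work; a stage thread pool would call this in a loop. */
    Optional<String> processNext() {
        String event = events.poll();
        if (event == null) {
            return Optional.empty();
        }
        return Optional.of(handler.apply(event));
    }
}
```

With new Stage(2, r -> "rendered " + r, r -> "busy " + r), the first two submitted requests are queued and the third immediately gets the "busy" response, while the queued ones are still rendered normally once a worker calls processNext().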

Some final notes on SEDA:

In my experience, the SEDA model behaves best when there are not too many stages, usually 2-4.

Interestingly enough, the Enterprise Service Bus (ESB) architecture resembles the SEDA model at a much larger and higher level, and because of that resemblance ESB has also shown excellent massive concurrency and graceful degradation capability. ESB is a good architecture of choice for a well-conditioned, massively concurrent enterprise integration system, if you do it right of course ;-)

Thursday, July 10

Set up Felix OSGi container with Maven and Spring in 20 mins

Recently I had a chance to try out Apache Felix, an open source implementation of the OSGi R4 Service Platform, and found that there is not a lot of documentation on how to set up a development environment for OSGi. That's why I decided to record some of my findings and learning experience here; hopefully it will shed some light on the issue.

My goal when I started this exercise was, first, to use Maven 2 as the build management tool, so I could set up an OSGi project just like any other Java project and get easy integration with any continuous integration tool out there. Second, I wanted to set up Spring as the micro container to manage all the wiring and all the neat aspect-oriented programming stuff. At the beginning I did not know exactly how well Felix and Spring would mix together, but roughly I had the idea of using OSGi for service-level dynamic module management and Spring for lower-level wiring. Ok, enough intro, let's do some coding.

a. Download the Felix binary from Apache Felix.

b. Start Felix by running 'java -jar bin/felix.jar', and just type any name for the profile name.
Note: running java -jar from inside the bin folder directly will not work.

c. Typing 'ps' in the Felix shell will show you the list of bundles that are already loaded. Now type 'shutdown' to stop Felix.

d. After some research and trial and error, I found the Maven spring-osgi-bundle-archetype to be the best starting place for my little project. Type 'mvn archetype:generate' and pick 32 for spring-osgi-bundle-archetype (the number in your list may differ).
Note: I am using Maven 2.0.9

e. Running 'mvn clean install' will already produce a valid OSGi bundle without any coding. Try it out.

f. Restart Felix and type 'install file:path-to-your-bundle-file' in the shell to install the bundle you just created. Yes, it's that simple; use 'ps' to check it out. Now you can start or stop the bundle.

g. Let's implement the Activator tutorial from Felix with our newly set up Maven project. See code sample and explanation here. Now if you do another 'mvn clean install', run 'update #bundle-number' in Felix and expect to see something, you won't. Why? Because we haven't configured the Activator class as a bundle activator. To do that you first need to remove the auto-generated Felix plug-in version number 1.0.0 from your pom.xml, since the old 1.0.0 plug-in does not support this configuration. After removing the version number, you need to add a Bundle-Activator instruction naming your Activator class under the plug-in configuration.

h. Now run 'mvn clean install' again, then type 'update #bundle-number' in the Felix shell. From now on, whenever you start or stop the bundle you will see the log message printed on the screen.

As you have probably noticed, the Maven project already has Spring configured for you, so you are pretty much all set at this point to start developing an OSGi application. Hopefully this setup did not take you more than 20 mins, unless you have a super slow internet connection ;-) Last but not least, you can also integrate Felix into your Eclipse IDE for debugging and profiling purposes.

Have fun with OSGi and feel free to let me know your experience trying this setup out.

Wednesday, July 9

Interview Phantom Read

I have been conducting both technical and management interviews for quite a few years now, and occasionally I have had to sit in a few of these interviews as well. Just a few weeks ago, I was sitting in a meeting room conducting an interview with three other interviewers when one of them asked "If you encounter ..... this type of scenario, what would you do?" and I could not help thinking that this type of question is useless at best and usually misleading. The reason is simple: since the scenario in the question is hypothetical, the interviewee can fabricate an answer without worrying about any of the constraints that exist in reality. I am not saying everybody will lie under these circumstances, but the problem is that you can't verify whether it is a lie or not, since the whole thing is fabricated. In most cases, I found that the interviewee will give you a perfect answer, or the solution they would like to carry out if they were working in an ideal world. I call this kind of answer a Phantom Read. If you buy into this kind of answer, you will probably end up hiring the person the interviewee would like to be in an ideal world, not the actual person sitting in the room; in other words, the Phantom.

So what is a good question then? A good question should always be based on actual experience; sometimes a mere description of what they did is the best answer you will need. Usually you can lead comfortably into this kind of question by simply asking about past project experience, and then asking "As you mentioned ..... could you also tell us about what you did when ..... happened?" A follow-up question like "If you got to do this all over again, what would you do differently to improve ..... " can provide further insight into your candidate's thinking process and their capability to learn from their own success or failure.

Monday, June 23

Porting Container Managed Authentication (CMA) to Spring Security 2

Last week I ported one of my open source projects from Container Managed Authentication (CMA) on Tomcat 6 to Spring Security 2, and decided to record some of my findings here. Spring Security 2 supports a wide range of authentication mechanisms and lets you centralize all security-related configuration in your Spring context xml file. Like its predecessor Acegi, it also lets you fully customize every step of the authentication process, which in my opinion makes it far more flexible than CMA. Enough introduction, here are the steps I performed for the port.

Step 1 - Add Spring Security dependency in your POM:

If you are not using Maven, you need to download Spring Security library manually and add it to your project build path.

Step 2 - Remove CMA security configuration in your web.xml:

This includes all security-constraint, login-config, and security-role elements in your web.xml file.

Step 3 - Remove Tomcat security realm configuration:

My realm configuration was defined in META-INF/context.xml file in the WAR, but it could also be defined in server level conf/context.xml file.

note: at this point your CMA configuration has been completely removed

Step 4 - Add Spring Security filter chain in your web.xml:




note: don't worry about where the chain is, for now ;-)

Step 5 - Create Spring Security context xml file




As you have probably noticed, two Spring beans referred to in this xml file have not been mentioned yet: authenticationService and passwordEncoderService. AuthenticationService is a custom class implementing the UserDetailsService interface, which is responsible for loading the user details for authentication. For the implementation you can use Hibernate if you already have mappings set up for the user and role entities, or the lighter-weight iBATIS, or even raw JDBC calls. The PasswordEncoderService class, which implements the PasswordEncoder interface, was created to help Spring Security compare encoded passwords. Spring Security comes with some built-in encoders, but I could not find a match for Tomcat's MD5+Hex (Base 16) style encoding, so I provided my own implementation; see the Tomcat documentation and source code for details. Both beans are declared using annotations and autowiring.
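For reference, the MD5+Hex encoding itself can be sketched with nothing but the JDK. This is not the actual PasswordEncoderService: the class name Md5HexEncoder and its two methods are hypothetical, it deliberately does not implement the Spring Security interface so that it compiles on its own, and the UTF-8 charset is my assumption (your Tomcat realm may be configured with a different digest encoding).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5HexEncoder {

    /** Returns the MD5 digest of the raw password as a lower-case hex string. */
    static String encode(String rawPassword) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rawPassword.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff)); // two hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a standard JDK algorithm, so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    /** Compares a stored hex digest with the digest of the submitted password. */
    static boolean matches(String storedHex, String rawPassword) {
        return encode(rawPassword).equalsIgnoreCase(storedHex);
    }
}
```

A real encoder bean would wrap these two operations behind the PasswordEncoder interface, so the digests already stored for the Tomcat realm keep working without touching the database.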

Step 6 - Change your login form submit target

Change your login form's submit target from j_security_check to j_spring_security_check so it is processed by Spring Security instead.

Now you should be able to log in through Spring Security as usual, without changing anything in your database, but some of your pages might not render correctly. That is because some method calls on HttpServletRequest, such as getRemoteUser(), no longer return the correct value: the default Tomcat HttpServletRequest implementation is not aware of the Spring Security SecurityContext, so you need a wrapper class that translates these calls and returns the right value. Luckily Spring Security already has these wrappers built in; all you need to do is add an extra filter to the filter chain.

Final Step - Add securityContextHolderAwareRequestFilter


Now the port is finally done. Hopefully this example shows you the power and flexibility of Spring Security 2. With a much simpler configuration than Acegi, it is indeed a well designed and robust security framework that deserves serious consideration.

Thursday, June 12

Be aware of the implications when using Spring JmsTemplate with JBoss Messaging

This is a daylog note that I keep for myself and share with anyone who is interested. We all know how convenient the various template classes provided by the Spring framework are, but we seldom pay much attention to the implications and implicit design choices made by Spring. Recently, while working with JBoss Messaging, I almost missed something critical because of one of these implications.

The JmsTemplate implementation for JMS 1.1 made the crucial design choice of relying on the application container to provide JMS connection pooling and caching. According to the API documentation:

The ConnectionFactory used with this template should return pooled Connections (or a single shared Connection) as well as pooled Sessions and MessageProducers. Otherwise, performance of ad-hoc JMS operations is going to suffer.

The Spring framework assumes that most application servers will provide a pooled JMS connection, but apparently JBoss does not think so, and has even openly declared that they will not provide any support if you run into problems using Spring JmsTemplate. On this JBoss Messaging Wiki page they say:

The Spring JMSTemplate code employs several anti-patterns, like creating a new connection, session, producer just to send a message, then closing them again.


This not only results in very poor performance, but can also make you run out of operating system resources such as threads and file handles, since some of the connection resources are released asynchronously.


Please note that JBoss / Red Hat will not support people using the Spring JMSTemplate with JBoss Messaging apart from the one acceptable use case for the reasons outlined above.

That being said, I am not sure that JmsTemplate actually employs anti-patterns as serious as they describe; in my opinion it is rather a design choice that JBoss Messaging decided not to support. The recommended solution for small-scale JMS usage is to decorate your connection factory with SingleConnectionFactory. This factory always returns a wrapper around a single shared connection and ignores the close method, which will work for most low-volume scenarios. On top of that, the new JBoss Messaging clustered connection factories, when used with a clustered destination, allow seamless fail-over using a single connection, which makes this a pretty reliable solution in a clustered JMS environment. For high-volume scenarios, my suggestion is to use a JCA managed connection factory within the JBoss application server, or to implement your own JMS connection factory bean with connection caching or pooling outside the container (check out the open JCA container implementation, Jencks), if you are determined to work with the JmsTemplate implementation.


Tuesday, June 3

How to hire cheaper talent and retain them

If you are wondering where you can find cheaper talent, check out Mr. Fowler's new hypothesis:

That's great, now you have got "cheaper" talent, but you will also realize that they are actually quite picky and keep jumping ship. Don't worry, the following post reveals the secret of retaining this "cheaper" talent (thanks to David for introducing this post).

People are not assets, the right people are!

Monday, June 2

TinyMCE, the best replacement for Tomahawk inputHtml

Recently I upgraded one of my web applications from JSF 1.1 to the 1.2 core. This application was built with MyFaces Core + Tomahawk + Facelets + RichFaces + Ajax4JSF. After the upgrade pretty much everything worked out of the box; the only part that broke was the Tomahawk inputHtml component, due to a known issue (Tomahawk issue 1088). Since Tomahawk currently does not officially support JSF 1.2, there is really not much we can expect.

After this little setback I did some research and found that TinyMCE seems to be a nice rich text editor implementation, and following the MyFaces Wiki documentation the integration turned out to be quite straightforward. I did end up having to make a small enhancement on top of the suggested implementation, because I had more than one text area on the page and did not want to turn all of them into rich text editors. The only change I had to make was adding 'editor_selector' to the options, rather than relying on 'mode' alone, when initializing the TinyMCE script.

tinyMCE.init({
mode : "textareas",
theme : "advanced",
editor_selector : "mceEditor",
width : "640",
height : "480"
});

After that, all you need to do is set the style class of your textArea to 'mceEditor', and it will automatically be converted into a nice rich text editor.

My environment info:

MyFaces - 1.2.2
Tomahawk - 1.1.6
Facelets - 1.1.11
Richfaces - 3.2.1.GA

Saturday, May 31

Hibernate 3 session createCriteria returns duplicated entities

I just ran into a minor problem with Hibernate 3, and thought it might be worth recording the solution and some notes here.

Problem statement:

When using the Hibernate Session.createCriteria(Class persistentClass) method to fetch a list of entities of the given class, the returned result will contain duplicate instances if the entity has a many-to-many eager-fetch relationship; see HB-520 for more details on this issue.


For my specific problem the solution is pretty simple: all I had to do was add a result transformer to the criteria. This way Hibernate removes all duplicate entities from the returned list. This simple solution will not work for more complicated scenarios, however, such as when you have pagination or other requirements; there a custom query might be needed to provide more control and flexibility.
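To see why this approach and pagination do not mix, it helps to picture what a distinct-root transformer does: as far as I know it removes duplicates in memory, after the SQL has already run. The sketch below imitates that with plain collections. DistinctRoots is a hypothetical helper, not Hibernate code, and in Hibernate 3 the usual candidate for the real thing would be Criteria.DISTINCT_ROOT_ENTITY, which is my assumption here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class DistinctRoots {

    /**
     * Keeps the first occurrence of each instance and preserves the original
     * order. Instances are compared by identity (==), not by equals().
     */
    static <T> List<T> distinct(List<T> rows) {
        Set<T> seen = Collections.newSetFromMap(new IdentityHashMap<T, Boolean>());
        List<T> result = new ArrayList<T>();
        for (T row : rows) {
            if (seen.add(row)) {   // add() returns false for an instance already seen
                result.add(row);
            }
        }
        return result;
    }
}
```

Because the duplicates are dropped only after the rows come back, a page limit of 10 rows applied in SQL can easily leave you with fewer than 10 entities, which is why pagination needs a custom query instead.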

Code syntax highlighter for Blogger

In my previous post, in order to display some xml code nicely on Blogger, I did some research and finally found a nice little javascript/css library called SyntaxHighlighter, which does a pretty awesome job for most of the mainstream programming languages. On top of that it is extremely easy to set up and does not require any server-side programming capability from your hosting company. It also has a neat feature that removes the annoying "br" line breaks that Blogger automatically generates for you.

Give it a try; it will make it a lot easier and prettier for you to post example code on your blog.

Thursday, May 29

Integrating Spring Security 2 with Active Directory

Recently I worked on getting the newly released Spring Security (formerly known as Acegi Security) to work with a Microsoft Active Directory LDAP server. The configuration for Spring Security has improved massively compared to the early days of Acegi, but since Active Directory has its own format, and the early release has some bugs (I am using 2.0.1 right now since that's the latest one in the public Maven repository), the integration was not as straightforward as I expected. That's why I decided to record my findings here in my Daylog.

Firstly we need to setup http security:

Here it is configured to protect everything under the protected folder using Basic authentication, and also to force the HTTPS protocol.

Secondly, we need to connect to the LDAP server:

Then, we need to let Spring Security know where to search for the users:

In my case the search base is "ou=Offices", but depending on your LDAP settings it might be different. The strange-looking "(sAMAccountName={0})" is an LDAP filter on sAMAccountName, the Active Directory attribute that holds the login name; {0} is replaced with the user name being authenticated.

Last but definitely not least, we need to setup our authentication provider:






If you are familiar with Spring 2.x configuration you will probably ask why all of a sudden I switched from namespace-based security configuration to the manual bean-based approach. The reason is a known bug in Spring Security (SEC-836, fixed in the 2.0.2 release, which is not yet available in Maven) that prevents the group search from scanning the sub-tree; if your group tree has multiple levels, the search will not return the right result.

One last note: all roles defined in your directory will be returned in upper case with a "ROLE_" prefix. This configuration was created and tested with Spring 2.5.2 and Spring Security 2.0.1.
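The naming rule matters when you write your access rules, so here is a tiny sketch of the mapping as I observed it. RoleNames is a hypothetical helper and not Spring Security's own code; the group names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class RoleNames {

    /** Maps directory group names to the role names seen by the application. */
    static List<String> toAuthorities(List<String> groupNames) {
        List<String> authorities = new ArrayList<String>();
        for (String name : groupNames) {
            // Upper-case the group name and put the ROLE_ prefix in front of it.
            authorities.add("ROLE_" + name.toUpperCase(Locale.ROOT));
        }
        return authorities;
    }
}
```

So a directory group called Developers has to be referenced as ROLE_DEVELOPERS in your security configuration.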

Saturday, April 12

The Death of a Start-up

As a company grows and becomes more complex, it begins to trip over its own success - too many new people, too many new customers, too many new orders, too many new products. What was once great fun becomes an unwieldy ball of disorganized stuff. Lack of planning, lack of accounting, lack of systems, and lack of hiring constraints create friction. Problems surface - with customers, with cash flow, with schedules.

In response, someone (often a board member) says, "It's time to grow up. This place needs some professional management." The company begins to hire MBAs and seasoned executives from blue-chip companies. Processes, procedures, checklists, and all the rest begin to sprout up like weeds. What was once an egalitarian environment gets replaced with a hierarchy. Chains of command appear for the first time. Reporting relationships become clear, and an executive class with special perks begins to appear. "We" and "they" segmentations appear - just like in a real company.

The professional managers finally rein in the mess. They create order out of chaos, but they also kill the entrepreneurial spirit. Members of the founding team begin to grumble, "This isn't fun anymore. I used to be able to just get things done. Now I have to fill out these stupid forms and follow these stupid rules. Worst of all, I have to spend a horrendous amount of time in useless meetings." The creative magic begins to wane as some of the most innovative people leave, disgusted by the burgeoning bureaucracy and hierarchy. The exciting start-up transforms into just another company, with nothing special to recommend it. The cancer of mediocrity begins to grow in earnest.

--- quoted from "Good to Great"

It may sound unrealistic, but if you have ever seen a start-up company trip over its own success and eventually get bogged down, sometimes even killed, by the very bureaucracy and hierarchy created to save it, you know how accurately Jim Collins illustrated the scenario here. I am always amazed to see different start-up companies, with different styles of leadership in completely different industries, repeating the same story over and over again. I have personally witnessed two such meltdowns, and am perhaps witnessing another one at my present job. It used to puzzle me: "How can a somewhat successful start-up company transform into not just another company, but a good, successful company, or maybe even a great company, without losing its innovative culture?" If you have been haunted by the same question, or are working for a company that is going through the transformation stage, or are founding a start-up company, check out Good to Great by Jim Collins (if you haven't). Although most of the book is about how a good company can transform itself into a great company, it also contains a lot of insight and systematic thinking that I personally found priceless for any entrepreneur.

Just remember that the only purpose of bureaucracy and hierarchy is to compensate for incompetence, and whatever you believe, they are not the magic bullet for transforming the company.

Saturday, March 22

Spring 2.5.1 auto-wire annotation has a problem working with inner classes

This afternoon I ran into a problem with the new auto-wiring annotations provided by Spring 2.5.1. It seems that a class annotated with @Service or @Component is not picked up if it contains an inner class. For example:

public class A{
private class B{

With this kind of structure, for some reason the Spring annotation scanner simply ignores class A. For now, the workaround I found is to use a default scope class instead of an inner class. For example:

public class A{

class B{

Or fall back to the old-style xml bean definition for any service bean that contains an inner class.

The Spring issue tracking system is currently down. As soon as it comes back online I will check whether this is a known issue, and if not I will create an issue for it.