Woozle Wuzzle
In vogue languages

I'm often asked why I don't hop on the latest language bandwagon and just start coding up a storm. The answer comes in two parts. The first is that I do try out these languages to see what the hype is all about, where they can fit in, and what their pros and cons are. The second is that I realize that there is more to software engineering than just writing code. Software spends disproportionately more time in maintenance than it does in initial development. Just because a language such as Ruby is much faster for initial development doesn't mean that it's much easier to maintain. (Do note that I'm not saying that Ruby is hard / harder to maintain. I'm simply saying that one cannot determine what the maintenance model for a language is from doing only initial development.) The long and short of all of this is that I am forced by my professionalism and my responsibilities to look not only at how a language works for initial development but also at how it works for long term maintenance. By definition this means that it takes me a very long time to determine if a language is suitable in the long term. Since many newly in vogue languages simply haven't been out long enough for either the community or me to understand their maintenance model, one simply cannot start writing production code with them.

One quick example of all of this is AOP. I'm enamored with AOP but I cannot and will not use it in production software. The reason is that AOP simply does not have a maintenance model at all. In other words, I cannot take an AOP'ified application, further apply AOP to it (i.e. maintain it) and have understandable and determinable effects.

Editor's note: this is a stream of consciousness posting to get an idea down and is not complete or thorough in any way. But as always, comments are welcome.

Java instance initializers

I have been playing around with some UI mock ups lately and I often have code that looks like the following:

    ...
    add(new Label("Some text"));
    ...

If I want to see what Some text looks like in a different color or font then I need to extract the label creation to a local variable and then set the appropriate members. Since I'm a lazy lazy man I was getting tired of this. Luckily I remembered about instance initializers. Because of instance initializers you can do the following:

    ...
    add(new Label("Some text") {{ setFont(someFont); }});
    ...

{{ and }} are not new tokens. Writing it another way makes it more clear:

    ...
    add(new Label("Some text")
    {
        { 
            setFont(someFont); 
        } 
    });
    ...

The only question that remains is: when is the instance initializer executed? You can read section 8.8.5.1 of the JLS if you want. (The problem that I have with section 8.8.5.1 is that it is for explicit constructor invocations which doesn't seem to be the case for this example but I cannot find another reference to instance initializers in the JLS.) The code below provides a clearer answer to the question.

    class PreTest {
    
        {System.err.println("PreTest First");}

        public PreTest() {
            System.err.println("PreTest constructor");
        }
    
        {System.err.println("PreTest Second");}
    }

    class Test extends PreTest {
        {System.err.println("Test First");}

        public Test() {
            System.err.println("Test constructor");
        }
    
        {System.err.println("Test Second");}
    
        public Test(final int number) {
            this();
            System.err.println("Test constructor(" + number + ")");
        }
    
        {System.err.println("Test Third");}
    
        public void method() {
            System.err.println("Test method");
        }
    
        {System.err.println("Test Fourth");}
    }
    
    class Dummy {
        public Dummy() {}
        public void go(Test test) {}
    }
    
    final Dummy dummy = new Dummy();
    dummy.go(new Test(10) {{ method(); }});

When executed, the following is output:

    PreTest First
    PreTest Second
    PreTest constructor
    Test First
    Test Second
    Test Third
    Test Fourth
    Test constructor
    Test constructor(10)
    Test method

The instance initializer of the anonymous subclass is executed during construction, but only after all of the superclasses' instance initializers and all (appropriate) constructors have finished.

I should point out that this "trick" is also great in unit tests for populating collections.

    ...
    something.addList(new ArrayList() {{ add("1"); add("2"); add("3"); }});
    ...
    foo.addMap(new HashMap() {{ put("1", "one"); put("2", "two"); }});
    ...
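
One thing worth keeping in mind with this idiom is that each use creates an anonymous subclass, so the object is not exactly an ArrayList or a HashMap. The following is a minimal sketch of that; the class name `DoubleBraceDemo` is made up for illustration and the code uses generics, which postdate the snippets above.

```java
import java.util.ArrayList;
import java.util.List;

class DoubleBraceDemo {
    /** Builds a list using the instance initializer idiom from the text. */
    static List<String> build() {
        return new ArrayList<String>() {{ add("1"); add("2"); add("3"); }};
    }

    public static void main(String[] args) {
        List<String> list = build();
        // The contents are what you would expect.
        System.out.println(list);
        // The runtime class is an anonymous subclass, not ArrayList itself.
        System.out.println(list.getClass() == ArrayList.class);
        // equals() compares contents, so it still matches a plain ArrayList.
        System.out.println(list.equals(new ArrayList<String>(list)));
    }
}
```

This rarely matters in a unit test, but it can matter for code that compares classes with getClass().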

i = i++

A question was recently posted on the CJUG forum with regards to the following code:

    int i;
    i = 1;
    i = i++;
    System.out.println("i: " + i);

The poster wanted to know why in Java the result was 1 whereas in C / C++ the result was 2.

If you're like me, the first thing that popped in your head was: This person does not have a clear understanding of what the postfix increment operator does. Let's tell him to go suck an egg and get a clue. But then I remembered back to the days when I didn't know what I was doing either and all of the kind and gentle people on usenet that steered me onto the path of knowledge.

In case you're having trouble getting over that hurdle, you can look at the problem as follows:

int[] a = { 0, 1, 2 };
int i;
i = 1;
a[i] = i++;
System.out.println("a[0]=" + a[0] + ", a[1]=" + a[1] + ", a[2]=" + a[2]);

which is slightly more palatable and results in the following:

a[0]=0, a[1]=1, a[2]=2

I'll spare the long winded answer to the reason why Java returns what it does and refer you to a Java forum posting. (Though if you want me to ramble on about it, just ask!) That takes care of the Java part, but what about C / C++? Well, if you didn't get all soft and squishy developing in Java all these years, you'll remember that the order of evaluation of operands of individual operators and the order in which side effects take place is unspecified in C / C++. You'll also start remembering about sequence points and all of that but before you begin to spasm uncontrollably, you'll remember that you've left all that behind you now. At the end of the day, the fact that the particular C / C++ compiler, runtime, etc resulted in 2 is simply luck of the draw. The expression is undefined. To quote Dale King:

...in C that statment might assign 0, 1, 42, -1 or any other value to i. It might crash your machine, erase your hard drive, or cause your computer to melt down. All would be acceptable results of executing that statement, since its behavior is undefined.

There is also Steve Summit's famous response which provides links to the C FAQ for more information.
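
For the Java half of the question, the evaluation can be spelled out by hand. This is only a sketch of the steps (the class and method names are made up): the value of `i++` is the old value of `i`, the increment happens, and then the assignment stores the old value back.

```java
class PostIncrementDemo {
    /** Spells out, step by step, what Java does for "i = i++". */
    static int desugared(int i) {
        int saved = i;   // the value of the expression i++ is the old value
        i = i + 1;       // the side effect of ++ is applied
        i = saved;       // the assignment then stores the saved value
        return i;
    }

    /** The original statement, for comparison. */
    static int direct(int i) {
        i = i++;
        return i;
    }

    public static void main(String[] args) {
        System.out.println("direct:    " + direct(1));
        System.out.println("desugared: " + desugared(1));
    }
}
```

Both methods return the value they were given, which is why the Java program prints 1.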

JBoss and autonomic computing

I spent some time a few days ago working with JBoss to determine if it would be a valid service oriented platform for autonomic computing. You can read my multiple JBoss JMX posts for more information. Until a better defined service lifecycle exists (à la JSR 77), it is not possible to autonomically manage a JBoss service.

I'm looking into OSGi, Avalon and Excalibur next. (If you're like me and forget how Avalon, Merlin, Excalibur, etc are related then refer to this.) Stay tuned for results.

Proxy

Given the vast amount of crap that we as programmers need to know these days (which is growing exponentially) I typically wrap unknowns into a black box and add them to my list to check out at a later time. java.lang.reflect.Proxy fell onto this list.

I typically associate a "magic" factor with anything that's in the core Java classes. Take for example how NIO's InterruptibleChannel interacts with Thread. I still want to know how Sun expects third parties to use the SPI to create other NIO implementations but that's another battle for another day.

I assumed incorrectly that Proxy had some magic tie-ins to the JVM that allowed it to masquerade as another class. Instead, Proxy actually goes the sane route. It generates a class (as a byte array) using reflection to inspect the specified interfaces. The class is then loaded using something very similar to ClassLoader.defineClass() (why it does not just use ClassLoader is not known, but if there is one thing that I've learned over the years it is that programmers love to keep secrets). By default, these generated files are not persisted to disk. You can set the system property sun.misc.ProxyGenerator.saveGeneratedFiles to true to save the files for examination (I do not know where they are saved to).

It's actually unfortunate that Proxy does not use JVM magic since then it might be possible to create a proxy to a class (rather than just interfaces) which would provide a truly useful generic proxy mechanism (facilitating AOP, for example).
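
For anyone who still has Proxy on their own "check out later" list, here is a minimal sketch of the interface-only mechanism. The `Greeter` interface and the logging handler are hypothetical and exist only for this illustration; `Proxy.newProxyInstance()` and `InvocationHandler` are the real entry points.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ProxyDemo {
    /** Hypothetical interface, used only for this illustration. */
    interface Greeter {
        String greet(String name);
    }

    /** Wraps a Greeter in a generated proxy that records each method name. */
    static Greeter loggingProxy(final Greeter target, final StringBuilder log) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                log.append(method.getName()).append(';');
                return method.invoke(target, args); // delegate to the real object
            }
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] { Greeter.class }, handler);
    }

    /** Runs one call through the proxy and returns the result plus the log. */
    static String run() {
        Greeter real = new Greeter() {
            public String greet(String name) { return "Hello, " + name; }
        };
        StringBuilder log = new StringBuilder();
        Greeter proxied = loggingProxy(real, log);
        return proxied.greet("world") + " | " + log
                + " | generated class: " + Proxy.isProxyClass(proxied.getClass());
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that the second argument only accepts interfaces, which is exactly the limitation lamented above.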

More clients performance results

I received a number of enquiries asking for performance numbers with larger numbers of clients. Unfortunately, I am limited to five client machines and one server machine. To increase the number of clients communicating with the server, I had to have multiple clients per machine. From the previous tests, a hypothesis can be made that the clients are either CPU or I/O bound. Adding more clients to each machine is not going to produce interesting results. This is essentially what was seen.

There are a total of seven configurations (three with SSL and four without). To simplify analysis, each graph contains the results either from the three SSL servers or the four non-SSL. Three cases were chosen:

  • 1 client per machine (so that comparisons could be made back to previous results)
  • 5 clients per machine
  • 10 clients per machine

The same environment was used as in the previous tests.

The choices (besides one client per machine) were completely arbitrary. Numbers were chosen such that the tests would complete in a reasonable amount of time.

Without SSL

[Graphs: 5 Machines, 1 Client per Machine; 5 Machines, 5 Clients per Machine; 5 Machines, 10 Clients per Machine]

With SSL

[Graphs: SSL, 5 Machines, 1 Client per Machine; SSL, 5 Machines, 5 Clients per Machine; SSL, 5 Machines, 10 Clients per Machine]

Analysis (see the other previous tests for more analysis):

  • As expected, being bound (either CPU or I/O) has not yielded interesting results.
  • For the non-SSL case with more than 1 client per machine, the performance for NIO, IO, Converted IO and Converted IO with Selector have effectively merged. I attribute this to "fill in the blanks". By increasing the number of clients per machine, any nearly-bounded resources were maxed out.
  • The SSL case is known to be CPU bound (as the encryption and decryption are being done in software). It too had a "fill in the blanks"-style result (i.e. it is asymptotically reaching its maximum value per machine).

A few tests were made to determine if the clients were CPU bound or IO bound. It could be guessed from previous results that they were IO bound (given the signature of the SSL results). Further testing has shown this to be the case (e.g. all client echo validation was removed). Since the clients are IO bound, adding more clients to each machine would show no greater throughput to the server which is precisely what was observed in these tests.

Link-back to main entry: NIO and SSL.

SSL performance results

Following on from the previous tests, I performed a performance test of IO and Converted IO with SSL.

The testing environment is the same as the previous tests except that anonymous software SSL was enabled.

There are a total of three cases:

  • IO Server, IO Client
  • Converted IO Server, Converted IO Client
  • Converted IO w/ Selector Server, Converted IO Client

"IO" uses the standard Java IO (from java.net). "Converted IO" is an NIO wrapper to InputStream and OutputStream. The server with "Converted IO" uses a separate thread per client. The server with "Converted IO w/ Selector" uses a single thread for all clients and switches between them using an NIO Selector.

[Graphs: SSL IO Server, IO Client; SSL Converted IO Server, Converted IO Client; SSL Converted IO w/ Selector Server, Converted IO Client]

Analysis (see the previous tests for more analysis):

  • Comparing with the non-SSL tests you can see that there is a significant (~50%) but expected loss in throughput.
  • As hoped, the difference between Converted IO and Converted IO using a Selector decreased dramatically (from ~33% to ~7%) due to the overhead of SSL.
  • Unfortunately, the difference between IO and Converted IO became more pronounced (from ~11% to ~25%). I do not have an explanation at this point and more investigation is needed.

A special thanks goes out to Carlo Segre for use of the cluster.

Link-back to main entry: NIO and SSL.

More NIO and IO performance results

I took the opportunity to create a standard IO client and server and performed some changes / optimizations on the Converted IO. The source is available at the usual place.

The testing environment is the same as the previous tests.

There are a total of four cases:

  • NIO Server, NIO Client
  • IO Server, IO Client
  • Converted IO Server, Converted IO Client
  • Converted IO w/ Selector Server, Converted IO Client

"NIO" means that the component was created using only NIO. "IO" uses the standard Java IO (from java.net). "Converted IO" is an NIO wrapper to InputStream and OutputStream. The server with "Converted IO" uses a separate thread per client. The server with "Converted IO w/ Selector" uses a single thread for all clients and switches between them using an NIO Selector.

[Graphs: NIO Server, NIO Client; IO Server, IO Client; Converted IO Server, Converted IO Client; Converted IO w/ Selector Server, Converted IO Client]

Analysis (see the previous tests for more analysis):

  • Standard IO performs slightly better (and with less variance) than NIO. This follows the standard claim that the use of a selector adds a bit of overhead (even more than that seen by using multiple threads). A future test should use many more clients to see if the overhead of a selector overcomes the overhead of context switching many threads.
  • The clean up of the Converted IO appears to have created a positive result. The difference between ~10.3 MB/s (NIO), ~11.1 MB/s (IO) and ~9.8 MB/s (Converted IO) (~4% and ~11%, Converted IO to NIO and Converted IO to IO, respectively) is much better than the previous difference of ~20%.
  • The Converted IO using a Selector has a similar trend as before: large variance in throughput and much lower throughput. More investigation is needed.

A special thanks goes out to Carlo Segre for use of the cluster.

Link-back to main entry: NIO and SSL.

NIO and Converted IO performance results

I finally had an opportunity to perform some performance testing on the source I made available.

A few notes about the testing environment:

  • It was not a closed environment. It is a cluster of machines running Linux (2.4.21) with MOSIX with a number of NFS mounted drives. There were a few other processes bouncing around the cluster at the time. This caused a number of dips in the throughput. All processes for this test were pinned to their respective machines.
  • All machines are AMD Athlon(tm) XP 2500+ with 512MB RAM
  • Sun JDK 1.4.2_05
  • The network is 100Mb but it is not known if it is switched (highly doubtful)
  • The echo server was started. The first client connected to the server, followed by the second approximately seven seconds (arbitrary) later, followed by the third approximately seven seconds later, etc. for all five clients.
  • Each client transferred 500MB to the server before disconnecting.
  • The data sent from the clients is random and in random sized batches of less than 4096 bytes.
  • All data received by the server is immediately written back to the client.
  • The clients wait for the server to echo back their batch of data and validate it before sending another.

As with most performance tests, the results must be interpreted correctly and cannot be taken at face value. You should not look at absolute values but instead you should look at relative values and trends. For "pure test" results the environment was not ideal but for a more "real world" feel for how applications behave, the environment was adequate.

There are a total of six cases:

  • NIO Server, NIO Client
  • NIO Server, Converted IO Client
  • Converted IO Server, NIO Client
  • Converted IO Server, Converted IO Client
  • Converted IO w/ Selector Server, NIO Client
  • Converted IO w/ Selector Server, Converted IO Client

"NIO" means that the component was created using only NIO. "Converted IO" is an NIO wrapper to InputStream and OutputStream. The server with "Converted IO" uses a separate thread per client. The server with "Converted IO w/ Selector" uses a single thread for all clients and switches between them using an NIO Selector.

All of the source for the clients and servers is available but the test harness is not available. It should be a trivial matter to create your own testing mechanism ideal for your environment.

Ideally, there should be a standard Java IO implementation as a control but unfortunately time is not on my side.

[Graphs: NIO Server, NIO Client; NIO Server, Converted IO Client; Converted IO Server, NIO Client; Converted IO Server, Converted IO Client; Converted IO w/ Selector Server, NIO Client; Converted IO w/ Selector Server, Converted IO Client]

Analysis:

  • The large dips in the graphs were caused by MOSIX and the processes that were running on the clusters.
  • The NIO server performed slightly better than the Converted IO server as expected since the Converted IO server has an additional delay associated with the pipe that converts from standard IO to non-blocking IO. The difference between ~7.5 MB/s (Converted IO) and ~9.5 MB/s (NIO) (~20% difference) is significant and further work needs to be performed to tune Converted IO.
  • The NIO client performed slightly better than the Converted IO client (for the same reasons as the server).
  • The selector-based server to multiplex multiple clients performed worse than using a separate thread per client. Based on other results found on the internet, this is a typical result. The overhead of the selector is not mitigated with only a few clients.
  • The fluctuation of the selector-based server was not expected. Further investigation is warranted.
  • It appears that client 3 was not behaving consistently as is seen by its lower throughput and longer times. It is understandable that client 3's trend is not seen in the "Converted IO Server, NIO Client" case considering the large number of times that MOSIX interfered.

A special thanks goes out to Carlo Segre for use of the cluster.

Link-back to main entry: NIO and SSL.

Are we doing it again?

Kris mentioned something that I hear all the time in regards to SOA's:

it's probably something you could implement using reliable JMS topics

(I'm pulling this slightly out of context but it's relevant regardless.) To developers that have kept their eyes open for at least part of the past few years, SOA's will be "yeah, so what?" or "I can just do / I am already doing that with [blah]". For example, an enterprise service bus (ESB) can be considered to be MOM (message oriented middleware).

Is "ESB" just another TLA (three letter acronym) that business people use to make themselves appear to be more intelligent? Maybe just a little. But really it's wrangling in a whole bunch of existing ideas (and some new ones like WS-*) and putting it under one umbrella. So if you get that deja-vu feeling you shouldn't feel uneasy.

Rather than spouting out more goop, just follow the links below to get a feel for what's going on:

Pragmatic Programmers

I recently attended a CJUG talk given by Dave Thomas of The Pragmatic Programmers regarding decoupling code. This was a very well put together talk that was able to reach both novice and advanced developers. Based on this talk I am seriously considering purchasing some of the books that they publish.

Thank you Dave for an excellent talk.

Setting a flag in the case of an exception

There are a number of cases where something needs to be done only in the case where an exception is thrown (checked or not). A first pass on this would look like:

    ...
    // allow the user to do something.  If it fails for any reason 
    // the error flag must be set so that further operations are not
    // attempted.
    try
    {
        doSomething();
    } catch(final Throwable t)
    {
        // some exception has been thrown; set the error flag.
        error = true;

        // continue the exception
        throw t;
    }
    ...

The problem with this is that unless the method signature includes throws Throwable you're out of luck. To circumvent this, I do the following:

    ...
    // allow the user to do something.  If it fails for any reason 
    // the error flag must be set so that further operations are not
    // attempted.
    boolean exceptionThrown = true; // set to false -only- if successful
    try
    {
        doSomething();

        // no exceptions were thrown
        exceptionThrown = false;
    } finally
    {
        // if there was an exception thrown (exceptionThrown will have
        // been set to false if an exception was -not- thrown) then set
        // the error flag.
        if(exceptionThrown)
            error = true;
        /* else -- there was no exception thrown */
    }
    ...

Are there any better techniques out there or is this acceptable?
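
One possible alternative is sketched below. It assumes that the guarded code declares no checked exceptions, in which case only RuntimeException and Error can escape and both can be rethrown without touching the method signature. The class and method names are made up for illustration.

```java
class ErrorFlagDemo {
    private boolean error = false;

    /** Runs the task; on any failure sets the error flag and rethrows. */
    void guarded(Runnable task) {
        try {
            task.run();
        } catch (RuntimeException e) {
            error = true;   // unchecked exception: flag it and continue the exception
            throw e;
        } catch (Error e) {
            error = true;   // errors (e.g. OutOfMemoryError) are flagged the same way
            throw e;
        }
    }

    boolean hasError() {
        return error;
    }

    /** Convenience for trying it out: reports the flag after running the task. */
    static boolean flagAfter(Runnable task) {
        ErrorFlagDemo demo = new ErrorFlagDemo();
        try {
            demo.guarded(task);
        } catch (Throwable ignored) {
            // swallowed here only so that the flag can be inspected
        }
        return demo.hasError();
    }

    public static void main(String[] args) {
        System.out.println(flagAfter(new Runnable() {
            public void run() { throw new IllegalStateException("boom"); }
        }));
        System.out.println(flagAfter(new Runnable() {
            public void run() { /* succeeds */ }
        }));
    }
}
```

The finally-based version above still has the advantage of working unchanged when the guarded code does declare checked exceptions.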

MBeans

Some quick notes on JMX and XML descriptors.

I don't see anything about standardizing the XML format which is very surprising. Personally, I think the XMBean looks the most palatable.

Currently there is only XDoclet support for XMBean. Modeler 1.1 mentions future XDoclet support. JMX is one of the few "Rob approved" XDoclet uses since it is not a "let's use a new technology everywhere it could possibly be applicable and more often than not, not applicable" case (we'll save that rant for another day).

Update (August 13th)

My XDoclet statements above may be a bit misleading. The JMX XDoclet task will write out standard JMX interfaces (which is very convenient). It will also write out XMBean and JBoss <servicefile>-service.xml files (along with a few other things). I don't want to give the impression that there is no XDoclet support for standard MBeans. And yes, I'm confusing XDoclet tag support with XDoclet Ant task support, but to me and the way I use them, they're completely coupled and without one, the other is uninteresting.

Input / OutputStream NIO wrapper to facilitate Java 1.4 SSL

As was alluded to in the main NIO and SSL entry, I have made convenience code available at:

    http://www.realityinteractive.com/software/oss/index.html

Refer to the release notes for information about what has been made available. If you have any comments or questions, just post a comment and I will respond as soon as possible.

It is not mentioned in the source or readme (I will rectify this shortly) that the intention for the conversion is specifically for long running clients. No thought has been given to "fast attack" clients or servers (e.g. HTTP).

Performance results are available through the following links:

Link-back to main entry: NIO and SSL.

NIO CharsetDecoder

I am using an NIO CharsetDecoder to convert from bytes to chars in a UTF-8 environment. I received the following CoderResult error:

MALFORMED[1]

OK, that's helpful. After a little code spelunking I determined that this means that the error is "malformed" (pretty obvious) and the length is "1" (not so obvious).

What's interesting about the CoderResult is that not only does it not fit any other paradigm used in the SDK but telling me the length of the erroneous input is, for all intents and purposes, useless. What would have been more helpful is to have included the position in the input buffer at which the malformed result occurred. Luckily CharsetDecoder.decode() advances the buffers as it reads so that you can use its current position as a guide (I should point out that this is mentioned in decode()'s javadoc).

Now I just need to determine why bytes that are supposedly UTF-8 have a value of -82. Uuugh!

Since my problem is clearly not on my end, I have added:

decoder.onMalformedInput(CodingErrorAction.REPLACE);

to circumvent the problem. This will use the CharsetDecoder's replacement value to replace any malformed characters.
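
The behaviour described in this entry can be reproduced with a small sketch. The input bytes are made up for illustration: 'A', a lone byte 0xAE (which is -82 as a signed byte and is not valid on its own in UTF-8), and 'B'. The class and method names are also hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;

class MalformedInputDemo {
    // 'A', then a lone 0xAE (-82 as a signed byte), then 'B'. Illustrative only.
    static final byte[] INPUT = { 0x41, (byte) 0xAE, 0x42 };

    /** Decodes with REPORT and describes where decoding stopped. */
    static String report() {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(INPUT);
        CharBuffer out = CharBuffer.allocate(INPUT.length);
        CoderResult result = decoder.decode(in, out, true);
        // The input buffer's position is left at the start of the malformed bytes.
        return result + " at input position " + in.position();
    }

    /** Decodes with REPLACE so malformed bytes become the replacement string. */
    static String replace() {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(INPUT);
        CharBuffer out = CharBuffer.allocate(INPUT.length);
        decoder.decode(in, out, true);
        decoder.flush(out);
        out.flip();   // switch the output buffer from writing to reading
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(report());
        System.out.println(replace());
    }
}
```

With REPORT, the result's length plus the input buffer's position is enough to locate the bad byte; with REPLACE, the decoder's replacement string takes its place in the output.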

AOP

There has been a lot of press around Aspect-Oriented Programming (AOP) and Software Development (AOSD). Every time I read an article such as this one the QA guy in me shudders uncontrollably. How can I possibly reconcile the risk associated with AOP with the benefits that it is purported to provide? Also, given the inherent decoupled nature of AOP from the actual code (using, for example, deployment time AOP or byte-code based AOP), how can one effectively perform change management?

Recently, I attended a JBoss discussion in hopes that it would quell some of my AOP concerns. Instead, the exact opposite occurred. Scott Stark managed to scare the bejesus out of me with transactions and protocol concerns being injected at deployment.

  • How in the world can I test and certify a single deployment of my application if significant and complex components are deployment specific?
  • Can I repackage this deployed application after testing and certification so that I'm guaranteed my clients will receive the same application?
  • How can I debug a stack trace that I get back from a client?
  • How can I reproduce the client's environment in my test lab?

I know that these "advances" provided by AOP sound great to the trench developer (to which Mr. Stark was directing his discussion) that would normally have to struggle to create this functionality but there are clearly maintenance concerns with these approaches that have yet to be addressed.

Rickard Öberg voices some of my current concerns but unfortunately, like most developers, he limits it to "testing". Testing isn't the only concern; it's the full product life-cycle. I typically associate a 5 to 1 ratio of maintenance and debugging time to initial development time on any piece of complex code (where I will leave complex undefined here) throughout its lifecycle. If AOP is only addressing the "1" part of that ratio while increasing the "5" part then that's pretty crappy!

This thread (based on Rickard Öberg's blog entry) has some interesting insights. Do check other months for follow ups to the thread or related threads. [The AOSD links go down from time to time.]

People have spent a good deal of time claiming the programmatic benefits of AOP, but now it is time to start looking forward at debugging, maintaining, changing and growing AOP based code.

Contracts

I'm interested in an SOP (service oriented platform) for some of the work that I'm currently doing. It would make my life much easier if there was a container with which I could register my services that would take care of lifecycle concerns.

After doing a little research to see what's going on out there I started looking at JBoss's org.jboss.system.Service interface as well as their org.jboss.deployment.Deployer. This is the Service interface:

/**
 * The Service interface.
 */
public interface Service
{
   /**
    * create the service, do expensive operations etc 
    */
   void create() throws Exception;
   
   /**
    * start the service, create is already called
    */
   void start() throws Exception;
   
   /**
    * stop the service
    */
   void stop();
   
   /**
    * destroy the service, tear down 
    */
   void destroy();
}

(The above code is available under the LGPL.)

Do you notice anything missing from the above interface? What's the threading contract?!? Should start() start its own thread if necessary? Does the container provide a thread to start() so that it can manage the lifecycle better and ensure that a faulty start() would not block the entire infrastructure? Etc. After a few minutes of code spelunking I discovered that it's just simply undefined (the assumption is that it's the first case).

I'll spare everyone the rant and I will just say: Please document the complete contract on important interfaces. When you write javadocs, ask yourself what someone who has never seen the code would need to know. Attempt to place yourself into their shoes and you will likely end up with more useful javadocs.

More NIO depression

In my pursuit of a 1.4 NIO + SSL solution I had a momentary glimmer of hope in SSLServerSocket.getChannel(). This would allow me to register an accept Selector to watch for connections and then I could use the SSL server socket to accept them. Unfortunately, the javadocs for getChannel() read:

A server socket will have a channel if, and only if, the channel 
itself was created via the ServerSocketChannel.open() method. 

This was confirmed with a trivial test. At first I thought that I was cut off at the knees. I now believe that I have been cut off at the torso.

I should mention that because of java.net.ServerSocket.accept():

IllegalBlockingModeException - if this socket has an associated 
channel, and the channel is in non-blocking mode.

I would have been screwed in any case but at least getting at the channel would have made me feel better.
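
The "trivial test" can be sketched roughly as follows. This is not the original test; the class and method names are made up, and it assumes that the default SSLServerSocketFactory of the running JRE can create an unbound server socket.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.ServerSocketChannel;
import javax.net.ssl.SSLServerSocketFactory;

class GetChannelDemo {
    /** A plain, unbound server socket created directly has no channel. */
    static boolean plainHasChannel() throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            return socket.getChannel() != null;
        } finally {
            socket.close();
        }
    }

    /** An SSL server socket comes from a factory, not from a channel. */
    static boolean sslHasChannel() throws IOException {
        ServerSocket socket = SSLServerSocketFactory.getDefault().createServerSocket();
        try {
            return socket.getChannel() != null;
        } finally {
            socket.close();
        }
    }

    /** Only a socket obtained from ServerSocketChannel.open() has a channel. */
    static boolean nioHasChannel() throws IOException {
        ServerSocketChannel channel = ServerSocketChannel.open();
        try {
            return channel.socket().getChannel() != null;
        } finally {
            channel.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain: " + plainHasChannel());
        System.out.println("ssl:   " + sslHasChannel());
        System.out.println("nio:   " + nioHasChannel());
    }
}
```

None of the sockets are bound, so the sketch does not need a free port or a network.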

Link-back to main entry: NIO and SSL.

Try - finally performance problems

This blog entry mentions serious performance concerns regarding try - finally blocks. I have yet to do any experimenting on my own but if this is true, then that sucks!

The angers of not having a common interface

It boils the blood that the pair javax.net.ssl.SSLSocket and javax.net.ssl.SSLServerSocket as well as the pair javax.net.ssl.SSLSocketFactory and javax.net.ssl.SSLServerSocketFactory do not have common interfaces. You have to have separate and completely identical code to configure each socket type as well as set the enabled cipher suites.

These interface-type defects and inconsistencies are common throughout the package hierarchy. I was hoping that the next major release of Java would make a concerted effort to clean these up but it looks like that's not going to happen. Phooey!

Note to self: file RFE on Java bug parade for these interfaces.

Link-back to main entry: NIO and SSL.

Don't forget to set the cipher suite!

I was attempting to use a vanilla SSL server and client socket (such as outlined in this article) but kept getting the dreaded:

javax.net.ssl.SSLException: No available certificate corresponds 
to the SSL cipher suites which are enabled.

The usual searches turned up a million posts about junk I already knew. The JSSE ref guide is great for people that already know what they're doing and is therefore self deprecating.

The long and short of it is that if you use a default SSLServerSocketFactory and create a socket then you must have an anonymous cipher suite installed. For example:

final SSLServerSocketFactory sslSocketFactory = 
    (SSLServerSocketFactory)SSLServerSocketFactory.getDefault();
final SSLServerSocket sslServerSocket = 
    (SSLServerSocket)sslSocketFactory.createServerSocket(port);

// use an anonymous cipher suite so that a KeyManager or TrustManager
// is not needed
// NOTE:  this assumes that the cipher suite is known.  A check -should-
//        be done first.
final String[] enabledCipherSuites = { "SSL_DH_anon_WITH_RC4_128_MD5" };
sslServerSocket.setEnabledCipherSuites(enabledCipherSuites);

And unless you do the same on the client side, you will receive the following:

javax.net.ssl.SSLHandshakeException: no cipher suites in common
javax.net.ssl.SSLHandshakeException: 
    Received fatal alert: handshake_failure
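
The check that the comment in the snippet says -should- be done first could look something like the sketch below. The class and the `isSupported()` helper are hypothetical; `getSupportedCipherSuites()` is the real factory method. Newer runtimes may not offer this anonymous suite at all, which is exactly the situation the check is meant to catch.

```java
import java.util.Arrays;
import javax.net.ssl.SSLServerSocketFactory;

class CipherSuiteCheck {
    /** Returns true if the named suite appears in the list of supported suites. */
    static boolean isSupported(String[] supportedSuites, String suite) {
        return Arrays.asList(supportedSuites).contains(suite);
    }

    public static void main(String[] args) {
        SSLServerSocketFactory factory =
                (SSLServerSocketFactory) SSLServerSocketFactory.getDefault();
        String suite = "SSL_DH_anon_WITH_RC4_128_MD5";

        // Only call setEnabledCipherSuites() with the suite if this prints true;
        // enabling an unsupported suite throws IllegalArgumentException.
        System.out.println(suite + " supported: "
                + isSupported(factory.getSupportedCipherSuites(), suite));
    }
}
```

The same check applies on the client side using the client socket factory's supported suites.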

Link-back to main entry: NIO and SSL.

Debugging JSSE

To aid in debugging JSSE (J2SDK 1.4 and greater) use:

-Djavax.net.debug=all

The usefulness of this cannot be expressed in mere words.

Your Java security IQ

While looking for the current paradigms on storing passwords in Java I stumbled on this Security IQ Test. It's a bit thin but at least you can get a feel for if you know what's going on at a fundamental level. Perhaps the best part is the answers provided after you get your score.

This is also an interesting thread.

The question that I currently have is: what is the correct technique for obtaining passwords from a configuration file? Currently I store system passwords in an encrypted properties file. Do I have to read and decrypt the properties file each time I need the passwords? I don't think that just reading the passwords once on start makes sense (for the same reason that you use char[] over String for storing the password).

Array .clone() or System.arraycopy()?

I was doing some work this morning with passwords stored as char arrays when I reverted to my C upbringing and wrote the following:

final char[] passwordCopy = new char[password.length];
System.arraycopy(password, 0, passwordCopy, 0, password.length);

I stopped myself and said: Hey! Why am I doing that when arrays have a convenient .clone() method on them?!? I rewrote the code to be the following:

final char[] passwordCopy = (char[])password.clone();

The QA side of me really likes the latter approach as it has a much lower risk associated with it (i.e. there are fewer ways to make a mistake), but the performance side said Whoa! Let's take a look at performance first!

I was going to write up a quick test but the lazy side of me went to Google first. This page has a nice test and performance numbers. The shocking result is System.arraycopy() vs. a for loop. Based on a few JVM's I tried (all on win32) I get the following normalized results:

          .clone():  2.26
System.arraycopy():  1.27
        for-loop():  1.00
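
For anyone who wants to try the comparison themselves, here is a crude sketch of the three copy techniques plus a naive timing loop. All names are made up for illustration, the timing is not a proper benchmark (no warm-up control, and the JIT is free to do as it pleases), and the numbers will differ from JVM to JVM.

```java
import java.util.Arrays;

class ArrayCopyComparison {
    static char[] viaClone(final char[] src) {
        return src.clone();
    }

    static char[] viaArraycopy(final char[] src) {
        final char[] dst = new char[src.length];
        System.arraycopy(src, 0, dst, 0, src.length);
        return dst;
    }

    static char[] viaLoop(final char[] src) {
        final char[] dst = new char[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    /** True if all three techniques produce an equal copy of src. */
    static boolean allAgree(final char[] src) {
        return Arrays.equals(viaClone(src), src)
            && Arrays.equals(viaArraycopy(src), src)
            && Arrays.equals(viaLoop(src), src);
    }

    public static void main(String[] args) {
        final char[] password = "not a real password".toCharArray();
        final int rounds = 1000000; // arbitrary
        long checksum = 0;          // consumed below so the copies are not dead code

        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) checksum += viaClone(password)[0];
        System.out.println(".clone():           " + (System.nanoTime() - start));

        start = System.nanoTime();
        for (int i = 0; i < rounds; i++) checksum += viaArraycopy(password)[0];
        System.out.println("System.arraycopy(): " + (System.nanoTime() - start));

        start = System.nanoTime();
        for (int i = 0; i < rounds; i++) checksum += viaLoop(password)[0];
        System.out.println("for-loop:           " + (System.nanoTime() - start));

        System.out.println("checksum (ignore): " + checksum);
    }
}
```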

Shocking revelation

I was talking with another developer the other day and he revealed an interesting piece of information: he believed that comments and code style were a matter of personal choice. To me this was like believing that the world was flat and then having someone say that it's round. All of that time I spent perplexed wondering why I couldn't see things from long distances over a "flat" plane finally became crystal clear. Learning that developers may believe comments and code style are a matter of personal choice has allowed me to understand and put into perspective a number of other conversations that I have had with developers.

Hypothesis 1.1

Code comments and style are a function of quality.

This is currently my running hypothesis that I am attempting to prove through empirical evidence. My non-scientific research has shown it to be true. The difficulty in firmly establishing quantitative evidence for this hypothesis stems from the fact that, for example, diligently and effectively commenting code intrinsically changes one's approach to coding. In other words, you cannot separate out the process of adding and maintaining comments without changing the nature of how one programs.

Self commenting code

In an attempt to dispel the "I don't need to comment my code since if the code is written clearly enough it should describe itself" theory, I present the following:

Definition 1.1

The purpose of code comments is to present intent.

Definition 1.2

A software defect is a deviation from intent. This definition does not make a distinction between implicit (i.e. expected but not defined in a requirement) and explicit (i.e. defined in a requirement) intent.

Theorem 1.1

Code is incapable of sufficiently presenting desired intent.

Proof   I will provide an indirect proof of Theorem 1.1 by assuming "code is capable of sufficiently presenting desired intent" and obtaining a contradiction. Choose a section of code that contains defects. By Definition 1.2 this section of code does not correctly describe the intent. QED

Notice that Theorem 1.1 contains the word desired. This is necessary to distinguish between the intent that a section of code with defects presents and the intent that is required. Also notice that Theorem 1.1 contains the word sufficiently. Later entries will expound on this in more depth but for now it will suffice to say that code utilizing crafty programming may obfuscate intent.

I do acknowledge that for those who use the "I don't need to comment my code since if the code is written clearly enough it should describe itself" to mean "I'm too cool / talented / whatever to comment" or to cover for "I'm too lazy to comment" that my argument will have fallen on deaf ears. I'm getting to you next!

Pipe selector problems

As if the previous java.nio.channels.Pipe inconsistencies weren't enough, on both Linux and Windows it appears that you have to drain a pipe's source before you can write to the sink again. Again, this is a case that falls under the system-dependentness mentioned in Pipe's javadoc. Poopy I say!

Update

After I determined that the write selector was lying to me on Linux, and after poring over Stevens' Advanced Programming in the UNIX Environment to refresh my memory on pipe() (which Linux uses), it turns out that the pipe does not require drain-then-fill. Again, the write selector is lying (which appears to be a known "issue" with Linux's select() on a pipe()). Windows remains drain-then-fill.

Update II

It seems that the results of drain-then-fill are dependent on how the sink is filled. If the sink is filled one byte at a time, then neither Linux nor Windows is drain-then-fill but Linux will still have an inaccurate write selector. If the sink is filled in 8k chunks, then Windows will exhibit a drain-then-fill requirement; Linux is never drain-then-fill.

Closing thoughts

Given that Linux's write selector is not accurate and always returns "none available" when there is data in the pipe (but will always return "go ahead" when the pipe is empty), it is nearly impossible to generically replace a file or network channel with a Pipe.SinkChannel. Rob angry good!

Link-back to main entry: NIO and SSL.

More Pipe inconsistencies

The number of bytes written to a java.nio.channels.Pipe's sink at a time will determine the apparent size of the pipe (i.e. the number of bytes that can be written before the pipe blocks). If one byte is written at a time, the size of the pipe is 33012 for Windows or 1 (yes, 1) for Linux. If two bytes are written at a time the sizes are 33012 and 2. Four bytes gives 33012 and 4. Seeing a trend?

Windows will eventually give you the known value of 32768. With Linux, you need to specify more bytes than the known buffer size (4096) to get the buffer size.

The code used is as follows:

// create a Pipe and retrieve its sink and source
// NOTE:  the sink and source are SelectableChannels
final Pipe pipe = Pipe.open();
final WritableByteChannel sink = pipe.sink();
final ReadableByteChannel source = pipe.source();

// set the sink to non-blocking and create and register a write 
// Selector on it.  The Selector is used to determine when the sink 
// is "full".
// NOTE:  the cast is required since there is no common super-type
//        for selectable + readable / writable
((SelectableChannel)sink).configureBlocking(false/*non-blocking*/);
final Selector writeSelector = Selector.open();
((SelectableChannel)sink).register(writeSelector, SelectionKey.OP_WRITE);

// continue to write to the sink until it is "full"
// NOTE:  the sanity upper-bound is used to ensure that, in a remote
//        case, a sink is not infinite
final ByteBuffer writeBuffer = ByteBuffer.allocate(BUFFER_SIZE);
for(int i=0; i<BUFFER_SIZE; i++)
    writeBuffer.put((byte)(i & 0xFF)); // arbitrary
writeBuffer.flip();
boolean isInfinite = true;  // set to false if limit found on write
int numberOfBytesWritten = 0;
for(int i=0; i<UPPER_BOUND/*sanity*/; i++)
{
    // ensure that data can be written
    // NOTE:  selectNow() is used so that it does not block
    if(writeSelector.selectNow() > 0)
    {
        // clear the selected keys (required)
        writeSelector.selectedKeys().clear();

        // write the data 
        // NOTE:  the actual data written is arbitrary
        numberOfBytesWritten += sink.write(writeBuffer);
        writeBuffer.rewind();
    } else
    {
        // the sink is full.  Flag that a limit was found and break 
        // out of loop.
        isInfinite = false;
        break;
    }
}

UPPER_BOUND is some large number (say 10000) and BUFFER_SIZE has values as described above.

And, no, changing selectNow() to something like select(1000L/*1s*/) doesn't matter (for those worried about concurrency problems).

I should mention that Pipe's javadoc does state platform inconsistencies and that this case falls under that umbrella. It's one thing to read a javadoc. It's another to actually see those inconsistencies first hand.

An interesting Linux tidbit: if the following code is added after the code listed above with a BUFFER_SIZE less than 4096 then numberOfBytesWritten will be non-zero.

// attempt to write more to the sink even though we shouldn't be 
// able to
writeBuffer.limit(1/*writes one byte*/);
numberOfBytesWritten = sink.write(writeBuffer);
writeBuffer.rewind();

It seems that the Linux write selector is lying to us. This is not the case on Windows.

Link-back to main entry: NIO and SSL.

Pipe inconsistencies

Here are some java.nio.channels.Pipe trivia questions for you:

Q:What happens when you close the sink of a pipe?

For all those that answered "The source returns -1 on read indicating an end-of-stream", you get a gold star! Now on to a toughie.

Q:What happens when you close the source of a pipe?

Belt it out! "Writing to the sink will throw java.io.IOException." Well, you're half right. And what does half right mean? Yup. You're half wrong. The correct answer is "It depends on the platform."

  • Linux: any write will cause java.io.IOException: Broken pipe to be thrown.
  • Windows: any number of writes can be performed before java.io.IOException: An established connection was aborted by the software in your host machine is received (WinSock error 10053).
  • Other: as of yet untested

The concern is the Windows case where a write (or multiple writes) can be performed successfully.
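
Both questions can be poked at with a sketch like the following. The class and method names are hypothetical. The first method checks the "gold star" answer (read reports end-of-stream after the sink is closed). The second simply counts how many one-byte writes succeed after the source is closed, and since that is the platform-dependent part, no particular count is claimed here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channel;
import java.nio.channels.Pipe;

class PipeCloseDemo {
    /** Closes the sink, then reads from the source; returns what read() reports. */
    static int readAfterSinkClose() {
        try {
            final Pipe pipe = Pipe.open();
            pipe.sink().close();
            try {
                return pipe.source().read(ByteBuffer.allocate(16));
            } finally {
                closeQuietly(pipe.source());
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    /** Closes the source, then counts one-byte writes that succeed (capped at maxWrites). */
    static int writesAfterSourceClose(final int maxWrites) {
        final Pipe pipe;
        try {
            pipe = Pipe.open();
            pipe.source().close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        int successful = 0;
        try {
            for (int i = 0; i < maxWrites; i++) {
                pipe.sink().write(ByteBuffer.wrap(new byte[] { 42 }));
                successful++;
            }
        } catch (IOException expected) {
            // when (or whether) this happens is the platform-dependent part
        } finally {
            closeQuietly(pipe.sink());
        }
        return successful;
    }

    private static void closeQuietly(final Channel channel) {
        try {
            channel.close();
        } catch (IOException ignored) {
            // nothing useful to do in a demo
        }
    }

    public static void main(String[] args) {
        System.out.println("read after sink close:     " + readAfterSinkClose());
        System.out.println("writes after source close: " + writesAfterSourceClose(10));
    }
}
```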

Why isn't there a close() on Pipe itself? And why doesn't closing either the sink or the source automatically and synchronously close the other? I just don't know. Part of Sun's grand scheme to render its developers into blubbering masses of goo? That's just anger talking again. I'll just chalk it up to lack of foresight that has caused such things as the SSL + NIO problem.

Expect to see a defect report / RFE in the bug parade on this topic.

A big hearty thanks goes out to Igor for doing the Linux testing!

Link-back to main entry: NIO and SSL.

Momentary elation: Channels

If you've been following my lamenting over NIO and SSL then you can probably guess that I've made it to step 5 (acceptance). I had a moment of elation this morning when I found another one of those obscure NIO classes: Channels. Channels does various conversions between traditional IO streams and NIO channels. In theory I could take SSLSocket's getInputStream() and get a ReadableByteChannel. One problem: ReadableByteChannel is not selectable. Oh well, back to the drawing board.
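
The dead end can be seen in a few lines. This is only a sketch: the class name is made up, and a ByteArrayInputStream stands in for SSLSocket's getInputStream() so that the example runs without any SSL setup.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;

class ChannelsSelectability {
    /** Wraps a plain stream with Channels and reports whether the result is selectable. */
    static boolean wrappedStreamIsSelectable() {
        // ASSUMPTION: any InputStream will do; this one stands in for the SSL stream
        final ReadableByteChannel channel =
            Channels.newChannel(new ByteArrayInputStream(new byte[0]));
        return channel instanceof SelectableChannel;
    }

    /** For contrast: a Pipe's source is both readable and selectable. */
    static boolean pipeSourceIsSelectable() {
        try {
            final Pipe pipe = Pipe.open();
            try {
                final ReadableByteChannel source = pipe.source();
                return source instanceof SelectableChannel;
            } finally {
                pipe.sink().close();
                pipe.source().close();
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wrapped stream selectable: " + wrappedStreamIsSelectable());
        System.out.println("pipe source selectable:    " + pipeSourceIsSelectable());
    }
}
```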

First grieving and now momentary elation followed by a good swift kick in the gut. Doesn't Sun care about the unstable mental state all of this has left me in?!?

Side note: isn't it annoying that there's no interface that describes a selectable, readable / writeable channel? In other words, there's no common way to describe a Pipe.SourceChannel and SocketChannel. Ppfth!

NIO and SSL

I have worked with NIO quite a bit in the past. It has a high activation energy but once you're over that initial hump, it's pretty smooth sailing. I find it difficult not to write non-blocking IO these days.

I recently wrapped up a client / server prototype and I am just beginning to get it ready for a "real world" test. The first thing that I thought of was SSL. So like all good programmers, I brought up Google and typed "NIO SSL". Much to my chagrin I found that it is not possible to combine NIO, Selectors and SSL. My first thought was "This must be from the initial 1.4 release. There's no way that in three years Sun would let NIO rot without SSL.", so I continued my search.

To make a painful story short, there is no information regarding SSL ever being a possibility with NIO in 1.4. 1.5 will introduce an SSLEngine to solve the problem, but again, nothing is said about whether this will be made available for 1.4 users.

For those in the same boat as I am, there are solutions for using Selectors with SSL such as wrapping a standard stream with a Pipe. The problem with any wrapped solution is that the connection (which is done with a standard socket) is blocking. Non-blocking connections are one quarter of the problem that you're typically trying to solve with NIO (the other three being read, write and accept).

I'll spare you the Sun rant but let's just say that I'm less than impressed with their decisions to not provide SSL with NIO and to, for all intents and purposes, cover it up. When you read the 1.4 datasheet about NIO and then about JSSE, you get the impression that all is just sunshine, rainbows and lollipops. How can one think that it's acceptable to provide developers with the ability to "write ultra-scalable, high-performance server applications" without parity with existing sockets? And then, in 3 years, not make up for the discrepancy?

If you're into conspiracy theories, what do you think about the missing RFE for SSL + NIO? My tin foil hat has been firmly placed on my head!

Follow up:

I've been doing a lot of poking around to see if there are freeware implementations of JSSE that support NIO. There aren't. I did find this interesting link. Given all of my ramblings about features vs. quality, if Sun didn't ship SSL with NIO due to quality risks then I can buy that. If Sun hasn't shipped an updated JSSE for NIO due to pervasive changes required then I can buy that too. The length of time between releases is just hard to swallow.

As you may be able to tell, I have moved onto phase three of the Kubler-Ross 5 stages of grief. The initial entry was written while firmly in phase two. I fully expect to be at phase five by mid-day tomorrow and I will begin to find an acceptable solution to my current problems.

Minor ZipFile gotcha

java.util.zip.ZipFile and its subclass java.util.jar.JarFile have a minor gotcha when using getEntry(String): leading slashes are not ignored. For example:

    final JarFile jarFile = new JarFile("rt.jar");
    System.out.println(jarFile.getEntry("/java/lang/Object.class"));
    System.out.println(jarFile.getEntry("java/lang/Object.class"));

will return:

    null
    java/lang/Object.class

In general, this is not a big deal. But when manually parsing URLs (don't ask) such as:

    jar:file://rt.jar!/java/lang/Object.class

it can bite you in the butt.
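
One way to avoid getting bitten is to strip leading slashes before the lookup. The sketch below uses hypothetical names throughout, and builds a throwaway zip in a temp file rather than relying on rt.jar being around.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipEntryLookup {
    /** Removes any leading '/' characters from an entry name. */
    static String stripLeadingSlashes(final String name) {
        int start = 0;
        while (start < name.length() && name.charAt(start) == '/') {
            start++;
        }
        return name.substring(start);
    }

    /** Like ZipFile.getEntry(String) but tolerant of leading slashes. */
    static ZipEntry getEntryLenient(final ZipFile zip, final String name) {
        return zip.getEntry(stripLeadingSlashes(name));
    }

    /** Returns { strict lookup found it, lenient lookup found it } for a slash-prefixed name. */
    static boolean[] demo() {
        try {
            final File tmp = File.createTempFile("zip-gotcha", ".zip");
            tmp.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(tmp))) {
                out.putNextEntry(new ZipEntry("java/lang/Object.class"));
                out.write(new byte[] { 1, 2, 3 }); // arbitrary content
                out.closeEntry();
            }
            try (ZipFile zip = new ZipFile(tmp)) {
                final String name = "/java/lang/Object.class";
                return new boolean[] {
                    zip.getEntry(name) != null,
                    getEntryLenient(zip, name) != null };
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        final boolean[] found = demo();
        System.out.println("strict:  " + found[0]);
        System.out.println("lenient: " + found[1]);
    }
}
```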

J2SE 1.5 mother lode

While doing my standard early morning web-walk I stumbled on a mother lode of J2SE 1.5 information. JDiff isn't necessarily 1.5 specific, but it allows you to see all changes that occurred in the API. JDiff is one of those things you wish you stumbled on years ago. While perusing the diff on java.lang.Thread I noticed that setUncaughtExceptionHandler() and setDefaultUncaughtExceptionHandler() have been added. If you've ever used java.lang.ThreadGroup's uncaughtException() then this addition to Thread will be old hat. Regardless, this is a welcome change. Javalobby has a decent write up on these two new methods.
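
A minimal sketch of the per-thread handler follows. The class name and the exception message are made up, and the lambda syntax is from later Java releases than 1.5 (an anonymous Thread.UncaughtExceptionHandler does the same job there).

```java
class UncaughtHandlerDemo {
    /** Starts a thread that throws, and returns the message seen by its handler. */
    static String runAndCapture() {
        final String[] seen = new String[1];
        final Thread worker = new Thread(() -> {
            throw new IllegalStateException("boom");
        });
        // the handler runs in the dying thread, before the thread terminates
        worker.setUncaughtExceptionHandler((thread, ex) -> seen[0] = ex.getMessage());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runAndCapture());
    }
}
```

Thread.setDefaultUncaughtExceptionHandler() takes the same handler type but applies to any thread that does not have a handler of its own.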

Does web programming make for lazy developers?

Given the plethora of "enabling technologies" such as J2EE, does web programming (specifically, tier two -- business logic) make for a lazy developer?

In the recent past, I was prototyping a web application using Spring, Struts, and a few other technologies sprinkled in for good measure. After a few weeks of stateless whos-its and whats-its, injecting transaction doo-dads, and so on, I moved on to a project involving NIO, wire protocols, and high degrees of concurrency. Getting back into the swing of worrying about multi-threaded issues, object creation weight, and the like was not a trivial exercise.

Let me stress that I'm not referring to API nuances. I'm speaking to the vastly different sets of skills that need to be employed. I felt that a much larger degree of care and awareness was needed when dealing with "systems programming". The web technologies on the other hand made me feel less concerned: "JTA will handle that for me so I don't need to worry."

Don't get me wrong, JTA, JMS, JNDI, etc. are wonderful things that eliminate much of the tedium and start-from-scratch'ness and allow projects to get done at the current break-neck pace. (I admit that I am making the overgeneralization that enabling technologies and web development are synonymous.) But does all of this "simplification" provided by enabling technologies allow developers to go lax?

... or has all of the hype and marketing surrounding these enabling technologies simply obscured the diligence required?

Limit the scope of try-catch blocks

How many times have you seen the following?

public void myMethod(...)
    throws ...
{
    try {
        ... entire method is here ...
    } catch(SomeException se) {
        ....
    }
}

Consider the case where "entire method is here" is more than a dozen lines or so, with a number of statements that throw SomeException lumped together.

Limiting the scope of try-catch blocks forces the developer to think about each exception that can be thrown and how to appropriately handle them. I immediately associate a higher risk (as in QA) with a section of code that is in a single large try-catch block. It is likely that there are statements in that block that throw the caught exception that were never considered. This is a similar case as presented in That nasty java.io.IOException where unexpected exceptions bubble out due to a throws clause.

A common case to watch out for is one where a try-catch block surrounds a loop or vice-versa. The tendency with a large try-catch block is to miss cases where the exception, if properly caught in a smaller try-catch block, would continue rather than break or vice-versa.
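
The loop case is easy to demonstrate. In this sketch (hypothetical names, with NumberFormatException standing in for SomeException) the same input gives different results depending on whether the try-catch wraps the loop or sits inside it.

```java
import java.util.ArrayList;
import java.util.List;

class TryScopeDemo {
    /** try-catch around the loop: the first bad value silently ends the whole loop. */
    static List<Integer> parseWide(final String[] inputs) {
        final List<Integer> parsed = new ArrayList<Integer>();
        try {
            for (String input : inputs) {
                parsed.add(Integer.parseInt(input));
            }
        } catch (NumberFormatException e) {
            // everything after the bad value is lost; was that intended?
        }
        return parsed;
    }

    /** try-catch inside the loop: a bad value is skipped and the loop continues. */
    static List<Integer> parseNarrow(final String[] inputs) {
        final List<Integer> parsed = new ArrayList<Integer>();
        for (String input : inputs) {
            try {
                parsed.add(Integer.parseInt(input));
            } catch (NumberFormatException e) {
                // deliberate choice: skip this value and carry on
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        final String[] inputs = { "1", "x", "3" };
        System.out.println("wide:   " + parseWide(inputs));
        System.out.println("narrow: " + parseNarrow(inputs));
    }
}
```

Neither behavior is right in general. The point is that the narrow version forces the choice to be made on purpose.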

There are cases where a large try-catch block makes sense. Through comments it is a simple matter to indicate the reason why such a choice was made (see Code Comments for more information) thereby reducing the risk associated with the block.

That nasty java.io.IOException

I constantly run across code that looks like:

public class FileReader {
    ...
    /**
     * <p>Reads the file with the specified name and returns the 
     * contents in a {@link java.nio.ByteBuffer buffer}.</p>
     * 
     * @param  filename the name of the file to read
     * @return an allocated (not direct) <code>ByteBuffer</code> 
     *         with the contents of the file
     * @throws IOException if an I/O error occurs
     */
    public ByteBuffer readFile(final String filename)
        throws IOException
    {
        ...
    }
    ...
}

What's wrong with that? you're probably asking yourself. It's even got comments! The title of this entry should give you a little clue. I will spare you the rant and soap box about the proper use of exceptions and attempt to appeal to your common sense: If you, as the developer of the function, couldn't handle or recover from the IOException, what makes you believe that someone calling the function (someone that has little notion of what's actually going on in the function) can do something with it?

The interface or contract that you expose should not break encapsulation. The fact that you (as the developer) have I/O issues to deal with doesn't need to be exposed out to the user. What the user cares about is: did the function succeed or not, and if not, are there cases that they can possibly recover from. A more sane interface might look like the following:

public class FileReader {
    ...
    /**
     * <p>Reads the file with the specified name and returns the 
     * contents in a {@link java.nio.ByteBuffer buffer}.</p>
     * 
     * @param  filename the name of the file to read
     * @return an allocated (not direct) <code>ByteBuffer</code> 
     *         with the contents of the file
     * @throws NoSuchFileException if there is no file with the specified
     *         name
     * @throws ReadFailedException if there was any unrecoverable
     *         problem while reading the file
     */
    public ByteBuffer readFile(final String filename)
        throws NoSuchFileException, ReadFailedException
    {
        ...
    }
    ...
}

This interface throws two exceptions: NoSuchFileException and ReadFailedException. The former is "recoverable" by the user in that they may have specified the filename incorrectly and they can do something about that. The latter tells the user that it just didn't work and there's nothing that they can do about it.
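
As a rough sketch of what might sit behind such an interface: the two exception classes below are hypothetical (the entry only names them), the class is called SimpleFileReader to avoid clashing with java.io.FileReader, and RandomAccessFile is just one of several ways to read the bytes. Note that FileNotFoundException can also mean "it's a directory" or "permission denied", so a real implementation may want to distinguish those.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;

class SimpleFileReader {
    /** Hypothetical: the caller may be able to recover by fixing the name. */
    static class NoSuchFileException extends Exception {
        NoSuchFileException(final String filename, final Throwable cause) {
            super("no such file: " + filename, cause);
        }
    }

    /** Hypothetical: the read failed and the caller cannot do anything about it. */
    static class ReadFailedException extends Exception {
        ReadFailedException(final String filename, final Throwable cause) {
            super("read failed: " + filename, cause);
        }
    }

    public ByteBuffer readFile(final String filename)
        throws NoSuchFileException, ReadFailedException
    {
        // opening is the only step that maps to the "recoverable" exception
        final RandomAccessFile file;
        try {
            file = new RandomAccessFile(filename, "r");
        } catch (FileNotFoundException e) {
            throw new NoSuchFileException(filename, e);
        }

        // everything after a successful open is a plain read failure
        try {
            // NOTE: assumes the file fits in an int-sized array
            final byte[] contents = new byte[(int) file.length()];
            file.readFully(contents);
            return ByteBuffer.wrap(contents);
        } catch (IOException e) {
            throw new ReadFailedException(filename, e);
        } finally {
            try {
                file.close();
            } catch (IOException ignored) {
                // the contents (or the failure) have already been determined
            }
        }
    }

    /** Small helper for the demo: reports which outcome a filename leads to. */
    static String classify(final String filename) {
        try {
            new SimpleFileReader().readFile(filename);
            return "ok";
        } catch (NoSuchFileException e) {
            return "missing";
        } catch (ReadFailedException e) {
            return "failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("no-such-file-for-this-demo.bin"));
    }
}
```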

What's more is that by not throwing IOException the developer is forced to look at each case explicitly and determine how each one should be handled. How many times has a handlable IOException bubbled out of an interface unexpectedly due to the throws clause? Personally, I will leave off the throws clause (or comment it out) while I'm developing a function to ensure that I'm not letting anything slip by. In the later Eclipse milestones you can now select an exception in the throws clause and it will mark each occurrence in the function that throws that exception. This is invaluable.

The next time that you are developing an interface, think about how a user will use that interface. Get into their shoes and think about their concerns. And most importantly, make sure that you're not breaking your own encapsulation.

Perforce and change lists

If you use Perforce, I find it best to start off a new change with "New Changelist" and a rough outline of what I intend to do. This is a nice way of informing others (especially in a decoupled work environment) what you are going to be working on. As I begin to make changes I will "Edit spec" to keep the change list description up to date. This ensures that not only will others be aware of what I am doing but I won't run into the dreaded situation where I don't actually remember all of the changes that I made.

Don't forget to add the added, updated, or deleted files to this changelist as you go.

ByteBuffers, String, and C

I end up doing a lot of marshalling between Java and C over the wire. ByteBuffers are a natural fit for this situation given ByteBuffer.order(ByteOrder) and NIO's selectors.

The problem comes in when dealing with Strings.

There's no ByteBuffer.put(String) but that's OK because there's CharBuffer.put(String). But wait! In Java a char is two bytes. So CharBuffer.put(String) on "bollocks" will return:

0062006f 006c006c 006f0063 006b0073  .b.o.l.l.o.c.k.s

This is all fine and dandy if you're going to another Java application (or something that's commonly double-byte) but when going to vanilla C you're looking for single byte characters.

Your next bet is to try:

final String string = "bollocks";
final ByteBuffer buffer = ByteBuffer.allocateDirect(string.length());
buffer.put(string.getBytes());

This is fine and dandy for most applications. (It should be noted that the default character set is used in the transformation and that unless this code is used in a controlled environment, you may end up getting BufferOverflowException. So it's better to do:

final String string = "bollocks";
final byte[] stringBytes = string.getBytes();
final ByteBuffer buffer = ByteBuffer.allocateDirect(stringBytes.length);
buffer.put(stringBytes);

Or even better yet, explicitly put the charset in String.getBytes(String charsetName).)

So what am I complaining about? Everything seems fine. That's true up to this point. But what if you need to chunk up the string? CharBuffer provides CharBuffer.put(String src, int start, int end) which is ideal except for that problem of double-byte chars. What you actually end up doing is String.getBytes() and then walking over the resulting byte array. This may seem all fine and dandy except for the fact that the whole reason for doing the chunking in the first place is that the string is very large. Using String.getBytes() will cost you about three times the memory (the original string, the string as a byte array and the ByteBuffer into which you are writing).

If you're NIO Charset savvy then you may have said to do:

final String string = "bollocks";
final Charset charset = Charset.forName("UTF-8");
final ByteBuffer buffer = charset.encode(string);

This kills lots of birds with a single stone and is very tight code. ("UTF-8" must be supported by Charset so there's no need to check.) The parallel code for chunking is similar:

final String string = "bollocks";
final Charset charset = Charset.forName("UTF-8");
final CharBuffer charBuffer = CharBuffer.wrap(string, 0, 3);
final ByteBuffer buffer = charset.encode(charBuffer);

(where the loop over the remaining chars is not shown). Again, this is nice code that solves the problem. So what am I still complaining about? Well, it's better on the memory consumption but, even though I know the size of my chunking and can allocate a ByteBuffer of this size, I have to allow it to allocate the buffer for me.

If you really know your java.nio.charset you would suggest:

final String string = "bollocks";
final Charset charset = Charset.forName("UTF-8");
final CharsetEncoder encoder = charset.newEncoder();
final CharBuffer charBuffer = CharBuffer.wrap(string, 0, 3);
final ByteBuffer buffer = ByteBuffer.allocateDirect(3);
final CoderResult encodingResult = encoder.encode(charBuffer, buffer, true/*no more input*/);

This is an "elegant" solution that allows for reuse of the ByteBuffer and fits the bill almost exactly! There is the extra CharBuffer in there that has to suck up space but at least it's limited in size.
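
The loop that the snippets leave out might look something like the sketch below. The names are hypothetical, and two liberties are taken so that it runs stand-alone: a heap ByteBuffer is used instead of a direct one (so its backing array can be read), and a ByteArrayOutputStream stands in for the wire. The chunk size is required to be at least 4 bytes since a single UTF-8 character can need that many.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class ChunkedEncoder {
    /** Encodes string as UTF-8 through one reusable buffer of chunkBytes bytes. */
    static byte[] encodeInChunks(final String string, final int chunkBytes) {
        if (chunkBytes < 4) {
            throw new IllegalArgumentException("chunkBytes must be at least 4");
        }
        final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        final CharBuffer chars = CharBuffer.wrap(string); // wraps, does not copy
        final ByteBuffer chunk = ByteBuffer.allocate(chunkBytes);
        final ByteArrayOutputStream wire = new ByteArrayOutputStream();

        // encode until the input is exhausted, draining the chunk each time it fills
        CoderResult result;
        do {
            result = encoder.encode(chars, chunk, true/*no more input*/);
            if (result.isError()) {
                throw new IllegalArgumentException("cannot encode input: " + result);
            }
            drain(chunk, wire);
        } while (result.isOverflow());

        // let the encoder emit any trailing bytes (a no-op for UTF-8)
        do {
            result = encoder.flush(chunk);
            drain(chunk, wire);
        } while (result.isOverflow());

        return wire.toByteArray();
    }

    /** Moves whatever has been encoded so far to the "wire" and resets the chunk. */
    private static void drain(final ByteBuffer chunk, final ByteArrayOutputStream wire) {
        chunk.flip();
        wire.write(chunk.array(), 0, chunk.limit());
        chunk.clear();
    }

    public static void main(String[] args) {
        final byte[] bytes = encodeInChunks("bollocks", 4);
        System.out.println(bytes.length + " bytes written");
    }
}
```

With a real channel, drain() would become a write of the flipped buffer to the channel, and the buffer could go back to being direct.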

When IDE's are too smart for their own good!

I was cleaning up some JavaDocs yesterday in a large, multi-project code base. The process was getting tedious so I enlisted the help of my old friend GSR (global search and replace). Since I wanted to update java files, text documents and package HTML files, I opted to use * as my wildcard rather than, say, *.java, *.html, *.txt. Boy what a mistake that was.

Everything was going well but then my IDE (Eclipse in this case) started throwing a fit. I was getting AST creation errors all over the place and it seemed as though the world was caving in. I attempted the old tried-and-true technique of software; I restarted the IDE. No go. Same errors.

I was near the point of panic when I took a look at the IDE's log. The first thing I see is java.lang.InternalError and lots of them. Oh crap! I don't even know what that is, but anything that starts with java.lang and ends with Error isn't good. I scroll to the end of the exception line and see: invalid LOC header (bad signature). Egad!

To make a painful story short, it turns out that the GSR was doing replacement in JARs as well as text files. This was corrupting the header and java.util.zip.ZipFile was throwing exceptions to no end. After replacing the JAR files with fresh clean ones all was well.

I want to hand it to the Eclipse people for making the IDE tolerant to the stupidity of the average Joe out there doing his best to muck things up. Sure, I got errors up the wazoo but had I taken a moment to look at what they were really telling me I would have figured out the problem instantly.

Perhaps this should have been titled "When programmers are too smart for their own good!".

Time dependent bugs

I just about pulled my hair out over the weekend on a bug that was time dependent. The code roughly looked like:

final int index = (int)(System.currentTimeMillis() / intervalPerFrame) %
                   numberOfFrames;
final Frame frame = frames[index];

This is perfectly legitimate code and ran just fine a few months ago but was now throwing ArrayIndexOutOfBoundsException since the index was getting set to -2.

So how could a series of positive values return a negative number? Well, it just so happens that on Saturday, January 10th at 7:37:04AM 2004 the time in milliseconds goes from 1073741823999L to 1073741824000L. Divided by an intervalPerFrame of 500, those values become 2147483647 and 2147483648, and the latter wraps around to -2147483648 when cast to an integer.

Lesson learned: be much more careful casting long to int when dealing with times. Also, when your test manager says that he's going to perform "date testing" don't balk since Y2K's already over. There are a lot more issues in dealing with time than just two digit dates.
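
The cast binds tighter than the %, so the quotient is truncated to an int before the modulo is taken. Doing the modulo while still in long, and casting last, avoids the problem. The sketch below uses made-up names, and the interval of 500 ms and the 3 frames are assumed values picked to reproduce a -2.

```java
class FrameIndex {
    /** The original shape: cast first, then modulo. Goes negative once the quotient passes 2^31 - 1. */
    static int buggyIndex(final long nowMillis, final long intervalPerFrame,
                          final int numberOfFrames) {
        return (int) (nowMillis / intervalPerFrame) % numberOfFrames;
    }

    /** Modulo in long arithmetic, then cast: the result always fits in an int. */
    static int safeIndex(final long nowMillis, final long intervalPerFrame,
                         final int numberOfFrames) {
        return (int) ((nowMillis / intervalPerFrame) % numberOfFrames);
    }

    public static void main(String[] args) {
        final long before = 1073741823999L;
        final long after = 1073741824000L;
        // ASSUMPTION: 500 ms per frame and 3 frames, for illustration only
        System.out.println(buggyIndex(before, 500L, 3) + " -> " + buggyIndex(after, 500L, 3));
        System.out.println(safeIndex(before, 500L, 3) + " -> " + safeIndex(after, 500L, 3));
    }
}
```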

QA vs. Testing

I constantly hear developers calling testing QA. "Send the build to QA". Based on ANSI/IEEE standards:

  • Testing: The process of executing a system with the intent of finding defects including test planning prior to the execution of the test cases.
  • Quality Control: A set of activities designed to evaluate a developed working product.
  • Quality Assurance: A set of activities designed to ensure that the development and/or maintenance process is adequate to ensure a system will meet its objectives.

The key difference to remember is that QA is interested in the process whereas testing and quality control are interested in the product. Having a testing component in your development process demonstrates a higher degree of quality (as in QA).


Testing links


Testing / QA FAQs


Test Interview Questions

General interview tips

For those of you who also want the answers to these questions I offer you the following advice: if you spend the time to look up the answers yourself then it is much more likely that you will have a greater understanding of the answer and you will be more confident when talking with the interviewer or when taking the test.


I cannot stress enough to everyone to spend some time looking for answers through Google before posting your questions here. I enjoy answering the occasional difficult or obtuse question, but when I'm swamped with a hundred questions whose answers are easily found via Google then it's hard to become motivated.

For example, someone asked: "please explain how to test a web application with winrunner or with any other testing tool". If I go to Google and type in "testing web application", I find Downloadable Reference Library Testing Web Applications which has more information than I know what to do with.


This page has links to a WinRunner 7.0 tutorial, user's guide, and TSL (Test Script Language) reference. It should be very helpful for those of you that are interested in learning this tool. A WinRunner FAQ is located here.

Which is better?

Which is a better choice?

  1. Use a technology that decreases the ability to introduce bugs (e.g. restrict the domain) but takes more time to use and maintain.
  2. Use a more raw technology where more care is needed to ensure that bugs are not introduced but is much easier to perform changes.

There is no clear-cut answer to this: "it depends".

Let's throw out numbers to attempt to make sense of this. It takes 5 man days to implement a change using technology number one that has an average bug rate of 1 bug per week. It takes 0.2 man days to implement a change using technology number two but it has a bug rate of 10 bugs per week. This means that technology #2 is 10x more error prone but takes 25x less effort to use it. From this, it appears that it is better to use the more bug-prone technology than it would be to use the restrictive technology.

This is obviously a contrived case. The gedankenexperiment behind all of this is to determine when one should introduce a technology into a project in order to reduce risk (in this case, bugs). If a technology has high costs associated with its use (e.g. time, training, personnel) then it may not reduce the overall project risk.

Bottom line: Understand all risks associated with a new technology and appropriately factor those into the overall risk of the project. If the risk increases, it may not be worthwhile to use the technology. If the risk decreases, then the technology will likely provide the desired returns. If there is no appreciable change in risk, then look at other factors such as long-term benefits, project duration, and cost.

Sounds like ...

I have been doing some research lately into natural language processing (NLP) and information extraction (IE) when I stumbled on The Double Metaphone Search Algorithm and phonetic distance. This is a good starting point for reference information.

NAPs and WADs

I was just working with some of the testing folks and they were talking about NAPs and WADs. I had never heard these TLAs (three letter acronyms) before so I had to look them up:

  • NAP: Not A Problem
  • WAD: Works As Designed

You learn something new every day.

Do development skills scale?

A colleague of mine and I were chatting around the proverbial water cooler this morning when we ventured onto the topic of developer skills. Do developer skills scale with increasing complexity and project size?

Personally, I have witnessed excellent small project programmers completely fall apart on large projects. I have also seen programmers get barreled over by a complex software suite. Is this a mark of cognitive ability, a lack of developed or necessary skills, or do the possessed skills simply not scale?

What do you think?

Code Comments

How many times have I heard a programmer say: "If the code is written well enough, there's no need for comments"? This statement could not be further from the truth. The code, sans comments, obviously defines the what and how: what does the code do and how does it do it. But what is missing from this is the why: under what conditions was the code written and what assumptions were made.

Comment Types and Categories

Before discussing commenting practices let's break down the various types of comments. Some of these are specific to Java and javadoc but it should be an easy exercise for the reader to extrapolate to other languages.

  • File comment: a comment at the top (or just below the package statement) of each file that typically contains the name of the file, the date it was created and any copyright or license information for that file.
  • Type comment: a comment for the type or class that contains the purpose and functionality as well as any global information about the class.
  • Member (method or field) comment: a comment for each field and method of a class. Every member comment contains a description of the member's use. For methods, expected and allowed argument values as well as any exceptions that are thrown are listed. For fields, value ranges and expected or disallowed values are listed.
  • Block comment: a comment before an algorithm or section of code elaborating on its use. A block comment may contain only a single line. (This definition varies from what is traditionally used.)
  • Inline comment: a comment contained on the same line as a program statement typically used to describe the need for or use of that statement.
  • Exclusion comment: a comment delineating a section of code that is not to be executed. (This is typically called "commented out code".)

These comment types can be divided into categories:

  • For using the code / API: These comments assume that the code will not be present. Type and member comments are in this category and are typically contained in javadoc-style comments.
  • For maintainers of the code: The code is present but a greater understanding of the code is necessary. Block and inline comments are for maintainers of the code.
  • For Housekeeping: File and exclusion comments and portions of the type comments contain information that is used purely as meta-data for the code.

Commenting Guidelines

  • Commenting is not an afterthought. Commenting should occur before each code block is written. All functionality, potential limits and problems should be understood and completely described in the comment before the code is written. There should be no surprises when you actually begin coding a block.
  • Comments are in part for you, but are most commonly used by others. Ensure that the content and style are such that someone who has never seen the code before understands the "why"s of the code. Have some compassion for the poor clod that has to maintain the code ... because that poor clod may be you in twelve months!
  • Work from the outside in. When creating a new class, begin by explaining the function of the class and why it is needed (and, for example, why another class wasn't used in its place). For each member field and method added, describe its function clearly and concisely as well as its reason for being. Given that javadoc comments are used to produce API documentation, limit the explanation of the internal functioning of the method in the comment. Having such verbosity tends to confuse readers who simply want to use the function. Save the verbose description for inside the method itself.
  • Comments should be written as if you are new to the code. Do not assume anything. Attempt to reference the reader to as much pertinent information as possible.
  • It is more important to explain the "why"s rather than the "what"s or "how"s. Most often, the "what"s and "how"s can be determined from the code itself, but the "why"s can never be derived. Each decision that is made, and the reasons for it, should be explained clearly and concisely in comments. Refer to the entry What about the "Why"? for more information.
  • Comments should be clear and concise. You're not writing War and Peace and you're not going to win a Pulitzer for your work.
  • Try to limit replication of comments, but also remember that you do not know where and how another developer will begin looking at the code. Use "see <other>" to inform the developer where to go for related information or use roll-up documentation such as the package.html file.
  • Keep the language to 3rd person as much as possible. Other developers, many times removed, will not know who "I" or "we" are. If "I" is necessary for describing a "why", include your initials to give some context and make sure that you have been added to the author list of the file. For example, "(RG)".
  • Always keep comments up to date. If a comment described undesired functionality and that functionality is repaired, leaving the comment in place will cause loss of productivity while another developer attempts to understand the "problem" and fix it. Always check the code to ensure that any changes are reflected in the comments. It does not need to be commented that the problem was fixed (that is what version control is for). Simply remove comments that no longer apply and replace them with comments describing the new or changed functionality.
  • Perform a code walk-through of your own code before you submit it. You should be able to read the comments and understand completely what (and why!) the code is doing without ever reading a single line of code. Then, ensure that the code matches the comments exactly.
  • Exclusion comments must have an accompanying explanation and that comment must be added at the time when the exclusion is made. The number of times that a section of code is commented out for debugging purposes only to be checked in in that state is staggering. Unless there is a comment associated with the exclusion, there is no way to distinguish if that code is broken, unneeded, an example, or removed for debugging.
  • Before each logical block of code, there should exist a block comment completely describing the code section. Inline comments are used to describe the action or reason for a particular line. For example:

    int size = array.length;        // purely a convenience variable
    
  • Clearly state expected, allowed and disallowed values as well as scope for fields.

    /**
     * The user of the application.  This value is only allowed to be null
     * between construction and when the value is set (via 
     * {@link #setUser(User)}).  If null when used an exception should 
     * be thrown.  This value may only be set once and if reset an 
     * exception should be thrown.
     */
    private User user;
    
  • Comment all non-explicit else or default statements. Because a significant number of bugs are the result of not understanding the implication of an else or default, the extra time spent thinking through them and commenting them is time well spent. For more information, refer to the entry: Coding Defensively.
  • Comments are not editorials. Be professional and refrain from writing editorials.

Comment Notifiers

It is common to see XXX or FIXME in code. But there are more notifiers that can be used:

  • NOTE describes a situation which is not obvious from reading the code directly but is desired behavior or describes assumptions or constraints that are not explicit.

    Ex: "NOTE: this will not work with values less than zero."
    Ex: "NOTE: this function was added as a work around for the bug found "
  • FIXME denotes code that simply does not work. Describe the symptoms or what is broken.

    Ex: "FIXME: values greater than 10 cause this function to fail."

  • TODO describes functionality that still needs to be added or describes code that performs a function, but work still needs to be done.

    Ex: "TODO: multi-auth is not implemented"

  • CHECK describes functionality that at initial overview does not appear to be correct but does perform the necessary function.

    Ex: "CHECK: can 'j' be less than zero here?"

  • BUG followed by an identifier marks a section of code that is in place or has been fixed because of a bug report.

    Ex: "BUG #8452: the value must be checked for null before using"

  • REQ followed by an identifier is a reference to the requirements documentation.

    Ex: "REQ 5.7.1a: non-logged in users are not allowed"

  • PERF is essentially a NOTE but is specific to a performance optimization. For all intents and purposes NOTE can be used in its place.

    Ex: "PERF: ArrayList is explicitly used (rather than List) to minimize the overhead of polymorphism"

At times the notifiers can be ambiguous; a CHECK may represent a FIXME that needs to be TODO'ed and NOTE'ed. For tracking purposes it is always better to use a more severe marker such as FIXME rather than CHECK.
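
As a rough illustration of how a few of these notifiers might sit inside real code, here is a small sketch. The class, the method, and the bug number are all made up for the example; only the notifier keywords come from the list above.

```java
class NotifierExample {
    /**
     * Returns the average of the given values, or 0.0 when there
     * are no values to average.
     */
    static double average(final int[] values) {
        // BUG #1234 (hypothetical id): null and empty input must be
        // checked before the division below
        if (values == null || values.length == 0)
            return 0.0;
        /* else -- there is at least one value */

        // NOTE: the sum is kept in a long so that large int values
        //       cannot overflow while summing.
        long sum = 0;
        for (int i = 0; i < values.length; i++)
            sum += values[i];

        // TODO: weighted averages are not implemented
        return (double) sum / values.length;
    }
}
```

Each notifier is followed by the reason it is there, so a search for "BUG" or "TODO" turns up something a maintainer can act on.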

What about the "Why"?

I'm constantly at odds with my developers over the importance of documenting why a piece of code does what it does. Having been in the code maintenance business for a long time, I have learned the hard way that a particular implementation is only valid for a particular set of conditions. Unless those conditions are well documented, there is no way to effectively determine if the code is valid in another (perhaps the same since there is no way to know) situation.

Some examples of questions that should have documented "why"s:

  • What types of processes are expected to communicate with the code and by what means (threaded / non, etc)?
  • What conditions are expected to never / always occur?
  • Is a call expected to block?
  • Is there an expected and required order to a particular set of calls?
  • Do items automagically maintain themselves (e.g. will a map shrink as entries are removed)?
  • Can an item be reused w/o reconstruction and what are the constraints to reuse?
  • Does an item expect to be reused in a different set of conditions?

Pulling an example from my own code:

"There are certain optimizations that have been made in the writer based on the fact that the send timeout is a constant and is based on the time at which a message is added to the queue (i.e. the queue will contain monotonically increasing timeout values). This implies that until the currently active message's (the message currently being written) timeout occurs, no other message in the queue needs to be checked."

As time went on, it was determined that there would be messages that never timed out. This means that the constraint that the timeout values are monotonically increasing was no longer valid and therefore the implementation was no longer valid. Only by specifying the conditions under which the code was written (assumptions that were made) was it known that the implementation needed to be changed.

It is common for the conditions under which an implementation is written to be defined in other systems or documents such as requirements or the bug tracking system. Unless the conditions are presented either within the code itself or the same directories as the code the correlation is lost. Also, the implementation typically has its own specific set of conditions that would not be found in requirements.

There is little actual overhead in serializing these conditions as, by definition, they are all known at development time. In other words, the conditions are all known, they simply must be written out. Once a suitable convention has been established for this documentation and the developers overcome the initial inertia of performing this task, it becomes very natural. Any minimal time lost over the process of typing is overshadowed by the extra level of communication that it provides.

Smoke Test

My wife asked me the other day why it is called a "smoke test". I honestly didn't know. Here's what Jargon has to say about it:

  1. A rudimentary form of testing applied to electronic equipment following repair or reconfiguration, in which power is applied and the tester checks for sparks, smoke, or other dramatic signs of fundamental failure.
  2. By extension, the first run of a piece of software after construction or a critical change.

Web Applications

I have been involved in architecting and writing web applications as long as there has been a "web". Recently, I have been doing due diligence on web architectures. Most architectures recognize the value in the Model 2 (or MVC) approach in their design. But is this sufficient?

This is a work in progress so excuse the mess and please check back for updates.

Intended audience

This article is geared towards enterprise web applications. An enterprise web application in the context of this article consists of the following:

  • An application backed by some well defined business process.
  • At least one developer per tier (JSP, Servlet and business process) with the ability to easily scale to multiple developers per tier without resource contention.
  • There is a well defined development and release cycle. That is, development is not ad-hoc.

If your application does not fall under the above constraints then the concepts defined herein may not apply. For example, introducing Model 2 into an environment where there is only one developer may kill productivity due to the overhead associated with the multiple layers.

Starting points

There are just as many starting points as there are web frameworks. Below is an attempt to enumerate a few of the initial conditions for a web-enabled application.

  • Scratch. Nothing exists except a set of requirements.
  • A business process exists that is exposed via a well-defined API. This API may or may not be tooled for the peculiarities of a web application.
  • An existing application that is to be web-enabled. Depending on the architecture of the application, this may fall into the case above. In a worst-case scenario the application is tightly coupled to a presentation mechanism (such as a monolithic VB app).

What's going on?

I was originally going to do a full write-up on the request / response, MVC, and the like but after re-reading Designing Enterprise Applications with the J2EE(TM) Platform, Second Edition and MVC Detailed it would be significantly redundant.

I will be updating this entry with more information using the above link as a reference.

Coding Defensively

I have been involved in a code review for the past few days. Time and time again I have come across code that fits into the "if you know something will never happen, it most certainly will" category of development. Take a look at this example:

List users = session.find("select u from User u where u.loginName = ?", ... );
if(users.size() > 0)  {
    ...
    return true;
}
return false;

This probably looks like 99% of the code out there. The problem is that you're only concerned with the case where the size is equal to one. The case where the size is greater than one is undefined.

I know, you're thinking to yourself: "But that will never happen since I have unique constraints on my primary keys. The entry app will puke when it attempts to enter more than one row." Never say never. A few years ago I was working on an application with the same constraints. In order to speed up and allow for an ETL operation that the DBA was doing, he disabled all of the constraints and forgot to re-enable them. Rather than having logging in place that would have caught this error immediately, a few weeks went by without anyone noticing. Needless to say, it took a few weeks to clean up the resulting mess. Oh, did I forget to mention that this was a production database?

A more sensible and defensive coding strategy would be:

List users = session.find("select u from User u where u.loginName = ?", ... );

// NOTE:  the size of users is expected to be [0, 1]
final int usersSize = users.size();

// if the size of users is greater than one, log an error
// but continue as this is not fatal
if(usersSize > 1) {
    // log something
    ...
} /* else -- users size is not greater than 1 */

// there is at least one user.  The first user will always
// be used.
// NOTE:  more than one user may be present at this time. 
//        This case can be safely ignored at this point.
if(usersSize > 0)  {
    ...
    return true;
} else if(usersSize == 0) {
    return false;
} else { // usersSize is less than zero 
    // this is an error that cannot be attributed to this code
    // in any way.
    throw new DeveloperException("<some helpful text>");
}

It is up to your particular application guidelines to determine whether or not the exception cases should be immediately bubbled out to the user as errors. Personally, I am not a big fan of asserts in Java 1.4 in web apps due to the problems of effective exception handling. (Here is a good starting point for the problems associated with Java's exceptions in general.) Let me stress that how you handle the cases that are unexpected is not as important as getting into the habit of thinking about them and notifying someone somehow when they occur. As long as you're consistent in dealing with these cases the time spent up front will save you precious time in the end.

Swing vs. JFace: Why not both?

It seems that every discussion about Eclipse these days quickly degrades into a fighting match about Swing (AWT) vs. JFace (SWT). "Swing is great and it's part of Java. You'd be a fool to use anything else!"

Rather than attempt to obliterate SWT, why don't we embrace it as the much needed alternative? Competition is a good thing; it forces each product to a higher level of quality. APIs (especially those for UIs) are not one stop shops. Each product has its pros and cons and having multiple products allows each developer to choose what is best for a particular application.

Like Linux to Microsoft, Pepsi to Coke or any coffee house to Starbucks, having JFace / SWT provides a much needed alternative to the firmly implanted incumbent. And having a choice makes everyone happy.

java.util.AbstractMap

java.util.AbstractMap's hashCode() (which is used for java.util.HashMap and java.util.TreeMap to name a few) has the following implementation:

int hashValue = 0;
for(final Iterator i=entrySet().iterator(); i.hasNext(); )
    hashValue += i.next().hashCode();

(This is noted in the javadoc for java.util.AbstractMap.)

If you have a significant number of entries in your HashMap then you're going to get bitten computing the hash code. Performance gets even worse if the entries in the map are complex.

I would recommend subclassing and overriding hashCode() (with adequate documentation explaining the reasons for overriding as well as outlining the computation for the new value!) if you find yourself in a sticky performance situation. A more sane hash code might be:

int hashValue = 0;
for(final Iterator i=keySet().iterator(); i.hasNext(); )
    hashValue += i.next().hashCode();    

as it is common to use simple types for the keys of a map. The use of keySet() preserves the:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

contract of hashCode().

If you're jamming a HashMap into another map to meet the constraints of some interface (as is common in web programming) it may be even better to simply use size() as the hash code to minimize the performance impact.

Don't forget that you're still going to get hit with the equals() (which is a full pass over the entry set) each time that you retrieve the HashMap from the enclosing map.
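
A minimal sketch of the key-only override might look like the following. KeyHashMap is a name I made up for the example, and it is written with generics rather than the raw types shown above. Treat it as one possible shape, not a drop-in fix.

```java
import java.util.HashMap;

/**
 * A HashMap whose hash code is computed from the keys only.
 * (KeyHashMap is a hypothetical name used for this sketch.)
 */
class KeyHashMap<K, V> extends HashMap<K, V> {
    private static final long serialVersionUID = 1L;

    // NOTE: equal maps always have equal key sets, so summing the key
    //       hash codes keeps the equals()/hashCode() contract intact.
    // PERF: values are skipped because they may be expensive to hash.
    @Override
    public int hashCode() {
        int hashValue = 0;
        for (final K key : keySet())
            hashValue += (key == null) ? 0 : key.hashCode();
        return hashValue;
    }
}
```

The trade-off is more collisions between maps that share keys but differ in values, which only matters if many such maps end up in the same enclosing map.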

Java's new static imports

It seems that Kris has been lured toward the sirens that are the proposed JDK 1.5 static imports. I am convinced that static imports will reduce code clarity and therefore increase the bug rate. I offer a contrived example to demonstrate my position:

I am working on a class that statically imports java.lang.Math to use sin() as well as a number of other functions. Due to the large overhead associated with sin() and the fact that my domain consists of only integral angles, I want to add my own sin() method to the class and switch to a pre-computed look up table of values.

According to the updated JLS:

A static-import-on-demand declaration never causes any other declaration to be shadowed.

That's fine, but it does create confusion on the order of that which would be caused by operator overloading (which is not in Java due in part to the "added complexity" associated with it).

For the sake of this example, let's say that java.lang.Math contains the following signatures:

public static float sin(float angle);
public static float sin(double angle);

(again, this is contrived to prove a point) and the signature that I added to my class is:

float sin(float angle);

The call to sin() resembles:

...
float angle;
...
rotated = PI * sin(angle);
...

In this example, it may be easy to determine which sin() is being called, but in reality it is not. Somewhere along the way, I find a library that contains the signature of:

public static float sin(int angle);

that I want to statically import. I can't do it according to the JLS:

If two single-static-import declarations in the same compilation unit attempt to import members with the same simple name, then a compile-time error occurs, unless the two members are the same member of the same type, in which case the duplicate declaration is ignored.

So you hopefully see the mess that I'm in. I'll attempt to illustrate to drive the point home.

import static java.lang.Math.*;
import com.someco.MathFunctions; // can't be static

public class ContrivedExample {
    private float sin(final float angle) { ... }

    ...
    float angle;
    ...
    rotated = PI * sin(angle); // from local sin()
    ...
    other = sin(angle / 1.5); // from java.lang.Math (1.5 is double)
    ...
    uugh = MathFunctions.sin((int)(angle / 65535)); // from MathFunctions
    ...
}

That is a debugger's worst nightmare.

Of course placing strict coding constraints on how, when, where, and under what conditions static imports are used will help alleviate these problems but given a large set of imports, it might be easier said than done.

David Flanagan has also touched on some other issues -- specifically, how one can import a method with the same name but different signatures.

Static imports "solve" something that was never a problem to begin with (i.e. explicit names are a good thing). A more suitable solution to this "problem" would be in-line (or horizontal) code folding; just as some IDEs provide the ability to vertically fold various scopes, horizontal folding would fold the qualifier of the name.

Updated April 23 at 11:45AM

Computer Science not Computer Art

My education has been molded within the tenets of the natural sciences. We would follow the scientific method. We had sayings like "if you didn't document it, it didn't happen!". We had a set of commonly accepted techniques that were used as building blocks to achieve a desired result.

Fast forward to the present day and my current foray into computer science. I have struggled to impress the tenets of the sciences into every environment I have participated in: tests exist to ensure correctness and conformity; all code is consistent and thoroughly documented; patterns and common libraries are used.

It seems the efforts I take are not universal. I do not claim that I am the only one following these tenets, but I will insist that I am in the minority.

Are we practicing computer science or computer art?

:s/science/engineering/g

java.util.HashMap

From the constructor of java.util.HashMap from J2SE 1.4.2 (reprinted without permission):

// Find a power of 2 >= initialCapacity 
int capacity = 1; 
while (capacity < initialCapacity) 
    capacity <<= 1; 

this.loadFactor = loadFactor; 
threshold = (int)(capacity * loadFactor); 
table = new Entry[capacity]; 

where

    loadFactor = 0.75; 

If initialCapacity is a power of two then it is used as the capacity. Combining this with the load factor you get a threshold < initialCapacity. Had they only put that pesky "=" sign in the equation then we'd be all set. Ah well.

So what does this all mean? Given a distribution of hash values that fills each bucket only once (such as adding integers) and the default load factor of 0.75, if an initial capacity is a power of two then adding initialCapacity elements will require at least one resizing of the hash table!!

It should also be noted that chaining is dominant for small non-power-of-two initial capacities (again, given the default load factor).

Something to keep in mind.
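
To make the arithmetic concrete, here is a small sketch that repeats the constructor's loop and prints the resulting capacity and threshold for a few requested sizes. HashMapSizing and its method names are made up for the example, and it only mirrors the 1.4.2 snippet quoted above, not any later HashMap implementation.

```java
class HashMapSizing {
    // mirrors the loop from the J2SE 1.4.2 constructor quoted above:
    // find a power of 2 >= initialCapacity
    static int capacityFor(final int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity)
            capacity <<= 1;
        return capacity;
    }

    // the number of entries at which the table is resized
    static int thresholdFor(final int initialCapacity, final float loadFactor) {
        return (int) (capacityFor(initialCapacity) * loadFactor);
    }

    public static void main(final String[] args) {
        final int[] requested = { 12, 16, 17, 64 };
        for (int i = 0; i < requested.length; i++) {
            System.out.println("requested=" + requested[i]
                + " capacity=" + capacityFor(requested[i])
                + " threshold=" + thresholdFor(requested[i], 0.75f));
        }
    }
}
```

Requesting 16 gives a capacity of 16 and a threshold of 12, so the table is resized before 16 entries fit. Requesting 17 gives a capacity of 32 and a threshold of 24, which holds 17 entries without a resize.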

HashMap hash function problems in 1.4.0

When software engineering goes horribly wrong

Incidents like the code snippet below underline the root cause of failure on most projects (and why I fully expect to die from a heart attack at a very young age):

if(!((yearObj.options[yearObj.selectedIndex].value / 4).toString().indexOf('.') == -1))

(Sorry about any line wrapping that may have occurred.)

That beautiful specimen was purported to compute if a selected year was a leap year or not. No, really. I could spend the rest of this day discussing the failure of the industry to police itself to maintain minimum standards, how programmers are not just generic blobs that can be pulled from one project and jammed into another, how lack of time and infrastructure perpetuate catastrophic problems, etc, etc, etc ... but I won't.

Another one just in (from the same person as the beauty above):

for (var i = 1; i < days + 1; i++)

Of course there is nothing inherently wrong with the statement, but what is wrong is that there is a fundamental un-understanding (rather than a misunderstanding which implies that there is some understanding to begin with) of the principles of software engineering.
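
For contrast with the leap-year check quoted above, which only tests divisibility by four (and does so by looking for a decimal point in a string), here is a minimal sketch of the Gregorian rule. It is in Java rather than the JavaScript of the original snippet, and the class and method names are made up for the example.

```java
class LeapYear {
    // Gregorian rule: every fourth year is a leap year, except
    // century years, unless the century year is divisible by 400.
    static boolean isLeapYear(final int year) {
        return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
    }
}
```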

Java array sizes

While attempting to copy a file's contents to an array of bytes in Java I noticed something interesting that is taken for granted: array indexes are integers. This implies that only ~2G entries are available. Even though Java may have large file support (>2GB) and extended memory access enabled one cannot perform certain functions.

Nothing earth shattering here but it was one of those Hmmmm moments.
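
A minimal sketch of guarding against this when reading a whole file into memory is below. FileBytes and its methods are names made up for the example; Files.size() and Files.readAllBytes() are standard java.nio.file calls (Java 7 and later). The practical limit in a given VM may be somewhat lower than Integer.MAX_VALUE, so this check is a necessary condition, not a guarantee.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class FileBytes {
    // NOTE: array lengths are ints, so a single byte[] can never hold
    //       more than Integer.MAX_VALUE bytes.
    static boolean fitsInByteArray(final long sizeInBytes) {
        return sizeInBytes >= 0 && sizeInBytes <= Integer.MAX_VALUE;
    }

    // reads the whole file, failing early when it cannot fit in one array
    static byte[] readFully(final Path file) throws IOException {
        final long size = Files.size(file);
        if (!fitsInByteArray(size))
            throw new IOException("file too large for a single array: " + size);
        /* else -- the size can be expressed as an array length */
        return Files.readAllBytes(file);
    }
}
```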

Working on the first try? Try again!

Tip: Be wary of something working on the first try.

If something works on the first try, it's guaranteed to be screwed up in some way.

A common one for me, as it's easy to forget, is enabling Java's assertions. They're disabled by default and if you use an IDE's fancy doo-dads to automatically run your JUnit tests then it won't have the assertions enabled (you typically have to manually enable them). All shows green and you move on. At some point later you hit an NPE (NullPointerException -- the bane of a Java programmer's existence) and see that there is an assertion in place. Then you realize that you never enabled assertions. A forehead smack occurs and you spend a day debugging what you should have fixed in the first place (when it was fresh in your mind).
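
One way to catch this early is to have the test run itself report whether assertions are on. The sketch below uses the well-known trick of an assignment inside an assert; AssertionCheck is a name made up for the example.

```java
class AssertionCheck {
    // returns true only when the JVM was started with assertions
    // enabled for this class (e.g. via -ea)
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // NOTE: the assignment is intentional. The expression is only
        //       evaluated when assertions are enabled.
        assert enabled = true;
        return enabled;
    }

    public static void main(final String[] args) {
        if (!assertionsEnabled())
            System.err.println("WARNING: assertions are disabled; run with -ea");
        /* else -- assertions are enabled, nothing to report */
    }
}
```

Calling assertionsEnabled() from a test's set up and failing when it returns false turns the silent green bar into a loud red one.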

Preventing NPE's

Java Tip: Put string constants on the left side of a .equals().

This prevents the dreaded NullPointerException (NPE) from occurring. For example:

if(name.equals("rob"))
   return;

should always be written as:

if("rob".equals(name))
   return;

"rob" will never be null so this is NPE safe.

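A small sketch of the tip in use is below. NullSafeEquals and its methods are made up for the example; java.util.Objects.equals() is a standard call (Java 7 and later) that covers the case where neither side is a constant.

```java
import java.util.Objects;

class NullSafeEquals {
    static boolean isRob(final String name) {
        // constant on the left: never throws, even when name is null
        return "rob".equals(name);
    }

    static boolean sameName(final String a, final String b) {
        // NOTE: either argument may be null here
        return Objects.equals(a, b);
    }
}
```
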
Coding Defensively

Tip: Never check for a single value when you actually are interested in a range.

The common case where this occurs is with sizes (list, arrays, etc). The statement:

if(list.size() == 3)
    return;

or:

for(int i=0; i!=10; i++)
    ...

is error prone and should be avoided at all costs. Why? Most of the time the list will have multiple entries added (this is especially pertinent in the case of MT (multi-threaded) code) and an equality can be missed. In the for-loop case, it is common (but oooohhh so bad) to see the loop counter manipulated in the loop body. So the correct statements would be:

if(list.size() >= 3)  // or (list.size() > 2)
    return;

and

for(int i=0; i<10; i++)
    ...

This is called coding defensively. You're preventing bugs before they've had a chance to form.
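
To see how an equality can be missed, here is a small sketch in which the counter advances by a step other than one. RangeCheck and its methods are made up for the example, and the cap parameter exists only so that the demonstration terminates.

```java
class RangeCheck {
    // counts the passes of a loop that ends on "i != 10", giving up
    // after cap passes so the demonstration cannot spin forever
    static int iterationsUsingNotEquals(final int step, final int cap) {
        int count = 0;
        for (int i = 0; i != 10 && count < cap; i += step)
            count++;
        return count;
    }

    // counts the passes of a loop that ends on "i < 10"
    // NOTE: step is expected to be greater than zero
    static int iterationsUsingLessThan(final int step) {
        int count = 0;
        for (int i = 0; i < 10; i += step)
            count++;
        return count;
    }
}
```

With a step of one both loops run ten times. With a step of three the counter goes 0, 3, 6, 9, 12 and never equals ten, so the first loop only stops because of the cap, while the second stops after four passes.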

Nested functions in Java

I am always looking for ways to increase code clarity and reduce confusion and maintenance associated with "dangling methods". What's a dangling method? It's a method that is only used by another method to reduce code duplication. The scope of this method should therefore be local to only the calling function.

I tend to run into this problem when doing string manipulation. Currently I need to do a "last added character" for a CharBuffer. The only way to currently do this is to add a member function:

private char lastChar(final CharBuffer buffer)
{
    // determine if there are already chars in the buffer.  If
    // there are none, throw.
    if(buffer.position() <= 0)
       throw new IndexOutOfBoundsException();
    /* else -- there are characters in the buffer */

    // retrieve the last character placed into the buffer
    // NOTE:  the above check ensures that there will be a char
    return buffer.get(buffer.position() - 1);
}

to the class. This is no good since the scope of the method is too large. Large scope equals more time determining dependencies which equals more time to debug.

If Java allowed for nested functions, one could write:

private String normalize(final String string)
{
    ...

    // inner function for determining the last character
    // added to a buffer
    char lastChar(final CharBuffer buffer)
    { 
        ...
    };

    ...
 
        case '/':
            if(lastChar(buffer) != '/')
                ...

    ...
}

Having nested (or inner) functions in Java would help enormously. Kris Wehner brought up a Smalltalk technique which would be somewhat useful in this case. What do you think a solution to this problem would be?
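
One partial answer that Java already allows is a local class: a class declared inside the method body, visible nowhere else. The sketch below is only an approximation of a nested function. The names NormalizeSketch and Last are made up, and since the body of normalize() is elided above, collapsing repeated '/' characters is my assumption about what it might do.

```java
import java.nio.CharBuffer;

class NormalizeSketch {
    // hypothetical normalize(): collapses runs of '/' into a single '/'
    static String normalize(final String string) {
        // local class: its scope is limited to normalize()
        class Last {
            char of(final CharBuffer buffer) {
                // if there are no chars in the buffer, throw
                if (buffer.position() <= 0)
                    throw new IndexOutOfBoundsException();
                /* else -- there are characters in the buffer */
                return buffer.get(buffer.position() - 1);
            }
        }
        final Last last = new Last();

        final CharBuffer buffer = CharBuffer.allocate(string.length());
        for (int i = 0; i < string.length(); i++) {
            final char c = string.charAt(i);
            // skip a '/' that directly follows another '/'
            if (c == '/' && buffer.position() > 0 && last.of(buffer) == '/')
                continue;
            buffer.put(c);
        }

        // flip so that toString() covers exactly what was written
        buffer.flip();
        return buffer.toString();
    }
}
```

It is wordier than a true nested function, but the helper no longer dangles at class scope.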

CharBuffer vs. StringBuffer

Performance of java.nio.CharBuffer vs. java.lang.StringBuffer:

Iterations    CharBuffer    StringBuffer    % diff
10000         ~0            ~0              0
1000000       187           243             77

For "normal" string processing there appears to be no difference between the two -- the effects are lost in the noise. For large strings (documents and the like), CharBuffer has a distinct advantage.

CharBuffer has the perk of pointer-like manipulation via CharBuffer.subSequence() and CharBuffer.slice() but lacks good string searching functions like StringBuffer.lastIndexOf().

The only caveat with CharBuffer is that the size of the buffer must be known a priori.

Notes:

  • The strings added were between 0 and 10 characters long to simulate standard text processing.
  • This should be taken with a grain of salt since it is a micro-benchmark.
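
For anyone who wants to repeat the comparison, here is a minimal sketch of how such a micro-benchmark could be set up. It is not the code that produced the numbers above: the class AppendBenchmark, its method names, the seed and the lowercase-letter inputs are all assumptions, and it does nothing about JIT warm-up.

```java
import java.nio.CharBuffer;
import java.util.Random;

class AppendBenchmark {
    // Builds `iterations` pseudo-random strings of 0 to 10 characters.
    static String[] inputs(final int iterations, final long seed) {
        final Random random = new Random(seed);
        final String[] strings = new String[iterations];
        for (int i = 0; i < iterations; i++) {
            final char[] chars = new char[random.nextInt(11)];
            for (int j = 0; j < chars.length; j++)
                chars[j] = (char) ('a' + random.nextInt(26));
            strings[i] = new String(chars);
        }
        return strings;
    }

    static String viaCharBuffer(final String[] strings) {
        // The capacity must be known up front; 10 chars per string is the
        // worst case for the inputs generated above.
        final CharBuffer buffer = CharBuffer.allocate(strings.length * 10);
        for (final String s : strings)
            buffer.put(s);
        buffer.flip();
        return buffer.toString();
    }

    static String viaStringBuffer(final String[] strings) {
        final StringBuffer buffer = new StringBuffer();
        for (final String s : strings)
            buffer.append(s);
        return buffer.toString();
    }

    // Wall-clock time of one run of `task`, in milliseconds.
    static long millis(final Runnable task) {
        final long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

A run would generate the inputs once, for example with AppendBenchmark.inputs(1000000, 42L), and then pass each variant to millis() as a lambda. Results will vary with the JVM and the machine.
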
Dealing with Java's URLs

I have found myself in a position where I am yet again wading through the quagmire that is Java's URLs.

  • Goal: virtualize a filesystem (i.e. a VFS).
  • Interface: URLs are provided that define the scope of the filesystem. These URLs point to files (archive and non-archive, local and non-local) and directories. Lookups and retrievals are done against the VFS and return ByteBuffers. (Think "resource" on ClassLoader.)
  • First thought: Just use java.lang.ClassLoader. The problem is that I need granular access to the data to optimize reads and there is no way to change a classpath at runtime.

So what is the problem with Java's URLs? Archives (i.e. JAR and ZIP). Play around with URLs such as:

jar:jar:file:///some/directory/file.jar!/nested/file1.jar!/finally.txt

and you'll know the pain I feel.

There will be more on this ... believe me!

Side notes:

  • The path component of a URL can be null. This is a gotcha for handling local files with a statement like:
        if("file".equals(url.getProtocol()))
        {
            File file = new File(url.getFile());
        }
    
  • Are query strings useful when the protocol is "file"?
  • Use either URL.getPath() or URL.getFile() consistently. When dealing with local files (protocol is "file"), getPath() makes the most sense since it does not include the query string (see above).
  • URL.getPath() and URL.getFile() may return a URL-encoded string. This string cannot be passed directly to File, which does not URL-decode it. The string must be decoded manually. Example URL:
        new URL("file:///C:/Program%20Files/Java/j2re1.5.0/bin/java.exe")
    
    URL.getPath() will return:
        /C:/Program%20Files/Java/j2re1.5.0/bin/java.exe
    
  • URL test cases is an excellent resource for reminding oneself of the various forms URLs come in for local files.
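
The side notes above can be pulled together into a small helper. This is a sketch under stated assumptions, not a complete solution: the class FileUrls and its methods are hypothetical, UTF-8 is assumed as the encoding, and URLDecoder.decode(String, Charset) requires Java 10 or later. Note also that URLDecoder follows form-encoding rules and turns a literal '+' into a space, which may not be what you want for file names.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class FileUrls {
    // Returns the URL-decoded path of a "file" URL.
    static String decodedPath(final URL url) {
        if (!"file".equals(url.getProtocol()))
            throw new IllegalArgumentException("not a file URL: " + url);
        // getPath() rather than getFile(): it excludes any query string
        final String path = url.getPath();
        // guard against a missing path component
        if (path == null || path.isEmpty())
            throw new IllegalArgumentException("no path in URL: " + url);
        // assumption: the path was encoded as UTF-8
        return URLDecoder.decode(path, StandardCharsets.UTF_8);
    }

    static File toFile(final URL url) {
        return new File(decodedPath(url));
    }

    // Convenience for examples: wraps the checked exception.
    @SuppressWarnings("deprecation") // new URL(String) is deprecated in recent JDKs
    static URL parse(final String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With the example URL above, decodedPath() yields /C:/Program Files/Java/j2re1.5.0/bin/java.exe, with the %20 turned back into a space.
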
java.io.File gotcha!

File.isDirectory() and File.isFile() are not complementary: both return false when the file does not exist. The assumption that one of them must be true is commonly seen in code like:

if(file.isDirectory())
    // do something with a directory
else 
    // do something with a file

Unfortunately, the else branch is guaranteed to see a file only if File.exists() returns true. This is in the javadoc for the methods but it's common to assume that anything that is not a directory must be a file.

Since it is possible for a file to be removed between File.exists() and the corresponding File.isDirectory() and File.isFile(), it seems that best practice dictates that code similar to the following be used:

if(file.isDirectory())
    // do something with a directory
else if(file.isFile())
    // do something with a file
else
    // do something with a non-existing file
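
The same three-way check as a compilable sketch. The class PathKind and the returned labels are assumptions for illustration. The final branch is reached whenever both calls return false, which covers non-existent paths and also anything that is neither a directory nor a normal file.

```java
import java.io.File;

class PathKind {
    // Classifies a path without calling exists() first, so there is no
    // window between an existence check and the type check.
    static String classify(final File file) {
        if (file.isDirectory())
            return "directory";
        else if (file.isFile())
            return "file";
        else
            return "missing"; // both checks failed
    }
}
```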

A side note to this: File.isDirectory() and File.isFile() will actually touch the native file system. They do not just check for a trailing slash or some other such thing.

Unless otherwise expressly stated, all original material of whatever nature created by Rob Grzywinski and included in this weblog and any related pages, including the weblog's archives, is licensed under a Creative Commons License.