Woozle Wuzzle
What about the "Why"?

I'm constantly at odds with my developers over the importance of documenting why a piece of code does what it does. Having been in the code maintenance business for a long time, I have learned the hard way that a particular implementation is only valid for a particular set of conditions. Unless those conditions are well documented, there is no way to effectively determine if the code is valid in another (perhaps the same since there is no way to know) situation.

Some examples of questions that should have documented "why"s:

  • What types of processes are expected to communicate with the code and by what means (threaded / non-threaded, etc.)?
  • What conditions are expected to never / always occur?
  • Is a call expected to block?
  • Is there an expected and required order to a particular set of calls?
  • Do items automagically maintain themselves (e.g. will a map shrink as entries are removed)?
  • Can an item be reused w/o reconstruction and what are the constraints to reuse?
  • Does an item expect to be reused in a different set of conditions?

Pulling an example from my own code:

"There are certain optimizations that have been made in the writer based on the fact that the send timeout is a constant and is based on the time at which a message is added to the queue (i.e. the queue will contain monotonically increasing timeout values). This implies that until the currently active message's (the message currently being written) timeout occurs, no other message in the queue needs to be checked."

As time went on, it was determined that there would be messages that never timed out. This meant that the constraint that the timeout values are monotonically increasing was no longer valid and therefore the implementation was no longer valid. Only by specifying the conditions under which the code was written (the assumptions that were made) was it known that the implementation needed to be changed.
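
The assumption quoted above can also be recorded right next to the code that depends on it. Here is a minimal sketch, assuming a hypothetical SendQueue class. The class name, its fields, and the choice to reject out-of-order deadlines are illustrative only and are not taken from the original writer:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical send queue, used only to illustrate documenting the "why".
 *
 * WHY: the send timeout is a constant added to the enqueue time, so the
 * deadlines in the queue never decrease. Only the head of the queue
 * therefore needs to be checked for expiry.
 *
 * INVALID IF: a message can have a different timeout, or none at all.
 * The head-only check in hasExpired() must then be replaced.
 */
class SendQueue {
    private final Deque<Long> deadlines = new ArrayDeque<>();
    private final long timeoutMillis;

    SendQueue(final long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Adds a message and enforces the documented assumption. */
    void add(final long enqueuedAtMillis) {
        final long deadline = enqueuedAtMillis + timeoutMillis;
        final Long last = deadlines.peekLast();
        if (last != null && deadline < last) {
            throw new IllegalStateException(
                "deadlines must not decrease: " + deadline + " < " + last);
        }
        deadlines.addLast(deadline);
    }

    /** Checks only the head, which is valid because of the assumption above. */
    boolean hasExpired(final long nowMillis) {
        final Long head = deadlines.peekFirst();
        return head != null && head <= nowMillis;
    }
}
```

With the assumption written down and checked in add(), the day a never-expiring message is introduced is the day the check fails, rather than the day someone notices stale messages in production.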

It is common for the conditions under which an implementation is written to be defined in other systems or documents such as requirements or the bug tracking system. Unless the conditions are presented either within the code itself or in the same directories as the code, the correlation is lost. Also, the implementation typically has its own specific set of conditions that would not be found in requirements.

There is little actual overhead in serializing these conditions as, by definition, they are all known at development time. In other words, the conditions are all known; they simply must be written out. Once a suitable convention has been established for this documentation and the developers overcome the initial inertia of performing this task, it becomes very natural. Any minimal time lost to typing is overshadowed by the extra level of communication that it provides.

Cosmological Mysteries Part 3

The final set of notes on measuring dark matter and energy (Part 3 of 3):

  • Why does everything in the universe look as if it is receding from us? Are we at some ideal location in the universe? Pick any point on a sphere. As the sphere increases in size (think about blowing up a balloon) any other point on that sphere moves away from it. So we are not at some special point in the universe. Every point moves away from every other. The distance a point is away from another is proportional to the velocity at which it is receding.
  • Light is red shifted for the same reason as above. As space expands it stretches light, making it redder (longer wavelengths).
  • Experimental evidence shows that this linearity of distance to velocity (determined via measuring the red shifts) breaks down with large distances. This implies that there is an acceleration.
  • How are distances measured in cosmology?
    • Standard Ruler uses the apparent angular separation compared to a known (actual) fixed separation. An example of this technique is measuring how far a car is away from you by measuring the apparent distance between the headlights (given that you know the actual distance between the headlights).
    • Standard Candle uses apparent brightness compared to a known (actual) fixed luminosity. An example of this can be done by measuring the brightness of a car's headlights at some distance away (given that the actual brightness of the headlights is known).
  • Various experiments to use the measuring techniques:

    • The universe is spatially flat. This is not to be confused with being flat in space-time which would go against Einstein's theories of relativity (equating curvature of space with mass providing for gravity). Spatially flat means that two parallel light beams will remain parallel (when ignoring the expansion of the universe). Coming back to the fact that space-time is not: since the expansion of the universe is subtracted to provide for the light remaining parallel, this shows that space-time is not flat.
    • Since there is no direct way to measure parallel photons, the elongation of the cosmic microwave background (CMB) (the red shift) can be used to measure the expansion of the universe. That is, since the expansion of space causes a red shift (expanding space elongates the waves) and variance in the spectra from distant galaxies can be easily measured, the CMB can be used as a standard ruler.
    • Supernovae are considered to be standard candles and are therefore used for measuring distances. Because supernovae are formed by drawing in mass from another object until a critical mass (the Chandrasekhar mass) is achieved and the mass explodes, the brightness of this explosion should be known. In practice, supernovae are known to not be standard candles but there are techniques to compensate for the fact that the luminosity does vary. These techniques involve measuring the time over which the explosion occurs and the brightness.
    • The geometry of space-time and the expansion history is determined by the matter-energy content of the universe.
  • The power spectrum of the CMB has a very unique signature as seen in the graph. (The x-axis is the inverse of the spot size (measured as an angle) and the y-axis is the brightness.) The "bumps" or peaks in the graph on the right side can be explained by harmonics in the theory of sound.

    The spectrum of sound has a fundamental frequency and harmonic overtones. The parallel in cosmology is that space (the distance over which the sound wave travels) is swapped for time. Very early in the big bang (t ~ 10^-36s) the universe was much more dense (~1000x smaller than it is now) and composed of elementary particles and photons. As the universe expanded, recombination occurred which means that it cooled enough for the particles to form atoms (i.e. hydrogen). In comparison to sound waves, recombination is the maximum displacement for photons.

  • The close correlation of the peaks in the power spectrum to that of the theory proves that only the initial conditions (i.e. the big bang) contributed to the signature as there are no other mechanisms that would produce that signature.
  • One can use a standard ruler of the fundamental frequency to measure the curvature of space.
  • The density of dark matter can be determined directly from the power spectrum as one of the harmonics can be directly attributed to dark matter. The first peak is the fundamental frequency. The second peak is attributed to baryons ("normal" matter). The third peak is attributed to dark matter.
  • The heights of the peaks in the power spectrum indicate the contribution of each harmonic (component of matter). For example, a small second peak is indicative of baryon density comparable to the photon density. Also, without dark matter (matter that does not interact with light (photons)) the harmonic peaks would be much smaller.
  • Dark energy could not have contributed at the time of recombination as it would be observed in the peaks (the relative heights of the peaks).
  • Dark energy's density decreases much more slowly than other matter as the universe expands. As the universe expands, the density of dark matter and normal matter decreases but the dark energy is constant (check the "constant" part). We are at a very unique time in which the density of dark matter and dark energy are comparable. This is relevant in the formation of large structures (see previous notes).
  • As the density of dark and normal matter decreases (with the expansion of the universe), dark energy is much more prevalent. Dark energy has an effect that is opposite of gravity. This is believed to be the reason behind why the universe is currently accelerating.
  • The discrepancy between particle theory and the measurements of dark energy is approximately 120 orders of magnitude. We need a new theory -- enter string theory.
  • These higher order theories (such as dark energy and string theory) may be due to a fundamental misunderstanding of the effects of gravity at very large distances.
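
The standard ruler and standard candle techniques in the notes above reduce to two short formulas when space is treated as static and Euclidean. That is a simplifying assumption of this sketch and not something the lecture claimed. The class and method names are made up for illustration:

```java
/** Hypothetical helper illustrating the two distance techniques. */
class CosmicDistances {

    /**
     * Standard ruler: for small angles (in radians),
     * distance = actual size / apparent angular size.
     */
    static double standardRulerDistance(final double actualSize,
                                        final double angularSizeRadians) {
        return actualSize / angularSizeRadians;
    }

    /**
     * Standard candle: the inverse square law gives
     * flux = luminosity / (4 * pi * d^2), so
     * d = sqrt(luminosity / (4 * pi * flux)).
     */
    static double standardCandleDistance(final double luminosity,
                                         final double flux) {
        return Math.sqrt(luminosity / (4.0 * Math.PI * flux));
    }
}
```

In the headlight example, headlights that are 1.5 m apart and appear 0.015 radians apart are roughly 100 m away. Real cosmological distances also have to account for the expansion of space, which this sketch ignores.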

Lecture notes are available from Wayne Hu.

Smoke Test

My wife asked me the other day why is it called a "smoke test". I honestly didn't know. Here's what Jargon has to say about it:

  1. A rudimentary form of testing applied to electronic equipment following repair or reconfiguration, in which power is applied and the tester checks for sparks, smoke, or other dramatic signs of fundamental failure.
  2. By extension, the first run of a piece of software after construction or a critical change.

Web Applications

I have been involved in architecting and writing web applications as long as there has been a "web". Recently, I have been doing due diligence on web architectures. Most architectures recognize the value in the Model 2 (or MVC) approach in their design. But is this sufficient?

This is a work in progress so excuse the mess and please check back for updates.

Intended audience

This article is geared towards enterprise web applications. An enterprise web application in the context of this article consists of the following:

  • An application backed by some well defined business process.
  • At least one developer per tier (JSP, Servlet and business process) with the ability to easily scale to multiple developers per tier without resource contention.
  • There is a well defined development and release cycle. That is, development is not ad-hoc.

If your application does not fall under the above constraints then the concepts defined herein may not apply. For example, introducing Model 2 into an environment where there is only one developer may kill productivity due to the overhead associated with the multiple layers.

Starting points

There are just as many starting points as there are web frameworks. Below is an attempt to enumerate a few of the initial conditions for a web-enabled application.

  • Scratch. Nothing exists except a set of requirements.
  • A business process exists that is exposed via a well-defined API. This API may or may not be tooled for the peculiarities of a web application.
  • An existing application that is to be web-enabled. Depending on the architecture of the application, this may fall into the case above. In a worst-case scenario the application is tightly coupled to a presentation mechanism (such as a monolithic VB app).

What's going on?

I was originally going to do a full write-up on the request / response, MVC, and the like but after re-reading Designing Enterprise Applications with the J2EE(TM) Platform, Second Edition and MVC Detailed it would be significantly redundant.

I will be updating this entry with more information using the above link as a reference.

Coding Defensively

I have been involved in a code review for the past few days. Time and time again I have come across code that fits into the "if you know something will never happen, it most certainly will" category of development. Take a look at this example:

List users = session.find("select u from User u where u.loginName = ?", ... );
if(users.size() > 0)  {
    ...
    return true;
}
return false;

This probably looks like 99% of the code out there. The problem is that you're only concerned with the case where the size is equal to one. The case where the size is greater than one is undefined.

I know, you're thinking to yourself: "But that will never happen since I have unique constraints on my primary keys. The entry app will puke when it attempts to enter more than one row." Never say never. A few years ago I was working on an application with the same constraints. In order to speed up and allow for an ETL operation that the DBA was doing, he disabled all of the constraints and forgot to re-enable them. Because there was no logging in place that would have caught this error immediately, a few weeks went by without anyone noticing. Needless to say, it took a few weeks to clean up the resulting mess. Oh, did I forget to mention that this was a production database?

A more sensible and defensive coding strategy would be:

List users = session.find("select u from User u where u.loginName = ?", ... );

// NOTE:  the size of users is expected to be [0, 1]
final int usersSize = users.size();

// if the size of users is greater than one, log an error
// but continue as this is not fatal
if(usersSize > 1) {
    // log something
    ...
} /* else -- users size is not greater than 1 */

// there is at least one user.  The first user will always
// be used.
// NOTE:  more than one user may be present at this time. 
//        This case can be safely ignored at this point.
if(usersSize > 0)  {
    ...
    return true;
} else if(usersSize == 0) {
    return false;
} else { // usersSize is less than zero 
    // this is an error that cannot be attributed to this code
    // in any way.
    throw new DeveloperException("<some helpful text>");
}

It is up to your particular application guidelines to determine whether or not the exception cases should be immediately bubbled out to the user as errors. Personally, I am not a big fan of asserts in Java 1.4 in web apps due to the problems of effective exception handling. (Here is a good starting point for the problems associated with Java's exceptions in general.) Let me stress that how you handle the cases that are unexpected is not as important as getting into the habit of thinking about them and notifying someone somehow when they occur. As long as you're consistent in dealing with these cases the time spent up front will save you precious time in the end.

Swing vs. JFace: Why not both?

It seems that every discussion about Eclipse these days quickly degrades into a fighting match about Swing (AWT) vs. JFace (SWT). "Swing is great and it's part of Java. You'd be a fool to use anything else!"

Rather than attempt to obliterate SWT, why don't we embrace it as the much needed alternative? Competition is a good thing; it forces each product to a higher level of quality. APIs (especially those for UIs) are not one stop shops. Each product has its pros and cons and having multiple products allows each developer to choose what is best for a particular application.

Like Linux to Microsoft, Pepsi to Coke or any coffee house to Starbucks, having JFace / SWT provides a much needed alternative to the firmly implanted incumbent. And having a choice makes everyone happy.

java.util.AbstractMap

java.util.AbstractMap's hashCode() (which is used for java.util.HashMap and java.util.TreeMap to name a few) has the following implementation:

int hashValue = 0;
for(final Iterator i=entrySet().iterator(); i.hasNext(); )
    hashValue += i.next().hashCode();

(This is noted in the javadoc for java.util.AbstractMap.)

If you have a significant number of entries in your HashMap then you're going to get bitten computing the hash code. Performance gets even worse if the entries in the map are complex.

I would recommend subclassing and overriding hashCode() (with adequate documentation explaining the reasons for overriding as well as outlining the computation for the new value!) if you find yourself in a sticky performance situation. A more sane hash code might be:

int hashValue = 0;
for(final Iterator i=keySet().iterator(); i.hasNext(); )
    hashValue += i.next().hashCode();    

as it is common to use simple types for the keys of a map. The use of keySet() preserves the:

If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.

contract of hashCode().

If you're jamming a HashMap into another map to meet the constraints of some interface (as is common in web programming) it may be even better to simply use size() as the hash code to minimize the performance impact.

Don't forget that you're still going to get hit with the equals() (which is a full pass over the entry set) each time that you retrieve the HashMap from the enclosing map.
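
A minimal sketch of the subclass suggested above, assuming a hypothetical class named KeyHashedMap (the name and the use of generics are mine). One caveat that I am sure of: the Map interface documents hashCode() as the sum of the entry hash codes, so an instance of this subclass will not produce the same hash code as a plain HashMap holding the same entries, even though equals() would still call them equal. That makes it a trade-off to be documented, not a drop-in fix.

```java
import java.util.HashMap;

/**
 * Hypothetical HashMap subclass that hashes only the keys.
 *
 * WHY: AbstractMap.hashCode() visits every entry and hashes both key
 * and value, which is expensive when the values are complex.
 *
 * NOTE: this deviates from the hashCode() documented for java.util.Map,
 * so only compare and mix it with other KeyHashedMap instances.
 */
class KeyHashedMap<K, V> extends HashMap<K, V> {
    private static final long serialVersionUID = 1L;

    @Override
    public int hashCode() {
        int hashValue = 0;
        for (final K key : keySet()) {
            // HashMap permits a single null key; count it as zero
            hashValue += (key == null) ? 0 : key.hashCode();
        }
        return hashValue;
    }
}
```

Two KeyHashedMap instances that are equal always have the same key set, so they still produce the same hash code, which is the part of the contract quoted above.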

Java's new static imports

It seems that Kris has been lured toward the sirens that are the proposed JDK 1.5 static imports. I am convinced that static imports will reduce code clarity and therefore increase the bug rate. I offer a contrived example to demonstrate my position:

I am working on a class that statically imports java.lang.Math to use sin() as well as a number of other functions. Due to the large overhead associated with sin() and the fact that my domain consists of only integral angles, I want to add my own sin() method to the class and switch to a pre-computed look up table of values.

According to the updated JLS:

A static-import-on-demand declaration never causes any other declaration to be shadowed.

That's fine, but it does create confusion on the order of what would be caused by operator overloading (which is not in Java due in part to the "added complexity" associated with it).

For the sake of this example, let's say that java.lang.Math contains the following signatures:

public static float sin(float angle);
public static float sin(double angle);

(again, this is contrived to prove a point) and the signature that I added to my class is:

float sin(float angle);

The call to sin() resembles:

...
float angle;
...
rotated = PI * sin(angle);
...

In this example, it may be easy to determine which sin() is being called, but in reality it is not. Somewhere along the way, I find a library that contains the signature of:

public static float sin(int angle);

that I want to statically import. I can't do it according to the JLS:

If two single-static-import declarations in the same compilation unit attempt to import members with the same simple name, then a compile-time error occurs, unless the two members are the same member of the same type, in which case the duplicate declaration is ignored.

So you hopefully see the mess that I'm in. I'll attempt to illustrate to drive the point home.

import static java.lang.Math.*;
import com.someco.MathFunctions; // can't be static

public class ContrivedExample {
    // local sin() backed by the pre-computed lookup table
    private float sin(final float angle) { ... }

    ...
    float angle;
    ...
    rotated = PI * sin(angle); // from local sin()
    ...
    other = sin(angle / 1.5); // from java.lang.Math (1.5 is double)
    ...
    uugh = MathFunctions.sin((int)(angle / 65535)); // from MathFunctions
    ...
}

That is a debugger's worst nightmare.

Of course placing strict coding constraints on how, when, where, and under what conditions static imports are used will help alleviate these problems but given a large set of imports, it might be easier said than done.
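
For anyone who wants to try this on a compiler that supports static imports, here is a small runnable sketch using the real java.lang.Math signatures rather than the contrived ones. The class name, the lookup table, and the decision to take degrees are all assumptions made for illustration. As I understand the released Java 5 rules, a method declared in the class wins over a static-import-on-demand member of the same simple name, so the unqualified call below never reaches Math.sin():

```java
import static java.lang.Math.*;

/** Hypothetical class with its own sin() next to a static import of Math. */
class StaticImportShadowing {

    // pre-computed sine values for whole-degree angles
    private static final double[] SIN_TABLE = new double[360];

    static {
        for (int degrees = 0; degrees < 360; degrees++) {
            // toRadians comes from the static import; Math.sin is
            // qualified because this class declares its own sin()
            SIN_TABLE[degrees] = Math.sin(toRadians(degrees));
        }
    }

    /** Local sin(): takes DEGREES and reads the lookup table. */
    static double sin(final double degrees) {
        final int index = ((int) Math.round(degrees) % 360 + 360) % 360;
        return SIN_TABLE[index];
    }

    /** Unqualified call: resolves to the sin() declared in this class. */
    static double viaSimpleName(final double angle) {
        return sin(angle);
    }

    /** Qualified call: always java.lang.Math.sin(), which takes RADIANS. */
    static double viaQualifiedName(final double angle) {
        return Math.sin(angle);
    }
}
```

The two helper methods look almost identical at the call site yet interpret the same argument in different units, which is exactly the readability problem described above.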

David Flanagan has also touched on some other issues -- specifically, how one can import a method with the same name but different signatures.

Static imports "solve" something that was never a problem to begin with (i.e. explicit names are a good thing). A more suitable solution to this "problem" would be in-line (or horizontal) code folding; just as some IDEs provide the ability to vertically fold various scopes, horizontal folding would fold the qualifier of the name.

Updated April 23 at 11:45AM

Computer Science not Computer Art

My education has been molded within the tenets of the natural sciences. We would follow the scientific method. We had sayings like "if you didn't document it, it didn't happen!". We had a set of common accepted techniques that were used as building blocks to achieve a desired result.

Fast forward to the present day and my current foray into computer science. I have struggled to impress the tenets of the sciences into every environment I have participated in: tests exist to ensure correctness and conformity; all code is consistent and thoroughly documented; patterns and common libraries are used.

It seems the efforts I take are not universal. I do not claim that I am the only one following these tenets, but I will insist that I am in the minority.

Are we practicing computer science or computer art?

:s/science/engineering/g

Multiplication

The last thing that the MRI technician said to me before leaving the room was: don't move. "Don't move", piece of cake ... or so I thought!

Within moments of listening to the ear-plug muted rumblings of the machine I found myself being lulled into a semi-conscious state. Like a student nodding off in a boring lecture, I began the vicious cycle of stupor then jerking myself awake, stupor ... awake. Oh crap, I thought, if I don't find some way of staying awake I'm going to have to go through all of this again.

Frantically recalling what Meg did in A Wrinkle in Time to save her from control of IT, I began to recite the multiplication tables in my head. I started with nine (I just like nine). 9 x 1 = 9 ... I quickly blew through to 9 x 10 = 90. This ain't gonna cut it -- too easy. 9 x 11 = 99, 9 x 12 = 108, 9 x 13 = ... crap! Grade school multiplication tables only went up to twelve. I could just add nine to one-oh-eight and cheat my way along. But what if I got to 9 x 100 and I didn't get nine-hundred due to cumulative error? What would my friends think? My mother and her weak heart, it might just kill her to find out! There has to be a better way.

We're going to have to do this the long way:

  13
 x 9
----
  27
+ 9
----
 117

Now why do we do this whole "multiply by the ones and write down the result and then multiply by the tens and write that down shifted over one" nonsense? There has to be a simple answer but no one bothered to tell me!

Rewriting the multiplication in a more linear form we get:

(9)(13) = 117

But thirteen can be written as a sum:

(9)(3 + 10)

Bringing in the distributive law of multiplication:

(9)(3) + (9)(10)

Simplifies to:

27 + 90

Ah ha! I recalled one of my grade school teachers saying that you can put a zero in that "shifted over" spot.

  13
 x 9
----
  27
+ 90
----
 117

That makes the two statements identical! This is pretty obvious when you think about it. Since we're taught the mechanics and not the theory it's just all taken for granted.
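
The same idea in code, as a small sketch: split the second number into its place values (13 becomes 3 + 10), multiply each by the first number, and add up the partial products. The class and method names are made up for this example, and it assumes small non-negative integers so that nothing overflows:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing long multiplication as the distributive law. */
class LongMultiplication {

    /** Returns a * (each place value of b), lowest place first. */
    static List<Integer> partialProducts(final int a, final int b) {
        final List<Integer> products = new ArrayList<>();
        int placeValue = 1;
        for (int remaining = b; remaining > 0; remaining /= 10) {
            final int digit = remaining % 10;
            // the "shifted over" row is just the digit times its place value
            products.add(a * digit * placeValue);
            placeValue *= 10;
        }
        return products;
    }

    /** Adds the partial products back together. */
    static int sum(final List<Integer> values) {
        int total = 0;
        for (final int value : values) {
            total += value;
        }
        return total;
    }
}
```

For 9 x 13 the partial products are 27 and 90, and their sum is 117, matching the worked example.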

By the time I worked through all of this (as well as some other interesting tidbits involving nine that I will write up later) the technician was rolling me out of the bowels of the great machine. I made it!

(After the fact I did recall that it was IT that was trying to gain control over Meg by forcing her to recite the multiplication tables and was therefore the worst thing that I could have done but it had saved me. Thank you Madeleine L'Engle!)

Cosmological Mysteries Part 2

Some notes on the experimental side of dark matter (Part 2 of 3):

  • Supersymmetry: A symmetry relating fermions and bosons in which every "ordinary" particle has a corresponding "superpartner" which differs in spin by half a unit. This theory attempts to find a common explanation for forces.

  • One possible explanation for dark matter is WIMPs (weakly interacting massive particles). Possible candidates for WIMPs are supersymmetric particles. These particles have a very low probability of interaction (less than that of the lighter neutrino). Fortunately supersymmetric particles interact with baryons via nuclear recoils (billiard-ball interactions).

  • Current detectors can detect ~1 recoil per kg per year. To detect a WIMP, a sensitivity of ~1 recoil per ton per year is required.

  • There are a number of problems in creating a detector sensitive enough to detect WIMPs. The primary problem is background radiation. This radiation can come from cosmic sources, radon from the ground, naturally occurring isotopes in metals used for constructing the detector, and even potassium 40 found in human sweat. What's most unfortunate is that WIMPs are low energy which corresponds directly with this background radiation.

    Interesting factoid: radon pools if ventilation is not adequate. Radon levels were ~2x greater than recommended in the underground "red button" rooms of France's nuclear agency due to poor air circulation. These are the unknown casualties of the cold war.

  • CDMS (Cryogenic Dark Matter Search) is a detector for WIMPs. It is housed in the Soudan underground laboratory.

  • DRIFT is an experiment to eliminate background radiation by exploiting properties of the WIMP "wind". It is assumed that WIMPs are isotropic throughout the universe. Because the earth and sun are moving through the galaxy, there should be a "wind" of WIMPs. The wind will have a particular signature due to the rotation of the earth about the sun and the earth about its axis (for example, during July the earth will be moving in the direction of the motion of the sun through the galaxy whereas in December it will be moving in the opposite direction). This signature caused by the WIMP wind should be different from that seen from any other background radiation.

  • MINOS (Main Injector Neutrino Oscillation Search) involves a gun at Fermilab that shoots neutrinos at a detector in the Soudan mine. This will be the first attempt to detect neutrinos from a source other than cosmic rays.

java.util.HashMap

From the constructor of java.util.HashMap from J2SE 1.4.2 (reprinted without permission):

// Find a power of 2 >= initialCapacity 
int capacity = 1; 
while (capacity < initialCapacity) 
    capacity <<= 1; 

this.loadFactor = loadFactor; 
threshold = (int)(capacity * loadFactor); 
table = new Entry[capacity]; 

where

    loadFactor = 0.75; 

If initialCapacity is a power of two then it is used as the capacity. Combining this with the load factor you get a threshold < initialCapacity. Had they only put that pesky "=" sign in the equation then we'd be all set. Ah well.

So what does this all mean? Given a distribution of hash values that fills each bucket only once (such as adding integers) and the default load factor of 0.75, if an initial capacity is a power of two then adding initialCapacity elements will require at least one resizing of the hash table!!

It should also be noted that chaining is dominant for small non-power-of-two initial capacities (again, given the default load factor).
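
The arithmetic can be checked with a small sketch that repeats the constructor logic quoted above. The class and method names are mine, the sketch leaves out the maximum-capacity clamp of the real constructor, and it assumes an initial capacity of at most 2^30:

```java
/** Hypothetical helper that mirrors the sizing logic quoted above. */
class HashMapSizing {

    /** Smallest power of two >= initialCapacity (assumes <= 2^30). */
    static int capacityFor(final int initialCapacity) {
        int capacity = 1;
        while (capacity < initialCapacity) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Number of entries at which the table is resized. */
    static int thresholdFor(final int initialCapacity, final float loadFactor) {
        return (int) (capacityFor(initialCapacity) * loadFactor);
    }

    /**
     * An initial capacity to request so that expectedEntries fit
     * without reaching the threshold early.
     */
    static int initialCapacityFor(final int expectedEntries, final float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }
}
```

Asking for 16 gives a capacity of 16 and a threshold of 12, so the table is resized before all 16 entries are in. Asking for initialCapacityFor(16, 0.75f), which is 22, gives a capacity of 32 and a threshold of 24.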

Something to keep in mind.

HashMap hash function problems in 1.4.0

When software engineering goes horribly wrong

Incidents like the code snippet below underline the root cause of failure on most projects (and why I fully expect to die from a heart attack at a very young age):

if(!((yearObj.options[yearObj.selectedIndex].value / 4).toString().indexOf('.') == -1))

(Sorry about any line wrapping that may have occurred.)

That beautiful specimen was purported to compute if a selected year was a leap year or not. No, really. I could spend the rest of this day discussing the failure of the industry to police itself to maintain minimum standards, how programmers are not just generic blobs that can be pulled from one project and jammed into another, how lack of time and infrastructure perpetuate catastrophic problems, etc, etc, etc ... but I won't.
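
For comparison, the Gregorian leap year rule is short enough to state directly: a year is a leap year if it is divisible by 4, except for century years, which must also be divisible by 400. The sketch below is in Java rather than the JavaScript of the original snippet, and the class name is made up:

```java
/** Hypothetical helper stating the Gregorian leap year rule. */
class LeapYear {

    /** True for years divisible by 4, except centuries not divisible by 400. */
    static boolean isLeapYear(final int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }
}
```

So 2004 and 2000 are leap years while 2003 and 1900 are not. On Java 8 and later, java.time.Year.isLeap(long) provides the same check.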

Another one just in (from the same person as the beauty above):

for (var i = 1; i < days + 1; i++)

Of course there is nothing inherently wrong with the statement, but what is wrong is that there is a fundamental un-understanding (rather than a misunderstanding which implies that there is some understanding to begin with) of the principles of software engineering.

Java array sizes

While attempting to copy a file's contents to an array of bytes in Java I noticed something interesting that is taken for granted: array indexes are integers. This implies that only ~2G entries are available. Even though Java may have large file support (>2GB) and extended memory access enabled one cannot perform certain functions.

Nothing earth shattering here but it was one of those Hmmmm moments.
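
One way to make the limit explicit is to check the file size (a long) before it is narrowed to an array length (an int). This is a sketch with made-up names. Note also that some JVMs may cap arrays slightly below Integer.MAX_VALUE, so passing this check does not guarantee the allocation succeeds:

```java
/** Hypothetical guard for copying a file's contents into a byte array. */
class FileToArray {

    /** Narrows a byte count to an array length, or fails loudly. */
    static int checkedArrayLength(final long byteCount) {
        if (byteCount < 0 || byteCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                "cannot hold " + byteCount + " bytes in a single Java array");
        }
        return (int) byteCount;
    }
}
```

A caller would use it as new byte[FileToArray.checkedArrayLength(file.length())] and fall back to reading the file in chunks when the exception is thrown.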

Working on the first try? Try again!

Tip: Be wary of something working on the first try.

If something works on the first try, it's guaranteed to be screwed up in some way.

A common one for me, as it's easy to forget, is enabling Java's assertions. They're disabled by default and if you use an IDE's fancy doo-dads to automatically run your JUnit tests then it won't have the assertions enabled (you typically have to manually enable them). All shows green and you move on. At some point later you hit an NPE (NullPointerException -- the bane of a Java programmer's existence) and see that there is an assertion in place. Then you realize that you never enabled assertions. A forehead smack occurs and you spend a day debugging what you should have fixed in the first place (when it was fresh in your mind).
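
A well-known idiom can detect this situation at runtime: an assert statement whose expression has a side effect only runs when assertions are enabled. The class and method names below are made up for illustration:

```java
/** Hypothetical helper that reports whether assertions are switched on. */
class AssertionCheck {

    /** Returns true only when this class runs with assertions enabled (-ea). */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // intentional side effect: the assignment is skipped entirely
        // when assertions are disabled
        assert enabled = true;
        return enabled;
    }
}
```

Calling this from a test suite's setup and failing when it returns false turns "I forgot -ea" into a red bar instead of a false green one.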

Preventing NPEs

Java Tip: Put string constants on the left side of a .equals().

This prevents the dreaded NullPointerException (NPE) from occurring. For example:

if(name.equals("rob"))
   return;

should always be written as:

if("rob".equals(name))
   return;

"rob" will never be null so this is NPE safe.

Coding Defensively

Tip: Never check for a single value when you actually are interested in a range.

The common case where this occurs is with sizes (list, arrays, etc). The statement:

if(list.size() == 3)
    return;

or:

for(int i=0; i!=10; i++)
    ...

is error prone and should be avoided at all costs. Why? Most of the time the list will have multiple entries added (this is especially pertinent in the case of MT (multi-threaded) code) and an equality can be missed. In the for-loop case, it is common (but oooohhh so bad) to see the loop counter manipulated in the loop body. So the correct statements would be:

if(list.size() >= 3)  // or (list.size() > 2)
    return;

and

for(int i=0; i<10; i++)
    ...

This is called coding defensively. You're preventing bugs before they've had a chance to form.
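
A small sketch of why the range check matters, using a made-up helper and assuming a positive step. With a step of 3 the counter goes 0, 3, 6, 9, 12 and never equals 10, so a loop written with i != 10 would sail straight past its limit, while i < 10 stops after four iterations:

```java
/** Hypothetical helper contrasting range checks with equality checks. */
class RangeChecks {

    /** Counts iterations of a loop bounded by a range check (step > 0). */
    static int countIterations(final int limit, final int step) {
        int iterations = 0;
        // "<" still terminates when the counter steps over the limit
        for (int i = 0; i < limit; i += step) {
            iterations++;
        }
        return iterations;
    }
}
```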

Life's Expectations

I have worked in an upper management position in a number of small companies throughout my career. It has always been my position to offer up as much of my control to anyone who is willing and able to take it. After a decade of watching people either ignore the opportunity or take the opportunity and completely fall apart, I have reduced the experiences down to the following:

If presented with a situation that will deliver everything that you've ever wanted in your career, will you:

  • Recognize it?
  • Be able to handle it?

Notes:

  • People may be under the impression that the path to reaching their goals will be presented thru obvious opportunities and the challenges associated will be easily overcome similar to someone who "plans" financial success thru winning the lottery.

Corollary 1

If it is everything that you wanted in life and you could not handle it, will you be able to bow out gracefully?

Corollary 2

If it is determined that the situation will not be able to deliver everything that you've wanted in your career (as conditions do change) will you be able to control and change the situation such that it delivers on a subset of your desires?

There is much more to this post and as I have more time available I will elaborate.

Nested functions in Java

I am always looking for ways to increase code clarity and reduce confusion and maintenance associated with "dangling methods". What's a dangling method? It's a method that is only used by another method to reduce code duplication. The scope of this method should therefore be local to only the calling function.

I tend to run into this problem when doing string manipulation. Currently I need to do a "last added character" for a CharBuffer. The only way to currently do this is to add a member function:

private char lastChar(final CharBuffer buffer)
{
    // determine if there are already chars in the buffer.  If
    // there are none, throw.
    if(buffer.position() <= 0)
       throw new IndexOutOfBoundsException();
    /* else -- there are characters in the buffer */

    // retrieve the last character placed into the buffer
    // NOTE:  the above check ensures that there will be a char
    return buffer.get(buffer.position() - 1);
}

to the class. This is no good since the scope of the method is too large. Large scope equals more time determining dependencies which equals more time to debug.

If Java allowed for nested functions, one could write:

private String normalize(final String string)
{
    ...

    // inner function for determining the last character
    // added to a buffer
    char lastChar(final CharBuffer buffer)
    { 
        ...
    };

    ...
 
        case '/':
            if(lastChar(buffer) != '/')
                ...

    ...
}

Having nested (or inner) functions in Java would help enormously. Kris Wehner brought up a Smalltalk technique which would be somewhat useful in this case. What do you think a solution to this problem would be?
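
One partial workaround that Java already offers is a local class declared inside the calling method. The sketch below is only an illustration: the Normalizer and Last names are hypothetical, and the body of normalize() (collapsing runs of '/') is a stand-in for the elided code above, not the original implementation.

```java
import java.nio.CharBuffer;

final class Normalizer {
    // Hypothetical stand-in for normalize(): collapses runs of '/'
    // into a single '/'.
    static String normalize(final String string) {
        // Local class: its scope is limited to this method, so the
        // helper cannot be called from anywhere else in the class.
        final class Last {
            char of(final CharBuffer buffer) {
                if (buffer.position() <= 0)
                    throw new IndexOutOfBoundsException();
                return buffer.get(buffer.position() - 1);
            }
        }
        final Last last = new Last();

        final CharBuffer buffer = CharBuffer.allocate(string.length());
        for (int i = 0; i < string.length(); i++) {
            final char c = string.charAt(i);
            // skip a '/' that directly follows another '/'
            if (c == '/' && buffer.position() > 0 && last.of(buffer) == '/')
                continue;
            buffer.put(c);
        }
        buffer.flip();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("a//b///c"));
    }
}
```

It is wordier than a true nested function (the helper needs a class wrapper and an instance), but it does keep the helper's scope local to the one method that uses it. On Java 8 or later, a lambda assigned to a local variable serves the same purpose.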

CharBuffer vs. StringBuffer

Performance of java.nio.CharBuffer vs. java.lang.StringBuffer:

Iterations   CharBuffer   StringBuffer   % diff
10000        ~0           ~0             0
1000000      187          243            77

For "normal" string processing there appears to be no difference between the two -- the effects are lost in the noise. For large strings (documents and the like), CharBuffer has a distinct advantage.

CharBuffer has the perk of pointer-like manipulation via CharBuffer.subSequence() and CharBuffer.slice() but lacks good string searching functions like StringBuffer.lastIndexOf().

The only caveat with CharBuffer is that the size of the buffer must be known a priori.
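
A small sketch of the trade-offs described above: the fixed capacity of CharBuffer, the pointer-like slice(), and the growth of StringBuffer. The BufferGrowth class and its method names are hypothetical, chosen only for this illustration.

```java
import java.nio.BufferOverflowException;
import java.nio.CharBuffer;

final class BufferGrowth {
    // Returns true if writing text into a CharBuffer of the given
    // capacity overflows. A CharBuffer never grows on its own.
    static boolean overflows(int capacity, String text) {
        CharBuffer buffer = CharBuffer.allocate(capacity);
        try {
            buffer.put(text);
            return false;
        } catch (BufferOverflowException e) {
            return true;
        }
    }

    // A StringBuffer grows as needed, even from a tiny initial capacity.
    static String grown(String text) {
        StringBuffer sb = new StringBuffer(1);
        sb.append(text);
        return sb.toString();
    }

    // Pointer-like access: slice() exposes the characters from the
    // current position onward without copying them.
    static String tail(String text, int start) {
        CharBuffer buffer = CharBuffer.wrap(text);
        buffer.position(start);
        return buffer.slice().toString();
    }

    public static void main(String[] args) {
        System.out.println(overflows(4, "hello"));
        System.out.println(grown("hello"));
        System.out.println(tail("hello world", 6));
    }
}
```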

Notes:

  • The strings added were between 0 and 10 characters long to simulate standard text processing.
  • This should be taken with a grain of salt since it is a micro-benchmark.

History

"The common belief that we gain 'historical perspective' with increasing distance seems to me to utterly misrepresent the actual situation. What we gain is merely confidence in generalization that we would never dare to make if we had access to the real wealth of contemporary evidence."

Otto Neugebauer

Dealing with Java's URLs

I have found myself in a position where I am yet again wading through the quagmire that is Java's URLs.

  • Goal: virtualize a filesystem (i.e. a VFS).
  • Interface: URLs are provided that define the scope of the filesystem. These URLs are files (archive and non, local and non) and directories. Lookups and retrievals are done against the VFS and return ByteBuffers. (Think "resource" on ClassLoader.)
  • First thought: Just use java.lang.ClassLoader. The problem is that I need granular access to the data to optimize reads and there is no way to change a classpath at runtime.

So what is the problem with Java's URLs? Archives (i.e. JAR and ZIP). Play around with URLs such as:

jar:jar:file:///some/directory/file.jar!/nested/file1.jar!/finally.txt

and you'll know the pain I feel.

There will be more on this ... believe me!

Side notes:

  • The path component of a URL can be null. This is a gotcha for handling local files with a statement like:
        if("file".equals(url.getProtocol()))
        {
            File file = new File(url.getFile());
        }
    
  • Are query strings useful when the protocol is "file"?
  • Consistently use either URL.getPath() or URL.getFile(). When dealing with local files (protocol is "file") getPath() makes the most sense since it will not include the query string (see above).
  • URL.getPath() and URL.getFile() may return a URL encoded string. This string cannot be used in File as it will not correctly URL decode it. The string must be manually URLDecoded. Example URL:
        new URL("file:///C:/Program%20Files/Java/j2re1.5.0/bin/java.exe")
    
    URL.getPath() will return:
        /C:/Program%20Files/Java/j2re1.5.0/bin/java.exe
    
  • URL test cases is an excellent resource for reminding oneself of the various forms URL's come in for local files.
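
A hedged sketch of the decoding note above. Instead of decoding by hand, this version converts the URL to a java.net.URI, whose getPath() returns the decoded path. That is a swapped-in technique, not necessarily what was used originally, and the LocalFileUrls class and its method names are hypothetical.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;

final class LocalFileUrls {
    // The path exactly as URL reports it (still URL encoded).
    static String rawPath(String spec) {
        try {
            return new URL(spec).getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // The decoded path: URI.getPath() turns %20 back into a space.
    static String decodedPath(String spec) {
        try {
            return new URL(spec).toURI().getPath();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Builds a File from the decoded path rather than the raw one.
    static File toFile(String spec) {
        return new File(decodedPath(spec));
    }

    public static void main(String[] args) {
        String spec = "file:///C:/Program%20Files/Java/j2re1.5.0/bin/java.exe";
        System.out.println(rawPath(spec));
        System.out.println(decodedPath(spec));
    }
}
```

One reason to prefer the URI route over java.net.URLDecoder: URLDecoder follows form-encoding rules and turns a literal '+' into a space, which is wrong for file paths.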

java.io.File gotcha!

File.isDirectory() and File.isFile() are not complements of each other: both can return false. The mistaken assumption is commonly seen in the case:

if(file.isDirectory())
    // do something with a directory
else 
    // do something with a file

Unfortunately, the above is correct only if File.exists() returns true: for a non-existent path both methods return false, so the else branch runs. This is in the javadoc for the methods but it's common to assume that a File is always one or the other.

Since it is possible for a file to be removed between File.exists() and the corresponding File.isDirectory() and File.isFile() calls, it seems that best practice dictates that code similar to the following be used:

if(file.isDirectory())
    // do something with a directory
else if(file.isFile())
    // do something with a file
else
    // do something with a non-existing file

A side note to this: File.isDirectory() and File.isFile() will actually touch the native file system. They do not just check for a trailing slash or some other such thing.
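
A runnable sketch of the three-way check. The FileKind class and the classify() name are hypothetical. Note that the final branch also catches paths that exist but are neither directories nor normal files (special files such as devices), since File.isFile() only reports normal files.

```java
import java.io.File;

final class FileKind {
    // Classifies a path without assuming that "not a directory"
    // means "a file".
    static String classify(File file) {
        if (file.isDirectory())
            return "directory";
        else if (file.isFile())
            return "file";
        else
            return "neither";   // missing, or a special file
    }

    public static void main(String[] args) {
        System.out.println(classify(new File(".")));
        System.out.println(classify(new File("no-such-file-" + System.nanoTime())));
    }
}
```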

Saiko

Saiko
1307 S. Wabash
312-922-2222

Cuisine:

Japanese, Sushi, Steak

Price:

$$$

Atmosphere:

Contemporary, trendy. Definitely not your standard sushi bar.

Service:

Excellent. Very prompt and attentive wait staff. It should be noted that it was a Thursday night and the dining room was half full.

I ordered the Heavens Door sake which started with a heavy chocolate flavor that would immediately vanish as you were trying to determine what it was. My wife had the Pride of the Village which was a crisp fruity flavor -- not overpowering.

We started with a California roll (it is a standard that we use to baseline new sushi restaurants) which had real crab meat, avocado, roe and sesame seeds. My wife also had tako (octopus). Both were better than average quality. We are both looking forward to returning to try other sushi.

For our entrees, I had the Crispy Whole (Yokuzuna) Bass and my wife had the Sirloin Steak. Our server (Angelique) recommended presenting the bass and then having the chef filet it -- a wonderful idea. The presentation of the bass was excellent. Its flavor was subtle with a hint of spice tacked on the end and the skin remained surprisingly crispy throughout the meal. The delicate flavor made this a perfect dish for me. My wife's steak was served with melted wasabi cheese, pea shoots and a garlic oyster sauce. My wife is not a cheese lover so that was not her personal favorite but overall it was quite tasty.

For dessert my wife had the Saiko chocolate which was a flourless chocolate cake topped with green tea cheese cake accompanied by mint ice cream on the side. Unfortunately the chocolate completely overwhelmed the flavors (not that that's a bad thing) but it would have been better if the green tea was a better complement. I had the Fuji Apple Tart which combined a delicate apple tartlet with ginger ice cream with caramel drizzled on the side. Being an apple tart lover this was an exceptional choice.

Cosmological Mysteries

Some notes from a lecture on Dark Matter and Structure formation in the Universe (Part 1 of 3):

  • Zone of avoidance: the region along the galactic equator that cannot be easily observed due to the absorption of light from the dust in the plane of the Milky Way
  • Universe expansion does not play a role in local effects such as star - planet, star - star, and even local galaxy - galaxy interactions. "Close in" galaxies (such as ours and Andromeda) are actually approaching each other (they display blue shifts). Once the 1 / r^2 pull of gravity falls off, the expansion (red shift) effects are seen.
  • Sloan Digital Sky Survey (SDSS) is an effort to map one million galaxies in the universe. The process is done via measuring red shifts.

    Stars can be separated from galaxies by looking for a 4 spike cross. Stars are essentially points that cannot be resolved, which causes a star pattern from the lenses. Galaxies will appear as blobs as they can be resolved. (I need to find a definitive reference for this phenomenon.)

  • NASA's Wilkinson Microwave Anisotropy Probe (WMAP) is measuring the cosmic background radiation.
  • By combining the results of the SDSS and WMAP, physical evidence is found for the existence of dark energy. Dark energy is gravitationally repulsive. What's interesting about dark energy is that it corresponds to Einstein's cosmological constant. The cosmological constant was added by Einstein to his equations to achieve a static model of the universe. After it was determined that the universe was not static (it's currently expanding) this constant was considered to be Einstein's blunder. It seems that Einstein was right after all.
  • Dark matter does not lose energy like standard matter does. Standard matter (baryons) loses energy (by radiation) and will condense, forming dense matter. Dark matter on the other hand does not lose energy and therefore has a lower bound on its density. Because of this, dark matter does not have an effect on the planetary scale, where standard matter dominates. As the scale moves to intergalactic distances the density of dark matter is such that it contributes to interactions.
  • "Gravity is like the economy: the rich get richer and the poor get poorer." The results of gravity follow that of chaos -- that is, it is highly dependent on initial conditions. Small initial anisotropies in a distribution of matter will quickly become large clusters of matter. A locally dense region will pull more and more matter into it ("the rich get richer") and a locally sparse region will have more and more matter pulled from it ("the poor get poorer").
  • Using the background microwave radiation (from WMAP) as a baseline, simulations have been performed to show that filaments similar to those seen in the results of SDSS evolve.

    It has also been shown that the effects of dark matter / energy must be added into these simulations in order to maintain the structure of the filaments. The dark matter / energy dominates over the gravitational effects to curtail further collapse of the filaments.

Course slides are available.

A useful astronomy reference.

Thank you Andrey Kravtsov for a great talk!

I christen thee ....

This site is officially open! Woo-hoo!

Creative Commons License Unless otherwise expressly stated, all original material of whatever nature created by Rob Grzywinski and included in this weblog and any related pages, including the weblog's archives, is licensed under a Creative Commons License.