Saturday, October 20, 2012

Mythbuster












I'm asked to use an automated code review tool at work. It's not FindBugs; nevermind the name. Given a repository URL, it checks for architectural integrity, programming practices, and documentation. It reports scores on the academic scale, with zero being the worst and 4.0 being the best possible score. It flags offensive components and presents citations to buttress its arguments.

I'm particularly bothered by one offense against architectural integrity. It says that instantiation of objects inside a loop is inefficient and gives the lowest possible architectural score. The goal is to avoid instantiation of too many objects.

How on earth is a developer supposed to populate a list of objects?

Besides, this might have been a serious issue on the earliest versions of the JVM. But object creation now rivals C in speed, and newer generational garbage collectors are also big improvements.

Rather than rail against the injustice, I decided to create a small benchmark and see for myself.

I started with a simple Person class:

package cast;

import org.apache.commons.lang3.StringUtils;

import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

/**
 * Person
 * @author mduffy
 * @since 9/19/12 7:38 AM
 */
public class Person {
    private static final DateFormat BIRTH_DATE_FORMATTER;
    
    private final String firstName;
    private final String lastName;
    private final Date birthDate;

    static {
        BIRTH_DATE_FORMATTER = new SimpleDateFormat("yyyy-MMM-dd");
        BIRTH_DATE_FORMATTER.setLenient(false);
    }
    
    public Person(String firstName, String lastName, Date birthDate) {
        this.firstName = (StringUtils.isBlank(firstName) ? "" : firstName);
        this.lastName = (StringUtils.isBlank(lastName) ? "" : lastName);
        this.birthDate = ((birthDate == null) ? new Date() : new Date(birthDate.getTime()));
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public Date getBirthDate() {
        return new Date(birthDate.getTime());
    }

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder();
        sb.append("Person");
        sb.append("{firstName='").append(firstName).append('\'');
        sb.append(", lastName='").append(lastName).append('\'');
        sb.append(", birthDate=").append(BIRTH_DATE_FORMATTER.format(this.birthDate));
        sb.append('}');
        return sb.toString();
    }
}

I needed to keep statistics on wall time and heap used, so I wrote a Statistics class:

package cast;

import org.apache.commons.lang3.StringUtils;

import java.util.Collection;

/**
 * Statistics accumulates simple statistics for a given quantity "on the fly" - no array needed.
 * Resets back to zero when adding a value will overflow the sum of squares.
 * @author mduffy
 * @since 9/19/12 8:16 AM
 */
public class Statistics {
    private String quantityName;
    private int numValues;
    private double x;
    private double xsq;
    private double xmin;
    private double xmax;

    /**
     * Constructor
     */
    public Statistics() {
        this(null);
    }

    /**
     * Constructor
     * @param quantityName to describe the quantity (e.g. "heap size")
     */
    public Statistics(String quantityName) {
        this.quantityName = (StringUtils.isBlank(quantityName) ? "x" : quantityName);
        this.reset();
    }

    /**
     * Reset the object in the event of overflow by the sum of squares
     */
    public synchronized void reset() {
        this.numValues = 0;
        this.x = 0.0;
        this.xsq = 0.0;
        this.xmin = Double.MAX_VALUE;
        this.xmax = -Double.MAX_VALUE;
    }

    /**
     * Add a List of values
     * @param values to add to the statistics
     */
    public synchronized void addAll(Collection values) {
        for (Double value : values) {
            add(value);
        }
    }

    /**
     * Add an array of values
     * @param values to add to the statistics
     */
    public synchronized void allAll(double [] values) {
        for (double value : values) {
            add(value);
        }
    }
    
    /**
     * Add a value to current statistics
     * @param value to add for this quantity
     */
    public synchronized void add(double value) {
        double vsq = value*value;
        ++this.numValues;
        this.x += value;
        this.xsq += vsq; // TODO: how to detect overflow in Java?
        if (value < this.xmin) {
            this.xmin = value;
        }
        if (value > this.xmax) {
            this.xmax = value;
        }
    }

    /**
     * Get the current value of the mean or average
     * @return mean or average if one or more values have been added or zero for no values added
     */
    public synchronized double getMean() {
        double mean = 0.0;
        if (this.numValues > 0) {
            mean = this.x/this.numValues;
        }
        return mean;
    }

    /**
     * Get the current min value
     * @return current min value or Double.MAX_VALUE if no values added
     */
    public synchronized double getMin() {
        return this.xmin;
    }

    /**
     * Get the current max value
     * @return current max value or Double.MIN_VALUE if no values added
     */
    public synchronized double getMax() {
        return this.xmax;
    }

    /**
     * Get the current standard deviation
     * @return standard deviation for (N-1) dof or zero if one or fewer values added
     */
    public synchronized double getStdDev() {
        double stdDev = 0.0;
        if (this.numValues > 1) {
            stdDev = Math.sqrt((this.xsq-this.x*this.x/this.numValues)/(this.numValues-1));
        }
        return stdDev;
    }

    /**
     * Get the current number of values added
     * @return current number of values added or zero if overflow condition is encountered
     */
    public synchronized int getNumValues() {
        return this.numValues;
    }

    @Override
    public String toString() {
        final StringBuilder sb = new StringBuilder();
        sb.append("Statistics");
        sb.append("{quantityName='").append(quantityName).append('\'');
        sb.append(", numValues=").append(numValues);
        sb.append(", xmin=").append(xmin);
        sb.append(", mean=").append(this.getMean());
        sb.append(", std dev=").append(this.getStdDev());
        sb.append(", xmax=").append(xmax);
        sb.append('}');
        return sb.toString();
    }
}

Then I wrote a test method to exercise creating a million instances and compared the two idioms:

package cast;

import java.util.Date;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

/**
 * InstantiationInsideLoopTest attempts to answer the question: Is it bad to instantiate an object inside a loop
 * with a modern JVM?
 * @author mduffy
 * @since 9/19/12 7:35 AM
 */
public class InstantiationInsideLoopTest {
    private static final int DEFAULT_INSTANCES = 1000000;
    private static final int DEFAULT_REPETITIONS = 5;
    private static final String[] FIRST_NAMES = {
            "Abel", "Bob", "Charlie", "David", "Edward", "Fred",
    };
    private static final String[] LAST_NAMES = {
            "Smith", "Jones", "Carver", "Brown",
    };

    private Random random;
    private Statistics begHeapStats     = new Statistics("beg heap (MB)");
    private Statistics endHeapStats     = new Statistics("end heap (MB)");
    private Statistics wallTimeStats    = new Statistics("wall time (sec)");


    public static void main(String[] args) {
        int numRepetitions = ((args.length > 0) ? Integer.valueOf(args[0]) : DEFAULT_REPETITIONS);
        int numInstances = ((args.length > 1) ? Integer.valueOf(args[1]) : DEFAULT_INSTANCES);
        boolean isInside = (args.length > 2) && "inside".equalsIgnoreCase(args[2]);
        System.out.println(String.format("# reps: %d # instances: %d inside? %b", numRepetitions, numInstances, isInside));
        InstantiationInsideLoopTest tester = new InstantiationInsideLoopTest();
        if (isInside) {
            tester.instantiateInside(numInstances, numRepetitions);
        } else {
            tester.instantiateOutside(numInstances, numRepetitions);
        }
    }

    public InstantiationInsideLoopTest() {
        this(System.currentTimeMillis());
    }

    public InstantiationInsideLoopTest(long seed) {
        this.random = new Random(seed);
    }

    private void instantiateOutside(int numInstances, int numRepetitions) {
        this.begHeapStats = new Statistics("beg heap (MB)");
        this.endHeapStats = new Statistics("end heap (MB)");
        this.wallTimeStats = new Statistics("wall time (sec)");
        for (int i = 0; i < numRepetitions; ++i) {
            long begTime = System.currentTimeMillis();
            long begHeap = Runtime.getRuntime().freeMemory();
            List persons = new LinkedList();
            Person p;
            for (int j = 0; j < numInstances; ++j) {
                p = new Person(getFirstName(), getLastName(), getBirthDate());
                persons.add(p);
            }
            long endHeap = Runtime.getRuntime().freeMemory();
            long endTime = System.currentTimeMillis();
            this.begHeapStats.add(begHeap/1024L/1024L);
            this.endHeapStats.add(endHeap/1024L/1024L);
            this.wallTimeStats.add((endTime - begTime)/1000.0);
//          summarize("outside", i, begTime, endTime, maxHeap, begHeap, endHeap, persons);
        }
        System.out.println(this.begHeapStats);
        System.out.println(this.endHeapStats);
        System.out.println(this.wallTimeStats);
    }

    private void instantiateInside(int numInstances, int numRepetitions) {
        this.begHeapStats = new Statistics("beg heap (MB)");
        this.endHeapStats = new Statistics("end heap (MB)");
        this.wallTimeStats = new Statistics("wall time (sec)");
        for (int i = 0; i < numRepetitions; ++i) {
            long begTime = System.currentTimeMillis();
            long begHeap = Runtime.getRuntime().freeMemory();
            List persons = new LinkedList();
            for (int j = 0; j < numInstances; ++j) {
                Person p = new Person(getFirstName(), getLastName(), getBirthDate());
                persons.add(p);
            }
            long endHeap = Runtime.getRuntime().freeMemory();
            long endTime = System.currentTimeMillis();
            this.begHeapStats.add(begHeap/1024L/1024L);
            this.endHeapStats.add(endHeap/1024L/1024L);
            this.wallTimeStats.add((endTime-begTime)/1000.0);
//            summarize("inside", i, begTime, endTime, maxHeap, begHeap, endHeap, persons);
        }
        System.out.println(this.begHeapStats);
        System.out.println(this.endHeapStats);
        System.out.println(this.wallTimeStats);
    }

    private String getFirstName() {
        return FIRST_NAMES[this.random.nextInt(FIRST_NAMES.length)];
    }

    private String getLastName() {
        return LAST_NAMES[this.random.nextInt(LAST_NAMES.length)];
    }

    private Date getBirthDate() {
        return new Date(this.random.nextLong());
    }

    private void summarize(String testName, int repetition, long begTime, long endTime, long maxHeap, long begHeap, long endHeap, List objects) {
        System.out.println(String.format("test name    : %s", testName));
        System.out.println(String.format("repetition # : %d", repetition));
        System.out.println(String.format("list size    : %d", objects.size()));
        System.out.println(String.format("beg heap (MB): %d", begHeap / 1024L / 1024L));
        System.out.println(String.format("end heap (MB): %d", endHeap / 1024L / 1024L));
        System.out.println(String.format("max heap (MB): %d", maxHeap / 1024L / 1024L));
        System.out.println(String.format("wall time (s): %10.3f", (endTime - begTime) / 1000.0));
        for (int i = 0; i < 5; ++i) {
            System.out.println(objects.get(i));
        }
    }
}
I ran this test case on two different machines - one running Windows XP, another using Windows 7 - and both running Sun JVMs for Java 6. The results consistently said that the idiom recommended by the automated code review suite was incorrect. I understand the dangers of benchmarks like these, but it's comforting to attempt to be scientific rather than merely complain.
profile for duffymo at Stack Overflow, Q&A for professional and enthusiast programmers

Sunday, October 14, 2012

Race Day













I was nervous during the week leading up to the race. My last long run went very well. I did another six mile run the following Saturday, followed by an easy three mile jaunt the next day that included my first taste of gel packs, those doses of instant fuel that could keep me from hitting the wall at the end of the race. I followed the "don't introduce anything new on the day of the race" mantra and tried one. The vanilla flavor was delicious; it went down easily but took time to get out of the package; most importantly, it didn't upset my stomach.

It was a week to taper, according to my plan, but I ended up doing nothing. Yoga was cancelled due to Columbus Day. I didn't run at all, because my wife was in the midst of a cold for the entire week, as were my co-workers. Disease was all around me! I was concerned about waking up on race day with a whopper of a cold that would put me on the sidelines. So I went to bed early every night and hoped that my immune system would fight the good fight.

Race Day dawned bright and cold. Temperatures were below freezing overnight. What to wear? I decided to run in shorts and forego the tights. I had an Under Armor shirt with the half-marathon long-sleeved t-shirt over the top. I wore the same baseball cap that I'd done all my training in; no hat over my ears. I heeded the "no changes on race day" advice and ran in my minimals. I was determined to not give in to sneaker fear. My one concession to the cold was gloves. That turned out to be a very smart thought.

We drove into the city, parked in the convention center garage, and walked over to Bushnell Park. We had time, but there wouldn't be too much standing around. I was in line, right next to the "9 minute mile" sign, with only twenty minutes til race time. The crowd was shoulder to shoulder. Would the start be as viscous as the Manchester Road Race? It only took me two minutes to cross the starting line, and I was running comfortably right away. I hit the start button on my stopwatch and fell into my cadence - ninety right leg strikes per minute. The full and half marathon groups ran in one pack at the beginning. My biggest concern was taking a wrong turn and getting lost. But that turned out fine. There was a big sign at Main Street telling full marathons to turn left and half marathoners to turn right. After that, all I had to do was follow the crowd.

I didn't feel warm for the first mile or two, but everything was calm and relaxed. My breathing was smooth and comfortable. Nothing hurt. There was a Gatorade stop around mile two. I decided to take a cup at each stop to make sure that I didn't dehydrate. I didn't hurry. I'd stop, take the cup, gulp it down, and then head off. Drinking and running don't mix any more than drinking and driving do. I made sure that all my paper cups made it into a bin. The street was covered with crushed cups and spilled Gatorade. The volunteers would have a big cleanup job.

The training plan works. There were no official mile markers, but I got the idea that I was maintaining a nine-minute per mile pace without any strain. When I ran under the halfway arch the clock read 59 minutes. It was the first time I thought that I could actually break two hours. I noticed a woman wearing a yellow shirt that said "2:00 finish" on the back. She was the pace keeper: as long as I could stay with her I'd have a chance of meeting my time goal.

I was ahead of her when I entered Elizabeth Park, but some gentle slopes slowed me down more than I realized. I could see the Travelers tower ahead when I ran down Albany Ave past the eleven mile marker. The pace lady was within sight but ahead of me, surrounded by suffering acolytes who hoped for the reward of a two hour finish time. I had to decide: settle into a relaxed pace, call it a nice effort, and miss out on a two hour finish, or pick up my cadence and catch her. It was uncomfortable, but I decided on the latter course. I heard her say "You wanna bring it home?" as I passed her. I kept going until I couldn't hear her voice anymore. Less than two miles to go? That was all I could manage when I started this journey 16 months ago, but now it was easy. I pumped my arms and legs the whole way, under the Bushnell arch, into the chute, and under the finish arch. The clock read 1:59. My watch said 1:57. My average speed was a shade under nine minutes per mile, which was better than any training run I'd done.

I surprised my wife. I walked along like a veal calf to pick up a medal, a water bottle, and a bag with a banana and some other snacks. She sounded surprised when I called her on my cell phone. "Are you done?" she asked incredulously.

I enjoyed both the journey and the destination. It's so satisfying to say that I managed such a thing at my age. I thoroughly enjoyed the experience, so much so that I'm happily thinking about another run in November.

Most importantly, I've made myself into a runner. What other transformation can I undergo? How else might I remake myself again?

profile for duffymo at Stack Overflow, Q&A for professional and enthusiast programmers

Wednesday, October 3, 2012

Ready!














Last Saturday I had my last long run before the half marathon on 13-Oct. It was overcast and threatening rain when I hit the Airline Trail at noon, so I had it mostly to myself. There were a few hardy mountain bikers and a dog walker or two, but the traffic was greatly diminished.

The plan called for 12 miles, but I was ahead of schedule for the last two long runs. I wanted to have the confidence of knowing that I could complete the distance. I had a water bottle filled with ice-cold green Gatorade in hand. I planned to stop every 3-4 miles for 30 seconds to get a drink and stay relaxed. I wanted to simulate my plan for the race.

The first mile was a warm-up. I had a sweatshirt on that may have been overkill. By the time I stopped for my first drink I was loose and lathered up. The markers for the fourth mile went by quickly. I was at nine minute miles or better the whole way. I never saw the fifth or sixth mile markers, same as the week before. The trail had a gentle upward slope. I felt anxious as I ran: would there be a marker for the turnaround point at mile 3? I crossed a road that intersected the trail, ran to the next gate - and there it was! I was surprised when I checked my watch: I was still maintaining a nine-minute mile pace! I felt good. I had a long, relaxed drink and started down the slope towards my car.

I have to compliment the folks who put together the plan that I've used to prepare for the race. They know what they're doing. I felt strong on the way back. When I passed the ten mile mark, I had plenty left. I thought back to how I felt at the end of that personal best long run two weeks ago. I was greatly improved. I was slowing down and suffering a bit more when I passed the twelve mile mark. I thought back to how knackered I felt at the end of last Saturday's run. I was beyond that. The last two miles were difficult. But I felt like a running man when I finished in 2:15.

Now I'm tapering down for the next two weeks. The longest run will be five miles. I'll do some swimming, shorter runs, and yoga. I'm looking forward to that day. I just hope I can dodge catching a cold for the next two weeks.



profile for duffymo at Stack Overflow, Q&A for professional and enthusiast programmers