Saturday, February 14, 2009

Where Have All The Data Modelers Gone?

I noticed a curious thing at work recently. We have hundreds of instances of Oracle and SQL Server deployed on servers in our data center. Data warehouses that dwarf a Home Depot archive many terabytes of information. There's an army of administrators keeping watch, ensuring that no SELECT goes unfulfilled. You can't throw a rock without hitting a Java or .NET developer, well-versed in SQL, chomping at the bit to write an application that delivers that information via the web to any and all clients. Everyone agrees that data are the crown jewels, the lifeblood of any modern business.

If this is true, where have all the data modelers gone?

I see very few people who are comfortable and conversant with concepts like normalization and dimensional modeling. Application developers who possess encyclopedic knowledge of SQL syntax, but lack grounding in sound relational design principles and best practices, are given free rein to create whatever tables satisfy the immediate needs of their current project. Extracts are taken, data is duplicated, and modifications are made without communicating them back to the source systems they came from. The number of servers and schemas multiplies. The disk storage required grows at an ever-increasing rate. Terabytes are maintained, but the unique data might fit on a modest USB key if it could be normalized and extracted. The database administrators don't have any passion for the discipline. Data warehouse designers, ignorant or dismissive of dimensional modeling ideas, create duplicates of source data that are little more than staging areas. There's no cleansing or de-duplication going on. Reports are generated from these staging tables, in spite of better advice.

How did we get here? Where did all that knowledge go?

When the COBOL dinosaurs walked the earth, and mainframes were the kings of business, developers were of one mind. The person who wrote the logic also handled the flat files for persistence and the green screens for the user interface. They were expected to know everything about the system.

Relational databases came along in the 1970s, after Ted Codd published his seminal 1970 paper on the relational model. There was a land rush to create working implementations of the model. Where there once was a single dominant language for business logic and persistence, COBOL, now there were two: a client language for the logic and SQL for the data. Client/server was all the rage.

Personal computing took processing power out of the data centers, with their raised floors and frigid temperatures, and put it out on desktops. Networks knitted those islands together, first using LANs inside companies and then the Internet over the whole world. Client/server fell out of favor. Fat clients gave way to thin. Two tiers became three; three became four or more. Object-oriented programming took over both the desktop and middle layers, relegating relational databases to the back room. First C++, then Java, and now C# became the brokers between the persistent data and the end users. Object-relational impedance mismatch became the order of the day.

I don't know if increased specialization and layering have put relational practitioners in a funk, but it looks to me like true expertise in this area is dying. I've been fortunate to work lately with someone who is both a well-read devotee of Ralph Kimball and experienced at standing up several successful data warehouses that use dimensional techniques. But he's a singleton; I don't know of anyone else in the enterprise who speaks the same language. Tools for entity-relationship modeling (e.g., ERwin) are buried in the DBA's toolbox, hidden by scripting and client tools (e.g., TOAD), like a specialized wrench reserved for out-of-the-ordinary tasks.

Contrast this with the state of the middle tier. There's a cacophony of voices whenever someone proposes an object design. Good UML tools, both commercial and free, help to make even the gnarliest whiteboard session intelligible and clean.

It has dawned on me that data are still the crown jewels of every business. Client and middle-tier technologies continue to come and go at a rapid pace. But the data lives on forever.

Data warehousing and OLAP were huge in the late 90s. Kimball and Inmon slugged it out for dominance. Articles and books were written; consultancies prospered; businesses gobbled up advice and software as quickly as their pocketbooks would allow.

Has the animal spirit gone out of this field? The furor has died down and moved elsewhere, but the need to manage information and deliver it in a timely way to customers is still with us. I don't know if the problem is "solved" so completely that it's become routine everywhere but at my current employer.

But from where I sit today, knowing relational databases and dimensional modeling very well looks like a good bet if you want to have a rare and valuable skill.

Sunday, February 1, 2009

How to use the same JNDI resource name on Tomcat and WebLogic

I figured out how to do something that has inspired a few questions on Java forums that I frequent. I write applications using Java 5 and Spring 2.5 and deploy them on servers like Tomcat 5.5.26 and WebLogic 10.0. The problem is how to have a single configuration that can be deployed on both without requiring any changes.

Here's how I did it:

First, I added a JNDI resource to my web.xml file with resource name "jdbc/Foo".
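For anyone who hasn't written one, a resource reference in web.xml looks something like the fragment below. It's a minimal sketch, assuming a Servlet 2.4 deployment descriptor; only the name jdbc/Foo comes from this post, and the description text is illustrative.

```xml
<!-- WEB-INF/web.xml (excerpt): declares a container-managed DataSource
     that the application looks up as java:comp/env/jdbc/Foo -->
<resource-ref>
    <description>Application database (illustrative)</description>
    <res-ref-name>jdbc/Foo</res-ref-name>
    <res-type>javax.sql.DataSource</res-type>
    <res-auth>Container</res-auth>
</resource-ref>
```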

Next I had to set up a data source in both Tomcat and my WebLogic project domain. The mechanics for each are slightly different: Tomcat reads it from a configuration file, while WebLogic uses its admin console.

For Tomcat, I created a context.xml and put it in the META-INF directory of my web application. The JNDI name is the same as the one in web.xml: "jdbc/Foo".
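A Tomcat 5.5 context.xml along these lines should do it. This is a sketch, not my actual file: the MySQL driver class, URL, credentials, and pool sizes are placeholders to swap for your own database. The part that matters is that the Resource name matches the res-ref-name in web.xml.

```xml
<!-- META-INF/context.xml: Tomcat 5.5 builds a pooled DataSource and binds it
     under java:comp/env/jdbc/Foo. Driver, URL, and credentials are placeholders. -->
<Context>
    <Resource name="jdbc/Foo"
              auth="Container"
              type="javax.sql.DataSource"
              driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/foo"
              username="foo_user"
              password="changeme"
              maxActive="20"
              maxIdle="5"
              maxWait="10000"/>
</Context>
```

With Tomcat 5.5 the JDBC driver jar generally belongs in $CATALINA_HOME/common/lib rather than inside the WAR, because the container's connection pool has to load it.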

When I go into the WebLogic admin console to create the JNDI data source, the key is to specify the JNDI name like this: "jdbc.Foo". Note the dot instead of the slash.

The last detail is the Spring configuration. I use the org.springframework.jndi.JndiObjectFactoryBean. The JNDI name that I give it is "java:comp/env/jdbc/Foo".
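In a Spring 2.5 XML application context, that bean might look like the fragment below. The bean id dataSource is a hypothetical name; JndiObjectFactoryBean and its jndiName property are Spring's own.

```xml
<!-- Spring application context (excerpt): exposes whatever DataSource the
     container bound to java:comp/env/jdbc/Foo as a bean named dataSource. -->
<bean id="dataSource" class="org.springframework.jndi.JndiObjectFactoryBean">
    <property name="jndiName" value="java:comp/env/jdbc/Foo"/>
</bean>
```

JndiObjectFactoryBean also has a resourceRef property that adds the java:comp/env/ prefix when the name lacks it; spelling out the full name, as above, keeps the lookup explicit.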

Once I build the WAR file, I can switch between Tomcat and WebLogic in IntelliJ at will. No changes needed.

Sunday, January 18, 2009

I'm Excited About 20-Jan-2009

I'm looking forward to next Tuesday.

It goes without saying that it'll be a historic day. I'll be at work, and I wonder whether we'll be able to see the speech live. I'm guessing that this will be a speech that will be quoted often in the future.

I think it's terrific that we have our first African-American president. But it's even better that he won for the right reasons: because he's smart, capable, inspiring, and will surround himself with good people.

Eight years ago we had a president-elect who was not thought to be so smart. The conventional thinking was that he would defer to others in his administration. He surrounded himself with people from an earlier era who would act as his "adult supervision." His supporters pointed to his faith, and he pledged to restore the "dignity" of the presidency. Some people considered that a high priority after the Clinton impeachment. Meetings would be run on time.

In hindsight, I don't think it worked out very well.

Today we have:

  • an economy in shambles;

  • a national debt over $10T and growing;

  • Osama bin Laden and Dr. al-Zawahiri still free;

  • two intractable wars;

  • more unrest in the Middle East;

  • a greater dependence than ever on oil that may be "past peak";

  • a ruined reserve currency;

  • a debt that increased by 50% over the last eight years;

  • 1 in 7 adults illiterate.

Mr. Obama will have his hands full.

So why am I excited?

I think this government is going to be rational, not faith-based. I hope it does a better job of understanding the world as it is without relying too heavily on pre-packaged ideology and dogma.

The appointment of Steven Chu to head the Department of Energy is a great sign that science will be taken seriously. For as long as I can remember, the Energy department has meant nothing more than "big oil", a perennial candidate for budget cuts, and the whipping boy for every president who wanted "smaller government." I hope that Dr. Chu can change this, because we desperately need a change.

"Waterboarding is torture" - thank you, Eric Holder. This has been known by all civilized people since the Spanish Inquisition. It was considered a war crime when practiced during World War II. From what I've read about interrogation techniques, this kind of thing doesn't work because the prisoner will do or say anything to make it stop. I'm glad that John Yoo's legal opinions will no longer inform our policy.

Perhaps the Justice Department will be a less political place. Hiring and firing will be more of a meritocracy and less blatantly about ideology.

Let's hope Timothy Geithner will have fewer Wall Street buddies to dispense TARP funds to. I'd say that Henry Paulson's pleas for $700B haven't helped us.

I'm glad to see Bush, Cheney, and the rest of the administration leave Washington. Enough!

I am worried that things might have gone too far for anyone to reverse now. I'll be concerned if Obama tries to tell us that we can go on as we have. A painful change is inevitable here. I hope he tells us the truth: that the free lunch is over, that it's not sustainable for the rest of the world to finance our excess, that we have to produce and save and defer consumption in order to accumulate wealth, that living beyond our means on a credit card has to end, that money loaned to you from the bank cannot be accounted for the same way as income.

I'm optimistic about an end to magical thinking.

I'm confident that this smart young man will be able to make a difference.

I hope that he's kept safe.

Good-bye, Mr. Bush. Welcome, President Obama.

Saturday, January 3, 2009

When Did Ignorance Become A Good Thing?

I came across another blog today where a fellow software professional confessed that he doesn't have a degree in computer science.

I'll follow suit and confess that I don't, either. I do have degrees in mechanical engineering, two of them graduate degrees. I was working full-time as a professional throughout all of my coursework after earning my BSME. I completed nine out of twelve courses toward an MS in computer science from Rensselaer. I was sitting in a large lecture hall at the beginning of that last fall semester, listening to the requirements for a one-credit, year-long capstone course that would have had me pick a topic, write a paper, and present it in public at the end of the spring.

On that day I had been going to school every semester since I was five years old, but by the end of that class I felt as if my pilot light had been blown out. I didn't need to prove to myself that I could write a paper. Been there, done that. The motivation and energy needed to finish off the degree was gone. I withdrew from the class and didn't go back.

While we're in the confessional, I'll also admit that my mechanical engineering degree does not make me the best man for fixing what's wrong with your 2006 Toyota Tercel or your furnace. I take my car to others to be repaired, like lots of other people, because I have more discretionary income than time, tools, and mechanical talent. Mea maxima culpa!

Like my fellow developer and blogger, I've continued to learn since then. I have a home technical library that inspires awe ("Wow, you've got a lot of books!") or derision ("What a nerd! Look at all the money you've wasted on books!"), depending on your viewpoint. I've continued to drink from the fire hose of knowledge as deeply as I can.

The gist of his argument is certainly true: knowledge can be acquired by many avenues, formal and informal. Everyone can cite self-taught successes and clueless academics. "Did you know that Bill Gates flunked out of Harvard? What a bunch of losers they are!"

But the tone of his blog was disturbingly dismissive:
  1. "...the purported merits of having a degree is…just crap"
  2. "...This leads me to conclude that classroom learning is not particularly time-efficient."
  3. "...I think the primary benefit of learning in a college/university classroom setting is the fact that most people do not have the motivation or, in some cases, the ability to educate themselves."
  4. "...I would venture that the majority of these people 'perceive' that having a degree is better than not having one (the example set by Bill Gates notwithstanding)."
The blog ends with a prototypical story about an interview with someone looking for a developer's position: "...This person had an MBA...." (italics are mine). You can feel the indignation rising to a crescendo as the candidate's ignorance becomes more apparent: "...I eventually realized that this person could not tell me what a 'string' was. Yikes!"

Yikes, indeed.

I certainly agree that this candidate doesn't sound like a good fit for a developer job. But that single incident should hardly be used as a blanket condemnation of formal learning.

Having experienced both sides of it, I can say that degrees are not crap. Some topics require a depth, rigor, structure, and metered effort that can't easily be duplicated by one's own efforts. General math is one thing; tensor calculus is quite another. The learning that I do nowadays is different from what I did during my academic years. Perhaps it's age or the onset of adult ADD in this Internet, Twitter, YouTube world, but the quiet, sustained contemplation and concentration required to absorb truly complex material seems harder for me to come by.

I think it says something about computer science that the field can be penetrated so easily without any background. The talented amateur has become rare or extinct in medicine, physics, and engineering.

In his "Teach Yourself Programming In Ten Years", Peter Norvig says "...One of the best programmers I ever hired had only a High School degree; he's produced a lot of great software, has his own news group, and made enough in stock options to buy his own nightclub." His essay strikes the right balance, in my opinion. It cautions against 21-day experts while acknowledging different aspects of learning the field.

Could Larry Page and Sergey Brin have come up with Google without knowing about linear algebra, eigenvalues, and numerical methods? Could someone who didn't go to Stanford have done the same thing? We'll never know, because they were enrolled in the computer science program at Stanford, and they invented PageRank first.

But I don't think I'm going too far out on a limb when I say that it's unlikely that a self-taught, "Learn C# In 21 Days" programmer could manage it, even with the benefit of hindsight.

The anti-intellectual tone of the blog disturbed me greatly.

Sunday, December 21, 2008

How To Measure Things

I made a bit of personal history this past Thursday. I got up early and got to the pool before work, as is my daily habit, and logged another 3,000-yard workout. I've been spending some time with "toys" these days: 3x200 free w/ pull buoy; 3x200 free w/ snorkel; 3x200 kick w/ fins; 3x200 free w/ fins; 3x200 IM on 4:00. I felt great when I was done, as usual. I usually do 2,000 yards in the morning, but I cranked it up to 3,000 for three days to meet a goal. I've logged 520,000 yards for 2008. That's 10,000 yards each and every week, or 2,000 yards for each and every working day. My previous best total was 440,000 yards back in 2006. I've been keeping track since 1994. My average over all that time is 340,000 yards per year.
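The numbers in that paragraph hang together. Here is the arithmetic, assuming a 52-week year with five working days per week:

```latex
\begin{aligned}
\text{toy workout} &= 5 \text{ sets} \times 3 \times 200 \text{ yd} = 3{,}000 \text{ yd} \\
\text{weekly average} &= \frac{520{,}000 \text{ yd}}{52 \text{ weeks}} = 10{,}000 \text{ yd/week} \\
\text{per working day} &= \frac{10{,}000 \text{ yd/week}}{5 \text{ days/week}} = 2{,}000 \text{ yd/day} \\
\text{vs. previous best} &= \frac{520{,}000}{440{,}000} \approx 1.18 \quad (\text{about } 18\% \text{ more})
\end{aligned}
```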

This isn't an extraordinary total for a really good swimmer. I think high school athletes can put in 6,000 yards per day, and Olympic athletes will do my weekly total in a single day. But it's something special for me.

How did this happen? There are a few important factors:
  1. Good health. I've blogged about the bulging disk in my neck that ruined my 2007. It helps that I've had no injuries or sick time.
  2. Availability of a pool. I've been fortunate for my entire career, because I've always been able to find a pool nearby wherever I've worked. And they've generally been staffed with people who make the opening of the doors and the arrival of a lifeguard as dependable as the rising of the sun.
  3. Consistency. Woody Allen said "Eighty percent of success is showing up." That's certainly true here. I've always managed to weave swimming into the day in such a way that it's relatively transparent to my wife.
  4. Swimming with others. For the longest time my ritual was to swim a mile of freestyle at lunchtime and get out. I never swam competitively, so I didn't know about workouts or intervals. I managed to figure out how to do butterfly on my own so I could swim the individual medley, but that was my only concession to trying anything new. In 2000 I finally worked up the courage to try Masters swimming. I was nervous. I didn't know if I could finish a longer workout, or keep up with the intensity. I found out that swimming with others makes all the difference in the world. The people I swim with are now among my best friends. The social interaction and better workouts have contributed greatly to my enjoyment of swimming.
  5. Measurement. My tracking system is simple: I have an Excel spreadsheet that acts as the database and does all the lovely plotting for me. I roll everything up by the week. I have "best month" totals. I can tell every week exactly how I'm doing. I derive a great deal of motivation knowing that I'm close to a goal of some kind. Sometimes that's the one thing that gets me out of bed early in the morning.
I'm already thinking about how I can top it. Just 10% more each day will do the trick: one more 200 to make my habitual daily total 2,200 yards. They add up. I find that the snorkel is helping my lung capacity, so I'll work that in every day. And I'd love to learn how to tolerate butterfly for longer stretches. I can endure 100 yards of fly in relative comfort now. I'd like to work that up to 200 yards at a time in 2009.
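Projected forward, under the same assumption of five working days a week for 52 weeks, the extra 200 a day works out like this:

```latex
\begin{aligned}
2{,}000 \text{ yd/day} \times 1.10 &= 2{,}200 \text{ yd/day} \\
2{,}200 \text{ yd/day} \times 5 \text{ days/week} \times 52 \text{ weeks} &= 572{,}000 \text{ yd/year}
\end{aligned}
```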

I've been thinking a lot about that last bit, measurement. Tracking makes a big difference. I've been working on a star schema for a relational database to house my swimming data. I've learned more about data warehousing from that simple problem than I ever knew before. If it ever reaches a more finished state I may blog about that as well.
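To make the idea concrete, here's a toy star schema held in memory. It's a sketch only: the table and column names are hypothetical, not the schema described above. The shape is what matters. A central fact table holds the additive measure (yards) plus foreign keys into the dimension tables, and queries group facts by dimension attributes.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A toy, in-memory star schema. SwimFact is the fact table: one row per set,
// carrying the measure (yards) and foreign keys into the dimension tables.
// All class, field, and method names here are hypothetical.
class SwimStarSchema {

    // Date dimension: one row per calendar day, with attributes to group by.
    record DateDim(int dateKey, int year, int month, int dayOfMonth) {}

    // Set dimension: stroke and equipment ("toys") used for a set.
    record SetDim(int setKey, String stroke, String equipment) {}

    // Fact table row: foreign keys to each dimension plus the additive measure.
    record SwimFact(int dateKey, int setKey, int yards) {}

    // "Join" facts to the set dimension and total the yards per equipment value,
    // the in-memory equivalent of GROUP BY set_dim.equipment.
    static Map<String, Integer> yardsByEquipment(List<SwimFact> facts, Map<Integer, SetDim> setDim) {
        Map<String, Integer> totals = new TreeMap<>();
        for (SwimFact fact : facts) {
            String equipment = setDim.get(fact.setKey()).equipment();
            totals.merge(equipment, fact.yards(), Integer::sum);
        }
        return totals;
    }
}
```

Loading Thursday's five 3x200 sets as 600-yard facts would report 1,200 yards with fins and 600 each for the pull buoy, snorkel, and no-equipment IM. Grouping on DateDim attributes instead gives the weekly and monthly totals.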

But it makes me wonder: How can I use measurement, metrics, and tracking to motivate the other important things in my life? My last blog talked about my frustration over lack of progress in learning Lisp and other things. How can I start tracking my progress through "Practical Common Lisp" to ensure that I end up at the desired destination?

What about measuring software development progress? If I could figure out how to do that, I'd be a wealthy man. Everyone in the software development industry laments the sorry state we're in. A large percentage of projects are never completed or fail to deliver the desired functionality. If we only had enough management, oversight, governance, and measurement!

It's not that those things haven't been tried before. You can't throw a rock without hitting a methodology that promises better results than the scorned "waterfall". Project managers have been around for as long as there have been projects. (You can see them in Egyptian hieroglyphs, holding Gantt charts on papyrus.) Analogies abound: "Software development is like building a building" - hence the love of the title "architect". "Software is like manufacturing" - hence the term "widgets".

Where I work, the conclusion is that if something hasn't worked in the past, more of it will make things better in the future.

Why hasn't any of this worked? What's so different about software development?

I think my swim tracking has a couple of key features that software lacks.

First, it has a well-defined time scale built in. I know exactly how long it takes me to swim those 2000 yards every morning. I have a metronome in my head for each lap, each 200 yards. (Unfortunately, it's a very slow timepiece.) The week closes every Sunday, and the workouts pile up in a steady drip, drip, drip as the year goes on. I can easily extrapolate to the goal at the end of the year: "Still averaging 10K per week, with two in the bank to anticipate that week's vacation in the summertime."

Second, the metrics are well defined. There's nothing fuzzy to account for, like "quality" or "maintainability" or any other "-ility". It's distance and time and frequency.

Third, the metrics are additive. I swim in a pool that's 25 yards long. Laps roll up into sets, sets into workouts, workouts into weekly, monthly, and yearly totals. If I look at the plot of yards aggregated at the end of the month I can clearly see where I'm leading or lagging. I got up early to do 3,000-yard workouts last week because I wanted to cross the 520K line without having to worry about winter weather keeping me out of the water.
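Additivity is what makes those roll-ups trivial: every coarser grain is just a sum over finer-grained rows. Here's a minimal sketch, assuming one record per workout and weeks that close on Sunday; the class and method names are hypothetical.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Additive roll-ups: because yards simply add, each coarser grain (week, month,
// year) is a plain sum over the workout rows. Names are hypothetical.
class SwimRollUp {

    record Workout(LocalDate date, int yards) {}

    // Weekly totals keyed by the Sunday that closes each week.
    static Map<LocalDate, Integer> byWeek(List<Workout> workouts) {
        Map<LocalDate, Integer> totals = new TreeMap<>();
        for (Workout w : workouts) {
            LocalDate weekEnd = w.date().with(TemporalAdjusters.nextOrSame(DayOfWeek.SUNDAY));
            totals.merge(weekEnd, w.yards(), Integer::sum);
        }
        return totals;
    }

    // Monthly totals keyed by the first day of each month.
    static Map<LocalDate, Integer> byMonth(List<Workout> workouts) {
        Map<LocalDate, Integer> totals = new TreeMap<>();
        for (Workout w : workouts) {
            totals.merge(w.date().withDayOfMonth(1), w.yards(), Integer::sum);
        }
        return totals;
    }

    // Grand total; it equals the sum of the weekly totals and of the monthly totals.
    static int total(List<Workout> workouts) {
        return workouts.stream().mapToInt(Workout::yards).sum();
    }
}
```

Because every grain is derived from the same rows, the weekly, monthly, and yearly views can never disagree with each other.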

Fourth, it's just one individual. I'd imagine that it's harder for a coach to keep track of an entire team. There's an art to cultivating each individual to achieve their best and orchestrating a team to an end of year peak for a big meet.

Manufacturing has used statistics to drive continuous improvement ever since Deming and Shewhart showed us how. There are any number of Six Sigma black belts running around applying these techniques.

So why does software development resist such improvements? I believe there's something different about it. As Knuth said, there's an art to it. The social aspects of doing it in large, far-flung groups don't lend themselves to statistical measure. We've all heard about Dilbert-esque measures that reward those who find bugs ("Look at that! Found another one!"). We've griped about the scourge of unintended consequences caused by bad measurements. Measure success and progress by KLOC or number of classes checked in and you're likely to see your code base expand in size.

I'm going to spend some time seeing if I can extend my success at measurement in swimming to other things. If I make any progress, I'm sure I'll post it here.

Saturday, December 13, 2008

Overcoming Writer's Block

I've been suffering from terrible writer's block, as evidenced by this blog. It's not that there's nothing to say. Sometimes there's TOO MUCH to talk about. I'm stymied by a kind of shyness: "Why would anyone be interested in what I have to say?" I'm self-conscious about putting my opinions in public. The Web is wonderful, but I worry about how it breaks down our privacy.

I downloaded TiddlyWiki today, just to try it. It's supposed to have some nice features for linking and searching that might be useful.

I'm thinking about two applications for it. First, it'd be a great way to centralize little tidbits about software development (e.g., the stuff I have for setting up a MySQL database buried in my daily diary). Second, it might be a web-based alternative to my personal journal.

One thing I like about it is that I've downloaded an empty.html file to my hard drive. It's in my control. I'm still wary about putting stuff out there, "in the cloud", living on someone else's servers.

I also looked at a site called GoalsOnTrack.com. It's a nicely conceived and implemented site for establishing and tracking goals: simple, graphical, and elegant. Why didn't I write that? I signed up for a username and password, but first I read the terms of use. It was a real eye-opener. It's free now, but they actually have extensive language about billing and monthly charges and cancellation fees. The currency amount just happens to be $0.0, but it'd be easy to change. If the site is free, why add all that unless you plan to monetize it as soon as you reach a threshold number of users?

There's also language that says your data no longer belongs to you. The people who own the site can do whatever they bloody well want with it once you upload it. Woe be to you if you happen to have a goal of working through all the positions listed in the Kama Sutra by your next birthday! It could be dicey if your partner happens to be a co-worker or someone else who isn't your wife.

Anyway, both technologies are right up my alley. I've been keeping an electronic journal in Word since 1994, and I've been tracking my swimming yardage in an Excel spreadsheet since 1996. I've been looking for ways to improve both. The Word docs are nice, but I can't search them very easily. I thought about using Lucene to index and search the documents, but my swim tracking got in the way. I had been embedding a spreadsheet into the Word doc at the start of every month, and the Apache POI library that I used to read the Word docs blew up over the embedded Excel spreadsheets. So I had to move the Excel data out of the Word documents and into something better.

I became interested in data warehousing and dimensional modeling. I'm designing a data warehouse for my swimming data now. It's a surprisingly good demonstration of the problems of data warehouses and ETL. I'm guilty of "premature roll-up" (don't tell my wife). I aggregated my data by week, and now I'll have to work really hard to get back the finer-grained daily results.
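Why is premature roll-up so hard to undo? Summing is a one-way trip: different weeks can share the same total, so the weekly figure alone can't tell you which daily pattern produced it. The sketch below shows this with made-up numbers; the class name and both example weeks are hypothetical.

```java
import java.util.Arrays;

// "Premature roll-up" in miniature: collapsing a week of daily yardage into one
// number loses information, because different weeks can share the same total.
// Class name and example data are hypothetical.
class PrematureRollUp {

    // Roll seven daily totals (Monday..Sunday) up to one weekly total.
    static int weeklyTotal(int[] dailyYards) {
        return Arrays.stream(dailyYards).sum();
    }

    // Two illustrative weeks with different daily patterns but the same sum.
    static final int[] STEADY_WEEK = {2000, 2000, 2000, 2000, 2000, 0, 0};
    static final int[] LUMPY_WEEK  = {3000, 0, 3000, 0, 4000, 0, 0};
}
```

Both weeks roll up to 10,000 yards. Keeping facts at the daily grain and deriving weekly totals on demand avoids the problem.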

GoalsOnTrack seems like a great way to track all the other things that I need to get done. I've been pretty good about doing yoga on my own since I started taking a weekly class fifteen months ago, but actually seeing the hours spent in downward dog accumulate would be a great motivator. I'd love to brush up on the mathematics I've forgotten and learn the new things I need to take a new direction at work, but I'm a lazy man who tends to watch the Boston Celtics on television too much. Having a measure, like my swimming yardage totals, seems to motivate me.

How much tracking does one person need? After a while, the tracking activity consumes a great deal of your time. How anal.

Bloggers like Seth Godin and Jeff Atwood are my latest heroes. I don't know how they manage to be so insightful. Seth Godin posts every day. That old adage about writers having to write every day seems to be true. If I'm going to find a voice, I'll just have to start getting used to the idea of writing every day.

Sunday, September 21, 2008

Financial Meltdown - Unbelievable

What a week it's been.

Is everyone else as incredulous as I am about the week's news? The US government, long the champion of free markets and of the idea that deregulation is best for everyone, led by a Republican administration that styles itself the party of small government, has bailed out Bear Stearns, nationalized the mortgage market by taking over Fannie Mae and Freddie Mac, ponied up $85B to shore up AIG, and then promised to take on $700B worth of bad assets too radioactive for any rational investor to touch?

Each move was supposed to calm the markets and restore confidence, but the tonic never lasted more than a day.

I thought that Congress was the branch of government that controlled the purse strings. Now Henry Paulson and Ben Bernanke, neither of whom is an elected official, seem to be calling all the shots. Is this constitutional? Why aren't Congress and the Supreme Court weighing in on this?

George W. Bush must be hiding in a bunker, riding his mountain bike, or clearing brush somewhere. He's not much in evidence. I'll bet he sits silently in any meetings he's actually invited to. He certainly doesn't appear to be a main player. Is this what degrees from Yale and Harvard get you?

Glass-Steagall was repealed in 1999 by Phil Gramm et al., during the Clinton administration. Sandy Weill and his cohorts said we didn't need it anymore, so the wall between commercial and investment banks came down. We were so much smarter than we were in the 1920s.

Eight years of Republican leadership have brought us to this. They actually controlled both the House and Senate for six of those eight years, so there's no one else to blame.

The Democrats have not distinguished themselves, either. Has there ever been a less effective opposition? I've lost count of all the times when they rolled over and died in the name of bipartisanship. From where I sit, there's not much difference between the parties. Democrats might tax and spend, but Republicans cut taxes and still spend. We'll have the largest deficits and overall debt in history at the end of this Republican administration. Our children and grandchildren will be left to survey the wreckage and curse their parents for what they've allowed to happen.

Henry Paulson used to be the chairman of Goldman Sachs. It feels like an inside job. The fix is in, and he can bail out all his cronies from Wall Street.

We can come up with three-quarters of a trillion dollars to bail out people who have been collecting bonuses larger than my lifetime earnings, but anything else is "fiscally irresponsible" or "liberal".

And all we can talk about during election season is Sarah Palin's hair, lipstick, and who's more of a Christian.

I simply can't believe it.