Wednesday, March 6, 2013

Indexing My Diary

We had a winter storm for the ages here last month. January had been relatively mild, with little snow and below-average degree day totals. I ran outside on the road every weekend. My fitness was good; I felt strong.

This storm dropped three feet of snow in my yard. I went out on Friday night and blew 4" of snow off the driveway at 9 pm. When I went out the next morning the snow spilled over the top of my Honda snow thrower. It measures about 2' from the ground to the gas cap. When I went out the third time it spilled over the top again! The news said the storm cut a swath up the Connecticut River and left 30-36" of snow in its wake. I hit the snow jackpot.

I know there's no link between being cold and falling ill, but I was wet and chilled to the bone after each pass with the snow thrower. I felt fine that Friday when the snow started. By Sunday night my sinuses were full. On Monday it descended into my lungs. The coughing wouldn't stop. Rather than feel miserable and infect my co-workers, I decided to stay close to home. I had my work laptop with me, so I could have said I was "working from home." But I didn't want to feel guilty if the need to lie down and take a nap came over me, so I called in sick for a few days to beat it once and for all.

The funny thing is that I didn't take that nap. I've got a backlog of projects that I'm interested in finishing. I'm a little embarrassed about how long some of them have remained on the list, without any progress being made. One of them involved the electronic journal that I've kept for the last 19 years and counting. There's a folder for every year, a Word doc for every month, and an entry of one or more pages for every day that I decided to blather on about myself. It predates the coming of the World Wide Web; I started doing it on the first PC that I ever bought.

So I've got lots of stuff locked up inside. I found myself wondering "When did such and such happen? When did I last mention so and so?", but I didn't have any way to search. Then came Lucene, the Java-based search engine from Apache. I downloaded the latest version and set about creating an index for my journal. Reading and parsing the Word documents was difficult. I used the Apache POI library, because I started with a Word 97 template; docx didn't come along until much later. I didn't like the API or documentation much, but Google found a terrific link that got me off the dime.

I fell into a nice rhythm: code, test, check in, rinse, repeat. I use Git as my local repository and a GitHub account as my master. There were problems and obstacles to overcome, but I persisted and found my way through all of them. It was a satisfying feeling when I created an index and searched it for a few terms that I knew the answer to. When I typed in "Celine", my youngest sister's name, the first entry that came back was a one-sentence entry that must have been rushed. It puzzled me at first, but I think the relative frequency of her name was high because the entry was so short. Fortunately her wedding day was high on the list, too. She's mentioned often on that day, but it's a longer entry, so the relative frequency of her name is lower. I'll have to dig into the internals to see if I can better understand and optimize my searches.
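The short-entry effect can be illustrated with a rough sketch, assuming the index uses Lucene's classic TF-IDF scoring, where the term-frequency factor is the square root of the term count and the length norm is one over the square root of the number of terms in the field. The class name, method names, and term counts below are hypothetical and chosen only for illustration. The sketch leaves out the idf factor (identical for every entry in a single-term query) and the lossy encoding Lucene applies when it stores norms.

```java
public class LengthNormSketch {

    // Term-frequency factor in the classic TF-IDF model: sqrt(freq).
    public static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Length normalization in the same model: 1 / sqrt(terms in the field).
    public static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Simplified score for a single-term query. The idf factor is omitted
    // because it is the same for every document and does not change ranking.
    public static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: a rushed one-sentence entry that mentions the
        // name once in 8 terms, and a long entry with 10 mentions in 2000 terms.
        double shortEntry = score(1, 8);
        double longEntry = score(10, 2000);
        System.out.println("short entry score: " + shortEntry);
        System.out.println("long entry score:  " + longEntry);
    }
}
```

With these made-up counts the short entry scores roughly 0.35 and the long one roughly 0.07, so the one-sentence entry ranks first even though the long entry mentions the name ten times as often.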

I checked the code into GitHub; there's read-only access granted to the curious at git://

I plan to put Apache Solr on top of my index so I can have a lovely web interface. I'd also like to leverage either a timed service or a Java 7 file watcher to update my index on a schedule or whenever I make a new entry. I'm also considering abandoning Word and keeping my diary in TeX. Keeping all my thoughts in plain text will insulate me from the whims of format changes in Word...and I love TeX. (I typeset my dissertation myself using LaTeX.) PDFs can be beautiful.
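The Java 7 file watcher idea mentioned above could look something like the following minimal sketch, built on the standard `java.nio.file.WatchService`. The class name, the `.doc` filter, and the "would re-index" step are assumptions for illustration; the actual re-indexing call would depend on how the Lucene index is built.

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

public class JournalWatcherSketch {

    // Assumption: journal entries are Word 97 files ending in ".doc".
    public static boolean isJournalFile(Path p) {
        return p.getFileName().toString().toLowerCase().endsWith(".doc");
    }

    // Waits up to timeoutSeconds for events and returns the journal files
    // that were created or modified in the watched directory.
    public static List<Path> pollChanges(WatchService watcher, Path dir, long timeoutSeconds)
            throws InterruptedException {
        List<Path> changed = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return changed; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) {
                continue; // events were lost; a full re-index may be safer here
            }
            Path full = dir.resolve((Path) event.context());
            if (isJournalFile(full)) {
                changed.add(full);
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY);
            while (true) {
                for (Path p : pollChanges(watcher, dir, 60)) {
                    System.out.println("would re-index: " + p); // placeholder for the real update
                }
            }
        }
    }
}
```

One caveat: a `WatchService` registration covers a single directory, not its subdirectories, so a layout with a folder for every year would need each year's folder registered separately.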

I felt healthy again by the time I went back to work. The time off was also a reminder of how much fun it is to fall into a long, sustained coding trance and produce something that's useful and beautiful at the end.

I'm on to the next project on my To-Do list. More to come soon.
