I received the certificate with distinction for Coursera's "Data Analysis Using R" course tonight. I know it shouldn't matter, but it does to me. I want to hold myself accountable and keep pressing on with these courses.
Data Analysis has completed its second week, and I'm in the midst of the first data analysis assignment. I have more work to do on it, but I think it's going well: I have a plan of attack, and I'm following it. The writeup portion plays to my strengths. I'm slowly becoming more comfortable with R. I need to read more statistics books to learn how to attack problems better.
Toastmasters is moving along, too. I finally completed the Competent Leader designation. I also finished the requirements for Advanced Communicator Silver designation. Just Gold to go, then two more leadership tracks to become a Distinguished Toastmaster. I don't know if I'll manage the leadership portion, but the speaking track is well within reach.
But wait - there's more.
And now there's vert.x, the non-blocking, event-driven application framework for the JVM.
I just finished my second Coursera course. It's hard to believe that "Data Analysis Using R" started just four weeks ago.
This wasn't a terribly difficult course, but lecture time added up. The assignments weren't hard conceptually, but I found myself struggling with the API and the docs. "How do I do that?" was a common question. It was easy to think of how I'd do something in a language that I knew better, like Java or Python, but I wasn't always able to conjure up the R equivalent at will. I had to do small experiments on the fly to figure out how to make the language do my bidding.
The third programming assignment set was time-consuming. I was behind the eight ball because I was out of town at a family wedding the weekend before it was due. Thankfully we were given an allotment of late days that we could apply as needed. I used up three of them after returning from MN so I could get the assignment in late without penalty.
Debugging in the R environment is crude, reminiscent of the jdb command-line debugger that comes with Java. It's a comedown for a person who's used to using the best IDEs in the world to work with Java and Python. I started using the R plugin for IntelliJ for the fourth assignment. I hope they keep expanding and improving it.
This one was a sprint. The next one, "Data Analysis", will emphasize problems that R is used for, diving deeper into regression and analysis. I'm looking forward to getting back in touch with my mathematics roots. It begins next Monday.
I start the next step with Coursera tonight: It's opening night for "Computing for Data Analysis" from Roger Peng at Johns Hopkins University. It's a four-week introduction to using R that should be good. He blogs at Simply Statistics, which looks like it'll be a good resource for stretching my brain.
I'm running on my Windows 7 desktop. I've downloaded the latest version of R, 3.0.1. There's an IDE called Tinn-R that might be okay. I'm sure it won't replace IntelliJ from JetBrains as the world's greatest IDE. Until those brilliant Russians come up with an R environment for me, I'll make do.
My friend Steve Roach pointed out a port of R that runs on the JVM called Renjin. I find this statement surprising:
We built Renjin, a new interpreter for the JVM because we wanted the beauty, the flexibility, and power of R with the performance of the Java Virtual Machine.
My first thought was that it'll be hard to beat LAPACK in C or Fortran. But perhaps a version that leverages parallelism could tip the balance.
So this will be sucking up some of my time and energy for the next four weeks. I hope the lessons diffuse into my brain quickly.
This has been a difficult year for me physically (more about that another time), but I'm having a great year reading. I've been on a tear lately, thanks to my local library and my oldest daughter.
My dog is pretty smart. He knows Saturdays are different from any other day of the week. He follows me around the house, tail wagging, from the moment we get up. He comes running if I make a move towards my car keys on the table in the mud room: "Are you going out? In the car? Will you take me with you?" I leave him behind if conditions are too hot or cold to leave him unattended in the car, but on mild days I'm happy to take him with me. We go to the bank, the post office, the barber shop, Dunkin Donuts, or the town library. It amazes me how often I find something good there.
I started this recent tear with "Going Clear" by Lawrence Wright. It's the history of Scientology, from L. Ron Hubbard's World War II record through to the present day. I didn't know the details before. I found them educational and amusing.
I enjoyed it so much that I picked up Lawrence Wright's "The Looming Tower". It's the riveting story of al Qaeda, from Sayyid Qutb in the prisons of Egypt to the crashing of planes into the World Trade Center on 2001-Sep-11. The references to Ali Soufan, the FBI agent who investigated the bombing of the U.S.S. Cole and interrogated captured al Qaeda members, led me to "The Black Banners". This is a wonderfully written, but ultimately sad and frustrating book. You know how it ends. You can see how it might have been possible to connect the dots beforehand if the right people had been privy to certain facts, perhaps preventing disaster. Ali Soufan presents evidence for something that I suspected was true: harsh interrogation methods don't work. Zero Dark Thirty would lead us to believe that waterboarding turned up the information that led to Osama bin Laden's compound in Pakistan, but according to Soufan, all the actionable intelligence the U.S. got from captured al Qaeda members came from him and traditional interrogation methods.
After all this serious stuff I was ready for some fun. I heard Gillian Flynn on NPR's "Wait Wait, Don't Tell Me" on the way to the library one day. She was so smart and funny when talking about her book "Gone Girl" that I had to read it. One of the panelists said "After reading this book, I think you're one of those people who could murder someone and get away with it."
The writing device was unusual. The book opens with a young husband and wife on the morning of their fifth wedding anniversary. They argue before the man leaves for work. When he comes home, he finds the front door open, the cat sitting on the stoop, the house torn up, and his wife gone. What happened to her? Chapters alternate between the husband's point of view in the present and the wife's voice in flashback. Things get nuttier with each turn of the page. It was most entertaining!
My oldest daughter recommended Ellen Ullman's "By Blood". It's set in San Francisco in the 1970s. An academic who's struggling to finish a project decides to rent an office in a seedy part of town. He finds out that it's next door to a psychiatrist's office. He starts listening in on a young woman's weekly session and finds himself identifying closely with her. He becomes secretly involved in her story. Terrific writing!
Wait, there's more. My oldest daughter also suggested that I take a look at a piece of non-fiction by Cheryl Strayed: "Wild". It's the true story of a 22-year-old woman who loses her not-even-fifty single mother to cancer in 1991. Her family and marriage fall apart over the next four years. When she hits rock bottom in 1995, she decides to hike more than a thousand miles of the Pacific Crest Trail, which runs from Mexico to Canada, alone. Think Appalachian Trail on the West Coast, except at 10,000 feet along the peaks of the Sierra Nevada mountains. It made me wish I were a hiker. I thought the author was incredibly brave to be so forthright and honest.
Amazon should just garnish my wages. I love the instant gratification of Kindle. When I finished "Wild" I couldn't wait to get back to the library. I downloaded Colum McCann's "Transatlantic". I'm only a little way into it, but it's a series of stories that all center on Ireland. The writing is wonderful.
I haven't been writing much lately - more on that another time. My last post talked about my first massive open on-line course, in statistics at Udacity. It went so well, and I enjoyed it so much, but I still hadn't committed myself to signing up and getting a certificate of completion. I wanted to see what that was like, so in May I signed up for Introduction to Data Science at Coursera. It meant eight weeks of lectures and assignments in a new field.
It's been a long time since I was in school. I finished my last formal degree 21 years ago. I was marching towards yet another one in 2000. I was a course or two and a capstone project away from completing a Masters in computer science when the pilot light inside me blew out. I remember it like it was yesterday: I was sitting in a large lecture hall, listening to a professor drone on about the year-long capstone project. I realized that I couldn't do it anymore. The motivation was gone. I stood up, marched over to the bursar's office, withdrew from the course, and never went back.
I've continued to read and dabble on my own, but I haven't done anything formal since then. All my education was obtained the traditional way: take a class in a classroom, do assignments, pass tests, get a grade. I realized that I needed to add more structure to my efforts, but I couldn't go back to the way I've always done it. The rise of Khan Academy and on-line courses was perfect for me. I wasn't sure if a new style would suit me, but I was eager to try. Statistics at Udacity taught me that I could do it. Now it was time to try it out for real at Coursera.
I'm impressed with how Daphne Koller and Andrew Ng are running little experiments on-line to learn about this new educational idiom. It's a challenging problem: How do you handle tens or hundreds of thousands of students in a single course? Lectures, grades, assignments - everything has to be re-thought with this in mind. Occasionally they made tweaks in the middle of the course.
The material and assignments were varied and well done. I love Python, the language of choice for all the programming assignments. I have PyCharm from JetBrains. They make the best programming tools on the planet, so my local environment was a pleasure. Automated grading meant immediate feedback - perfect for an impatient American like me. The variety was wonderful - Python, an online MapReduce, Hadoop on Amazon Web Services, and data analysis at Kaggle.com. I thought the online materials were very good. Bill Howe did a fine job with the lectures. I could have taken more advantage of the course site and forums, but there are only so many hours in the day.
The course finished during the first week of July. It took a while to sort out all those students, but I finally got my certificate, with distinction, the other day.
I need some time to work on some personal projects, but I've already got my next one lined up. The data analysis using R course that I had my eye on is being offered again in September. I can't wait!
I've always prided myself on being a lifelong learner. I've been watching the rise of massive open on-line courses with great interest and curiosity. All of my education was done the traditional way: sit in a classroom with a lecturer and other students on a fixed schedule, do homework, take tests, let material diffuse into your brain over a week or a semester and hope it sticks.
I've never taken a class on-line. How would it feel? Could it be as effective as the traditional approach?
I signed up and started with the best of intentions, but then Hurricane Irene knocked our power out for ten days and put me behind the eight ball. No Internet; no computer; no lectures.
My interest in statistics has been growing over the last few years. I've tried to better understand the Bayesian approach - what it means and how it differs from the frequentist view that I've been exposed to. I've read "Doing Bayesian Data Analysis: A Tutorial with R and BUGS" by John K. Kruschke. Don't let the adorable puppies on the jacket fool you: this is a terrific, well-written book. I've got blog posts describing other books about Bayes that have caught my attention.
But I've still never taken a basic statistics course. I saw that Sebastian Thrun, one of the AI class instructors, was offering intro statistics at Udacity. I liked the lectures I saw him give for the AI class, so I thought I'd give it a go. I started just after Thanksgiving, with the goal of finishing before the end of the year.
The key for me is to make a regular, concentrated effort, track my progress, and make sure that I avoid long gaps between sessions. I set up an Excel spreadsheet to record the date and units I covered. It was the same approach that got me through my first half marathon: plan the work, work the plan. It made it easy to see when I had gone a few days without another dose of learning.
I didn't meet my time goal of finishing before the end of 2012, but I didn't miss it by much. More importantly, I got through the entire course - every lecture, every assignment. The programming assignments were in Python, which I loved. I have the latest version of PyCharm - the Python IDE from JetBrains, makers of the best programming tools on the planet. I have NumPy and SciPy, two terrific libraries for scientific computing and numerical methods. It made programming a pleasure.
Most importantly, I proved to myself that I can take good advantage of all the courses on-line: MIT, Stanford, Coursera, Udacity, iTunes U and others.
We had a winter storm for the ages here last month. January had been relatively mild, with little snow and below-average degree day totals. I ran outside on the road every weekend. My fitness was good; I felt strong.
This storm dropped three feet of snow in my yard. I went out on Friday night and blew 4" of snow off the driveway at 9 pm. When I went out the next morning the snow spilled over the top of my Honda snow thrower. It measures about 2' from the ground to the gas cap. When I went out the third time it spilled over the top again! The news said the storm cut a swath up the Connecticut River and left 30-36" of snow in its wake. I hit the snow jackpot.
I know there's no link between being cold and falling ill, but I was wet and chilled to the bone after each pass with the snow thrower. I felt fine that Friday when the snow started. By Sunday night my sinuses were full. On Monday the cold descended into my lungs. The coughing wouldn't stop. Rather than feel miserable and infect my co-workers, I decided to stay close to home. I had my work laptop with me, so I could have said I was "working from home." But I didn't want to feel guilty if the need to lie down and take a nap came over me, so I called in sick for a few days to beat it once and for all.
The funny thing is that I didn't take that nap. I've got a backlog of projects that I'm interested in finishing. I'm a little embarrassed about how long some of them have remained on the list, without any progress being made. One of them involved the electronic journal that I've kept for the last 19 years and counting. There's a folder for every year, a Word doc for every month, and an entry of one or more pages for every day that I decided to blather on about myself. It predates the coming of the World Wide Web; I started doing it on the first PC that I ever bought.
So I've got lots of stuff locked up inside. I found myself wondering "When did such and such happen? When did I last mention so and so?", but I didn't have any way to search. Then came Lucene, the Java-based search library from Apache. I downloaded the latest version and set about creating an index for my journal. Reading and parsing the Word documents was difficult. I used the Apache POI library, because I started with a Word 97 template; docx didn't come along until much later. I didn't like the API or documentation much, but Google found a terrific link that got me off the dime.
I fell into a nice rhythm: code, test, check in, rinse, repeat. I use Git as my local repository and a GitHub account as my master. There were problems and obstacles to overcome, but I persisted and found my way through all of them. It was a satisfying feeling when I created an index and searched it for a few terms whose answers I already knew. When I typed in "Celine", my youngest sister's name, the first entry that came back was a one-sentence entry that must have been rushed. It puzzled me at first, but I think her name scored high because it made up a large fraction of such a short entry. Fortunately her wedding day was high on the list, too. She's mentioned often on that day, but it's a longer entry, so her name makes up a smaller fraction of the words. I'll have to dig into the internals to see if I can better understand and optimize my searches.
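Before POI can parse anything, the indexer has to find the journal files. Here is a minimal sketch of that first step, assuming a hypothetical layout of root/<year>/<month>.doc (one folder per year, one Word file per month, as described above). The class and method names are made up for illustration; the POI parsing and Lucene indexing steps are deliberately left out, and only the standard library is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: locates journal files under an assumed root/<year>/<month>.doc layout.
public class JournalFiles {

    /** Returns every .doc or .docx file under root, sorted by path (year folder, then file name). */
    public static List<Path> findEntries(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".doc") || name.endsWith(".docx");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Groups files by the name of their parent folder, which this layout assumes is the year. */
    public static Map<String, List<Path>> byYear(List<Path> files) {
        Map<String, List<Path>> result = new TreeMap<>();
        for (Path f : files) {
            String year = f.getParent().getFileName().toString();
            result.computeIfAbsent(year, k -> new ArrayList<>()).add(f);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Map.Entry<String, List<Path>> e : byYear(findEntries(root)).entrySet()) {
            // Each file found here would be handed to the (omitted) POI text extractor.
            System.out.println(e.getKey() + ": " + e.getValue().size() + " file(s)");
        }
    }
}
```

Each path in the result would then be opened with POI, with the extracted text added to a Lucene document.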
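The "Celine" puzzle is what length normalization predicts. Lucene's classic TF-IDF similarity (the default in the 4.x releases) scales a term's score by roughly sqrt(term frequency) times 1/sqrt(number of terms in the field), so one mention in a two-word entry can beat five mentions in a hundred-word one. Newer Lucene versions default to BM25, which also normalizes by length, though differently. The sketch below is a simplified stand-in written for illustration, not Lucene's code: it omits boosts, query normalization and the lossy encoding of norms, and the journal entries are invented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified classic-style TF-IDF with length normalization; an illustration, not Lucene itself.
public class LengthNormDemo {

    /** Lower-cases text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** score = sqrt(tf) * idf * 1/sqrt(docLength), with idf = 1 + ln(N / (docFreq + 1)). */
    static double score(String term, List<String> doc, List<List<String>> corpus) {
        long tf = doc.stream().filter(term::equals).count();
        if (tf == 0) return 0.0;
        long docFreq = corpus.stream().filter(d -> d.contains(term)).count();
        double idf = 1.0 + Math.log((double) corpus.size() / (docFreq + 1));
        double lengthNorm = 1.0 / Math.sqrt(doc.size());
        return Math.sqrt(tf) * idf * lengthNorm;
    }

    public static void main(String[] args) {
        // Invented entries: a rushed one-liner and a long wedding-day entry.
        List<String> shortEntry = tokenize("Called Celine.");
        StringBuilder wedding = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            wedding.append("Celine and her new husband danced while the band played and ")
                   .append("the guests cheered loudly all night long at the reception. ");
        }
        List<String> longEntry = tokenize(wedding.toString());
        List<String> other = tokenize("Ran six miles in the snow.");
        List<List<String>> corpus = Arrays.asList(shortEntry, longEntry, other);

        System.out.printf("short entry: %.4f%n", score("celine", shortEntry, corpus));
        System.out.printf("long entry:  %.4f%n", score("celine", longEntry, corpus));
    }
}
```

Run it and the one-liner outranks the wedding entry even though the wedding entry mentions the name five times as often, which matches what the search returned.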
I checked the code into GitHub; there's read-only access granted to the curious at git://github.com/duffymo/diary-index.git.
I plan to put Apache Solr on top of my index so I can have a lovely web interface. I'd also like to leverage either a timed service or the Java 7 file watcher (WatchService) to update my index on a schedule or whenever I make a new entry. I'm also considering abandoning Word and keeping my diary in TeX. Keeping all my thoughts in plain text will insulate me from the whims of format changes in Word...and I love TeX. (I typeset my dissertation myself using LaTeX.) PDFs can be beautiful.
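The file-watcher option might look roughly like this minimal sketch built on java.nio.file.WatchService. It assumes a single flat directory (WatchService does not recurse, so each year folder would need its own registration) and treats .doc, .docx and .tex files as journal entries. The class name and the re-indexing hook are hypothetical; the sketch only reports which files changed.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.TimeUnit;

// Hypothetical watcher: reports journal files created or modified in one registered directory.
public class JournalWatcher {

    /** Waits up to timeoutSeconds for events and returns the names of changed journal files. */
    public static List<Path> waitForChanges(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        List<Path> changed = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) return changed;  // timed out with no events
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) continue;  // events were lost
            Path name = (Path) event.context();  // file name relative to the watched directory
            String s = name.toString().toLowerCase(Locale.ROOT);
            if (s.endsWith(".doc") || s.endsWith(".docx") || s.endsWith(".tex")) {
                changed.add(name);
            }
        }
        key.reset();  // re-arm the key so later events are delivered
        return changed;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = FileSystems.getDefault().newWatchService()) {
            dir.register(watcher,
                    StandardWatchEventKinds.ENTRY_CREATE,
                    StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                for (Path p : waitForChanges(watcher, 60)) {
                    // Hypothetical hook: re-parse dir.resolve(p) and update the index here.
                    System.out.println("Needs re-indexing: " + dir.resolve(p));
                }
            }
        }
    }
}
```

The timed-service alternative trades this event plumbing for simplicity: re-index everything every night and ignore individual file events.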
I felt healthy again by the time I went back to work. It was also a reminder of how much fun it is to fall into a long, sustained coding trance and produce something that's useful and beautiful at the end.
I'm onto the next project on my To-Do list. More to come soon.