Sunday, August 1, 2010

Python and PyCharm



I bought a book entitled "Core Python Programming" by Wesley J. Chun at the urging of John D. Cook. It's a wonderful book, but I stopped reading about three-quarters of the way through. I wanted very much to learn Python, but I didn't have a motivating problem. I was also lacking a good IDE. As a Java developer, I'm used to having IntelliJ IDEA by JetBrains on hand at all times. I think it's the best IDE there is. I buy a personal license every year, because I don't want to work without it.

I've been trying to do more coding on my own time lately, because my architecture day job doesn't afford me any opportunity to write code. (We draw UML diagrams, write documentation, and act as a go-between for the business and the developers.)

I've been aware of Peter Norvig's brilliant spelling corrector in Python for a while. He makes magic happen in just 21 lines of Python 2.5 code. It inspired a cottage industry of efforts to match his functionality and succinctness in other languages, including Java and Groovy.

This weekend I thought I'd revisit his spelling checker and see if I could reproduce it in Java. I wasn't concerned with minimizing lines of code. My goal was to maximize my understanding.

The shell of the code is simplicity itself, but when I got to the heart of the matter I didn't understand the Python idiom well enough to see my way through in Java, so I got my "Core Python Programming" off the shelf and started trying to piece things together. Running the code in a debugger would help. What tools could I use?

That's when JetBrains came to my rescue again: I downloaded the latest version of PyCharm, their new Python IDE that's now in beta. It offers the same wonderful feel that I've enjoyed in IntelliJ for years.

It was easy to create a new project and add Norvig's code and seed file. I didn't know how to run a module in the console - that's how green I am - but it didn't take long to figure out how to import. Then there was the problem of command-line arguments. I knew how to enter them in Java - how to duplicate the trick in Python? Thank god for Google; the answer was soon at hand.

The script took a very long time to run the first time I tried it. What was taking so long? The debugger clued me in: the first element of sys.argv is the path of the script being executed, not a word I had typed, and trying to correct that long string drove the poor spelling corrector crazy. How to avoid processing the first argument? With Python, the answer is easy: slice it off. You change this:


if __name__ == "__main__":
    for arg in sys.argv:
        print arg, correct(arg)


to this:


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print arg, correct(arg)
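
For anyone following along in Python 3, here is a minimal sketch of the same slicing idea. The helper names arguments_to_check and main are my own inventions for illustration, and since the corrector's correct function isn't reproduced here, the sketch only echoes the words it would have checked.

```python
import sys


def arguments_to_check(argv):
    """Return the user-supplied arguments, skipping argv[0] (the script name)."""
    return argv[1:]


def main(argv):
    # The spelling corrector's correct() is not reproduced in this sketch,
    # so each word is simply echoed instead of being corrected.
    for arg in arguments_to_check(argv):
        print(arg)


if __name__ == "__main__":
    main(sys.argv)
```

Slicing with [1:] returns an empty list when no words are supplied, so the loop simply does nothing in that case.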


It felt great to work through some simple difficulties that are challenging to a newbie like me. I'd love to develop a comfort level with Python sufficient to start taking advantage of its terrific scientific programming libraries like NumPy and SciPy.

I'm going to continue working to port the spelling corrector over to Java. It's a terrific application of Bayesian statistics.
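
To make the Bayesian idea concrete for myself, here is a rough Python 3 sketch. It is not Norvig's code: it assumes every one-edit typo is equally likely, so the choice comes down to which known candidate is most frequent. The tiny WORD_COUNTS table and the function names are made up for illustration; a real corrector would count words from a large text file.

```python
from collections import Counter

# Hypothetical word counts standing in for a real training corpus.
WORD_COUNTS = Counter({"spelling": 10, "spewing": 1, "the": 50})

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def one_edit_away(word):
    """All strings reachable from word by one delete, transpose, replace, or insert."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in LETTERS]
    inserts = [a + c + b for a, b in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def best_guess(word):
    """Pick the most frequent known word within one edit, else return word unchanged."""
    if word in WORD_COUNTS:
        return word  # a known word is assumed to be spelled correctly
    candidates = [w for w in one_edit_away(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing better to offer
    # With a uniform error model, the word frequency (the prior) decides.
    return max(candidates, key=lambda w: WORD_COUNTS[w])
```

With this toy table, both "spelling" and "spewing" are one edit away from "speling", and the more frequent word wins.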

But I hope that having a world-class IDE at my disposal will re-inspire my Python efforts.
