Interview: Tom Copeland of PMD & Rubyforge

This is the first in a series of interviews we're making available to the CodeSnipers community. We have been working to track down people who we thought had something valuable to say about the software development community, tools, practices, or direction. Some of the names you will recognize immediately, others you've probably never heard of, but all of them have made an impact in one way or another. Without further delay... our first victim... er... is Tom Copeland of the PMD project.

Some of our community may be familiar with the Java tool PMD, but many are not. Could you tell us a bit about PMD and your role in the project?

Sure! PMD is a utility for finding various "opportunities for improvement" in Java source code. It uses static analysis, meaning that it parses and analyses the source code without actually running the program, to find unused code, unnecessary object creation, and bad practices. You can run it using Ant/Maven/command line/various IDEs and generate text, HTML or XML reports.

Also, you can easily write your own rules to find problems specific to your environment. You can write these rules in either Java or XPath. For example, you could find all the methods in a source file with names shorter than 3 characters like this:

//MethodDeclarator[string-length(@Image) < 3]

There are lots of examples on the PMD web site and in "PMD Applied"... but more on that later.
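To see what that XPath expression selects, here is a minimal sketch that runs it with the JDK's built-in XPath engine against a tiny hand-written XML document. The XML is only a hypothetical stand-in for a parsed source file (PMD's real syntax tree is much richer and is not built this way), and the class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ShortMethodNameDemo {

    // Hypothetical toy "AST": one element per method, name stored in @Image.
    static final String TOY_AST =
        "<CompilationUnit>"
        + "<MethodDeclarator Image='go'/>"
        + "<MethodDeclarator Image='calculateTotal'/>"
        + "<MethodDeclarator Image='x'/>"
        + "<MethodDeclarator Image='run'/>"
        + "</CompilationUnit>";

    // Returns the Image attribute of every MethodDeclarator whose name
    // is shorter than 3 characters, in document order.
    static List<String> shortMethodNames(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList hits = (NodeList) xpath.evaluate(
                "//MethodDeclarator[string-length(@Image) < 3]",
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(((Element) hits.item(i)).getAttribute("Image"));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // "run" has exactly 3 characters, so it is not reported.
        System.out.println(shortMethodNames(TOY_AST)); // prints [go, x]
    }
}
```

The predicate in square brackets does the filtering: `string-length(@Image)` measures the name and `< 3` keeps only the short ones.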

David Dixon-Peugh and I started PMD back in the summer of 2002, and I've been more or less acting as lead developer/coordinator for the past two years or so.

What's the point of PMD? Does it take the place of Unit Testing?

Not at all. PMD is an adjunct to unit testing - it can find all sorts of problems that wouldn't cause a unit test failure, but are still slowing down the code. For example, this method would probably pass a unit test:

int add(int a, int b) {
 List list = new ArrayList();
 return a + b;
}

but it obviously wastes memory since it instantiates a new (and never used) ArrayList object.

Just as important is the cognitive dissonance that seeing the above code brings. Perhaps that list was at one time part of a cache or some such - but anyone seeing that code will look twice at it and wonder what all that's about. And that person may be less likely to refactor or fix other problems in that class because, you know, "huh, there's some weird stuff happening here, I don't want to dig into it right now".

So anyhow, PMD helps clean up the code, unit testing helps ensure correctness; they're complementary.
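The point about the add method can be made concrete with a small sketch. The hand-rolled check below stands in for a unit test (no test framework is assumed), and the class and method names are hypothetical. The check passes because the result is correct, and nothing in it can notice the unused list.

```java
import java.util.ArrayList;
import java.util.List;

public class WastefulAddDemo {

    // Same shape as the method in the interview: right answer, pointless allocation.
    static int add(int a, int b) {
        List<Object> list = new ArrayList<>(); // never used; a static analyzer can flag this
        return a + b;
    }

    // Stand-in for a unit test: it only observes return values.
    static boolean addBehavesCorrectly() {
        return add(2, 3) == 5 && add(-1, 1) == 0 && add(0, 0) == 0;
    }

    public static void main(String[] args) {
        System.out.println("add passes its checks: " + addBehavesCorrectly());
    }
}
```

A test that only looks at return values has no way to see the wasted allocation, which is why a source-level check covers ground that testing does not.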

What do you think is the most useful or powerful PMD check?

Probably my favorite is UnusedLocal - it finds problems that are easy to fix (by simply deleting the variable) and most people agree that unused code is bad.

I also like SimplifyBooleanReturns which suggests that this code:

boolean isBig(int x) {
 if (x > 2) {
  return true;
 } else {
  return false;
 }
}

be replaced with this:

boolean isBig(int x) {
 return x > 2;
}

I like this change because it replaces five lines of code with one - and is (I think) more readable.
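As a quick sanity check that the two forms really are interchangeable, the sketch below puts both versions side by side and compares them over a range of inputs. The class name, the renamed methods, and the chosen range are illustrative assumptions, not part of PMD.

```java
public class SimplifyBooleanReturnsDemo {

    // The verbose form that the rule flags.
    static boolean isBigVerbose(int x) {
        if (x > 2) {
            return true;
        } else {
            return false;
        }
    }

    // The simplified form the rule suggests.
    static boolean isBigSimple(int x) {
        return x > 2;
    }

    // True if both forms return the same value for every x in [from, to].
    static boolean formsAgree(int from, int to) {
        for (int x = from; x <= to; x++) {
            if (isBigVerbose(x) != isBigSimple(x)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("forms agree: " + formsAgree(-1000, 1000));
    }
}
```

The rewrite is safe because the condition is already a boolean expression, so returning it directly yields exactly the value the if/else would have picked.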

Some people talk about "best practices" and then rarely use them. How often do you run PMD on PMD? What is the most egregious issue you've encountered there?

Occasionally. I've got a ruleset of my favorites picked out, and once in a while it'll pick up something that I've missed.

As far as egregious issues go - one challenge is that PMD's source code base includes a JavaCC generated parser, which contains dead code and various other problems. I won't fix this code since it's generated and I regenerate it every time I make a grammar change. The obvious fix would be to hack around JavaCC itself, fix the problems there, and submit patches back to the JavaCC maintainers. I plan on doing that in my copious spare time (tm).

Some of those problems in JavaCC might be because I'm using JavaCC 3.2 vs the newer v4.0 - maybe those problems have been fixed. Must upgrade!

How can people find more information on PMD, how to use it, and writing their own checks?

I'm glad you asked! There's lots of stuff on the PMD web site, but an excellent source of more detailed information is my recently published book "PMD Applied". This fine tome contains complete descriptions of all the rules, notes on getting PMD to work with various IDEs (NetBeans, Eclipse, IDEA, etc), lots of details on how to write both XPath and Java rules, and descriptions of how PMD works internally. It's even got a survey of open source Java code analysis tools - FindBugs, JLint, Checkstyle, and so forth.

Furthermore, "PMD Applied" is heavy enough to serve as a paperweight while being light enough to be effectively hurled across the room. It also may serve well as a dessert topping/floor wax, although I haven't tried either one.

You also seem to have some connections with the Ruby community. What is your involvement there?

Yup, I've been writing various programs in Ruby for the last few years, including this Doom map generator. I also help administer the open source Ruby project repository RubyForge.

I'm fortunate in that we're using Ruby at work, so I get to code up all sorts of things to get my Ruby skills up to par. Right now I'm working on a Ruby C extension that interacts with Evolution and a Ruby on Rails application.

Which is better: Ruby or Java? Don't bother explaining why, you're wrong.

Well, certainly, I must say that Haskell is the greatest. Unless you know Haskell, in which case OCaml is king. And if you know both of those, I bow to your greatness!

Is there anything that you'd like the CodeSnipers community to know about yourself, your projects, or life in general that I've missed?

Did I mention "PMD Applied"? Ah yes, so I did. Erm.

I suppose I'd like to drop a plug for the JavaPosse podcasts. They've interviewed some nifty folks (Cedric Beust, Joshua Bloch) and the recordings just sound nice - they just sound like a couple of guys who are good programmers and who enjoy talking about what's going on in the Java world. Good times!

About Tom Copeland: He started programming on a TRS-80 Model III, but demand for that skill has waned and he now programs mostly in Java and Ruby. He's a contributor to various open source projects including PMD, GForge, and Maven. He and his wife Alina have five children (Maria, Tommy, Anna, Sarah, Steven) and live in northern Virginia. To argue with him directly, you can check out his blog.

Thanks

Great interview! PMD sounds awesome, let me know when you are going to write it for PHP. :) I started on a TRS-80 as well (but can't remember the model).

PMD in the real world

I already wrote Tom about this, but I thought I'd share publicly:

I've managed to convince my team that PMD is a useful tool. It's now part of our Ant build and we're using the Eclipse plugin. We've already been able to deprecate a pair of classes completely and break out some underlying Utils code into more logical/related pieces. So far we're only using imports, unusedcode and the CPD, but I figure it's a start. We only found 11 instances of C&P with a threshold of 100 tokens... which I consider impressive as this code has been under development by at least 6 people over a total of 4-5 years and weighs in at about 250k.