Archive for May, 2006

We need a newer PyDev

Thursday, May 18th, 2006

Red Hat’s Toronto office recently got 6 new interns. It’s going to be a fun summer with 13 interns sitting together in one cube area :D .

Other than yelling very loudly for the whole day (I swear we were quieter last year!!) the interns are working on some improvements to yum. They’ve all used Eclipse before so they naturally wanted to use it. Very quickly they learned that we ship PyDev, so they tried that. Unfortunately we can only ship versions < 0.9.6, and currently we have 0.9.3. Even more unfortunately, 0.9.3 sucks, mostly because there have been tons of additions and fixes since then, and possibly also because of the way we’re running it, or because of GCJ. In any case, I decided that we *really* need to have a newer version, because this one is pretty close to useless as it is.

In the next few days I’m going to look into backporting PyDev 1.0.6 (the latest release) to Java 1.4 and then pushing that as an update to FC5. We don’t have to do this often, but if it only takes 2 days, I can repeat the procedure before the end of the summer (when I’ll be going back to school). And then we could either repeat it again a few months later or just wait until we have real 1.5 language feature support (I don’t think PyDev uses any new library features, except genericized collections).

How can Python not have HTML unescaping?

Friday, May 12th, 2006

I’m in the process of trying to migrate a MovableType blog to WordPress. The process itself isn’t that bad, but I’ve found a whole slew of problems with preserving links. In the end it was decided to preserve internal link integrity but not worry about other people’s links. OK.

Due to weirdness in how both MT and WP handle post IDs, I had to write a bunch of scripts that would extract post IDs and titles from the old MT blog and from a test installation of the WP blog with the imported entries, then compare the titles and figure out the new URL of every post that was internally linked to.
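The title-matching step can be sketched roughly like this. It is only an illustration, not the actual scripts: the function names, the ID-to-title dictionaries and the sample data are all made up, and it assumes titles are unique once whitespace and case are normalized.

```python
def normalize(title):
    """Collapse whitespace and ignore case so near-identical titles match."""
    return " ".join(title.split()).lower()


def map_old_to_new(old_titles, new_titles):
    """Map old post IDs to new post IDs by comparing titles.

    Both arguments are hypothetical {post_id: title} dictionaries, one
    extracted from the old blog and one from the test import.
    Returns (mapping, unmatched_old_ids).
    """
    # Index the new posts by normalized title for quick lookup.
    new_by_title = {normalize(t): post_id for post_id, t in new_titles.items()}

    mapping = {}
    unmatched = []
    for old_id, title in old_titles.items():
        new_id = new_by_title.get(normalize(title))
        if new_id is None:
            unmatched.append(old_id)  # needs a manual look
        else:
            mapping[old_id] = new_id
    return mapping, unmatched


# Made-up sample data, just to show the shape of the inputs.
old = {"000123": "Hello  World", "000124": "Only in MT"}
new = {7: "hello world", 8: "Something else"}
mapping, unmatched = map_old_to_new(old, new)
```

Anything that lands in the unmatched list is a post whose title changed during the import (for example because of escaped characters) and has to be checked by hand.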

In the process of doing that I found out that Python doesn’t have sane handling of HTML unescaping!

Huh? What? You may ask.

Python has nice methods for escaping and unescaping URLs in urllib. But guess what? There’s no single function to do the same for HTML code. Let’s see. Anything in htmllib? Nah, that’d be too easy. Anything in urllib, or urllib2, maybe? Just as a favour? Nah…

I’m still shocked that there’s no function for this. In the end I used xml.sax.saxutils.unescape(), which does some of the unescaping but, who knows why, doesn’t handle all the HTML entities, so I had to manually add some of the entities that I encountered in the titles.
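A minimal sketch of that workaround: xml.sax.saxutils.unescape() only handles &amp;amp;, &amp;lt; and &amp;gt; by itself, but it takes an optional dictionary of extra replacements. The entities listed here and the helper name are just examples, not the exact set I needed.

```python
from xml.sax.saxutils import unescape

# Extra entities to pass along; this particular list is only an example
# and would need to grow with whatever shows up in the real titles.
EXTRA_ENTITIES = {
    "&quot;": '"',
    "&apos;": "'",
    "&nbsp;": "\u00a0",    # non-breaking space
    "&eacute;": "\u00e9",  # e with acute accent
}


def unescape_title(text):
    """Unescape &amp;, &lt;, &gt; plus the extra entities listed above."""
    return unescape(text, EXTRA_ENTITIES)


# Hypothetical title, as it might appear in an exported entry.
title = unescape_title("Caf&eacute; &amp; Bar &lt;b&gt;")
```

Any entity that isn’t in the dictionary is left in the text untouched, which is exactly why the list had to be built up by hand.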

Wow…