Tag Archives: Jython

Jython and my zipping drama (or really my unzip problem)

This post is really an effort to comfort myself after experiencing something of a late-night conniption fit while on a business trip. The goals of the trip were many, but my role was simple: assist our Solution Architect in un-buggering a few problems at a client site in North Carolina. The buggering I’m referring to was the frustratingly bad performance of our product. Said product is an internal, enterprise, web-based solution for solving complex printer workflow problems with data. Ever wonder how your bank statements, cell phone bills, or even your IRS statements are printed, stuffed, mailed, and tracked? Well, those are the sorts of strangely exciting problems we work with. I’m not being facetious here… this can be some seriously honest-fun geekery.

It was around 2 am, and I was sitting on my hotel bed with my laptop propped open on my lap, a bad horror movie playing on the TV, a Colorado beer near at hand, and the in-room jacuzzi slowly filling with hot water… don’t ask. I tend to get into the zone when there’s background distraction (I’ve always blamed it on my ADD). At 9 am my colleague wanted to start a series of load tests to zero in on the problem areas. The script I was working on would provide the means to throw seriously large amounts of data at the customer’s systems; we wanted to observe them while they were churning hard.

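import os

# ... do a bunch of fairly nifty stuff ...

# os.popen starts unzip and hands back a pipe right away; nothing here
# waits for unzip to exit before the rest of the script carries on
os.popen("unzip large_zip_file.zip")

# ... do a whole lot more nifty stuff, before being mean to the server ...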

The script was meant to unzip an archive, modify several of the unzipped files, and then do nasty things to the servers by injecting the files into parts of our workflow. What perplexed me was that the script seemed to work fine most of the time, but large zip files (2+ GB) sometimes elicited odd behavior. After being confused for a while, I realized that the unzipping wouldn’t quite finish before the remainder of the script started to run. Before I go much further I should mention my constraints: I am stuck with Java 1.4 and Jython 2.5.0.
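The race comes from `os.popen` itself: it launches the command and returns a pipe immediately, without waiting for the command to exit. Below is a minimal sketch of one way to block on it, assuming a POSIX shell and output small enough to hold in memory. Read the pipe to EOF, then close it. Closing waits for the child and returns `None` on a zero exit status. The helper name `run_and_wait` is hypothetical.

```python
import os


def run_and_wait(command):
    """Run a shell command via os.popen and block until it has exited.

    Hypothetical helper: read() consumes all of the command's stdout, and
    close() waits for the child, returning None when the exit status is 0
    (otherwise a non-None value encoding the status).
    """
    pipe = os.popen(command)
    output = pipe.read()   # returns once the command closes its stdout
    status = pipe.close()  # waits for the child process to finish
    return output, status


if __name__ == "__main__":
    output, status = run_and_wait("echo hello")
    print(repr(output))
    print(status)
```

For a chatty command like `unzip` on a huge archive, the output can get large, which is one reason to prefer `subprocess` with an explicit `wait()`.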

It made sense to try for a solution confined to Python’s API instead of reaching out to the OS. That still didn’t work for my needs (code snippet credit goes to Corey Goldberg): Jython (at least 2.5.1) cannot handle large files here and throws a Java OutOfMemoryError, see http://bugs.jython.org/issue1253.

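import zipfile

file_handler = open('foo.zip', 'rb')
zip_files = zipfile.ZipFile(file_handler)
for name in zip_files.namelist():
    outfile = open(name, 'wb')
    # read() returns the whole decompressed member as one string, so a
    # multi-gigabyte member has to fit in memory all at once
    outfile.write(zip_files.read(name))
    outfile.close()
zip_files.close()
file_handler.close()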
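The snippet above calls `zip_files.read(name)`, which builds each member as a single in-memory string, so a huge member has to fit in the heap. On runtimes whose `zipfile` has `ZipFile.open` and context-manager support (CPython 2.7+ and Jython 2.7, not the Jython 2.5 I was stuck with), each member can instead be streamed to disk in fixed-size chunks. This is a sketch under that assumption. The function name `extract_streaming` and the 64 KB chunk size are illustrative choices.

```python
import os
import shutil
import zipfile

CHUNK_SIZE = 64 * 1024  # bytes copied per read; an illustrative value


def extract_streaming(archive_path, dest_dir="."):
    """Extract every member of archive_path into dest_dir, chunk by chunk."""
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            target = os.path.join(dest_dir, info.filename)
            if info.filename.endswith("/"):
                # Directory entry: it has no data, just make sure it exists
                if not os.path.isdir(target):
                    os.makedirs(target)
                continue
            parent = os.path.dirname(target)
            if parent and not os.path.isdir(parent):
                os.makedirs(parent)
            # ZipFile.open decompresses lazily; copyfileobj moves at most
            # CHUNK_SIZE bytes at a time, so a member is never held whole
            with archive.open(info) as member, open(target, "wb") as out:
                shutil.copyfileobj(member, out, CHUNK_SIZE)
```

Usage is just `extract_streaming("large_zip_file.zip", "unzipped")`. Memory use stays roughly constant regardless of member size.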

Time to try hacking something together in Java, since I can harness the power of Java from Jython (code snippet credit goes to java_geek on StackOverflow). Again I was faced with an out-of-memory error due to the limitations of the runtime environment… aargh!

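import java.io.*;
import java.util.zip.*;

public class UnZip {
   // static so it can be referenced from the static main() method
   static final int BUFFER = 4096;

   public static void main (String argv[]) {
      try {
         BufferedOutputStream dest = null;
         FileInputStream fis = new FileInputStream(argv[0]);
         ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
         ZipEntry entry;
         byte data[] = new byte[BUFFER];  // one read buffer, reused for every entry
         while((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " + entry);
            File outFile = new File(entry.getName());
            if (entry.isDirectory()) {
               // directory entries carry no data; just create the directory
               outFile.mkdirs();
               continue;
            }
            // make sure parent directories exist before opening the file
            File parent = outFile.getParentFile();
            if (parent != null) {
               parent.mkdirs();
            }
            int count;
            // stream the entry to disk BUFFER bytes at a time
            FileOutputStream fos = new FileOutputStream(outFile);
            dest = new BufferedOutputStream(fos, BUFFER);
            while ((count = zis.read(data, 0, BUFFER)) != -1) {
               dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
         }
         zis.close();
      } catch(Exception e) {
         e.printStackTrace();
      }
   }
}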

The quick fix that I ended up implementing was to shell out to a subprocess and explicitly wait for the command to finish. This was suboptimal, but I was tired.

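import subprocess

# Start unzip and block until it exits, so the rest of the script
# only ever sees a fully extracted archive
unzip_file = subprocess.Popen(["unzip", "large_zip_file.zip"])
return_code = unzip_file.wait()
if return_code != 0:
    raise RuntimeError("unzip exited with status %d" % return_code)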

After getting home from the trip I came across this solution (credit goes to S.Lott on StackOverflow). Much cleaner and OS agnostic.

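import os
import struct
import zipfile
import zlib

doc = "large_zip_file.zip"  # path of the archive to extract
CHUNK_SIZE = 64 * 1024      # compressed bytes read per block

src = open( doc, "rb" )
zf = zipfile.ZipFile( src )
for m in zf.infolist():

    # Examine the header (these values come from the central directory)
    print("%s %d %d %r %r" % (m.filename, m.header_offset, m.compress_size, m.extra, m.comment))

    # Directory entries have no data; just create them
    if m.filename.endswith("/"):
        if not os.path.isdir(m.filename):
            os.makedirs(m.filename)
        continue
    parent = os.path.dirname(m.filename)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)

    # Unpack the 30-byte local file header. Its filename and extra-field
    # lengths must be read here: the local extra field can differ from
    # m.extra, and the local header has no comment field at all.
    src.seek( m.header_offset )
    fields = struct.unpack( "<4s5H3L2H", src.read( 30 ) )
    src.seek( fields[9] + fields[10], 1 )  # skip name + extra; data follows

    # Only stored and deflated members are handled; deflate data is a raw
    # stream with no zlib header, hence wbits=-15
    decomp = None
    if m.compress_type == zipfile.ZIP_DEFLATED:
        decomp = zlib.decompressobj(-15)

    # Decompress in blocks so a large member never sits in memory whole
    out = open( m.filename, "wb" )
    remaining = m.compress_size
    while remaining > 0:
        block = src.read( min(CHUNK_SIZE, remaining) )
        if not block:
            break
        remaining -= len(block)
        if decomp is not None:
            block = decomp.decompress( block )
        out.write( block )
    if decomp is not None:
        out.write( decomp.flush() )
    out.close()

zf.close()
src.close()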
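The S.Lott approach works by jumping over the fixed 30-byte ZIP local file header that sits in front of each member's data. Decoding that header with `struct` makes the layout explicit. It follows the PKWARE ZIP specification: the signature `PK\x03\x04`, then version, flags, method, modification time and date, CRC-32, compressed and uncompressed sizes, and finally the filename and extra-field lengths, all little-endian. This is a sketch, written for Python 2.6+ or 3. The helper name `read_local_header` and the dictionary keys are my own labels.

```python
import struct

# Fixed part of a ZIP local file header, little-endian:
# 4-byte signature, five uint16, three uint32, two uint16 = 30 bytes
LOCAL_HEADER_FORMAT = "<4s5H3L2H"
LOCAL_HEADER_SIZE = struct.calcsize(LOCAL_HEADER_FORMAT)
LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"


def read_local_header(src, header_offset):
    """Decode the local header at header_offset and locate the member's data."""
    src.seek(header_offset)
    fields = struct.unpack(LOCAL_HEADER_FORMAT, src.read(LOCAL_HEADER_SIZE))
    if fields[0] != LOCAL_HEADER_SIGNATURE:
        raise ValueError("no local file header at offset %d" % header_offset)
    header = {
        "version_needed": fields[1],
        "flags": fields[2],
        "compress_type": fields[3],
        "mod_time": fields[4],
        "mod_date": fields[5],
        "crc32": fields[6],
        "compress_size": fields[7],
        "file_size": fields[8],
        "filename_length": fields[9],
        "extra_length": fields[10],
    }
    # Compressed data starts right after the filename and the *local* extra field
    data_offset = (header_offset + LOCAL_HEADER_SIZE
                   + header["filename_length"] + header["extra_length"])
    return header, data_offset
```

Using the lengths from the local header, rather than `len(m.extra)` and `len(m.comment)` from the central directory, is what keeps the seek position correct. The local extra field may differ from the central one, and the local header carries no comment.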

V2 – a little WebDriver, toss in some Craigslist

So I revisited the code that I originally wrote and did quite a bit of refactoring, along with adding a stats counter that keeps track of pages visited and results found. The greatest inspiration came when my wife took an interest and wanted to use it for some job searches of her own; her only caveat was that she wanted the results to look a little prettier.

My next mini goal is to add the ability to parse arguments from the command line so that it becomes a more general-purpose tool. I think I’ll write a Jython wrapper to do this; I really like the simplicity of Python’s optparse library.
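For reference, this is roughly what an `optparse` front end could look like. It is only a sketch. The option names (`--query`, `--max-pages`) and the default of 5 pages are hypothetical and are not taken from the actual script. `optparse` ships with Jython 2.5 as well as with CPython.

```python
from optparse import OptionParser


def build_parser():
    """Build a parser for a hypothetical search-scraper command line."""
    parser = OptionParser(usage="usage: %prog [options]")
    parser.add_option("-q", "--query", dest="query",
                      help="search terms to look for")
    # type="int" makes optparse convert the value and reject non-integers
    parser.add_option("-p", "--max-pages", dest="max_pages", type="int",
                      default=5,
                      help="stop after this many result pages [default: %default]")
    return parser
```

Calling `options, args = build_parser().parse_args()` reads `sys.argv`. `options.query` and `options.max_pages` then hold the values, and any leftover positional arguments land in `args`.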

Git repository | Example output

Jython – How to install Python libraries

This wasn’t immediately obvious to me, even though in hindsight it makes sense. Without putting much thought into my first attempt, I numbly typed python setup.py install. My goal was to use both the python-twitter and simplejson libraries (simplejson is a dependency of python-twitter) from my Jython scripts. I quickly discovered that running setup.py with python installs the packages into CPython’s site-packages, which Jython doesn’t search. To install these libraries for use in Jython, you need to run their setup.py scripts with Jython itself, which puts them in Jython’s own site-packages.

To get started, download the source code of the Python libraries that you want to install.

From the command line (note that I’m on Mac OS X 10.5):

cd python-twitter-0.6/
jython setup.py install

cd simplejson-2.1.1/
jython setup.py install
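To confirm the libraries landed where Jython can see them, run a small script with `jython` itself that tries the imports and reports where each module was loaded from. This is a sketch, and the helper name `where_is` is hypothetical. Under Jython, `sys.platform` starts with `java`, and the reported paths should point into Jython’s own `Lib/site-packages` rather than CPython’s.

```python
import sys


def where_is(module_name):
    """Return the file a module was loaded from, or None if it can't be imported."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    # Built-in modules (e.g. sys) have no __file__ attribute
    return getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    print("platform: %s" % sys.platform)  # starts with 'java' under Jython
    for name in ("simplejson", "twitter"):
        print("%s -> %s" % (name, where_is(name)))
```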

I have to give a shout-out: Jython rocks my socks!

Now on to the next problem: how do I get Eclipse (PyDev) to resolve the imports correctly?