Tag Archives: Python

Euler problem #1

Inspired by a conversation with a few people on twitter I came up with a mindmap laying out a personal syllabus of learning goals. I’ve been out of school for a few months and I have a sinking feeling that I’d better keep my gears turning or the proverbial “use it or lose it…” bit will completely grab me with its forgetfulness.

One of the branches or areas that I want to keep fresh is algorithm evaluation. Its also an area of weakness for me. Its been at least two years since I had a class that focused on BigO notation or on speed, I’ve never taken a formal algorithms class, and lastly on a professional note I’ve never seen anyone do an analysis on a performance problem using anything other than capturing run times in a log. From my experience developers sprinkle log captures throughout their code in areas they think are bottlenecks.

Professionally I haven’t seen a discussion/analysis on why certain bits of code are faster or slower other than watching developers do it by trial and error.  This next bit is taken from Project Euler. I’ve done two implementations. Can you see why one might run faster than the other?

Example: If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. (Problem #1)

Question: Find the sum of all the multiples of 3 or 5 below 1000.

assumption if we input 16 the multiples are: 3, 5, 6, 9, 10, 12, 15, 15

Full solution is here on github if you want to download it.

def multiple1(limit):
    count = 0
    for i in range(1, limit):
        if (i % 3) is 0:
            count += i
        if (i % 5) is 0:
            count += i
    return count

Let’s run the first algorithm with a value of 10,000,000.

ordo-grande:Desktop matt$ time ./play.py 1000000
the answer is: 266666333333
real	0m0.423s
user	0m0.373s
sys	0m0.047s

And now the second algorithm.

def multiple2(limit):
    count = 0
    for i in range(3, limit, 3):
        count += i
    for i in range(5, limit, 5):
        count += i
    return count
ordo-grande:Desktop matt$ time ./play.py 1000000
the answer is: 266666333333
real	0m0.104s
user	0m0.083s
sys	0m0.018s

I drew up these two methods on the whiteboard in my team room yesterday along with the problem declaration. Note, I didn’t include the run times.  I asked my team which method was more effecient and over half the team guessed wrong.  I won’t spell out the discussion or the evaluation of the two examples, but can you see why one is faster/more efficient than the other?

Jython and my zipping drama (or really my unzip problem)

This post is really an effort to comfort myself after experiencing something of a late night conniption fit while on a business trip.  The goals of the trip were many but my role was simple, assist our Solution Architect in un-buggering a few problems at a client site in North Carolina. The buggering I’m referring to was frustrating bad performance of our product. Said product is an internal enterprise web-based solution for solving complex printer workflow problems with data. Ever wonder how your bank statements, cell phone bills, or even your IRS statements are print, stuffed, mailed, and tracked? We’ll those are the sorts of strangely exciting problems we work with.  I’m not being facetious here… this can be some seriously honest-fun geekery.

I was sitting in my hotel room bed with my laptop propped open on my lap, it was around 2 am, and I had a bad horror movie playing on the TV set, a Colorado beer near at hand, and my in-room jacuzzi was slowly filling with hot water… don’t ask, I tend to get into the zone when there’s  background distraction (I’ve always blamed it on my ADD). At 9am my colleague wanted to begin a series of load tests to begin zeroing in on the problem areas. The script I was working on would provide the means to throw seriously large amounts of data at the customers systems, we wanted to observe the systems when they were churning hard.

import os

... do a bunch of fairly nifty stuff...
os.popen("unzip large_zip_file.zip")
... do a whole lot more nifty stuff, before being mean to the server...

The script was meant to unzip an archive, modify several of the unzipped files, and then do nasty things to the servers by injecting the files into parts of our workflow. What was perplexing me was that the script seemed to work fine most of the time. Large zip files (2+ gb) seemed to illicit perplexing behavior sometimes. After being confused for awhile, it appeared that the unzipping wouldn’t quite finish before the remainder of the script would start to run. Before I go much further I should add some constraints to this exercise, I am stuck with Java 1.4 and Jython 2.5.0.

It made sense to try for a solution confined to Python’s api instead of reaching out to the OS.  A solution that still still didn’t work for my needs (code snippet credit goes to Corey Goldberg). Jython (at least 2.5.1) cannot handle large files, http://bugs.jython.org/issue1253. A Java OutOfMemory Error is thrown.

import zipfile

file_handler = open('foo.zip', 'rb')
zip_files = zipfile.ZipFile(file_handler)
for name in zip_files.namelist():
    outfile = open(name, 'wb')
    outfile.write(zip_files.read(name))
    outfile.close()
file_handler.close()

Time to try hacking something together in Java since I can harness the power of Java in Jython (code snippet credit goes to java_geek on StackOverflow). Again I was faced with an out of memory error along due to the limitation of the runtime environment… aargh!

import java.io.*;
import java.util.zip.*;

public class UnZip {
   final int BUFFER = 4096;
   public static void main (String argv[]) {
      try {
         BufferedOutputStream dest = null;
         FileInputStream fis = new FileInputStream(argv[0]);
         ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
         ZipEntry entry;
         while((entry = zis.getNextEntry()) != null) {
            System.out.println("Extracting: " +entry);
            int count;
            byte data[] = new byte[BUFFER];
            // write the files to the disk
            FileOutputStream fos = new FileOutputStream(entry.getName());
            dest = new BufferedOutputStream(fos, BUFFER);
            while ((count = zis.read(data, 0, BUFFER)) != -1) {
               dest.write(data, 0, count);
            }
            dest.flush();
            dest.close();
         }
         zis.close();
      } catch(Exception e) {
         e.printStackTrace();
      }
   }
}

The quick fix that I ended up implementing was to explicitly shell out a subprocess to ensure the command finished running. This is was suboptimal but I was tired.

import subprocess

unzip_file = subprocess.Popen("unzip " + "large_zip_file.zip", shell=True)
unzip_file.wait()

After getting home from the trip I came across this solution (credit goes to S.Lott on StackOverflow). Much cleaner and OS agnostic.

import zipfile
import zlib
import os

src = open( doc, "rb" )
zf = zipfile.ZipFile( src )
for m in  zf.infolist():

    # Examine the header
    print m.filename, m.header_offset, m.compress_size, repr(m.extra), repr(m.comment)
    src.seek( m.header_offset )
    src.read( 30 ) # Good to use struct to unpack this.
    nm= src.read( len(m.filename) )
    if len(m.extra) > 0: ex= src.read( len(m.extra) )
    if len(m.comment) > 0: cm= src.read( len(m.comment) )

    # Build a decompression object
    decomp= zlib.decompressobj(-15)

    # This can be done with a loop reading blocks
    out= open( m.filename, "wb" )
    result= decomp.decompress( src.read( m.compress_size ) )
    out.write( result )
    result = decomp.flush()
    out.write( result )
    # end of the loop
    out.close()

zf.close()
src.close()

Python update all your eggs

Quite awhile ago I needed an easy way to update all my Python eggs, this is tedious at best especially if you didn’t explicitly keep track of  what you’ve installed over time.  The script below isn’t mine (I’ve made a few subtle changes) but it is invaluable. I’ve been using it for the last 8-months without fail.

Many thanks to Flávio Codeço Coelho for making his script available online.

#!/usr/bin/env python
from setuptools.command.easy_install import main as install
from pkg_resources import Environment, working_set
import sys

#Packages managed by setuptools
plist = [dist.key for dist in working_set]

def autoUp():
    for p in Environment():
        try:
            install(['-U', '-v']+[p])
        except:
            print "Update of %s failed!"%p
        print "Done!"

def stepUp():
    for p in Environment():
        a = raw_input("updating %s, confirm? (y/n)"%p)
        if a.upper() =='Y':
            try:
                install(['-U']+[p])
            except:
                print "Update of %s failed!"%p
        else:
            print "Skipping %s"%p
        print "Done!"

print "You have %s packages currently managed through Easy_install"%len(plist)
print plist
ans = raw_input('Do you want to update them... (N)ot at all, (O)ne-by-one, (A)utomatically (without prompting)')
if ans.upper() == 'N':
    sys.exit()
elif ans.upper() == 'O':
    stepUp()
elif ans.upper() == 'A':
    autoUp()
else:
    print "Oops, you chose a non-existant option. Please run the script again."