Archive | Python

FactFinder makes screen scraping easy

Dec 9th, 2005

I need to do some research for church on average household incomes in different neighborhoods. Luckily the U.S. Census Bureau has all that information and distributes it in the public domain at American FactFinder. Of course, I have a list of ZIP codes to research and I don’t want to do this manually.

The FactFinder website makes automation pretty easy. For one thing, the ZIP code to look up is in the URL three times. So I copied the URL into my text editor and replaced each occurrence of the ZIP code with “%s”. Now, to make the URL, I use these two lines:

insertTup = (zipcode,)*3
factinfo = urlopen(factfindURL % insertTup)
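
For anyone following along in current Python 3, here is a minimal sketch of the same templating trick. The URL is a made-up placeholder (not the real FactFinder address), and build_url is a hypothetical helper name; the only point is that the same ZIP code fills all three “%s” slots.

```python
# Hypothetical template: the real FactFinder URL is not reproduced here.
# All that matters is that "%s" appears three times.
FACTFIND_URL = "http://example.invalid/factsheet?zip=%s&geo=%s&label=%s"


def build_url(zipcode):
    """Fill the same ZIP code into all three %s slots of the template."""
    insert_tup = (zipcode,) * 3  # one copy of the ZIP code per slot
    return FACTFIND_URL % insert_tup


print(build_url("60601"))
```

Note that if the real URL contained a literal percent sign (for example a URL-encoded character), it would have to be written as “%%” in the template for this to work.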

So now I have a URL for any ZIP code, but how do I parse it? Well, naturally, mix it into some BeautifulSoup:

soup = BeautifulSoup(factinfo.read())

Okay, so now how do I get to the data? If you view source on a Fact Sheet page, you’ll see that the data table is well-labelled with ID names. For example, we want the median household income, which is in row 46 in the second column. Luckily for us, the row and column labels are right in the tag: we need to look for a td tag with a headers value of R46 C2. The content isn’t directly in the td tag, though, but in the p tag that is the td tag’s only child. BeautifulSoup makes this very easy:

income = soup("td",{'headers':'R46 C2'})[0].first().string.strip()
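
If BeautifulSoup isn’t handy, the same lookup can be roughly sketched with the standard library’s html.parser in Python 3. This is a swapped-in technique, not what the script above does, and the sample markup and numbers are invented for illustration; the real Fact Sheet page is assumed to be shaped like this.

```python
from html.parser import HTMLParser


class CellFinder(HTMLParser):
    """Collect the text inside the first <td> whose headers attribute matches."""

    def __init__(self, headers_value):
        super().__init__()
        self.headers_value = headers_value
        self.inside = False   # True while we are inside the matching <td>
        self.found = False    # True once a matching <td> has been seen
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "td" and not self.found:
            if dict(attrs).get("headers") == self.headers_value:
                self.inside = True
                self.found = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        # Text of child tags (such as the <p>) arrives here too.
        if self.inside:
            self.chunks.append(data)


def cell_text(html, headers_value):
    """Return the stripped text of the matching cell, or '' if none matches."""
    finder = CellFinder(headers_value)
    finder.feed(html)
    return "".join(finder.chunks).strip()


# Invented sample markup, assumed to resemble the real table.
SAMPLE = """
<table>
  <tr><td headers="R45 C2"><p> 1,234 </p></td></tr>
  <tr><td headers="R46 C2"><p> 41,994 </p></td></tr>
</table>
"""

income = cell_text(SAMPLE, "R46 C2")
print(income)
```

It is a lot more typing than the BeautifulSoup one-liner, which is rather the point of using a good HTML parser.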

So there it is! When I approached this problem, I thought it would be quite difficult, but a good use of web standards and a good HTML parser made this almost trivial.

Introducing geocode_pic.py

Dec 5th, 2005

I wrote this script as a response to Hack 10 in the most-excellent Mapping Hacks from O’Reilly. I named it geocode_pic.py and it takes a directory of JPEGs and a GPS tracklog and tries to list out what coordinates each picture was taken at.

It hasn’t been tested extremely well, but it works well in my environment. I’m looking for feedback to see how it works for others!

It requires :

More info can be seen by doing a “pydoc geocode_pic”.

This script works by taking the DateTime from each JPEG’s EXIF metadata and the time from each trackpoint and trying to match them up. You can give the script a “timedelta” in hours. This means that every photo taken within timedelta hours after a point in the tracklog is counted as taken at that point. Of course, if your camera’s clock is quite off from your GPS, you will have problems. It also assumes that the DateTime from the JPEG is in your timezone and the tracklog time is in UTC (which is what the schema says it should be!). Variances from that will cause problems.
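
The matching idea can be sketched in a few lines of Python 3. This is not the code from geocode_pic.py, just one reading of the description above: match_photos, its parameters, and the sample data are all hypothetical, and the local timezone is assumed to be a fixed UTC offset (no daylight-saving handling).

```python
from datetime import datetime, timedelta, timezone


def match_photos(photos, trackpoints, max_hours, local_utc_offset_hours):
    """Pair each photo with the latest trackpoint it falls within the window of.

    photos: list of (filename, naive local datetime, as read from EXIF)
    trackpoints: list of (aware UTC datetime, lat, lon)
    Returns {filename: (lat, lon)} with None for photos that match nothing.
    """
    window = timedelta(hours=max_hours)
    local_tz = timezone(timedelta(hours=local_utc_offset_hours))
    points = sorted(trackpoints)  # oldest first
    matches = {}
    for name, local_time in photos:
        # EXIF time is assumed local; convert it to UTC to compare.
        shot_utc = local_time.replace(tzinfo=local_tz).astimezone(timezone.utc)
        best = None
        for when, lat, lon in points:
            if when <= shot_utc <= when + window:
                best = (lat, lon)  # later points overwrite earlier ones
        matches[name] = best
    return matches


# Made-up sample data: two photos, two trackpoints, camera at UTC-6.
photos = [
    ("a.jpg", datetime(2005, 12, 3, 10, 5)),
    ("b.jpg", datetime(2005, 12, 3, 18, 0)),
]
trackpoints = [
    (datetime(2005, 12, 3, 16, 0, tzinfo=timezone.utc), 41.0, -87.0),
    (datetime(2005, 12, 3, 16, 4, tzinfo=timezone.utc), 41.001, -87.002),
]

result = match_photos(photos, trackpoints, max_hours=0.25,
                      local_utc_offset_hours=-6)
print(result)
```

Here a.jpg lands on the second trackpoint (the most recent one before the shot), while b.jpg is too far from any trackpoint and gets no coordinates.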

What to do with this data? You could make a Google Map application. I’m thinking about hacking something into our gallery.

IronPython 0.9.5 is out

Nov 29th, 2005

This is a little slow on the draw, but IronPython 0.9.5 is out. I hafta start playing with it again. Soon, I will be able to dedicate some time to implementing FePy in my Java/COM testing application. What has been stopping me is the .Net requirements: our stuff is on .Net 1.1 while FePy depends on .Net 2.0. Now that 2.0 is out of beta I can see if I can get our API into FePy.

Jython changes hands again

Nov 15th, 2005

An announcement was just made that Brian Zimmer is stepping down as head of Jython development. There is someone to replace him and continue the work.

Brian Zimmer did a great job getting the ball rolling on Jython 2.2: he applied for and received a PSF grant, he built a community of developers around Jython, and they released a couple of alpha builds of Jython 2.2. Though he was only in charge a short time, his work has ensured that Jython’s future looks bright.

Using Jython, zxJDBC, and DBUnit

Oct 17th, 2005

Not much time for the long post that this deserves, but here is an example of using Jython, zxJDBC, and DBUnit. This little snippet makes a JDBC connection to a database, pulls all the data out of it, and stores it in an XML file:

from com.ziclix.python.sql import zxJDBC
from org.dbunit.database import DatabaseConnection
from org.dbunit.dataset.xml import FlatXmlDataSet

if __name__ == '__main__':
    d,u,p,v = ("jdbc:db2://172.16.21.217:50000/mh332",
               "db2inst1","db2inst1","com.ibm.db2.jcc.DB2Driver")

    db = zxJDBC.connect(d,u,p,v)
    ## you have to use this almost-undocumented __connection__ property
    conn = DatabaseConnection(db.__connection__)
    fullset = conn.createDataSet()
    FlatXmlDataSet.write(fullset,open("db2backup.xml","w"))