
How I Read The Internet

Jan 8th, 2013

I don’t use Google Reader anymore, and there are very few blogs that I make sure I stay caught up on. I last really hung out on Reddit five years ago (I know this because I just checked), and I’ve never really gotten into Hacker News.

But if you follow me on Twitter or App.net (aka ADN), you know that I send out a lot of articles. Actually, I read at least twice as many as I share . . . so how does this happen?

I also rarely read articles at my desk anymore . . . I’d say 80% of my reading is done on my Android phone. When I’m on the couch with my son, and he’s watching a particular Thomas the Train movie for the hundredth time, I take my phone out and read an article or two. After reading one, I quickly decide if it’s worth sharing, click a button, and then I know it will be shared.

I decided to describe how I do all this. I have no idea if this is unusual or if it’s how many others do it. I don’t even know if it’s the most efficient method. But I will say that this has been evolving for about eight or nine months (well, technically years), and what I’m describing here has been stable for about a month, which is quite a long while in this process. So it works well. For me, at least.

Here are the three main tools I use:

  • Pocket, formerly Read It Later. This normalizes the page very well for reading, removing the ads, extra formatting, etc. There are alternatives: Instapaper or Readability would work just as well. I chose Pocket because it was the first one with good Android support when I started doing this, but now Instapaper and Readability have that too. One killer feature of Pocket is that it remembers where I stopped reading: if I start reading an article in Pocket on my laptop and later open it in Pocket again, Pocket jumps to the part where I left off. No idea if the other two do this (they probably do).
  • Buffer is what sends the links out. What is nice about Buffer is that it works on a schedule. Buffer isn’t strictly needed (I could just send out the links as I read them), but sometimes I read five or six articles in one sitting and I don’t want to annoy anyone with a flood of tweets. Buffer spreads them out throughout the day.
  • IFTTT is really the glue that makes this whole process work automatically. It sends some articles to Pocket, and it is responsible for putting all my Starred items onto Twitter and ADN. One of my very few complaints about IFTTT is how it interfaces with Buffer: IFTTT can only send to one linked account. So I have IFTTT send to my ADN account via Buffer, and then use a work-around (again, using IFTTT) that sends the same message to Twitter.
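How Buffer actually picks its times isn't described here, but the idea of "spreading them out throughout the day" can be sketched simply: give each queued link the next free posting slot, and roll over to the next day when the day's slots run out. The slot times and the `schedule` function below are hypothetical, for illustration only.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical daily posting slots, for illustration only.
SLOTS = [time(9, 0), time(12, 30), time(16, 0), time(20, 0)]


def schedule(links, start_day, slots=SLOTS):
    """Assign each queued link to the next free slot, in order.

    When one day's slots are used up, continue with the next day's,
    so a burst of reading never turns into a burst of posts.
    """
    plan = []
    for i, link in enumerate(links):
        day = start_day + timedelta(days=i // len(slots))
        when = datetime.combine(day, slots[i % len(slots)])
        plan.append((when, link))
    return plan


if __name__ == "__main__":
    queue = [f"https://example.com/article-{n}" for n in range(6)]
    for when, link in schedule(queue, date(2013, 1, 8)):
        print(when.strftime("%a %H:%M"), link)
```

Reading six articles in one sitting then yields four posts that day and two the next morning, instead of six tweets in a row.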

Here is a diagram of how this works, and a more detailed explanation is below:

When I see an article come into my social media net, I click on it, quickly decide if it’s worth my time (I may read a paragraph or two), and then add it to my Pocket list via the Aside Chrome extension. Pocket has an official extension, but I like Aside because it’s quick, it lives in my address bar, and it shows me if I’ve already added the page.

If I’m on my phone I do a similar thing: I open the page, give it a quick once-over, use Android’s brilliant “Share” functionality, and choose “Add to Pocket”.

As I said, there are a few blogs (OK, two blogs) that I like enough to at least look at every post in Pocket. I use IFTTT for that: I set up a couple of recipes that save new feed items to Pocket. Boom, now all the latest entries are there.

Now you know how I aggregate all my articles into Pocket. Getting them out of Pocket into the social streams is more automatic.

When I get a few moments to read, whether at night, waiting for a meeting, or waiting for an oil change, I take my phone out, start Pocket, and read. If I want to share an article, I star it (aka favorite it). Then IFTTT takes over.

I have an IFTTT recipe that says “Add my Starred Items in Pocket to Buffer”, so the Starred items get scheduled in Buffer. When the time comes, Buffer posts my message to ADN with a “#tw” tag attached. And then, guess what? Another IFTTT recipe sees that I have an ADN message with “#tw” and sends that message to Twitter. (This is the work-around I mentioned earlier.)
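The “#tw” work-around boils down to a filter rule: watch the ADN stream and forward only the posts that carry the tag. Here is a minimal sketch of that rule, assuming posts arrive as plain strings and that the tag should be stripped before posting to Twitter (the post doesn’t say whether that happens). `relay_to_twitter` is a hypothetical name, not part of IFTTT.

```python
import re

# Match "#tw" only as a standalone tag, so "#twitter" or "a#tw" are ignored.
TAG_RE = re.compile(r"(?<!\S)#tw(?!\w)")


def relay_to_twitter(adn_posts):
    """Yield the Twitter version of every ADN post that carries the #tw tag.

    Posts without the tag are skipped; the tag itself is removed and the
    leftover whitespace is collapsed.
    """
    for post in adn_posts:
        if TAG_RE.search(post):
            yield " ".join(TAG_RE.sub("", post).split())
```

Matching the tag as a whole word matters: a naive substring check for “#tw” would also forward any post that happens to mention “#twitter”.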

Complicated? Heck yeah. I didn’t come up with this system overnight; it evolved over months. But now the things I want to read on the Internet come to the same place with little effort, and I can share them with even less effort. And those were my goals.

Hoss Dreams of Software

Jan 2nd, 2013

The other night we watched Jiro Dreams of Sushi, which I highly recommend everyone watch, even if you don’t like sushi. Even if you don’t know anything about sushi. Because it’s not about sushi: it’s about Jiro, an artist who is obsessed with quality and with his craft. And his craft is making sushi.

Jiro Ono is 85 years old and owns a nondescript sushi restaurant in Tokyo. His restaurant has only 10 seats, but it costs $300 per seat and you have to make your reservation at least a month in advance. Oh, and it is a 3-star Michelin-rated restaurant. Jiro is, in fact, the oldest chef to be awarded three Michelin stars. The restaurant critic interviewed in the film said, many times, that Jiro’s sushi is consistently the best he’s ever had. It’s always the best; never once has it been a bit worse than the time before. And that is an astounding review.

All of this comes down to Jiro, who has committed his entire life to making sushi. He’s been at it since he was 14 years old. He’s at his restaurant every day, overseeing the preparation of the fish, rice, eggs, etc. He is quick to criticize when he sees or tastes something below his exacting standards, even from his own 50-year-old son who works there. Jiro keeps a close eye on his customers, noticing if they are left-handed (he puts the sushi in a different place on the plate if they are) and making slightly smaller pieces for women. He also admits that when his restaurant is closed for public holidays, he doesn’t know what to do with himself.

I’ve been saying for years that cooking is a lot like programming, and I thought about that many times during this film. Jiro said that if you want to be the best chef, you can never be satisfied, you must always strive to be better, and you have to love it. These traits, to me, are the same ones that make a great developer. You have to always be learning, striving to make your things better, and you have to love the work. I think the last item is the most important: writing software is hard and takes a certain kind of dedication and brain work that, frankly, not everyone is cut out for.

But if you decide that you like this kind of work, then you dedicate your life to it. And if you want to dedicate your life to it, then you should be constantly looking for ways to get better. Back to Jiro . . . he has been making sushi for 70 years. 70 years! And he is still looking for ways to get better. Not necessarily One Big Thing that will change sushi forever, but little increments, like the kind of rice to use, the temperature of the rice when the sushi is made and served, massaging the octopus for longer to bring that much more flavor out of it, finding the best fishmongers to buy from . . . the list goes on and on.

I think most software developers (including myself) want to find the silver bullet, the one thing that will make us all better. But, alas, it doesn’t exist. There is no one methodology to follow, no one language to use, no One True Editor or IDE that solves all the problems. We have to get better a bit at a time.

Really, what I am talking about comes back to craftsmanship. We want to write great software and, after we do that, we want to do it again, but better this time. Never going back, but always improving. Uncle Bob already wrote a great summary of what this looks like, so I will close by telling you to read it. And get started on your own improvement.

Add a Read-Only Role to Django Admin

Nov 13th, 2012

I was in a meeting where I was asked to give someone read-only access to the Admin part of our application. That seemed fine: the application was written in Django, and Django has really fantastic Admin functionality. I assumed it could handle this, no problem, so I said yes.

Of course, after a little googling, I found that it doesn’t support this at all: you can only give people Add, Change, or Delete permissions. You can make individual fields read-only but, ideally, I wanted a whole object to be read-only or not, preferably determined by Group membership.

My searches didn’t give me a lot of hope, but I did find something close in this post, so I expanded it to look for a Group.

This is what I came up with:

So you inherit from ReadOnlyAdmin instead of ModelAdmin for every Admin class you want to make read-only. Then you also have to add these two properties:

  • user_readonly – the list of fields to be read-only. If you leave a field out, the user will be able to change it!
  • user_readonly_inlines – if you have a related Model that you want to display Inline, you can’t add it to user_readonly because it’s not part of the Model. You have to create a read-only InlineAdmin class and list that here.

Creating a read-only InlineAdmin class is simple:

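  from django.contrib import admin

  from .models import MyModel  # wherever MyModel is defined in your app

  class MyModelInline(admin.StackedInline):
      model = MyModel

  class MyModelReadOnlyInline(MyModelInline):
      # Same inline, but these fields render as plain text instead of inputs
      readonly_fields = ["label"]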

Then you just list MyModelReadOnlyInline in the user_readonly_inlines and MyModelInline in inlines.

To use the ReadOnlyAdmin:

  • Create an Admin Group called readonly.
  • Add the User to readonly and give them full access to the Models you want them to read. Yes, give them Add, Change, etc., or they can’t view them at all.

When the user logs in, they will see the Model and can open individual objects, but none of the fields will be form fields, just plain text.
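For readers who want to rebuild something similar, here is a minimal sketch of how a class like ReadOnlyAdmin could be put together; it is not the code from this post. It is a mixin that returns user_readonly from get_readonly_fields and user_readonly_inlines from get_inlines when the logged-in user is in the readonly Group, and defers to normal ModelAdmin behavior otherwise. It assumes Django 3.0 or later (where ModelAdmin.get_inlines(request, obj) exists); the names READONLY_GROUP, in_readonly_group, and ReadOnlyAdminMixin are illustrative. Note that Django 2.1 and later also ship a built-in “view” permission, which covers much of this use case without custom code.

```python
READONLY_GROUP = "readonly"  # illustrative: the Group name used in this post


def in_readonly_group(user):
    """True if the user belongs to the read-only Group."""
    return user.groups.filter(name=READONLY_GROUP).exists()


class ReadOnlyAdminMixin:
    """Mix in *before* admin.ModelAdmin, e.g.

        class ReadOnlyAdmin(ReadOnlyAdminMixin, admin.ModelAdmin):
            pass

    Subclasses set user_readonly (field names) and user_readonly_inlines
    (read-only inline classes), as described above.
    """

    user_readonly = ()
    user_readonly_inlines = ()

    def get_readonly_fields(self, request, obj=None):
        if in_readonly_group(request.user):
            return list(self.user_readonly)
        return super().get_readonly_fields(request, obj)

    def get_inlines(self, request, obj):
        # Swap the editable inlines for their read-only twins.
        if in_readonly_group(request.user):
            return list(self.user_readonly_inlines)
        return super().get_inlines(request, obj)
```

Keeping the logic in a mixin means the group check lives in one place, and everyone outside the readonly Group gets the stock ModelAdmin behavior untouched.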

The Many Roads Of PDF Processing

Oct 11th, 2012

The Easy Path

So you have a PDF, or a bunch of PDFs, and want to extract the text out of them? A few years ago, this would have been a horrible task, but life has gotten easier since then.

If your PDF is just filled with text, this becomes really easy:

 pdftotext pdfname.pdf

You can find pdftotext for most operating systems.

How do you know that it’s just text? If you open it in Acrobat/Preview/XPDF/etc. and can highlight the text, then pdftotext should work fine.

But if you can’t do that, then the author probably made an image and embedded it in a PDF file. You then have to use OCR, which gives you output that isn’t always right. A Google-sponsored tool called tesseract does a good job with this OCR stuff. I remember that it used to stink, but it doesn’t anymore. Simply:

 tesseract pdfname.pdf textpat

That will try to do an OCR scan of pdfname.pdf and save the text of all its pages into a single file called textpat.txt.
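The “can you highlight the text?” check can be automated: run pdftotext first and only fall back to OCR when it returns next to nothing. A minimal sketch, assuming pdftotext is on the PATH; `extract_text`, `looks_empty`, and the 20-character threshold are hypothetical choices. Passing “-” as the output file makes pdftotext write to standard output.

```python
import subprocess

MIN_CHARS = 20  # hypothetical threshold for "this PDF has a real text layer"


def looks_empty(text, min_chars=MIN_CHARS):
    """True when the extracted text has too few visible characters to be real.

    A scanned PDF typically yields only whitespace and form feeds.
    """
    return len("".join(text.split())) < min_chars


def extract_text(pdf_path):
    """Return the PDF's text via pdftotext, or None if it needs OCR instead."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the text to stdout
        capture_output=True, text=True, check=True,
    )
    if looks_empty(result.stdout):
        return None  # probably an image-only PDF: hand it to tesseract
    return result.stdout
```

A `None` result is the signal to take the OCR route with tesseract.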

But, of course, the path isn’t always easy.

The Long and Winding Road

I have a bunch of typed documents (read: hard copies) coming in, all of which have to be typed in. Lucky me. We have a scanner on-site; I asked if it does OCR, and was told that it doesn’t. I’m getting luckier by the minute.

But I’ve parsed PDFs before. I should be able to handle it.

I scanned in a few and had the PDFs sent to me. I installed tesseract via Homebrew. The results were . . . disappointing:

 $ tesseract pdfname.pdf out
 Tesseract Open Source OCR Engine v3.01 with Leptonica
 Error in pixReadStream: Unknown format: no pix returned
 Error in pixRead: pix not read
 Unsupported image type.

So a quick google shows that either tesseract doesn’t have the right libraries installed, or the PDF wasn’t well-formed. Since tesseract told me it found Leptonica, I have to assume the proper libraries are there. So our scanner is making improper PDFs. This is great.

After some googling and head scratching, I discovered that tesseract works very well on Tiff files. I used Preview to export the PDF to a Tiff and — success!

 $ tesseract pdfname.tiff out
 Tesseract Open Source OCR Engine v3.01 with Leptonica
 Page 0
 Page 1
 Page 2
 $ ls  out*
 out.txt

Ok, I didn’t want to open all of these files in Preview. How could I convert them from the command line? Well, the first tool to think of is convert from ImageMagick. That has always been a tricky road for me and, sure enough, the resulting Tif file had horrid resolution, which made tesseract spit out garbage. I searched some more, even for OSX-specific solutions, and found sips, which comes with OSX but which most people haven’t heard of. The usage is a bit arcane, but it uses the OSX libraries (i.e. the same thing my Preview export used). And, yes, it worked great out of the box, except that it doesn’t handle multi-page PDFs. Ugh.

How does one break up a PDF into pages? More googling, and I found pdftk, which is a little Swiss Army knife of PDF processing. And, hey, it can break a PDF into pages with the burst option! Or, maybe not:

 $  pdftk pdfname.pdf burst
 Unhandled Java Exception:
 java.lang.NullPointerException
 at com.lowagie.text.pdf.PdfCopy.copyIndirect(pdftk)
 at com.lowagie.text.pdf.PdfCopy.copyObject(pdftk)
 at com.lowagie.text.pdf.PdfCopy.copyDictionary(pdftk)

That’s not good. A few searches turned up someone else with the same problem. The cause? A bad PDF, of course! The very thing that started me down this path! I could extract the PDF one page at a time, but that felt wrong to me.

Ok, time to refocus. I asked myself, “What am I trying to accomplish?” The answer was converting the broken PDFs to Tifs so I could run tesseract. So let’s focus back on the PDF->Tiff part. I did more searching and found a StackOverflow entry that talked about the problem I had with ImageMagick and tesseract, and someone had posted a nice recipe using Ghostscript:

 /usr/local/bin/gs -o out.tif -sDEVICE=tiffgray -r720x720 \
 -g6120x7920 -sCompression=lzw in.pdf
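The -g value in that recipe isn’t arbitrary. Ghostscript’s -g sets the output size in pixels, which is inches times resolution: at -r720x720, a US Letter page (8.5 × 11 in) is 8.5 × 720 = 6120 by 11 × 720 = 7920 pixels. A small helper (hypothetical, for illustration) for working out the flags for another paper size or resolution:

```python
def gs_size_flags(width_in, height_in, dpi):
    """Ghostscript -r and -g flags for a page of width_in x height_in inches.

    -g takes the output size in pixels, so each side is inches * dpi,
    rounded to the nearest whole pixel.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return [f"-r{dpi}x{dpi}", f"-g{width_px}x{height_px}"]


# US Letter at 720 dpi reproduces the recipe above.
print(" ".join(gs_size_flags(8.5, 11, 720)))  # -r720x720 -g6120x7920
# The same page at 300 dpi.
print(" ".join(gs_size_flags(8.5, 11, 300)))  # -r300x300 -g2550x3300
```

Changing -r without changing -g would crop or pad the page, so the two flags should always be adjusted together. A lower resolution gives a much smaller image, which may OCR faster, possibly at some cost in accuracy.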

And I got a Tiff file out that tesseract could process wonderfully! Woot! The bad part was that tesseract took a long time to process this tif, much longer than the one from Preview. Most of that time was spent on the first page of my PDF, which is essentially a cover page. How do I get rid of that cover page? Well, back to pdftk:

 pdftk pdfname.pdf cat 2-end output nocover.pdf

So that makes another PDF from the second page on (these PDFs have a variable number of pages, hence 2-end rather than a fixed range).

Running the PDF->Tiff conversion on nocover.pdf gave some errors. But when I ran tesseract on the resulting tif file, I had no problems.

Just for fun, I ran tesseract directly on the nocover.pdf that pdftk created: same error as the first time. I figured as much, but it was worth a shot.

So, in the end, I wrote a shell script that takes a PDF as a parameter and does this:

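#!/bin/sh
# Pass the scanned PDF as the only argument.
name=$(basename "$1" .pdf)

pdf=nocover/$name.pdf
tiff=tiffs/$name.tiff
text=extracted/$name

mkdir -p nocover tiffs extracted

# Drop the cover page: keep page 2 through the last page
pdftk "$1" cat 2-end output "$pdf"
# Render to a grayscale, LZW-compressed, 720 dpi Letter-size TIFF
/usr/local/bin/gs -o "$tiff" -sDEVICE=tiffgray -r720x720 -g6120x7920 -sCompression=lzw "$pdf"
# OCR the TIFF; tesseract writes the text to $text.txt
tesseract "$tiff" "$text"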

And that, my dear readers, is how to put a PDF through an OCR process.
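For comparison, here is the same three-step pipeline as a Python sketch using subprocess. It is not a drop-in replacement: it assumes pdftk, Ghostscript, and tesseract are installed, looks up gs on the PATH rather than in /usr/local/bin, and adds a hypothetical `dry_run` flag that returns the commands without running them. It also stops at the first command that exits with an error, which may be stricter than the shell version if Ghostscript exits non-zero on the errors mentioned above.

```python
import subprocess
from pathlib import Path


def ocr_commands(pdf):
    """Build the pdftk -> Ghostscript -> tesseract commands for one scanned PDF."""
    name = Path(pdf).stem                  # "scans/report.pdf" -> "report"
    nocover = f"nocover/{name}.pdf"
    tiff = f"tiffs/{name}.tiff"
    text = f"extracted/{name}"             # tesseract appends ".txt"
    return [
        # Keep page 2 through the last page, dropping the cover.
        ["pdftk", str(pdf), "cat", "2-end", "output", nocover],
        # Render to a grayscale, LZW-compressed, 720 dpi Letter-size TIFF.
        ["gs", "-o", tiff, "-sDEVICE=tiffgray", "-r720x720",
         "-g6120x7920", "-sCompression=lzw", nocover],
        # OCR the TIFF into extracted/<name>.txt.
        ["tesseract", tiff, text],
    ]


def ocr_pdf(pdf, dry_run=False):
    """Run (or, with dry_run=True, just return) the pipeline for one PDF."""
    commands = ocr_commands(pdf)
    if not dry_run:
        for folder in ("nocover", "tiffs", "extracted"):
            Path(folder).mkdir(exist_ok=True)
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands
```

Passing each command as a list of arguments avoids the shell-quoting pitfalls of the original script when file names contain spaces.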

The Road to Scala

Sep 12th, 2012

To be honest, Scala has been on my periphery for some time now. I had heard of it before, but the first real mention I actually remember was a talk Ted Neward gave at No Fluff one year. I couldn’t go to that talk, but I remember him mentioning Scala a few times in other talks he gave that weekend.

Fast-forward to 2010. When I went to Strange Loop, there was some buzz about Scala. Scala was already kinda mainstream for the Strange Loop crowd by then, so there weren’t many talks on it, but there was buzz. Of course, I ignored it.

So, with all that, this is what I knew about Scala:

  • It’s statically-typed. Since Python has been my first love, I really can’t get into static typing. I see the benefits, but writing code in those languages feels pedantic.
  • It runs on the JVM. I already have Jython as my JVM-alternative of choice.
  • It’s kinda functional and kinda OOP. OK, Python is also like that, but that idea weirded me out.

Then we fast-forward to just a couple of months ago. I read this excellent blog post and thought he was spot on about the perils facing modern-day software developers. I honestly know nothing else about Michael Church, but if he was right about that second part, how right was he about the first part: the list of languages?

I already know Python and C. And, OK, I don’t know ML or Clojure, but I know their general ideas. And then there was Scala again. It was this thought that got my attention:

I think Scala is the language that will salvage the 5 percent of object-oriented programming that is actually useful and interesting, while providing such powerful functional features that the remaining 95% can be sloughed away. The salvage project in which a generation of elite programmers selects what works from a variety of programming styles — functional, object-oriented, actor-driven, imperative — and discards what doesn’t work, is going to happen in Scala. So this is a great opportunity to see first-hand what works in language design and what doesn’t.

And I’m all for that — there are some good parts of OOP, but a lot of it has become painful. All the styles Church listed have some merits as well as downsides. If you can actually do all of them, then the cream of each style should rise to the top.

Another of his thoughts that grabbed me:

[Scala] has an incredible amount of depth in its type system, which attempts to unify the philosophies of ML and Java and (in my opinion) does a damn impressive job.

An incredible type system? In a static language? I have yet to see such a beast. OK, the only statically typed languages I have used are Pascal, C, and Java, and none of them has a good one.

So, not to drag this out any longer, I decided to dip more than my toe in the Scala waters and see what all the hype was about. After mucking with it off and on for about a week, I have to say that I’m impressed. Actually, more than impressed: I haven’t had this much joy discovering a language since I started banging on Python over 10 years ago.

I’m far from a journeyman in Scala, but I’m getting up to speed on it rather quickly. When I learn something, I need to be a doer, not a reader. I’ve been playing with the Scala Koans, which use SBT to continuously run the tests, which is very cool. When I get to the point of mucking around a little deeper, I use ScalaTest with SBT to get the same continuous feedback.

I recently did Osherove’s String Calculator kata through Step 6 in 30 minutes, without any Googling or much fumbling. That says something about how easy it can be to get started writing code that actually does something.
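The kata is language-agnostic, so here is a rough Python sketch of where Step 6 lands; it is not the Scala solution mentioned above. It assumes the usual published step list: empty input returns 0; any number of comma-separated numbers; newlines also act as delimiters; an optional “//;\n” header declares a custom single-character delimiter; negatives raise an error listing all of them; and numbers over 1000 are ignored.

```python
import re

# "//;\n1;2" declares ";" as an extra delimiter for the numbers that follow.
HEADER = re.compile(r"//(.)\n(.*)", re.DOTALL)


def add(numbers: str) -> int:
    """String Calculator kata, steps 1-6 (sketch)."""
    if not numbers:
        return 0
    delimiters = [",", "\n"]
    header = HEADER.fullmatch(numbers)
    if header:
        delimiters.append(header.group(1))
        numbers = header.group(2)
    pattern = "|".join(re.escape(d) for d in delimiters)
    values = [int(token) for token in re.split(pattern, numbers)]
    negatives = [n for n in values if n < 0]
    if negatives:
        raise ValueError("negatives not allowed: " + ", ".join(map(str, negatives)))
    return sum(n for n in values if n <= 1000)
```

For example, `add("//;\n1;2")` returns 3 and `add("2,1001")` returns 2. The negatives check is the same filter the Scala example in the list below talks about.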

Here are some things I have learned to love in Scala:

  • Pattern Matchers. This is probably my favorite. Now that I have grokked them, I may never want to write a parser in anything but Scala ever again. I should also say that I avoid switch-case statements in every other language, but that structure works really well for Scala’s pattern matching. When you use it with regular expression groups, magic happens.
  • Case Classes. They do a lot of the boilerplate of making objects for you, and you get a sane equals to boot. And, as the link says, they go nicely with pattern matchers.
  • The static type system does make sense, and does not annoy me. Take this line: val negatives = numbers.filter(_ < 0). What is numbers? Well, since we are filtering it, it must be a collection of some sort. Is it a List or an Array? Then what is negatives? Well, since we are using filter, it must be the same kind of collection that numbers is. But my favorite part is this: it doesn’t matter. I know how negatives should behave, because it should behave just like numbers does. This makes sense to me, so much so that a type declaration for negatives becomes superfluous (hello, Java …)

Now, there are things that have annoyed me in Scala. But I’m a beginner, so I think some of those will iron themselves out. I’ve been coming up with web app ideas that I can start writing in Lift, which probably says something about how I feel about learning it.
