Feb 22 2008
Crazy One-Liners
So I wrote a pretty interesting one-line command for a specific task today. Here it is. Can you guess what it does?
awk '1 {system("lwp-request -Sm HEAD " $0)}' \
input.txt | awk '/200 OK/ {print $2}' > output.txt
Yeah, me neither, if I were just looking at it. But let’s break it apart, piece by piece. You’ll notice that it’s essentially two commands, strung together through some piping and redirection (the “\” character just breaks the command across two lines). It’s broken up like so:
{command1} | {command2} > {file}
This says to execute command1 first. Then pipe its output into command2 as input. Finally, take the output of command2 and throw it all into a file. So we start with the first command:
awk '1 {system("lwp-request -Sm HEAD " $0)}' input.txt
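As a quick aside, the overall {command1} | {command2} > {file} shape can be tried with harmless stand-ins first. This is only a sketch: printf and sort play the two commands, and the scratch file from mktemp and the sample words are made up for illustration.

```shell
# command1 | command2 > file, using harmless stand-ins.
out=$(mktemp)                                # scratch file (illustrative)
printf 'pear\napple\nfig\n' | sort > "$out"  # printf is command1, sort is command2
cat "$out"                                   # show what landed in the file
```

Nothing appears on the terminal until the final cat, because the redirection sends the output of sort into the file instead.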
So what the heck does awk do? Well, it’s basically a utility to read in input text, do some filtering on it, and then execute a specific task (or tasks) based on the results. In this case, it has the form:
awk 'filter {command}' input
Skipping first to input, we see that the text we want to process comes from a simple text file, in this case input.txt. filter is what decides which lines of the input actually get used. Generally it’s in the form of a regular expression, and only the matching lines are processed. In our case, we just use 1, which awk treats as always true, so everything matches and we process all lines. Next, the command:
system("lwp-request -Sm HEAD " $0)
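Before unpacking that command, the filter half is easy to try on its own. A minimal sketch on throwaway input (the words alpha, beta, gamma are just placeholders): a filter of 1 lets every line through, while a regular expression only lets matching lines through.

```shell
# A filter of 1 is always true, so every line reaches the action.
all=$(printf 'alpha\nbeta\ngamma\n' | awk '1 {print "saw " $0}')
# A regular expression filter only lets matching lines through.
some=$(printf 'alpha\nbeta\ngamma\n' | awk '/et/ {print "saw " $0}')
printf '%s\n' "$all" "---" "$some"
```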
In awk, the system function runs its string argument as a command in a sub-shell. Here the argument is a quoted string with $0 stuck on the end, and $0 stands for the entire current line (the first token alone would be $1). Assuming each line of our input holds nothing but a URL, the two come out the same here. So the command we really want to look at is:
lwp-request -Sm HEAD {line}
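To see what $0 holds and how awk glues it onto the command string, here is a small sketch that swaps in echo for the real request, so nothing touches the network. The example.com URLs are placeholders.

```shell
# $0 is the whole input line; $1 is only the first whitespace-separated field.
fields=$(printf 'http://example.com/a extra\n' | awk '{print "[" $0 "] [" $1 "]"}')
echo "$fields"

# Writing strings side by side concatenates them, so system() receives
# one complete command string per input line. echo stands in for lwp-request.
cmds=$(printf 'http://example.com/a\nhttp://example.com/b\n' |
    awk '{system("echo would run: lwp-request -Sm HEAD " $0)}')
echo "$cmds"
```

Note that the line is pasted into the command unquoted, so a URL containing characters the shell treats specially (such as &) would be interpreted by the sub-shell.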
The lwp-request command, as seen here, is a command-line utility that sends HTTP requests to a server and shows the response. It has one required argument, which is the URL to query. Since we don’t see that explicitly here, it must be coming from the line we read from our input. We also specify two other parameters. -S tells the program to “print the response chain,” meaning that it will show any redirection or authorization handled automatically. Also, we use -m HEAD, which sets the request method to HEAD, so the server sends back only the response headers and no body. So far, pretty confusing, right? Well, let’s see what a sample query looks like:
$ lwp-request -Sm HEAD http://google.com
HEAD http://google.com --> 301 Moved Permanently
HEAD http://www.google.com/ --> 200 OK
Cache-Control: private
Connection: Close
Date: Sat, 23 Feb 2008 01:27:27 GMT
Server: gws
Content-Length: 0
Content-Type: text/html; charset=ISO-8859-1
Client-Date: Sat, 23 Feb 2008 01:27:36 GMT
Client-Peer: 64.233.167.104:80
Client-Response-Num: 1
Set-Cookie: PREF=ID=4b507d757f70e13b:TM=1203730047: (...)
Interesting, sort of. Anyway, let’s move on. So that little piece of code is getting executed for every line in our input file. Then, the output is getting “piped” into our next command:
awk '/200 OK/ {print $2}'
We’ve seen awk before! This time, though, we don’t specify an input file, because the input comes directly from the previous command. Our other parameters have changed as well. The filter is no longer 1, but rather /200 OK/. This is a true (albeit simple) regular expression, and matches any line that contains the string “200 OK”. Only lines with this string will be processed. Which brings us to our command, or action: print $2. print means to simply output what follows. In this case, $2, which represents the second parsed token. awk is going to consider everything that is piped in from command1, filter out lines it doesn’t care about, and execute the action on the filtered set. Looking at our sample output above, the only matching line is:
HEAD http://www.google.com/ --> 200 OK
This line will be used in the command, print $2. The command simply prints the second token (separated by a space) on the line, so it outputs:
http://www.google.com/
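This second stage can be checked without hitting any server. The sketch below feeds a few lines copied from the sample response into the same awk filter; the variable names sample and final are just for illustration.

```shell
# A few lines copied from the sample response.
sample='HEAD http://google.com --> 301 Moved Permanently
HEAD http://www.google.com/ --> 200 OK
Cache-Control: private
Server: gws'
# Keep lines containing "200 OK" and print their second field.
final=$(printf '%s\n' "$sample" | awk '/200 OK/ {print $2}')
echo "$final"
```

The 301 line and the header lines are dropped because they don’t contain “200 OK”.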
The final piece of our code redirects command2’s output into a file, output.txt. And that’s it! So putting the pieces together, let’s look at what is really happening here:
- We read in data from an input file for parsing. We can infer that each line contains a URL, which is needed later.
- Each URL is passed to the lwp-request command, which outputs header information from the server.
- We filter the response information down to only the bits we care about: in this case, the URL that finally answered with 200 OK.
- Finally, we output each of these “new” URLs to an output file.
So, that’s the whole one-liner. A little more compactly, it’s a piece of code that takes a list of input URLs and outputs the URL each one ends up at after any redirects, as long as that final response is 200 OK. It’s a pretty specific snippet, and has absolutely no error-checking, so it is definitely prone to bugs. But it worked for me the one time I needed it, and it was enough to show off.
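For comparison, here is one possible, slightly more defensive shape for the same job. It is a sketch under assumptions, not the command that was actually run. The function name check_urls and the FETCH variable are hypothetical: FETCH exists only so the loop can be tried with a stand-in command instead of lwp-request. The differences are that blank lines are skipped, the URL is passed as a quoted argument instead of being pasted into a command string, and the filter looks for “--> 200 OK” so that a header that happens to contain “200 OK” is not picked up.

```shell
# check_urls: read one URL per line on stdin and print each URL that
# finally answered 200 OK. FETCH is a hypothetical hook; when it is
# unset, the loop runs lwp-request -Sm HEAD as in the one-liner.
check_urls() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue            # skip blank lines
        # Quoting "$url" keeps characters such as & away from the shell.
        # </dev/null stops the command from eating the rest of the list.
        ${FETCH:-lwp-request -Sm HEAD} "$url" </dev/null |
            awk '/--> 200 OK/ {print $2}'
    done
}
```

It would be used the same way as the original: check_urls < input.txt > output.txt.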
On a side note, this little piece of code made the difference between hours of mindless data entry and automated awesomeness.






