A Tasty Pixel » Shell

Resuming ADC downloads (‘cos Safari sucks)

So, Safari’s resume facility is just awful — it’ll randomly restart downloads from the beginning, clobbering anything that’s already been downloaded, and the resume button will frequently disappear entirely and mysteriously from the downloads window. And if the session has expired, it’ll cause all kinds of havoc.

Anyone downloading the gazillion-GB iOS/Mac SDK + Xcode on a slow and/or expensive connection will know the sheer fisticuffs-inspiring irritation this creates. Speaking personally, living on a mobile broadband connection that’s usually charged at £3 per gig and often runs about as fast as I could send the data via carrier pigeon, this usually makes me want to storm Cupertino with a pitchfork.

Okay, so I could probably use Firefox or something else, but instead I figured I’d whip up^* a shell script that lets me use my favoured long-haul download tool – curl. And in case there were any other sufferers of insanely-priced broadband and Safari’s antisocial behaviour, I thought I’d share it.

It’ll ask for your Apple ID and password, store them in the keychain for you, and resume any partially-downloaded file it finds in the current working directory.

Chuck it somewhere like /usr/local/bin, make sure it’s executable (chmod +x /usr/local/bin/adc_download.sh) and call it from Terminal like:

`adc_download.sh https://developer.apple.com/devcenter/download.action?path=/Developer_Tools/xcode_4_gm_seed/xcode_4_gm_seed_.dmg`

If you’ve already started the download in Safari, just grab the partially-downloaded file from within the .download package Safari creates.
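For the curious, the heart of a resumable download is curl’s `-C -` option, which tells curl to check the size of the existing output file and ask the server for only the remaining bytes. Below is a minimal sketch of that step, not the actual script: it assumes you already have a valid ADC session cookie saved to a file (the `cookies.txt` name is hypothetical), and it skips the Apple ID and keychain handling the real script does.

```shell
# Minimal sketch of a resumable download step (NOT the real adc_download.sh).
# Assumption: a valid session cookie has already been saved to a
# (hypothetical) cookie file; authentication is not handled here.

# Derive the local filename from the ADC URL's "path=" parameter, e.g.
#   ...download.action?path=/Developer_Tools/x/xcode.dmg  ->  xcode.dmg
adc_filename() {
    path=${1#*path=}      # drop everything up to and including "path="
    path=${path%%&*}      # drop any further "&..." parameters
    basename "$path"      # keep only the last path component
}

# Resume (or start) a download into the current working directory.
#   -L   follow redirects to the actual file server
#   -C - work out the resume offset from the size of the existing file
#   -b   send cookies from the given cookie file
resume_download() {
    url=$1
    cookie_file=${2:-cookies.txt}   # hypothetical default cookie file name
    curl -L -C - -b "$cookie_file" -o "$(adc_filename "$url")" "$url"
}
```

If the connection drops, running `resume_download` again with the same URL should pick up from the end of the partial file rather than starting over.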

Here ’tis:

ADC Download Script (on Github)

P.S. I’d be interested to see how incremental updates fare when transferred from an intermediate server with rsync. It’s rather bizarre that Apple reissue the whole 3.x GB SDK with each update, rather than offering a ‘patch’ (I guess Apple lives blithely in the world of cheap bandwidth!), and it makes me wonder whether there’d be sufficient correlation between versions to save some bandwidth by avoiding transferring the similarities…

^* read: spend hours on, as is my way.

OS X service to filter selected text through a shell command

The UNIX shell provides a host of extremely useful utilities for modifying text. This OS X Automator service makes all of them available for filtering text in all OS X applications.

This can be handy for performing quick operations, like replacing text with regular expressions, sorting lists or swapping fields around.

When triggered, the service requests a command to use for filtering, then runs the command and replaces the selected text with the result.

Some sample operations:

Sort lines alphabetically/numerically: sort or sort -n
Change to lowercase: tr "[:upper:]" "[:lower:]"
Replace a spelling mistake, taking care of case: sed -E 's/([tT])eh/\1he/g'
Re-order elements in a tab- or comma-separated list: awk -F'\t' '{print $2 "\t" $1}' or awk -F, '{print $2 "," $1}'
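The service itself isn’t reproduced here, but the core idea is simply to pipe the selected text through whatever command you type. A minimal sketch of that step, assuming the selection arrives on standard input (as it would for an Automator ‘Run Shell Script’ action set to pass its input to stdin) and the command arrives as an argument; `filter_text` is a hypothetical name:

```shell
# Run the text on stdin through a user-supplied shell command and print
# the result. "$1" is a complete command line such as 'sort -n'; sh -c
# interprets it, so quoting and pipes inside it work as typed.
filter_text() {
    sh -c "$1"
}

# Examples:
#   printf 'pear\napple\n' | filter_text 'sort'                        # apple, pear
#   printf 'Hello\n' | filter_text 'tr "[:upper:]" "[:lower:]"'        # hello
```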

Put it in ~/Library/Services, and it should appear in the ‘Services’ menu.

Filter through Shell Command.zip

Unix tools tutorial

Last week we took a quick look at shell scripting. This week we’ll finish off with a look at some useful tools – awk, sed, sort and uniq.

These are used mainly for manipulating text, and gathering information. Awk and sed each provide a fairly rich language, but can be used for very simple things too. Sort and uniq are simple utilities that carry out one operation – sorting, and extracting unique entries (among other things), respectively.

We’ll start simple, with sort and uniq.

Sort & Uniq

Sort is useful for putting items into order – it takes newline-separated text data on input, and outputs the data, sorted, in the order specified as parameters. When used with the ‘-r’ parameter, sort will perform a reverse order sort.

This tool is particularly useful when used with uniq, which only spots duplicate lines that sit next to each other, so its input normally needs to be sorted first.

Uniq will remove duplicate entries from the input, display only duplicate entries (when the ‘-d’ parameter is used), or display a count of duplicates in the input (when used with ‘-c’). The latter function is quite useful for collecting data – I used it just the other day to examine my log file to block the IP addresses of some spammers. See the bottom of this post for more details.
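A quick way to see why uniq wants sorted input: it only compares each line with the one just before it. This sketch (the function names are just for illustration) wraps plain uniq, uniq -d, and the sort | uniq -c | sort -nr counting idiom:

```shell
# uniq only collapses *adjacent* duplicates, so sort first.
distinct() { sort | uniq; }

# Lines that appear more than once (one copy of each).
duplicates() { sort | uniq -d; }

# Count occurrences, most frequent first. uniq -c pads the count with
# leading spaces (the width differs between GNU and BSD), so strip them.
tally() { sort | uniq -c | sort -nr | sed 's/^ *//'; }
```

For example, `printf 'a\nb\na\n' | uniq` still prints all three lines, because the two ‘a’s aren’t adjacent, while `distinct` prints just ‘a’ and ‘b’.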

Sed

Every Unix user’s friend, sed is great for taking some input, applying a regex-based search-and-replace, and printing the result (among other things). It’s useful for removing unwanted characters from input, pruning input for further processing, and changing the order of fields (although awk can do this too).

For example, converting all files in the current directory from png to jpg:

$ find . -maxdepth 1 -type f -iname '*.png' -print0 | xargs -0 -I{} sh -c 'convert "$1" "$(echo "$1" | sed "s/\.[pP][nN][gG]$/.jpg/")"' sh {}

This grabs all ‘png’ files in the current directory (the quotes around ‘*.png’ stop the shell expanding the pattern before find sees it), and runs ImageMagick’s ‘convert’ with the original filename as the first argument, and the same filename with its ‘.png’ extension replaced by ‘.jpg’ as the second argument.

So, a common usage of sed is:

$ source | sed s/search/replace/ | destination

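This replaces the ‘search’ pattern with ‘replace’ in the input from ‘source’. Only the first match on each line is replaced; add the ‘g’ flag (s/search/replace/g) to replace every match.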

‘Search’ is a regular expression that specifies the text to be transformed. It can include the escaped parentheses ‘\(’ and ‘\)’ to designate captured text (with sed -E, plain ‘(’ and ‘)’ are used instead), which can then be referenced in the replacement through backreferences, denoted by ‘\n’, where n is the index of the captured expression. This is a trivial example, which could probably be done in many ways, but:

$ cat index.html | sed 's/<[bB]>\([^<]*\)<\/[bB]>/<H2>\1<\/H2>/'

This looks for a run of characters containing no ‘<’ between a ‘<b>’ tag (b can be lower or uppercase) and a ‘</b>’ tag, and captures it. The whole match is replaced with <H2>, then the captured text, then </H2>. The slashes in the closing tags are escaped as ‘\/’, since an unescaped ‘/’ would end the pattern.
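To try this without an HTML file handy, here is the same substitution as a small function (bold_to_h2 is just an illustrative name), alongside an extended-regex version that uses ‘|’ as the delimiter so that neither the parentheses nor the slash need escaping:

```shell
# Replace <b>text</b> (either case) with <H2>text</H2>, first match per line.
# Basic regex: groups are written \( \) and referenced as \1.
bold_to_h2() {
    sed 's/<[bB]>\([^<]*\)<\/[bB]>/<H2>\1<\/H2>/'
}

# The same with extended regex (-E) and '|' as the delimiter, so the
# parentheses and the '/' in </b> can be written as-is.
bold_to_h2_ere() {
    sed -E 's|<[bB]>([^<]*)</[bB]>|<H2>\1</H2>|'
}
```

Usage: `echo 'Say <b>hello</b> now' | bold_to_h2` prints `Say <H2>hello</H2> now`.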

Tricky, but after some practice, sed is very useful.

Awk

Awk is a very full-featured processing tool, which includes a fairly complex programming language. It allows for variables, arithmetic operations, fairly complicated search patterns, text manipulation functions, and control-flow statements – if, while, for, and so on.

There are a huge number of awk tutorials out there if you want more information than this very quick intro.

So, an awk expression typically looks like:

PATTERN { Action; Action; Action } PATTERN2 { Action; action }

…That is, one or more pattern-action pairs, where an empty pattern will match everything. Patterns can be ‘BEGIN’, which causes the associated actions to be executed before anything else (for variable initialisation, for example), ‘END’, which executes after everything else (for reporting, perhaps), or something like a regular expression, or other comparison expression.

Variables are treated much like C variables (‘var = "some value"; print var;’), though they don’t need to be declared first. There are some special variables defined in awk: NF, which gives the number of fields in the current record (that is, the number of items separated by the field separator on the current line of input); NR, which gives the current record number (line number); and FS, which gives the field separator, which can be set to any character, or in fact, any regular expression.

Note that the field separator can be set with the ‘-F’ parameter when calling awk.

Awk also provides some functions similar to C’s: printf in particular, which formats a series of values according to a given format string.
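To see those pieces working together (BEGIN, a pattern-less action, END, -F, NR, NF and printf), here is a small sketch on comma-separated input; the function names and data are purely illustrative:

```shell
# Sum the second comma-separated field of every line.
#   BEGIN  runs before any input is read (initialise the total)
#   { }    with no pattern runs for every record (line)
#   END    runs after the last record; NR then holds the record count
sum_second_field() {
    awk -F, '
        BEGIN { total = 0 }
        { total += $2 }
        END { printf "%d rows, total %.2f\n", NR, total }
    '
}

# Only print lines that have exactly three fields (pattern: NF == 3),
# prefixed with their line number.
three_field_lines() {
    awk -F, 'NF == 3 { print NR ": " $0 }'
}
```

Usage: `printf 'a,1.5\nb,2\n' | sum_second_field` prints `2 rows, total 3.50`.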

One simple usage for awk is to extract (possibly format) fields from a file. The command:

$ awk '{print $1}' /var/log/httpd/access_log

…extracts the IP address (the first field) from each line of an Apache access log. This can then be processed to extract statistics.

A more advanced example counts the number of times a certain IP address was logged:

$ awk 'BEGIN { ip = "127.0.0.1" } $1 == ip {x++} END { print ip " was logged " (x + 0) " times" }' /var/log/httpd/access_log

127.0.0.1 was logged 7 times

This starts by setting an ‘ip’ variable, the IP to search for. Then, for every record (line) whose first field matches that ip variable, the variable ‘x’ is incremented. At the end, it prints a message containing the IP address and the number of times it was found in the log (adding 0 to x makes it print 0, rather than nothing, if the address never appears).
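To make the counter reusable, the IP can be passed in with awk’s -v option instead of being hard-coded in a BEGIN block. A sketch; count_ip is a hypothetical name:

```shell
# Count log lines whose first field equals the given IP address.
# Usage: count_ip 127.0.0.1 /var/log/httpd/access_log
# -v sets an awk variable before the program runs; (x + 0) prints 0
# instead of an empty string when there are no matches.
count_ip() {
    awk -v ip="$1" '$1 == ip { x++ } END { print ip " was logged " (x + 0) " times" }' "$2"
}
```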

Putting it together

Here’s an example of using sort, uniq, sed and awk (this may not be the optimal solution, but it works). I was being comment-spammed quite severely (as usual), and wanted to try blocking the offenders. I had a look in my log to grab out the offending IP addresses:

$ grep comment.php public_html/mike/db/hits | tail -n 1000 | awk '{print $3}' | sed s/:// | sort -n | uniq -c | sort -nr | less

This performs the following:

Searches public_html/mike/db/hits for the term ‘comment.php’ (which the spammers were attacking)
Grabs the last 1000 lines
Extracts the third field, which was the IP address
Removes the colon character, which was in the field
Sorts the IP addresses, so that identical addresses end up next to each other (as uniq requires)
Counts the number of repetitions of addresses
Sorts in reverse numeric order, so that the largest number of repetitions (repeat offenses) are displayed first
Displays the results in the ‘less’ viewer

The output looked something like:

37 200.88.223.98
20 202.83.174.66
13 213.203.234.141
12 62.221.222.5
11 222.236.34.90
10 192.104.37.15
10 165.229.205.91
 9 213.148.158.191
 9 193.227.17.30
 8 61.238.244.86
...

I then blocked the first few IP addresses (which, as a side note, didn’t help reduce spamming – there appears to be an almost infinite pool of addresses which are used).
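The same pipeline can be wrapped in a function for reuse. This sketch assumes, like my log, that the IP address is the third whitespace-separated field with a colon stuck to it; the function name is made up, so adjust the field number for your own logs:

```shell
# Rank the IP addresses on lines matching a pattern, most frequent first.
# Usage: top_offenders comment.php public_html/mike/db/hits
# Assumes the IP is field 3, possibly with a ':' attached.
top_offenders() {
    grep -- "$1" "$2" |
        tail -n 1000 |        # only the most recent 1000 matches
        awk '{ print $3 }' |  # pull out the IP field
        sed 's/://' |         # drop the colon
        sort |                # any sort works: uniq just needs duplicates adjacent
        uniq -c |             # count each group of identical lines
        sort -nr |            # biggest counts first
        sed 's/^ *//'         # trim the padding uniq -c adds
}
```

Pipe the result into `less` as before when the list is long.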

So, that’s the rundown on a few useful tools. They can save a lot of time, especially when combined with a bit of shell script framework. Try them out sometime!

A brief shell scripting tutorial

Shell scripts are very useful things, whether they’re prepared and saved to a file for regular execution (for example, with a scheduler like cron), or just entered straight into the command line. They can perform tasks in seconds that may take days of repetitive work, like, say, resizing or touching up images, replacing text in a large number of HTML documents, converting items from one format to another, or gathering statistics.

This entry is a brief introduction to scripting using the Bash shell, which I find to be the most intuitive, and probably the most common (that said, most of the information here will apply to other shells). We will explore some of the basic building blocks of scripting, such as while and for loops, if statements, and a few common techniques for accomplishing tasks. Along the way, we’ll also take a look at some of the tools that make shell scripting a bit more useful, such as test, and bc. Finally, we’ll put it together with a couple of examples.

Stay tuned over the next few days for a brief tutorial on some very useful unix tools, like awk, sed, sort and uniq, which may become indispensable to you (they have for me).

But first, let’s begin with some basics.

Scripting 101

Bash (and its siblings) is a lot more than just a launcher; it is more like a programming language interface, which allows users to enter very complicated commands to achieve a wide variety of tasks. The language used is much like any other programming language – it contains for and while loops, if statements, functions and variables.

Commands can be either entered straight into the command line, or saved to files for execution.

Script files usually begin with a line that’s commonly known as the shebang (or hash-bang, referring to the first two characters):

#!/bin/sh

This is a directive that’s read by the operating system when a script file is executed – it says which interpreter should run the commands within. In this case, /bin/sh will be used. This is the most common shebang; it can be replaced with #!/usr/bin/perl for Perl scripts, or #!/usr/bin/php for PHP scripts, too. If a script relies on Bash-specific features, use #!/bin/bash, as /bin/sh isn’t Bash on every system.

After the shebang comes the script itself – a series of commands, which will be executed by the interpreter. Comments can be added to make the script more readable; these are prefixed by a hash symbol:

# This is a comment

When creating a new script file, I find it easiest to set it as executable, so it can be run directly (as ./script.sh, or just script.sh if it’s in a directory on your PATH). Otherwise, the script has to be passed as a parameter to the interpreter (such as ‘sh script.sh’). This is annoying, so make the file executable with:

chmod +x script.sh

Now the boring stuff’s covered, let’s move on to the basic code structures!

Holding and manipulating values

Variables are used to hold values for use later, and are accessed by a dollar symbol, followed by the variable name. When defining the values of variables, the dollar sign is not used at all. For example:

count=2;

echo $count;

Note particularly the absence of spaces around the equals sign in the first line above. This is required – with spaces (like ‘count = 2’), the shell tries to run a command called ‘count’ with the arguments ‘=’ and ‘2’, which fails with a ‘command not found’ error.

Numeric variables can have arithmetic operations performed on them using the $((…)) syntax. This allows for simple integer addition, subtraction, division and multiplication. Operations can be combined, and brackets can be used to form complex expressions. For example:

count=$(($count+1));

product=$((count*8));

complex=$(( (product+2) * ($count-4) ));

Note the first line of the previous example – the simple increment. This is quite useful for performing loops with a counter (we’ll have a look at loops soon).
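Here are the same expressions worked through with concrete numbers, starting from count=2 (the half and remainder variables are extra, just to show integer division and modulo):

```shell
# Integer arithmetic with $(( )): no fractions, and division truncates.
count=2
count=$((count + 1))                        # increment: 3
product=$((count * 8))                      # 3 * 8 = 24
complex=$(( (product + 2) * (count - 4) ))  # 26 * -1 = -26
half=$((7 / 2))                             # 3, not 3.5
remainder=$((7 % 2))                        # 1 (modulo works too)
echo "$count $product $complex $half $remainder"
```

This prints `3 24 -26 3 1`. Spaces inside `$(( ))` are fine, unlike around the `=` of an assignment.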

For performing more complicated arithmetic, the ‘bc’ tool is quite handy. bc is an arbitrary-precision calculator language interpreter; it handles decimal fractions, and its math library provides functions such as sine, cosine, arctangent, natural logarithm and exponential.

To use bc, simply ‘pipe’ commands into it, and grab the result on bc’s stdout:

$ echo '8/3' | bc -l

2.66666666666666666666

$ echo 'a(1)*4' | bc -l

3.14159265358979323844

$ pi=`echo 'a(1)*4' | bc -l`

$ echo $pi

3.14159265358979323844

Note the ‘-l’ parameter to bc – this loads the standard math library, which contains some useful functions (like arctan, or ‘a’, used above). The parameter also sets bc’s scale (the number of digits kept after the decimal point) to 20; without it, the scale is 0, so division gives integer results.

Command-line parameters

Often you will want your shell scripts to take parameters that modify their behaviour. They can specify a file on which to operate, or a number of times to iterate over a loop, for example. This essentially just passes values into the script, which can then be used.

Command line arguments appear as numbered variables. $0 denotes the command that was run (your script’s name, typically). After that, the arguments to the command are given, as $1, $2, $3, onwards.

For example, the script:

#!/bin/sh

echo $0 utility.

echo Arguments are:

echo $1 – first argument

echo $2 – second argument

echo $3 – third argument

Can be executed with:

$ ./test.sh a b c

./test.sh utility.

Arguments are:

a – first argument

b – second argument

c – third argument

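Arguments can also be referred to en masse with the $* special variable, which expands to all the arguments at once. The related "$@" (in double quotes) expands to each argument as a separate word, which matters when arguments contain spaces.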

See ‘Iterating over command-line arguments’ for notes on how to use this.

Making decisions

Making decisions in a script is a very useful thing to be able to do – it can allow you to take actions depending on whether a command succeeded or failed, or it can allow you to perform an action only if it’s applicable (like only backing up to an external drive if it’s plugged in!).

If statements are formatted thus:

if test; then

   do_something;

elif test2; then

   do_something_else;

else

   do_something_completely_different;

fi;

The ‘elif’ part is optional, and can be omitted, as can the ‘else’ part. You can also have as many elifs as you like, just like else-if chains in other languages.

Note that this can also go on one line. For example: if test; then do_something; else do_something_else; fi

The statement above performs test; if test succeeded, then do_something will be executed. Otherwise, test2 is performed. If that succeeds, do_something_else is executed. Otherwise, do_something_completely_different is executed.

The test in an if statement is a command that is executed; the value returned from the command is used to make the decision.

All command-line applications return a numerical value when they exit, known as the exit status (an integer from 0 to 255), which indicates how things went. A value of zero means success; a non-zero value usually indicates failure.

You can observe the value returned by a command by using the $? variable immediately after the command exits:

$ ping nosuchhost

ping: cannot resolve nosuchhost: Unknown host

$ echo $?

68

$ ping -c 1 google.com

PING google.com (72.14.207.99): 56 data bytes

64 bytes from 72.14.207.99: icmp_seq=0 ttl=233 time=258.205 ms

--- google.com ping statistics ---

1 packets transmitted, 1 packets received, 0% packet loss

round-trip min/avg/max/stddev = 258.205/258.205/258.205/0.000 ms

$ echo $?

0

The if construct tests whether the returned value is zero or nonzero. If it’s zero, the test passes. So, we could write:

if ping -c 1 google.com; then

echo 'Google is alive. All is well.';

else

echo 'Google down! The world is probably about to end.';

fi;
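Since if only looks at the exit status, your own shell functions can serve as tests too: a function’s status is that of the last command it runs. A small sketch (is_even is a hypothetical name):

```shell
# Succeeds (status 0) for even numbers and fails (status 1) for odd ones:
# the function's status is the status of the [ ] test it ends with.
is_even() {
    [ $(( $1 % 2 )) -eq 0 ]
}

if is_even 4; then
    echo '4 is even'
fi

if ! is_even 3; then     # ! inverts the status
    echo '3 is odd'
fi
```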

Tests can also be performed in-line, allowing commands to be strung together. The && joiner, placed between commands, tells the shell to execute the right-hand command only if the left-hand command succeeds:

ifconfig ppp0 && echo 'PPP Connection Alive.'

The || joiner performs similarly, but will only execute the right-hand command if the left-hand command fails:

ifconfig ppp0 || redial_connection

Commands can be grouped in these structures and strung together – for example, a series of commands that must be executed in sequence, each only if all preceding commands succeed. Commands can also be grouped in parentheses (which run them in a subshell) to form fairly complex statements:

$ true && (echo 1 && echo 2 && (true || echo 3) && echo 4) || echo 5

1

2

4

$ false && (echo 1 && echo 2 && (true || echo 3) && echo 4) || echo 5

5

$ true && (echo 1 && echo 2 && (false || echo 3) && echo 4) || echo 5

1

2

3

4

Testing, testing

Now we’ve seen how to act upon the results of a test, it’s a good time to introduce the test utility itself.

test is a utility (also built into most shells) used to perform a wide variety of tests on strings, numbers and files. Expressions can be combined to construct fairly complex tests. By way of example, let’s look at a few uses of test:

$ test 'Hello' = 'Hello' && echo 'Yes' || echo 'No'

Yes

$ test 'Hello' = 'Goodbye' && echo 'Yes' || echo 'No'

No

$ test 2 -eq 2 && echo 'Yes' || echo 'No'

Yes

$ test 2 -lt 20 && echo 'Yes' || echo 'No'

Yes

$ test 2 -gt 20 && echo 'Yes' || echo 'No'

No

$ test -e /etc/passwd && echo 'Yes' || echo 'No'

Yes

See the test manual page for more information.

To perform arithmetic tests on floating point values, the bc tool steps in again (as ‘test’ will only operate on integers):

$ test `echo '3.4 > 3.1' | bc` -eq 1 && echo Yes || echo No

Yes

$ test `echo '3.4 > 3.6' | bc` -eq 1 && echo Yes || echo No

No

Note particularly the single quotes around the ‘>’ expression: without them, the shell would treat > as a redirection, and echo’s output (‘3.4’) would be written to a file called ‘3.1’ or ‘3.6’.

If a bc comparison is true, bc prints ‘1’. Otherwise, bc prints ‘0’.

For code readability, test is also available under the name ‘[’; when called that way, it expects a closing ‘]’ as its last argument. Thus, test can be used in commands like:

if [ $count -gt 4 ]; then

take_action;

fi;

Gone loopy

Iterating over commands can be great for performing tasks on a large number of items. The two loop types you’ll use most are while and for loops.

While loops

While loops have the following structure:

while test; do

command_1;

command_2;

done;

Here, test is the same as that from if statements (see above). Note the placement of semicolons – after the test, and before the do, in particular.

Note that, like all script elements, while loops can be used on one line, for quick entry on the command line: while test; do command_1; command_2; done;

While loops, like their counterparts in other programming languages, keep executing for as long as test succeeds, and stop as soon as it fails (returns a non-zero value).
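A tiny runnable example (countdown is an illustrative name): the [ ] test is re-run before every pass, and the loop ends the first time it fails.

```shell
# Print a countdown from $1 to 1, one number per line.
countdown() {
    n=$1
    while [ "$n" -gt 0 ]; do
        echo "$n"
        n=$((n - 1))   # without this, the test would never fail
    done
}
```

`countdown 3` prints 3, 2 and 1; `countdown 0` prints nothing, since the test fails before the first pass.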

For loops

For loops are defined thus:

for var in set; do

command_1;

command_2;

done;

For loops are used to iterate over a set of values, defined here in set. The variable var is used to iterate over the set: For each iteration, var is set to the next value within set.

Set is a whitespace-separated list of items, which can be typed out literally or produced by a wildcard pattern like Images/*.jpg. For example:

for dir in Documents Pictures Library Music; do

cp -a "$dir" /backup;

done;

for image in Images/*.jpg; do

convert "$image" -scale 640x480 "$image-scaled.jpg";

done;
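Here are both forms as functions you can paste into a terminal without touching real files: the first just echoes instead of copying, and the second takes a directory to look in (the function names are made up).

```shell
# Loop over a fixed list of words.
backup_plan() {
    for dir in Documents Pictures Library Music; do
        echo "would back up $dir"
    done
}

# Loop over a wildcard: the shell expands "$1"/*.jpg to the matching
# files in sorted order. If nothing matches, the pattern is left
# unexpanded, so check that the item really exists.
list_jpgs() {
    for image in "$1"/*.jpg; do
        if [ -e "$image" ]; then
            basename "$image"
        fi
    done
}
```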

Escaping

To break out of a while or for loop, the ‘break’ command is used. To continue onto the next iteration, thereby skipping the rest of the statements in the loop body, the ‘continue’ command is used. Because ‘continue’ skips everything after it, any counter increment has to come before it, or the loop will never get past that value. For example:

count=0;

while [ $count -lt 100 ]; do   # Iterate 100 times

   count=$((count+1));         # Increment 'count' first, so 'continue' can't skip it

   if [ $count -eq 2 ]; then   # Skip the 2nd iteration

      continue;

   fi;

   do_stuff || break;          # Stop iterating if do_stuff fails

done;
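A self-contained version to experiment with (skip_and_stop is an illustrative name). continue jumps straight back to the loop’s test, skipping anything after it in the body, which is why the increment comes first; break leaves the loop entirely.

```shell
# Walk through 1..10, skipping 2 and stopping when 5 is reached.
skip_and_stop() {
    i=0
    while [ "$i" -lt 10 ]; do
        i=$((i + 1))            # increment first, so 'continue' can't skip it
        if [ "$i" -eq 2 ]; then
            continue            # back to the loop test; the echo is skipped
        fi
        if [ "$i" -eq 5 ]; then
            break               # leave the loop entirely
        fi
        echo "$i"
    done
}
```

`skip_and_stop` prints 1, 3 and 4.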

Iterating over files

Let’s direct our attention to that second-last example:

for image in Images/*.jpg; do

convert "$image" -scale 640x480 "$image-scaled.jpg";

done;

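Because the shell expands Images/*.jpg itself, each matching filename becomes a single item even if it contains spaces, so this loop is safe as long as "$image" stays quoted inside the body. Trouble starts when the list comes from a command’s output instead (for example, for image in `ls Images`), which the shell splits on whitespace, breaking ‘My Trip.jpg’ into ‘My’ and ‘Trip.jpg’.

For those cases, and for working through subdirectories, I tend to use the ‘find’ tool with ‘xargs’ to perform commands.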

The ‘find’ tool will return a list of files that match the provided criteria. The ‘xargs’ utility reads items from its input and runs a command with those items as arguments. We can put the two together with:

find . -maxdepth 1 -type f -print0 | xargs -0 -I{} sh -c 'echo "Working on file $1."' sh {}

This example finds all files (-type f) in the current directory (‘.’, which BSD/macOS find requires you to spell out; -maxdepth 1 stops it descending into subdirectories), and then xargs prints ‘Working on file <filename>.’ for each one.

The -print0 argument to find makes it separate filenames with a ‘null’ character instead of the default newline. Since a filename can contain spaces or even a newline, but never a null character, this makes for safer filename handling. It has to be paired with the -0 argument to xargs, which makes xargs use the null character as the delimiter on its input.

The -I{} parameter tells xargs to run the command once per item, replacing the ‘{}’ sequence with the filename (-I is the portable spelling; GNU xargs’s older -i is deprecated). Everything after it is the command to execute. Here, that is sh -c 'echo "Working on file $1."' sh {}: the shell runs the quoted echo command, and the filename is handed to it as $1 (the ‘sh’ before {} fills $0). Passing the name this way, rather than pasting {} into the quoted command, stops filenames containing quotes or other special characters from being interpreted as shell code.

Note that the echo command could be used without ‘sh’, like: xargs -0 -I{} echo Working on file {}.

This is fine if only one command is used. However, if more than one command is to be executed, or more complex commands are to be used, these commands need to be interpreted with ‘sh’. As xargs is just a simple execution tool, it doesn’t understand shell scripts.

Thus, complex statements can be put together. For example:

find . -maxdepth 1 -type f -print0 | xargs -0 -I{} sh -c 'echo "Working on file $1."; copy_file_to_server "$1" || echo "Upload of $1 failed."' sh {}
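Here is a self-contained version that takes the directory as an argument, so it can be tried on a throwaway directory containing a deliberately awkward name like ‘My Trip.jpg’. The function name is made up, and note that -maxdepth, while supported by both GNU and BSD find, isn’t part of POSIX.

```shell
# Print one line per regular file directly inside directory $1,
# with spaces in names handled intact.
announce_files() {
    find "$1" -maxdepth 1 -type f -print0 |
        xargs -0 -I{} sh -c 'echo "Working on file $(basename "$1")."' sh {}
}
```

find doesn’t promise any particular order, so pipe the output through sort if order matters.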

Iterating over command-line arguments

Often, you will want to make shell scripts take a series of arguments that are then iterated over. For example, a script may take a list of images to manipulate, or text files to edit.

Such a utility would be invoked with:

$ my_script.sh *.jpg

If any of the arguments had spaces in them (in this case, for example, a jpg called ‘My Trip.jpg’), this can be a little tricky to handle.

Although the arguments would be passed correctly (that is, one of the arguments would indeed contain the text ‘My Trip.jpg’), a naive loop over them gets it wrong. If a for loop over $* were used:

for img in $*; do

manipulate_image $img;

done;

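…Spaces within filenames would cause problems. In our example, instead of ‘My Trip.jpg’ being passed to manipulate_image, it would be split – first ‘My’ would be passed to manipulate_image, followed by ‘Trip.jpg’! Nasty. Writing the loop as for img in "$@"; do manipulate_image "$img"; done avoids this, because "$@" in double quotes expands to each argument as a separate word, spaces intact.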

Another technique I often use is the shift command, which discards the first argument and moves all the other arguments down by one. This is a robust technique, as long as $1 is quoted:

while [ "$1" ]; do

manipulate_image "$1";

shift;

done;

This will take the first argument, act upon it, then move the next argument down for the next loop.

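The loop will finish when there are no more arguments, as "$1" then expands to an empty string, which [ treats as false. (It will also stop early at an argument that is itself an empty string; while [ $# -gt 0 ] avoids that by testing the number of remaining arguments instead.)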
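Both approaches in one runnable sketch (the function names are illustrative): a for loop over "$@", and a shift loop that tests $# (the number of arguments left) rather than "$1". Each prints its arguments in brackets so you can see exactly where they start and end.

```shell
# Loop over the arguments with "$@": one word per argument, spaces intact.
show_args_for() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# The same with shift: handle "$1", then move the rest down by one.
show_args_shift() {
    while [ $# -gt 0 ]; do
        printf '[%s]\n' "$1"
        shift
    done
}
```

`show_args_for 'My Trip.jpg' b.jpg` prints `[My Trip.jpg]` and `[b.jpg]`, and the shift version prints the same.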

Final words

That’s about it for this brief tutorial. Hopefully you have enough to start assembling scripts and powerful commands to help you out. There’s a huge amount more to know about shell scripting though – arrays, clever variable manipulation, and plenty more stuff that I’m entirely unaware of, I’m sure. If you want to know more, just do some Googling for shell scripting – there’s an insanely large number of resources out there.

Stay tuned over the next couple of days – I’ll post a brief guide to using some fairly nice tools, like awk, sed, uniq and sort. These little rippers are fantastic for manipulating text and gathering statistics. Trust me, once you know how to use them, you’ll use them all the time (I do!).

For now, I’ll leave you with a final example – this is a small script I wrote the other day to replace the ‘rm’ command, and move all ‘deleted’ items to the trash, instead of just deleting them outright. Here it is:

#!/bin/sh

if [ "$1" = '-rf' ] || [ "$1" = '-r' ]; then

   shift;

   recursive=true;

fi;

while [ "$1" ] ; do

   # If not recursive, skip directories

   if [ -d "$1" ] && [ -z "$recursive" ]; then

      echo "$0: $1: is a directory"; shift; continue;

   fi;

   [ ! -d ~/.Trash/"`pwd`" ] && mkdir -p ~/.Trash/"`pwd`";

   mv "$1" ~/.Trash/"`pwd`";

   shift;

done;
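For experimenting without touching your real trash, here’s a variant sketch of the same idea wrapped in a function, with the trash location taken from a TRASH_DIR variable (a hypothetical name, defaulting to ~/.Trash). Like the script above, it moves items into a copy of the current directory’s path inside the trash, and a same-named item already there may be overwritten.

```shell
# Move each argument into $TRASH_DIR/<current directory path>.
# Directories are only moved when -r or -rf is given first.
trash() {
    trash_root=${TRASH_DIR:-$HOME/.Trash}
    recursive=
    if [ "$1" = '-rf' ] || [ "$1" = '-r' ]; then
        recursive=true
        shift
    fi
    dest="$trash_root$(pwd)"     # pwd starts with '/', so this nests cleanly
    mkdir -p "$dest" || return 1
    status=0
    while [ $# -gt 0 ]; do
        if [ -d "$1" ] && [ -z "$recursive" ]; then
            echo "trash: $1: is a directory" >&2
            status=1
        else
            mv -- "$1" "$dest/" || status=1   # -- guards names starting with '-'
        fi
        shift
    done
    return $status
}
```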