Category Archives: Linux

wget, recursive ftp and exclude-directories

Here’s the problem.

You don’t have shell access to a web server, but you need to download a dump of a web application.

You have ftp access, so you can use the recursive ftp option of wget, but the web root of the application contains all manner of directories which aren’t relevant to what you want to do and which you don’t want to download.

Enter the --exclude-directories switch for wget, whereby you can specify a comma-separated list of directories you don’t want to include in your download.

Except, of course, it doesn’t work.

Well actually, it does, but it just isn’t very intuitive. And it isn’t helped by the fact that there are lots of forum posts out there telling you that you need to specify the absolute path to the directory rather than the path relative to the web root. This isn’t the case.

Let’s remember, we’re authenticating via ftp here, so all our wget client is going to know about is the directory structure below the home directory of the user we are authenticating as.

If you log in using a native ftp client and issue a pwd command, you’re going to see the path exactly as wget sees it, so that should tell you how to list the directories on the command line.

For example:

When I log in to my server via FTP, and do a pwd, I see “/public_html”.

The directories I want to exclude are the ‘mp3files’ and ‘videos’ directories from my web root, because I don’t want to download 10GB of media.

The FTP paths to these directories are ‘/public_html/mp3files’ and ‘/public_html/videos’, so these are the directories I tell wget I don’t want to download:

wget -r -X /public_html/mp3files,/public_html/videos -nH --ftp-password=ftppass

This works.
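
If you have more than a couple of directories to skip, typing the full FTP path for each one gets tedious. Here is a minimal sketch of a helper that builds the -X list from whatever pwd reported plus the directory names. The function name build_exclude_list is made up for this example, and ftpuser and ftp.example.com in the commented-out command are placeholders, not values from my setup.

```shell
#!/bin/sh
# Hypothetical helper: prefix each directory name with the FTP-visible
# root (what `pwd` reports in an ftp client) and join them with commas,
# which is the format wget's -X / --exclude-directories expects.
build_exclude_list() {
    root=$1
    shift
    list=""
    for dir in "$@"; do
        # ${list:+$list,} adds a comma only when the list is non-empty
        list="${list:+$list,}$root/$dir"
    done
    printf '%s\n' "$list"
}

EXCLUDES=$(build_exclude_list /public_html mp3files videos)
echo "$EXCLUDES"   # prints /public_html/mp3files,/public_html/videos

# Assumed usage (placeholder user, password and host):
# wget -r -nH -X "$EXCLUDES" --ftp-user=ftpuser --ftp-password=ftppass ftp://ftp.example.com/public_html/
```

The point is the same as above: every entry starts with the path the FTP server shows you, not the filesystem path on the server and not a path relative to the web root.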

Run a cron job at multiple random times

Say you want to run a cron job 10 times (or so) in a day at random times.

Here’s my solution:

1. Create a probability test that gives a 10% probability of the outcome you want

$p = mt_rand(1,10);

2. Add in a bit of extra randomness with a short sleep

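$s = mt_rand(60,300);
sleep($s);

Putting the two together:

<?php
$p = mt_rand(1,10);

if ($p != 1) {
    // 9 times out of 10, do nothing
    exit;
} else {
    // wait 1 to 5 minutes, then do the work
    $s = mt_rand(60,300);
    sleep($s);

    // Your stuff here…
}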

3. Run your cron job 4 times an hour, 24 hours a day

*/15 * * * * /usr/bin/php myscript.php

At this rate, your cron job will start 96 times per day and get past the test in about 1 in 10 of those, which gives you 9 to 10 executions per day on average, at random times.

OK, it’s not totally random, and you can’t guarantee the number of executions, but if you have a fixed number of executions per day, that’s not really random, is it?
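
To see where that figure comes from, and how much it can wander from day to day, here is a rough sketch of the arithmetic. It assumes each run passes the test independently with probability 1 in N, so the daily count follows a binomial distribution. The function name expected_runs is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: for a job scheduled every INTERVAL minutes that
# only proceeds 1 time in ODDS, print the mean number of executions per
# day and the standard deviation (binomial: sqrt(n * p * (1 - p))).
expected_runs() {
    awk -v i="$1" -v n="$2" 'BEGIN {
        runs = 24 * 60 / i      # scheduled starts per day
        p = 1 / n               # chance that a start does real work
        printf "%.1f %.1f\n", runs * p, sqrt(runs * p * (1 - p))
    }'
}

expected_runs 15 10   # prints 9.6 2.9
```

So the average is 9.6 executions per day, but a spread of roughly plus or minus 3 means a day with 7 or 13 executions would be nothing unusual.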

Nod, nod, wink, wink….

Ubuntu 8 rocks!

I recently (and belatedly) upgraded my laptop from Ubuntu 6 to Ubuntu 8.

I hadn’t really been very diligent in taking the OS updates as they became available, so I decided the safest way to go about it was to back up all my data and re-format my partitions with a fresh installation.

This all went really smoothly, and I was off and running with Ubuntu 8 within an hour or so.

It really is a great Operating System, and at this stage, can easily rival Windows XP or Vista in terms of ease of use.

I am particularly impressed with the improvements that have been made to Wireless Networking. With Ubuntu 6, hooking up to the Wireless Network took a little bit more mouse work than I was comfortable with, whereas now the whole thing happens in the background.

My new OS also detected the 3G modem in my mobile phone, and established a working connection for me within 30 seconds. It even knew what Irish mobile network I was connected to.

A big issue with my old OS was that it wasn’t able to detect my battery status other than at boot time. I now know exactly how much power my battery has, and I have other readily accessible tools that allow me to easily reduce battery usage on the fly.

Added to this, I have all the latest versions of my favourite apps like OpenOffice, GIMP, Firefox, Thunderbird, Gedit etc., and I really am pleased I eventually got around to this.

Ubuntu will be getting a hefty donation from NBF this year.