Here’s the problem.
You don’t have shell access to a web server, but you need to download a dump of a web application.
You do have FTP access, though, so you can use wget’s recursive FTP mode — except the web root of the application contains all manner of directories which aren’t relevant to what you want and which you don’t want to download.
Enter the --exclude-directories switch for wget, whereby you can specify a comma-separated list of directories you don’t want to include in your download.
Except, of course, it doesn’t work.
Well, actually it does; it just isn’t very intuitive. And it isn’t helped by the fact that there are lots of forum posts out there telling you that you need to specify the absolute filesystem path to the directory rather than the path relative to the web root. This isn’t the case.
Let’s remember, we’re authenticating via FTP here, so all our wget client is going to know about is the directory structure below the home directory of the user we are authenticating as.
If you log in using a native FTP client and issue a pwd command, you’ll see exactly the directory structure that wget sees, and that tells you how to list the directories on the command line.
When I login to my server via FTP, and do a pwd, I see “/public_html”.
The directories I want to exclude are the ‘mp3files’ and ‘videos’ directories from my web root, because I don’t want to download 10GB of media.
The FTP paths to these directories are ‘/public_html/mp3files’ and ‘/public_html/videos’, so these are the directories I tell wget I don’t want to download:
wget -r -X /public_html/mp3files,/public_html/videos -nH --ftp-user=firstname.lastname@example.org --ftp-password=ftppass ftp://www.ftpserver.com/public_html
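If you have more than a couple of directories to exclude, typing out the comma-separated list by hand gets error-prone. Here’s a small shell sketch that builds the -X argument from the FTP-visible home directory and a list of directory names — the paths are just the ones from my example above:

```shell
# Build the comma-separated exclude list for wget's -X switch.
# FTP_HOME is what pwd reports after logging in; EXCLUDE_DIRS are
# the directory names under it that you want to skip.
FTP_HOME="/public_html"
EXCLUDE_DIRS="mp3files videos"

EXCLUDES=""
for d in $EXCLUDE_DIRS; do
  # Append "$FTP_HOME/$d", with a comma separator after the first entry.
  EXCLUDES="${EXCLUDES:+$EXCLUDES,}$FTP_HOME/$d"
done

echo "$EXCLUDES"
```

You can then pass the result straight to wget as -X "$EXCLUDES", and adding another directory to skip is just one more word in EXCLUDE_DIRS.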