written on 15/01/2015
For so long I've wanted to download all of the PHDComics web comics. For those of you who don't know what it is, PHDComics is a series of web comics by Jorge Cham that depicts the common idiosyncrasies of grad school and the grad student's life. They are pretty hilarious. :D
 Anyway, I finally sat down to write a simple script that could download all of the comics for me. Sounds simple, right? A simple loop, curl/wget, and presto, all images downloaded. Nope! It took me a good hour and fifteen minutes to figure stuff out and write it down.
for i in {1..1776}
do
  # dump the raw page source for comic number $i
  w3m -dump_source "http://phdcomics.com/comics/archive.php?comicid=$i" > test.txt
  # grab the line that embeds the comic image
  file_line=$(grep "<td bgcolor=#FFFFFF align=center><img id=comic name=comic src=http://www.phdcomics.com/comics/archive/" test.txt)
  x="${file_line#*src=}"   # strip everything up to and including "src="
  x="${x%% *}"             # strip everything from the first space onward
  python test.py "$i" "$x" # download the image as $i.gif
  echo "Received Image: $i"
  echo ""
done
Every comic's archive page contains the line
"<td bgcolor=#FFFFFF align=center><img id=comic name=comic src=http://www.phdcomics.com/comics/archive/"
followed by the image name. This makes it very easy to "grep" the line containing these things. After that, two of Bash's parameter expansions (not actually regexes, but just as awesome) pull the URL out of that line.
x="${file_line#*src=}" successfully removes everything before and including "src=", so the line now starts with the URL.
x="${x%% *}" removes everything after the URL (from the first space onward). Hence, after this, we are left with just the URL of the image file in the variable x. This, along with the counter, i.e. "i", is passed to the Python program "test.py":
import urllib
import sys

# usage: python test.py <counter> <image-url>  (Python 2; in Python 3,
# urlretrieve lives in urllib.request instead)
count = sys.argv[1]  # comic number, used as the output file name
url = sys.argv[2]    # direct URL of the comic image
urllib.urlretrieve(url, "%s.gif" % count)  # save the comic as <counter>.gif
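For the curious, here's the extraction step in isolation: a quick sketch of the two parameter expansions run on a made-up sample line (the real line comes out of the grep; the image file name and width below are hypothetical).

```shell
# Hypothetical sample of the line grep pulls from the page source.
file_line='<td bgcolor=#FFFFFF align=center><img id=comic name=comic src=http://www.phdcomics.com/comics/archive/phd_sample.gif width=600>'
x="${file_line#*src=}"   # drop everything up to and including "src="
x="${x%% *}"             # drop everything from the first space onward
echo "$x"                # prints http://www.phdcomics.com/comics/archive/phd_sample.gif
```

`#` strips the shortest match from the front, while `%%` strips the longest match from the back, which is why the second expansion cuts at the first space after the URL.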