A few months ago, a friend told me about “Naruto”, a Japanese manga from which two TV series were made. I now hate this friend, because I was completely hooked by the story and wasted many hours that could’ve been spent more productively :) I watched nearly all of the dubbed episodes of Naruto (I got tired of the later filler episodes) as well as the Shippuden episodes.
The current story arc in Shippuden is also a filler arc that seems to be going nowhere, so I decided to get the manga to read the canon story. There are, as of this writing, 445 chapters. That’s a lot of chapters to download by hand! (I could read them online, but I found the picture quality to be severely lacking.) Being a lazy programmer and all, instead of clicking individually on every link, I decided to write a script to fetch them all. I could’ve simply used a for loop, but I wanted to download many archives in parallel to make the process go faster, so I wrote the script in Python and used the new multiprocessing module to make the parallelism easy (trivial, even!).
This is an extremely simple script, nothing fancy, but I give it to you anyway:
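from multiprocessing import Pool
import os

def get(n):
    # Fetch chapter n with wget; the chapter number is zero-padded to three digits.
    os.system('wget -q "http://www.narutochuushin.com/downloads/script/downloads.php?title=manga_chapter%03d"' % n)

if __name__ == '__main__':
    pool = Pool(10)                # 10 worker processes
    pool.map(get, range(1, 446))   # chapters 1 through 445
    pool.close()
    pool.join()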
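For anyone who doesn't have wget installed, here is a rough sketch of the same idea in pure Python. It is not the script I actually ran: it swaps wget for the standard library's urlretrieve and uses a thread pool rather than a process pool, since downloads spend most of their time waiting on the network. The helper names (chapter_url, fetch, fetch_all) and the local ".zip" file name are assumptions made up for this illustration; only the URL pattern comes from the script above.

```python
from multiprocessing.pool import ThreadPool
from urllib.request import urlretrieve

# URL pattern taken from the script above.
BASE_URL = "http://www.narutochuushin.com/downloads/script/downloads.php?title=manga_chapter%03d"


def chapter_url(n):
    """Return the download URL for chapter n, zero-padded to three digits."""
    return BASE_URL % n


def fetch(n):
    """Download chapter n into the current directory.

    The local file name (and its .zip extension) is an assumption.
    """
    urlretrieve(chapter_url(n), "manga_chapter%03d.zip" % n)
    return n


def fetch_all(first=1, last=445, workers=10):
    """Download chapters first..last with at most `workers` downloads at once."""
    with ThreadPool(workers) as pool:
        # map() blocks until every chapter has been fetched.
        return pool.map(fetch, range(first, last + 1))
```

Calling fetch_all() starts the downloads; something like fetch_all(1, 5, workers=2) is a gentler way to try it out first.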
Hope you enjoy!
May 6, 2009 at 9:11 pm
Parallel, kill-your-connection version
May 7, 2009 at 9:07 am
I’m not that great with the command line, so I’m not sure how to pick out the passed-in variable, but couldn’t you do something like this?
seq 1 445 | xargs -n 1 -P 10 wget -q "http://www.narutochuushin.com/downloads/script/downloads.php?title=manga_chapter%03d"
You’d just have to figure out how to access the sequence number that was passed in and substitute it into the URL.
May 7, 2009 at 9:42 am
Theo: You can use xargs’s -I option to specify a placeholder string that will be replaced by the current value.