bandcamp

A quick exercise web scraping with javascript and bash in bandcamp

Sunday, January 8th, 2012

Here is a quick exercise to explore web scraping in bash on Bandcamp.  We’ll get our hands dirty with some simple console and bash scripting. This is written in January 2012, so things may change between now and when you read this.

example album page

If you take a look at the source on Bandcamp, you notice that all of the music data is located in a Javascript variable.  On your browser there should be a ‘view-source’ option.  In Chrome on the mac it is command-option-u.

 

Notice that in the comment Bandcamp includes information on their terms of service and their stance on applications that pull music off the site. You should probably read those and remember this is only meant as an exercise.  Looking at the FAQ I imagine they would be okay with this tutorial.  (Otherwise, if not okay, message me and I’ll take this down.)

So open up your web console.  I’m using Chrome, so it is command-option-i.  You can then explore the object TralbumData. You can retrieve the track information with TralbumData['trackinfo'].

We’re going to pull the titles and the files.

First get the titles.  You can use `console.log(str) ` to get things printed onto the console. We’ll write a loop over all of the objects and print out the ‘title’ value for each.

for (i in TralbumData['trackinfo']){console.log(TralbumData['trackinfo'][i]['title']);}

Now copy that and paste it in a txt file called titles.

Then we need to get the URLs for the files. We can use the same loop, replacing ‘title’ with ‘file’

for (i in TralbumData['trackinfo']){console.log(TralbumData['trackinfo'][i]['file']);}
 That will give you a list of URLs.  Put that in a file called playlist. Now in Bash we’ll download the file at each URL to the appropriate title.mp3 file. It took some googling to figure this out since I don’t normally do any bash scripting beyond simple command line tools.  Open up Terminal.  First we’ll create an array called name and put each line of titles to an entry in name. In doing this we also have the change the Internal Field Separator.  This is so that when we loop over cat titles we’ll get a new item for each line, rather than for each word.  You can do that with this command:
IFS=’
The second quote is on a new line define the IFS as newlines (IFS=’\n’ doesn’t work [source]). And then run the loop to fill the names array.

j=0;for i in `cat titles`; do names[$j]=$i; j=$(expr $j + 1); done

names contains the track titles.  To get the value out for index 0, you’d use ${name[0]}.  We’ll use that for the output file name.  This time we’ll loop over the entries in playlist and use wget to download the file at each url.

j=0;for i in `cat playlist`; do wget -O ${names[$j]}\.mp3 $i; j=$(expr $j + 1); done

The -O command line switch for wget specifies that the following argument will be the name of the output file.