I want to make a small program to get the value of my Magic Card collection. The script will automatically calls the information from a 3rd party website. Therefore I just had to sit down for approx. 2 hours and enter the card names into a spreadsheet application. As a miser I use OpenOffice.Calc for private. For my personal needs that’s more than enough even if its scope cannot compete with Microsoft Excel.
I enter all the card names and their edition in my table. The edition is necessary because it might have an impact on the card’s value.
Now I can start to search for a suitable website to get the price information. To keep it simple I search for a site that hands over the parameters in the site URL. Therefore I use the site’s search function so see how it is working.
In my case I just search for the first of my cards with the flowery name “Woodripper” out of “Nemesis” series. That’s the sites response
Looking a the parameters after the “/?” I understand that the card’s name is handed over in clear text: “cardName=Woodripper”. The variable names are easy to understand so I see that idExpansion must be the series. Unfortunately there is no clear text, it’s coded in an indentifier.The other variables are search options that are not useful for me at the moment. I also see that the result is displayed in a nice HTML table and that will simplify parsing later on.
My next job is to find out how the idExpansion relates to the name of the series. The series names are shown in the website, therefore this information has to somewhere be in the HTML code. I have a look on the source code of the search page. This is pretty simple and works in all browsers similar. You have to right-click somewhere on the website and select the item called like “display source code”. Unfortunately I get the result almost unformatted. That’s not unusual but at the first look I cannot see anything interested. It would be possible to use an advanced HTML editor to beautify my code but I use my beloved Notepad++ instead.
I copy the complete code into a new file and search for the string “Nemesis”. I find the search string embedded intags. Understanding the syntax of option tags I know that they will tell me what I have searched for: the ID and the clear text.
In my example the option tags look like this:
<option value=”32″>Nemesis</option>
It is <option value=”idExpansion”>clear text</option>
I change the formatting of the code using the search and replace function of Notepad++. I add a line break (\r\n) after each </option> tag.
When I scroll down the document now I can easily find the data I’m searching for. Now I just delete everything before the firstand after the lasttag. Now the document contains the payload only. I now also delete the first row because the option ALL has no use for me.
Now I have to put it into a format that my spreadsheet application can work with. I use again the search and replace function and replace <option value =” with an empty string, “>with a tabulator (\t) and with an empty string. Now I have a nice tabstop separated text file.
Now it’s time for a bit of Excel-Magic (I know politically correct it should be called “Openoffice.Calc-Magic or even better “Spreadsheet-Application-Magic” but that sounds to weird to me). I import the text file into a separate tab and add the series ID in my card list using VLOOKUP.
Now I can save my file in ODT format first and then I save it additionally in CSV format. Now I the date is prepared for the next step. Next time I will continue with the script.