After preparing the data last time I need to do the script. I use PHP because I like the language and I’m quite fit in PHP programming. Additionally it can be used on Windows and Linux equally. This time I use the Netbeans IDE, even if a simple text pad would be sufficient as well. For sure such a program could be done in any other programming language (such as Phyton, C#, ..) as well.
Similar last time I must have a look on the HTML code. I search for the tags and can find them easily. Now I have to search for some code snippets that enable me to identify the single columns of the table. While doing such work I always have an empty text file open in my Notepad++ and copy interesting info right into that file. I can copy the string out of this file whenever I need it later on.
Then I make the script. Is has to run through each line of my CSV file and call the target website with the parameters. The script then parse the website and writes the parameters together with the price in a new CSV file.
I use REGEX for parsing the website. Some programmers don’t really like regex because it is so mighty that it can be seen as separate programming language. Honestly it is a bit complicated at the beginning but it is ideal for such a job and there are some helpful resources. For e.g. I strongly suggest to use https://regex101.com/ for developing and testing the expressions.
To make the code run the following extension have to be enabled in the php.ini
- extension=php_openssl.dll
- extension=php_curl.dll
Additionally “allow_url_fopen = On” has to be set.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 |
<?php ini_set('auto_detect_line_endings',TRUE); // Necessary for reading the CSV $input = fopen('C:\temp\magic karten.csv','r'); // Input CSV with cards and Extension IDs $output = fopen('C:\temp\Kartenpreise.txt', 'wt'); // Output CSV while ( ($data = fgetcsv($input,1000,',') ) != FALSE ) { $ch = curl_init(); $useragent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)'; $url= utf8_encode("https://www.???.de/?mainPage=advancedSearch&cardName=".str_replace(' ','+',$data[0])."&idExpansion=".$data[3]."&manaCost=&convertedManaCostCompare=%3D&convertedManaCostValue=&powerCompare=%3D&powerValue=&toughnessCompare=%3D&toughnessValue="); curl_setopt($ch,CURLOPT_URL,$url); curl_setopt($ch,CURLOPT_SSL_VERIFYPEER, false); curl_setopt($ch,CURLOPT_USERAGENT, $useragent); curl_setopt($ch,CURLOPT_RETURNTRANSFER, true); $filestring = curl_exec($ch); curl_close ($ch); usleep(500000); // To be polite and don't stress the target server to much $searchpattern = '/<[^>]*>(?.*)<\/a><\/td>\s*<[^>]*>(?.*)<\/a><\/td>\s*(?\S*)<\/td>\s*(?\S*)/'; if (!preg_match_all($searchpattern, $filestring, $treffer)) { echo $data[0]."\t".$data[2]."\t not found !!!!!!!!!!\r\n"; continue; } echo $treffer['name_deu'][0]."\t".$treffer['name_eng'][0]."\t".$data[2]."\t".$treffer['anz'][0]."\t".$treffer['preis'][0]."\r\n"; fputs($output, $treffer['name_deu'][0]."\t".$treffer['name_eng'][0]."\t".$data[2]."\t".$treffer['anz'][0]."\t".$treffer['preis'][0]."\r\n"); } ini_set('auto_detect_line_endings',FALSE); fclose($input); fclose($output); ?> |
The result is a nice CSV file with all the information I need. I open it in my spreadsheet program and can easily build the sum.