Automatically download new invoices from Customer Portal

I have been struggling with downloading invoices from my internet provider TeleColumbus (a German TV and internet provider). They don’t send invoices by paper or email, at least not without charging extra. So I have to download them manually each month by logging into their website and opening the customer portal. That would be OK so far, but invoices are only kept there for 6 months. If you are too late, your only option is to call them and pay to have the invoice sent again.

I wanted to solve this by writing a script that checks for new invoices and downloads them when available. A cronjob should trigger the script every week. Checking once a month would also work, but I wanted to be sure not to miss a check if the device running the script is turned off, my internet connection is down, or the website is unavailable (which happens a lot with TeleColumbus!).

This script is written in bash.

Before starting you should open the website with Firefox and the Firebug plugin installed. Inspect the input fields for username and password.

TeleColumbus Inputfields

You’ll see the ids of the two input fields. They are, quite obviously, named “username” and “password”.

Next enter your credentials and login.

Take a look at the Console now. There should be “GET” entries listed. Right-click on one of them and select “Copy as cURL”.

If you paste this cURL command into a text editor, you’ll get some relevant info:


curl 'https://service.telecolumbus.net/selfcare/ws/getmenu/account-menu-entry' -H 'Host: service.telecolumbus.net' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0' -H 'Accept: */*' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' -H 'Accept-Encoding: gzip, deflate' -H 'X-Requested-With: XMLHttpRequest' -H 'Referer: https://service.telecolumbus.net/selfcare/account' -H 'Cookie: m43_selfcare_portal_session=xxxxxxYOUR-SESSION-IDxxxxxx'

Now we can begin to write our own command to log in to the website. Put your login credentials for the website into the corresponding variables. They will be used with curl afterwards:

##Global settings 
USER="YourUsername"; 
PASS="YourPassword"; 

curl https://service.telecolumbus.net/selfcare/login -b cookie.txt -c cookie.txt -e service.telecolumbus.net -d "username=$USER&password=$PASS"; 

The first URL after curl is the login page of TeleColumbus.

“-b”  –  will read an existing cookie file (cookie.txt).

“-c”  –  will write a cookie to a file (cookie.txt).

“-e”  –  sets the referrer page (service.telecolumbus.net), which you’ll need. It is a shortcut: with the -H option you would have to write out the complete header instead (‘Referer: https://service.telecolumbus.net/selfcare/account’).

“-d” – sends the username and password as POST data to the HTTP server (remember, we grabbed the ids of these input fields before).
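One caveat with “-d”: the values are sent exactly as typed, so a password containing characters like & or = would garble the form data. Here is a minimal sketch of the same login wrapped in a function, using curl's --data-urlencode option so that each value gets encoded. The function name login and the variable names PORTAL_USER and PORTAL_PASS are my own choice for this example (USER is usually already set by the shell to your login name, so a different name avoids overwriting it):

```shell
# Illustrative names, not anything the portal prescribes.
PORTAL_USER="YourUsername"
PORTAL_PASS="YourPassword"

# Hypothetical helper: log in and store the session cookie in cookie.txt.
# --data-urlencode "name=value" URL-encodes the value part before sending.
login() {
    curl https://service.telecolumbus.net/selfcare/login \
        -s -b cookie.txt -c cookie.txt \
        -e service.telecolumbus.net \
        --data-urlencode "username=$PORTAL_USER" \
        --data-urlencode "password=$PORTAL_PASS"
}
```

Nothing happens until you call login; the “-s” only silences curl's progress meter.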

Now your cookie file should be created. It contains a session id which is needed to download the invoices from the official invoice page.
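Before going further it can be worth checking that the login really produced a session cookie. A small sketch, assuming the cookie is named m43_selfcare_portal_session as in the copied cURL command above. The helper name has_session_cookie is made up for this example, and the check only looks into the cookie file; it does not prove that the session is still valid on the server:

```shell
# Return success if the cookie file (first argument) holds the session cookie.
# curl writes one cookie per line, tab-separated, with the cookie name in field 6.
has_session_cookie() {
    awk -F '\t' '$6 == "m43_selfcare_portal_session" { found = 1 } END { exit !found }' "$1"
}

if [ -f cookie.txt ] && has_session_cookie cookie.txt; then
    echo "Session cookie found, login looks fine."
else
    echo "No session cookie in cookie.txt, login probably failed." >&2
fi
```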

Open the website containing your invoices and take a look at the code:

TeleColumbus Invoices

We will call this page with curl and need to filter the code to find the download link of the invoice. A nice thing is that the “role” of the table row for the latest invoice also contains “currentdocument”.

So let’s give it a try:

curl https://service.telecolumbus.net/selfcare/account/invoices -b cookie.txt -c cookie.txt -e service.telecolumbus.net

It should output the source code of the whole page. Now we need to filter this output. We know from the HTML code that there is only one table on the page and that our latest invoice has a special “role”.

We can use grep to find that specific line. By adding the option “-A5” we can add the following 5 lines to the output.


curl https://service.telecolumbus.net/selfcare/account/invoices -b cookie.txt -c cookie.txt -e service.telecolumbus.net | grep -A5 '<tr role="invoice currentdocument">'

We now get the HTML code of the first table row, which contains the information for the latest invoice. I’d like to use several values from this output, so I store these lines in a variable and use it for both the date and the download URL.


htmlLatestInvoice="$(curl https://service.telecolumbus.net/selfcare/account/invoices -b cookie.txt -c cookie.txt -e service.telecolumbus.net | grep -A5 '<tr role="invoice currentdocument">')"
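If you want to try the filter without hitting the portal, the same grep can be run against a saved snippet. The HTML below is a made-up stand-in for the invoice table: only the role attribute and the words “documentdate” and “document-url” come from the real page, while the other attributes, the dates and the example.invalid URLs are invented, so the real markup will differ:

```shell
# Made-up stand-in for the invoice page (see the note above).
sample_page() {
cat <<'EOF'
<table>
<tr role="invoice currentdocument">
  <td class="documentdate">01.12.2015</td>
  <td>Invoice</td>
  <td><a class="pdf" role="button" target="_blank" title="Download" document-url="https://example.invalid/doc/123">PDF</a></td>
</tr>
<tr role="invoice">
  <td class="documentdate">01.11.2015</td>
  <td>Invoice</td>
  <td><a class="pdf" role="button" target="_blank" title="Download" document-url="https://example.invalid/doc/122">PDF</a></td>
</tr>
</table>
EOF
}

# Keep the matching row start plus the 5 lines that follow it.
htmlLatestInvoice="$(sample_page | grep -A5 '<tr role="invoice currentdocument">')"
printf '%s\n' "$htmlLatestInvoice"
```

With this sample the variable holds the row of the latest invoice plus the opening tag of the next row, but not the older invoice's date or link. How many lines you need after “-A” depends on how the real page is laid out.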

Let’s put the download link into a variable. We can again use grep to find the specific line (it contains “document-url”) and then pass it to awk, which prints the sixth whitespace-separated field of that line. The last step is to cut off the extra characters around the URL by taking what sits between the first pair of double quotes.


dl_url="$(echo "$htmlLatestInvoice" | grep document-url | awk '{print $6}' | cut -d '"' -f2)"
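Picking field 6 only works as long as the link stays at exactly that position in the line. As a possible alternative, here is a sketch using sed that takes whatever stands between the quotes after document-url. It assumes the attribute is written as document-url="..." on the page; if the URL sits in a different attribute on your page, the pattern needs adjusting. The function name extract_url and the sample line are made up for this example:

```shell
# Read HTML on stdin and print the value of document-url="...",
# wherever the attribute sits on its line. Prints nothing if it is absent.
extract_url() {
    sed -n 's/.*document-url="\([^"]*\)".*/\1/p'
}

# Made-up sample line; only the attribute name is taken from the page.
echo '<td><a class="pdf" document-url="https://example.invalid/doc/123">PDF</a></td>' | extract_url
```

In the script this would be used as dl_url="$(echo "$htmlLatestInvoice" | extract_url)".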

To get the date of the invoice, the same can be done. This time we use cut twice: first to cut off the characters at the beginning, then those at the end:


invoicedate="$(echo "$htmlLatestInvoice" | grep 'documentdate' | cut -d ">" -f2 | cut -d "<" -f1)"

Let’s compose the filename of the invoice in a variable:


file="Invoice_${invoicedate}.pdf"
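Depending on how the portal writes the date, it might contain characters that are awkward in a filename, such as slashes or spaces. A small sketch that replaces everything except letters, digits, dot, underscore and dash with an underscore. The helper name invoice_filename and the sample date are made up for this example:

```shell
# Hypothetical helper: build the filename from the invoice date ($1),
# replacing anything other than letters, digits, ".", "_" and "-" with "_".
invoice_filename() {
    safe_date="$(printf '%s' "$1" | tr -c 'A-Za-z0-9._-' '_')"
    printf 'Invoice_%s.pdf\n' "$safe_date"
}

invoice_filename "01/12/2015"
```

In the script the call would be file="$(invoice_filename "$invoicedate")". A date that only contains digits and dots passes through unchanged.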

With cURL we can now grab the invoice. The URL has no file extension, so we simply redirect the output of the command to a file, using the filename we just composed:


curl "$dl_url" -b cookie.txt -c cookie.txt -e service.telecolumbus.net > "$file"

This should grab the invoice and create a .pdf file in your current directory.
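Since the script is meant to run every week, it would fetch the same invoice again on each run. Here is a sketch of a wrapper that skips the download when the file already exists, and that only keeps the result if it starts with the bytes “%PDF” (so an HTML error page does not end up saved as an invoice). The function name download_if_new and the temporary “.part” suffix are my own invention, and the PDF check is just a plausibility test:

```shell
# Hypothetical wrapper: download only if this invoice is not on disk yet.
# $1 = download URL, $2 = target filename
download_if_new() {
    if [ -e "$2" ]; then
        echo "Already have $2, nothing to do."
        return 0
    fi
    # Download to a temporary name first, so a failed run leaves no broken PDF.
    curl -s "$1" -b cookie.txt -c cookie.txt -e service.telecolumbus.net -o "$2.part" || return 1
    # A PDF file starts with the bytes "%PDF".
    if [ "$(head -c 4 "$2.part")" = "%PDF" ]; then
        mv "$2.part" "$2"
        echo "Saved $2"
    else
        rm -f "$2.part"
        echo "Download of $2 did not return a PDF." >&2
        return 1
    fi
}
```

At the end of the script it would be called as download_if_new "$dl_url" "$file". A crontab line such as 0 8 * * 1 /path/to/invoice.sh (the path is only a placeholder) would then run the script every Monday at 08:00. Cron does not start in the script's directory, so the script should cd into the folder where cookie.txt and the invoices are supposed to live.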
