Making an auto-updating plot with Plotly

There are those days when you find tabulated information on the web that is relevant for your homework/assignment/research. Of course, if the owner of this information only cares about its dissemination, you will probably see that there is no figure representing these data visually.

In my case, I needed to visualize meteorological data for the Denver International Airport (KDEN) uploaded every hour by the National Weather Service. Of course, the data are tabulated and there is no figure representing the information. So here is where Plotly comes into play.

One advantage of Plotly is its well-documented API. If you visit the Plotly API library you will find APIs for Python, Matlab, R, Node.js, Julia, and …wait for it…Excel. This means that with any of these programming languages (excluding Excel of course) you can replace your native plotting functions with the Plotly API and upload your plot online.

First step

I’m very familiar with Matlab and its plotting functions; however, I wanted to try a different approach for what I had in mind. So my first step was to start learning Python…or at least the basics. This step should not be difficult for you if you are familiar with high-level programming languages like Matlab and if you have a basic notion of the Object Oriented Programming (OOP) paradigm. If the last condition does not apply to you, I would recommend reading about OOP before you start learning Python.

Second step

Learning Python was not a “just-for-fun” step; it had a purpose. After some googling (yes, it seems to be “officially” a verb), I realized that there is something in this world called web scraping. In short, this means that you use your favorite programming language to search and extract data from websites automatically.

So, there was my new vocabulary word when I met Beautiful Soup (BS). As you can guess, BS helps you scrape the web using Python. It took me a while to get used to the BS syntax, but I made extensive use of this tutorial made by Susane mcg to get the basics that I needed.

Third step

As Susane mcg explains, you can use BS to download a table, save it as a variable, and then search for the specific html tag that contains the data. Therefore, as the Art of War of Sun Tzu says…”know your enemy” (actually I have no idea if it says that). For this, I explored the html structure of the page looking for the table tag that was most useful for extracting the data. This is important because a table might have embedded tables, so you need to look for the one you need.

Fourth step

Get familiar with Plotly. You need to install the API for Python.

Fifth step

Get your hands dirty. We will create a python script called plot_plotly.py.

First, we import the libraries that we will be using:

A python script can accept arguments from the command-line using sys. Thus, we can use the name of the station (KDEN) as input argument or choose a different station if needed (e.g. KORD):
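A minimal sketch of reading the station from the command line. The helper name `station_from_args` and the choice of KDEN as a fallback are my own assumptions for illustration, not necessarily what the original script did:

```python
import sys


def station_from_args(argv, default="KDEN"):
    """Return the station ID given on the command line, or the default."""
    # argv[0] is the script name; argv[1], if present, is the station ID.
    if len(argv) > 1:
        return argv[1].strip().upper()
    return default


station = station_from_args(sys.argv)
```

With this in place, `python plot_plotly.py KORD` would select KORD, and running the script with no argument would fall back to KDEN.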

Then, we use station for parsing and retrieve the url that hosts the table:

…and save this to a local variable to be processed by BS:
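Something along these lines would build the address and download the page. This is only a sketch assuming Python 3; the URL pattern below is my assumption about where the NWS hosts the observation history, so check it against the page you actually want to scrape:

```python
from urllib.request import urlopen

# Assumed URL pattern for the NWS observation-history page (not verified here).
URL_TEMPLATE = "https://forecast.weather.gov/data/obhistory/{station}.html"


def build_url(station):
    """Insert the station ID into the URL template."""
    return URL_TEMPLATE.format(station=station.upper())


def fetch_html(url, timeout=30):
    """Download the page and return its content as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


url = build_url("KDEN")
# raw_html = fetch_html(url)  # uncomment to download the page
```

The download is left commented out so the snippet can be tried without a network connection.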

Now it is the turn of BS to work. First, we process the raw html with BS (like when you digest your soup, but the other way around…eww):

We look for a specific table that has a cell spacing value of 3:

From this table, we look for all its rows:
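The parsing steps above can be sketched as follows. The HTML here is a tiny made-up page (not real observations) with two tables, so you can see how the `cellspacing` attribute picks out the one we want; it assumes the `bs4` package is installed:

```python
from bs4 import BeautifulSoup

# Made-up page: one layout table plus the data table we are after.
sample_html = """
<html><body>
<table cellspacing="0"><tr><td>navigation</td></tr></table>
<table cellspacing="3">
  <tr><th>Date</th><th>Time</th><th>Temp</th></tr>
  <tr><td>14</td><td>10:53</td><td>61</td></tr>
  <tr><td>14</td><td>09:53</td><td>58</td></tr>
</table>
</body></html>
"""

# Parse the raw html with Python's built-in parser.
soup = BeautifulSoup(sample_html, "html.parser")

# Pick the table whose cellspacing attribute is 3.
table = soup.find("table", attrs={"cellspacing": "3"})

# Collect every row of that table.
rows = table.find_all("tr")

# Keep only rows that have data cells, as lists of cell text.
data_rows = [
    [cell.get_text(strip=True) for cell in row.find_all("td")]
    for row in rows
    if row.find_all("td")
]
```

The header row has only `th` cells, so it drops out of `data_rows` without special handling.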

Since the data contained in the table are basically time series of temperature, pressure, etc., we will create vectors containing this information. First, we initialize the variables:

Although some people say that this is not necessary in Python, you’ll get an error (a NameError) if you append to these variables in the following loop without initializing them first. This loop populates the vectors with the information extracted from the table:
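Here is a rough sketch of the initialization and the loop. The rows are hypothetical and the column order (day, time, temperature, pressure) is an assumption; the real table has more columns, so the indices would need adjusting:

```python
# Hypothetical rows, newest first: day, time, temperature, pressure as text.
rows = [
    ["14", "10:53", "61", "30.05"],
    ["14", "09:53", "58", "30.04"],
    ["14", "08:53", "NA", "30.02"],
]


def to_float(text):
    """Convert a cell to a number, or None when the value is missing."""
    try:
        return float(text)
    except ValueError:
        return None


# The lists must exist before the loop can append to them.
day, time_of_day, temperature, pressure = [], [], [], []

for cells in rows:
    if len(cells) < 4:
        continue  # skip header or malformed rows
    day.append(int(cells[0]))
    time_of_day.append(cells[1])
    temperature.append(to_float(cells[2]))
    pressure.append(to_float(cells[3]))
```

Storing `None` for a missing reading keeps all the vectors the same length, which matters once they are plotted against a shared time axis.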

We reverse the vectors, so the oldest data resides in the first vector index and the newer in the last:
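For reference, a list can be reversed in place or copied in reverse order; either would do here. The values are made up:

```python
temperature = [61.0, 58.0, 55.0]  # newest reading first, as in the table

# Option 1: a reversed copy, leaving the original list untouched.
chronological = temperature[::-1]

# Option 2: reverse the list in place.
temperature.reverse()
```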

We can determine the month and year from the system time:

Next, create a time vector:

Note the embedded loop in the variable above. From what I’ve learned, that’s common practice in Python.
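The time vector described above might be built like this. It is a sketch assuming the table gives the day of the month and an "HH:MM" time as text; the function name and the sample values are mine:

```python
from datetime import datetime


def build_time_vector(days, times, year, month):
    """Combine day-of-month and 'HH:MM' strings into datetime objects."""
    # The loop lives inside the brackets: this is a list comprehension.
    return [
        datetime(year, month, d, int(t.split(":")[0]), int(t.split(":")[1]))
        for d, t in zip(days, times)
    ]


now = datetime.now()  # month and year come from the system clock
days = [14, 14, 14]
times = ["08:53", "09:53", "10:53"]
time_vector = build_time_vector(days, times, now.year, now.month)
```

One caveat with taking the month from the system time: if the table spans the end of a month, the older rows would be stamped with the wrong month and would need extra handling.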

Now that we have the data in vector form, we can make use of Plotly. You need to create an account in Plotly (which is free) to be able to upload your plot. You will then receive information for logging in from the python script:

There are different options in Plotly to create a time series plot, called trace. Here, I use a single X axis for all the traces and a second Y axis for pressure:

Then, you can use the following code to record the time and date of the last update of the plot:
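One simple way to get such a timestamp is `strftime`. The label wording and the helper name are just my choices for illustration:

```python
from datetime import datetime


def update_label(moment):
    """Format a datetime as a short 'last update' label."""
    return moment.strftime("Last update: %Y-%m-%d %H:%M")


label = update_label(datetime.now())
```

The resulting string could then go into the plot title or an annotation.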

Next, we create a layout:

…then make a figure object:
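As a hedged sketch of the traces, layout, and figure, here they are written as plain Python dictionaries that follow Plotly's figure structure, rather than the exact calls the original script used. The data values and titles are made up:

```python
from datetime import datetime

# Made-up data standing in for the vectors built earlier.
time_vector = [datetime(2014, 5, 14, hour, 53) for hour in (8, 9, 10)]
temperature = [55.0, 58.0, 61.0]
pressure = [30.02, 30.04, 30.05]

# Both traces share the X axis; pressure is assigned to a second Y axis.
temperature_trace = {
    "type": "scatter",
    "mode": "lines",
    "name": "Temperature",
    "x": time_vector,
    "y": temperature,
}
pressure_trace = {
    "type": "scatter",
    "mode": "lines",
    "name": "Pressure",
    "x": time_vector,
    "y": pressure,
    "yaxis": "y2",  # draw this trace against the second Y axis
}

# The layout places the second Y axis on the right, on top of the first.
layout = {
    "title": {"text": "KDEN observations"},
    "xaxis": {"title": {"text": "Time"}},
    "yaxis": {"title": {"text": "Temperature"}},
    "yaxis2": {"title": {"text": "Pressure"}, "overlaying": "y", "side": "right"},
}

# A figure is the list of traces plus the layout.
figure = {"data": [temperature_trace, pressure_trace], "layout": layout}
```

With the plotly package installed, `plotly.graph_objects.Figure(figure)` should accept this dictionary and turn it into a figure object.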

…and finally send the figure to Plotly:

So, if everything is good, you should obtain a plot like this:

Last step

Although the python script is ready to run, we need a way to run it automatically every hour. I learned that there is a Unix utility called Crontab, which I confirmed was on my system by using:

Then, to edit the crontab schedule you type in your terminal:

This will open a text file using the Vi text editor. Scheduling the python script to run every 62 minutes for the KDEN station would be:

Two important things you need before the schedule can run:

  • add this in the first line of your script: #! /usr/bin/env python
  • make the python script executable by using: chmod u+x plot_plotly.py

Final thoughts

Of course, there are many ways to do the same job and maybe with an even more efficient approach. However, if you are just learning Python (like me) then this will help you to play around a little bit and get the basic syntax of python.

You might also note that this is not exactly the same plot I showed as an example in a previous post. Using this code, every time you run plot_plotly.py, new data (if the NWS table is updated) will replace the table you already have in Plotly. So you need to work on the code a little more in order to keep the existing table and just add the latest data from the new NWS table, so that your plot extends up to the last time you ran the script.
