Using Highcharts for Python with Pandas and PySpark


Highcharts Core is the gold standard in JavaScript data visualization, offering far more customizability and interactivity out-of-the-box than “mere” plotting libraries. And using the Highcharts for Python Toolkit, you can now easily use it in your Python code (or even in your Jupyter notebook), and rapidly visualize data that you have loaded into a Pandas or PySpark DataFrame.

Tip
The Highcharts for Python capabilities are quite extensive, and this tutorial is meant to be a quick intro to using Highcharts for Python with Pandas and PySpark. But we definitely recommend that you review our other tutorials to see more detail on how you can enrich your data visualizations using Highcharts for Python and Highcharts Core.

Getting Started with Highcharts for Python

To use Highcharts for Python, like with any Python library, you first have to install it. That’s super easy. Just open your project (in your virtual environment – you are using virtual environments, right?) and execute:

$ pip install highcharts-core

And that’s it! Highcharts for Python will now be installed in your project and available for use. To use it in your code, you just have to import it the way you would any other library.

Importing Highcharts for Python

The Highcharts for Python library is quite extensive. It’s got a rich (read: complicated) API, with lots of different objects and modules. That’s a reflection of the visualization power that Highcharts offers you, but it does make the decision of what to import and how to import it a little more complicated.

Python’s best practice is to import only what you need, which keeps your application’s namespace from getting cluttered and can reduce both the import time and the memory footprint of your Python code.

You can either import specific things from their precise locations, or you can also just import the catch-all highcharts module, which flattens the entire API and exposes just about every class/object in one location. We definitely recommend importing things from their precise location like so:

# Import classes using precise module indications. For example:
from highcharts_core.chart import Chart
from highcharts_core.options import HighchartsOptions
from highcharts_core.options.plot_options.scatter import ScatterOptions
from highcharts_core.options.series.scatter import ScatterSeries

A Super Simple Example

So, let’s assume that you are merely analyzing a bunch of data. You have extracted it from the underlying data files or database, loaded it up into a Pandas or PySpark DataFrame, and performed whatever calculations you want to perform on your dataset. While Pandas and PySpark are two different libraries, the way that Highcharts for Python visualizes data from either is extremely similar.

Let’s assume you have your data stored in a DataFrame called df (creative, right?). And let’s further assume that you have two columns with the data that you want to plot: one labeled actual and one labeled forecast. As you can probably guess, we want to plot the forecast against the actuals across the observations in our dataset to better evaluate fit.

Image of a dataframe with the structure from a jupyter notebook
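If you want to follow along without your own data, the sketch below builds a small dataset with the same three columns used in this tutorial (idx, actual, and forecast). The helper names and the generated values are made up for illustration. pandas is imported inside the second helper so that the row-building part has no dependencies.

```python
import random


def make_sample_rows(n=10, seed=42):
    """Return n rows shaped like the tutorial's DataFrame: idx, actual, forecast."""
    rng = random.Random(seed)
    rows = []
    for idx in range(n):
        forecast = 100 + 5 * idx                               # a simple linear forecast
        actual = round(forecast + rng.uniform(-10, 10), 2)     # forecast plus some noise
        rows.append({'idx': idx, 'actual': actual, 'forecast': forecast})
    return rows


def make_sample_dataframe(n=10, seed=42):
    """Wrap the sample rows in a Pandas DataFrame (requires pandas)."""
    import pandas as pd  # imported lazily; only needed for this helper
    return pd.DataFrame(make_sample_rows(n, seed))


# In a notebook or script with pandas installed:
# df = make_sample_dataframe()
```

With PySpark you would hand the same rows to your Spark session instead of to pandas.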

The process of doing this is very simple:

  1. Import the Chart type from highcharts_core.chart and the ScatterSeries type from highcharts_core.options.series.scatter:
    from highcharts_core.chart import Chart
    from highcharts_core.options.series.scatter import ScatterSeries
  2. Create your two series – one for actual and one for forecast – using the ScatterSeries.from_pandas() or ScatterSeries.from_pyspark() class method, which will create new ScatterSeries instances:
    from highcharts_core.options.series.scatter import ScatterSeries
    # Create a new ScatterSeries instance plotting df['actual']
    actual_series = ScatterSeries.from_pandas(df,
      property_map = {
        'x': 'idx',
        'y': 'actual',
        'id': 'actual',
        'name': 'Observed Value for Metric'
      }
    )
    # Create a new ScatterSeries instance plotting df['forecast']
    forecast_series = ScatterSeries.from_pandas(df,
      property_map = {
        'x': 'idx',
        'y': 'forecast',
        'id': 'forecast',
        'name': 'Forecast Value for Metric'
      }
    )
  3. Create a Chart instance and add your two series to it.
    # Create a Chart and add your two series to the chart.
    my_chart = Chart.from_series(actual_series, forecast_series)
    

And that’s it! Let’s break down what’s happening in the method calls above:

First, you’re telling the ScatterSeries.from_pandas() method which dataframe to use, in this case, df. That’s pretty straightforward. But the magic really happens in the property_map argument, which tells the method which DataFrame columns correspond to which properties in the series’ data points.

And then, once your series are created, you create a chart that contains those two series by calling Chart.from_series().

The property_map Argument

The property_map argument takes a dict whose keys correspond to the properties in the series type’s .data property. That can seem a little complicated, but think of it this way:
The Highcharts for Python toolkit supports over 70 different series types in its visualization. Different series types may have different properties for their data points. Some (like the ScatterSeries shown above) may be very simple: they get an “x” value for the x-axis, and a “y” value for the y-axis, and an “id” that uniquely identifies the data point.
The keys in property_map are the data point properties that should be populated. In the example above:

  1. The “x” key corresponds to the data point’s .x property – which in this case gets populated using the column labeled 'idx' in the DataFrame
  2. The “y” key corresponds to the data point’s .y property – which in this case gets populated using the column labeled 'actual' for the actual series, and 'forecast' for the forecast series.
  3. The “id” key corresponds to the data point’s .id property, which in this case will be given the same ID as the original column label.
  4. The “name” key corresponds to the data point’s .name property, which in this case gives each series a label that will be shown in the visualization’s legend.

So the property_map argument tells Highcharts for Python which DataFrame column maps to which property. Pretty straightforward!
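To make the mapping idea concrete, here is a minimal pure-Python sketch of what a property map expresses. It is not the library's implementation: apply_property_map is a hypothetical helper, and the rows are invented. It only shows how each key (a data point property) is filled from the column named in its value.

```python
def apply_property_map(rows, property_map):
    """Build one data-point dict per row, keyed by data-point property name.

    rows is a list of dicts (one per DataFrame row); property_map maps a
    data-point property (e.g. 'x') to a column label (e.g. 'idx').
    """
    points = []
    for row in rows:
        missing = [col for col in property_map.values() if col not in row]
        if missing:
            raise KeyError(f'columns not found in row: {missing}')
        points.append({prop: row[col] for prop, col in property_map.items()})
    return points


# Two invented rows with the tutorial's column labels.
rows = [
    {'idx': 0, 'actual': 10.2, 'forecast': 9.8},
    {'idx': 1, 'actual': 11.5, 'forecast': 11.0},
]

actual_points = apply_property_map(rows, {'x': 'idx', 'y': 'actual'})
forecast_points = apply_property_map(rows, {'x': 'idx', 'y': 'forecast'})
print(actual_points)
print(forecast_points)
```

The same rows feed both series; only the column that the “y” key points at changes.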

Tip
If you use convenience methods like .from_pandas() or .from_pyspark(), make sure to review the series type’s data points to map your data to the appropriate properties. You can do so in the extensive API reference for the series type you want to chart.

Tapping Into Richer Functionality

The .from_pandas() and .from_pyspark() methods also accept a series_kwargs argument, which expects a dict whose keys are keyword parameters that you can use when instantiating the series. Those parameters can be used to further customize how the series gets rendered, including indicating the axis it should be rendered on, adjusting its interactivity, or more. For more details, please see the extensive API reference documentation for the series types you wish to visualize.
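As a rough sketch of how series_kwargs might be used, the helpers below build a dict of series-level settings and pass it along with the property map. The helper names are hypothetical, and the sketch assumes that id, name, and y_axis are accepted keyword arguments when instantiating a ScatterSeries; please confirm this against the API reference for your installed version.

```python
def series_kwargs_for(series_id, legend_name, y_axis=0):
    """Collect series-level settings (not data-point properties) in one dict."""
    return {
        'id': series_id,       # identifier for the series itself
        'name': legend_name,   # assumed to be the label shown for the series
        'y_axis': y_axis,      # assumed to select the y-axis to render on
    }


def scatter_from_dataframe(df, y_column, series_kwargs):
    """Create a ScatterSeries from a Pandas DataFrame (requires highcharts-core)."""
    # Imported lazily so the dict-building helper above runs without the library.
    from highcharts_core.options.series.scatter import ScatterSeries
    return ScatterSeries.from_pandas(
        df,
        property_map={'x': 'idx', 'y': y_column},
        series_kwargs=series_kwargs,
    )


actual_kwargs = series_kwargs_for('actual', 'Observed Value for Metric')
forecast_kwargs = series_kwargs_for('forecast', 'Forecast Value for Metric')

# With highcharts-core installed and a DataFrame df in scope:
# actual_series = scatter_from_dataframe(df, 'actual', actual_kwargs)
# forecast_series = scatter_from_dataframe(df, 'forecast', forecast_kwargs)
```

Keeping the series-level settings in series_kwargs leaves property_map responsible only for mapping columns to data point properties.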

Other Approaches

The example above focuses on using a DataFrame, but Highcharts for Python has you covered regardless of how you are managing your data. You have similar convenience functions for loading data:

  • .from_csv() which loads data from a CSV file, and
  • .from_array() which loads data from an iterable

Every single series type available offers these convenience functions, but you can also call .from_pandas(), .from_pyspark(), and .from_csv() on the Chart class itself if you wish to just visualize one data series.

And if you wish to add more series to an existing chart instance? You can do that easily by just submitting them to that chart instance’s .add_series() method. You can also update existing series by calling the chart instance’s .update_series() method.

Tip
Our other tutorials show you in detail how to use these different convenience methods to rapidly build great visualizations – we recommend you check them out here: Highcharts for Python Tutorials.

And that’s it! You now have a fully-configured Chart instance containing the data from your DataFrame. We’ll describe how to actually use this Chart instance down below.

Visualizing Your Chart

Now that we’ve built a Chart instance and populated it with data, our next step is to visualize it. How you actually do this depends to some extent on how you are building your Python application. There are generally two ways to visualize your data:

In JupyterLab/Notebook

If you are working in JupyterLab or Jupyter Notebook, there’s nothing simpler. You can visualize your data by calling my_chart.display(). And that’s it! When you run that notebook cell, your chart will get visualized.

Screenshot of notebook

In a Web Application

If you are building a web application, you may be using a web framework like Flask, Django, or FastAPI and relying on their templating engines for creating your views. Or maybe you are providing a Python backend API that delivers data to an entirely separate app via RESTful API calls. In either case, to visualize your new chart you need to somehow get its configuration to your web-based front-end. And that is super simple as well.

Using the example above, you can generate the JavaScript code needed to render your chart with one method call:

as_js_literal = my_chart.to_js_literal()
# This will produce a string equivalent to:
#
# document.addEventListener('DOMContentLoaded', function() {
#   const myChart = Highcharts.chart('target_div', {
#      series: [{
#          type: 'scatter',
#          data: [0, 5, 3, 5]
#      }]
#   });
# });

So what is this method call doing? It is taking the entire set of instructions included in your my_chart variable, creating a JavaScript literal string that represents them, and putting that string in the as_js_literal Python variable. This string can then be piped into your web front end using whatever templating engine you are using, or delivered to your front-end in an API response, and it will then render your chart as you configured it.

In the example above, if you place the as_js_literal string in what gets rendered in your user’s browser, your chart will be automatically rendered by Highcharts Core, placing the chart inside the div element in your content whose id is “target_div”.
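One way to place that string in a page, sketched here with only the Python standard library: the helper below wraps a JavaScript string in a minimal HTML document that loads Highcharts Core and contains the target div. The function name, the page layout, and the placeholder JavaScript are assumptions for illustration; in a real application your templating engine would play this role.

```python
from string import Template

# Minimal page: loads Highcharts Core, provides the container div,
# then runs the chart's JavaScript.
PAGE_TEMPLATE = Template("""<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <title>$title</title>
    <script src="https://code.highcharts.com/highcharts.js"></script>
  </head>
  <body>
    <div id="$container_id"></div>
    <script>
$chart_js
    </script>
  </body>
</html>
""")


def render_chart_page(chart_js, container_id='target_div', title='My Chart'):
    """Return a full HTML page that embeds the given chart JavaScript."""
    return PAGE_TEMPLATE.substitute(
        title=title,
        container_id=container_id,
        chart_js=chart_js,
    )


# Placeholder standing in for the string returned by my_chart.to_js_literal().
example_js = "Highcharts.chart('target_div', {series: [{type: 'scatter', data: [0, 5, 3, 5]}]});"
page_html = render_chart_page(example_js)
```

The container_id must match the element id that the generated JavaScript targets, which is “target_div” in the example above.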

And that’s it! You should now see a beautiful scatter chart in your web content.

Downloading Your Chart

Often when you’ve created a visualization, you may want to download a static version of it as an image that can be embedded in other documents. With Highcharts for Python, that is a fairly trivial exercise. Given the example above, you can produce a PNG image very simply with one method call:

# Download a PNG version of the chart in memory within your Python code.
my_png_image = my_chart.download_chart(format = 'png')
# Download a PNG version of the chart and save it to the file "/images/my-chart-file.png"
my_png_image = my_chart.download_chart(
    format = 'png',
    filename = '/images/my-chart-file.png'
)

The two examples shown above both download a PNG of your chart:
The first example keeps that PNG image in your Python code only, storing its binary data in the my_png_image variable. The second example not only stores its binary data in the my_png_image variable, but it also saves the PNG image to the file “/images/my-chart-file.png”.

The format argument is really the one doing the heavy lifting above. In the example above, it tells the method to generate a PNG image, but you can also create “jpeg”, “pdf”, and “svg”.
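If you need the same chart in several formats, a small loop over the format argument is enough. The sketch below assumes only what is shown above, namely that .download_chart() accepts format and filename; the helper name and the file-naming scheme are hypothetical.

```python
from pathlib import Path


def export_chart_formats(chart, directory, stem,
                         formats=('png', 'jpeg', 'pdf', 'svg')):
    """Call chart.download_chart() once per format and return the target paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)   # make sure the folder exists
    written = {}
    for fmt in formats:
        target = out_dir / f'{stem}.{fmt}'
        chart.download_chart(format=fmt, filename=str(target))
        written[fmt] = target
    return written


# With a Chart instance named my_chart in scope:
# paths = export_chart_formats(my_chart, 'images', 'my-chart-file')
```

Each iteration makes a separate export request, so limit the formats tuple to the ones you actually need.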

And that’s it! You should know that the .download_chart() method defaults to using the Highcharts Export Server provided by Highsoft (creators of Highcharts Core); however, you can configure the method to use your own custom Highcharts Export Server if you choose. For more details on how to do this, please review our tutorial on Exporting Static Charts using Highcharts for Python.

More Resources

The above tutorial is just a really simple example of how you can create rich visualizations with just a handful of method calls using Highcharts for Python. But the library offers so much more! We recommend you take a look at the following additional resources which you may find useful: