Plotly

Python Open Source Graphing Library

Plotly's Python graphing library makes interactive, publication-quality graphs. It can make line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple axes, polar charts, and bubble charts.


Plotly has an easy-to-use interface called Plotly Express (plotly.express), which makes plotting with Plotly very easy. Plotly Express works nicely with pandas DataFrames as input: we just need to specify which columns to plot.

Import modules

In [2]:

import pandas as pd
import plotly.express as px

Introduction

Let's start exploring Plotly Express. The general syntax for any Plotly Express chart is px.chart_type(data, parameters), so let's try a simple line chart: px.line(data, parameters).

There are different ways to pass the data to the plot. We will go through all of them, but the third one usually makes the most sense:

using lists of values
using pandas.Series
using pandas.DataFrame and referencing the column names

1. Using lists of values.

We can create two lists of values for the x and y axes and pass them as arguments to the line chart:

In [1]:

year = list(range(1996,2020,4))
medals = [1,4,5,9,1,2]

print(year)
print(medals)

[1996, 2000, 2004, 2008, 2012, 2016]
[1, 4, 5, 9, 1, 2]

In [3]:

px.line(x = year, y = medals)

2. Using pandas.Series

This works very much like using lists:

In [4]:

year_series = pd.Series(year)
medals_series = pd.Series(medals)

print(year_series)
print(medals_series)

0    1996
1    2000
2    2004
3    2008
4    2012
5    2016
dtype: int64
0    1
1    4
2    5
3    9
4    1
5    2
dtype: int64

In [5]:

px.line(x = year_series, y = medals_series)

3. Using pandas.DataFrame

Most of the time, this is the best option: we can plot directly from our DataFrame of interest. We pass the DataFrame to the px.chart_type() function with the data_frame argument, and then we only need to give the names of the columns to use for the x and y axes!

In [6]:

# We create our dataframe
df = pd.DataFrame({"Year" : year, "Medals" : medals})
df.head()

Out[6]:

	Year	Medals
0	1996	1
1	2000	4
2	2004	5
3	2008	9
4	2012	1

In [7]:

px.line(data_frame = df, x = "Year" , y = "Medals")

Note: if our DataFrame is in wide format, we may need to reshape it to long format, because Plotly Express expects each variable of interest to be a single column. Have a look at the melt method in pandas. For example, let's make a wide DataFrame:

In [19]:

df = pd.DataFrame({'Year': {0: '2004', 1: '2008', 2: '2012', 3: '2016'},
                   'Canada': {0: 4, 1: 3, 2: 5, 3: 3},
                   'USA': {0: 5, 1: 9, 2: 1, 3: 2}})

df

Out[19]:

	Year	Canada	USA
0	2004	4	5
1	2008	3	9
2	2012	5	1
3	2016	3	2

In this case, we would like to have a column named "Countries" that will encompass Canada and USA. We use the .melt() method to do this.

In [32]:

long_df = pd.melt(df, id_vars=['Year'], value_vars=['Canada', 'USA'])
long_df

Out[32]:

	Year	variable	value
0	2004	Canada	4
1	2008	Canada	3
2	2012	Canada	5
3	2016	Canada	3
4	2004	USA	5
5	2008	USA	9
6	2012	USA	1
7	2016	USA	2

We may also want to update the column names of our long_df to something more meaningful. Do you remember how to do that from yesterday?

In [33]:

#update column names to 'country' and 'medals'
long_df.rename(columns={'variable': 'country', 'value': 'medals'}, inplace=True)
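The melt and rename steps can also be done in a single call: pd.melt accepts var_name and value_name arguments that name the new columns directly. A small sketch with the same wide data as above (wide_df is just an illustrative name):

```python
import pandas as pd

wide_df = pd.DataFrame({"Year": ["2004", "2008", "2012", "2016"],
                        "Canada": [4, 3, 5, 3],
                        "USA": [5, 9, 1, 2]})

# var_name / value_name name the new columns, so no rename() is needed afterwards
long_df = pd.melt(wide_df, id_vars=["Year"], value_vars=["Canada", "USA"],
                  var_name="country", value_name="medals")
print(long_df)
```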

Now we can use the long-format DataFrame to plot:

In [34]:

px.line(data_frame = long_df, x = "Year" , y = "medals", color = "country")

Save as variable and show

We can save our plots as variables. Then, if you would like to show your plot again, you can display it with the .show() method:

In [36]:

fig = px.line(data_frame = long_df, x = "Year" , y = "medals", color = "country")

In [37]:

fig.show()

A look behind the scenes: Plotly object structure

Behind the scenes, each figure is a dictionary-like object. When you store the figure in a variable (commonly fig), you can display this dictionary with fig.to_dict(), look at the data elements with fig["data"] or fig.data, and review the design of the plot with fig["layout"] or fig.layout.

We can use .to_dict().keys() to see all keys inside the fig object:

In [40]:

fig.to_dict().keys()

Out[40]:

dict_keys(['data', 'layout'])

There are two items inside fig.data because we have two lines, one for Canada and one for USA.

In [38]:

fig.data

Out[38]:

(Scatter({
     'hovertemplate': 'country=Canada<br>Year=%{x}<br>medals=%{y}<extra></extra>',
     'legendgroup': 'Canada',
     'line': {'color': '#636efa', 'dash': 'solid'},
     'marker': {'symbol': 'circle'},
     'mode': 'lines',
     'name': 'Canada',
     'orientation': 'v',
     'showlegend': True,
     'x': array(['2004', '2008', '2012', '2016'], dtype=object),
     'xaxis': 'x',
     'y': array([4, 3, 5, 3]),
     'yaxis': 'y'
 }), Scatter({
     'hovertemplate': 'country=USA<br>Year=%{x}<br>medals=%{y}<extra></extra>',
     'legendgroup': 'USA',
     'line': {'color': '#EF553B', 'dash': 'solid'},
     'marker': {'symbol': 'circle'},
     'mode': 'lines',
     'name': 'USA',
     'orientation': 'v',
     'showlegend': True,
     'x': array(['2004', '2008', '2012', '2016'], dtype=object),
     'xaxis': 'x',
     'y': array([5, 9, 1, 2]),
     'yaxis': 'y'
 }))

In [39]:

fig.layout

Out[39]:

Layout({
    'legend': {'title': {'text': 'country'}, 'tracegroupgap': 0},
    'margin': {'t': 60},
    'template': '...',
    'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Year'}},
    'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'medals'}}
})

As you can see, there are many attributes inside this dictionary, and they can be changed even after the plot has been created. For example, we can apply a layout template to modify the design of a plot, or change the plot and axis titles:

In [42]:

fig.update_layout(template="plotly_dark", title = "Example", yaxis_title='Medals Earned')

This update is not only displayed, it has also changed the plot object. See how the layout part is different now:

In [43]:

fig.layout

Out[43]:

Layout({
    'legend': {'title': {'text': 'country'}, 'tracegroupgap': 0},
    'margin': {'t': 60},
    'template': '...',
    'title': {'text': 'Example'},
    'xaxis': {'anchor': 'y', 'domain': [0.0, 1.0], 'title': {'text': 'Year'}},
    'yaxis': {'anchor': 'x', 'domain': [0.0, 1.0], 'title': {'text': 'Medals Earned'}}
})

We can also modify the attributes of the data using the update_traces method. For example, we change all lines to be dashed:

In [44]:

fig.update_traces(line={"dash":"dash"})

We will see more ways of modifying the plots as we go through the different types of plots we can make!

More fun with line graphs

There are even more things we can do with line graphs!

Let's use a DataFrame with more rows and columns. We will use the gapminder dataset, which comes bundled with Plotly; we can load it with px.data.gapminder(). Let's see what kind of dataset this is:

In [49]:

gapminder_data = px.data.gapminder()
gapminder_data

Out[49]:

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
0	Afghanistan	Asia	1952	28.801	8425333	779.445314	AFG	4
1	Afghanistan	Asia	1957	30.332	9240934	820.853030	AFG	4
2	Afghanistan	Asia	1962	31.997	10267083	853.100710	AFG	4
3	Afghanistan	Asia	1967	34.020	11537966	836.197138	AFG	4
4	Afghanistan	Asia	1972	36.088	13079460	739.981106	AFG	4
...	...	...	...	...	...	...	...	...
1699	Zimbabwe	Africa	1987	62.351	9216418	706.157306	ZWE	716
1700	Zimbabwe	Africa	1992	60.377	10704340	693.420786	ZWE	716
1701	Zimbabwe	Africa	1997	46.809	11404948	792.449960	ZWE	716
1702	Zimbabwe	Africa	2002	39.989	11926563	672.038623	ZWE	716
1703	Zimbabwe	Africa	2007	43.487	12311143	469.709298	ZWE	716

1704 rows × 8 columns

For now we would like to only use countries from Oceania. Can you help me to subset the dataframe?

In [ ]:

#Let's subset the data

In [ ]:

#@title Solution
df = gapminder_data.loc[gapminder_data['continent'] == 'Oceania']
df.sample(5)

Color argument

As shown above, we can color the lines based on a DataFrame column by using the color argument. In this example, we plot the life expectancy column against the year column, and the lines are colored by the content of the country column. This also gives us a separate line for each country.

In [74]:

# We can separate the data from the different countries by color using the argument `color`
# Separating the px.line call into several lines like this is purely aesthetic. It does not influence the flow of the execution.
fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country')
fig.show()

If you instead want to change the color of all the lines, use the update_traces() method:

In [69]:

fig.update_traces(line={"color":"red"})
fig.show()

Color_discrete_map argument

We can also decide the color palette to use with color_discrete_map. In this case, we need to specify for each level of the variable to color by, here country, what color should be used:

In [75]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              color_discrete_map = {"Australia":"Black", "New Zealand": "Red"})

fig.show()

Title argument

We have already seen how to change a plot's title with update_layout(), but we can also pass a title directly when we create the plot, using the title argument.

In [76]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              color_discrete_map = {"Australia":"Black", "New Zealand": "Red"},
              title="Life expectancy in Oceania")
fig.show()

Text argument

We can also display a value next to each point on the line by using the text argument. Here we pass the lifeExp column, so each point is labeled with its life expectancy (its y value).

In [77]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              color_discrete_map = {"Australia":"Black", "New Zealand": "Red"},
              title="Life expectancy per year",
              text="lifeExp") #The text argument allows us to plot the actual number on the datapoint
fig.show()

Notice how the text argument placed the labels right on top of the data points? We can change this behaviour with the update_traces() method, which modifies all traces inside fig.data.

In [78]:

fig.update_traces(textposition="top center")
fig.show()

Line_dash argument

By using the line_dash argument, we can change the dash pattern of the lines based on a variable.

In [79]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              text="lifeExp",
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              line_dash = "country",
              title="Life expectancy per year")

fig.update_traces(textposition="top center")
fig.show()

If you instead want all lines to be dashed, use the update_traces() method as shown above. Common values are "dash", "dot" and the default "solid"; Plotly also accepts "longdash", "dashdot" and "longdashdot".

In [ ]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              text="lifeExp",
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              title="Life expectancy per year")

fig.update_traces(textposition="top center", line = {"dash" : "dot"}) #now all lines are dotted, it does not depend on the country column anymore.
fig.show()

Line_dash_map argument

Similar to color_discrete_map there is also line_dash_map to specify the line type at creation.

Note that for this to work you need to specify the line_dash argument (the column the dash pattern should depend on); otherwise line_dash_map has nothing to map.

In [80]:

fig = px.line(df,
              x="year",
              y="lifeExp",
              color="country",
              text="lifeExp",
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              line_dash = "country",
              line_dash_map = {"Australia":"solid", "New Zealand": "dot"},
              title="Life expectancy per year")
fig.show()

Exercise 1: Line graphs

Now you!

Create a line graph of life expectancy per year for the continents 'Oceania' and 'Africa'.

In [ ]:

#create the dataframe and verify that it has the data you want

In [ ]:

#now make the plot

Color by the country and change the line type by the continent

In [ ]:

Change the template of the plot. Check out templates here

In [ ]:

Quiz

What would you do if instead of a line chart you wanted to show the data in a scatter plot?

Scatter plots

Scatter plots are coordinate plots that use x and y coordinates to show the relationship between two variables. However, the values of the variables do not necessarily need to be linked or ordered like in a line plot.

Plotting a scatter plot is very much like plotting a line plot, but we use the px.scatter() function. Many of the arguments shown previously for the line plots work here as well, for example, the color argument:

In [107]:

df = gapminder_data.loc[gapminder_data["continent"] == 'Europe']

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 color="country")
fig.show()

Symbol argument

If you want to differentiate the countries from each other even further, you can use the symbol argument to draw different types of marker symbols, not just dots/circles.

In [100]:

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 color="country",
                 symbol='country')
fig.show()

Size argument

We can also vary the size of the dots to create bubble plots:

In [101]:

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 color="country",
                 size='pop') # Using population as the size for the plot

fig.show()

Trendline argument

We can easily add trendlines to our scatter plot with the trendline argument. Setting trendline="ols" fits an Ordinary Least Squares trendline (linear regression). Note that the "ols" and "lowess" trendlines require the statsmodels package to be installed.

We quickly see that, for Europe as a whole, the relationship between GDP and life expectancy does not look linear.

In [102]:

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 trendline = "ols") # fitting a trendline with ordinary least squares

fig.show()

If you have separated the countries using the color argument, you will get a trendline per country.

This will look quite ugly since there are many countries. Some of them actually look like the relationship could be linear.

In [103]:

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 color="country",
                 trendline = "ols")

fig.show()

If you want to color by a variable but still have a single global trend, use the argument trendline_scope="overall". We will also switch to a non-linear fit called LOWESS (Locally Weighted Scatterplot Smoothing). This type of fit is sometimes also called LOESS, if you are familiar with that term.

In [108]:

fig = px.scatter(df,
                 x="lifeExp",
                 y="gdpPercap",
                 color="country",
                 trendline = "lowess",
                 trendline_scope="overall")

fig.show()

Exercise 2: Scatter plots and trendlines

Using the data from 'Africa', create a scatter plot using GDP and population. Try to make the countries as distinguishable as possible.

In [ ]:

Make two separate plots that model the correlation between GDP and population for each country, once using an OLS fit and once a LOWESS fit. Which fit do you think looks more convincing?

In [ ]:

#ols

In [ ]:

#lowess

Bar Charts

With px.bar(), each row of the DataFrame is represented as a rectangular mark. Bar plots are very useful to show quantitative information across qualitative features such as years, countries or other categorical data.

px.bar() shares a lot of arguments with px.line() and px.scatter(), for example color:

In [114]:

df = gapminder_data.loc[gapminder_data["continent"] == 'Oceania']
fig = px.bar(df, x='year', y='pop', color='country')
fig.show()

Orientation argument

If we would rather see horizontal bars instead of vertical, we can set the argument orientation to "h". Note that we need to change the order of the x and y arguments now!

In [ ]:

fig = px.bar(df, x='pop', y='year', color='country', orientation="h")
fig.show()

Text on bar charts

You can add text to bars using the text_auto or text argument. text_auto=True will automatically use the same variable as the y argument, while you can use any variable with text.

Let's try this with a different built-in Plotly dataset, Olympic medals:

In [117]:

df = px.data.medals_long()
df

Out[117]:

	nation	medal	count
0	South Korea	gold	24
1	China	gold	10
2	Canada	gold	9
3	South Korea	silver	13
4	China	silver	15
5	Canada	silver	12
6	South Korea	bronze	11
7	China	bronze	8
8	Canada	bronze	12

We would like to see the different types of medals (gold, silver, bronze) per country.

Luckily for us the data is already aggregated so we can directly use the count column for the height of the bar (the y-axis).

In [118]:

fig = px.bar(df, x="medal", y="count", color="nation", text="nation")
fig.show()

By default, Plotly will scale and rotate text labels to maximize the number of visible labels, which can result in a variety of text angles and sizes and positions in the same figure. The textfont, textposition and textangle trace attributes can be used to control these.

In addition, you can pass a format string to the text_auto argument to format the text shown in the plot.

To see the default behaviour, we will plot the populations of European countries and label the bars with auto text.

Let's use everything we learned yesterday and make the mother of all selections: rows for Europe, for the year 2007, and only for countries with a population greater than 2 million (nobody cares about Liechtenstein!).

In [134]:

df = gapminder_data.loc[(gapminder_data["continent"] == 'Europe') & (gapminder_data['year'] == 2007) & (gapminder_data['pop']>2.e6)]
df.head()

Out[134]:

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
23	Albania	Europe	2007	76.423	3600523	5937.029526	ALB	8
83	Austria	Europe	2007	79.829	8199783	36126.492700	AUT	40
119	Belgium	Europe	2007	79.441	10392226	33692.605080	BEL	56
155	Bosnia and Herzegovina	Europe	2007	74.852	4552198	7446.298803	BIH	70
191	Bulgaria	Europe	2007	73.005	7322858	10680.792820	BGR	100

In [122]:

fig = px.bar(df, y='pop', x='country', text_auto='.2s', # '.2s': 2 significant digits with an SI suffix, e.g. 8.2M
            title="Default: various text sizes, positions and angles")
fig.show()

Again we can use update_traces() to control the angle of the text (set to 0) and the position (outside the bar) and the font size.

In [125]:

fig = px.bar(df, y='pop', x='country', text_auto='.2s',
            title="Controlled text sizes, positions and angles")

fig.update_traces(textfont_size=12, textangle=0, textposition="outside")
fig.update_layout(yaxis_range=[0,10**8]) # We increase the range of the plot so the text fits
fig.show()

Sorting bar charts

We can influence the order in which the bars are shown by setting the axis categoryorder, like so:

'total ascending' sorts the bars by their total y value, in ascending order.

In [127]:

fig.update_xaxes(categoryorder='total ascending')
fig.show()

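We could also impose an alphabetical sort. Here it looks the same as the default, because by default ('trace') the bars keep the order in which the countries appear in the data, and the gapminder data is already sorted alphabetically.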

In [130]:

fig.update_xaxes(categoryorder='category ascending')
fig.show()

Lastly, you can impose your own custom order by using a category array:

Can someone see what this is sorted by (roughly)?

In [132]:

fig.update_xaxes(categoryorder='array', categoryarray= ['Portugal','Spain','Ireland','France', 'United Kingdom', 'Belgium', 'Netherlands', 'Switzerland', 'Italy',
                                                        'Germany','Denmark','Norway','Austria','Sweden', 'Czech Republic', 'Slovenia', 'Croatia', 'Poland',
                                                        'Slovak Republic', 'Hungary','Bosnia and Herzegovina', 'Albania', 'Serbia','Greece', 'Romania',
                                                        'Bulgaria','Finland', 'Turkey'])
fig.show()

Stacked vs Grouped Bars

We will not always have data that is already aggregated.

For example, let's take the tips dataset which reports on tips given by customers and some information about the customer. It looks like this:

In [136]:

df = px.data.tips()
df.sample(10)

Out[136]:

	total_bill	tip	sex	smoker	day	time	size
212	48.33	9.00	Male	No	Sat	Dinner	4
5	25.29	4.71	Male	No	Sun	Dinner	4
216	28.15	3.00	Male	Yes	Sat	Dinner	5
16	10.33	1.67	Female	No	Sun	Dinner	3
31	18.35	2.50	Male	No	Sat	Dinner	4
45	18.29	3.00	Male	No	Sun	Dinner	2
196	10.34	2.00	Male	Yes	Thur	Lunch	2
136	10.33	2.00	Female	No	Thur	Lunch	2
75	10.51	1.25	Male	No	Sat	Dinner	2
239	29.03	5.92	Male	No	Sat	Dinner	3

We would like to make a bar plot that shows the total bill for men and for women, so sex goes on the x-axis. We would also like the bars to be colored by time (Lunch or Dinner).

We could do the following:

In [ ]:

fig = px.bar(df, x="sex", y="total_bill", color='time')
fig.show()

You see that you get a lot of small bars stacked on top of each other, creating white lines in the plot. If we want to get rid of these, we'll need to make a dataframe that has the sum of the total bill split up by sex and time and then plot that.

Luckily, we learned groupby yesterday! Can you help me to create the dataframe we need?

In [ ]:

#@title Solution
bills = df.groupby(['sex','time']).total_bill.sum().reset_index()
bills

Now we can plot it easily:

In [147]:

fig2 = px.bar(bills, x="sex", y="total_bill", color='time')
fig2.show()

Now you see how pandas and plotly interact and complement each other.

What if we wanted the two chunks for lunch and dinner to be next to each other instead of on top of each other?

We can set the barmode argument to group:

In [148]:

fig3 = px.bar(bills, x="sex", y="total_bill",
             color='time', barmode='group')
fig3.show()

Histograms

In statistics, a histogram is a representation of the distribution of numerical data, where the data are binned and the count for each bin is represented. More generally, in Plotly a histogram is an aggregated bar chart, with several possible aggregation functions (e.g. sum, average, count...) which can be used to visualize data on categorical and date axes as well as linear axes.

Compared to px.bar(), px.histogram() can work with only the x argument, which can be a continuous or a categorical variable:

In [ ]:

fig = px.histogram(df, x="total_bill", title = "Continuous variable")
fig.show()

In [ ]:

fig = px.histogram(df, x="day", title="Categorical variable")
fig.show()

px.histogram() also shares the color, text_auto and barmode arguments:

In [ ]:

fig = px.histogram(df,
                   x="total_bill",
                   color="sex",
                   text_auto=True)
fig.show()

Bins argument

By default, the number of bins is chosen so that it is comparable to the typical number of samples in a bin. You can customize it with the nbins argument (Plotly treats nbins as the maximum number of bins, so the chart may show fewer):

In [ ]:

fig = px.histogram(df, x="total_bill", nbins=20)
fig.show()

Histnorm argument

The default mode is to represent the count of samples in each bin. With the histnorm argument, it is also possible to represent the percentage or fraction of samples in each bin (histnorm='percent' or 'probability'), a density histogram where the sum of all bar areas equals the total number of sample points (histnorm='density'), or a probability density histogram where the sum of all bar areas equals 1 (histnorm='probability density').

In [ ]:

fig = px.histogram(df, x="total_bill", histnorm='probability density')
fig.show()

The y-axis of histograms: Histfunc

Usually, we do not pass a y value when we plot a histogram, because y is the count of rows in each bin, i.e. how many bills were between 0 and 10 dollars, how many between 10 and 20 dollars, and so on.

This behavior can be changed by passing a histfunc, which tells Plotly to aggregate in some other way than counting the number of occurrences.

In the below example we will use the average of the tip column as the y-axis instead:

In [149]:

fig = px.histogram(df, x="total_bill", y="tip", histfunc='avg')
fig.show()

Because the default histfunc is sum when we pass a y value, we can actually use this to get around our earlier problem with the striped bar plots without calculating the values beforehand! How handy!

Switch px.bar for px.histogram and pass a y-value:

In [150]:

  Copied!     
 
fig = px.histogram(df, x="sex", y="total_bill",
             color='time')
fig.show()

Compare to what we got before:

fig = px.bar(df, x="sex", y="total_bill", color='time')
fig.show()
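For comparison, this is the "calculate the values beforehand" route that the histogram lets us skip. A minimal pandas sketch with a handful of hypothetical rows (not the real tips data) that sums total_bill per sex and time with groupby, giving one row per group:

```python
import pandas as pd

# Hypothetical rows standing in for the tips data
sample = pd.DataFrame({
    "sex": ["Female", "Female", "Male", "Male", "Male"],
    "time": ["Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
    "total_bill": [20.0, 10.0, 15.0, 25.0, 12.0],
})

# One row per (sex, time) pair holding the summed bill: these are the
# segment heights that px.histogram(..., y="total_bill") stacks
totals = sample.groupby(["sex", "time"], as_index=False)["total_bill"].sum()
print(totals)
```

Passing totals to px.bar(totals, x="sex", y="total_bill", color="time") then draws one solid segment per group instead of one stripe per row.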

Exercise 3: Bar charts and histograms¶

Use the gapminder data for Oceania and show the GDP per capita (gdpPercap) for each year in a bar plot.

Now separate the bars into countries and put them next to each other instead of stacked on top of each other.

Have a look at the dataframe created below. What does it contain?

df = gapminder_data.loc[(gapminder_data["continent"] == 'Europe') & (gapminder_data['year'].isin([1987,2007]))]
df.head()

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
19	Albania	Europe	1987	72.000	3075321	3738.932735	ALB	8
23	Albania	Europe	2007	76.423	3600523	5937.029526	ALB	8
79	Austria	Europe	1987	74.940	7578903	23687.826070	AUT	40
83	Austria	Europe	2007	79.829	8199783	36126.492700	AUT	40
115	Belgium	Europe	1987	75.350	9870200	22525.563080	BEL	56

Plot a histogram of this dataframe that shows the life expectancy for countries in Europe, colored by the year. Display the count inside the bar.

How many countries had a life expectancy of less than 75 in 1987? How many had a life expectancy of less than 75 in 2007?

Bonus question: Using the tips dataset, create a chart that displays the average total bill depending on the day of the week. You will need to use a histfunc for this.

fig = px.violin(df, y="lifeExp", x="year")
fig.show()

Box plots and violin plots¶

Box plots and violin plots are another way of showing data distributions. px.box() and px.violin() share almost all their arguments and can be used interchangeably.

df = px.data.tips()
fig = px.box(df, y="tip", x="smoker", color="sex")
fig.show()

fig = px.violin(df, y="tip", x="smoker", color="sex")
fig.show()

Points argument¶

You can show the underlying data points inside the plots with points="all", show only the outliers with points="outliers", or hide all points with points=False.

fig = px.violin(df, y="total_bill", x="smoker", color="sex", points = "all")
fig.show()

fig = px.box(df, y="total_bill", x="smoker", color="sex", points = False)
fig.show()

Boxplot inside violin¶

You can show a boxplot inside a violin plot using box=True.

fig = px.violin(df, y="tip", x="smoker", color="sex", box=True)
fig.show()

Notched boxplot¶

You can add notches to your boxplot using notched=True. The notches mark a confidence interval around the median.

fig = px.box(df, y="total_bill", x="smoker", color="sex", points="all", notched=True)
fig.show()
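Plotly's reference documentation describes the notch as the interval median ± 1.57 × IQR / √N, where IQR is the interquartile range and N the sample size. To make that concrete, here is a plain-Python sketch for a small hypothetical sample. It uses Python's "inclusive" quartile method; Plotly computes quartiles according to its own quartilemethod setting, so its numbers may differ slightly.

```python
import math
import statistics

# Hypothetical sample (e.g. a handful of total bills)
values = [10.0, 12.5, 13.0, 15.5, 17.0, 18.5, 21.0, 24.0, 30.0]

median = statistics.median(values)
# First and third quartiles, interpolating linearly between data points
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Notch half-width: 1.57 * IQR / sqrt(N)
half_width = 1.57 * iqr / math.sqrt(len(values))
print(f"median={median}, notch from {median - half_width:.2f} to {median + half_width:.2f}")
```

Because the half-width shrinks with √N, notches are wide for small groups; if the notches of two boxes do not overlap, that is a rough hint that their medians differ.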

Show mean¶

We can show the mean in a boxplot by updating its traces with boxmean=True, and in a violin plot with meanline_visible=True.

fig = px.box(df, y="total_bill", x="smoker", color="sex", points="all", notched=True)
fig.update_traces(boxmean=True)
fig.show()

fig = px.violin(df, y="total_bill", x="smoker", color="sex", points="all", box=True)
fig.update_traces(meanline_visible=True)
fig.show()

Exercise 4: Boxplots and violin plots¶

Again, using the following dataframe:

df = gapminder_data.loc[(gapminder_data["continent"] == 'Europe') & (gapminder_data['year'].isin([1987,2007]))]
df.head()

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
19	Albania	Europe	1987	72.000	3075321	3738.932735	ALB	8
23	Albania	Europe	2007	76.423	3600523	5937.029526	ALB	8
79	Austria	Europe	1987	74.940	7578903	23687.826070	AUT	40
83	Austria	Europe	2007	79.829	8199783	36126.492700	AUT	40
115	Belgium	Europe	1987	75.350	9870200	22525.563080	BEL	56

Make a boxplot of life expectancy versus the year.

Now do the same as a violin plot.

Which one do you prefer as a visualization and why?

Heatmaps¶

The px.imshow() function can be used to display heatmaps (as well as full-color images, as its name suggests). It accepts array-like objects such as lists of lists as well as pandas.DataFrame objects. Heatmaps are particularly useful for displaying correlations between the variables in a dataset.

We can use DataFrame.corr() to see how strongly the variables in the tips dataset are correlated with each other. Correlation can only be calculated on numerical columns, so we pass numeric_only=True to skip the text columns (recent pandas versions raise an error otherwise).

df = px.data.tips()
df.corr(numeric_only=True)  # only total_bill, tip and size are numeric

	total_bill	tip	size
total_bill	1.000000	0.675734	0.598315
tip	0.675734	1.000000	0.489299
size	0.598315	0.489299	1.000000
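An equivalent way to restrict the calculation to numeric columns is to select them first with select_dtypes. A minimal sketch on a hypothetical mini version of the tips data that includes a text column:

```python
import pandas as pd

# Hypothetical mini version of the tips data, including a text column
sample = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 2.5, 3.0, 5.0],
    "day": ["Thur", "Fri", "Sat", "Sun"],
})

# Two equivalent ways to correlate only the numeric columns
corr_a = sample.corr(numeric_only=True)
corr_b = sample.select_dtypes(include="number").corr()
print(corr_a)
```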

px.imshow(df.corr(numeric_only=True), text_auto=True)

We can change the color scale using the color_continuous_scale argument (the _r suffix reverses a color scale).

px.imshow(df.corr(numeric_only=True), text_auto=True, color_continuous_scale='RdBu_r')

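By default, the color scale stretches from the smallest to the largest value in the data. With the range_color argument we can fix that range explicitly: setting it to [-1, 1], the full range of a correlation coefficient, centers the diverging RdBu_r scale on zero. text_auto='.2f' rounds the displayed values to two decimals.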

px.imshow(df.corr(numeric_only=True), text_auto='.2f',
          color_continuous_scale='RdBu_r', range_color=[-1, 1])

Exercise 5: Heatmaps¶

Extract the data for the continent Europe from the gapminder dataset and calculate the correlation between its numerical columns. Plot the result in a heatmap. What do you observe? Are the correlations what you expected?

Change the color scheme to something you find pleasing and add the correlation values to the squares.

Now do the same for Africa. What do you observe? Are you surprised?

Advanced plotting¶

Another useful feature of many plot types is splitting the chart into rows or columns depending on a variable, with the facet_row and facet_col arguments. For example, we can split the life expectancy data into one plot per country using facet_col="country":

df = px.data.gapminder().query("continent=='Oceania'")
fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              facet_col ="country",
              text="lifeExp",
              title="Life expectancy per year")

fig.update_traces(textposition="top center")
fig.show()

Plot marginals¶

In scatter plots and histograms, you can add extra plots along the margins of the main plot, called marginal plots: "histogram", "rug", "box" or "violin". In px.scatter they are added with the marginal_x and marginal_y arguments; px.histogram takes a single marginal argument.

df = px.data.iris()
df.head()

	sepal_length	sepal_width	petal_length	petal_width	species	species_id
0	5.1	3.5	1.4	0.2	setosa	1
1	4.9	3.0	1.4	0.2	setosa	1
2	4.7	3.2	1.3	0.2	setosa	1
3	4.6	3.1	1.5	0.2	setosa	1
4	5.0	3.6	1.4	0.2	setosa	1

fig = px.scatter(df,
                 x="sepal_length",
                 y="sepal_width",
                 color="species",
                 marginal_x="box",
                 marginal_y="violin",
                 size='petal_width',
                 hover_name="species")
fig.show()

Exercise 6: Marginals and facets¶

Can you get a scatter plot with histograms as marginal plots instead of the box and violin plots?

Split the previous plot into facets using the species variable.

Error argument¶

In scatter, line and bar plots we can show error bars, such as confidence intervals or measurement errors, with the error_y and error_x arguments, which draw the error along the y or x axis respectively.

Note: you need another column that contains the error values! Below, we create an error column purely for demonstration.

df = px.data.gapminder().query("continent=='Oceania'").copy()
df['e'] = df["lifeExp"]/100  # made-up error (1% of life expectancy), for demonstration only
fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              error_y='e',
              title="Life expectancy per year")

fig.update_traces(line={"dash": "dot"})  # draw dotted lines
fig.show()

Modifying Tooltips¶

Tooltips are the small boxes that pop up when you hover the mouse over a data point in the plot. We can modify their behavior with these arguments:

hover_name - shows the value of this column in bold at the top of the tooltip
hover_data - adds columns to or removes columns from the tooltip, e.g. with a dict that maps column names to True/False
labels - renames columns inside the tooltip (the new names are also used for axis titles and the legend)

df = px.data.gapminder().query("continent=='Oceania'")
fig = px.line(df, x="year", y="lifeExp", color='country')
fig.show()

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              hover_name="country",
              hover_data = {"country" : False}, # we remove country from the tooltip
              labels={"year": "Year"}, # change year for Year
              title="Life expectancy per year")
fig.show()

Range Slider and Selector in Python¶

You can use sliders to navigate the range of your axis. This can for instance be very useful when visualizing time-series data. (https://plotly.com/python/reference/layout/xaxis/#layout-xaxis-rangeslider)

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              facet_col ="country",
              hover_name="country",
              text="lifeExp",
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              line_dash = "country",
              title="Life expectancy per year")

fig.update_traces(textposition="top center")
fig.update_xaxes(rangeslider_visible=True)
fig.show()

Exercise 7: Range sliders¶

Using the gapminder data for Africa, create a scatter plot with a range slider.

Modify the tooltip so that, when you hover over a point, it shows the life expectancy, population, GDP per capita and country code.

Changing axis ticks¶

If we do not like the ticks on an axis, we can change them with the update_xaxes() or update_yaxes() method: tickvals gives the positions (in data units) where ticks should appear, and ticktext gives the label to show at each of those positions.

df = px.data.gapminder().query("continent=='Oceania'")

fig = px.line(df,
              x="year",
              y="lifeExp",
              color='country',
              facet_col ="country",
              hover_name="country",
              text="lifeExp",
              color_discrete_map  = {"Australia":"Black", "New Zealand": "Red"},
              line_dash = "country",
              title="Life expectancy per year")

fig.update_xaxes(
    ticktext=["50s", "60s", "70s", "80s", "90s", "00s"],
    tickvals=[1950, 1960, 1970, 1980, 1990, 2000],  # numbers, like the year column
)
fig.show()

Animating your plot¶

Several Plotly Express functions support the creation of animated figures through the animation_frame and animation_group arguments (https://plotly.com/python/animations/).

To make the animation look nicer, we use the orientation argument to make the bars horizontal, and fix the x-axis with range_x so that it does not rescale from frame to frame. In addition, the gdpPercap values have too many decimals. We can change how the text values look, again with the update_traces() method, by setting a texttemplate whose format specifier %{text:.2f} displays only 2 decimals.

df = px.data.gapminder().query("continent=='Oceania'")

fig = px.bar(df, 
             y="country", 
             x="gdpPercap", 
             color="country",
             orientation="h", 
             animation_frame="year",
             animation_group="country",
            title="Evolution of GDP",
            text="gdpPercap", range_x=[5000, 40000])

fig.update_traces(texttemplate='%{text:.2f}')
fig.show()

Exercise 8: Animations¶

Recreate the above animation for data from Africa, but show the development of life expectancy over time, with the GDP per capita as text inside the bars. Remember to separate the countries.
