---
title: "Getting Started"
output: html_document
vignette: >
  %\VignetteIndexEntry{Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
```{r setup}
library(ddplot)
```
`D3.js` is a well-known JavaScript library for creating extremely flexible SVG graphics. However, `D3` has (at least in my experience) a pretty steep learning curve, and understanding some of its core concepts requires a basic knowledge of `HTML`, `CSS` and `JavaScript`. `ddplot` aims to simplify the process with a set of functions that render several kinds of graphics through a simple `R` API. Finally, `ddplot` is built upon the amazing `r2d3` package, which makes it a breeze to interface `D3.js` with `R`, so a big thanks to its developers.
# `scatterPlot()`
Let's work with the `mpg` data frame from the `ggplot2` package.
```{r fig.align='center', message=FALSE, warning=FALSE}
library(ggplot2) # needed for the mpg data frame
scatterPlot(
data = mpg,
x = "hwy",
y = "cty",
xtitle = "hwy variable",
ytitle = "cty variable",
title = "cty and hwy relationship",
titleFontSize = 20
)
```
Compared to `ggplot2`, the customization options in `ddplot` are limited. Nonetheless, you get a fully vector-based SVG graphic, which is cool.
```{r, fig.align='center'}
scatterPlot(
data = mpg,
x = "displ",
y = "cty",
col = "tomato",
bgcol = "pink",
size = 3,
stroke = "royalblue",
strokeWidth = 1,
xtitle = "displ variable",
ytitle = "cty variable",
xticks = 3,
yticks = 3)
```
# `histogram()`
The `histogram()` function allows you to visualize the distribution of a vector of data:
```{r}
histogram(
x = mpg$hwy,
bins = 20,
fill = "crimson",
stroke = "white",
strokeWidth = 1,
title = "Distribution of the hwy variable",
width = "20",
height = "10"
)
```
# `animatedHistogram()`
This function creates a one-click histogram animation, which is useful for presentations. Click on the empty plot below and see what happens:
```{r}
animatedHistogram(
x = mpg$hwy,
duration = 2000,
delay = 100,
fill = "lime",
stroke = "white",
bgcol = "white"
)
```
Note that you can customize the animation using the two parameters `duration` and `delay`.
# `barChart()`
The `barChart()` function allows you to create bar charts. However, you need to aggregate the data beforehand. In the following example, we will plot the average `cty` for each `manufacturer` using the `dplyr` package.
```{r fig.align='center', message=FALSE, warning=FALSE}
library(dplyr)
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
barChart(
x = "manufacturer",
y = "mean_cty",
xFontSize = 10,
yFontSize = 10,
fill = "orange",
strokeWidth = 2,
ytitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer"
)
```
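If `dplyr` is not at hand, the aggregation step can also be sketched in base R. The block below is only an illustration: it uses the built-in `mtcars` data instead of `mpg`, and the names `avg_mpg`, `cylinders` and `mean_mpg` are chosen for this example.

```r
# Base-R sketch of the aggregation step (an alternative to the dplyr pipeline).
# mtcars ships with R; `cyl` plays the role of the category, `mpg` the value.
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Give the columns descriptive names before plotting.
names(avg_mpg) <- c("cylinders", "mean_mpg")

# One row per category, ready to be handed to a bar chart function.
avg_mpg
```

The result has one row per category, so it could then be passed on in the same way as the piped data above, for example `barChart(avg_mpg, x = "cylinders", y = "mean_mpg")`.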
The bars can be easily sorted in `ascending` or `descending` order using the `sort` parameter:
```{r message=FALSE, warning=FALSE}
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
barChart(
x = "manufacturer",
y = "mean_cty",
sort = "ascending",
xFontSize = 10,
yFontSize = 10,
fill = "orange",
strokeWidth = 1,
ytitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
```
# `horzBarChart()`
If you have many categories, it might be a good idea to go for a horizontal bar chart. It has the same parameters as the `barChart()` function, except that the x-axis parameter is named `value` and the y-axis parameter is named `label`. This naming convention aims to reduce the confusion that can arise.
If we want to replicate the above graphic in a horizontal way, we can do:
```{r}
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
horzBarChart(
label = "manufacturer",
value = "mean_cty",
sort = "ascending",
labelFontSize = 10,
valueFontSize = 10,
fill = "orange",
stroke = "crimson",
strokeWidth = 1,
valueTitle = "average cty value",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
```
As in `barChart()`, we can also sort in descending order:
```{r}
mpg %>% group_by(manufacturer) %>%
summarise(mean_cty = mean(cty)) %>%
horzBarChart(
label = "manufacturer",
value = "mean_cty",
sort = "descending",
labelFontSize = 10,
valueFontSize = 10,
bgcol = "black",
axisCol = "white",
fill = "white",
stroke = "white",
strokeWidth = 1,
valueTitle = "average cty value",
labelTitle = "Manufacturers",
title = "Average City Miles per Gallon by manufacturer",
titleFontSize = 16
)
```
# `lollipopChart()`
A lollipop chart behaves like a bar chart, but instead of bars you get lollipops, hence the name. Below is an example of a lollipop chart with `ddplot`:
```{r}
mpg %>% group_by(drv) %>%
summarise(median_cty = median(cty)) %>%
lollipopChart(
x = "drv",
y = "median_cty",
sort = "ascending",
xtitle = "drv variable",
ytitle = "median cty",
title = "Median cty per drv",
xFontSize = 20
)
```
It's also possible to grasp the distribution of a variable across the levels of a categorical variable using the same function:
```{r}
mpg %>% filter(year == 2008) %>%
lollipopChart(
x = "manufacturer",
y = "hwy",
circleFill = 'red',
circleStroke = 'orange',
circleRadius = 5,
sort = "none",
xFontSize = 10
)
```
From the chart above, it's quite easy to notice that although Toyota has two cars with high highway miles per gallon (`hwy`), it also produces many other vehicles with poor `hwy`.
# `horzLollipop()`
As with bar charts, if your categorical variable has many levels, you can work with the horizontal version of `lollipopChart()`, which is `horzLollipop()`:
```{r}
mpg %>% group_by(manufacturer) %>%
summarise(median_cty = median(cty)) %>%
horzLollipop(
label = "manufacturer",
value = "median_cty",
sort = "descending")
```
You can also do:
```{r}
mpg %>% filter(year == 2008) %>%
horzLollipop(
label = "manufacturer",
value = "hwy",
circleFill = 'red',
circleStroke = 'orange',
circleRadius = 5,
sort = "none"
)
```
# `pieChart()`
Pie charts and donut charts are pretty straightforward to set up. We'll use a sample from the `starwars` data frame to plot a simple pie chart.
```{r}
# starwars is a data frame that ships with the dplyr package
mini_starwars <- starwars %>% tidyr::drop_na(mass) %>%
sample_n(size = 5) # getting 5 random values
pieChart(
data = mini_starwars,
value = "mass",
label = "name"
)
```
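Because the five rows are drawn at random, the pie chart changes every time the document is knitted. A minimal sketch of how fixing the seed makes a draw repeatable is shown below; the `characters` data frame and its values are hypothetical stand-ins, and the seed value is arbitrary.

```r
# Hypothetical stand-in for a data frame with one row per character.
characters <- data.frame(
  name = paste0("character_", 1:8),
  mass = c(77, 75, 32, 136, 49, 120, 84, 112)
)

# Fixing the seed before sampling makes the random draw repeatable.
set.seed(123)
first_draw <- characters[sample(nrow(characters), 5), ]

set.seed(123)
second_draw <- characters[sample(nrow(characters), 5), ]

# Both draws select the same rows in the same order.
identical(first_draw$name, second_draw$name)
```

The same idea should carry over to the `sample_n()` call above: calling `set.seed()` right before it would keep `mini_starwars` stable across runs.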
Using the `padRadius`, `padAngle` and `cornerRadius` parameters, one can get fancier pie charts:
```{r}
pieChart(
data = mini_starwars,
value = "mass",
label = "name",
padRadius = 200,
padAngle = 0.1,
cornerRadius = 50,
innerRadius = 10
)
```
If you need a donut chart, you just need to play with the `innerRadius` parameter:
```{r}
pieChart(
data = mini_starwars,
value = "mass",
label = "name",
innerRadius = 120,
cornerRadius = 20,
title = "5 Starwars characters ranked by their mass",
titleFontSize = 16,
bgcol = "yellow"
)
```
# `lineChart()`
The `lineChart()` function is used to plot time series data. The user must provide a `date` variable in the `yyyy-mm-dd` format. In the following example, we'll use the built-in `AirPassengers` `ts` data and convert it to a classical data frame:
```{r}
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
passengers = as.matrix(AirPassengers),
date= zoo::as.Date(time(AirPassengers))
)
# 2. plotting the line chart
lineChart(
data = airpassengers,
x = "date",
y = "passengers"
)
```
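If the `zoo` package is not installed, the same conversion can be sketched in base R. This version assumes a monthly series and builds the dates from the start of the `ts` object; the name `airpassengers_base` is chosen for this example only.

```r
# AirPassengers is a monthly `ts` object that ships with R.
first <- start(AirPassengers)  # c(year, month) of the first observation

# Build one Date per observation, stepping one month at a time.
dates <- seq(
  from = as.Date(sprintf("%d-%02d-01", as.integer(first[1]), as.integer(first[2]))),
  by = "month",
  length.out = length(AirPassengers)
)

# Plain data frame with a numeric column and a Date column.
airpassengers_base <- data.frame(
  passengers = as.numeric(AirPassengers),
  date = dates
)

head(airpassengers_base, 3)
```

The `date` column is a regular `Date` vector, which prints in the `yyyy-mm-dd` format expected here.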
You can modify the line interpolation using the `curve` parameter:
```{r}
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveStep"
)
```
```{r}
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveCardinal"
)
```
```{r}
lineChart(
data = airpassengers,
x = "date",
y = "passengers",
curve = "curveBasis"
)
```
# `animLineChart()`
Heavily inspired by [Jure Stabuc's example](https://observablehq.com/@jurestabuc/animated-line-chart), the `animLineChart()` function creates an empty SVG; each time you click on it, a line chart animation starts. Note that the line remains visible after the animation ends. Go ahead, click on the empty graphic below:
```{r}
animLineChart(
data = airpassengers,
x = "date",
y = "passengers",
duration = 10000, # in milliseconds (10 seconds)
curve = "curveCardinal"
)
```
# `areaChart()`
`areaChart()` works similarly except that instead of a line you get an area.
```{r}
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
passengers = as.matrix(AirPassengers),
date= zoo::as.Date(time(AirPassengers))
)
# 2. plotting the area chart
areaChart(
data = airpassengers,
x = "date",
y = "passengers",
fill = "purple",
bgcol = "white"
)
```
# `areaBand()`
`areaBand()` lets you plot a filled area between two y-values. For the sake of the example, let's create an additional column `passengers_upper` that has an additional 40 passengers for each observation:
```{r}
airpassengers <- data.frame(
passengers_lower = as.matrix(AirPassengers),
passengers_upper = as.matrix(AirPassengers) + 40,
date= zoo::as.Date(time(AirPassengers))
)
areaBand(
data = airpassengers,
x = "date",
yLower = "passengers_lower",
yUpper = "passengers_upper",
fill = "yellow",
stroke = "black"
)
```
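A fixed offset is only one way to build the two bounds. As a hedged sketch, the band could also be derived from the data itself, here as 10% below and above each observation; the 10% width and the name `band` are arbitrary choices for illustration.

```r
# Monthly passenger counts as a plain numeric vector.
observed <- as.numeric(AirPassengers)

# Illustrative band: 10% below and above each observation
# (the 10% width is an arbitrary choice for this sketch).
band <- data.frame(
  date = seq(as.Date("1949-01-01"), by = "month", length.out = length(observed)),
  passengers_lower = observed * 0.9,
  passengers_upper = observed * 1.1
)

# Sanity check: the upper bound should never fall below the lower bound.
stopifnot(all(band$passengers_upper >= band$passengers_lower))

head(band, 3)
```

Since the column names match the chunk above, `band` could be passed to `areaBand()` with the same `x`, `yLower` and `yUpper` arguments.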
# `stackedAreaChart()`
This function allows you to create a stacked area chart. You need two components:
- A data frame in wide format (see an example below). If it's in long format, you can use `pivot_wider()` from the `tidyr` package to make it wider.
- A date variable in `yyyy-mm-dd` format that will be plotted on the x-axis.
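When the data starts out with one row per date and category, the reshaping step might look like the sketch below. The `long` data frame and its column names `sector` and `value` are hypothetical and only serve to illustrate `pivot_wider()`.

```r
library(tidyr)

# Hypothetical long-format data: one row per (date, sector) pair.
long <- data.frame(
  date   = rep(c("2000-01-01", "2000-02-01"), each = 2),
  sector = rep(c("Trade", "Leisure"), times = 2),
  value  = c(2000, 1782, 1023, 1779)
)

# Spread the sectors into columns: one row per date, one column per sector.
wide <- pivot_wider(long, names_from = sector, values_from = value)

# pivot_wider() returns a tibble; convert it if a plain data frame is preferred.
wide <- as.data.frame(wide)
wide
```

After reshaping, `wide` has a `date` column followed by one numeric column per sector, which is the layout described above.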
Let's work with the following data frame (shortened) provided by [Mike Bostock in his stacked area chart example](https://observablehq.com/@d3/stacked-area-chart):
```{r}
data <- data.frame(
date = c(
"2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01",
"2000-05-01", "2000-06-01", "2000-07-01",
"2000-08-01", "2000-09-01", "2000-10-01"
),
Trade = c(
2000,1023, 983, 2793, 1821, 1837, 1792, 1853, 791, 739
),
Manufacturing = c(
734, 694, 739, 736, 685, 621, 708, 685, 667, 693
),
Leisure = c(
1782, 1779, 1789, 658, 675, 833, 786, 675, 636, 691
),
Agriculture = c(
655, 587,623, 517, 561, 2545, 636, 584, 559, 2504
)
)
data
```
Note that `stackedAreaChart()` plots all the variables available in the data frame. If you want to restrict the plot to specific variables, drop the unneeded columns before calling the function. Here, all four series are plotted:
```{r}
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14
)
```
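Dropping columns before plotting can be done in base R in a couple of ways. The sketch below uses a small stand-in data frame, `employment`, whose name and contents are illustrative only.

```r
# Small stand-in for a wide data frame (first rows of three series).
employment <- data.frame(
  date = c("2000-01-01", "2000-02-01", "2000-03-01"),
  Trade = c(2000, 1023, 983),
  Manufacturing = c(734, 694, 739),
  Agriculture = c(655, 587, 623)
)

# Keep the date column plus the series to plot ...
keep_two <- employment[, c("date", "Trade", "Manufacturing")]

# ... or drop the series that should not appear.
drop_one <- subset(employment, select = -Agriculture)

# Both approaches leave the same columns.
names(keep_two)
names(drop_one)
```

Either result could then be passed to `stackedAreaChart()` in place of the full data frame, so that only the remaining series are drawn.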
You can modify the color scheme using the `colorCategory` parameter:
```{r}
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14,
curve = "curveCardinal",
colorCategory = "Accent",
bgcol = "white",
stroke = "black",
strokeWidth = 1
)
```
```{r}
stackedAreaChart(
data = data,
x = "date",
legendTextSize = 14,
curve = "curveBasis",
colorCategory = "Set3",
bgcol = "black",
axisCol = "white",
xticks = 4,
stroke = "black"
)
```
You can find the list of D3 categorical color schemes [here](https://github.com/d3/d3-scale-chromatic#categorical).
Finally, if you hover over the chart, you'll notice a tooltip that identifies the different area categories.
# `barChartRace()`
This function allows you to create an animated bar chart race. `barChartRace()` is similar to `barChart()` but takes a third variable mapped to the time dimension, with options for styling transitions.
Let's make a bar chart race of population growth among various countries using a subset of the `gapminder` dataset from the [{gapminder} package](https://github.com/jennybc/gapminder):
```{r, eval = FALSE}
gapminder_subset <- gapminder::gapminder %>%
select(country, year, pop) %>%
filter(country %in% c("Japan", "Mexico", "Germany", "Brazil", "Philippines", "Vietnam")) %>%
mutate(pop = pop/1e6)
gapminder_subset %>%
slice_sample(n = 10)
#> year pop country
#> 1 2007 91.07729 Philippines
#> 2 1997 76.04900 Vietnam
#> 3 1972 107.18827 Japan
#> 4 1967 39.46391 Vietnam
#> 5 1952 30.14432 Mexico
#> 6 1987 142.93808 Brazil
#> 7 1997 168.54672 Brazil
#> 8 1962 41.12148 Mexico
#> 9 1952 69.14595 Germany
#> 10 1957 91.56301 Japan
```
```{r, echo = FALSE}
gapminder_subset <- data.frame(
year = c(
1952L,1957L,1962L,1967L,1972L,1977L,
1982L,1987L,1992L,1997L,2002L,2007L,1952L,1957L,1962L,
1967L,1972L,1977L,1982L,1987L,1992L,1997L,2002L,2007L,
1952L,1957L,1962L,1967L,1972L,1977L,1982L,1987L,1992L,
1997L,2002L,2007L,1952L,1957L,1962L,1967L,1972L,1977L,
1982L,1987L,1992L,1997L,2002L,2007L,1952L,1957L,1962L,
1967L,1972L,1977L,1982L,1987L,1992L,1997L,2002L,2007L,
1952L,1957L,1962L,1967L,1972L,1977L,1982L,1987L,1992L,
1997L,2002L,2007L
),
pop = c(
56.60256,65.551171,76.03939,88.049823,
100.840058,114.313951,128.962939,142.938076,155.975974,
168.546719,179.914212,190.010647,69.145952,71.019069,73.739117,
76.368453,78.717088,78.160773,78.335266,77.718298,
80.597764,82.011073,82.350671,82.400996,86.459025,91.563009,
95.831757,100.825279,107.188273,113.872473,118.454974,
122.091325,124.329269,125.956499,127.065841,127.467972,30.144317,
35.015548,41.121485,47.995559,55.984294,63.759976,
71.640904,80.122492,88.11103,95.895146,102.479927,108.700891,
22.438691,26.072194,30.325264,35.3566,40.850141,46.850962,
53.456774,60.017788,67.185766,75.012988,82.995088,91.077287,
26.246839,28.998543,33.79614,39.46391,44.655014,50.533506,
56.142181,62.826491,69.940728,76.048996,80.908147,
85.262356
),
country = as.factor(c(
"Brazil","Brazil",
"Brazil","Brazil","Brazil","Brazil","Brazil",
"Brazil","Brazil","Brazil","Brazil","Brazil","Germany",
"Germany","Germany","Germany","Germany",
"Germany","Germany","Germany","Germany","Germany",
"Germany","Germany","Japan","Japan","Japan","Japan",
"Japan","Japan","Japan","Japan","Japan","Japan",
"Japan","Japan","Mexico","Mexico","Mexico",
"Mexico","Mexico","Mexico","Mexico","Mexico",
"Mexico","Mexico","Mexico","Mexico","Philippines",
"Philippines","Philippines","Philippines","Philippines",
"Philippines","Philippines","Philippines",
"Philippines","Philippines","Philippines","Philippines",
"Vietnam","Vietnam","Vietnam","Vietnam",
"Vietnam","Vietnam","Vietnam","Vietnam","Vietnam",
"Vietnam","Vietnam","Vietnam"
))
)
```
In this example, we simply call `barChartRace()` like `barChart()`, but with an additional variable mapped to the time dimension, specified with `time = "year"`:
```{r}
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations"
)
```
You can also stylize transitions with the `frameDur`, `transitionDur`, and `ease` arguments. For example, setting the time spent pausing on each frame to zero with `frameDur = 0` will create a smooth animation:
```{r}
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
transitionDur = 1000,
frameDur = 0,
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations"
)
```
As you might have noticed, the value of the column passed to the `time` argument is automatically labelled at the bottom-right corner of the plot panel. We can stylize this with a list of options passed to the `timeLabelOpts` argument (or turn it off with `timeLabel = FALSE`). We also give the bars a little bounce here with `ease = "BackInOut"` for fun.
```{r}
gapminder_subset %>%
barChartRace(
x = "pop",
y = "country",
time = "year",
ease = "BackInOut",
ytitle = "Country",
xtitle = "Population (in millions)",
title = "Bar chart race of country populations",
timeLabelOpts = list(
size = 40,
prefix = "Year: ",
xOffset = 0.2
)
)
```
# More to Come ...