---
title: "Getting Started"
output: html_document
vignette: >
  %\VignetteIndexEntry{Start}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(ddplot)
```

`D3.js` is a famous JavaScript library that allows one to create extremely flexible SVG graphics; however, `D3` has (at least in my experience) a pretty steep learning curve. Further, to understand some core concepts, one needs some basics in `HTML`, `CSS` and `JavaScript`.

`ddplot` aims to simplify the process with a set of functions that render several graphics through a simple `R` API.

Finally, `ddplot` is built upon the amazing `r2d3` package, which makes it a breeze to interface `D3.js` with `R`, so a big thanks to the developers.

# `scatterPlot()`

Let's work with the `mpg` data frame from the `ggplot2` package.

```{r fig.align='center', message=FALSE, warning=FALSE}
library(ggplot2) # needed for the mpg data frame

scatterPlot(
  data = mpg,
  x = "hwy",
  y = "cty",
  xtitle = "hwy variable",
  ytitle = "cty variable",
  title = "cty and hwy relationship",
  titleFontSize = 20
)
```

In comparison to `ggplot2`, customization of graphics in `ddplot` is limited; nonetheless, you get a fully vector SVG graphic, which is cool.

```{r, fig.align='center'}
scatterPlot(
  data = mpg,
  x = "displ",
  y = "cty",
  col = "tomato",
  bgcol = "pink",
  size = 3,
  stroke = "royalblue",
  strokeWidth = 1,
  xtitle = "displ variable",
  ytitle = "cty variable",
  xticks = 3,
  yticks = 3
)
```

# `histogram()`

The `histogram()` function allows you to visualize the distribution of a vector of data:

```{r}
histogram(
  x = mpg$hwy,
  bins = 20,
  fill = "crimson",
  stroke = "white",
  strokeWidth = 1,
  title = "Distribution of the hwy variable",
  width = "20",
  height = "10"
)
```

# `animatedHistogram()`

This function allows you to create a one-click histogram animation, which is useful for presentation purposes.
Click on the following empty plot and see what happens:

```{r}
animatedHistogram(
  x = mpg$hwy,
  duration = 2000,
  delay = 100,
  fill = "lime",
  stroke = "white",
  bgcol = "white"
)
```

Note that you can customize the animation using the two parameters `duration` and `delay`.

# `barChart()`

The `barChart()` function allows you to create bar charts; however, you need to aggregate the data beforehand. In the following example, we will plot the average `cty` for each `manufacturer` using the `dplyr` package.

```{r fig.align='center', message=FALSE, warning=FALSE}
library(dplyr)

mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  barChart(
    x = "manufacturer",
    y = "mean_cty",
    xFontSize = 10,
    yFontSize = 10,
    fill = "orange",
    strokeWidth = 2,
    ytitle = "average cty value",
    title = "Average City Miles per Gallon by manufacturer"
  )
```

The bars can easily be sorted in `ascending` or `descending` order using the `sort` parameter:

```{r message=FALSE, warning=FALSE}
mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  barChart(
    x = "manufacturer",
    y = "mean_cty",
    sort = "ascending",
    xFontSize = 10,
    yFontSize = 10,
    fill = "orange",
    strokeWidth = 1,
    ytitle = "average cty value",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )
```

# `horzBarChart()`

If you have many categories, it might be a good idea to go for a horizontal bar chart. It has the same parameters as the `barChart()` function, except that the x-axis parameter is named `value` and the y-axis parameter is named `label`. This naming convention aims to avoid the confusion that can otherwise arise.
If we want to replicate the above graphic horizontally, we can do:

```{r}
mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  horzBarChart(
    label = "manufacturer",
    value = "mean_cty",
    sort = "ascending",
    labelFontSize = 10,
    valueFontSize = 10,
    fill = "orange",
    stroke = "crimson",
    strokeWidth = 1,
    valueTitle = "average cty value",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )
```

As in `barChart()`, we can also sort in descending order:

```{r}
mpg %>%
  group_by(manufacturer) %>%
  summarise(mean_cty = mean(cty)) %>%
  horzBarChart(
    label = "manufacturer",
    value = "mean_cty",
    sort = "descending",
    labelFontSize = 10,
    valueFontSize = 10,
    bgcol = "black",
    axisCol = "white",
    fill = "white",
    stroke = "white",
    strokeWidth = 1,
    valueTitle = "average cty value",
    labelTitle = "Manufacturers",
    title = "Average City Miles per Gallon by manufacturer",
    titleFontSize = 16
  )
```

# `lollipopChart()`

A lollipop chart behaves like a bar chart, but instead of bars you get lollipops, hence the name. Below is an example of a lollipop chart with `ddplot`:

```{r}
mpg %>%
  group_by(drv) %>%
  summarise(median_cty = median(cty)) %>%
  lollipopChart(
    x = "drv",
    y = "median_cty",
    sort = "ascending",
    xtitle = "drv variable",
    ytitle = "median cty",
    title = "Median cty per drv",
    xFontSize = 20
  )
```

The same function can also show how a variable is distributed within each level of a categorical variable. Simply skip the aggregation step:

```{r}
mpg %>%
  filter(year == 2008) %>%
  lollipopChart(
    x = "manufacturer",
    y = "hwy",
    circleFill = 'red',
    circleStroke = 'orange',
    circleRadius = 5,
    sort = "none",
    xFontSize = 10
  )
```

From the chart above, it's quite easy to notice that although Toyota has two cars with high highway miles per gallon (`hwy`), it also produces many other vehicles with poor `hwy`.
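
To back up a visual reading like this one with numbers, one option is to summarise the same 2008 subset with `dplyr`. This is only a sketch: the object `hwy_spread` and the column names `n_cars`, `min_hwy`, `median_hwy` and `max_hwy` are names chosen here for illustration and are not part of `ddplot`.

```{r message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2) # needed for the mpg data frame

# Spread of hwy per manufacturer for the 2008 models
hwy_spread <- mpg %>%
  filter(year == 2008) %>%
  group_by(manufacturer) %>%
  summarise(
    n_cars = n(),             # number of 2008 models per manufacturer
    min_hwy = min(hwy),       # lowest highway mileage
    median_hwy = median(hwy), # typical highway mileage
    max_hwy = max(hwy)        # highest highway mileage
  ) %>%
  arrange(desc(max_hwy))      # manufacturers with the best single model first

hwy_spread
```

A wide gap between `min_hwy` and `max_hwy` for a manufacturer corresponds to a long vertical spread of lollipops in the chart.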

# `horzLollipop()`

As with bar charts, if a variable has many categories, you can use the horizontal version of `lollipopChart()`, which is `horzLollipop()`:

```{r}
mpg %>%
  group_by(manufacturer) %>%
  summarise(median_cty = median(cty)) %>%
  horzLollipop(
    label = "manufacturer",
    value = "median_cty",
    sort = "descending"
  )
```

You can also do:

```{r}
mpg %>%
  filter(year == 2008) %>%
  horzLollipop(
    label = "manufacturer",
    value = "hwy",
    circleFill = 'red',
    circleStroke = 'orange',
    circleRadius = 5,
    sort = "none"
  )
```

# `pieChart()`

Pie charts and donut charts are pretty straightforward to set up. We'll use a sample from the `starwars` data frame to plot a simple pie chart.

```{r}
# starwars is a data frame shipped with the dplyr package
mini_starwars <- starwars %>%
  tidyr::drop_na(mass) %>%
  sample_n(size = 5) # getting 5 random rows

pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name"
)
```

Using the `padRadius`, `padAngle` and `cornerRadius` parameters, one can get fancier pie charts:

```{r}
pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name",
  padRadius = 200,
  padAngle = 0.1,
  cornerRadius = 50,
  innerRadius = 10
)
```

If you need a donut chart, you just need to play with the `innerRadius` parameter:

```{r}
pieChart(
  data = mini_starwars,
  value = "mass",
  label = "name",
  innerRadius = 120,
  cornerRadius = 20,
  title = "5 Starwars characters ranked by their mass",
  titleFontSize = 16,
  bgcol = "yellow"
)
```

# `lineChart()`

The `lineChart()` function is used to plot time series data. The user must provide a `date` variable in the `yyyy-mm-dd` format. In the following example, we'll use the built-in `AirPassengers` `ts` dataset and convert it to a classical data frame:

```{r}
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
  passengers = as.matrix(AirPassengers),
  date = zoo::as.Date(time(AirPassengers))
)

# 2. plotting the line chart
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers"
)
```

You can modify the line interpolation using the `curve` parameter:

```{r}
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveStep"
)
```

```{r}
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveCardinal"
)
```

```{r}
lineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  curve = "curveBasis"
)
```

# `animLineChart()`

Heavily inspired by [Jure Stabuc's example](https://observablehq.com/@jurestabuc/animated-line-chart), the `animLineChart()` function creates an empty SVG; each time you click on it, a line chart animation starts. Note that the line remains visible after the animation ends. Go ahead, click on the empty graphic below:

```{r}
animLineChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  duration = 10000, # in milliseconds (10 seconds)
  curve = "curveCardinal"
)
```

# `areaChart()`

`areaChart()` works like `lineChart()`, except that instead of a line you get an area.

```{r}
# 1. converting AirPassengers to a tidy data frame
airpassengers <- data.frame(
  passengers = as.matrix(AirPassengers),
  date = zoo::as.Date(time(AirPassengers))
)

# 2. plotting the area chart
areaChart(
  data = airpassengers,
  x = "date",
  y = "passengers",
  fill = "purple",
  bgcol = "white"
)
```

# `areaBand()`

`areaBand()` lets you plot a filled area between two y-values. For the sake of the example, let's create a column `passengers_upper` that adds 40 passengers to each observation:

```{r}
airpassengers <- data.frame(
  passengers_lower = as.matrix(AirPassengers),
  passengers_upper = as.matrix(AirPassengers) + 40,
  date = zoo::as.Date(time(AirPassengers))
)

areaBand(
  data = airpassengers,
  x = "date",
  yLower = "passengers_lower",
  yUpper = "passengers_upper",
  fill = "yellow",
  stroke = "black"
)
```

# `stackedAreaChart()`

This function allows you to create a stacked area chart.
You need two components:

- A data frame in wide format (see an example below). If it's in long format, you can use `pivot_wider()` from the `tidyr` package to make it wider.
- A date variable in `yyyy-mm-dd` format that will be plotted on the x-axis.

Let's work with the following data frame (shortened) provided by [Mike Bostock in his stacked area chart example](https://observablehq.com/@d3/stacked-area-chart):

```{r}
data <- data.frame(
  date = c(
    "2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01", "2000-05-01",
    "2000-06-01", "2000-07-01", "2000-08-01", "2000-09-01", "2000-10-01"
  ),
  Trade = c(
    2000, 1023, 983, 2793, 1821, 1837, 1792, 1853, 791, 739
  ),
  Manufacturing = c(
    734, 694, 739, 736, 685, 621, 708, 685, 667, 693
  ),
  Leisure = c(
    1782, 1779, 1789, 658, 675, 833, 786, 675, 636, 691
  ),
  Agriculture = c(
    655, 587, 623, 517, 561, 2545, 636, 584, 559, 2504
  )
)

data
```

Note that when running `stackedAreaChart()`, all the variables available in the data frame will be plotted. If you want to restrict the plot to specific variables, just drop the unneeded columns beforehand:

```{r}
stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14
)
```

You can modify the color scheme using the `colorCategory` parameter:

```{r}
stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14,
  curve = "curveCardinal",
  colorCategory = "Accent",
  bgcol = "white",
  stroke = "black",
  strokeWidth = 1
)
```

```{r}
stackedAreaChart(
  data = data,
  x = "date",
  legendTextSize = 14,
  curve = "curveBasis",
  colorCategory = "Set3",
  bgcol = "black",
  axisCol = "white",
  xticks = 4,
  stroke = "black"
)
```

You can find the list of D3 categorical color schemes [here](https://github.com/d3/d3-scale-chromatic#categorical).

Finally, if you hover over the chart, you'll notice a tooltip that identifies the different area categories.

# `barChartRace()`

This function allows you to create an animated bar chart race.
`barChartRace()` is similar to `barChart()` but takes a third variable mapped to the time dimension, with options for styling transitions.

Let's make a bar chart race of population growth among various countries using a subset of the `gapminder` dataset from the [{gapminder} package](https://github.com/jennybc/gapminder):

```{r, eval = FALSE}
gapminder_subset <- gapminder::gapminder %>%
  select(country, year, pop) %>%
  filter(country %in% c("Japan", "Mexico", "Germany", "Brazil", "Philippines", "Vietnam")) %>%
  mutate(pop = pop/1e6)

gapminder_subset %>%
  slice_sample(n = 10)
#>    year       pop     country
#> 1  2007  91.07729 Philippines
#> 2  1997  76.04900     Vietnam
#> 3  1972 107.18827       Japan
#> 4  1967  39.46391     Vietnam
#> 5  1952  30.14432      Mexico
#> 6  1987 142.93808      Brazil
#> 7  1997 168.54672      Brazil
#> 8  1962  41.12148      Mexico
#> 9  1952  69.14595     Germany
#> 10 1957  91.56301       Japan
```

```{r, echo = FALSE}
gapminder_subset <- data.frame(
  year = c(
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L,
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L,
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L,
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L,
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L,
    1952L, 1957L, 1962L, 1967L, 1972L, 1977L,
    1982L, 1987L, 1992L, 1997L, 2002L, 2007L
  ),
  pop = c(
    # Brazil
    56.60256, 65.551171, 76.03939, 88.049823, 100.840058, 114.313951,
    128.962939, 142.938076, 155.975974, 168.546719, 179.914212, 190.010647,
    # Germany
    69.145952, 71.019069, 73.739117, 76.368453, 78.717088, 78.160773,
    78.335266, 77.718298, 80.597764, 82.011073, 82.350671, 82.400996,
    # Japan
    86.459025, 91.563009, 95.831757, 100.825279, 107.188273, 113.872473,
    118.454974, 122.091325, 124.329269, 125.956499, 127.065841, 127.467972,
    # Mexico
    30.144317, 35.015548, 41.121485, 47.995559, 55.984294, 63.759976,
    71.640904, 80.122492, 88.11103, 95.895146, 102.479927, 108.700891,
    # Philippines
    22.438691, 26.072194, 30.325264, 35.3566, 40.850141, 46.850962,
    53.456774, 60.017788, 67.185766, 75.012988, 82.995088, 91.077287,
    # Vietnam
    26.246839, 28.998543, 33.79614, 39.46391, 44.655014, 50.533506,
    56.142181, 62.826491, 69.940728, 76.048996, 80.908147, 85.262356
  ),
  country = as.factor(c(
    "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil",
    "Brazil", "Brazil", "Brazil", "Brazil", "Brazil", "Brazil",
    "Germany", "Germany", "Germany", "Germany", "Germany", "Germany",
    "Germany", "Germany", "Germany", "Germany", "Germany", "Germany",
    "Japan", "Japan", "Japan", "Japan", "Japan", "Japan",
    "Japan", "Japan", "Japan", "Japan", "Japan", "Japan",
    "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico",
    "Mexico", "Mexico", "Mexico", "Mexico", "Mexico", "Mexico",
    "Philippines", "Philippines", "Philippines", "Philippines", "Philippines", "Philippines",
    "Philippines", "Philippines", "Philippines", "Philippines", "Philippines", "Philippines",
    "Vietnam", "Vietnam", "Vietnam", "Vietnam", "Vietnam", "Vietnam",
    "Vietnam", "Vietnam", "Vietnam", "Vietnam", "Vietnam", "Vietnam"
  ))
)
```

In this example, we simply call `barChartRace()` like `barChart()`, but with an additional variable mapped to the time dimension, specified with `time = "year"`:

```{r}
gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations"
  )
```

You can also stylize transitions with the `frameDur`, `transitionDur`, and `ease` arguments. For example, setting the time spent pausing on each frame to zero with `frameDur = 0` will create a smooth animation:

```{r}
gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    transitionDur = 1000,
    frameDur = 0,
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations"
  )
```

As you might have noticed, the value of the column passed to the `time` argument is automatically displayed as a label at the bottom-right corner of the plot panel.
We can stylize this with a list of options passed to the `timeLabelOpts` argument (or turn it off with `timeLabel = FALSE`). We also give the bars a little bounce here with `ease = "BackInOut"` for fun.

```{r}
gapminder_subset %>%
  barChartRace(
    x = "pop",
    y = "country",
    time = "year",
    ease = "BackInOut",
    ytitle = "Country",
    xtitle = "Population (in millions)",
    title = "Bar chart race of country populations",
    timeLabelOpts = list(
      size = 40,
      prefix = "Year: ",
      xOffset = 0.2
    )
  )
```

# More to Come ...