Data Visualisation

xkcd comic titled 'General quality of charts and graphs in scientific papers'; y-axis: BAD on the bottom to GOOD on the top; x-axis: 1950s to 2010s; Line graph increases with time except for a dip between 1990 and 2010 labelled POWERPOINT/MSPAINT ERA

Learning Objectives

Basic

  1. Understand what types of graphs are best for different types of data (video)
    • 1 discrete
    • 1 continuous
    • 2 discrete
    • 2 continuous
    • 1 discrete, 1 continuous
    • 3 continuous
  2. Create common types of graphs with ggplot2 (video)
    • geom_bar()
    • geom_density()
    • geom_freqpoly()
    • geom_histogram()
    • geom_col()
    • geom_boxplot()
    • geom_violin()
    • Vertical Intervals
      • geom_crossbar()
      • geom_errorbar()
      • geom_linerange()
      • geom_pointrange()
    • geom_point()
    • geom_smooth()
  3. Set custom labels, colours, and themes (video)
  4. Combine plots on the same plot, as facets, or as a grid using cowplot (video)
  5. Save plots as an image file (video)

Setup

    # libraries needed for these graphs
    library(tidyverse)
    library(dataskills)
    library(plotly)
    library(cowplot)

    set.seed(30250) # makes sure random numbers are reproducible

Common Variable Combinations

Continuous variables are properties you can measure, like height. Discrete variables are things you can count, like the number of pets you have. Categorical variables can be nominal, where the categories don't really have an order, like cats, dogs and ferrets (even though ferrets are obviously best). They can also be ordinal, where there is a clear order, but the distance between the categories isn't something you could exactly equate, like points on a Likert rating scale.

Different types of visualisations are good for different types of variables.

Load the pets dataset from the dataskills package and explore it with glimpse(pets) or View(pets). This is a simulated dataset with one random factor (id), two categorical factors (pet, country) and three continuous variables (score, age, weight).

    data("pets")
    # if you don't have the dataskills package, use:
    # pets <- read_csv("https://psyteachr.github.io/msc-data-skills/data/pets.csv", col_types = "cffiid")
    glimpse(pets)

    ## Rows: 800
    ## Columns: 6
    ## $ id      <chr> "S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"…
    ## $ pet     <fct> dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, do…
    ## $ country <fct> UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK…
    ## $ score   <int> 90, 107, 94, 120, 111, 110, 100, 107, 106, 109, 85, 110, 102, …
    ## $ age     <int> 6, 8, 2, 10, 4, 8, 9, 8, 6, 11, 5, 9, 1, 10, 7, 8, 1, 8, 5, 13…
    ## $ weight  <dbl> 19.78932, 20.01422, 19.14863, 19.56953, 21.39259, 21.31880, 19…

Before you read ahead, come up with an example of each type of variable combination and sketch the types of graphs that would best display these data.

  • 1 categorical
  • 1 continuous
  • 2 categorical
  • 2 continuous
  • 1 categorical, 1 continuous
  • 3 continuous

Basic Plots

R has some basic plotting functions, but they're difficult to use and aesthetically not very nice. They can be useful to have a quick look at data while you're working on a script, though. The function plot() usually defaults to a sensible type of plot, depending on whether the arguments x and y are categorical, continuous, or missing.
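
The first figure below shows plot() given only a categorical x; the call that produced it does not appear in this text. A minimal sketch of such a call, using a small made-up factor (a hypothetical stand-in for pets$pet) so that it runs without the dataskills package:

```r
# hypothetical stand-in for pets$pet; the values are made up for illustration
pet <- factor(c("dog", "dog", "cat", "ferret", "cat", "dog"))

# with a factor as x and no y, plot() draws a bar chart of the counts per level
plot(pet)
```

With the real data, the equivalent call would presumably be plot(pets$pet).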

Figure 3.1: plot() with categorical x

    plot(x = pets$pet, y = pets$score)

Figure 3.2: plot() with categorical x and continuous y

    plot(x = pets$age, y = pets$weight)

Figure 3.3: plot() with continuous x and y

The function hist() creates a quick histogram so you can see the distribution of your data. You can adjust how many columns are plotted with the argument breaks.

    hist(pets$score, breaks = 20)

Figure 3.4: hist()

GGplots

While the functions above are nice for quick visualisations, it's hard to make pretty, publication-ready plots. The package ggplot2 (loaded with tidyverse) is one of the most common packages for creating beautiful visualisations.

ggplot2 creates plots using a "grammar of graphics" where you add geoms in layers. It can be complex to understand, but it's very powerful once you have a mental model of how it works.

Let's start with a totally empty plot layer created by the ggplot() function with no arguments.
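
A sketch of that empty base layer, assuming only that ggplot2 is installed (the object name p is hypothetical):

```r
library(ggplot2)

# ggplot() with no arguments: no data, no mappings, no geoms
p <- ggplot()

# printing the object draws a blank panel
p
```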

Figure 3.5: A plot base created by ggplot()

The first argument to ggplot() is the data table you want to plot. Let's use the pets data we loaded above. The second argument is the mapping for which columns in your data table correspond to which properties of the plot, such as the x-axis, the y-axis, line colour or linetype, point shape, or object fill. These mappings are specified by the aes() function. Just adding this to the ggplot function creates the labels and ranges for the x and y axes. They usually have sensible default values, given your data, but we'll learn how to change them later.

    mapping <- aes(x = pet,
                   y = score,
                   colour = country,
                   fill = country)
    ggplot(data = pets, mapping = mapping)

Figure 3.6: Empty ggplot with x and y labels

People usually omit the argument names and just put the aes() function directly as the second argument to ggplot. They also usually omit x and y as argument names to aes() (but you have to name the other properties). Next we can add "geoms," or plot styles. You literally add them with the + symbol. You can also add other plot attributes, such as labels, or change the theme and base font size.

    ggplot(pets, aes(pet, score, color = country, fill = country)) +
      geom_violin(alpha = 0.5) +
      labs(x = "Pet type",
           y = "Score on an Important Test",
           colour = "Country of Origin",
           fill = "Country of Origin",
           title = "My first plot!") +
      theme_bw(base_size = 15)

Figure 3.7: Violin plot with country represented by colour.

Common Plot Types

There are many geoms, and they can take different arguments to customise their appearance. We'll learn about some of the most common below.

Bar plot

Bar plots are good for categorical data where you want to represent the count.

    ggplot(pets, aes(pet)) +
      geom_bar()

Figure 3.8: Bar plot

Density plot

Density plots are good for one continuous variable, but only if you have a fairly large number of observations.

    ggplot(pets, aes(score)) +
      geom_density()

Figure 3.9: Density plot

You can represent subsets of a variable by assigning the category variable to the argument group, fill, or color.

    ggplot(pets, aes(score, fill = pet)) +
      geom_density(alpha = 0.5)

Figure 3.10: Grouped density plot

Try changing the alpha argument to figure out what it does.

Frequency polygons

If you want the y-axis to represent count rather than density, try geom_freqpoly().

    ggplot(pets, aes(score, color = pet)) +
      geom_freqpoly(binwidth = 5)

Figure 3.11: Frequency polygon plot

Try changing the binwidth argument to 10 and 1. How do you figure out the right value?

Histogram

Histograms are also good for one continuous variable, and work well if you don't have many observations. Set the binwidth to control how wide each bar is.

    ggplot(pets, aes(score)) +
      geom_histogram(binwidth = 5, fill = "white", color = "black")

Figure 3.12: Histogram

Histograms in ggplot look pretty bad unless you set the fill and color.

If you show grouped histograms, you also probably want to change the default position argument.

    ggplot(pets, aes(score, fill = pet)) +
      geom_histogram(binwidth = 5, alpha = 0.5,
                     position = "dodge")

Figure 3.13: Grouped Histogram

Try changing the position argument to "identity," "fill," "dodge," or "stack."

Column plot

Column plots are the worst way to represent grouped continuous data, but also one of the most common. If your data are already aggregated (e.g., you have rows for each group with columns for the mean and standard error), you can use geom_bar or geom_col and geom_errorbar directly. If not, you can use the function stat_summary to calculate the mean and standard error and send those numbers to the appropriate geom for plotting.

    ggplot(pets, aes(pet, score, fill = pet)) +
      stat_summary(fun = mean, geom = "col", alpha = 0.5) +
      stat_summary(fun.data = mean_se, geom = "errorbar",
                   width = 0.25) +
      coord_cartesian(ylim = c(80, 120))

Figure 3.14: Column plot

Try changing the values for coord_cartesian. What does this do?
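
The aggregated-data route mentioned above (geom_col plus geom_errorbar on a table of means and standard errors) can be sketched as follows. This is an illustration under assumptions: the raw data are simulated stand-ins for pets, and the names raw, agg, mean_score and se are hypothetical.

```r
library(dplyr)
library(ggplot2)

# simulated stand-in for the raw data: 3 pet types, 20 made-up scores each
set.seed(1)
raw <- tibble(
  pet   = rep(c("dog", "cat", "ferret"), each = 20),
  score = rnorm(60, mean = 100, sd = 10)
)

# aggregate first: one row per group with the mean and standard error
agg <- raw %>%
  group_by(pet) %>%
  summarise(mean_score = mean(score),
            se         = sd(score) / sqrt(n()),
            .groups    = "drop")

# plot the pre-computed values directly, without stat_summary()
ggplot(agg, aes(pet, mean_score, fill = pet)) +
  geom_col(alpha = 0.5) +
  geom_errorbar(aes(ymin = mean_score - se, ymax = mean_score + se),
                width = 0.25) +
  coord_cartesian(ylim = c(80, 120))
```

Aggregating by hand makes the plotted numbers easy to inspect, while stat_summary() keeps the whole computation inside the plotting code.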

Boxplot

Boxplots are great for representing the distribution of grouped continuous variables. They fix most of the problems with using bar/column plots for continuous data.

    ggplot(pets, aes(pet, score, fill = pet)) +
      geom_boxplot(alpha = 0.5)

Figure 3.15: Box plot

Violin plot

Violin plots are like sideways, mirrored density plots. They give even more information than a boxplot about distribution and are especially useful when you have non-normal distributions.

    ggplot(pets, aes(pet, score, fill = pet)) +
      geom_violin(draw_quantiles = .5,
                  trim = FALSE, alpha = 0.5)

Figure 3.16: Violin plot

Try changing the quantile argument. Set it to a vector of the numbers 0.1 to 0.9 in steps of 0.1.

Vertical intervals

Boxplots and violin plots don't always map well onto inferential stats that use the mean. You can represent the mean and standard error or any other value you can calculate.

Here, we will create a table with the means and standard errors for two groups. We'll learn how to calculate this from raw data in the chapter on data wrangling. We also create a new object called gg that sets up the base of the plot.

    dat <- tibble(
      group = c("A", "B"),
      mean = c(10, 20),
      se = c(2, 3)
    )
    gg <- ggplot(dat, aes(group, mean,
                          ymin = mean-se,
                          ymax = mean+se))

The trick above can be useful if you want to represent the same data in different ways. You can add different geoms to the base plot without having to re-type the base plot code.
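
The four figures that follow presumably come from adding one interval geom at a time to gg; the exact calls do not appear in this text, so the following is only a sketch. The p_* object names are hypothetical, and the table is rebuilt with data.frame() so the block runs on its own.

```r
library(ggplot2)

# same summary table and base plot as above, repeated so this block runs alone
dat <- data.frame(
  group = c("A", "B"),
  mean  = c(10, 20),
  se    = c(2, 3)
)
gg <- ggplot(dat, aes(group, mean,
                      ymin = mean - se,
                      ymax = mean + se))

# each geom reads ymin and ymax from the mapping in the base plot
p_crossbar   <- gg + geom_crossbar()
p_errorbar   <- gg + geom_errorbar()
p_linerange  <- gg + geom_linerange()
p_pointrange <- gg + geom_pointrange()

# print any of the four objects to draw it
p_crossbar
```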

Figure 3.17: geom_crossbar()

Figure 3.18: geom_errorbar()

Figure 3.19: geom_linerange()

Figure 3.20: geom_pointrange()

You can also use the function stat_summary to calculate mean, standard error, or any other value for your data and display it using any geom.

    ggplot(pets, aes(pet, score, color = pet)) +
      stat_summary(fun.data = mean_se, geom = "crossbar") +
      stat_summary(fun.min = function(x) mean(x) - sd(x),
                   fun.max = function(x) mean(x) + sd(x),
                   geom = "errorbar", width = 0) +
      theme(legend.position = "none") # gets rid of the legend

Figure 3.21: Vertical intervals with stat_summary()

Scatter plot

Scatter plots are a good way to represent the relationship between two continuous variables.

    ggplot(pets, aes(age, score, colour = pet)) +
      geom_point()

Figure 3.22: Scatter plot using geom_point()

Line graph

You often want to represent the relationship as a single line.

    ggplot(pets, aes(age, score, color = pet)) +
      geom_smooth(formula = y ~ x, method = "lm")

Figure 3.23: Line plot using geom_smooth()

What are some other options for the method argument to geom_smooth? When might you want to use them?

You can plot functions other than the linear y ~ x. The code below creates a data table where x is 101 values between -10 and 10, and y is x squared plus 3*x plus 1. You'll probably recognise this from algebra as the quadratic equation. You can set the formula argument in geom_smooth to a quadratic formula (y ~ x + I(x^2)) to fit a quadratic function to the data.

    quad <- tibble(
      x = seq(-10, 10, length.out = 101),
      y = x^2 + 3*x + 1
    )

    ggplot(quad, aes(x, y)) +
      geom_point() +
      geom_smooth(formula = y ~ x + I(x^2),
                  method = "lm")

Figure 3.24: Fitting quadratic functions

Customisation

Labels

You can set custom titles and axis labels in a few different ways.

    ggplot(pets, aes(age, score, colour = pet)) +
      geom_smooth(formula = y ~ x, method = "lm") +
      labs(title = "Pet score with Age",
           x = "Age (in Years)",
           y = "score Score",
           colour = "Pet Type")

Figure 3.25: Set custom labels with labs()

    ggplot(pets, aes(age, score, color = pet)) +
      geom_smooth(formula = y ~ x, method = "lm") +
      ggtitle("Pet score with Age") +
      xlab("Age (in Years)") +
      ylab("score Score") +
      scale_color_discrete(name = "Pet Type")

Figure 3.26: Set custom labels with individual functions

Colours

You can set custom values for colour and fill using functions like scale_colour_manual() and scale_fill_manual(). The Colours chapter in Cookbook for R has many more ways to customise colour.

    ggplot(pets, aes(pet, score, colour = pet, fill = pet)) +
      geom_violin() +
      scale_color_manual(values = c("darkgreen", "dodgerblue", "orange")) +
      scale_fill_manual(values = c("#CCFFCC", "#BBDDFF", "#FFCC66"))

Figure 3.27: Set custom colour

Themes

GGplot comes with several additional themes and the ability to fully customise your theme. Type ?theme into the console to see the full list. Other packages such as cowplot also have custom themes. You can add a custom theme to the end of your ggplot object and specify a new base_size to make the default fonts and lines larger or smaller.

    ggplot(pets, aes(age, score, color = pet)) +
      geom_smooth(formula = y ~ x, method = "lm") +
      theme_minimal(base_size = 18)

Figure 3.28: Minimal theme with 18-point base font size

It's more complicated, but you can fully customise your theme with theme(). You can save this to an object and add it to the end of all of your plots to make the style consistent. Alternatively, you can set the theme at the top of a script with theme_set() and this will apply to all subsequent ggplot plots.

    vampire_theme <- theme(
      rect = element_rect(fill = "black"),
      panel.background = element_rect(fill = "black"),
      text = element_text(size = 20, colour = "white"),
      axis.text = element_text(size = 16, color = "grey70"),
      line = element_line(color = "white", size = 2),
      panel.grid = element_blank(),
      axis.line = element_line(colour = "white"),
      axis.ticks = element_blank(),
      legend.position = "top"
    )

    theme_set(vampire_theme)

    ggplot(pets, aes(age, score, color = pet)) +
      geom_smooth(formula = y ~ x, method = "lm")

Figure 3.29: Custom theme
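
Because theme_set() changes the theme for every later plot in the session, it can help to keep the previous theme so you can switch back. theme_set() invisibly returns the theme that was active before the call. A minimal sketch, where old_theme is a hypothetical name and theme_minimal() stands in for any custom theme:

```r
library(ggplot2)

# theme_set() returns the previously active theme, so capture it
old_theme <- theme_set(theme_minimal(base_size = 18))

# ... plots created and printed here use the minimal theme ...

# restore whatever theme was active before
theme_set(old_theme)
```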

Save as file

You can save a ggplot using ggsave(). It saves the last ggplot you made, by default, but you can specify which plot you want to save if you assigned that plot to a variable.

You can set the width and height of your plot. The default units are inches, but you can change the units argument to "in," "cm," or "mm."

    box <- ggplot(pets, aes(pet, score, fill = pet)) +
      geom_boxplot(alpha = 0.5)

    violin <- ggplot(pets, aes(pet, score, fill = pet)) +
      geom_violin(alpha = 0.5)

    # no plot argument: saves the last plot made (violin)
    ggsave("demog_violin_plot.png", width = 5, height = 7)

    # plot argument: saves the named plot
    ggsave("demog_box_plot.jpg", plot = box, width = 5, height = 7)

The file type is set from the filename suffix, or by specifying the argument device, which can take the following values: "eps," "ps," "tex," "pdf," "jpeg," "tiff," "png," "bmp," "svg" or "wmf."
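
A short sketch of saving with explicit units and resolution. The data, the object names and the output path are all made up for illustration, and the file is written to the session's temporary directory so nothing in your project is overwritten.

```r
library(ggplot2)

# small made-up data set so the example runs on its own
dat <- data.frame(group = c("A", "B"), value = c(10, 20))
p <- ggplot(dat, aes(group, value)) +
  geom_col()

# hypothetical output path in the session's temporary directory
out_file <- file.path(tempdir(), "example_plot.png")

# width and height are interpreted in the units given by `units`
ggsave(out_file, plot = p, width = 12, height = 8, units = "cm", dpi = 300)
```

Passing plot = p explicitly avoids depending on which plot happens to be the most recent one.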

Combination Plots

Violinbox plot

A combination of a violin plot to show the shape of the distribution and a boxplot to show the median and interquartile ranges can be a very useful visualisation.

    ggplot(pets, aes(pet, score, fill = pet)) +
      geom_violin(show.legend = FALSE) +
      geom_boxplot(width = 0.2, fill = "white",
                   show.legend = FALSE)

Figure 3.30: Violin-box plot

Set the show.legend argument to FALSE to hide the legend. We do this here because the x-axis already labels the pet types.

Violin-point-range plot

You can use stat_summary() to superimpose a point-range plot showing the mean ± 1 SD. You'll learn how to write your own functions in the lesson on Iteration and Functions.

    ggplot(pets, aes(pet, score, fill = pet)) +
      geom_violin(trim = FALSE, alpha = 0.5) +
      stat_summary(
        fun = mean,
        fun.max = function(x) {mean(x) + sd(x)},
        fun.min = function(x) {mean(x) - sd(x)},
        geom = "pointrange"
      )

Figure 3.31: Point-range plot using stat_summary()

Violin-jitter plot

If you don't have a lot of data points, it's good to represent them individually. You can use geom_jitter to do this.

    # sample_n chooses 50 random observations from the dataset
    ggplot(sample_n(pets, 50), aes(pet, score, fill = pet)) +
      geom_violin(
        trim = FALSE,
        draw_quantiles = c(0.25, 0.5, 0.75),
        alpha = 0.5
      ) +
      geom_jitter(
        width = 0.15, # points spread out over 15% of available width
        height = 0, # do not move position on the y-axis
        alpha = 0.5,
        size = 3
      )

Figure 3.32: Violin-jitter plot

Scatter-line graph

If your graph isn't too complicated, it's good to also show the individual data points behind the line.

    ggplot(sample_n(pets, 50), aes(age, weight, colour = pet)) +
      geom_point() +
      geom_smooth(formula = y ~ x, method = "lm")

Figure 3.33: Scatter-line plot

Filigree of plots

You tin can use the cowplot package to easily make grids of different graphs. Starting time, you accept to assign each plot a name. Then you listing all the plots as the commencement arguments of plot_grid() and provide a vector of labels.

                                  gg                    <-                    ggplot(pets,                    aes(pet, score,                    colour =                    pet))                  nolegend                    <-                    theme(fable.position =                    0)                                    vp                    <-                    gg                    +                    geom_violin(blastoff =                    0.five)                    +                    nolegend                    +                                                        ggtitle("Violin Plot")                  bp                    <-                    gg                    +                    geom_boxplot(blastoff =                    0.5)                    +                    nolegend                    +                                                        ggtitle("Box Plot")                  cp                    <-                    gg                    +                    stat_summary(fun =                    mean,                    geom =                    "col",                    fill up =                    "white")                    +                    nolegend                    +                                                        ggtitle("Column Plot")                  dp                    <-                    ggplot(pets,                    aes(score,                    color =                    pet))                    +                                                        geom_density()                    +                    nolegend                    +                                                        ggtitle("Density Plot")                                                        plot_grid(vp, bp, cp, dp,                    labels =                    LETTERS[1                    :                    4])                              

Grid of plots

Figure 3.34: Grid of plots

Overlapping Discrete Data

Reducing Opacity

You can deal with overlapping data points (very common if you're using Likert scales) by reducing the opacity of the points. You need to use trial and error to adjust the alpha value so the points look right.

    ggplot(pets, aes(age, score, color = pet)) +
      geom_point(alpha = 0.25) +
      geom_smooth(formula = y ~ x, method = "lm")

Deal with overlapping data using transparency

Figure 3.35: Deal with overlapping data using transparency

Proportional Dot Plots

Or you can set the size of the dot proportional to the number of overlapping observations using geom_count().

    ggplot(pets, aes(age, score, colour = pet)) +
      geom_count()

Deal with overlapping data using geom_count()

Figure 3.36: Deal with overlapping data using geom_count()

Alternatively, you can transform your data (we will learn to do this in the data wrangling chapter) to create a count column and use the count to set the dot colour.

    pets %>%
      group_by(age, score) %>%
      summarise(count = n(), .groups = "drop") %>%
      ggplot(aes(age, score, color = count)) +
      geom_point(size = 2) +
      scale_color_viridis_c()

Deal with overlapping data using dot colour

Figure 3.37: Deal with overlapping data using dot colour

The viridis colour scales are designed to be easier to read for people with colourblindness and to print better in greyscale. They have been built into ggplot2 since v3.0.0: use scale_colour_viridis_c() and scale_fill_viridis_c() for continuous variables and scale_colour_viridis_d() and scale_fill_viridis_d() for discrete variables.
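The examples in this section only use the continuous (_c) scales. As a minimal sketch of the discrete version, the code below colours points by a categorical variable with scale_colour_viridis_d(). It uses the mpg dataset that ships with ggplot2 rather than pets; that substitution is an assumption made only so the example runs on its own, and the name p_discrete is just for illustration.

```r
library(ggplot2)

# mpg is a built-in ggplot2 dataset (used for illustration, not the pets data)
# drv (drive type) is a discrete variable, so use the _d() version of the scale
p_discrete <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point(size = 2) +
  scale_colour_viridis_d()

p_discrete
```

If you map the variable to fill instead of colour (for example in a boxplot or column plot), swap in scale_fill_viridis_d().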

Overlapping Continuous Data

Even if the variables are continuous, overplotting might obscure any relationships if you have lots of data.

    ggplot(pets, aes(age, score)) +
      geom_point()

Overplotted data

Figure 3.38: Overplotted data

2D Density Plot

Use geom_density2d() to create a contour map.

    ggplot(pets, aes(age, score)) +
      geom_density2d()

Contour map with geom_density2d()

Figure 3.39: Contour map with geom_density2d()

You can use stat_density_2d(aes(fill = ..level..), geom = "polygon") to create a heatmap-style density plot.

    ggplot(pets, aes(age, score)) +
      stat_density_2d(aes(fill = ..level..), geom = "polygon") +
      scale_fill_viridis_c()

Heatmap-density plot

Figure 3.40: Heatmap-density plot
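Newer versions of ggplot2 deprecate the ..level.. notation in favour of after_stat(level), so the code above may produce a deprecation warning depending on your installed version. The following is a minimal sketch of the same kind of plot with the newer notation. It uses the built-in faithful dataset as a stand-in for pets so that it is self-contained; the name p_density is just for illustration.

```r
library(ggplot2)

# faithful is a built-in R dataset (a stand-in for pets in this sketch)
p_density <- ggplot(faithful, aes(eruptions, waiting)) +
  # after_stat(level) refers to the density level computed by the stat
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  scale_fill_viridis_c()

p_density
```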

2D Histogram

Use geom_bin2d() to create a rectangular heatmap of bin counts. Set the binwidth to the x and y dimensions to capture in each box.

    ggplot(pets, aes(age, score)) +
      geom_bin2d(binwidth = c(1, 5))

Heatmap of bin counts

Figure 3.41: Heatmap of bin counts

Hexagonal Heatmap

Use geom_hex() to create a hexagonal heatmap of bin counts. Adjust the binwidth, xlim(), ylim() and/or the figure dimensions to make the hexagons more or less stretched.

    ggplot(pets, aes(age, score)) +
      geom_hex(binwidth = c(1, 5))

Hexagonal heatmap of bin counts

Figure 3.42: Hexagonal heatmap of bin counts

Correlation Heatmap

I've included the code for creating a correlation matrix from a table of variables, but you don't need to understand how this is done yet. We'll cover mutate() and gather() functions in the dplyr and tidyr lessons.

    heatmap <- pets %>%
      select_if(is.numeric) %>%        # get just the numeric columns
      cor() %>%                        # create the correlation matrix
      as_tibble(rownames = "V1") %>%   # make it a tibble
      gather("V2", "r", 2:ncol(.))     # wide to long (V2)

Once you have a correlation matrix in the correct (long) format, it's easy to make a heatmap using geom_tile().

    ggplot(heatmap, aes(V1, V2, fill = r)) +
      geom_tile() +
      scale_fill_viridis_c()

Heatmap using geom_tile()

Figure 3.43: Heatmap using geom_tile()
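The wide-to-long step above uses gather(). In current versions of tidyr, gather() is superseded by pivot_longer(), so here is a minimal sketch of the same reshaping with the newer function. It uses three columns of the built-in mtcars dataset as a stand-in for pets, which is an assumption made only so the example runs on its own; the name heatmap_long is just for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

# mtcars is a built-in R dataset (a stand-in for pets in this sketch)
heatmap_long <- mtcars %>%
  select(mpg, hp, wt) %>%            # three numeric columns
  cor() %>%                          # create the correlation matrix
  as_tibble(rownames = "V1") %>%     # make it a tibble
  pivot_longer(cols = -V1,           # wide to long: every column except V1
               names_to = "V2",
               values_to = "r")

heatmap_long
```

The result has one row per pair of variables, with the same V1, V2 and r columns that the geom_tile() code expects.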

Interactive Plots

You can use the plotly package to make interactive graphs. Just assign your ggplot to a variable and use the function ggplotly().

    demog_plot <- ggplot(pets, aes(age, score, fill = pet)) +
      geom_point() +
      geom_smooth(formula = y ~ x, method = lm)

    ggplotly(demog_plot)

Figure 3.44: Interactive graph using plotly

Hover over the data points above and click on the legend items.

Glossary

term definition
continuous Data that can take on any values between other existing values.
discrete Data that can only take certain values, such as integers.
geom The geometric style in which data are displayed, such as boxplot, density, or histogram.
likert A rating scale with a small number of discrete points in order.
nominal Categorical variables that don't have an inherent order, such as types of animal.
ordinal Discrete variables that have an inherent order, such as number of legs.

Exercises

Download the exercises. See the plots to see what your plots should look like (this doesn't contain the answer code). See the answers only after you've attempted all the questions.

    # run this to access the exercise
    dataskills::exercise(3)

    # run this to access the answers
    dataskills::exercise(3, answers = TRUE)