How To Fix Overlapping Text In R Variable Importance
Data Visualisation
Learning Objectives
Basic
- Understand what types of graphs are best for different types of data (video)
  - 1 discrete
  - 1 continuous
  - 2 discrete
  - 2 continuous
  - 1 discrete, 1 continuous
  - 3 continuous
- Create common types of graphs with ggplot2 (video)
  - geom_bar()
  - geom_density()
  - geom_freqpoly()
  - geom_histogram()
  - geom_col()
  - geom_boxplot()
  - geom_violin()
  - Vertical Intervals
    - geom_crossbar()
    - geom_errorbar()
    - geom_linerange()
    - geom_pointrange()
  - geom_point()
  - geom_smooth()
- Set custom labels, colours, and themes (video)
- Combine plots on the same plot, as facets, or as a grid using cowplot (video)
- Save plots as an image file (video)
Setup
# libraries needed for these graphs
library(tidyverse)
library(dataskills)
library(plotly)
library(cowplot)
set.seed(30250) # makes sure random numbers are reproducible
Common Variable Combinations
Continuous variables are properties you can measure, like height. Discrete variables are things you can count, like the number of pets you have. Categorical variables can be nominal, where the categories don't really have an order, like cats, dogs and ferrets (even though ferrets are obviously best). They can also be ordinal, where there is a clear order, but the distance between the categories isn't something you could exactly equate, like points on a Likert rating scale.
Different types of visualisations are good for different types of variables.
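In R, nominal variables are usually stored as factors and ordinal variables as ordered factors. The sketch below uses made-up values (the Likert labels are hypothetical, not taken from the pets data) to show that only the ordered factor supports comparisons between categories.

```r
# Nominal: categories with no inherent order
pet_type <- factor(c("cat", "dog", "ferret", "dog"))

# Ordinal: categories with a clear order (hypothetical Likert responses)
likert <- factor(
  c("agree", "neutral", "strongly agree", "disagree"),
  levels = c("strongly disagree", "disagree", "neutral", "agree", "strongly agree"),
  ordered = TRUE
)

is.ordered(pet_type)   # FALSE
is.ordered(likert)     # TRUE
likert[1] > likert[2]  # TRUE: "agree" ranks above "neutral"
```

Comparing two elements of `pet_type` with `>` would only return `NA` with a warning, because nominal categories have no order.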
Load the pets dataset from the dataskills package and explore it with glimpse(pets) or View(pets). This is a simulated dataset with one random factor (id), two categorical factors (pet, country) and three continuous variables (score, age, weight).
data("pets")
# if you don't have the dataskills package, use:
# pets <- read_csv("https://psyteachr.github.io/msc-data-skills/data/pets.csv", col_types = "cffiid")
glimpse(pets)
## Rows: 800
## Columns: 6
## $ id      <chr> "S001", "S002", "S003", "S004", "S005", "S006", "S007", "S008"…
## $ pet     <fct> dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, dog, do…
## $ country <fct> UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK, UK…
## $ score   <int> 90, 107, 94, 120, 111, 110, 100, 107, 106, 109, 85, 110, 102, …
## $ age     <int> 6, 8, 2, 10, 4, 8, 9, 8, 6, 11, 5, 9, 1, 10, 7, 8, 1, 8, 5, 13…
## $ weight  <dbl> 19.78932, 20.01422, 19.14863, 19.56953, 21.39259, 21.31880, 19…
Before you read ahead, come up with an example of each type of variable combination and sketch the types of graphs that would best display these data.
- 1 categorical
- 1 continuous
- 2 categorical
- 2 continuous
- 1 categorical, 1 continuous
- 3 continuous
Basic Plots
R has some basic plotting functions, but they're difficult to use and aesthetically not very nice. They can be useful to have a quick look at data while you're working on a script, though. The function plot() usually defaults to a sensible type of plot, depending on whether the arguments x and y are categorical, continuous, or missing.
plot(x = pets$pet, y = pets$score)
plot(x = pets$age, y = pets$weight)
The function hist() creates a quick histogram so you can see the distribution of your data. You can adjust how many columns are plotted with the argument breaks.
hist(pets$score, breaks = 20)
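A single number passed to `breaks` is only a suggestion: R rounds the bin edges to "pretty" values, so you may not get exactly 20 bars. This sketch uses simulated scores as a stand-in for `pets$score` and `plot = FALSE` to inspect the bins without drawing; passing a vector of edges gives exact control.

```r
set.seed(1)
scores <- rnorm(800, mean = 100, sd = 10)  # simulated stand-in for pets$score

# plot = FALSE returns the binning without drawing anything
h <- hist(scores, breaks = 20, plot = FALSE)
length(h$counts)  # bars actually used; may differ from 20

# A vector of edges fixes the bins exactly (here 5 points wide)
edges <- seq(floor(min(scores)), ceiling(max(scores)) + 5, by = 5)
exact <- hist(scores, breaks = edges, plot = FALSE)
diff(exact$breaks)  # all 5
```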
GGplots
While the functions above are nice for quick visualisations, it's hard to make pretty, publication-ready plots. The package ggplot2 (loaded with tidyverse) is one of the most common packages for creating beautiful visualisations.
ggplot2 creates plots using a "grammar of graphics" where you add geoms in layers. It can be complex to understand, but it's very powerful once you have a mental model of how it works.
Let's start with a totally empty plot layer created by the ggplot() function with no arguments.
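The empty plot described above comes from a bare call. This minimal sketch assumes only that ggplot2 is installed (tidyverse loads it in the setup).

```r
library(ggplot2)

# No data and no mapping: just an empty plot object
p <- ggplot()
p  # printing draws a blank grey panel with no axes or layers
```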
The first argument to ggplot() is the data table you want to plot. Let's use the pets data we loaded above. The second argument is the mapping for which columns in your data table correspond to which properties of the plot, such as the x-axis, the y-axis, line colour or linetype, point shape, or object fill. These mappings are specified by the aes() function. Just adding this to the ggplot function creates the labels and ranges for the x and y axes. They usually have sensible default values, given your data, but we'll learn how to change them later.
mapping <- aes(x = pet, y = score, colour = country, fill = country)
ggplot(data = pets, mapping = mapping)
People usually omit the argument names and just put the aes() function directly as the second argument to ggplot. They also usually omit x and y as argument names to aes() (but you have to name the other properties). Next we can add "geoms," or plot styles. You literally add them with the + symbol. You can also add other plot attributes, such as labels, or change the theme and base font size.
ggplot(pets, aes(pet, score, color = country, fill = country)) +
  geom_violin(alpha = 0.5) +
  labs(x = "Pet type",
       y = "Score on an Important Exam",
       colour = "Country of Origin",
       fill = "Country of Origin",
       title = "My first plot!") +
  theme_bw(base_size = 15)
Common Plot Types
There are many geoms, and they can take different arguments to customise their appearance. We'll learn about some of the most common below.
Bar plot
Bar plots are good for categorical data where you want to represent the count.
ggplot(pets, aes(pet)) + geom_bar()
Density plot
Density plots are good for one continuous variable, but only if you have a fairly large number of observations.
ggplot(pets, aes(score)) + geom_density()
You can represent subsets of a variable by assigning the category variable to the argument group, fill, or color.
ggplot(pets, aes(score, fill = pet)) + geom_density(alpha = 0.5)
Try changing the alpha argument to figure out what it does.
Frequency polygons
If you want the y-axis to represent count rather than density, try geom_freqpoly().
ggplot(pets, aes(score, color = pet)) + geom_freqpoly(binwidth = 5)
Try changing the binwidth argument to 10 and 1. How do you figure out the right value?
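There is no single right binwidth, but rules of thumb give a starting point. As one example (not the tutorial's own method), the Freedman-Diaconis rule sets binwidth = 2 × IQR / n^(1/3). This sketch applies it to simulated scores standing in for `pets$score`; base R's `nclass.FD()` gives the matching suggested number of bins.

```r
set.seed(7)
scores <- rnorm(800, mean = 100, sd = 10)  # simulated stand-in for pets$score

# Freedman-Diaconis rule: wider bins for more spread, narrower for more data
fd_width <- 2 * IQR(scores) / length(scores)^(1/3)
fd_width

# Suggested number of bins from the same rule (grDevices, loaded by default)
nclass.FD(scores)
```

You could then try `geom_freqpoly(binwidth = fd_width)` and adjust by eye from there.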
Histogram
Histograms are also good for one continuous variable, and work well if you don't have many observations. Set the binwidth to control how wide each bar is.
ggplot(pets, aes(score)) +
  geom_histogram(binwidth = 5, fill = "white", color = "black")
Histograms in ggplot look pretty bad unless you set the fill and color.
If you show grouped histograms, you also probably want to change the default position argument.
ggplot(pets, aes(score, fill = pet)) +
  geom_histogram(binwidth = 5, alpha = 0.5, position = "dodge")
Try changing the position argument to "identity", "fill", "dodge", or "stack".
Column plot
Column plots are the worst way to represent grouped continuous data, but also one of the most common. If your data are already aggregated (e.g., you have rows for each group with columns for the mean and standard error), you can use geom_bar or geom_col and geom_errorbar directly. If not, you can use the function stat_summary to calculate the mean and standard error and send those numbers to the appropriate geom for plotting.
ggplot(pets, aes(pet, score, fill = pet)) +
  stat_summary(fun = mean, geom = "col", alpha = 0.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.25) +
  coord_cartesian(ylim = c(80, 120))
Try changing the values for coord_cartesian. What does this do?
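For the already-aggregated case described above, the summary table goes straight into geom_col() and geom_errorbar(). This sketch simulates raw scores (hypothetical values, not the pets data), aggregates them with base R, and checks the result against ggplot2's `mean_se()`, the helper that `stat_summary(fun.data = mean_se)` calls. The helper name `std_err` is our own.

```r
library(ggplot2)

set.seed(2)
raw <- data.frame(
  pet   = rep(c("cat", "dog", "ferret"), each = 20),
  score = rnorm(60, mean = 100, sd = 10)
)

# Aggregate first: one row per group with its mean and standard error
std_err <- function(x) sd(x) / sqrt(length(x))  # hypothetical helper
means <- tapply(raw$score, raw$pet, mean)
ses   <- tapply(raw$score, raw$pet, std_err)
agg <- data.frame(pet = names(means), mean = unname(means), se = unname(ses))

# Pre-aggregated data: the geoms plot the values directly, no stat needed
p <- ggplot(agg, aes(pet, mean, fill = pet)) +
  geom_col(alpha = 0.5) +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.25) +
  coord_cartesian(ylim = c(80, 120))

# mean_se() returns y (the mean) with ymin/ymax one standard error either side
ms <- mean_se(raw$score[raw$pet == "cat"])
ms
```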
Boxplot
Boxplots are great for representing the distribution of grouped continuous variables. They fix most of the problems with using bar/column plots for continuous data.
ggplot(pets, aes(pet, score, fill = pet)) + geom_boxplot(alpha = 0.5)
Violin plot
Violin plots are like sideways, mirrored density plots. They give even more information than a boxplot about distribution and are especially useful when you have non-normal distributions.
ggplot(pets, aes(pet, score, fill = pet)) +
  geom_violin(draw_quantiles = .5, trim = FALSE, alpha = 0.5)
Try changing the draw_quantiles argument. Set it to a vector of the numbers 0.1 to 0.9 in steps of 0.1.
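One way to build that vector is `seq()` with a `by` step. This sketch uses simulated data in place of pets; note that newer ggplot2 releases may warn that `draw_quantiles` is deprecated in favour of a differently named argument, so check the documentation for your installed version.

```r
library(ggplot2)

q <- seq(0.1, 0.9, by = 0.1)  # 0.1, 0.2, ..., 0.9
q

set.seed(3)
df <- data.frame(
  pet   = rep(c("cat", "dog"), each = 100),
  score = rnorm(200, mean = 100, sd = 10)
)  # simulated stand-in for pets

p <- ggplot(df, aes(pet, score, fill = pet)) +
  geom_violin(draw_quantiles = q, trim = FALSE, alpha = 0.5)
```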
Vertical intervals
Boxplots and violin plots don't always map well onto inferential stats that use the mean. You can represent the mean and standard error or any other value you can calculate.
Here, we will create a table with the means and standard errors for two groups. We'll learn how to calculate this from raw data in the chapter on data wrangling. We also create a new object called gg that sets up the base of the plot.
dat <- tibble(
  group = c("A", "B"),
  mean = c(10, 20),
  se = c(2, 3)
)

gg <- ggplot(dat, aes(group, mean,
                      ymin = mean - se,
                      ymax = mean + se))
The trick above can be useful if you want to represent the same data in different ways. You can add different geoms to the base plot without having to re-type the base plot code.
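Here are the four vertical-interval geoms from the learning objectives, each added to the same base plot. The sketch rebuilds `dat` and `gg` with base `data.frame()` so it runs on its own; every geom reuses the `y`, `ymin` and `ymax` mappings from the base.

```r
library(ggplot2)

# Same table and base plot as above
dat <- data.frame(group = c("A", "B"), mean = c(10, 20), se = c(2, 3))
gg <- ggplot(dat, aes(group, mean, ymin = mean - se, ymax = mean + se))

p_crossbar   <- gg + geom_crossbar(width = 0.5)   # box spanning ymin to ymax, line at y
p_errorbar   <- gg + geom_errorbar(width = 0.25)  # whiskers with end caps
p_linerange  <- gg + geom_linerange()             # a plain vertical line
p_pointrange <- gg + geom_pointrange()            # line plus a point at y
```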
You can also use the function stat_summary to calculate the mean, standard error, or any other value for your data and display it using any geom.
ggplot(pets, aes(pet, score, color = pet)) +
  stat_summary(fun.data = mean_se, geom = "crossbar") +
  stat_summary(fun.min = function(x) mean(x) - sd(x),
               fun.max = function(x) mean(x) + sd(x),
               geom = "errorbar", width = 0) +
  theme(legend.position = "none") # gets rid of the legend
Scatter plot
Scatter plots are a good way to represent the relationship between two continuous variables.
ggplot(pets, aes(age, score, colour = pet)) + geom_point()
Line graph
You often want to represent the relationship as a single line.
ggplot(pets, aes(age, score, color = pet)) + geom_smooth(formula = y ~ x, method= "lm")
What are some other options for the method argument to geom_smooth? When might you want to use them?
You can plot functions other than the linear y ~ x. The code below creates a data table where x is 101 values between -10 and 10, and y is x squared plus 3*x plus 1. You'll probably recognise this from algebra as a quadratic function. You can set the formula argument in geom_smooth to a quadratic formula (y ~ x + I(x^2)) to fit a quadratic function to the data.
quad <- tibble(
  x = seq(-10, 10, length.out = 101),
  y = x^2 + 3*x + 1
)

ggplot(quad, aes(x, y)) +
  geom_point() +
  geom_smooth(formula = y ~ x + I(x^2), method = "lm")
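Because geom_smooth(method = "lm") fits the same model as `lm()`, you can check what the quadratic formula estimates. On these noise-free data the fitted coefficients should recover the intercept 1, the linear term 3 and the squared term 1. This sketch uses base R only.

```r
quad <- data.frame(x = seq(-10, 10, length.out = 101))
quad$y <- quad$x^2 + 3 * quad$x + 1

# I() makes ^ mean "square" rather than formula interaction syntax
fit <- lm(y ~ x + I(x^2), data = quad)
round(coef(fit), 6)  # (Intercept) 1, x 3, I(x^2) 1
```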
Customisation
Labels
You can set custom titles and axis labels in a few different ways.
ggplot(pets, aes(age, score, colour = pet)) +
  geom_smooth(formula = y ~ x, method = "lm") +
  labs(title = "Pet score with Age",
       x = "Age (in Years)",
       y = "score Score",
       colour = "Pet Type")
ggplot(pets, aes(age, score, color = pet)) +
  geom_smooth(formula = y ~ x, method = "lm") +
  ggtitle("Pet score with Age") +
  xlab("Age (in Years)") +
  ylab("score Score") +
  scale_color_discrete(name = "Pet Type")
Colours
You can set custom values for colour and fill using functions like scale_colour_manual() and scale_fill_manual(). The Colours chapter in Cookbook for R has many more ways to customise colour.
ggplot(pets, aes(pet, score, colour = pet, fill = pet)) +
  geom_violin() +
  scale_color_manual(values = c("darkgreen", "dodgerblue", "orange")) +
  scale_fill_manual(values = c("#CCFFCC", "#BBDDFF", "#FFCC66"))
Themes
ggplot comes with several additional themes and the ability to fully customise your theme. Type ?theme into the console to see the full list. Other packages such as cowplot also have custom themes. You can add a custom theme to the end of your ggplot object and specify a new base_size to make the default fonts and lines larger or smaller.
ggplot(pets, aes(age, score, color = pet)) +
  geom_smooth(formula = y ~ x, method = "lm") +
  theme_minimal(base_size = 18)
It's more complicated, but you can fully customise your theme with theme(). You can save this to an object and add it to the end of all of your plots to make the style consistent. Alternatively, you can set the theme at the top of a script with theme_set() and this will apply to all subsequent ggplot plots.
vampire_theme <- theme(
  rect = element_rect(fill = "black"),
  panel.background = element_rect(fill = "black"),
  text = element_text(size = 20, colour = "white"),
  axis.text = element_text(size = 16, color = "grey70"),
  line = element_line(color = "white", linewidth = 2), # linewidth replaced size in ggplot2 3.4.0
  panel.grid = element_blank(),
  axis.line = element_line(colour = "white"),
  axis.ticks = element_blank(),
  legend.position = "top"
)

theme_set(vampire_theme) # stays in effect for all later plots in this session

ggplot(pets, aes(age, score, color = pet)) +
  geom_smooth(formula = y ~ x, method = "lm")
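Because theme_set() changes the default for every later plot, it helps to know that it invisibly returns the theme that was active before, so you can restore it. This sketch uses the built-in `mtcars` data; the object name `big_text` is our own.

```r
library(ggplot2)

# A reusable theme object: add it to any plot with +
big_text <- theme(text = element_text(size = 20))  # hypothetical name

# theme_set() returns the previously active theme (invisibly)
old <- theme_set(theme_minimal(base_size = 18))

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + big_text

# Put back whatever theme was active before
theme_set(old)
```

Calling `theme_set(theme_grey())` instead would reset to ggplot2's default grey theme regardless of what was set earlier.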
Save as file
You can save a ggplot using ggsave(). By default, it saves the last ggplot you made, but you can specify which plot you want to save if you assigned that plot to a variable.
You can set the width and height of your plot. The default units are inches, but you can change the units argument to "in", "cm", or "mm".
box <- ggplot(pets, aes(pet, score, fill = pet)) +
  geom_boxplot(alpha = 0.5)

violin <- ggplot(pets, aes(pet, score, fill = pet)) +
  geom_violin(alpha = 0.5)

ggsave("demog_violin_plot.png", width = 5, height = 7) # saves the last plot made (violin)
ggsave("demog_box_plot.jpg", plot = box, width = 5, height = 7)
The file type is set from the filename suffix, or by specifying the argument device, which can take the following values: "eps", "ps", "tex", "pdf", "jpeg", "tiff", "png", "bmp", "svg" or "wmf".
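To see the suffix and units rules together, this sketch saves a plot of the built-in `mtcars` data to a temporary PDF measured in centimetres. The file name is hypothetical; `tempdir()` keeps the example from writing into your project folder.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# The ".pdf" suffix selects the PDF device; width and height are in cm here
out <- file.path(tempdir(), "mtcars_scatter.pdf")  # hypothetical file name
ggsave(out, plot = p, width = 12, height = 8, units = "cm")

file.exists(out)  # TRUE
```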
Combination Plots
Violinbox plot
A combination of a violin plot to show the shape of the distribution and a boxplot to show the median and interquartile ranges can be a very useful visualisation.
ggplot(pets, aes(pet, score, fill = pet)) +
  geom_violin(show.legend = FALSE) +
  geom_boxplot(width = 0.2, fill = "white", show.legend = FALSE)
Set the show.legend argument to FALSE to hide the legend. We do this here because the x-axis already labels the pet types.
Violin-point-range plot
You can use stat_summary() to superimpose a point-range plot showing the mean ± 1 SD. You'll learn how to write your own functions in the lesson on Iteration and Functions.
ggplot(pets, aes(pet, score, fill = pet)) +
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(
    fun = mean,
    fun.max = function(x) {mean(x) + sd(x)},
    fun.min = function(x) {mean(x) - sd(x)},
    geom = "pointrange"
  )
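As a preview of writing your own functions, the three anonymous functions above can be folded into one named helper passed to `fun.data`, which expects a data frame with `y`, `ymin` and `ymax` columns. The helper name `mean_pm_sd` is hypothetical and the sketch uses the built-in `mtcars` data. (ggplot2 also ships `mean_sdl()`, which relies on the Hmisc package and defaults to ± 2 SD.)

```r
library(ggplot2)

# Hypothetical helper: mean, and mean ± 1 SD, in the columns fun.data expects
mean_pm_sd <- function(x) {
  m <- mean(x)
  s <- sd(x)
  data.frame(y = m, ymin = m - s, ymax = m + s)
}

mean_pm_sd(c(1, 2, 3))  # y = 2, ymin = 1, ymax = 3

p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_violin(trim = FALSE, alpha = 0.5) +
  stat_summary(fun.data = mean_pm_sd, geom = "pointrange")
```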
Violin-jitter plot
If you don't have a lot of data points, it's good to represent them individually. You can use geom_jitter to do this.
# sample_n chooses 50 random observations from the dataset
ggplot(sample_n(pets, 50), aes(pet, score, fill = pet)) +
  geom_violin(
    trim = FALSE,
    draw_quantiles = c(0.25, 0.5, 0.75),
    alpha = 0.5
  ) +
  geom_jitter(
    width = 0.15, # jitter points up to 0.15 units either side of the category centre
    height = 0,   # do not move position on the y-axis
    alpha = 0.5,
    size = 3
  )
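Jittering is random, so the points move every time the plot is drawn. Giving `position_jitter()` a `seed` makes the offsets reproducible. This sketch uses simulated data in place of pets and checks that two builds of the same plot put the points in identical x positions.

```r
library(ggplot2)

set.seed(4)
df <- data.frame(
  pet   = rep(c("cat", "dog"), each = 25),
  score = round(rnorm(50, mean = 100, sd = 10))
)  # simulated stand-in for pets

# Same width/height as above, plus a fixed seed for the random offsets
jit <- position_jitter(width = 0.15, height = 0, seed = 123)
p <- ggplot(df, aes(pet, score)) + geom_point(position = jit)

# Building the plot twice gives identical jittered x positions
x1 <- as.numeric(layer_data(p)$x)
x2 <- as.numeric(layer_data(p)$x)
```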
Scatter-line graph
If your graph isn't too complicated, it's good to also show the individual data points behind the line.
ggplot(sample_n(pets, 50), aes(age, weight, colour = pet)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = "lm")
Grid of plots
You can use the cowplot package to easily make grids of different graphs. First, you have to assign each plot a name. Then you list all the plots as the first arguments of plot_grid() and provide a vector of labels.
gg <- ggplot(pets, aes(pet, score, colour = pet))
nolegend <- theme(legend.position = "none")

vp <- gg + geom_violin(alpha = 0.5) + nolegend + ggtitle("Violin Plot")
bp <- gg + geom_boxplot(alpha = 0.5) + nolegend + ggtitle("Box Plot")
cp <- gg + stat_summary(fun = mean, geom = "col", fill = "white") + nolegend + ggtitle("Column Plot")
dp <- ggplot(pets, aes(score, colour = pet)) + geom_density() + nolegend + ggtitle("Density Plot")

plot_grid(vp, bp, cp, dp, labels = LETTERS[1:4])
Overlapping Discrete Data
Reducing Opacity
You can deal with overlapping data points (very common if you're using Likert scales) by reducing the opacity of the points. You need to use trial and error to adjust these so they look right.
ggplot(pets, aes(age, score, color = pet)) + geom_point(alpha = 0.25) + geom_smooth(formula = y ~ x, method= "lm")
Proportional Dot Plots
Or you can set the size of the dot proportional to the number of overlapping observations using geom_count().
ggplot(pets, aes(age, score, colour = pet)) + geom_count()
Alternatively, you can transform your data (we will learn to do this in the data wrangling chapter) to create a count column and use the count to set the dot colour.
pets %>%
  group_by(age, score) %>%
  summarise(count = n(), .groups = "drop") %>%
  ggplot(aes(age, score, color = count)) +
  geom_point(size = 2) +
  scale_color_viridis_c()
The viridis package changes the colour themes to be easier to read by people with colourblindness and to print better in greyscale. Viridis has been built into ggplot2 since v3.0.0. It uses scale_colour_viridis_c() and scale_fill_viridis_c() for continuous variables and scale_colour_viridis_d() and scale_fill_viridis_d() for discrete variables.
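dplyr's `count()` is a shorthand for the `group_by()` plus `summarise(n = n())` step used above for counting overlapping observations; it names the new column `n`. This sketch simulates Likert-style data (hypothetical values) with many repeated pairs.

```r
library(dplyr)
library(ggplot2)

set.seed(5)
df <- data.frame(
  age   = sample(1:5, 200, replace = TRUE),
  score = sample(1:7, 200, replace = TRUE)
)  # simulated data with lots of overlap

# One row per unique (age, score) pair, with its number of observations in n
counts <- df %>% count(age, score)

p <- ggplot(counts, aes(age, score, colour = n)) +
  geom_point(size = 2) +
  scale_colour_viridis_c()
```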
Overlapping Continuous Data
Even if the variables are continuous, overplotting might obscure any relationships if you have lots of data.
ggplot(pets, aes(age, score)) + geom_point()
2D Density Plot
Use geom_density2d()
to create a contour map.
ggplot(pets, aes(age, score)) + geom_density2d()
You can use stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") to create a heatmap-style density plot. Older code writes this as fill = ..level..; that dot-dot notation was deprecated in ggplot2 3.4.0.
ggplot(pets, aes(age, score)) +
  stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") +
  scale_fill_viridis_c()
2D Histogram
Use geom_bin2d() to create a rectangular heatmap of bin counts. Set the binwidth to the x and y dimensions to capture in each box.
ggplot(pets, aes(age, score)) + geom_bin2d(binwidth = c(1, 5))
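To see what the binwidth controls, you can pull the computed bins out of a plot with `layer_data()`: each row is one tile, 1 year wide and 5 score points tall here, with its `count`. This sketch uses simulated data in place of pets and `geom_bin_2d()`, an alternative spelling of `geom_bin2d()`.

```r
library(ggplot2)

set.seed(6)
df <- data.frame(
  age   = sample(1:15, 800, replace = TRUE),
  score = round(rnorm(800, mean = 100, sd = 10))
)  # simulated stand-in for pets

# binwidth = c(x width, y width)
p <- ggplot(df, aes(age, score)) + geom_bin_2d(binwidth = c(1, 5))

bins <- layer_data(p)          # one row per tile
head(bins[, c("xmin", "xmax", "ymin", "ymax", "count")])
sum(bins$count) == nrow(df)    # every observation lands in exactly one tile
```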
Hexagonal Heatmap
Use geom_hex() to create a hexagonal heatmap of bin counts (it needs the hexbin package installed). Adjust the binwidth, xlim(), ylim() and/or the figure dimensions to make the hexagons more or less stretched.
ggplot(pets, aes(age, score)) + geom_hex(binwidth = c(1, 5))
Correlation Heatmap
I've included the code for creating a correlation matrix from a table of variables, but you don't need to understand how this is done yet. We'll cover select_if() and gather() functions in the dplyr and tidyr lessons.
heatmap <- pets %>%
  select_if(is.numeric) %>%       # get just the numeric columns
  cor() %>%                       # create the correlation matrix
  as_tibble(rownames = "V1") %>%  # make it a tibble
  gather("V2", "r", 2:ncol(.))    # wide to long (V2)
Once you have a correlation matrix in the correct (long) format, it's easy to make a heatmap using geom_tile().
ggplot(heatmap, aes(V1, V2, fill = r)) +
  geom_tile() +
  scale_fill_viridis_c()
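The wide-to-long step can also be done in base R: `as.table()` on a matrix with row and column names, followed by `as.data.frame()`, gives one row per cell. This is a swapped-in alternative to `gather()` (whose tidyr successor is `pivot_longer()`), sketched on three columns of the built-in `mtcars` data.

```r
num <- mtcars[, c("mpg", "wt", "hp")]

# One row per (V1, V2) pair of the correlation matrix
long <- as.data.frame(as.table(cor(num)))
names(long) <- c("V1", "V2", "r")
head(long)
```

The resulting `long` has the same V1/V2/r layout used for the `geom_tile()` heatmap above.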
Interactive Plots
You can use the plotly package to make interactive graphs. Just assign your ggplot to a variable and use the function ggplotly().
demog_plot <- ggplot(pets, aes(age, score, fill = pet)) +
  geom_point() +
  geom_smooth(formula = y ~ x, method = lm)

ggplotly(demog_plot)
Hover over the data points above and click on the legend items.
Glossary
term | definition |
---|---|
continuous | Data that can take on any values between other existing values. |
discrete | Data that can only take certain values, such as integers. |
geom | The geometric style in which data are displayed, such as boxplot, density, or histogram. |
likert | A rating scale with a small number of discrete points in order |
nominal | Categorical variables that don't have an inherent order, such as types of animal. |
ordinal | Discrete variables that have an inherent order, such as number of legs |
Exercises
Download the exercises. See the plots to see what your plots should look like (this doesn't contain the answer code). See the answers only after you've attempted all the questions.
# run this to access the exercise
dataskills::exercise(3)

# run this to access the answers
dataskills::exercise(3, answers = TRUE)
Source: https://psyteachr.github.io/msc-data-skills/ggplot.html
Posted by: belcheremanded.blogspot.com