
MATH/COSC 3570 Introduction to Data Science



ggplot2
- has the most powerful functionality.
- is more beautiful?
- has a larger file size that occupies more memory and takes longer to render.
| Grammar element | What it is |
|---|---|
| Data | The data frame used for plotting |
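| Geometry | The geometric shape that represents the data, such as points drawn with geom_point() |
| Aesthetic mapping | How variables in the data map to visual properties such as x, y, color, shape, size, and alpha |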
ggplot2::mpg
# A tibble: 234 × 11
  manufacturer model      displ  year   cyl trans  drv     cty   hwy fl    class
  <chr>        <chr>      <dbl> <int> <int> <chr>  <chr> <int> <int> <chr> <chr>
1 audi         a4           1.8  1999     4 auto(… f        18    29 p     comp…
2 audi         a4           1.8  1999     4 manua… f        21    29 p     comp…
3 audi         a4           2    2008     4 manua… f        20    31 p     comp…
4 audi         a4           2    2008     4 auto(… f        21    30 p     comp…
5 audi         a4           2.8  1999     6 auto(… f        16    26 p     comp…
6 audi         a4           2.8  1999     6 manua… f        18    26 p     comp…
7 audi         a4           3.1  2008     6 auto(… f        18    27 p     comp…
8 audi         a4 quattro   1.8  1999     4 manua… 4        18    26 p     comp…
# ℹ 226 more rows

ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(title = "Engine Size vs. Fuel Efficiency",
       subtitle = "Dimensions for class",
       x = "Engine displacement (litres)", y = "Highway (mpg)",
       color = "Type of car",
       caption = "Source: http://fueleconomy.gov")

Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy)) +
  geom_point()
Don't miss the + sign!
For scatterplots we add points with geom_point().
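The build-up described above can be traced one step at a time. The following is a minimal sketch, not part of the original slides; the object names p_data, p_axes, and p_points are hypothetical and chosen only for illustration.

```r
library(ggplot2)

# Step 1: data only -> an empty canvas with no axes and no layers
p_data <- ggplot(data = mpg)

# Step 2: add the aesthetic mapping -> axes for displ and hwy, nothing drawn yet
p_axes <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Step 3: add a geometry layer with + -> one point per car
p_points <- p_axes + geom_point()

# Number of layers stored in each plot object
c(data   = length(p_data$layers),
  axes   = length(p_axes$layers),
  points = length(p_points$layers))

# Draw the finished scatterplot
p_points
```

Only the last object contains a layer, which is why the first two steps render without any points.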

Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point.
ggplot(data = mpg,
       mapping =
         aes(x = displ,
             y = hwy,
             color = class)) +
  geom_point()
Add color = class in aes() of the mapping argument, where class is the variable name for type of car.
ggplot automatically generates a legend on the right.

Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency"
  )
Add the title with a labs() layer.
Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”, add the subtitle “Dimensions for class”.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Dimensions for class"
  )
Add the subtitle as another argument of labs().

Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”, add the subtitle “Dimensions for class”, label the x and y axes as “Engine displacement (litres)” and “Highway (mpg)”, respectively.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Dimensions for class",
    x = "Engine displacement (litres)",
    y = "Highway (mpg)"
  )
Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”, add the subtitle “Dimensions for class”, label the x and y axes as “Engine displacement (litres)” and “Highway (mpg)”, respectively, label the legend “Type of car”.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Dimensions for class",
    x = "Engine displacement (litres)",
    y = "Highway (mpg)",
    color = "Type of car"
  )
The legend title is set with color = because we map type of car (class) to color.
Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”, add the subtitle “Dimensions for class”, label the x and y axes as “Engine displacement (litres)” and “Highway (mpg)”, respectively, label the legend “Type of car”, and add a caption for the data source.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Dimensions for class",
    x = "Engine displacement (litres)",
    y = "Highway (mpg)",
    color = "Type of car",
    caption = "Source: http://fueleconomy.gov"
  )
Start with the mpg data frame, map engine displacement to the x-axis and map highway miles per gallon to the y-axis. Represent each observation with a point and map type of car (class) to the color of each point. Title the plot “Engine Size vs. Fuel Efficiency”, add the subtitle “Dimensions for class”, label the x and y axes as “Engine displacement (litres)” and “Highway (mpg)”, respectively, label the legend “Type of car”, and add a caption for the data source. Finally, use a discrete color scale that is designed to be perceived by viewers with common forms of color blindness.
ggplot(data = mpg,
       mapping = aes(x = displ,
                     y = hwy,
                     color = class)) +
  geom_point() +
  labs(
    title = "Engine Size vs. Fuel Efficiency",
    subtitle = "Dimensions for class",
    x = "Engine displacement (litres)",
    y = "Highway (mpg)",
    color = "Type of car",
    caption = "Source: http://fueleconomy.gov"
  ) +
  scale_colour_viridis_d()
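To see what the discrete viridis scale actually assigns, it can help to look at the colors themselves. The sketch below is an illustration added here, not part of the original slides: it assumes the scales package is available (it is installed as a dependency of ggplot2), and the names n_class and viridis_cols are hypothetical. The "cividis" option is just one of the alternative palettes the scale accepts.

```r
library(ggplot2)

# Number of distinct car types = number of colors the discrete scale needs
n_class <- length(unique(mpg$class))

# Hex codes of the default viridis palette for that many levels
viridis_cols <- scales::viridis_pal()(n_class)
viridis_cols

# Same scatterplot with an alternative viridis palette
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  scale_colour_viridis_d(option = "cividis")
```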
11-ggplot2
In lab.qmd ## Lab 11 section,
Use readr::read_csv() to import the data penguins.csv into your R workspace.
Generate the following ggplot:

penguins <- read_csv(_________________)
________ |>
  ggplot(mapping = ____(x = ______________,
                        y = ______________,
                        colour = ________)) +
  geom______() +
  ____(title = ____________________,
       _________ = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
       x = _____________, y = _______________,
       _______ = "Species",
       _______ = "Source: Palmer Station LTER / palmerpenguins package")

p <- ggplot(data = mpg,
            mapping =
              aes(x = displ,
                  y = hwy,
                  color = class)) +
  geom_point()
class(p)
[1] "gg" "ggplot"
p
p + labs(
  title = "Engine Size vs. Fuel Efficiency",
  subtitle = "Dimensions for class",
  x = "Engine displacement (litres)",
  y = "Highway (mpg)",
  color = "Type of car",
  caption = "Source: http://fueleconomy.gov"
)
Theme options include theme_grey() (default), theme_bw(), theme_dark(), theme_classic(), etc.
p + theme_bw()
p + theme_dark()
Many other themes are provided by the ggthemes package.
Check the package website, ggplot2 extensions, and ALL YOUR FIGURE ARE BELONG TO US for more themes.
p + ggthemes::theme_economist()
p + ggthemes::theme_fivethirtyeight()
Use theme() to tweak the display of the current theme, including title, axis labels, etc. Check ?theme.
p + theme(
  panel.background =
    element_rect(fill = "#FFCC00",
                 colour = "blue",
                 linewidth = 2.5,
                 linetype = "solid"),
  plot.background =
    element_rect(fill = "lightblue"),
  axis.line =
    element_line(linewidth = 0.5,
                 linetype = "solid",
                 colour = "red")
)
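The theme() call above changes backgrounds and axis lines; the title and axis labels mentioned in the text are tweaked the same way. The following is a minimal sketch with arbitrary styling values, added for illustration only; the names p_titled and p_tweaked are hypothetical.

```r
library(ggplot2)

# A titled version of the scatterplot, rebuilt here so the block runs on its own
p_titled <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  labs(title = "Engine Size vs. Fuel Efficiency")

# Tweak individual elements of the current theme
p_tweaked <- p_titled + theme(
  plot.title = element_text(face = "bold", size = 16),  # plot title
  axis.title = element_text(colour = "grey30"),         # both axis labels
  legend.position = "bottom"                            # move the legend below the panel
)
p_tweaked
```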
Commonly used characteristics of plotting characters that can be mapped to a specific variable in the data are colour, shape, size, and alpha (transparency).
ggplot(
  data = mpg,
  mapping = aes(
    x = displ,
    y = hwy,
    color = class)) +
  geom_point()
Shape mapped to a different variable than colour
ggplot(
  data = mpg,
  mapping = aes(
    x = displ,
    y = hwy,
    color = class,
    shape = drv)) +
  geom_point()
Shape mapped to the same variable as colour
ggplot(
  data = mpg,
  mapping = aes(
    x = displ,
    y = hwy,
    color = class,
    shape = class)) +
  geom_point()
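One caveat about mapping shape to class, offered as a side note rather than as part of the original example: class has seven levels, while the default shape palette in ggplot2 covers at most six discrete values, so the plot above produces a warning and leaves one class without points. Assuming one shape per class is wanted, a manual scale can supply seven shape codes. The codes 0:6 and the names n_class and p_shapes are arbitrary choices for this sketch.

```r
library(ggplot2)

# class has one more distinct value than the default shape palette provides
n_class <- length(unique(mpg$class))
n_class

p_shapes <- ggplot(mpg, aes(x = displ, y = hwy,
                            color = class, shape = class)) +
  geom_point() +
  # supply one shape code per class (0:6 is an illustrative choice)
  scale_shape_manual(values = 0:6)
p_shapes
```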
ggplot(
  data = mpg,
  mapping = aes(
    x = displ,
    y = hwy,
    color = class,
    shape = class,
    size = cty)) +
  geom_point()

ggplot(
  data = mpg,
  mapping = aes(
    x = displ,
    y = hwy,
    color = class,
    shape = class,
    size = cty,
    alpha = year)) +
  geom_point()
Mapping
Determine the size, alpha, etc. of points based on the values of a variable in the data.
Goes into aes().
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy,
                     size = cty, alpha = year)) +
  geom_point()
Setting
Determine the size, alpha, etc. of points not based on the values of a variable in the data.
Goes into geom_*().
ggplot(data = mpg,
       mapping = aes(x = displ, y = hwy)) +
  geom_point(size = 5, alpha = 0.5)
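The difference between mapping and setting is easiest to see with colour. The sketch below is an added illustration, not from the original slides; the names p_set and p_map are hypothetical. Placing a constant such as "blue" inside aes() makes ggplot2 treat it as a variable with a single level, so the points get the first default palette colour plus a legend instead of being drawn in blue.

```r
library(ggplot2)

# Setting: every point is drawn in blue; no scale or legend is involved
p_set <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")

# Mapping (usually a mistake): "blue" is treated as a one-level variable,
# so a colour scale and a legend are created
p_map <- ggplot(mpg, aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

p_set
p_map
```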
One way to add information from additional variables is with aesthetics, but as the plots above show, putting all the information in one plot may not be a good idea.
Another way, particularly useful for categorical variables, is to split your plot into facets: smaller plots that each display one subset of the data.
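To make "one subset of the data" concrete, the sketch below counts the rows behind each value of cyl and draws a single subset by hand. It is an added illustration under the assumption that cyl is the faceting variable; the names panel_sizes and mpg_4cyl are hypothetical.

```r
library(ggplot2)

# Rows per value of cyl: each count is the number of points in one facet
panel_sizes <- table(mpg$cyl)
panel_sizes

# Drawing one subset by hand gives the content of a single facet
mpg_4cyl <- subset(mpg, cyl == 4)
ggplot(mpg_4cyl, aes(x = displ, y = hwy)) +
  geom_point()
```

Faceting does this subsetting for every value at once and arranges the resulting panels on shared axes.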
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ cyl, ncol = 2)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  facet_grid(drv ~ cyl)

ggplot(data = mpg, mapping = aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  facet_grid(drv ~ cyl) +
  guides(color = "none")

plotnine package
Its syntax is the same as ggplot2 in R.
