MATH/COSC 3570 Introduction to Data Science
Use quit or exit to switch back to R.
2 + 3 / (5 * 4) ^ 2
[1] 2.01
5 == 5.00
[1] TRUE
# 5 and 5L are of the same value too
# 5 is of type double; 5L is integer
5 == 5L
[1] TRUE
typeof(5L)
[1] "integer"
!TRUE == FALSE
[1] TRUE
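For comparison, the same checks can be sketched in Python. This is only an illustration and not part of the R session above; note that Python writes the power operator as ** and logical negation as not. As in R, the comparison is evaluated before the negation.

```python
# Python counterparts of the R expressions above (illustrative sketch).
value = 2 + 3 / (5 * 4) ** 2       # ** is Python's power operator (R uses ^)
print(round(value, 4))             # 2.0075

same_value = 5 == 5.0              # an int and a float with the same value
print(same_value)                  # True
print(type(5).__name__)            # int
print(type(5.0).__name__)          # float

# Comparison binds tighter than `not`, so this reads as not (True == False)
negated = not True == False
print(negated)                     # True
```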
Math functions in R are built-in.
# R comment
Create a vector with c(), short for concatenate or combine.
Use : to create a sequence of integers.
Use seq() to create a sequence of numbers of type double with more options.
Use [] with element indexing to subset a vector.
A factor can be ordered in a meaningful way. Create a factor with factor().
## Create a factor from a character vector using function factor()
(fac <- factor(c("med", "high", "low")))
[1] med high low
Levels: high low med
Lists are different from (atomic) vectors: Elements can be of any type, including lists.
Construct a list by using list().
Return an element of a list
## subset by name (a vector)
x_lst$idx
[1] 1 2 3
## subset by indexing (a vector)
x_lst[[1]]
[1] 1 2 3
typeof(x_lst$idx)
[1] "integer"
Return a sub-list of a list
## subset by name (still a list)
x_lst["idx"]
$idx
[1] 1 2 3
## subset by indexing (still a list)
x_lst[1]
$idx
[1] 1 2 3
typeof(x_lst["idx"])
[1] "list"
If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6. (@RLangTip, https://twitter.com/RLangTip/status/268375867468681216)
A matrix is a vector with the attribute dim.
Use matrix() to create a matrix.
Use , to separate row and column index. mat[2, 2] extracts the element in the second row and second column.
mat
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
## all rows and 2nd column
## leave row index blank
## specify 2 in coln index
mat[, 2]
[1] 4 5 6
## 2nd row and all columns
mat[2, ]
[1] 2 5
## The 1st and 3rd rows and the 1st column
mat[c(1, 3), 1]
[1] 1 3
cbind() (binding matrices by adding columns)
rbind() (binding matrices by adding rows)
When matrices are combined by columns (rows), they should have the same number of rows (columns).
Use data.frame(), which takes named vectors as input “elements”.
## data frame w/ a dbl column named age
## and a char column named gen.
(df <- data.frame(age = c(19, 21, 40),
gen = c("m", "f", "m")))
age gen
1 19 m
2 21 f
3 40 m
## a data frame has a list structure
str(df)
'data.frame': 3 obs. of 2 variables:
$ age: num 19 21 40
$ gen: chr "m" "f" "m"
## must set column names
## or they are ugly and non-recognizable
data.frame(c(19, 21, 40), c("m", "f", "m"))
c.19..21..40. c..m....f....m..
1 19 m
2 21 f
3 40 m
A data frame has properties of both a matrix and a list.
## rbind() and cbind() can be used on df
df_r <- data.frame(age = 10,
gen = "f")
rbind(df, df_r)
age gen
1 19 m
2 21 f
3 40 m
4 10 f
df_c <-
data.frame(col = c("red","blue","gray"))
(df_new <- cbind(df, df_c))
age gen col
1 19 m red
2 21 f blue
3 40 m gray
Either list or matrix subsetting methods can be used.
df_new
age gen col
1 19 m red
2 21 f blue
3 40 m gray
## Subset rows
df_new[c(1, 3), ]
age gen col
1 19 m red
3 40 m gray
## select the row where age == 21
df_new[df_new$age == 21, ]
age gen col
2 21 f blue
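For readers who also work in Python, the row subsetting above can be sketched with pandas. This is a rough counterpart under the assumption that pandas is installed; the data frame is rebuilt by hand from the values shown, and positions are 0-based in Python.

```python
import pandas as pd

# Same toy data as the R data frame df_new above
df_new = pd.DataFrame({
    "age": [19, 21, 40],
    "gen": ["m", "f", "m"],
    "col": ["red", "blue", "gray"],
})

# Rows 1 and 3 in R are positions 0 and 2 in Python (0-based indexing)
rows_1_3 = df_new.iloc[[0, 2]]

# Keep the rows where age == 21 using a boolean mask
age_21 = df_new[df_new["age"] == 21]

print(rows_1_3)
print(age_21)
```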
05-R Data Type Summary
In lab.qmd Lab 5,
Create R objects vector v1, factor f2, list l3, matrix m4 and data frame d5.
Check typeof() and class() of those objects, and create a list having the output below.
v1 <- __________
f2 <- __________
l3 <- __________
m4 <- __________
d5 <- __________
v <- c(type = typeof(v1), class = class(v1))
f <- c(type = __________, class = _________)
l <- c(type = __________, class = _________)
m <- c(type = __________, class = _________)
d <- c(type = __________, class = _________)
____(vec = v,
______ = ___,
______ = ___,
______ = ___,
______ = ___)
Use [] to index a Python list.
0: the 1st element
-1: the last element
What does lst[0:1] return? Is it a list?
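The difference between indexing and slicing can be checked directly. The list below is a made-up example, so lst and its contents are assumptions rather than objects from the slides.

```python
lst = ["data", "science", 3570, True]   # hypothetical list

first = lst[0]      # the 1st element
last = lst[-1]      # the last element
piece = lst[0:1]    # a slice: the stop index 1 is excluded

print(first)                   # data
print(last)                    # True
print(piece)                   # ['data']
print(type(piece).__name__)    # list
```

Indexing with a single number returns the element itself, while slicing always returns a list, even when it holds only one element.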
Lists are changed in place!
List methods are called in the form list.method().
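A small sketch of what "changed in place" means, using a made-up list named nums. The methods append() and sort() modify the existing object instead of returning a new list.

```python
nums = [3, 1, 2]          # hypothetical list
alias = nums              # a second name for the same list object

nums.append(0)            # modifies the list itself
result = nums.sort()      # sorts in place and returns None

print(nums)               # [0, 1, 2, 3]
print(alias)              # [0, 1, 2, 3], the same object, so it changed too
print(result)             # None
```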
Tuples work exactly like lists except they are immutable, i.e., they can’t be changed in place.
To create a tuple, we use ().
Note
Lists have more methods than tuples because lists are more flexible.
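The following sketch illustrates both points with a made-up tuple named point: item assignment fails on a tuple, and a quick look at dir() shows that tuples expose fewer public methods than lists.

```python
point = (1, 2, 3)         # hypothetical tuple
print(point[0])           # indexing works as with lists: 1

try:
    point[0] = 10         # item assignment is not allowed on a tuple
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Compare the public (non-underscore) methods of list and tuple
list_methods = [m for m in dir(list) if not m.startswith("_")]
tuple_methods = [m for m in dir(tuple) if not m.startswith("_")]
print(tuple_methods)                       # ['count', 'index']
print(len(list_methods) > len(tuple_methods))   # True
```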
A dictionary consists of key-value pairs.
A dictionary is mutable, i.e., the values can be changed in place and more key-value pairs can be added.
To create a dictionary, we use {'key': value}.
The value can be accessed by the key in the dictionary.
{'Name': 'Ivy', 'Age': 9, 'Class': 'Third'}
dict_items([('Name', 'Ivy'), ('Age', 9), ('Class', 'Third')])
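A minimal sketch of these dictionary operations, starting from the dictionary printed above. The variable name student and the added 'School' entry are assumptions made for illustration.

```python
student = {'Name': 'Ivy', 'Age': 9, 'Class': 'Third'}

name = student['Name']        # access a value by its key
print(name)                   # Ivy

student['Age'] = 10           # change a value in place
student['School'] = 'Elm'     # hypothetical new key-value pair

print(student)
print(list(student.items()))  # key-value pairs as a list of tuples
```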
06-Python Data Structure
In lab.qmd Lab 6,
Remember to create a Python code chunk.
Is there any issue with this Python chunk?
Commit and Push your work once you are done.
In your lab-yourusername project, run
library(reticulate)
virtualenv_create("myenv")
Go to Tools > Global Options > Python > Select > Virtual Environments
You may need to restart the R session. Do it, and in the new R session, run
library(reticulate)
py_install(c("numpy", "pandas", "matplotlib"))
summary(data)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.0 9.8 17.0 55.3 47.5 230.0
plot()
mtcars[1:15, 1:4]
mpg cyl disp hp
Mazda RX4 21.0 6 160 110
Mazda RX4 Wag 21.0 6 160 110
Datsun 710 22.8 4 108 93
Hornet 4 Drive 21.4 6 258 110
Hornet Sportabout 18.7 8 360 175
Valiant 18.1 6 225 105
Duster 360 14.3 8 360 245
Merc 240D 24.4 4 147 62
Merc 230 22.8 4 141 95
Merc 280 19.2 6 168 123
Merc 280C 17.8 6 168 123
Merc 450SE 16.4 8 276 180
Merc 450SL 17.3 8 276 180
Merc 450SLC 15.2 8 276 180
Cadillac Fleetwood 10.4 8 472 205
plot(x = mtcars$mpg, y = mtcars$hp,
xlab = "Miles per gallon",
ylab = "Horsepower",
main = "Scatter plot",
col = "red",
pch = 5, las = 1)
matplotlib.pyplot
mpg cyl disp hp
0 21.0 6 160.0 110
1 21.0 6 160.0 110
2 22.8 4 108.0 93
3 21.4 6 258.0 110
4 18.7 8 360.0 175
5 18.1 6 225.0 105
6 14.3 8 360.0 245
7 24.4 4 146.7 62
8 22.8 4 140.8 95
9 19.2 6 167.6 123
10 17.8 6 167.6 123
11 16.4 8 275.8 180
12 17.3 8 275.8 180
13 15.2 8 275.8 180
14 10.4 8 472.0 205
The command plt.scatter() is used for creating one single plot.
If multiple subplots are wanted in one single call, one can use plt.subplots().
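The two commands can be sketched as follows, assuming matplotlib is installed. The mpg and hp values are the first seven rows of the table above; the Agg backend is chosen only so the sketch runs without a display, and in an interactive session one would call plt.show() instead.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# First seven mpg and hp values from the mtcars table above
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3]
hp = [110, 110, 93, 110, 175, 105, 245]

# One single plot
plt.figure()
plt.scatter(mpg, hp, color="red", marker="D")
plt.xlabel("Miles per gallon")
plt.ylabel("Horsepower")
plt.title("Scatter plot")

# Two subplots created in one call
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))
axes[0].scatter(mpg, hp, color="red")
axes[0].set_title("Scatter")
axes[1].hist(mpg, bins=5)
axes[1].set_title("Histogram")
fig.tight_layout()
```

plt.subplots() returns the figure and an array of axes, so each panel is drawn by calling methods on its own axes object.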
boxplot()
boxplot()
hist()
hist() decides the class intervals/width based on breaks. If not provided, R chooses one.
hist(mtcars$wt,
breaks = 20,
col = "#003366",
border = "#FFCC00",
xlab = "weights",
main = "Histogram of weights",
las = 1)
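A rough Python counterpart uses the bins argument of matplotlib's hist(). The weights below are made-up values, not the actual mtcars column. One difference worth noting: in R a single number for breaks is treated as a suggestion, while bins=20 in matplotlib produces exactly 20 equal-width bins.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

# Hypothetical car weights (in 1000 lbs); not the actual mtcars column
weights = [1.5, 1.9, 2.2, 2.5, 2.8, 3.2, 3.4, 3.4, 3.6, 3.8, 4.1, 5.3]

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(weights, bins=20,
                                 color="#003366", edgecolor="#FFCC00")
ax.set_xlabel("weights")
ax.set_title("Histogram of weights")

print(len(counts))        # 20 bins
print(len(edges))         # 21 bin edges
```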
hist()
barplot()
(counts <- table(mtcars$gear))
3 4 5
15 12 5
barplot()
pie()
3 4 5
46.9 37.5 15.6
(labels <- paste0(3:5, " gears: ", percent, "%"))
[1] "3 gears: 46.88%" "4 gears: 37.5%" "5 gears: 15.62%"
pie(x = counts, labels = labels,
main = "Pie Chart",
col = 2:4,
radius = 1)
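The code that defines percent is not shown above. One plausible reading, consistent with the printed labels, is that it holds each gear's share of the total rounded to two decimals. A Python sketch of that assumed calculation, using the counts from the table:

```python
counts = {3: 15, 4: 12, 5: 5}          # gear counts from the table above
total = sum(counts.values())

# Assumed definition of percent: share of each gear, rounded to 2 decimals
percent = {gear: round(100 * n / total, 2) for gear, n in counts.items()}

# Build one label per gear, mirroring the paste0() call
labels = [f"{gear} gears: {p}%" for gear, p in percent.items()]
print(labels)
# ['3 gears: 46.88%', '4 gears: 37.5%', '5 gears: 15.62%']
```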
pie()
image()
The image() function displays the values in a matrix using color.
In Python,
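one possible counterpart is matplotlib's imshow(). The sketch below is an assumption about what such code could look like, not the code from the original slide; it uses a small made-up matrix in place of the volcano data so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 4 x 5 matrix standing in for an elevation grid such as volcano
mat = np.arange(20).reshape(4, 5)

fig, ax = plt.subplots()
im = ax.imshow(mat, cmap="viridis", origin="lower")  # each cell is colored by its value
fig.colorbar(im, ax=ax)                               # legend mapping colors to values
ax.set_title("Matrix shown as an image")
```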
fields::image.plot()
num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
image.plot(volcano)
num [1:87, 1:61] 100 101 102 103 104 105 105 106 107 108 ...
scatterplot3d()
library(scatterplot3d)
scatterplot3d(x = mtcars$wt,
y = mtcars$disp,
z = mtcars$mpg,
main = "3D Scatter Plot",
xlab = "Weights",
ylab = "Displacement",
zlab = "Miles per gallon",
pch = 16,
color = "steelblue")
In Python,
persp()
par(mar = c(0,0,0,0))
# Exaggerate the relief
z <- 2 * volcano
# 10 meter spacing (S to N)
x <- 10 * (1:nrow(z))
# 10 meter spacing (E to W)
y <- 10 * (1:ncol(z))
par(bg = "slategray")
persp(x, y, z, theta = 135, phi = 30,
col = "green3", scale = FALSE,
ltheta = -120, shade = 0.75,
border = NA, box = FALSE)
In Python,
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

volcano = pd.read_csv("./slides/data/volcano.csv", index_col=0)
volcano = volcano.values
z = 2 * volcano
x = np.arange(1, z.shape[0] + 1) * 10
y = np.arange(1, z.shape[1] + 1) * 10
X, Y = np.meshgrid(y, x)
fig = plt.figure()
ax = fig.add_subplot(projection='3d', facecolor="slategray")
ax.plot_surface(X, Y, z, cmap="Greens", edgecolor="none", shade=True, alpha=0.9)
plt.show()
07-Plotting
In lab.qmd ## Lab 7,
For the mtcars data, use R or Python to
make a scatter plot of miles per gallon vs. weight. Decorate your plot using arguments, col, pch, xlab, etc.
create a histogram of 1/4 mile time. Make it beautiful!
We will talk about data visualization in detail soon!