library(tidyverse)
library(ggplot2)
class_avg <- mpg |>
  group_by(class) |>
  summarise(displ = median(displ), hwy = median(hwy))
library(ggrepel)
ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_label_repel(aes(label = class), data = class_avg, size = 6,
                   label.size = 0, segment.color = NA) +
  geom_point() + theme(legend.position = "none")
Homework 1: Quarto, Basic Syntax and Data Importing
Spring 2025 MATH/COSC 3570 Introduction to Data Science by Dr. Cheng-Han Yu
1 Autobiography
Please introduce yourself. You can share anything: your hometown, major, family, hobbies, work experience, honors and awards, special skills, etc. Yes, anything! Your autobiography should include:
- At least two paragraphs (Paragraphs are separated by a blank line)
- Bold text
- Italic text
- Text with both bold AND italic font (Not mentioned in class, but you should be able to figure it out)
- Clickable text with a hyperlink
- Blockquote
- Listed items
- emoji
To make your emoji work, add from: markdown+emoji to your YAML header, like

---
title: "My Document"
from: markdown+emoji
---

Then add emoji to your writing by typing :EMOJICODE:. Check the emoji cheatsheet.
Your Self-Introduction:
2 Chunk Options
Please check the references https://quarto.org/docs/reference/cells/cells-knitr.html and answer the following questions.
- Please add a nice picture of yourself using knitr::include_graphics() in the code chunk labeled photo. Please use
  - echo to not show the code
  - fig-cap to add a figure caption
  - fig-cap-location to put the caption in the margin.
- In the code chunk opt-echo, use the chunk option
  - echo to NOT show library(tidyverse), library(ggplot2), and library(ggrepel). Note: you may need to use !expr. Check the Stack Overflow discussion and Quarto chunk options.
  - fig-align to have the figure right-aligned.
- A Marquette student has really bad code style. Please
  - add the chunk option tidy in the chunk labeled style to make her code below more readable.
  - add another option eval so that the code is NOT run.

  You may need to install the package formatR for the tidy option to work.
for(k in 1:10){j=cos(sin(k)*k^2)+3;l=exp(k-7*log(k,base=2));print(j*l-5)}
[1] 4.966218
[1] -4.985713
[1] -4.998994
[1] -4.999823
[1] -4.999956
[1] -4.999988
[1] -4.999988
[1] -4.999991
[1] -4.999995
[1] -4.999996
- Use the chunk option results in the chunk labeled cat, so that the text output is “I love Marquette and Data Science!”.
cat("I love **Marquette** and *Data Science*!\n")
I love **Marquette** and *Data Science*!
- We can re-use a code chunk by using its name! Please use the option #| label: photo to make the empty code chunk run the code in the code chunk named photo. Note that the chunk options are not carried over.
3 Basic R
3.1 Vector
Use the built-in data set LakeHuron, which records annual measurements of the level, in feet, of Lake Huron 1875–1972.
- Return a logical vector that shows whether the lake level is higher than the average level or not.
- Return years that have a level higher than the average.
3.2 Data Frame
- Convert the mtcars dataset to a tibble using as_tibble(). Call it tbl.
- Print the subset of tbl that contains the 11th to 15th rows and the last three columns.
- Grab the second and the third columns of tbl.
- Extract the fourth column of tbl as a numerical vector.
- Starting with tbl, use the pipe operator |> to do the following sequentially:
  - extract the first 10 observations (rows) using head()
  - find the column names using colnames()
  - sort the column names using sort() in decreasing order (alphabetically from z to a).
3.3 Data Importing
- Use readxl::read_excel() to read the data sales.xlsx in the data folder. Use the arguments sheet, skip and col_names so that the output looks like

# A tibble: 9 x 2
  id      n
  <chr>   <chr>
1 Brand 1 n
2 1234    8
3 8721    2
4 1822    3
5 Brand 2 n
6 3333    1
# … with 3 more rows
- Use readxl::read_excel() to read in the favourite-food.xlsx file in the data folder and call the data fav_food. Use the argument na to treat “N/A” and “99999” as missing values. Print the data out.
4 Basic Python
import numpy as np
import pandas as pd
4.1 Pandas Data Frame
- Import the data set mtcars.csv using pd.read_csv(). Then print the first five rows.
- Use the .iloc indexer to obtain the first and fourth rows, and the second and third columns. Name the data dfcar.
- Set the row names of dfcar to Mazda and Hornet.
- Use the .loc indexer to obtain row Hornet and column disp.
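The difference between position-based and label-based selection in the tasks above can be sketched on a small made-up data frame. The data, column names, and row labels below are hypothetical and are not taken from mtcars.csv; this is an illustration of the two indexers, not the homework solution.

```python
import pandas as pd

# Hypothetical toy data (NOT mtcars.csv), used only to illustrate indexing.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "x": [10, 20, 30, 40],
    "y": [1.5, 2.5, 3.5, 4.5],
})

# .iloc selects by integer position (0-based): rows 0 and 2, columns 1 and 2.
sub = df.iloc[[0, 2], [1, 2]]

# Replace the default integer row labels with custom ones.
sub.index = ["first", "third"]

# .loc selects by label: row "third", column "y".
value = sub.loc["third", "y"]

print(sub)
print(value)
```

Note that .iloc counts from 0, so the "first" row in R terms is position 0 here.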
4.2 NumPy Array
In class, we learned the R data structure matrix:
mat <- matrix(data = 1:6, nrow = 3, ncol = 2)
mat[c(1, 3), 1]
mat_c <- matrix(data = c(7, 0, 0, 8, 2, 6), nrow = 3, ncol = 2)
cbind(mat, mat_c)
- Use NumPy methods to create an array equivalent to mat. Call it mat_py.
- Subset mat_py so that the result is equivalent to mat[c(1, 3), 1].
- Create an array equivalent to mat_c. Call it mat_py_c. Then combine them by columns using np.hstack().
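One thing to watch for when translating the R code above: R's matrix() fills values column by column and indexes from 1, while NumPy's reshape() fills row by row by default and indexes from 0. Here is a minimal sketch with made-up values. The names vals, a_row, a_col, picked, b, and combined are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical values, chosen only to show the fill order.
vals = np.arange(1, 9)  # 1, 2, ..., 8

# Default (C order) fills row by row.
a_row = vals.reshape(4, 2)

# order="F" fills column by column, like R's matrix().
a_col = vals.reshape((4, 2), order="F")

# R's a[c(2, 4), 2] corresponds to 0-based rows 1 and 3, column 1.
picked = a_col[[1, 3], 1]

# np.hstack() joins arrays side by side, like R's cbind().
b = np.zeros((4, 1), dtype=int)
combined = np.hstack((a_col, b))

print(a_row)
print(a_col)
print(picked)
print(combined.shape)
```

Transposing a row-filled array of the swapped shape, as in vals.reshape(2, 4).T, gives the same layout as order="F".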