Welcome Aboard 🙌

MATH/COSC 3570 Introduction to Data Science

Dr. Cheng-Han Yu
Department of Mathematical and Statistical Sciences
Marquette University

Taipei, Taiwan


My Journey

  • Assistant Professor (2020/08 - )

  • Postdoctoral Fellow

  • PhD in Statistics

  • MA in Economics/PhD program in Statistics

How to Reach Me

  • Office hours TuTh 4:50 - 5:50 PM and Wed 12 - 1 PM in Cudahy Hall 353.
  • 📧
    • I will answer your question within 24 hours.
    • Expect a reply on Monday if you send me a message over the weekend.
    • Start your subject line with [math3570] or [cosc3570] followed by a clear description of your question.
  • I will NOT reply to your e-mail if … Check the email policy in the syllabus!

TA Information

  • Statistics PhD student Qishi Zhan

  • 📨

  • Help desk hours: To be announced.

  • You are welcome to set up a meeting with your TA via Teams.

  • Let me know if you need any other help! 😄

What is This Course?

  • Every aspect of doing a practical data science project, from importing data to deploying what we learn from data.

❓ What are prerequisites?
πŸ‘‰ COSC 1010 (Intro Programming) and MATH 4720 (Intro Stats) or MATH 2780 (Intro Regression)


❓ Is this like another intro stats course?
πŸ‘‰ No. Statistics and data science are closely related.

Nowadays
πŸ‘‰ Data science is a broader subject than statistics.

πŸ‘‰ Statistics focuses more on analyzing and learning from data, a part of the entire workflow of data science.


❓ Is this like another intro CS or programming course?
πŸ‘‰ Absolutely not. We learn how to code for doing data science, not for understanding computer systems and structures.

What is NOT Covered in This Course

  • Advanced data analytics and computing
    • MATH 4750 Statistical Computing
    • MATH 4760 Time Series Analysis
    • MATH 4780 Regression Analysis
    • MATH 4790 Bayesian Statistics
    • COSC 4600 Fundamentals of Artificial Intelligence
    • COSC 4610 Data Mining
    • COEN 4860 Introduction to Neural Networks
  • Big data: We start with small, in-memory data sets. You can’t tackle big data without experience with small data.
  • Database: Learn SQL in
    • COSC 4800 Principles of Database Systems
    • INSY 4052 Database Management Systems

What Computing Languages?

~ 60%

~ 40%

  • You’ve learned Python in COSC 1010. Being R-Python bilingual is getting more important!

👉 Wouldn’t it be great to add both languages to your resume! 😎

❌ Don’t want to learn R and/or Python? Take 3570 next semester~!

❌ Drop deadline: 01/21 (Tue), 11:59 PM.

Where to Code? Posit Cloud


  • Have nice computing power and interactive collaboration with me and your teammates!

  • Student plan: $5/month (Cheaper than buying a textbook!)

Course Materials

Learning Management System (D2L)

  • Assessments > Grades

Grading Policy ✨

  • 30% In-class lab exercises and participation.

  • 30% Homework

  • 15% Midterm mini project

  • 25% Final project competition

  • Extra credit opportunities

  • ❌ You have to participate in the final presentation in order to pass the course.
  • ❌ You will NOT be allowed to do any extra credit projects/homework/exam to compensate for a poor grade.

Grade-Percentage Conversion

  • \([x, y)\) means greater than or equal to \(x\) and less than \(y\). For example, 94.0 is in [94, 100], so the grade is A, and 93.8 is in [90, 94), so the grade is A-.

Grade   Percentage
A       [94, 100]
A-      [90, 94)
B+      [87, 90)
B       [84, 87)
B-      [80, 84)
C+      [77, 80)
C       [74, 77)
C-      [70, 74)
D+      [65, 70)
D       [60, 65)
F       [0, 60)
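
The interval rule above can be sketched in a few lines of Python. This is only an illustration of the table, not an official grading tool; the names `CUTOFFS` and `letter_grade` are hypothetical.

```python
# Hypothetical helper: map a course percentage to a letter grade using the
# cutoffs in the table above. Each tuple is (inclusive lower bound, grade),
# listed from highest to lowest.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"),
    (77, "C+"), (74, "C"), (70, "C-"), (65, "D+"), (60, "D"), (0, "F"),
]


def letter_grade(percentage):
    """Return the letter grade for a percentage between 0 and 100."""
    if not 0 <= percentage <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # The first lower bound that the percentage reaches decides the grade,
    # which matches the half-open intervals [x, y).
    for lower_bound, grade in CUTOFFS:
        if percentage >= lower_bound:
            return grade


print(letter_grade(94.0))  # A
print(letter_grade(93.8))  # A-
```

Because the lower bounds are checked from the top down, a score exactly on a boundary such as 90 falls into the higher interval, as the \([x, y)\) notation requires.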

Lab Exercises (30%)

  • Graded as Complete/Incomplete and used as evidence of attendance and participation.

  • You are allowed one incomplete lab exercise without any penalty.

  • Beyond that, 2% will be taken off your grade percentage for each missing/incomplete exercise.

  • You will create an RStudio project in Posit Cloud to save all of your lab exercises. (We’ll go through the know-how together)
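
The penalty rule above is simple arithmetic, sketched here in Python. The function name `lab_penalty` and its parameters are hypothetical, and the sketch only counts the points taken off; exactly how the deduction enters the final grade is governed by the syllabus, not by this example.

```python
# Hypothetical helper: points taken off for missing/incomplete lab exercises.
# Assumption: one incomplete exercise is free, and each further one costs 2%.
def lab_penalty(num_incomplete, free_passes=1, penalty_per_lab=2.0):
    """Return the total percentage deducted for incomplete lab exercises."""
    if num_incomplete < 0:
        raise ValueError("num_incomplete must be non-negative")
    return penalty_per_lab * max(0, num_incomplete - free_passes)


print(lab_penalty(1))  # 0.0, the first incomplete exercise is free
print(lab_penalty(3))  # 4.0, two exercises beyond the free one
```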

Homework (30%)

  • The homework assignments are individual. Submit your own work.

  • ❌ You may not share answers/code with your classmates.

  • Homework will be assigned through GitHub:
    • clone/pull the homework repo into Posit Cloud
    • work on the Quarto file in the repo (We’ll go through know-how together)
  • You will have at least one week to complete your assignment.

  • ❌ No make-up homework for any reason unless you have excused absence. πŸ™

Mini Project (15%)

  • You will team up to do the midterm mini project.

  • More details about the mini project presentation will be released later.

Project (25%)

  • You will team up to do the final project.

  • Your project can be in either of the following categories:

    1. Data analysis using statistical models or machine learning algorithms

    2. Introduce an R or Python package not learned in class, including a live demo

    3. Introduce a data science tool (visualization, computing, etc.) not learned in class, including a live demo

    4. Introduce a programming language for doing data science that is not learned in class (for example, Julia, SQL, MATLAB, or SAS), including a live demo

    5. Web development: Shiny website or dashboard, including live demo

  • The final project presentation is on Thursday, 5/1, 2 PM and Monday, 5/5, 10:30 AM - 12:30 PM.

  • More information will be released later.

Generative AI and Sharing/Reusing Code Policy

Generative AI

  • You may use generative AI tools such as ChatGPT or DALL-E to generate a first draft of text for your work, provided that this use is documented and cited.

[Example] Data science is an interdisciplinary field that … 1

Sharing/Reusing Code

  • Unless explicitly stated otherwise, you may make use of any online resources, but you must explicitly cite where you obtained any code you directly use or use as inspiration in your solutions.

[Example]

  • The code is modified from the GitHub repo https://github.com/chenghanyustats/slam

  • The code is generated from the ChatGPT response to “Please generate Python code for solving the math problem I attach,” Jan 14, 2025.

Academic Integrity

This course expects all students to follow University and College statements on academic integrity.

  • Honor Pledge and Honor Code: I recognize the importance of personal integrity in all aspects of life and work. I commit myself to truthfulness, honor, and responsibility, by which I earn the respect of others. I support the development of good character, and commit myself to uphold the highest standards of academic integrity as an important aspect of personal integrity. My commitment obliges me to conduct myself according to the Marquette University Honor Code.

Q & A

❓ K: I hope to learn more about programming. R: I’m looking forward to learning more about what data science consists of rather than just learning a programming language.
πŸ‘‰ To make sure everyone is on the same page, first couple of weeks is about learning R and Python syntax. After spring break, we focus on modeling and machine learning methods.


❓ What do you think will be the most interesting part of the course?
πŸ‘‰ I love data visualization and web development.


❓ D: Do I need good coding skills to be able to succeed? G: How much of this class is about coding?

πŸ‘‰ We’ll learn basic syntax for doing data science step by step.


❓ What kind of time estimate do you believe most students should spend on reading + assignments for the course?
πŸ‘‰ Everyone is different. The more the better.

Q & A

❓ What kind of projects will we be doing for this course?
πŸ‘‰ Any project related to data works. You propose one to me. We discuss it, then decide.


❓ Will this class help me better understand how to code proficiently?
πŸ‘‰ As you learn to speak a foreign language, you need to code a lot, frequently and constantly in order to be proficient in any programming language. No shortcut.


❓ Do you know of any internships or research positions offered through Marquette University that incorporate the skills learned in this Data Science course?
πŸ‘‰ Quite many. Northwestern Mutual, Direct Supply, for example. I’ll share intern info with you if I get any.


❓ More questions

Bring your laptop next time!