Daemen University

Data Mining -- CSC 400

 

Course Description (from the College): This course discusses techniques for preprocessing data for analysis and presents the concepts related to data warehousing, online analytical processing (OLAP), and data generalization. It presents methods for mining frequent patterns, associations, and correlations. It also presents methods for data classification and prediction, data-clustering approaches, and outlier analysis. Topics will include: Rule induction; decision trees; naive Bayesian probability; neural networks, image processing, perception and support vector machines, ensemble methods; boosting, bagging and random forests, cross validation, ROC, clustering and rule mining; association rule mining, time series. (3 hours)
Prerequisite: MTH 325 and CSC 350 (UG)

 

Syllabus/Syllabus -- in Word format
Final Project Directions

Homeworks
 

Important Dates
Exam I Part 1/Part 2/Data -- Thursday, September 29th key/practice exam
Exam II Part 1/Part 2/Data -- Thursday, November 10th key/practice exam
 

see the syllabus for a more detailed calendar
 

Email List

available through Blackboard

Announcements:

Math Adjunct Office
My voicemail
My Daemen email: bmccall@daemen.edu
Office hours: see syllabus


 

Homeworks

Package Summaries

SQL Tutorial

Data Analyses
(data files are in .xlsx format)

DA #1
DA #2
DA #3
DA #4
DA #5 Data
Census

General Data Analysis Directions

Code Examples

Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Week 9


Readings

1/23 N Reading       | 1/25 N Reading
1/30 N Reading       | 2/1 N Reading 1 2
                     | 2/8 N Reading 1 2
2/13 N Reading 1 2   | 2/15 N Reading 1 2 3
2/20 N Reading       |
2/27 N Reading       | 2/29 N Reading 1 2 3
3/5 N Reading 1 2 3  | 3/7 N Reading 1 2 3

3/19 N Reading 1 2   | 3/21 N Reading
3/26 N Reading 1 2   | 3/28 N Reading 1 2
                     | 4/4 N Reading
                     | 4/11 N Reading 1 2
4/16 N Reading 1 2 3 | 4/18 N Reading 1 2
4/23 N Reading 1 2 3 | 4/25 N Reading 1 2
   

R Tutorials:

Bar Graphs (base R)
Boxplot (base R)
Dotplots (base R)
Histogram (base R)
Normal Distribution shaded between two values (base R)
Normal Probability Plots (base R)
Scatterplots with Trendlines and Residual Graphs (base R)

 

Handouts:

Answer Keys


Resources

Tutorials on Advanced Stats and Machine Learning With R
Applied Statistics with R (textbook)
Intermediate Statistics with R (textbook)
Probability and Statistics for Engineering and the Sciences, Jay L. Devore, 8th ed. (textbook)
Introductory Statistics (textbook)
Practical Statistics for Data Scientists (textbook)
Online Statistics Book (textbook)
A Little Book of Time Series Analysis for R (textbook)
A Course in Time Series Analysis (textbook/notes)
Introduction to Probability for Data Science (textbook)
Friedman's ANOVA Test
How to Perform Friedman's Test in R
Kendall's Tau
Calculating Kendall's Rank Correlation in R
Introduction to Bootstrapping (Statistics by Jim)
Bootstrapping in R
Tutorial on Permutation Tests in R
How to Use Permutation Tests
Understanding AUC-ROC Curves
Some Packages for ROC Curves
Time Series Analysis in R
Getting Started with Multiple Imputation in R
Basic Statistics Using R
Learning Statistics with R
Statistics with R (Table of Contents)
Stats and R
Intro to Hypothesis Testing in R
R-Tutorial: An R Introduction to Statistics
Tidy Modeling with R
R Cheatsheets
Free Web Books for Learning (Statistics) with R
Easier ggplot with ggcharts
R Color Brewer's Palettes
Markdown Cheat Sheet
Smoothing
Cubic and Smoothing Splines in R
B-Spline Basis for Polynomial Splines
Data Mining in R
R and Data Mining
Data Mining with R: Part 1
R Companion for Introduction to Data Mining
R Reference Card for Data Mining
Data Mining in R
R and Data Mining (U. Idaho)
R and Data Mining (Cornell) (Resources)
Data Mining Applications with R
Data Mining Tutorial

R Project
R Studio
Anaconda
Using R with Anaconda

 

Links!

PDF Graph Paper
Bad Graphs (Convention Speeches)
Visualizing Data Badly: 8 Examples
Correlation is not Causation: original article / handout
Presidents by State
TI-Connect Software
How much people lie on surveys
On the Hazards of Significance Testing
Exploring Correlation and Regression
Central Limit Theorem: with Bunnies and Dragons
SOCR: Statistics Online Computational Resources
How Many Ways Can You Arrange a Deck of Cards?
Free Online Math Courses
Confidence Interval for Rho
Free Courses from Coursera

Coding
R Tutorials
MTH 324/MTH 325

 

 

 
(c) 2013, 2007, 2004 by Betsy McCall, all rights reserved
To contact the webmistress, email betsy@pewtergallery.com
Last updated: 2022 May 8th