Daemen University
Data Mining -- CSC 400
Course Description (from the College):
This course discusses techniques for preprocessing data for analysis and
presents the concepts related to data warehousing, online analytical
processing (OLAP), and data generalization. It presents methods for mining
frequent patterns, associations, and correlations. It also presents
methods for data classification and prediction, data-clustering
approaches, and outlier analysis. Topics will include: Rule induction;
decision trees; naive Bayesian probability; neural networks, image
processing, perception and support vector machines, ensemble methods;
boosting, bagging and random forests, cross validation, ROC, clustering
and rule mining; association rule mining, time series. (3
hours) Prerequisite: MTH 325 and CSC 350 (UG)
Syllabus/Syllabus -- in Word format
Final Project Directions
Homeworks
Important Dates
Exam I Part 1/Part 2/Data -- Thursday, September 29th
key/practice exam
Exam II Part 1/Part 2/Data -- Thursday, November 10th
key/practice exam
See the syllabus for a more detailed calendar.
Email List: available through Blackboard
Announcements:
Math Adjunct Office
My voicemail
My Daemen email: bmccall@daemen.edu
Office hours: see syllabus
Homeworks
Package Summaries
SQL Tutorial
Data Analyses
(data files are in .xlsx format)
DA #1
DA #2
DA #3
DA #4
DA #5 Data
Census
General Data Analysis Directions
Code Examples
Week 1
Week 2
Week 3
Week 4
Week 5
Week 6
Week 7
Week 8
Week 9
Readings
R Tutorials:
Bar Graphs (base R)
Boxplot (base R)
Dotplots (base R)
Histogram (base R)
Normal Distribution shaded between two values (base R)
Normal Probability Plots (base R)
Scatterplots with Trendlines and Residual Graphs (base R)
Handouts:
Answer Keys
Resources
Tutorials on Advanced Stats and Machine Learning With R
Applied Statistics with R (textbook)
Intermediate Statistics with R (textbook)
Probability and Statistics for Engineering and the Sciences, Jay L. Devore, 8th ed. (textbook)
Introductory Statistics (textbook)
Practical Statistics for Data Scientists (textbook)
Online Statistics Book (textbook)
A Little Book of Time Series Analysis for R (textbook)
A Course in Time Series Analysis (textbook/notes)
Introduction to Probability for Data Science (textbook)
Friedman's ANOVA Test
How to Perform Friedman's Test in R
Kendall's Tau
Calculating Kendall's Rank Correlation in R
Introduction to Bootstrapping (Statistics by Jim)
Bootstrapping in R
Tutorial on Permutation Tests in R
How to Use Permutation Tests
Understanding AUC-ROC Curves
Some Packages for ROC Curves
Time Series Analysis in R
Getting Started with Multiple Imputation in R
Basic Statistics Using R
Learning Statistics with R
Statistics with R (Table of Contents)
Stats and R
Intro to Hypothesis Testing in R
R-Tutorial: An R Introduction to Statistics
Tidy Modeling with R
R Cheatsheets
Free Web Books for Learning (Statistics) with R
Easier ggplot with ggcharts
R Color Brewer's Palettes
Markdown Cheat Sheet
Smoothing
Cubic and Smoothing Splines in R
B-Spline Basis for Polynomial Splines
Data Mining in R
R and Data Mining
Data Mining with R: Part 1
R Companion for Introduction to Data Mining
R Reference Card for Data Mining
Data Mining in R
R and Data Mining (U. Idaho)
R and Data Mining (Cornell) (Resources)
Data Mining Applications with R
Data Mining Tutorial
R Project
R Studio
Anaconda
Using R with Anaconda
Links!
PDF Graph Paper
Bad Graphs (Convention Speeches)
Visualizing Data Badly: 8 Examples
Correlation is not Causation: original article / handout
Presidents by State
TI-Connect Software
How much people lie on surveys
On the Hazards of Significance Testing
Exploring Correlation and Regression
Central Limit Theorem: with Bunnies and Dragons
SOCR: Statistics Online Computational Resources
How Many Ways Can You Arrange a Deck of Cards?
Free Online Math Courses
Confidence Interval for Rho
Free Courses from Coursera
Coding
R Tutorials
MTH 324/MTH 325