Instructions for downloading files

 

 

Buffalo State College

Databases and the Data Science Information Life Cycle, DSA 610

 

Course Description (from the College): Introduction to a “big picture” understanding of data flow for strategic, data-driven decision making, including data storage, data organization, data gathering and preparation, exploratory data analysis, and meaningful visualizations and communication. Includes hands-on practice. (3 credits)

 

Syllabus -- in Word format
     

Homework

Important Dates
Final Project -- Due Wednesday, May 17th

more detailed schedule in the syllabus

Email List

via Brightspace

 

Announcements:

Buffalo State
Office:
My BSC voicemail:
My BSC email: mccallb@buffalostate.edu
Office Hours: by appointment (in person after class, or over Zoom)

 

Answer Keys

Homework Style Guide

Homeworks to be Turned in

Homework #1 -- Data
Homework #2 -- Data
Homework #3
Homework #4 -- Data
Homework #5
Homework #6
Homework #7 -- Data/Data
Homework #8 -- Data/Data
Homework #9 -- Data

 

Readings
1/27 Data Lifecycle Management: A Complete Guide
Data Analytics Lifecycle
Statistics for Data Science: Complete Guide with Example
2/3 Data generation processes
7 Data Collection Methods in Business Analytics
Guide to Experimental Design | Overview, 5 steps & Examples
Graphs -- 1 variable
2/10 The database development life cycle
The Types of Databases (with Examples)
Python Data Structures
Python Data Types
Top 10 Types of Comparison Charts
2/24 80 types of charts & graphs for data visualization (with examples)
Normal Forms in DBMS
Data Cleaning: What It Is, Why It Matters & How to Do It
Python Web Scraping Tutorial
3/3 What’s the Difference Between a Logical Data Model and a Physical Data Model?
Database vs Spreadsheet: What’s the Difference?
Coding missing values
A Guide to Time Series Analysis in Python
Python Datetime
3/10 The SQL Tutorial for Data Analysis
Introduction to Python SQL Libraries
Feature Engineering in Machine Learning With Python: A Guide
Merging and Aggregating Data
3/17 ACID Transactions
Data Validation: Types, Benefits, and Accuracy Process
Data Exploration - A Complete Introduction
Top Techniques to Handle Missing Values Every Data Scientist Should Know
3/31 Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide
Statistical Modeling
Regression: Definition, Analysis, Calculation, and Example
4/7 What’s the Difference Between a Data Warehouse, Data Lake, and Data Mart?
Spatial Data: Definition, Types, Examples, Use Cases & More!
Choosing the Right Classification Model
4/14 What Is Parallel Processing?
Apache Spark vs. Hadoop: Key Differences and Use Cases
8 Clustering Algorithms in Machine Learning that All Data Scientists Should Know
4/21 On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare
Validating Machine Learning Models: A Detailed Overview
Metrics to Evaluate your Machine Learning Algorithm
4/28 Data Retention Policy
What is data storytelling?
Ensemble Methods
5/5 What Are the Different Types of Data Destruction and Which One Should You Use?
Analyzing Text Data

Practice Labs & Class Notes

1/27 2/3
2/10  
2/24 3/3
3/10 3/17
  3/31
4/7 4/14
4/21 4/28
5/5  

N - class notes
D - dataset
J - Jupyter notebook
R - review
B - blank (unexecuted) Jupyter notebook
P - PDF of a Jupyter notebook
H - HTML version of a Jupyter notebook
E - other example

My server isn't a fan of the Jupyter notebook files. You can access all versions of each file (blank, executed, PDF, and HTML with associated files) in the zip file here.

Python Playlist on YouTube

Joke

Projects

Project #1 -- Data
Project #2 -- Data
Project #3 -- Data/Data
Final Project -- Data set options posted in Blackboard
Peer Reviews

 

Handouts

Supplemental Lecture Notes on Databases (from an undergrad course):
Units: 1 2 3 4 5 6 7 8 9 10

Links:

Data Lifecycle
Data Lifecycle Management
The Lifecycle of Data
16-Step Lifecycle
Data Lifecycle, Best Practices
Data Lifecycle Management (DLM)
Data Analytics Lifecycle
Data Protection and Information Lifecycle Management
The Data Science Lifecycle

Excel Easy
Data Analysis in Excel
Excel 2016+
Microsoft Excel Video Training

SQL Tutorial
Learn SQL
SQL Basics for Beginners
SQLite Tutorial
Beginners' Guide to SQLite
SQL Query Cheat Sheet
SQL Tutorial
SQL Tutorial for Beginners
Interactive SQL Course

10+ Free Python Books
Python for Beginners
Learn Python
Which Library should I use for my Python Dashboard?
Best Python Data Visualization Libraries

Big Data: How is it generated?
Data Capture
Data Classification
Types of Data Classification
Data Classification
Ethics of Data Collection
The Murky Ethics of Data Gathering in a Post-Cambridge Analytica World
What is Data Validation?
Data Privacy
5 Things to Know about Data Privacy

Data Storage
Data Storage: Emerging Technologies
Data Lakes vs. Data Warehouses
Data Security
Most Common Passwords
Spreadsheets vs. Databases
Relational Databases
Definition and Overview of ODBMS
Object-Oriented Databases and Advantages

JSON Databases
JSON Interchange Standard
JSON vs. XML
What is JSON?
Importing XML into Pandas

Data Maintenance vs. Data Cleansing
Tips to Maintaining Your Data
Data Management
History of Data Management
Data Management: A Cheat Sheet

Data sharing and how it can benefit your scientific career
What is Data Sharing?
Data Reuse
Your Data Can Live Forever: How to Plan for Data Reuse
Why Data Sharing and Reuse are Hard to Do

Data Retention, Archiving and Disposing
Dos and Don'ts of Data Archiving
Data Retention Best Practices
Data Retention and Archiving Policy -- example
OECD Data Retention Policy -- example
Historical Data, Archiving and Retention -- example (HIT)
Data Retention 101
The Essentials of Data Retention: Policies, Plans, and Templates

Safe Data Destruction 101: Why Data Destruction is Necessary
Dispose of Information Properly
Secure Data Disposal and Destruction: 6 Methods
Data Disposal Laws

Data Discovery

Data Preparation
Data Preparation in Data Mining
Why is Data Preparation Important?

Stats NZ
Public Datasets (List of Sources)
Recommended Data Repositories
Data.gov

Color Brewer (for maps)
Color Brewer for Python
Plotly Graphing Library
Maps with Folium
Geographic Maps with Basemap
Python Libraries for GIS
State FIPS codes

Web Scraping with Beautiful Soup
Twitter API
Using APIs with Python
Beginner's Guide to Using an API with Python

Handling Missing Data
Missing Values in Machine Learning
Missing Values Guide
Missing Values from Python Data Science Handbook

Model Planning
Map Reduce
Spark vs. Map Reduce
Natural Language Processing
Sentiment Analysis
Clustering
Classification
Regression
Graph Theory

Statistical Hypothesis Testing

Data and Racial Equity

Communicating Results
Analyzing Data and Communicating Results
Telling a Story with Data
What is a Data Dashboard?
Time Series Forecasting
Introduction to Time Series Forecasting
5 Common Time Series Methods

Four Phases of Operationalizing Analytics
5 Keys to Operationalize Big Data Analytics in the Cloud
Operationalizing Analytics

What is a Robust Machine Learning Model?
Correct Model Validation
Cross Validation

PDF Graph Paper
I Will Derive song
How to draw Greek
GraphCalc
Free Online Math Courses
Excel Tutorials on YouTube
Python Tutorials on YouTube
Mathnotes
Coding
Spring 2021
Spring 2022
Spring 2023
Summer 2023
Spring 2024
Summer 2024

 

 

 
(c) 2019, 2010, 2008, 2004 by Betsy McCall, all rights reserved
To contact the webmistress, email betsy@pewtergallery.com
Last updated: 2022 October 7th