Buffalo State College
Databases and the Data Science Information Life Cycle, DSA 610
Course Description (from the College):
Introduction to a “big picture” understanding of data flow for strategic,
data-driven decision making, including data storage, data organization,
data gathering and preparation, exploratory data analysis, and meaningful
visualizations and communication. Includes hands-on practice. (3 credits)
Syllabus -- in Word format
Homework
Important Dates
Final Project -- Due Wednesday, May 17th
more detailed schedule in the syllabus
Email List
via Brightspace
Announcements:
Buffalo State Office:
My BSC voicemail:
My BSC email: mccallb@buffalostate.edu
Office Hours: by appointment (in person after class, or over Zoom)
Answer Keys
Homework Style Guide
Homeworks to be Turned in
Homework #1 -- Data
Homework #2 -- Data
Homework #3
Homework #4 -- Data
Homework #5
Homework #6
Homework #7 -- Data/Data
Homework #8 -- Data/Data
Homework #9 -- Data
Readings
1/27 |
Data Lifecycle Management: A Complete Guide
Data Analytics Lifecycle
Statistics for Data Science: Complete Guide with Example |
2/3 |
Data generation processes
7 Data Collection Methods in Business Analytics
Guide to Experimental Design | Overview, 5 steps & Examples
Graphs -- 1 variable |
2/10 |
The database development life cycle
The Types of Databases (with Examples)
Python Data Structures
Python Data Types
Top 10 Types of Comparison Charts |
2/24 |
80 types of charts & graphs for data visualization (with examples)
Normal Forms in DBMS
Data Cleaning: What It Is, Why It Matters & How to Do It
Python Web Scraping Tutorial |
3/3 |
What’s the Difference Between a Logical Data Model and a Physical Data Model?
Database vs Spreadsheet: What’s the Difference?
Coding missing values
A Guide to Time Series Analysis in Python
Python Datetime |
3/10 |
The SQL Tutorial for Data Analysis
Introduction to Python SQL Libraries
Feature Engineering in Machine Learning With Python: A Guide
Merging and Aggregating Data |
3/17 |
ACID Transactions
Data Validation: Types, Benefits, and Accuracy Process
Data Exploration - A Complete Introduction
Top Techniques to Handle Missing Values Every Data Scientist Should Know |
3/31 |
Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide
Statistical Modeling
Regression: Definition, Analysis, Calculation, and Example |
4/7 |
What’s the Difference Between a Data Warehouse, Data Lake, and Data Mart?
Spatial Data: Definition, Types, Examples, Use Cases & More!
Choosing the Right Classification Model |
4/14 |
What Is Parallel Processing?
Apache Spark vs. Hadoop: Key Differences and Use Cases
8 Clustering Algorithms in Machine Learning that All Data Scientists Should Know |
4/21 |
On responsible machine learning datasets emphasizing fairness, privacy and regulatory norms with examples in biometrics and healthcare
Validating Machine Learning Models: A Detailed Overview
Metrics to Evaluate your Machine Learning Algorithm |
4/28 |
Data Retention Policy
What is data storytelling?
Ensemble Methods |
5/5 |
What Are the Different Types of Data Destruction and Which One Should You Use?
Analyzing Text Data |
Practice Labs & Class Notes
1/27 |
2/3 |
2/10 |
|
2/24 |
3/3 |
3/10 |
3/18 |
|
3/31 |
4/7 |
4/14 |
4/21 |
4/28 |
5/5 |
|
N - class notes
D - dataset
J - Jupyter notebook
R - review
B - blank (unexecuted) Jupyter notebook
P - pdf of a Jupyter notebook
H - html version of a Jupyter notebook
E - other example
My server isn't a fan of the Jupyter notebook files. You can access all
the versions of the file (blank, executed, pdf and html with associated
files) in the zip file here.
Python Playlist on YouTube
Joke
Projects
Project #1 -- Data
Project #2 -- Data
Project #3 -- Data/Data
Final Project -- Data set options posted in Blackboard
Peer Reviews
Handouts
Supplemental Lecture Notes on Databases (from an undergrad course):
Units: 1 2 3 4 5 6 7 8 9 10
Links:
Data Lifecycle
Data Lifecycle Management
The Lifecycle of Data
16-Step Lifecycle
Data Lifecycle, Best Practices
Data Lifecycle Management (DLM)
Data Analytics Lifecycle
Data Protection and Information Lifecycle Management
The Data Science Lifecycle
Excel Easy
Data Analysis in Excel
Excel 2016+
Microsoft Excel Video Training
SQL Tutorial
Learn SQL
SQL Basics for Beginners
SQLite Tutorial
Beginners' Guide to SQLite
SQL Query Cheat Sheet
SQL Tutorial
SQL Tutorial for Beginners
Interactive SQL Course
10+ Free Python Books
Python for Beginners
Learn Python
Which Library should I use for my Python Dashboard?
Best Python Data Visualization Libraries
Big Data: How is it generated?
Data Capture
Data Classification
Types of Data Classification
Data Classification
Ethics of Data Collection
The Murky Ethics of Data Gathering in a Post-Cambridge Analytica World
What is Data Validation?
Data Privacy
5 Things to Know about Data Privacy
Data Storage
Data Storage: Emerging Technologies
Data Lakes vs. Data Warehouses
Data Security
Most Common Passwords
Spreadsheets vs. Databases
Relational Databases
Definition and Overview of ODBMS
Object-Oriented Databases and Advantages
JSON Databases
JSON Interchange Standard
JSON vs. XML
What is JSON?
Importing XML into Pandas
Data Maintenance vs. Data Cleansing
Tips to Maintaining Your Data
Data Management
History of Data Management
Data Management: A Cheat Sheet
Data sharing and how it can benefit your scientific career
What is Data Sharing?
Data Reuse
Your Data Can Live Forever: How to Plan for Data Reuse
Why Data Sharing and Reuse are Hard to Do
Data Retention, Archiving and Disposing
Dos and Don'ts of Data Archiving
Data Retention Best Practices
Data Retention and Archiving Policy -- example
OECD Data Retention Policy -- example
Historical Data, Archiving and Retention -- example (HIT)
Data Retention 101
The Essentials of Data Retention: Policies, Plans, and Templates
Safe Data Destruction 101: Why Data Destruction is Necessary
Dispose of Information Properly
Secure Data Disposal and Destruction: 6 Methods
Data Disposal Laws
Data Discovery
Data Preparation
Data Preparation in Data Mining
Why is Data Preparation Important?
Stats NZ
Public Datasets (List of Sources)
Recommended Data Repositories
Data.gov
Color Brewer (for maps)
Color Brewer for Python
Plotly Graphing Library
Maps with Folium
Geographic Maps with Basemap
Python Libraries for GIS
State FIPS codes
Web Scraping with Beautiful Soup
Twitter API
Using APIs with Python
Beginner's Guide to Using an API with Python
Handling Missing Data
Missing Values in Machine Learning
Missing Values Guide
Missing Values from Python Data Science Handbook
Model Planning
Map Reduce
Spark vs. Map Reduce
Natural Language Processing
Sentiment Analysis
Clustering
Classification
Regression
Graph Theory
Statistical Hypothesis Testing
Data and Racial Equity
Communicating Results
Analyzing Data and Communicating Results
Telling a Story with Data
What is a Data Dashboard?
Time Series
Forecasting
Introduction to Time Series Forecasting
5 Common Time Series Methods
Four Phases of Operationalizing Analytics
5 Keys to Operationalize Big Data Analytics in the Cloud
Operationalizing Analytics
What is a Robust Machine Learning Model?
Correct Model Validation
Cross Validation
PDF Graph Paper
I Will Derive song
How to draw Greek
GraphCalc
Free Online Math Courses
Excel Tutorials on YouTube
Python Tutorials on YouTube
Mathnotes
Coding
Spring 2021
Spring 2022
Spring 2023
Summer 2023
Spring 2024
Summer 2024