Classification.

We’re learning about classification and have started a project involving classification. I’ve tentatively settled on predicting the 30-day readmission of diabetic patients. Preventing patients from being released too early from the hospital is a topic I care about personally. The dataset I’ll use for modeling can be found here: https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008.

For classification, we’ve learned how to use logistic regression, K-nearest neighbors (KNN), decision trees, and random forests.
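
As a minimal sketch of how these four classifiers can be compared side by side in scikit-learn, the snippet below fits each one on the same train/test split and prints its test accuracy. It uses synthetic data from make_classification as a stand-in, not the readmission dataset, and the hyperparameters (n_neighbors, max_depth, n_estimators) are illustrative assumptions rather than tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption: NOT the diabetes readmission dataset).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=5, random_state=0
)

# Hold out 25% of the rows for testing, keeping the class balance the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hyperparameters here are illustrative guesses, not tuned values.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: {scores[name]:.3f}")
```

Accuracy is only a starting point. For an imbalanced target such as readmission, metrics like precision, recall, or ROC AUC are usually more informative.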

We each set up an Amazon Web Services (AWS) EC2 instance running Ubuntu that we control over SSH. We have PostgreSQL, Anaconda, and Jupyter notebooks running remotely on these EC2 servers so that we become comfortable with using AWS.

We have learned how to use SQLAlchemy, psycopg, and PostgreSQL. Since my project does not require SQL so far, I will either complete additional challenge problems to show that I have a degree of SQL mastery or find another interesting dataset that makes good use of SQL.

Tools learned so far:

Tools used for web scraping: Python, pandas, Selenium, BeautifulSoup, fuzzywuzzy.

Tools used for linear regression: Python, pandas, statsmodels, scikit-learn, matplotlib, seaborn.

Tools used for classification: scikit-learn.

Tools used for cloud computing: AWS EC2, SSH.

Tools used for SQL: SQLAlchemy, psycopg, and PostgreSQL.