Some ideas about my current project:
I’m currently using Project Gutenberg for unsupervised learning. I’ve performed topic modeling with LDA and NMF using a chaotic assortment of over 1700 books, mostly fiction and science fiction.
I’m trying to determine where to go from here. I can extract topics and then cluster them, but I’m hoping to do something more with the results.
I’ve played with sentence-level topic modeling on individual books to good effect, but the primary story I can think to tell from that data is just how the topics change over the course of the book.
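As a rough sketch of what a topic trajectory could look like in code: assuming each sentence already has a vector of topic weights (from LDA, NMF, or anything else), the book can be cut into equal-width segments and the weights averaged within each one. The function name `topic_trajectory`, the `n_segments` parameter, and the toy data are hypothetical and only for illustration.

```python
from typing import List, Sequence


def topic_trajectory(
    sentence_topics: Sequence[Sequence[float]],
    n_segments: int = 10,
) -> List[List[float]]:
    """Average per-sentence topic weights within equal-width segments.

    sentence_topics: one row per sentence, one column per topic
                     (assumed to come from an already-fitted topic model).
    n_segments:      how many slices to cut the book into (hypothetical default).
    """
    n_sentences = len(sentence_topics)
    if n_sentences == 0:
        raise ValueError("need at least one sentence")

    # Never ask for more segments than sentences, so no segment is empty.
    n_segments = min(n_segments, n_sentences)
    n_topics = len(sentence_topics[0])

    trajectory = []
    for seg in range(n_segments):
        # Integer arithmetic spreads any remainder across the segments.
        start = seg * n_sentences // n_segments
        end = (seg + 1) * n_sentences // n_segments
        chunk = sentence_topics[start:end]
        trajectory.append(
            [sum(row[k] for row in chunk) / len(chunk) for k in range(n_topics)]
        )
    return trajectory


# Made-up data: four sentences, two topics.
toy_book = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_trajectory = topic_trajectory(toy_book, n_segments=2)
```

Because every book is reduced to the same number of segments, trajectories from books of different lengths become directly comparable, which is the piece needed for any cross-book comparison.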
What I’d really like to do is look at topics across authors, or compare the topic trajectories of books by the same author. Do some authors have a much more consistent story fingerprint than others? How well could I infer this from topic modeling?
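One possible way to put a number on "consistency", sketched under assumptions: treat each book as a single topic-weight vector, then score each author by the mean pairwise cosine similarity between their books. This is only one candidate measure, not an established method for this data; `author_consistency` and the input layout (a dict from author name to a list of book vectors) are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def author_consistency(
    books_by_author: Dict[str, List[Sequence[float]]],
) -> Dict[str, float]:
    """Mean pairwise cosine similarity between each author's books.

    Authors with fewer than two books are skipped, since there is
    nothing to compare.
    """
    scores = {}
    for author, vectors in books_by_author.items():
        pairs = list(combinations(vectors, 2))
        if not pairs:
            continue
        scores[author] = sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)
    return scores


# Made-up topic vectors, purely for illustration.
toy_scores = author_consistency({
    "Author A": [[1.0, 0.0], [1.0, 0.0]],   # identical topic mix
    "Author B": [[1.0, 0.0], [0.0, 1.0]],   # no overlap
    "Author C": [[1.0, 0.0]],               # only one book, skipped
})
```

A high score on its own may only reflect genre, so it would probably need to be compared against the similarity of randomly paired books by different authors before reading it as a fingerprint.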
Some ideas for related projects:
- Steam reviews topic modeling
- Game recommender from Steam reviews
- Book recommender from Goodreads reviews
- Anime and cartoon cross-recommender from synopses (MyAnimeList, watchcartoon?, Wikipedia?)
- Electronic health record data: This was what I wanted to work on originally, but I was having a hell of a time finding a good source for data. If anyone knows where I can get access to EHR notes for doing natural language processing and machine learning, please let me know.
Some ideas for improving my past projects:
- Predict Steam overall review number rather than peak concurrent users
- Improve the appearance of my prediction app for hospital readmission