Problem
Build a document clustering mechanism. You can scrape a few news sites to collect the documents. Convert each document into an array of numbers (read: a TF-IDF vector). Then build a clustering algorithm that groups the documents into 10 categories. We suggest the k-means algorithm, as it is much simpler, but if you're feeling adventurous, try NMF or LDA instead. Submit the code used to do the clustering, as well as the code that assigns a category to a new document.
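One possible shape for a solution is sketched below. It is a minimal, dependency-free illustration, assuming the documents have already been scraped into plain-text strings (scraping is not shown). The tokenizer, the smoothed IDF formula ln((1 + n) / (1 + df)) + 1, the farthest-point seeding, and the names `DocumentClusterer`, `fit`, `predict` and `top_terms` are all illustrative choices, not a prescribed interface. It covers both deliverables: `fit` builds TF-IDF vectors and runs k-means (Lloyd's algorithm), and `predict` assigns a category to a new document.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs; no stemming or stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


class DocumentClusterer:
    """Hypothetical helper: TF-IDF vectors + k-means over a fixed vocabulary."""

    def __init__(self, k=10, max_iter=100, seed=0):
        self.k, self.max_iter, self.seed = k, max_iter, seed

    def _vectorize(self, tokens):
        # Raw term counts times IDF, L2-normalised; out-of-vocabulary words are ignored.
        counts = Counter(tokens)
        vec = [counts[t] * self.idf[t] for t in self.vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm > 0 else vec

    def fit(self, docs):
        if self.k > len(docs):
            raise ValueError("need at least k documents")
        tokens = [tokenize(d) for d in docs]
        n = len(docs)
        df = Counter(t for toks in tokens for t in set(toks))  # document frequency
        self.vocab = sorted(df)
        # Assumed smoothed IDF: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in self.vocab}
        X = [self._vectorize(toks) for toks in tokens]

        # Farthest-point seeding (a greedy cousin of k-means++): a random first
        # centroid, then repeatedly the document farthest from all chosen centroids.
        rng = random.Random(self.seed)
        cents = [list(X[rng.randrange(n)])]
        while len(cents) < self.k:
            i = max(range(n), key=lambda j: min(sq_dist(X[j], c) for c in cents))
            cents.append(list(X[i]))

        # Lloyd iterations: assign each document to its nearest centroid,
        # then move each centroid to the mean of its members, until stable.
        labels = None
        for _ in range(self.max_iter):
            new_labels = [nearest(x, cents) for x in X]
            if new_labels == labels:
                break
            labels = new_labels
            for c in range(self.k):
                members = [x for x, lab in zip(X, labels) if lab == c]
                if members:  # an empty cluster keeps its previous centroid
                    cents[c] = [sum(col) / len(members) for col in zip(*members)]
        self.centroids, self.labels_ = cents, labels
        return self

    def predict(self, doc):
        """Category (cluster index) for a new, unseen document."""
        return nearest(self._vectorize(tokenize(doc)), self.centroids)

    def top_terms(self, c, n=5):
        """Highest-weighted vocabulary terms in centroid c, to help name the category."""
        order = sorted(range(len(self.vocab)), key=lambda j: -self.centroids[c][j])
        return [self.vocab[j] for j in order[:n]]
```

With a scraped corpus, `DocumentClusterer(k=10).fit(docs)` yields a cluster index per document in `labels_`, and `predict(text)` returns the index for a new article. The indices themselves are arbitrary, so `top_terms` is one way to put a human-readable name on each of the 10 categories. A document containing no known words falls back to whichever centroid has the smallest norm. Dense pure-Python lists keep the sketch self-contained but scale poorly; for a real corpus, scikit-learn's `TfidfVectorizer` and `KMeans` (whose `predict` method assigns new samples) do the same job on sparse matrices, and the same library also provides `NMF` and `LatentDirichletAllocation` for the adventurous route.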