Problem
1. Find a data source you have access to that is large enough to justify processing on more than a single machine. Do something interesting with it.
2. What is your definition of big data?
3. What is the largest data set that you have processed? What did you do, and what were the results?