A sample of office assistants across the San Francisco Bay Area was collected. You want to build a model that best predicts the starting salary of potential new hires. It was suggested that the data be analyzed with a regression model to determine whether salary is related to years of experience, aptitude test scores, EI scores, number of foreign languages spoken, word processing speed, and employee satisfaction. The data are shown in Table 1 and are posted on the website in the EXAM folder at the top of the course page.
- Build the cleaned-up regression model.
- Are there any outliers?
- Discuss the R squared in an English sentence.
- Why do we want the fewest number of x's in the model?
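
One way to see why fewer x's can be preferable is to compare adjusted R squared, which penalizes each added predictor. The sketch below is only an illustration: the function name `adjusted_r_squared` and the R squared values, sample size, and predictor counts are hypothetical and are not taken from Table 1.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    denom = n_obs - n_predictors - 1
    if denom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / denom


# Hypothetical values for illustration only (not from Table 1):
# a full model with all six x's versus a reduced model with three.
full = adjusted_r_squared(0.80, n_obs=30, n_predictors=6)
reduced = adjusted_r_squared(0.79, n_obs=30, n_predictors=3)

print(f"full model adjusted R^2:    {full:.4f}")
print(f"reduced model adjusted R^2: {reduced:.4f}")
```

With these made-up numbers, the reduced model has a slightly lower R squared but a higher adjusted R squared, which is the usual argument for dropping x's that add little explanatory power.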