Project

Predicting Success: An Application of Random Forests to Student Outcomes

This project examines the effectiveness of applying machine learning techniques to college student success, with the specific aim of identifying the student characteristics and factors that most strongly predict successful graduation. The student data consist of first-time freshmen and transfer students who matriculated at California State University San Marcos between Fall 2000 and Fall 2010 and who either graduated successfully or discontinued their education. Random forests, applied to over 30,000 student observations, are used to determine the relative importance of the student characteristics, with genetic algorithms used to perform feature selection and pruning. To improve the learning algorithm, cross-validated hyper-parameter tuning was also implemented. Overall predictive strength, as measured by the Matthews Correlation Coefficient, is relatively high, and both intuitive and novel features that support the learning model are explored.
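
For readers unfamiliar with the Matthews Correlation Coefficient, the following is a minimal sketch of how it can be computed from the four cells of a binary confusion matrix. It is not taken from the project itself: the function name `mcc_from_counts`, the choice to treat "graduated" as the positive class, the convention of returning 0.0 when the coefficient is undefined, and the example counts are all assumptions made purely for illustration.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives. The result lies in [-1, 1]:
    1 is perfect prediction, 0 is no better than chance, and -1 is
    total disagreement.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when any row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return numerator / denominator


# Hypothetical counts, not results from the study.
example = mcc_from_counts(tp=90, tn=80, fp=20, fn=10)
print(round(example, 3))
```

Because the coefficient uses all four cells of the confusion matrix, it remains informative when the two outcome classes are of unequal size, which is one reason it may be preferred to plain accuracy for graduation data.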
