Data Mining Applications in Higher Education
Source: www.spss.com
Copyright SPSS, Inc. 2004
Part 2 of 3
Case study two: academic planning and interventions (transfer prediction)
Challenge
This case study showcases a solution to a vexing higher education problem: how to
accurately predict students' academic outcomes so that institutions can intervene
proactively. When institutions use data mining to identify the students most at risk,
they can step in before a student fails, sometimes before the student even realizes
he or she is in trouble.
Over half of community college students identify transferring to a university as their
goal. However, because of academic difficulties, many of them never transfer, or they take a
long time to reach this goal. In the past it was difficult to determine which students
had transferred, but the National Student Clearinghouse now enables data matching between
community colleges and universities. This means that data miners and decision makers can
link students' academic behavior at a community college to their eventual transfer
outcomes.
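The record linkage described above can be sketched in a few lines of Python. This is a minimal illustration, not the Clearinghouse's actual matching process: it assumes each community college record and each university enrollment share a single student identifier, whereas real matching usually draws on several identifying fields. The `Student` fields, the `label_transfers` helper, and the sample IDs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Student:
    student_id: str  # hypothetical shared identifier (assumption: one ID links both systems)
    units: float     # units accumulated at the community college
    gpa: float


def label_transfers(students, university_enrollments):
    """Attach a transfer outcome (1 = later enrolled at a university, 0 = not)
    to each community college record by matching on student_id."""
    enrolled = set(university_enrollments)  # set for fast membership checks
    return [(s, 1 if s.student_id in enrolled else 0) for s in students]


students = [Student("A1", 62.0, 3.1), Student("B2", 18.0, 2.4), Student("C3", 45.0, 3.6)]
matches = ["A1", "C3"]  # IDs returned by a hypothetical clearinghouse match
for s, transferred in label_transfers(students, matches):
    print(s.student_id, transferred)
```

The labeled records then supply the known outcome that a supervised model needs.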
Solution
The key to building a data mining model with this newly available information lies in a
combination of typologies and domain knowledge. What exactly is domain knowledge? It
comprises a thorough understanding of the data in the databases, the background against
which the data are collected, massaged, and stored, and a keen sense of the business
objectives of the field under study. Domain knowledge helps a data miner choose the right
modeling approach and decide how to adjust models. In transfer education, domain knowledge
emphasizes that the most effective way to increase student transfers is to identify
transfer-directed students as early as possible. Grooming the students most likely to
transfer is far more meaningful than counting how many students have accumulated enough
units to transfer.
With the knowledge of who had transferred, the analysts built a dataset containing
students who fell under the general transfer clusters of speeders and
laggards. The transferred group included students within the clusters who actually
transferred as well as transferred students who fell outside the clusters. The dataset was
split into two parts using a proprietary randomization method: one used to build the models
and the other held out as a test dataset. The outcome variable was transfer, that is,
whether or not the student transferred.
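The study's split relied on a proprietary randomization method that the article does not describe. As a stand-in, here is a plain seeded random shuffle that divides records into a training set and a holdout set; the 50/50 proportion, the fixed seed, and the function name `random_split` are assumptions for illustration only.

```python
import random


def random_split(records, holdout_fraction=0.5, seed=42):
    """Randomly partition records into a training set and a holdout (test) set.

    This is a generic shuffle-and-cut, not the proprietary method used in the study.
    """
    shuffled = list(records)          # copy so the caller's list is left untouched
    rng = random.Random(seed)         # seeded generator for a reproducible split
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


records = list(range(10))  # stand-ins for student records
train, holdout = random_split(records)
print(len(train), len(holdout))
```

Seeding the generator means the same records land in the same partition on every run, which makes later accuracy comparisons repeatable.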
Other variables, such as demographics, courses taken, units accumulated, and financial
aid, were all used as predictors without first being screened by stepwise significance
testing. This is because algorithms such as neural networks and decision trees can
accommodate variable interactions and non-linear relationships in the data without the
analyst specifying them in advance. Because the transfer outcome was known for every
student, supervised data mining was the obvious and appropriate method; neural network and
rule induction algorithms were run side by side to compare their prediction accuracy.
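To show what a rule induction algorithm produces, here is a deliberately tiny sketch: a one-rule decision stump that searches each predictor for the threshold that best separates transfers from non-transfers. C5.0 and C&RT are far more sophisticated (they grow multi-level trees and prune them), so treat this only as an illustration of the idea. The feature names `units` and `gpa`, the helper `best_stump`, and the four sample records are hypothetical.

```python
def best_stump(rows, labels, features):
    """Search every feature and threshold for the single rule
    'if feature >= threshold then predict transfer' with the highest training accuracy."""
    best = (0.0, None, None)  # (accuracy, feature, threshold)
    for f in features:
        for t in sorted({r[f] for r in rows}):  # candidate thresholds = observed values
            preds = [1 if r[f] >= t else 0 for r in rows]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[0]:
                best = (acc, f, t)
    return best


rows = [
    {"units": 60, "gpa": 3.2},
    {"units": 15, "gpa": 3.8},
    {"units": 48, "gpa": 2.9},
    {"units": 12, "gpa": 2.1},
]
labels = [1, 0, 1, 0]  # 1 = transferred, 0 = did not
print(best_stump(rows, labels, ["units", "gpa"]))  # (1.0, 'units', 48)
```

On this toy data the stump finds the rule "units >= 48 predicts transfer", which classifies all four records correctly. Output of this kind reads as a plain rule, unlike the learned weights of a neural network.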
Results
Data mining now enables the prediction of good transfer candidates. After extensive
training, the neural network algorithm (Neural Net) achieved a prediction accuracy of 72
percent, and the rule induction algorithms, C5.0 and C&RT (Classification and
Regression Tree), achieved a prediction accuracy of 80 percent. The models were then run
against the test dataset (holdout data) and produced similar results, which indicated that
they had captured the intrinsic patterns in the data rather than merely memorizing the
records they were built on.
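The holdout check in the Results amounts to comparing accuracy on the data used to build a model with accuracy on the held-out data. The sketch below uses made-up predictions, not the study's results, and the 5-percentage-point tolerance in the hypothetical `generalizes` helper is an arbitrary assumption; the article does not state a threshold for "similar results."

```python
def accuracy(predictions, actual):
    """Fraction of predictions that match the actual outcomes."""
    if len(predictions) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(p == a for p, a in zip(predictions, actual)) / len(actual)


def generalizes(train_acc, holdout_acc, tolerance=0.05):
    """Treat a model as generalizing if holdout accuracy is within `tolerance`
    of training accuracy (the tolerance is an illustrative assumption)."""
    return train_acc - holdout_acc <= tolerance


# Made-up predictions and outcomes (1 = transferred, 0 = did not)
train_acc = accuracy([1, 0, 1, 1, 0], [1, 0, 1, 0, 0])    # 4 of 5 correct
holdout_acc = accuracy([1, 1, 0, 1, 0], [1, 0, 0, 1, 0])  # 4 of 5 correct
print(train_acc, holdout_acc, generalizes(train_acc, holdout_acc))
```

A large drop from training to holdout accuracy would suggest the model had fit noise in the training records rather than patterns that carry over to new students.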