Asia-Pacific Centre for Research (ACRE), Inc.



Data Mining Applications in Higher Education
Source: www.spss.com
Copyright SPSS, Inc. 2004

Part 1 of 3


Case study one: creating meaningful learning outcome typologies



Challenge

"What do institutions know about their students?" If the answer is a recital of the percentages of enrollment by gender or some other counts, then institutions truly do not know their students. This case study shows how unsupervised data mining enables suburban community
colleges to establish learning outcome typologies1 for their students.

In a typical suburban community college with an enrollment of 15,000, students are traditionally identified as "transfer-oriented," "vocational education directed," or "basic skill upgraders."
However, these identifications are all based on the educational goals students declare at enrollment. Such groupings are broad, inclusive classifications, but they do little to explain the vast differences among students within each type.


Solution

To establish appropriate typologies for 15,000 students, researchers used two powerful clustering algorithms, TwoStep and K-means. The first attempt was made using the aforementioned general groupings of "transfer," "vocational," and "basic skills." The results were mixed. The boundaries among clusters were unclear and dispersed, reflecting a poor fit between the students' feature vectors and the cluster centroids. Repeated testing on holdout datasets and the removal of suspected outliers did not improve the results much. One possible explanation is that students' declared goals did not dictate their actual academic behavior. The researchers therefore replaced this approach with one that looked at two elements: educational outcomes in combination with length of study.
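The case study ran SPSS's TwoStep and K-means procedures, whose internals the article does not describe. As a rough illustration of what K-means does and how a "poor fit" shows up numerically, here is a minimal pure-Python sketch of Lloyd's algorithm with a within-cluster sum of squares (WCSS) score. The function names, the deterministic farthest-first seeding (a stand-in for random or k-means++ seeding), and the toy data are assumptions for illustration, not the researchers' setup.

```python
from math import dist  # Euclidean distance, Python 3.8+


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point farthest from every centroid chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm on a list of numeric tuples; returns (centroids, labels)."""
    centroids = farthest_first(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
    return centroids, labels


def within_cluster_ss(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid.
    Lower values mean more compact clusters."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    cents, labs = kmeans(pts, 2)
    print(labs, within_cluster_ss(pts, cents, labs))
```

On this toy data `kmeans(pts, 2)` separates the two obvious groups and returns labels `[0, 0, 0, 1, 1, 1]`. When groups overlap, as the goal-based groupings apparently did, WCSS stays high relative to the overall spread of the data, which is one numeric symptom of unclear cluster boundaries.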

Defining educational outcomes is easier said than done. First, enough time must pass before one can conclude that a student has reached a given milestone. Second, dropping out is itself an outcome. Length of study then had to be determined, which required decisions on how to treat "stopouts," students who left for a while and came back later to take more courses.
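The article does not say how length of study or stopouts were coded. A minimal sketch, assuming each student's enrollment is available as a list of integer term indices (consecutive regular terms numbered 1, 2, 3 and so on) and that any recorded milestone such as a transfer or award is known. The names `StudyLength`, `study_length`, and `classify_outcome`, the `dropout_after=3` threshold, and the outcome labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StudyLength:
    terms_enrolled: int  # terms in which the student took at least one course
    span: int            # first through last enrolled term, inclusive
    stopouts: int        # breaks in enrollment after which the student came back


def study_length(terms):
    """Summarize enrollment from a list of term indices (hypothetical encoding)."""
    ordered = sorted(set(terms))
    stopouts = sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > 1)
    return StudyLength(len(ordered), ordered[-1] - ordered[0] + 1, stopouts)


def classify_outcome(terms, current_term, milestone=None, dropout_after=3):
    """Assign an outcome only once enough time has passed (hypothetical rule).

    milestone: e.g. "transfer" or "certificate" if one was recorded.
    dropout_after: terms without enrollment before a student counts as a dropout.
    """
    if milestone is not None:
        return milestone
    terms_away = current_term - max(terms)
    if terms_away >= dropout_after:
        return "dropout"
    return "undetermined"  # too early to tell; the student may be a stopout
```

For example, `study_length([1, 2, 3, 6, 7, 9])` reports 6 terms enrolled over a span of 9 terms with 2 stopouts. Whether "length of study" should mean terms enrolled or elapsed span is the kind of domain decision the case study describes: the two measures agree for continuous enrollees and diverge only for stopouts.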

All of these decisions tested the researchers' domain knowledge. There are no absolutely right or absolutely wrong typologies; they are all relative, which is characteristic of unsupervised data mining, where there is no predefined outcome to check the groups against. A typology is a good one if it serves a particular business or scientific research objective.

After the outliers (cases that do not appear to belong to any group) were dealt with, either by finding them a home (a cluster) or by removing them, TwoStep produced the following clusters: "transfers," "vocational students," "basic skills students," "students with mixed outcomes," and "dropouts." K-means validated these clusters. Introducing length of study then added a new dimension to each cluster: some transfers blazed through their studies in no time, some vocational students took longer, and others appeared happy to take a course or two for no particular purpose.
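TwoStep is SPSS's own procedure and is not reproduced here, and the article does not say exactly how outliers were placed or how K-means "validated" the TwoStep result. The sketch below shows one plausible way to do each, under stated assumptions: an outlier is any case farther than a hypothetical `max_distance` from every centroid, and agreement between two clusterings is measured with the Rand index (the share of case pairs that both solutions treat the same way, either together in both or apart in both). The function names are illustrative.

```python
from itertools import combinations
from math import dist


def handle_outliers(points, centroids, max_distance, drop=False):
    """Label each point with the index of its nearest centroid.

    Points farther than max_distance from every centroid are outliers. With
    drop=False they are still given a home in the nearest cluster; with
    drop=True their label is None. Returns (labels, outlier_indices).
    """
    labels, outliers = [], []
    for i, p in enumerate(points):
        nearest = min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        if dist(p, centroids[nearest]) > max_distance:
            outliers.append(i)
            labels.append(None if drop else nearest)
        else:
            labels.append(nearest)
    return labels, outliers


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees if both clusterings put it in the same cluster or both put
    it in different clusters, so the score ignores how clusters are numbered.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

`rand_index([0, 0, 1, 1], [1, 1, 0, 0])` returns 1.0 because the two solutions group the cases identically under different numbering. Dropped outliers (label `None`) should be filtered out of both label lists before comparing; a chance-corrected variant, the adjusted Rand index, is often preferred in practice.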


Results

Data mining, combined with student demographics and other information, helped the colleges describe the clusters more fully. For example, certain older students tended to take their time, while younger students from higher socioeconomic backgrounds often enrolled in high-credit courses to graduate quickly. The most interesting part of the classification was naming these typologies. For example, we used the term "transfer speeders" for students who piled up their units quickly and "college historians" for those who had been taking classes seemingly forever. Others included "fence-sitters" and "skill upgraders."
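Describing clusters with demographics usually starts with a per-cluster profile. A minimal sketch, assuming each student is a dict holding a cluster label plus numeric fields; the function name `profile`, the field names (`age`, `units_per_term`), and the four records are made up for illustration and are not data from the case study.

```python
from collections import defaultdict
from statistics import mean


def profile(records, cluster_key, fields):
    """Return {cluster: {field: mean value}} for the given numeric fields."""
    groups = defaultdict(list)
    for record in records:
        groups[record[cluster_key]].append(record)
    return {
        cluster: {f: mean(r[f] for r in rows) for f in fields}
        for cluster, rows in groups.items()
    }


if __name__ == "__main__":
    # Illustrative records only; not drawn from the case study.
    students = [
        {"cluster": 1, "age": 19, "units_per_term": 15},
        {"cluster": 1, "age": 21, "units_per_term": 13},
        {"cluster": 2, "age": 45, "units_per_term": 3},
        {"cluster": 2, "age": 52, "units_per_term": 4},
    ]
    print(profile(students, "cluster", ["age", "units_per_term"]))
```

A profile in which one cluster is young with a heavy unit load and another is older with a light one is the kind of pattern that suggests labels such as "transfer speeders" or "college historians"; the naming itself remains a human judgment.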

Typologies like these are important because they reach beyond conventional student profiling. They provide a way to identify homogeneous groups of students, which can increase the accuracy of predictive models. Even if the data mining stops once sensible typologies have been found, the knowledge of the newly discovered patterns and relationships helps college teachers and administrators better meet the needs of different student groups.

 
