Asia-Pacific Centre for Research (ACRE), Inc.

Gaining a competitive edge with Web mining
Source: www.spss.com
Copyright SPSS, Inc. 2004


Every day, your Web site generates millions of pieces of data. You look for ways to leverage that data because it holds intimate details describing the relationship between you and your customers — going beyond the relatively crude information found in customer databases and sales tracking systems. For instance, Web data tells you not only what a customer bought, but where on your site a customer looked prior to making a purchase.

Whether you’re an e-tailer wanting to increase profits or a government body working to provide essential services online, discovering how to uncover useful information about Web visitors from data is the Web mining challenge.

Using Web data to gain an edge on the Internet could make your organization a winner in the years to come. Information gained from mining Web data helps you to create and maintain a site that entices visitors and maximizes the profitability of your customer relationships. Web mining is simply data mining using Web data. Applying data mining techniques to data collected from your site enables you to discover patterns and relationships that would otherwise go undetected. Web mining turns your Web data into useful insight and intelligence that describe your site and the people who visit it.

Data mining is now a mission-critical business process, and the Web provides more data to mine — and more chances to learn about your customers or citizens — than ever before.

Web mining empowers you to:

Improve navigation on your site
Use offline analysis to understand how people navigate your site. Then redesign it. Align visitor information with site goals and create appropriate content for each visitor type.

Personalize your customer interactions
Use a variety of powerful analytical techniques that scale up to the traffic on your Web site to gain sophisticated customer intelligence. Use this intelligence to provide personalized content to your visitors.

Ensure Web site reliability
Predict peaks in Web activity and inventory requirements to ensure your site can support special promotions or unusually busy times.


How Web mining can work for your organization
Data mining techniques sift through Web data to identify relationships that ordinary summary reports of page requests and hits cannot uncover. The large volume of data collected on your Web site makes it difficult to understand what visitor actions affect your site’s performance. Standard Web reporting tools can provide a snapshot of your online customers, but they only look in one direction — backward. These reporting tools deliver basic information, such as the number of page views and visitors’ IP addresses — not the type of information needed to determine how visitor behavior affects your Web profits.

Web mining gives you a unique type and quality of information you need to improve your site. When you mine Web data, you use forward-looking models to go way beyond static reports. Information uncovered in these models empowers you to discover what customers want and predict what they will do.

Discover how you can implement Web mining in your organization and start applying the information gained to improve customer and citizen relationships. The following sections describe what data you need to start Web mining and how you can apply it in your organization — and get results.


How you do Web mining

Solve your complex business problems
Before you can begin Web mining, you need a clear idea of which business problems you want to solve. For example, if you’re an e-tailer, your goal may be to increase sales. You’ll need to understand which visitor behaviors lead to purchases and which behaviors result in abandoned shopping carts. Or maybe you want to ensure that customers or citizens can find information quickly. You’ll need to understand the paths people take to find information on your site. And you’ll need to know what type of data you need to accomplish these business goals.

Use the best practice approach to data mining, CRISP-DM (CRoss-Industry Standard Process for Data Mining). CRISP-DM is a comprehensive data mining methodology and process model that makes large data mining projects faster, more efficient and less costly. CRISP-DM also gives you provisions to align your business goals with the technical aspects of Web mining so that you can reach relevant results. For more information on CRISP-DM, visit www.crisp-dm.org.

Gather the data you need
Accessing and processing the data necessary for reports and analysis is an important, early step for successful Web mining. Web data comes in a variety of formats, but these formats fit into two main categories: 

· Event data — time-stamped records of user actions, generated dynamically as the user interacts with the application or server
· Non-event data — either the Web site generates the data or you collect it elsewhere (you may also store it externally). Examples of such data include transaction and customer records, demographic databases and site architecture and topology.

Event data are generally known as Web logs. Web (or server) logs include a number of different reports that the Web site may generate, such as:

· Access logs — logs recording “hits” or requests, giving time, requesting host, (possibly) username, the request line and transmission status or size
· Agent logs — optional logs describing the browser software a visitor used
· Error logs — logs detailing a free-form dump of errors
· Referrer logs — optional logs describing “from-to” navigation behavior (i.e., from URL to URL)
· Cookie logs — optional logs describing cookie-keyed interaction between a server and a visitor. Essentially, these logs show if a person using a computer that previously requested pages from your site has requested pages again.
· ELF logs — extended, administrator-defined logs that contain any combination of data from the server environment (similar layout to the access log). They provide a useful way of combining access log, agent log and referrer log data in a single line, thus alleviating many reconnection problems.
· Application logs — data logs from Web-based applications. Applications may be “black box” in nature, such as off-the-shelf mail routing programs, or they may be “white box” line-of-business applications. These applications record data (as well as the manner of recording, location, format, etc.) at dramatically different levels.

Typical non-event data are information in databases and structural data:

· Web-based applications use databases widely. Typically, the information such databases contain is highly structured and relatively low-volume. Customer and order databases are classic examples.
· Structural data describes the design and layout of the Web application in enough detail for you to infer a connection between event data, such as navigation actions, and the “model” of the site visitor.

Pre-process Web data to refine your analysis
As with most types of data used in data mining tasks, systems don’t collect Web data with analysis in mind. To mine this data effectively, you need to pre-process raw Web logs and other data. Some of the processing activities you need to do include:

· Identifying visitors by grouping requests and timestamps under a combination of the visitor’s IP number or hostname and the browser information from the agent log
· Identifying sessions by applying business rules to timestamps and using structural data
· Tying Web behavior to customers by using cookie logs, ELF logs, application logs or databases to understand visitor behavior across multiple sessions and the relationship between visitor behavior and key events, like purchasing a particular type of product
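
The first of these activities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the requests are invented, the field names (`ip`, `agent`, `time`, `url`) are hypothetical, and treating an IP address plus browser agent as one visitor is a heuristic that proxies and shared computers can defeat.

```python
from collections import defaultdict

# Hypothetical, already-parsed requests; the field names are assumptions.
requests = [
    {"ip": "192.0.2.1", "agent": "Mozilla/4.0 (Windows)",
     "time": "2004-03-01 10:00:05", "url": "/index.html"},
    {"ip": "192.0.2.1", "agent": "Mozilla/4.0 (Macintosh)",
     "time": "2004-03-01 10:00:09", "url": "/index.html"},
    {"ip": "192.0.2.1", "agent": "Mozilla/4.0 (Windows)",
     "time": "2004-03-01 10:01:30", "url": "/products.html"},
]

def group_by_visitor(records):
    """Group requests under an (IP, browser agent) key, a heuristic visitor identity."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[(rec["ip"], rec["agent"])].append(rec)
    # Order each visitor's requests in time (these timestamps sort correctly as text).
    for recs in grouped.values():
        recs.sort(key=lambda r: r["time"])
    return dict(grouped)

visitors = group_by_visitor(requests)
for key, recs in visitors.items():
    print(key, [r["url"] for r in recs])
```

Here the same IP address with two different browsers is counted as two visitors, which is the point of combining the access log with the agent log.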

Depending on your Web infrastructure, your logs may include extended information or take a different form. Either way, the concept is similar. A raw access log gives some insight into the size of the pre-processing task: each line contains a request with the visitor’s IP address, a date, a timestamp and the requested HTML or GIF file.
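
A minimal sketch of reading such a log, assuming the widely used Common Log Format. The sample lines are invented for illustration (they are not the article's original example) and use addresses from the 192.0.2.0/24 documentation range; your server's format may differ.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

# Invented sample lines, including one malformed entry.
SAMPLE_LINES = [
    '192.0.2.1 - - [01/Mar/2004:10:00:05 +0000] "GET /index.html HTTP/1.0" 200 5120',
    '192.0.2.1 - - [01/Mar/2004:10:00:06 +0000] "GET /images/logo.gif HTTP/1.0" 200 2048',
    'not a log line',
]

def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    method, _, rest = m.group("request").partition(" ")
    path = rest.split(" ")[0] if rest else ""
    return {
        "host": m.group("host"),
        "time": datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "method": method,
        "path": path,
        "status": int(m.group("status")),
        "size": 0 if m.group("size") == "-" else int(m.group("size")),
    }

records = [r for r in map(parse_line, SAMPLE_LINES) if r is not None]

# Image requests usually accompany a page rather than being a page view themselves.
page_views = [r for r in records
              if not r["path"].lower().endswith((".gif", ".jpg", ".png"))]

for r in page_views:
    print(r["host"], r["time"].isoformat(), r["path"])
```

Even this small step shows why pre-processing takes effort: malformed lines must be skipped, and image requests separated from genuine page views.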

One of the first preparation tasks is to assign a session ID number. For example, you can define a session as a consecutive series of log entries sorted by IP address and then by date and time; you can start a new session when the IP address changes in the sorted list or when the gap between consecutive requests from one IP address exceeds a chosen limit. You can then calculate the time a visitor spends on a page by subtracting that request’s timestamp from the timestamp of the visitor’s next request.
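
The session rule described above could be sketched as follows. The 30-minute timeout is an assumed business rule, not a value from this article, and the hits are invented; choose limits that match how visitors actually use your site.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed business rule

# Invented hits: (ip, timestamp, page)
hits = [
    ("192.0.2.1", datetime(2004, 3, 1, 10, 0, 0), "/index.html"),
    ("192.0.2.1", datetime(2004, 3, 1, 10, 2, 0), "/products.html"),
    ("192.0.2.1", datetime(2004, 3, 1, 11, 30, 0), "/index.html"),
    ("192.0.2.2", datetime(2004, 3, 1, 10, 1, 0), "/index.html"),
]

def assign_sessions(hits, timeout=SESSION_TIMEOUT):
    """Return rows of (session_id, ip, time, page, seconds_on_page)."""
    ordered = sorted(hits, key=lambda h: (h[0], h[1]))  # by IP, then time
    rows = []
    session_id = 0
    for i, (ip, ts, page) in enumerate(ordered):
        prev = ordered[i - 1] if i > 0 else None
        # Start a new session on a changed IP or an overly long gap.
        if prev is None or prev[0] != ip or ts - prev[1] > timeout:
            session_id += 1
        rows.append([session_id, ip, ts, page, None])
    # Time on page = next request's timestamp minus this one, within a session.
    for cur, nxt in zip(rows, rows[1:]):
        if cur[0] == nxt[0]:
            cur[4] = (nxt[2] - cur[2]).total_seconds()
    return [tuple(r) for r in rows]

sessions = assign_sessions(hits)
for row in sessions:
    print(row)
```

The last page of each session has no following request, so its duration stays unknown (`None`); that is a limitation of log-based timing, not of this sketch in particular.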

As with any data mining project, the quality and usefulness of your Web mining results depend on your insight. You will need to define your Web mining goals and apply your ideas about how visitors use your site. For example, you need to determine reasonable rules with which to define a visitor session.

How you can use Web mining results

Understand visitors’ behavior on your Web site
Once you have pre-processed your Web data, you can begin analyzing it to detect patterns. At this point, many organizations evaluate Web site metrics using standard reporting tools. Web mining goes beyond these measures to provide a deeper understanding of Web site behavior, showing you how visitors actually interact with your Web site.

A typical Web mining analysis is clustering sessions to determine groups of usage patterns. For example, you can group data by the length of visits to a page and purchase activity so that profiles of different visitor types emerge. These visitors might browse and find what they want; or they might be repeat visitors who buy a lot or people who only use a portion of your Web site. Because you can use predictive modeling techniques to group your visitors, you can deploy these techniques back to your Web site to score new visitors to determine their likely membership in usage groups defined by your organization. This means that you can provide appropriate content for these users and meet their specific needs.

The cluster analysis helps us focus on important questions: have we appropriately placed the content on these sections for this type of user? Or, can we do more to make this a better online experience, and perhaps sell more in the online mall? How do these visitors differ from other types of visitors? 
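
To make the idea of clustering sessions and then scoring new visitors concrete, here is a small sketch. It assumes k-means as the clustering technique (the article does not name one), uses invented session features, and picks its starting centroids deterministically so the result is repeatable; a production analysis would use a tested library and far more data.

```python
import math

# Invented per-session features: (average seconds per page, purchases).
sessions = [
    (15, 0), (20, 0), (25, 0),      # brief visits, no purchases
    (120, 2), (150, 3), (135, 2),   # long visits with purchases
]

def fit_scaler(points):
    """Learn min-max scaling so that no single feature dominates the distance."""
    cols = list(zip(*points))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1 for c in cols]
    return lows, spans

def rescale(point, lows, spans):
    return tuple((v - lo) / sp for v, lo, sp in zip(point, lows, spans))

def kmeans(points, k, iterations=20):
    """Plain k-means; centroids start at evenly spaced points for repeatability."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

lows, spans = fit_scaler(sessions)
scaled = [rescale(p, lows, spans) for p in sessions]
labels, centroids = kmeans(scaled, k=2)

def score_visitor(features):
    """Assign a new session to the nearest existing cluster."""
    point = rescale(features, lows, spans)
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

print(labels)
print(score_visitor((140, 1)))
```

The `score_visitor` function is the "deploy back to your Web site" step in miniature: once the groups exist, a new session is simply assigned to the closest one.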

Encouraging stickiness
Stickiness is the extent to which visitors tend to stay on a particular Web site. Having a sticky site means visitors find useful information and browse your site longer.

Performing Web analysis can help increase stickiness. One technique for improving stickiness is clickstream analysis. Clickstreams are the paths that visitors take through a Web site. It’s not enough to simply compare recorded page strings that the site serves up to your visitors. You need predictive modeling to get a complete picture and understand what makes a site sticky. Web mining enables organizations to examine a set of events, find strong patterns among a Web site’s many clickstreams and predict the last event in a clickstream, whether this event is a page request, purchase or download.
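
As a rough illustration of predicting the next event in a clickstream, the sketch below counts page-to-page transitions and predicts the most frequent follower of the current page. This first-order frequency model is a deliberately simple stand-in for the predictive modeling described above, and the clickstreams and page names are invented.

```python
from collections import Counter, defaultdict

# Invented clickstreams; each ends in a final event such as "purchase" or "exit".
clickstreams = [
    ["home", "products", "cart", "purchase"],
    ["home", "products", "cart", "purchase"],
    ["home", "sitemap", "support", "sitemap", "exit"],
    ["home", "products", "cart", "exit"],
    ["home", "sitemap", "exit"],
]

def fit_transitions(streams):
    """Count how often each page or event directly follows another."""
    counts = defaultdict(Counter)
    for stream in streams:
        for current, following in zip(stream, stream[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, page):
    """Return the most frequent follower of `page`, or None if the page is unseen."""
    if page not in counts:
        return None
    return counts[page].most_common(1)[0][0]

transitions = fit_transitions(clickstreams)
print(predict_next(transitions, "cart"))
print(predict_next(transitions, "sitemap"))
```

In this invented data, visitors who reach the site map most often leave next, which is the kind of pattern that would prompt a closer look at navigation.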

In all of the clickstreams, users started at the home page, moved to the site map, tried another section of the Web site before moving back to the site map and eventually left. Visitors didn’t stay on the site very long. In fact, this pattern suggests neither the home page nor the site map is intuitive enough for a significant number of people to navigate.

Increase sales using personalization
Finding patterns in Web data can provide a lot of value for understanding your Web business and making site changes to improve your online presence. But the value of Web mining holds even greater promise: custom content delivery — powered by predictive models that tailor Web pages and recommendations for each visitor in real time.

Use a variety of modeling techniques for optimum results
To ensure you provide custom content, you need a variety of powerful techniques that scale up to the traffic on your Web site. Data mining lessons teach us that the accuracy of different techniques (or combination of techniques) depends on the data. The same is true for Web mining. Your data may require different techniques than your competitors’ data and your data may change over time. You must equip yourself with a variety of modeling approaches.
