Gaining a competitive edge with Web mining
Source: www.spss.com
Copyright SPSS, Inc. 2004
Every day, your Web site generates millions of pieces of data. You look for ways to
leverage that data because it holds intimate details describing the relationship between
you and your customers, details that go beyond the relatively crude information found in
customer databases and sales tracking systems. For instance, Web data tells you not only
what a customer bought, but also where on your site the customer looked before making a
purchase.
Whether you're an e-tailer wanting to increase profits or a government body
working to provide essential services online, the Web mining challenge is discovering how
to uncover useful information about Web visitors from your data.
Using Web data to gain an edge on the Internet could make your organization a winner in
the years to come. Information gained from mining Web data helps you to create and
maintain a site that entices visitors and maximizes the profitability of your customer
relationships. Web mining is simply data mining using Web data. Applying data mining
techniques to data collected from your site enables you to discover patterns and
relationships that would otherwise go undetected. Web mining turns your Web data into
useful insight and intelligence that describe your site and the people who visit
it.
Data mining is now a mission-critical business process, and the Web provides more data to
mine and more chances to learn about your customers or citizens than ever
before.
Web mining empowers you to:
Improve navigation on your site
Use offline analysis to understand how people navigate your site. Then redesign it. Align
visitor information with site goals and create appropriate content for each visitor type.
Personalize your customer interactions
Use a variety of powerful analytical techniques that scale up to the traffic on your Web
site to gain sophisticated customer intelligence. Use this information to provide
personalized content to your visitors.
Ensure Web site reliability
Predict peaks in Web activity and inventory requirements to ensure your site can support
special promotions or unusually busy times.
How Web mining can work for your organization
Data mining techniques sift through Web data to identify relationships that ordinary
summary reports of page requests and hits cannot uncover. The large volume of data
collected on your Web site makes it difficult to understand which visitor actions affect
your site's performance. Standard Web reporting tools can provide a snapshot of your
online customers, but they look in only one direction: backward. These reporting
tools deliver basic information, such as the number of page views and visitors' IP
addresses, not the type of information needed to determine how visitor behavior
affects your Web profits.
Web mining gives you the type and quality of information you need to improve your
site. When you mine Web data, you use forward-looking models that go well beyond static
reports. The information these models uncover helps you discover what customers
want and predict what they will do.
Discover how you can implement Web mining in your organization and start applying the
information gained to improve customer and citizen relationships. The following sections
describe what data you need to start Web mining and how you can apply it in your
organization and get results.
How you do Web mining
Solve your complex business problems
Before you can begin Web mining, you need a clear idea of the business problems you
want to solve. For example, if you're an e-tailer, your goal may be to increase
sales. You'll need to understand which visitor behaviors lead to purchases and which
behaviors result in abandoned shopping carts. Or maybe you want to ensure that customers
or citizens can find information quickly. You'll need to understand the paths people
take to find information on your site. And you'll need to know what type of data you
need to accomplish these business goals.
Use the best practice approach to data mining, CRISP-DM (CRoss-Industry Standard Process
for Data Mining). CRISP-DM is a comprehensive data mining methodology and process model
that makes large data mining projects faster, more efficient and less costly. CRISP-DM
also provides a way to align your business goals with the technical aspects of Web
mining so that your results stay relevant. For more information on CRISP-DM, visit
www.crisp-dm.org.
Gather the data you need
Accessing and processing the data necessary for reports and analysis is an important,
early step for successful Web mining. Web data comes in a variety of formats, but these
formats fit into two main categories:
· Event data: time-stamped records of user actions, generated as the user and the
application or server interact dynamically
· Non-event data: data that the Web site generates or that you collect elsewhere
(and may also store externally). Examples include transaction and customer records,
demographic databases, and site architecture and topology.
Event data are generally known as Web logs. Web (or server) logs include a number of
different reports that the Web site may generate, such as:
· Access logs: logs recording hits or requests, giving the time, requesting host,
(possibly) username, request line and transmission status or size
· Agent logs: optional logs describing the browser software a visitor used
· Error logs: logs containing a free-form dump of errors
· Referrer logs: optional logs describing from-to navigation behavior
(i.e., from URL to URL)
· Cookie logs: optional logs describing cookie-keyed interaction between a server
and a visitor. Essentially, these logs show whether a person using a computer that
previously requested pages from your site has requested pages again.
· ELF logs: extended, administrator-defined logs that contain any combination of
data from the server environment (with a layout similar to the access log). They provide a
useful way of combining access log, agent log and referrer log data in a single line, which
reduces the need to reconnect entries scattered across separate logs.
· Application logs: data logs from Web-based applications. Applications may be
"black box" in nature, such as off-the-shelf mail routing programs, or they may
be "white box" line-of-business applications. These applications differ dramatically in
what data they record and in how, where and in what format they record it.
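To give a feel for what reading an access log involves, here is a minimal Python sketch. It
assumes entries follow the widely used Common Log Format (host, identity, username,
bracketed timestamp, quoted request line, status, size); your server may write a different
or extended layout, and the sample line is invented for illustration.

```python
import re
from datetime import datetime

# Assumed layout: Common Log Format -> host ident user [time] "request" status size
# Adjust the pattern if your server writes a different or extended layout.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Timestamps in this layout look like 10/Oct/2004:13:55:36 -0700
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    # "-" means the username or size was not recorded
    fields["user"] = None if fields["user"] == "-" else fields["user"]
    fields["size"] = 0 if fields["size"] == "-" else int(fields["size"])
    fields["status"] = int(fields["status"])
    # The request line is normally "METHOD PATH PROTOCOL"
    parts = fields["request"].split()
    fields["path"] = parts[1] if len(parts) >= 2 else None
    return fields

# Hypothetical sample line
sample = '10.0.0.1 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_access_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Lines that don't match the assumed layout come back as None, so they can be counted and
inspected rather than silently dropped.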
Typical non-event data are information in databases and structural data:
· Web-based applications use databases widely. Typically, the information such
databases contain is highly structured and relatively low-volume. Customer and order
databases are classic examples.
· Structural data describes the design and layout of the Web application in enough
detail to let you connect event data, such as navigation actions, to the model of the
site visitor.
Pre-process Web data to refine your analysis
Like most data used in data mining tasks, Web data isn't collected
with analysis in mind. To mine it effectively, you need to pre-process raw Web logs
and other data. Some of the pre-processing activities you need to do include:
· Identifying visitors: aggregating requests and timestamps by combining each
visitor's IP address or hostname with agent log information on browsers
· Identifying sessions: applying business rules to timestamps and using structural data
· Tying Web behavior to customers: using cookie logs, ELF logs, application logs or
databases to understand visitor behavior across multiple sessions and the relationship
between visitor behavior and key events, like purchasing a particular type of product
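The first and third activities can be sketched in a few lines of Python. This is a minimal
sketch, not a production approach: it assumes each request has already been parsed into a
dictionary with hypothetical ip, agent, time, cookie and path fields, and that a
hypothetical cookie_to_customer mapping links cookie IDs to customer records.

```python
from collections import defaultdict

def identify_visitors(requests):
    """Group requests by (IP address, user agent), a heuristic visitor key."""
    visitors = defaultdict(list)
    for req in requests:
        visitors[(req["ip"], req["agent"])].append(req)
    # Keep each visitor's requests in time order
    for reqs in visitors.values():
        reqs.sort(key=lambda r: r["time"])
    return dict(visitors)

def tie_to_customers(requests, cookie_to_customer):
    """Attach a customer ID to each request whose cookie is known; others stay anonymous."""
    linked = []
    for req in requests:
        customer = cookie_to_customer.get(req.get("cookie"))
        linked.append({**req, "customer_id": customer})
    return linked

# Hypothetical requests; times are seconds from an arbitrary starting point
requests = [
    {"ip": "10.0.0.1", "agent": "BrowserA", "time": 30, "cookie": "c1", "path": "/cart"},
    {"ip": "10.0.0.1", "agent": "BrowserA", "time": 0, "cookie": "c1", "path": "/"},
    {"ip": "10.0.0.1", "agent": "BrowserB", "time": 10, "cookie": None, "path": "/"},
]
visitors = identify_visitors(requests)
linked = tie_to_customers(requests, {"c1": "customer-42"})
print(len(visitors), [r["customer_id"] for r in linked])
```

Keying visitors on IP address plus browser is only a heuristic: people behind a shared proxy
using the same browser collapse into one visitor, and one person on changing IP addresses
splits into several, which is why cookie, application or database data is worth joining in.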
The exact log contents depend on your Web infrastructure and may include extended
information, but the concept is the same. The following example
of a raw access log provides some insight into the size of the pre-processing task. Each
line contains a request with the visitor's IP address (masked in
the example below), a date, a timestamp and the requested HTML or GIF file.
One of the first preparation tasks is to assign a session ID number. For example, you can
sort log entries by IP address, date and time and define a session as a consecutive series
of those entries; you can then divide sessions wherever the IP changes in the sorted list or
wherever an IP address spends too long on a given page. You can then calculate the time a
visitor spends on a page by subtracting consecutive timestamps in the log file.
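The session rules above might look like this in Python. The 30-minute inactivity timeout is
an assumption (a common convention, not a rule from this article); choose the threshold that
matches your own business rules. Entries are assumed to be dictionaries with ip, time (a
datetime) and path fields.

```python
from datetime import datetime, timedelta

# Assumption: a gap longer than 30 minutes between requests from one IP starts a new session
SESSION_TIMEOUT = timedelta(minutes=30)

def assign_sessions(entries, timeout=SESSION_TIMEOUT):
    """Sort entries by IP and time, then add 'session_id' and 'seconds_on_page'.

    A new session starts when the IP changes or when the gap since the
    previous request from the same IP exceeds `timeout`.
    """
    ordered = sorted(entries, key=lambda e: (e["ip"], e["time"]))
    result = []
    session_id = 0
    prev = None
    for entry in ordered:
        if prev is None or entry["ip"] != prev["ip"] or entry["time"] - prev["time"] > timeout:
            session_id += 1
        result.append({**entry, "session_id": session_id, "seconds_on_page": None})
        prev = entry
    # Time on a page = next request's timestamp minus this one, within the same session
    for current, following in zip(result, result[1:]):
        if current["session_id"] == following["session_id"]:
            current["seconds_on_page"] = (following["time"] - current["time"]).total_seconds()
    return result

# Hypothetical log entries
t0 = datetime(2004, 10, 10, 13, 0, 0)
log = [
    {"ip": "10.0.0.2", "time": t0, "path": "/"},
    {"ip": "10.0.0.1", "time": t0, "path": "/"},
    {"ip": "10.0.0.1", "time": t0 + timedelta(seconds=45), "path": "/products"},
    {"ip": "10.0.0.1", "time": t0 + timedelta(hours=2), "path": "/"},
]
sessions = assign_sessions(log)
for row in sessions:
    print(row["ip"], row["path"], row["session_id"], row["seconds_on_page"])
```

The last page in each session has no following request, so its viewing time can't be derived
from timestamps alone; the sketch leaves it as None rather than guessing.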
As with any data mining project, the quality and usefulness of your Web mining results
depend on your insight. You will need to define your Web mining goals and apply your
ideas about how visitors use your site. For example, you need to determine reasonable
rules with which to define a visitor session.
How you can use Web mining results
Understand visitors' behavior on your Web site
Once you have pre-processed your Web data, you can begin analyzing it to detect patterns.
At this point, many organizations will evaluate Web site metrics using standard reporting
tools. Web mining lets you go beyond these measures to a deeper
understanding of Web site behavior, taking your organization beyond what
many others do and helping it learn how visitors interact with
your Web site.
A typical Web mining analysis clusters sessions to determine groups of usage patterns.
For example, you can group data by the length of visits to a page and by purchase activity so
that profiles of different visitor types emerge. Some visitors might browse and find what
they want; others might be repeat visitors who buy a lot, or people who use only a portion
of your Web site. Because you can use predictive modeling techniques to group your
visitors, you can deploy these techniques back to your Web site to score new visitors and
determine their likely membership in usage groups defined by your organization. This means
that you can provide appropriate content for these users and meet their specific needs.
The cluster analysis helps us focus on important questions: have we placed the content
in these sections appropriately for this type of user? Or can we do more to make this a
better online experience, and perhaps sell more in the online mall? How do these visitors
differ from other types of visitors?
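As a rough illustration of clustering sessions and then scoring new visitors, the sketch below
runs plain k-means on two hypothetical session features, minutes on site and number of
purchases. k-means is one common clustering technique, chosen here for simplicity; the data,
the choice of k=2 and the initialization (the first k points) are all assumptions.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as starting centroids (an assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each session joins its nearest centroid
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical sessions: (minutes on site, number of purchases)
sessions = [(2.0, 0), (3.0, 0), (25.0, 3), (30.0, 4), (2.2, 0), (28.0, 2)]
centroids, labels = kmeans(sessions, k=2)
print(labels)

# Score new visitors by assigning each one to the nearest existing cluster
print(nearest((1.5, 0), centroids), nearest((26.0, 1), centroids))
```

Scoring a new visitor is just the assignment step applied to one point. With real data you
would normally standardize the features first; otherwise the feature with the largest range
(here, minutes) dominates the distance.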
Encouraging stickiness
Stickiness is the extent to which visitors tend to stay on a particular Web site. Having a
sticky site means visitors find useful information and browse your site longer.
Performing Web analysis can help increase stickiness. One technique for improving
stickiness is clickstream analysis. Clickstreams are paths that visitors take through a
Web site. It's not enough to simply compare the recorded strings of pages that the site
serves up to your visitors. You need predictive modeling to get a complete picture and
understand what makes a site sticky. Web mining enables organizations to examine a set of
events, find strong patterns among a Web site's many clickstreams and predict the
last event in a clickstream, whether this event is a page request, purchase or
download.
In all of the clickstreams, users started at the home page, moved to the site map, tried
another section of the Web site before moving back to the site map and eventually left.
Visitors didn't stay on the site very long. In fact, this pattern suggests that neither
the home page nor the site map is intuitive enough for a significant number of people to
navigate.
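One simple way to predict the next event in a clickstream is a first-order Markov model:
count how often each event follows each other event, then predict the most frequent
follower. The sketch below uses that technique as a stand-in for the predictive modeling
described above, not as the article's own method; the clickstreams are invented, and
"exit" and "purchase" are treated as events alongside page requests.

```python
from collections import Counter, defaultdict

def train_transitions(clickstreams):
    """Count how often each event is followed by each other event (first-order Markov)."""
    counts = defaultdict(Counter)
    for stream in clickstreams:
        for current, following in zip(stream, stream[1:]):
            counts[current][following] += 1
    return counts

def predict_next(counts, current):
    """Most frequent follower of `current`, or None if nothing ever followed it."""
    followers = counts.get(current)  # .get does not create an entry in a defaultdict
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical clickstreams
clickstreams = [
    ["home", "sitemap", "products", "sitemap", "exit"],
    ["home", "sitemap", "support", "sitemap", "exit"],
    ["home", "products", "cart", "purchase"],
]
model = train_transitions(clickstreams)
print(predict_next(model, "sitemap"), predict_next(model, "cart"))
```

Applying predict_next repeatedly from a visitor's current page yields a predicted path; a
model that predicts "exit" right after the site map is the kind of signal that points to
navigation problems.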
Increase sales using personalization
Finding patterns in Web data can provide a lot of value for understanding your Web
business and making site changes to improve your online presence. But Web mining
holds even greater promise: custom content delivery powered by predictive
models that tailor Web pages and recommendations for each visitor in real time.
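A very simple form of personalized recommendation counts which pages appear together in the
same session and suggests unseen pages that co-occur most often with what the visitor has
already viewed. This item co-occurrence approach is a deliberately basic stand-in for the
predictive models mentioned above; the page names and sessions are hypothetical.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(sessions):
    """Count how often each ordered pair of distinct pages appears in the same session."""
    pair_counts = Counter()
    for pages in sessions:
        for a, b in combinations(sorted(set(pages)), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return pair_counts

def recommend(pair_counts, viewed, top_n=2):
    """Rank pages the visitor has not seen by co-occurrence with pages they have seen."""
    scores = Counter()
    for (a, b), count in pair_counts.items():
        if a in viewed and b not in viewed:
            scores[b] += count
    return [page for page, _ in scores.most_common(top_n)]

# Hypothetical sessions from an outdoor-gear e-tailer
sessions = [
    ["tents", "sleeping-bags", "stoves"],
    ["tents", "sleeping-bags"],
    ["tents", "lanterns"],
    ["stoves", "fuel"],
]
pairs = build_cooccurrence(sessions)
print(recommend(pairs, viewed={"tents"}, top_n=1))
```

In this sketch recommend scans every pair for clarity; a real-time deployment would index
the counts by page so that each request touches only the pairs involving pages the visitor
has seen.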
Use a variety of modeling techniques for optimum results
To ensure you provide custom content, you need a variety of powerful techniques that scale
up to the traffic on your Web site. Data mining lessons teach us that the accuracy of
different techniques (or combination of techniques) depends on the data. The same is true
for Web mining. Your data may require different techniques than your competitors'
data, and your data may change over time. You must equip yourself with a variety of
modeling approaches.
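The point that accuracy depends on the data can be checked directly: train several
techniques on the same data and compare them on held-out examples. The sketch below compares
a majority-class baseline with a nearest-centroid classifier on a tiny invented data set of
(minutes on site, pages viewed) features labeled by whether a purchase occurred; both models
and the data are illustrative assumptions.

```python
import math
from collections import Counter

def majority_model(train):
    """Baseline: always predict the most common label in the training data."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

def centroid_model(train):
    """Nearest-centroid classifier: predict the label whose mean feature vector is closest."""
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(d) / len(xs) for d in zip(*xs)] for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))

def accuracy(model, test):
    """Fraction of held-out examples the model labels correctly."""
    return sum(model(x) == y for x, y in test) / len(test)

# Hypothetical sessions: (minutes on site, pages viewed) -> purchased?
train = [((2, 3), False), ((3, 2), False), ((4, 5), False), ((20, 15), True), ((25, 18), True)]
test = [((2, 2), False), ((22, 16), True), ((3, 4), False), ((18, 12), True)]

for name, build in [("majority", majority_model), ("nearest centroid", centroid_model)]:
    print(name, accuracy(build(train), test))
```

On a different data set the ranking could change, which is why it pays to rerun the
comparison, ideally with cross-validation rather than a single split, as your data evolves.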