Asia-Pacific Centre for Research (ACRE), Inc.


Data Warehousing: Strategies, Technologies and Techniques: Statistical Analysis
Source: www.spss.com
Copyright SPSS, Inc. 2004


Data mining uncovers the hidden meaning and relationships in the massive amounts of data stored in the data warehouse. In short, the value of the data warehouse lies in the information that can be derived from its data through the mining process. Successful data mining relies on refining tools and techniques capable of rendering large quantities of data understandable and meaningful. Since its creation in the 18th century, statistics has served this purpose, providing the mathematical tools and analytic techniques for dealing with large amounts of data. Today, as we are confronted with increasingly large volumes of data, statistics is, more than ever, a critical component of the data mining and refining toolkit that supports effective business decisions.

What are statistics and why use them?

Way of thinking
Statistics is a general method of reasoning from data. It is a basic approach shared by people in today’s society to draw conclusions and make decisions in business and in life. It lets us communicate effectively about a wide range of topics, from sales performance to product quality to operational efficiency. Statistics is the way that we “reason effectively about data and chance in everyday life.” The goal of statistical analysis is to gain insight through numbers. We will consider four important aspects of statistics: producing good data, exploring data, drawing conclusions from data and presenting your results.

Producing data
You will have a wealth of data in the warehouse and available from outside sources. There are important concepts to consider in selecting the data you actually use in your analysis: sampling, experimentation and measurement. They are important because the efficiency and accuracy of your analysis – and therefore your ability to draw useful conclusions in a timely manner – depend on how well the data reflect the business situation.
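As a minimal sketch of the sampling concept, the following Python snippet draws a simple random sample from a set of records instead of analyzing all of them. The record layout and the `sample_records` helper are hypothetical and are not part of any SPSS product; a real warehouse would supply the records through a query.

```python
import random


def sample_records(records, k, seed=0):
    """Draw a simple random sample of k records without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(records, k)


# Hypothetical order records: (order_id, order_value)
orders = [(i, 100 + 5 * i) for i in range(1000)]

sample = sample_records(orders, 50)
print(f"analyzing {len(sample)} of {len(orders)} records")
```

Every record has the same chance of selection, which is what lets conclusions drawn from the sample be generalized to the full set of records.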

Exploring data
Exploring data is important both for understanding the quality of the data in the warehouse and for identifying areas to mine for information. It will show you whether many observations are missing and whether measurements are suspect because of extreme variability. In effect, exploratory data analysis gives you a “feel” for the data and helps uncover possible directions for the analysis. Just as a mining company explores the terrain for the site with the highest likelihood of success, the data miner needs to gain a sense of where the key relationships in the data are. Equally important, exploring data highlights problems inherent in the database, such as inaccurate or missing data.

The first step in data analysis is to explore the data for overall patterns and extreme exceptions to those patterns. This is best done by graphing the data and visually identifying the patterns and the number of exceptions. In exploring data we typically look at each variable separately, starting with basic counts and percentages, which tell us the number and proportion of observations at each level. Then we look at the distributions of the data using charts such as histograms, dot plots, boxplots and line charts. We also look at measures that describe characteristics of the data such as average, variability and distribution.
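The basic counts and percentages described above, together with a check for missing observations, can be sketched in a few lines of Python. The variable name and its values are invented for illustration, and `None` is assumed to mark a missing observation.

```python
from collections import Counter

# Hypothetical values of one variable; None marks a missing observation.
lead_sources = ["web", "referral", "web", None, "trade show",
                "web", "referral", None]

# Separate missing observations from observed ones.
missing = sum(1 for v in lead_sources if v is None)
observed = [v for v in lead_sources if v is not None]

# Count and proportion of observations at each level of the variable.
counts = Counter(observed)
percentages = {level: 100 * n / len(observed) for level, n in counts.items()}

print(f"missing: {missing} of {len(lead_sources)}")
for level, n in counts.most_common():
    print(f"{level:<12}{n:>3}{percentages[level]:>7.1f}%")
```

A table like this is usually the first thing to inspect for each variable, before moving on to charts of the distribution.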

Descriptive statistics include the following measures:
• Mean: the arithmetic average of the values
• Median: the middle value when the values are sorted
• Mode: the most frequent value
• Percentiles: points that divide the values by the percentage of values falling below them
• Variance: the average squared deviation of observations from the mean
• Standard deviation: the square root of the variance, a measure of the spread of values around the mean
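The measures listed above can be computed with Python's standard `statistics` module, as in the sketch below. The order values are invented for illustration. The population forms `pvariance` and `pstdev` are used here on the assumption that the list is the whole set of interest; for a sample drawn from a larger set, `variance` and `stdev` would be the usual choice.

```python
import statistics

# Hypothetical order values for illustration only.
order_values = [120, 80, 150, 80, 200, 95, 80, 310]

mean = statistics.mean(order_values)                 # arithmetic average
median = statistics.median(order_values)             # middle of the sorted values
mode = statistics.mode(order_values)                 # most frequent value
quartiles = statistics.quantiles(order_values, n=4)  # 25th, 50th, 75th percentile cut points
variance = statistics.pvariance(order_values)        # average squared deviation from the mean
std_dev = statistics.pstdev(order_values)            # square root of the variance

print(f"mean={mean} median={median} mode={mode}")
print(f"quartiles={quartiles}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
```

In this data the mean sits well above the median because one large order pulls the average up, which is the kind of pattern these measures are meant to expose.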

Drawing conclusions from data
Statistics is concerned with finding relationships between variables. Once you have mined to an area with an interesting relationship, statistics provides the additional tools to “refine” the data into an understanding of the strength of the relationship and the factors that cause it. For example, order value and sales lead source are each interesting characteristics to measure and summarize. But order value and sales lead source for the same order give us significantly more information than either measure alone. When the source and the value of orders are linked, we can look for associations between them. That might lead us to consider higher promotion spending on the sources that bring the most high-value orders, or on the sources that bring the highest total revenue even if it is booked as smaller transactions in higher volume.
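To make the order value and lead source example concrete, the sketch below links the two measures per order and summarizes order value by source. The records are invented for illustration, and a simple grouped summary like this is only a starting point, not a test of how strong the association is.

```python
from collections import defaultdict

# Hypothetical linked records: (lead_source, order_value)
orders = [
    ("web", 120.0), ("referral", 900.0), ("web", 80.0),
    ("trade show", 450.0), ("web", 150.0), ("referral", 1100.0),
    ("web", 95.0), ("web", 130.0),
]

# Group order values by the lead source of the same order.
values_by_source = defaultdict(list)
for source, value in orders:
    values_by_source[source].append(value)

# Summarize each source by order count, total revenue and average order value.
summary = {
    source: {
        "orders": len(values),
        "total": sum(values),
        "average": sum(values) / len(values),
    }
    for source, values in values_by_source.items()
}

# List sources from highest to lowest total revenue.
for source, row in sorted(summary.items(),
                          key=lambda kv: kv[1]["total"], reverse=True):
    print(f"{source:<12}{row['orders']:>3}{row['total']:>10.2f}{row['average']:>10.2f}")
```

In this invented data the web source brings the most orders while referrals bring the highest average order value and the highest total, so ranking sources by volume and ranking them by value would point to different promotion choices.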

Applications of statistics in data mining
Statistical analysis is the secret weapon of many successful businesses today. It is the essential tool for mining the data you have, refining that data into useful information and identifying other data you might want to acquire. Businesses that effectively employ statistical analysis can increase revenues, cut costs, improve operating efficiency and improve customer satisfaction. They can more accurately identify problems and opportunities and understand their causes, so that they can act more quickly to eliminate threats or capitalize on opportunities.


