Business Statistics:
Revealing Facts from Figures
URL for this site is:
http://ubmail.ubalt.edu/~harsham/Business-stat/opre504.htm
This Web site is a course in statistics appreciation; i.e., acquiring a feeling for the statistical way of thinking. It is an introductory course in statistics that is designed to provide you with the basic concepts and methods of statistical analysis for processes and products. Materials in this Web site are tailored to meet your needs in making business decisions and they foster statistical thinking. The cardinal objective for this Web site is to increase the extent to which statistical thinking is merged with managerial thinking for decision making under uncertainty.
MENU
- Introduction
- Towards Statistical Thinking
- Probability for Statistical Inference
- Topics in Business Statistics
- Interesting and Useful Sites
- Companion site I: Time Series Analysis and Forecasting Techniques
- Companion site II: Computers and Computational Statistics
- Companion site III: Questionnaire Design and Surveys Sampling
- Companion site IV: Probabilistic Modeling Process: Calculable Risky Decision-Making
- Companion site V: Excel For Introductory Statistical Analysis
- Companion site VI: Statistical Books List
To search the site, try Edit | Find in page [Ctrl + f]. Enter a word or phrase in the dialogue box, e.g., "parameter" or "probability". If the first appearance of the word/phrase is not what you are looking for, try Find Next.
Introduction
Towards Statistical Thinking For Decision Making Under Uncertainties
The Birth of Statistics
Different Schools of Thought in Statistics
Bayesian, Frequentist, and Classical Methods
What is Business Statistics
Belief, Opinion, and Fact
Kinds of Lies: Lies, Damned Lies and Statistics
Probability for Statistical Inference
Probability, Chance, Likelihood, and Odds
How to Assign Probabilities
General Laws of Probability
Mutually Exclusive versus Independent Events
Entropy Measure
Applications of and Conditions for Using Statistical Tables
Relationships among Distributions and Unification of Statistical Tables
- Normal Distribution
- Binomial Distribution
- Poisson Distribution
- Exponential Distribution
- Uniform Distribution
- Student's t-Distribution
- Chi-square Distribution
Topics in Business Statistics
Greek Letters Commonly Used in Statistics
Type of Data and Levels of Measurement
Sampling Methods
Histogramming: Checking the Homogeneity of Population
How to Construct a Box Plot
Outlier Removal
Statistical Summaries
- Representative of a Sample: Measures of Central Tendency
- Selecting Among the Mean, Median, and Mode
- Quality of a Sample: Measures of Dispersion
- Guess a Distribution to Fit Your Data: Skewness & Kurtosis
- Computation of Descriptive Statistics for Grouped/Ungrouped Data
- A Numerical Example & Discussions
- Multinomial Distributions: Expected Value, Variance, Standard Deviation, & Coefficient of Variation
What Is So Important About the Normal Distributions
What Is a Sampling Distribution
What Is Central Limit Theorem
What Is "Degrees of Freedom"
Parameters' Estimation and Quality of a 'Good' Estimate
Procedures for Statistical Decision Making
Statistics with Confidence and Determining Sample Size
Hypothesis Testing: Rejecting a Claim
The Classical Approach to the Test of Hypotheses
The Meaning and Interpretation of P-values (what the data say)
Blending the Classical and the P-value Based Approaches in Test of Hypotheses
Conditions Under Which Most Statistical Testing Apply
Statistical Tests for Equality of Populations Characteristics
- Homogeneous Population (Don't mix apples and oranges)
- Test for Randomness: The Runs Test
- Test for Normality
Power of a Test
- Two-Population Independent Means (T-test)
- Two Dependent Means (T-test for paired data sets)
- More Than Two Independent Means (ANOVA)
- More Than Two Dependent Means (ANOVA)
Parametric vs. Non-Parametric vs. Distribution-free Tests
Chi-square Tests
Bonferroni Method
Goodness-of-fit Test for Discrete Random Variables
When We Should Pool Variance Estimates
Resampling Techniques: Jackknifing, and Bootstrapping
What is a Linear Least Squares Model
Pearson's and Spearman's Correlation
How to Compare Two Correlation Coefficients
Covariance and Correlation
Independence vs. Correlated
Correlation, and Level of Significance
Regression Analysis: Planning, Development, and Maintenance
Predicting Market Response
Warranties: Statistical Planning and Analysis
Factor Analysis
Interesting and Useful Sites (topical category)
Review of Statistical Tools on the Internet
General References
Statistical Societies & Organizations
Statistics References
Statistics Resources
Statistical Data Analysis
Probability Resources
Data and Data Analysis
Computational Probability and Statistics Resources
Questionnaire Design, Surveys Sampling and Analysis
Statistical Software
Learning Statistics
Econometric and Forecasting
Selected Topics
Glossary Collections Sites
Statistical Tables
Introduction
Today's business decisions are driven by data. In all aspects of our lives, and importantly in the business context, an amazing diversity of data is available for inspection and enlightenment. Moreover, business managers and professionals are increasingly encouraged to justify decisions on the basis of data.
Business managers need statistical model-based decision support systems. Statistical skills enable you to intelligently collect, analyze, and interpret data relevant to your decision-making. Statistical concepts and statistical thinking enable you:
- to solve problems in a diversity of contexts
- to add substance to decisions.
This Web site is a course in statistics appreciation, i.e., acquiring a feel for the statistical way of thinking. Appreciation of statistics is wonderful: it makes what is excellent in statistical thinking belong to you as well. It is an introductory course in statistics that is designed to provide you with the basic concepts and methods of statistical analysis for processes and products. Materials in this Web site are tailored to meet your needs in making business decisions, and they encourage you to think statistically. The cardinal objective for this Web site is to increase the extent to which statistical thinking is embedded in managerial thinking for decision making under uncertainties. It has been said that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write." So, let's be ahead of our time.
To stay competitive, business managers must design quality into products and into the processes of making the products. They must facilitate a process of never-ending improvement at all stages of manufacturing and service. This strategy, which employs statistical methods, particularly statistically designed experiments, produces processes that provide high yield and products that seldom fail. Moreover, it facilitates development of robust products that are insensitive to changes in the environment and to internal component variation. Carefully planned statistical studies remove hindrances to high quality and productivity at every stage of production. This saves time and money. It is well recognized that quality must be engineered into products as early as possible in the design process. One must know how to use carefully planned, cost-effective experiments to improve, optimize, and make robust products and processes.
Business Statistics is a science that assists you in making business decisions under uncertainties, based on numerical and measurable scales. The decision-making process must be based on data, not on personal opinion or belief.
The Devil is in the Deviations: Variation is an inevitability in life! Every process, every measurement, every sample has variation. Managers need to understand variation for two key reasons: first, so that they can lead others to apply statistical thinking in day-to-day activities, and second, so that they can apply the concept for the purpose of continuous improvement. This course will provide you with hands-on experience to promote the use of statistical thinking and techniques, so that you can make educated decisions whenever you encounter variation in business data. You will learn techniques to intelligently assess and manage the risks inherent in decision-making. Therefore, remember that:
Just like the weather, if you cannot control something, you should learn how to measure and analyze it in order to predict it effectively.
If you have taken statistics before and felt unable to grasp the concepts, it is largely due to having been taught statistics by non-statistician instructors. Their deficiencies lead students to develop phobias for the sweet science of statistics. In this respect, Professor Herman Chernoff made the following remark in Statistical Science, Vol. 11, No. 4, 335-350, 1996:
Plugging numbers into the formulas and crunching them has no value by itself. You should continue to put effort into the concepts and concentrate on interpreting the results.
Even when you solve a small problem by hand, I would like you to use the available computer software and Web-based computation to do the dirty work for you.
You must be able to read off the logical secrets in any formula rather than memorizing it. For example, in computing the variance, consider its formula. Instead of memorizing it, you should start with some whys:
i. Why do we square the deviations from the mean?
Because if we add up all the deviations, we always get zero. To deal with this problem, we square the deviations. Why not raise them to the power of four (three will not work, since cubing keeps the signs and the deviations can still cancel)? Squaring does the trick, so why should we make life more complicated than it is? Notice also that squaring magnifies the large deviations; therefore it works to our advantage in measuring the quality of the data.
ii. Why is there a summation notation in the formula?
To add up the squared deviations of all the data points in order to compute the total sum of squared deviations.
iii. Why do we divide the sum of squares by n-1?
The amount of deviation should also reflect how large the sample is, so we must bring in the sample size. That is, in general, larger samples have larger sums of squared deviations from the mean. Why n-1 and not n? The reason is that when you divide by n-1, the sample's variance provides an estimate that is, on average, much closer to the population variance than when you divide by n. Note that for a large sample size n (say, over 30), it really does not matter whether you divide by n or n-1. The results are almost the same, and they are acceptable. The factor n-1 is what we consider the "degrees of freedom".
This example shows how to question statistical formulas rather than memorizing them. In fact, when you try to understand the formulas you do not need to remember them; they become part of your brain connectivity. Clear thinking is always more important than the ability to do a lot of arithmetic.
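The three "whys" can be checked numerically. The following is a minimal Python sketch, not part of the original course material: the five data values, the simulated normal population (standard deviation 2, so a true variance of 4), and the number of repetitions are all assumptions chosen for illustration.

```python
import random
import statistics

# Hypothetical data set, chosen so that the arithmetic is easy to follow.
data = [4.0, 7.0, 9.0, 10.0, 15.0]
n = len(data)
mean = sum(data) / n                        # 9.0

deviations = [x - mean for x in data]       # -5, -2, 0, 1, 6
sum_dev = sum(deviations)                   # zero (up to rounding), whatever the data
sum_sq = sum(d ** 2 for d in deviations)    # 25 + 4 + 0 + 1 + 36 = 66

var_n = sum_sq / n                          # dividing by n
var_n1 = sum_sq / (n - 1)                   # dividing by n-1 (the sample variance)

print("sum of deviations:", sum_dev)
print("sum of squared deviations:", sum_sq)
print("divide by n:", var_n, "| divide by n-1:", var_n1)

# Simulation (an illustrative assumption): many small samples drawn from a
# normal population whose true variance is 4.
rng = random.Random(0)
true_var = 4.0
size, reps = 5, 5000
total_n = total_n1 = 0.0
for _ in range(reps):
    sample = [rng.gauss(0.0, 2.0) for _ in range(size)]
    total_n += statistics.pvariance(sample)   # divides by n
    total_n1 += statistics.variance(sample)   # divides by n-1
avg_n = total_n / reps
avg_n1 = total_n1 / reps

print("average estimate when dividing by n:  ", round(avg_n, 3))
print("average estimate when dividing by n-1:", round(avg_n1, 3))
```

On average, the n-1 version should land close to the true variance, while the n version falls short by roughly the factor (n-1)/n. With only five observations per sample the gap is easy to see; with samples of 30 or more it nearly disappears.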
When you look at a statistical formula, the formula should talk to you, just as a musician hears the music when looking at a piece of sheet music. How to become a statistician who is also a musician?
A course in appreciation of statistical thinking gives business professionals an edge. Professionals with strong quantitative skills are in demand. This phenomenon will grow as the impetus for data-based decisions strengthens and the amount and availability of data increases. The statistical toolkit can be developed and enhanced at all stages of a career.
The decision-making process under uncertainty is largely based on the application of statistics for probability assessment of uncontrollable events (or factors), as well as risk assessment of your decision. The original idea of statistics was the collection of information about and for the State. Probability has a much longer history. The word probability is derived from the verb to probe, meaning to "find out" what is not too easily accessible or understandable. The word "proof" has the same origin; a proof provides the necessary details to understand what is claimed to be true.
The main objective for this course is to learn statistical thinking; to emphasize concepts more, with less theory and fewer recipes; and finally to foster active learning using, e.g., the useful and interesting Web sites.
Further Readings:
Churchman C., The Design of Inquiring Systems, Basic Books, New York, 1971. Early in the book he stated that knowledge can be considered as a collection of information, or as an activity, or as a potential. He also noted that knowledge resides in the user and not in the collection.
Some Topics in Business Statistics
Greek Letters Commonly Used as Statistical Notations
We use Greek letters in statistics and other scientific areas to honor the ancient Greek philosophers who invented science (such as Socrates, the inventor of dialectic reasoning).
| alpha | beta | chi-square (ki-sqre) | delta | mu | nu | pi | rho | sigma | tau | theta |
|---|---|---|---|---|---|---|---|---|---|---|
| α | β | χ² | δ | μ | ν | π | ρ | σ | τ | θ |
Note: Chi-square (read "ki-square"), χ², is not the square of anything; its name is simply chi-square. "Ki" does not exist in statistics. I'm glad that you're overcoming all the confusions that exist in learning statistics.
The Birth of Statistics
The original idea of "statistics" was the collection of information about and for the "State".
The birth of statistics occurred in the mid-17th century. A commoner named John Graunt, who was a native of London, began reviewing a weekly church publication issued by the local parish clerk that listed the number of births, christenings, and deaths in each parish. These so-called Bills of Mortality also listed the causes of death. Graunt, who was a shopkeeper, organized these data in the forms we call descriptive statistics, which were published as Natural and Political Observations Made upon the Bills of Mortality. Shortly thereafter, he was elected as a member of the Royal Society. Thus, statistics has had to borrow some concepts from sociology, such as the concept of "Population". It has been argued that since statistics usually involves the study of human behavior, it cannot claim the precision of the physical sciences.
Probability has a much longer history. It originated from the study of games of chance and gambling during the sixteenth century. Probability theory was a branch of mathematics studied by Blaise Pascal and Pierre de Fermat in the seventeenth century. Currently, in the 21st century, probabilistic modeling is used to control the flow of traffic through a highway system, a telephone interchange, or a computer processor; to find the genetic makeup of individuals or populations; and in quality control, insurance, investment, and other sectors of business and industry.
New and ever growing diverse fields of human activities are using statistics; however, it seems that this field itself remains obscure to the public. Professor Bradley Efron expressed this fact nicely:
For the history of probability, and history of statistics, visit History of Statistics Material. I also recommend the following books.
Further Readings:
Daston L., Classical Probability in the Enlightenment, Princeton University Press, 1988. The book points out that early Enlightenment thinkers could not face uncertainty. A mechanistic, deterministic machine was the Enlightenment view of the world.
Gillies D., Philosophical Theories of Probability, Routledge, 2000. Covers the classical, logical, subjective, frequency, and propensity views.
Hacking I., The Emergence of Probability, Cambridge University Press, London, 1975. A philosophical study of early ideas about probability, induction and statistical inference.
Peters W., Counting for Something: Statistical Principles and Personalities, Springer, New York, 1987. It teaches the principles of applied economic and social statistics in a historical context. Featured topics include public opinion polls, industrial quality control, factor analysis, Bayesian methods, program evaluation, non-parametric and robust methods, and exploratory data analysis.
Porter T., The Rise of Statistical Thinking, 1820-1900, Princeton University Press, 1986. The author states that statistics has become known in the twentieth century as the mathematical tool for analyzing experimental and observational data. Enshrined by public policy as the only reliable basis for judgments as to the efficacy of medical procedures or the safety of chemicals, and adopted by business for such uses as industrial quality control, it is evidently among the products of science whose influence on public and private life has been most pervasive. Statistical analysis has also come to be seen in many scientific disciplines as indispensable for drawing reliable conclusions from empirical results. This new field of mathematics found so extensive a domain of applications.
Stigler S., The History of Statistics: The Measurement of Uncertainty Before 1900, U. of Chicago Press, 1990. It covers the people, ideas, and events underlying the birth and development of early statistics.
Tankard J., The Statistical Pioneers, Schenkman Books, New York, 1984. This work provides the detailed lives and times of theorists whose work continues to shape much of modern statistics.
Different Schools of Thought in Statistics
There are a few different schools of thought in statistics. They were introduced sequentially in time, by necessity.
The Birth Process of a New School of Thought
The process of devising a new school of thought in any field has always taken a natural path. Birth of new schools of thought in statistics is not an exception. The birth process is outlined below:
Given an already established school, one must work within the defined framework.
A crisis appears, i.e., some inconsistencies in the framework result from its own laws.
Response behavior:
- Reluctance to consider the crisis.
- Try to accommodate and explain the crisis within the existing framework.
- Conversion of some well-known scientists attracts followers in the new school.
The perception of a crisis in the statistical community calls forth demands for "foundation-strengthening". After the crisis is over, things may look different, and historians of statistics may cast the event as one in a series of steps in "building upon a foundation". So we can read histories of statistics as the story of a pyramid built up layer by layer on a firm base over time.
Other schools of thought are emerging to extend and "soften" the existing theory of probability and statistics. Some "softening" approaches utilize the concepts and techniques developed in the fuzzy set theory, the theory of possibility, and Dempster-Shafer theory.
The following Figure illustrates the three major schools of thought; namely, the Classical (attributed to Laplace), Relative Frequency (attributed to Fisher), and Bayesian (attributed to Savage). The arrows in this figure represent some of the main criticisms among Objective, Frequentist, and Subjective schools of thought. To which school do you belong? Read the conclusion in this figure.
Bayesian, Frequentist, and Classical Methods
The problem with the Classical Approach is that what constitutes an outcome is not objectively determined. One person's simple event is another person's compound event. One researcher may ask, of a newly discovered planet, "what is the probability that life exists on the new planet?" while another may ask "what is the probability that carbon-based life exists on it?"
Bruno de Finetti, in the introduction to his two-volume treatise on Bayesian ideas, clearly states that "Probability does not exist". By this he means that probabilities are not located in coins or dice; they are not characteristics of things like mass, density, etc.
Some Bayesian approaches consider probability theory as an extension of deductive logic to handle uncertainty. It purports to deduce from first principles the uniquely correct way of representing your beliefs about the state of things, and updating them in the light of the evidence. The laws of probability have the same status as the laws of logic. These Bayesian approaches are explicitly "subjective" in the sense that they deal with the plausibility which a rational agent ought to attach to the propositions he/she considers, "given his/her current state of knowledge and experience." By contrast, at least some non-Bayesian approaches consider probabilities as "objective" attributes of things (or situations) which are really out there (availability of data).
A Bayesian and a classical statistician analyzing the same data will generally reach the same conclusion. However, the Bayesian is better able to quantify the true uncertainty in his analysis, particularly when substantial prior information is available. Bayesians are willing to assign probability distribution function(s) to the population's parameter(s) while frequentists are not.
From a scientist's perspective, there are good grounds to reject Bayesian reasoning. The problem is that Bayesian reasoning deals not with objective, but with subjective probabilities. The result is that any reasoning using a Bayesian approach cannot be publicly checked, something that makes it, in effect, worthless to science, like non-replicable experiments.
Bayesian perspectives often shed a helpful light on classical procedures. It is necessary to go into a Bayesian framework to give confidence intervals the probabilistic interpretation which practitioners often want to place on them. This insight is helpful in drawing attention to the point that another prior distribution would lead to a different interval.
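The dependence of the interval on the prior can be sketched numerically. The Python example below is an illustration under stated assumptions rather than a method taken from this site: it assumes binomial data (7 successes and 3 failures, hypothetical numbers), two hypothetical Beta priors, the standard conjugate Beta update, and intervals approximated by simulation with the standard library's `random.betavariate`.

```python
import random

def update_beta(prior_a, prior_b, successes, failures):
    """Posterior Beta parameters after observing binomial data (conjugate update)."""
    return prior_a + successes, prior_b + failures

def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

def credible_interval(a, b, level=0.95, draws=20000, seed=0):
    """Approximate equal-tailed interval from simulated Beta(a, b) draws."""
    rng = random.Random(seed)
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper

# Hypothetical data: 7 successes and 3 failures.
successes, failures = 7, 3

# Two hypothetical priors: a flat Beta(1, 1) and a prior concentrated near 0.5.
flat_post = update_beta(1, 1, successes, failures)            # Beta(8, 4)
skeptical_post = update_beta(10, 10, successes, failures)     # Beta(17, 13)

flat_interval = credible_interval(*flat_post)
skeptical_interval = credible_interval(*skeptical_post)

print("flat prior:      mean", round(beta_mean(*flat_post), 3), "interval", flat_interval)
print("skeptical prior: mean", round(beta_mean(*skeptical_post), 3), "interval", skeptical_interval)
```

The same data give two different posterior means and two different intervals, which is the point made above: the probabilistic reading of an interval is always conditional on the prior that was chosen.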
A Bayesian may cheat by basing the prior distribution on the data; a Frequentist can base the hypothesis to be tested on the data. For example, the role of a protocol in clinical trials is to prevent this from happening by requiring the hypothesis to be specified before the data are collected. In the same way, a Bayesian could be obliged to specify the prior in a public protocol before beginning a study. In a collective scientific study, this would be somewhat more complex than for Frequentist hypotheses because priors must be personal for coherence to hold.
A suitable quantity that has been proposed to measure inferential uncertainty, i.e., to handle the a priori unexpected, is the likelihood function itself.
If you perform a series of identical random experiments (e.g., coin tosses), the underlying probability distribution that maximizes the probability of the outcome you observed is the distribution whose probabilities equal the observed relative frequencies of the outcomes.
This has the direct interpretation of telling how (relatively) well each possible explanation (model), whether obtained from the data or not, predicts the observed data. If the data happen to be extreme ("atypical") in some way, so that the likelihood points to a poor set of models, this will soon be picked up in the next rounds of scientific investigation by the scientific community. No long run frequency guarantee nor personal opinions are required.
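The coin-toss statement can be verified with a short Python sketch. The observed counts (7 heads in 10 tosses) and the grid of candidate probabilities are assumptions made for illustration; the likelihood is the ordinary binomial probability of the observed outcome.

```python
from math import comb

def binomial_likelihood(p, heads, tosses):
    """Probability of observing `heads` heads in `tosses` tosses when P(head) = p."""
    return comb(tosses, heads) * p ** heads * (1 - p) ** (tosses - heads)

# Hypothetical observation: 7 heads in 10 tosses.
heads, tosses = 7, 10

# Candidate explanations (models): P(head) = 0.00, 0.01, ..., 1.00.
grid = [i / 100 for i in range(101)]
likelihoods = [binomial_likelihood(p, heads, tosses) for p in grid]

# The candidate with the largest likelihood.
best_p = grid[likelihoods.index(max(likelihoods))]
print("most likely P(head):", best_p, "| observed frequency:", heads / tosses)

# Relative likelihood: how well a fair coin predicts the data compared
# with the best-fitting candidate.
relative = binomial_likelihood(0.5, heads, tosses) / max(likelihoods)
print("likelihood of a fair coin relative to the best model:", round(relative, 3))
```

The maximizing value coincides with the observed relative frequency of heads, and the relative likelihood shows how (relatively) well any other candidate, here a fair coin, predicts the same data.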
There is a sense in which the Bayesian approach is oriented toward making decisions and the frequentist hypothesis testing approach is oriented toward science. For example, there may not be enough evidence to show scientifically that agent X is harmful to human beings, but one may be justified in deciding to avoid it in one's diet.
Since the probability (or the distribution of possible probabilities) is continuous, the probability that the probability is any specific point estimate is really zero. This means that in a vacuum of information, we can make no guess about the probability. Even if we have information, we can really only guess at a range for the probability.
For more information, visit the Web sites Bayesian Inference for the Physical Sciences, Bayesians vs. Non-Bayesians, Society for Bayesian Analysis, Probability Theory As Extended Logic, and Bayesians worldwide.
Further Readings:
Corfield D., and J. Williamson, Foundations of Bayesianism, Kluwer Academic Publishers, 2001. Contains Logic, Mathematics, Decision Theory, and Criticisms of Bayesianism.
Lad F., Operational Subjective Statistical Methods, Wiley, 1996. Presents a systematic treatment of subjectivist methods along with a good discussion of the historical and philosophical backgrounds of the major approaches to probability and statistics.
Plato, Jan von, Creating Modern Probability, Cambridge University Press, 1994. This book provides a historical point of view on the subjectivist and objectivist schools of thought in probability.
Weatherson B., Begging the question and Bayesians, Studies in History and Philosophy of Science, 30(4), 687-697, 1999.
Zimmerman H., Fuzzy Set Theory, Kluwer Academic Publishers, 1991. Fuzzy logic approaches to probability (based on L.A. Zadeh and his followers) present a difference between "possibility theory" and probability theory.
What is Business Statistics?
In this diverse world of ours, no two things are exactly the same. A statistician is interested in both the differences and the similarities, i.e., both patterns and departures.
The actuarial tables published by insurance companies reflect their statistical analysis of the average life expectancy of men and women at any given age. From these numbers, the insurance companies then calculate the appropriate premiums for a particular individual to purchase a given amount of insurance.
Exploratory analysis of data makes use of numerical and graphical techniques to study patterns and departures from patterns. The widely used descriptive statistical techniques are: Frequency Distribution Histograms; Box & Whisker and Spread plots; Normal plots; Cochrane (odds ratio) plots; Scattergrams and Error Bar plots; Ladder, Agreement and Survival plots; Residual, ROC and diagnostic plots; and Population pyramid. Graphical modeling is a collection of powerful and practical techniques for simplifying and describing inter-relationships between many variables, based on the remarkable correspondence between the statistical concept of conditional independence and the graph-theoretic concept of separation.
The controversial "Million Man March on Washington" in 1995 demonstrated that the size of a rally can have important political consequences. March organizers steadfastly maintained that the official attendance estimate offered by the U.S. Park Service (400,000) was too low.
In examining distributions of data, you should be able to detect important characteristics, such as shape, location, variability, and unusual values. From careful observations of patterns in data, you can generate conjectures about relationships among variables. The notion of how one variable may be associated with another permeates almost all of statistics, from simple comparisons of proportions through linear regression. The difference between association and causation must accompany this conceptual development.
Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. The plan must identify important variables related to the conjecture, and specify how they are to be measured. From the data collection plan, a statistical model can be formulated from which inferences can be drawn.
Statistical models are currently used in various fields of business and science. However, the terminology differs from field to field. For example, the fitting of models to data, variously called calibration, history matching, and data assimilation, is synonymous with parameter estimation.
Data is known to be crude information and not knowledge by itself. The sequence from data to knowledge is: from Data to Information, from Information to Facts, and finally, from Facts to Knowledge. Data becomes information, when it becomes relevant to your decision problem. Information becomes fact, when the data can support it. Fact becomes knowledge, when it is used in the successful completion of decision process. Once you have a massive amount of facts integrated as knowledge, then your mind will be superhuman in the same sense that mankind with writing is superhuman compared to mankind before writing. The following figure illustrates the statistical thinking process based on data in constructing statistical models for decision making under uncertainties.
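[Figure: the statistical thinking process, from data to statistical models for decision making under uncertainties]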
That's why we need Business Statistics. Statistics arose from the need to place knowledge on a systematic evidence base. This required a study of the laws of probability, the development of measures of data properties and relationships, and so on.
The main objective of Business Statistics is to make inference (prediction, making decisions) about certain characteristics of a population based on information contained in a random sample from the entire population, as depicted below:
Business Statistics is the science of 'good' decision making in the face of uncertainty and is used in many disciplines such as financial analysis, econometrics, auditing, production and operations including services improvement, and marketing research. It provides knowledge and skills to interpret and use statistical techniques in a variety of business applications. A typical Business Statistics course is intended for business majors, and covers statistical study, descriptive statistics (collection, description, analysis, and summary of data), probability, the binomial and normal distributions, tests of hypotheses and confidence intervals, linear regression, and correlation.
[Figure: making inferences about a population from a random sample]
The following discussion refers to the above chart. Statistics is a science of making decisions with respect to the characteristics of a group of persons or objects on the basis of numerical information obtained from a randomly selected sample of the group. Statisticians refer to this numerical observation as a realization of a random sample. However, notice that one cannot see a random sample. A random sample is only a finite set of outcomes of a random process.
At the planning stage of a statistical investigation the question of sample size (n) is critical. This course provides a practical introduction to sample size determination in the context of some commonly used significance tests.
Population: A population is any entire collection of people, animals, plants or things from which we may collect data. It is the entire group we are interested in, which we wish to describe or draw conclusions about. In the above figure, the lifetimes of the light bulbs manufactured by, say, GE constitute the population of concern.
Statistical Experiment
In order to make any generalization about a population, a random sample from the entire population, that is meant to be representative of the population, is often studied. For each population there are many possible samples. A sample statistic gives information about a corresponding population parameter. For example, the sample mean for a set of data would give information about the overall population mean μ.
It is important that the investigator carefully and completely defines the population before collecting the sample, including a description of the members to be included.
Example: The population for a study of infant health might be all children born in the U.S.A. in the 1980's. The sample might be all babies born on 7th May in any of the years.
An experiment is any process or study which results in the collection of data, the outcome of which is unknown. In statistics, the term is usually restricted to situations in which the researcher has control over some of the conditions under which the experiment takes place.
Example: Before introducing a new drug treatment to reduce high blood pressure, the manufacturer carries out an experiment to compare the effectiveness of the new drug with that of one currently prescribed. Newly diagnosed subjects are recruited from a group of local general practices. Half of them are chosen at random to receive the new drug, the remainder receive the present one. So, the researcher has control over the type of subject recruited and the way in which they are allocated to treatment.
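The random allocation in this example can be sketched in a few lines of Python. The subject labels, the group size of 20, and the fixed seed are hypothetical choices for illustration; the only idea taken from the example is that half of the recruited subjects are chosen at random for the new drug and the remainder receive the present one.

```python
import random

def randomize(subjects, seed=None):
    """Split subjects at random into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy, so the recruitment list is untouched
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical recruits from the local general practices.
subjects = [f"subject_{i:02d}" for i in range(1, 21)]

new_drug, current_drug = randomize(subjects, seed=42)
print("new drug:    ", sorted(new_drug))
print("current drug:", sorted(current_drug))
```

Because chance alone decides who receives which treatment, the researcher's preferences cannot influence the comparison. Fixing the seed is only a convenience that makes the allocation reproducible for an audit.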
Experimental (or Sampling) Unit: A unit is a person, animal, plant or thing which is actually studied by a researcher; the basic objects upon which the study or experiment is carried out. For example, a person; a monkey; a sample of soil; a pot of seedlings; a zip code area; a doctor's practice.
Design of experiments is a key tool for increasing the rate of acquiring new knowledge --- knowledge that in turn can be used to gain competitive advantage, shorten the product development cycle, and produce new products and processes which will meet and exceed your customer's expectations.
The major task of statistics is to study the characteristics of populations whether these populations are people, objects, or collections of information. For two major reasons, it is often impossible to study an entire population:
- The process would be too expensive or time consuming.
- The process would be destructive.
In either case, we would resort to looking at a sample chosen from the population and trying to infer information about the entire population by only examining the smaller sample. Very often the numbers which interest us most about the population are the mean µ and standard deviation σ. Any number -- like the mean or standard deviation -- which is calculated from an entire population is called a Parameter. If the very same numbers are derived only from the data of a sample, then the resulting numbers are called Statistics. Frequently, parameters are represented by Greek letters and statistics by Latin letters (as shown in the above Figure). The step function in this figure is the Empirical Distribution Function (EDF), also known as the Ogive, which is used to graph cumulative frequency. An EDF is constructed by placing a point corresponding to the middle point of each class at a height equal to the cumulative frequency of the class. The EDF represents the distribution function Fx.
Parameter
A parameter is an unknown value, and therefore it has to be estimated. Parameters are used to represent a certain population characteristic. For example, the population mean µ is a parameter that is often used to indicate the average value of a quantity. Within a population, a parameter is a fixed value that does not vary. Each sample drawn from the population has its own value of any statistic that is used to estimate this parameter. For example, the mean of the data in a sample is used to give information about the overall mean µ in the population from which that sample was drawn.
Statistic: A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn.
It is possible to draw more than one sample from the same population and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.
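As a small illustration of a fixed parameter versus a statistic that varies from sample to sample, the sketch below enumerates every possible sample of size 2 from a tiny population. The five expenditure values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of five customer expenditures (in dollars).
population = [15, 20, 25, 30, 35]
mu = mean(population)  # the parameter: one fixed value for this population

# Every possible sample of size 2, and the sample mean (a statistic) of each.
sample_means = [mean(sample) for sample in combinations(population, 2)]

print("population mean:", mu)
print("sample means:", sample_means)
print("average of all sample means:", mean(sample_means))
```

The ten sample means differ from one another, yet in this example their average equals the population mean.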
Statistics are often assigned Roman letters (e.g., x̄ and s), whereas the equivalent unknown values in the population (parameters) are assigned Greek letters (e.g., µ, σ).
The word estimate means to esteem, that is giving a value to something. A statistical estimate is an indication of the value of an unknown quantity based on observed data.
More formally, an estimate is the particular value of an estimator that is obtained from a particular sample of data and used to indicate the value of a parameter.
Example: Suppose the manager of a shop wanted to know µ, the mean expenditure of customers in her shop in the last year. She could calculate the average expenditure of the hundreds (or perhaps thousands) of customers who bought goods in her shop, that is, the population mean µ. Instead she could use an estimate of this population mean µ by calculating the mean of a representative sample of customers. If this value were found to be $25, then $25 would be her estimate.
There are two broad subdivisions of statistics: Descriptive statistics and Inferential statistics.
The principal descriptive quantity derived from sample data is the mean (x̄), which is the arithmetic average of the sample data. It serves as the most reliable single measure of the value of a typical member of the sample. If the sample contains a few values that are so large or so small that they have an exaggerated effect on the value of the mean, the sample is more accurately represented by the median -- the value where half the sample values fall below and half above.
The quantities most commonly used to measure the dispersion of the values about their mean are the variance s² and its square root, the standard deviation s. The variance is calculated by determining the mean, subtracting it from each of the sample values (yielding the deviations of the sample values), and then averaging the squares of these deviations. The mean and standard deviation of the sample are used as estimates of the corresponding characteristics of the entire group from which the sample was drawn. They do not, in general, completely describe the distribution (Fx) of values within either the sample or the parent group; indeed, different distributions may have the same mean and standard deviation. They do, however, provide a complete description of the Normal Distribution, in which positive and negative deviations from the mean are equally common and small deviations are much more common than large ones. For a normally distributed set of values, a graph showing the dependence of the frequency of the deviations upon their magnitudes is a bell-shaped curve. About 68 percent of the values will differ from the mean by less than the standard deviation, and almost 100 percent (about 99.7 percent) will differ by less than three times the standard deviation.
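The descriptive quantities above can be sketched in a few lines. The sample values below are hypothetical, with one deliberately extreme value to show how the mean and the median part ways. The variance follows the description given here (the average of the squared deviations, dividing by n); the (n-1) divisor is discussed later under degrees of freedom.

```python
from statistics import mean, median

# Hypothetical sample containing one unusually large value.
sample = [2, 4, 4, 5, 7, 9, 60]

x_bar = mean(sample)
middle = median(sample)

# Deviation of each value from the mean, then the average squared deviation.
deviations = [x - x_bar for x in sample]
variance = sum(d ** 2 for d in deviations) / len(sample)
std_dev = variance ** 0.5

print("mean:", x_bar)
print("median:", middle)
print("variance:", variance)
print("standard deviation:", std_dev)
```

Here the single extreme value pulls the mean well above most of the data, while the median stays near the typical values.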
Statistical inference refers to extending the knowledge obtained from a random sample to the whole population from which the sample was drawn. This is known in mathematics as Inductive Reasoning, that is, knowledge of the whole from a particular. Its main application is in testing hypotheses about a given population.
Inferential statistics is concerned with making inferences from samples about the populations from which they have been drawn. In other words, if we find a difference between two samples, we would like to know, is this a "real" difference (i.e., is it present in the population) or just a "chance" difference (i.e. it could just be the result of random sampling error). That's what tests of statistical significance are all about.
Statistical inference guides the selection of appropriate statistical models. Models and data interact in statistical work. Models are used to draw conclusions from data, while the data are allowed to criticize, and even falsify the model through inferential and diagnostic methods. Inference from data can be thought of as the process of selecting a reasonable model, including a statement in probability language of how confident one can be about the selection.
Inferences made in statistics are of two types. The first is estimation, which involves the determination, with a possible error due to sampling, of the unknown value of a population characteristic, such as the proportion having a specific attribute or the average value µ of some numerical measurement. To express the accuracy of the estimates of population characteristics, one must also compute the "standard errors" of the estimates; these are margins that determine the possible errors arising from the fact that the estimates are based on random samples from the entire population and not on a complete population census. The second type of inference is hypothesis testing. It involves the definitions of a "hypothesis" as one set of possible population values and an "alternative," a different set. There are many statistical procedures for determining, on the basis of a sample, whether the true population characteristic belongs to the set of values in the hypothesis or the alternative.
The statistical inference is grounded in probability, idealized concepts of the group under study, called the population, and the sample. The statistician may view the population as a set of balls from which the sample is selected at random, that is, in such a way that each ball has the same chance as every other one for inclusion in the sample.
Notice that to be able to estimate the population parameters, the sample size n must be greater than one. For example, with a sample size of one, the variation (s²) within the sample is 0/1 = 0. An estimate for the variation (σ²) within the population would be 0/0, which is an indeterminate quantity, meaning the estimate is impossible. For working with zero correctly, visit the Web site The Zero Saga & Confusions With Numbers.
Probability (means, probing for unknowns) is the tool used for anticipating what the distribution of data should look like under a given model. Random phenomena are not haphazard: they display an order that emerges only in the long run and is described by a distribution. The mathematical description of variation is central to statistics. The probability required for statistical inference is not primarily axiomatic or combinatorial, but is oriented toward describing data distributions.
Statistics is a tool that enables us to impose order on the disorganized cacophony of the real world of modern society. The business world has grown both in size and competition. Corporations must undertake risky ventures, hence the growth in popularity of, and need for, business statistics.
Business statistics has grown out of the art of constructing charts and tables! It is a science of basing decisions on numerical data in the face of uncertainty.
Business statistics is a scientific approach to decision making under risk. In practicing business statistics, we search for an insight, not the solution. Our search is for the one solution that meets all the business's needs with the lowest level of risk. Business statistics can take a normal business situation and with the proper data gathering, analysis, and re-search for a solution, turn it into an opportunity.
While business statistics cannot replace the knowledge and experience of the decision maker, it is a valuable tool that the manager can employ to assist in the decision making process in order to reduce the inherent risk.
Business Statistics provides justifiable answers to the following concerns for every consumer and producer:
- What is your or your customer's Expectation of the product/service you buy or that you sell? That is, what is a good estimate for µ?
- Given the information about your or your customer's expectation, what is the Quality of the product/service you buy or you sell? That is, what is a good estimate for σ?
- Given the information about your or your customer's expectation, and the quality of the product/service you buy or you sell, how does the product/service Compare with other existing similar types? That is, comparing several µ's.
Visit also the following Web sites:
How to Study Statistics
Decision Analysis
Kinds of Lies: Lies, Damned Lies and Statistics
"There are three kinds of lies -- lies, damned lies, and statistics." quoted in Mark Twain's autobiography.
It is already an accepted fact that "Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
The following are some examples of how statistics can be misused in advertising, which can be described as the science of arresting human unintelligence long enough to get money from it. The founder of Revlon says "In factory we make cosmetics; in the store we sell hope."
In most cases, the deception of advertising is achieved by omission:
- The Incredible Expansion Toyota: "How can it be that an automobile that's a mere nine inches longer on the outside give you over two feet more room on the inside? May be it's the new math!" Toyota Camry Ad.
Where is the fallacy in this statement? Taking volume as length! For example: 3 x 6 x 4 = 72 cubic feet, while 3 x 6 x 4.75 = 85.5 cubic feet. The difference could be even more than 2 feet!
- Pepsi Cola Ad.: " In recent side-by-side blind taste tests, nationwide, more people preferred Pepsi over Coca-Cola".
The questions are: Was it just some of the taste tests? What was the sample size? It does not say "In all recent…"
- Correlation? Consortium of Electric Companies Ad. "96% of streets in the US are under-lit and, moreover, 88% of crimes take place on under-lit streets".
- Dependent or Independent Events? "If the probability of someone carrying a bomb on a plane is .001, then the chance of two people carrying a bomb is .000001. Therefore, I should start carrying a bomb on every flight."
- Paperboard Packaging Council's concerns: "University studies show paper milk cartons give you more vitamins to the gallon."
How was the experiment designed? The research was sponsored by the council! Paperboard sales are declining!
- All the vitamins or just one? "You'd have to eat four bowls of Raisin Bran to get the vitamin nutrition in one bowl of Total".
- Six Times as Safe: "Last year 35 people drowned in boating accidents. Only 5 were wearing life jackets. The rest were not. Always wear life jacket when boating".
What percentage of boaters wear life jackets? Conditional probability.
- A Tax Accountant Firm Ad.: "One of our officers would accompany you in the case of Audit".
This sounds like a unique selling proposition, but it conceals the fact that the statement is a US Law.
- Dunkin Donuts Ad.: "Free 3 muffins when you buy three at the regular 1/2 dozen price."
There have been many other common misuses of statistics: dishonest and/or ignorant survey methods, loaded survey questions, graphs and pictograms that suppress that which is not in the "proof program," survey respondents who self-select because they have an axe to grind about the issue, and, of course, presentations that amplify what the data really minimize. Very interesting stuff.
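The "Six Times as Safe" example above turns on a number the ad never gives: the share of boaters who wear life jackets. The sketch below, under purely hypothetical assumptions about that share and about the total number of boaters, shows how the relative risk of drowning without a jacket changes with it. Only the counts 5 and 30 come from the ad (35 drowned, 5 wearing jackets).

```python
from fractions import Fraction

drowned_with_jacket = 5
drowned_without_jacket = 30  # 35 drowned in total, 5 of them wearing jackets


def relative_risk(share_wearing_jackets, total_boaters=70_000):
    """Risk of drowning without a jacket divided by the risk with one.

    Both arguments are hypothetical inputs; the ad reports neither.
    """
    with_jacket = total_boaters * share_wearing_jackets
    without_jacket = total_boaters - with_jacket
    risk_with = Fraction(drowned_with_jacket) / with_jacket
    risk_without = Fraction(drowned_without_jacket) / without_jacket
    return risk_without / risk_with


for share in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 7)):
    print(f"share wearing jackets = {share}: relative risk = {relative_risk(share)}")
```

Only if half of all boaters wore jackets would the "six times" reading follow; if one boater in seven wore one, these counts would show no difference in risk at all.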
References and Further Readings:
Dewdney A., 200% of Nothing, John Wiley, 1993. Based on his articles about math abuse in Scientific American, Dewdney lists the many ways we are manipulated with fancy mathematical footwork and faulty thinking in print ads, the news, company reports and product labels. He shows how to detect the full range of math abuses and defend against them.
Schindley W., The Informed Citizen: Argument and Analysis for Today, Harcourt Brace, 1996. This rhetoric/reader explores the study and practice of writing argumentative prose. The focus is on exploring current issues in communities, from the classroom to cyberspace. The "interacting in communities" theme and the high-interest readings engage students, while helping them develop informed opinions, effective arguments, and polished writing.
Belief, Opinion, and Fact
The letters in your course number: OPRE 504, stand for OPerations RE-search. OPRE is a science of making decisions (based on some numerical and measurable scales) by searching, and re-searching for a solution. I refer you to What Is OR/MS? for a deeper understanding of what OPRE is all about. Decision making under uncertainty must be based on facts not on personal opinion nor on belief.
Belief, Opinion, and Fact

                 Belief         Opinion           Fact
Self says        I'm right      This is my view   This is a fact
Says to others   You're wrong   That is yours     I can prove it to you

Sensible decisions are always based on facts. We should not confuse facts with beliefs or opinions. Beliefs are defined as someone's own understanding or needs. In belief, "I am" always right and "you" are wrong. There is nothing that can be done to convince the person that what they believe in is wrong. Opinions are slightly less extreme than beliefs. An opinion means that a person has certain views that they think are right. They also know that others are entitled to their own opinions. People respect others' opinions and in turn expect the same. Contrary to beliefs and opinions are facts. Facts are the basis of decisions. A fact is something that is right, and one can prove it to be true based on evidence and logical arguments.
Examples for belief, opinion, and facts can be found in religion, economics, and econometrics, respectively.
With respect to belief, Henri Poincaré said "Doubt everything or believe everything: these are two equally convenient strategies. With either we dispense with the need to think."
How to Assign Probabilities?
Probability is an instrument to measure the likelihood of the occurrence of an event. There are three major approaches to assigning probabilities, as follows:
- Classical Approach: Classical probability is predicated on the condition that the outcomes of an experiment are equally likely to happen. The classical probability utilizes the idea that the lack of knowledge implies that all possibilities are equally likely. The classical probability is applied when the events have the same chance of occurring (called equally likely events), and the set of events are mutually exclusive and collectively exhaustive. The classical probability is defined as:
P(X) = Number of favorable outcomes / Total number of possible outcomes
- Relative Frequency Approach: Relative probability is based on accumulated historical or experimental data. Frequency-based probability is defined as:
P(X) = Number of times an event occurred / Total number of opportunities for the event to occur
Note that relative frequency probability is based on the idea that what has happened in the past will continue to hold.
- Subjective Approach: The subjective probability is based on personal judgment and experience. For example, medical doctors sometimes assign subjective probability to the length of life expectancy for a person who has cancer.
- Anchoring: This is the practice of assigning a value obtained from a prior experience and adjusting the value in consideration of current expectations or circumstances.
- The Delphi Technique: It consists of a series of questionnaires. Each series is one "round". The responses from the first "round" are gathered and become the basis for the questions and feedback of the second "round". The process is usually repeated for a predetermined number of "rounds" or until the responses are such that a pattern is observed. This process allows expert opinion to be circulated to all members of the group and eliminates the bandwagon effect of majority opinion.
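The first two approaches in the list above can be compared side by side. The sketch below is only an illustration: the classical probability of rolling a four is computed by counting equally likely outcomes, and the relative frequency version uses simulated rolls as a stand-in for "historical data" (the seed and the number of trials are arbitrary choices).

```python
import random
from fractions import Fraction

# Classical approach: favorable outcomes / total number of equally likely outcomes.
outcomes = [1, 2, 3, 4, 5, 6]
favorable = [face for face in outcomes if face == 4]
p_classical = Fraction(len(favorable), len(outcomes))

# Relative frequency approach: times the event occurred / number of opportunities.
random.seed(0)   # arbitrary seed, so the simulated "history" is repeatable
trials = 60_000  # arbitrary number of simulated rolls
hits = sum(1 for _ in range(trials) if random.choice(outcomes) == 4)
p_relative = hits / trials

print("classical:", p_classical)
print("relative frequency:", p_relative)
```

With many trials, the relative frequency should settle close to the classical value of 1/6, though it will rarely match it exactly.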
General Laws of Probability
- General Law of Addition: When two or more events will happen at the same time, and the events are not mutually exclusive, then:
P(X or Y) = P(X) + P(Y) - P(X and Y)
Notice that this equation contains two special events: the event (X and Y), which is the intersection of the events X and Y, and the event (X or Y), which is their union (i.e., either/or). Although this is very simple, it says relatively little about how event X influences event Y and vice versa. If P(X and Y) is 0, indicating that events X and Y do not intersect (i.e., they are mutually exclusive), then we have P(X or Y) = P(X) + P(Y). On the other hand, if P(X and Y) is not 0, then there are interactions between the two events X and Y. Usually it could be a physical interaction between them. This makes the relationship P(X or Y) = P(X) + P(Y) - P(X and Y) nonlinear because the P(X and Y) term is subtracted off, which influences the result.
- Special Law of Addition: When two or more events will happen at the same time, and the events are mutually exclusive, then:
P(X or Y) = P(X) + P(Y)
- General Law of Multiplication: When two or more events will happen at the same time, and the events are dependent, then the general rule of multiplicative law is used to find the joint probability:
P(X and Y) = P(X) . P(Y|X), where P(Y|X) is a conditional probability.
- Special Law of Multiplication: When two or more events will happen at the same time, and the events are independent, then the special rule of multiplication law is used to find the joint probability:
P(X and Y) = P(X) . P(Y)
- Conditional Probability Law: A conditional probability is denoted by P(X|Y). This phrase is read: the probability that X will occur given that Y is known to have occurred.
Conditional probabilities are based on knowledge of one of the variables. The conditional probability of an event, such as X, occurring given that another event, such as Y, has occurred is expressed as:
P(X|Y) = P(X and Y) / P(Y)
provided P(Y) is not zero. Note that when using the conditional law of probability, you always divide the joint probability by the probability of the event after the word given. Thus, to get P(X given Y), you divide the joint probability of X and Y by the unconditional probability of Y. In other words, the above equation is used to find the conditional probability for any two dependent events.
A special case of the Bayes Theorem is:
P(X|Y) = P(Y|X) . P(X) / P(Y)
If two events, such as X and Y, are independent then:
P(X|Y) = P(X),
and
P(Y|X) = P(Y)
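The laws above can be checked numerically on a small example. The sketch below assumes one roll of a fair die, with X = "the roll is even" and Y = "the roll is greater than 3"; both events are hypothetical choices made for illustration, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (hypothetical example)


def prob(event):
    """Classical probability: favorable outcomes / all equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


X = {2, 4, 6}  # the roll is even
Y = {4, 5, 6}  # the roll is greater than 3

p_x, p_y = prob(X), prob(Y)
p_x_and_y = prob(X & Y)  # intersection
p_x_or_y = prob(X | Y)   # union

# Conditional probabilities: joint probability divided by the "given" event.
p_x_given_y = p_x_and_y / p_y
p_y_given_x = p_x_and_y / p_x

print("general addition law holds:", p_x_or_y == p_x + p_y - p_x_and_y)
print("general multiplication law holds:", p_x_and_y == p_x * p_y_given_x)
print("Bayes special case holds:", p_x_given_y == p_y_given_x * p_x / p_y)
print("X and Y independent:", p_x_and_y == p_x * p_y)
```

In this example X and Y are dependent, so the special multiplication law P(X and Y) = P(X) . P(Y) does not apply, while the general law does.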
Mutually Exclusive versus Independent Events
Mutually Exclusive (ME): Events A and B are ME if both cannot occur simultaneously. That is, P[A and B] = 0.
Independency (Ind.): Events A and B are independent if having the information that B already occurred does not change the probability that A will occur. That is, P[A given B occurred] = P[A].
If two events are ME they are also Dependent: P[A given B] = P[A and B]/P[B], and since P[A and B] = 0 (by ME), then P[A given B] = 0, which differs from P[A] whenever P[A] is not zero. Similarly,
If two events are Independent then they are also not ME.
If two events are Dependent then they may or may not be ME.
If two events are not ME, then they may or may not be Independent.
The following Figure contains all possibilities. The notations used in this table are as follows: X means does not imply, question mark ? means it may or may not imply, while the check mark means it implies.
Bernstein was the first to discover that (probabilistic) pairwise independency and mutual independency for a collection of events A1,..., An are different notions.
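Both points can be seen on two tosses of a fair coin. The sketch below uses a standard textbook construction (not taken from this page): A = "first toss is heads", B = "second toss is heads", C = "both tosses agree". These three events are pairwise independent but not mutually independent. The events D and E at the end are mutually exclusive and therefore dependent.

```python
from fractions import Fraction
from itertools import product

space = set(product("HT", repeat=2))  # the four outcomes of two fair coin tosses


def prob(event):
    return Fraction(len(event), len(space))


def independent(e1, e2):
    """True when P(e1 and e2) = P(e1) . P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


A = {s for s in space if s[0] == "H"}   # first toss is heads
B = {s for s in space if s[1] == "H"}   # second toss is heads
C = {s for s in space if s[0] == s[1]}  # both tosses agree

pairwise = independent(A, B) and independent(A, C) and independent(B, C)
mutual = pairwise and prob(A & B & C) == prob(A) * prob(B) * prob(C)

D = {("H", "H")}  # two heads
E = {("T", "T")}  # two tails
mutually_exclusive = prob(D & E) == 0

print("A, B, C pairwise independent:", pairwise)
print("A, B, C mutually independent:", mutual)
print("D, E mutually exclusive:", mutually_exclusive)
print("D, E independent:", independent(D, E))
```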
Type of Data and Levels of Measurement
Information can be collected in statistics using qualitative or quantitative data.
Qualitative data, such as the eye color of a group of individuals, are not computable by arithmetic relations. They are labels that indicate in which category or class an individual, object, or process falls. They are called categorical variables.
Quantitative data sets consist of measures that take numerical values for which descriptions such as means and standard deviations are meaningful. They can be put into an order and further divided into two groups: discrete data or continuous data. Discrete data are countable data, for example, the number of defective items produced during a day's production. Continuous data, which arise when the variables are measurable, are expressed on a continuous scale, for example, the height of a person.
The first activity in statistics is to measure or count. Measurement/counting theory is concerned with the connection between data and reality. A set of data is a representation (i.e., a model) of the reality based on numerical and measurable scales. Data are called "primary type" data if the analyst has been involved in collecting the data relevant to his/her investigation. Otherwise, they are called "secondary type" data.
Data come in the forms of Nominal, Ordinal, Interval and Ratio (remember the French word NOIR for color black). Data can be either continuous or discrete.
Level of Measurements
_________________________________________
                       Nominal   Ordinal   Interval/Ratio
Ranking?               no        yes       yes
Numerical difference   no        no        yes

Zero and unit of measurement are arbitrary in the Interval scale. While the unit of measurement is arbitrary in the Ratio scale, its zero point is a natural attribute. The categorical variable is measured on an ordinal or nominal scale.
Measurement theory is concerned with the connection between data and reality. Both statistical theory and measurement theory are necessary to make inferences about reality.
Since statisticians live for precision, they prefer Interval/Ratio levels of measurement.
Visit the Web site Measurement theory: Frequently Asked Questions
Histogramming: Checking for Homogeneity of Population
The mode is the most frequently occurring value in a set of observations. Data may have two modes. In this case, we say the data are bimodal, and sets of observations with more than two modes are referred to as multimodal. Whenever more than one mode exists, the population from which the sample came is a mixture of more than one population. Almost all standard statistical analyses are conditioned on the assumption that the population is homogeneous, meaning that its density is unimodal. To check the unimodality of sampling data, one may use histogramming.
Number of Class Intervals in a Histogram: Before we can construct our frequency distribution we must determine how many classes we should use. This is purely arbitrary, but too few classes or too many classes will not provide as clear a picture as can be obtained with some more nearly optimum number. An empirical relationship, known as Sturges' rule, may be used as a guide to determine the optimal number of classes (k):
k = the smallest integer greater than or equal to 1 + 3.322 Log(n), where k is the number of classes, Log is in base 10, and n is the total number of numerical values which comprise the data set.
Therefore, class width is:
(highest value - lowest value) / (1 + 3.322 Log(n)), where n is the total number of items in the data set.
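Sturges' rule and the resulting class width can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names are hypothetical, and the coefficient is written as 1/Log(2), which is about 3.32, so that the rule reads k = 1 + log2(n).

```python
import math


def sturges_classes(n):
    """Smallest integer >= 1 + Log(n)/Log(2), with Log in base 10."""
    return math.ceil(1 + math.log10(n) / math.log10(2))


def sturges_class_width(values):
    """(highest value - lowest value) / (1 + Log(n)/Log(2))."""
    n = len(values)
    return (max(values) - min(values)) / (1 + math.log10(n) / math.log10(2))


data = list(range(1, 101))  # hypothetical data set: the integers 1 to 100
print("number of classes:", sturges_classes(len(data)))
print("class width:", sturges_class_width(data))
```

In practice the width is usually rounded to a tidy value, which is the point of the guidelines that follow.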
The following Java applet produces a histogram based on this rule: Test for Homogeneity of Population.
To have an "optimum" you need some measure of quality -- presumably in this case, the "best" way to display whatever information is available in the data. The sample size contributes to this; so the usual guidelines are to use between 5 and 15 classes, with more classes if you have a larger sample. You should take into account a preference for tidy class widths, preferably a multiple of 5 or 10, because this makes it easier to understand.
Beyond this it becomes a matter of judgement. Try out a range of class widths, and choose the one that works best. This assumes you have a computer and can generate alternative histograms fairly readily.
There are often management issues that come into play as well. For example, if your data is to be compared to similar data -- such as prior studies, or from other countries -- you are restricted to the intervals used therein.
If the histogram is very skewed, then unequal classes should be considered. Use narrow classes where the class frequencies are high, wide classes where they are low.
The following approaches are common:
Let n be the sample size, then the number of class intervals could be
MIN { √n, 10 Log(n) }.
The Log is the logarithm in base 10. Thus for 200 observations you would use 14 intervals but for 2000 you would use 33.
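The rule above can be checked against the two figures quoted. The sketch assumes that the first argument of MIN is the square root of n and that the result is rounded down to a whole number; both assumptions are made because they reproduce the 14 and 33 intervals stated in the text. The function name is hypothetical.

```python
import math


def class_intervals(n):
    """Rounded-down minimum of sqrt(n) and 10 * Log(n), with Log in base 10."""
    return math.floor(min(math.sqrt(n), 10 * math.log10(n)))


for n in (200, 2000):
    print(n, "observations ->", class_intervals(n), "intervals")
```

For small samples the square root is the binding term, and for large samples the logarithmic term takes over.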
Alternatively,
- Find the range (highest value - lowest value).
- Divide the range by a reasonable interval size: 2, 3, 5, 10 or a multiple of 10.
- Aim for no fewer than 5 intervals and no more than 15.
One of the main applications of histogramming is to Test for Homogeneity of Population. The unimodality of the histogram is a necessary condition for the homogeneity of population to make any statistical analysis meaningful.
How to Construct a BoxPlot
A BoxPlot is a graphical display that has many characteristics. It includes the presence of possible outliers. It illustrates the range of data. It shows a measure of dispersion such as the upper quartile, lower quartile and interquartile range (IQR) of the data set, as well as the median as a measure of central location, which is useful for comparing sets of data. It also gives an indication of the symmetry or skewness of the distribution. The main reason for the popularity of boxplots is that they offer a lot of information in a compact way.
Steps to Construct a BoxPlot:
- Horizontal lines are drawn at the median, upper, and lower quartiles. These horizontal lines are joined by vertical lines to produce the box.
- A vertical line is drawn up from the upper quartile to the most extreme data point that is within a distance of 1.5 (IQR) of the upper quartile. A similarly defined vertical line is drawn down from the lower quartile.
- Each data point beyond the end of the vertical lines is marked with an asterisk (*).
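The numbers behind the steps above can be computed before anything is drawn. The sketch below uses a hypothetical data set and the "inclusive" quartile method of Python's statistics module; other quartile conventions exist and may give slightly different box edges.

```python
from statistics import quantiles

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 7, 8, 9, 10, 12, 13, 30]

# Lower quartile, median, and upper quartile: the three horizontal lines of the box.
q1, med, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Whiskers reach the most extreme data points within 1.5 (IQR) of the quartiles.
upper_whisker = max(x for x in data if x <= q3 + 1.5 * iqr)
lower_whisker = min(x for x in data if x >= q1 - 1.5 * iqr)

# Points beyond the whiskers are the ones marked with an asterisk.
outliers = [x for x in data if x < lower_whisker or x > upper_whisker]

print("quartiles:", q1, med, q3)
print("whiskers:", lower_whisker, upper_whisker)
print("outliers:", outliers)
```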
Probability, Chance, Likelihood, and Odds
"Probability" has an exact technical meaning -- well, in fact it has several, and there is still debate as to which of them ought to be used. However, for most events for which probability is easily computed, e.g., the probability of getting a four when rolling a die, almost all agree on the actual value (1/6), if not the philosophical interpretation. A probability is always a number between 0 [not "quite" the same thing as impossibility: it is possible that "if" a coin were flipped infinitely many times, it would never show "tails", but the probability of an infinite run of heads is 0] and 1 [again, not "quite" the same thing as certainty but close enough].
The word "chance" or "chances" is often used as an approximate synonym of "probability", either for variety or to save syllables. It would be better practice to leave "chance" for informal use, and say "probability" if that is what is meant.
In cases where the probability of an observation is described by a parametric model, the "likelihood" of a parameter value given the data is defined to be the probability of the data given the parameter. One occasionally sees "likely" and "likelihood"; however, these terms are used casually as synonyms for "probable" and "probability".
"Odds" is a probabilistic concept related to probability. It is the ratio of the probability (p) of an event to the probability (1-p) that it does not happen: p/(1-p). It is often expressed as a ratio, often of whole numbers; e.g., "odds" of 1 to 5 in the die example above, but for technical purposes the division may be carried out to yield a positive real number (here 0.2). The logarithm of the odds is useful for technical purposes, as it maps the range of probabilities onto the (extended) real numbers in a way that preserves symmetry between the probability that an event occurs and the probability that it does not occur.
Odds against an event are the ratio of nonevents to events. If the event rate for a disease is 0.1 (10 per cent), its nonevent rate is 0.9 and therefore the odds against it are 9:1. Note that this is not the same expression as the inverse of the event rate.
Another way to compare probabilities and odds is using "part-whole thinking" with a binary (dichotomous) split in a group. A probability is often a ratio of a part to a whole; e.g., the ratio of the part [those who survived 5 years after being diagnosed with a disease] to the whole [those who were diagnosed with the disease]. Odds are often a ratio of a part to a part; e.g., the odds against dying are the ratio of the part that succeeded [those who survived 5 years after being diagnosed with a disease] to the part that 'failed' [those who did not survive 5 years after being diagnosed with a disease].
Obviously, probability and odds are intimately related: Odds = p / (1-p). Note that probability is always between zero and one, whereas odds range from zero to infinity.
Aside from their value in betting, odds allow one to specify a small probability (near zero) or a large probability (near one) using large whole numbers (1,000 to 1 or a million to one). Odds magnify small probabilities (or large probabilities) so as to make the relative differences visible. Consider two probabilities: 0.01 and 0.005. They are both small. An untrained observer might not realize that one is twice as much as the other. But if expressed as odds against the event (99 to 1 versus 199 to 1), it may be easier to compare the two situations by focusing on large whole numbers (199 versus 99) rather than on small ratios or fractions.
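The conversions discussed above are short enough to write out. In the sketch below the function names are hypothetical, and exact fractions are used so that the odds come out as whole-number ratios.

```python
import math
from fractions import Fraction


def odds_for(p):
    """Odds in favor of an event: p / (1 - p)."""
    return p / (1 - p)


def odds_against(p):
    """Odds against an event: (1 - p) / p."""
    return (1 - p) / p


def log_odds(p):
    """Natural logarithm of the odds in favor."""
    return math.log(float(odds_for(p)))


print("die shows a four, odds for:", odds_for(Fraction(1, 6)))
print("disease rate 0.1, odds against:", odds_against(Fraction(1, 10)))
print("p = 0.01, odds against:", odds_against(Fraction(1, 100)))
print("p = 0.005, odds against:", odds_against(Fraction(1, 200)))
print("log odds at 0.2 and 0.8:", log_odds(0.2), log_odds(0.8))
```

The last line illustrates the symmetry mentioned earlier: the log odds of an event and of its complement differ only in sign.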
Visit also the Web site Counting and Combinatorial
What Is "Degrees of Freedom"
Recall that in estimating the population's variance, we used (n-1) rather than n in the denominator. The factor (n-1) is called "degrees of freedom."
Estimation of the Population Variance: Variance in a population is defined as the average of squared deviations from the population mean. If we draw a random sample of n cases from a population where the mean is known, we can estimate the population variance in an intuitive way. We sum the squared deviations of scores from the population mean and divide this sum by n. This estimate is based on n independent pieces of information and we have n degrees of freedom. Each of the n observations, including the last one, is unconstrained ('free' to vary).
When we do not know the population's mean, we can still estimate the population variance, but now we compute deviations around the sample mean. This introduces an important constraint because the sum of the deviations around the sample mean is known to be zero. If we know the value for the first (n-1) deviations, the last one is known. There are only n-1 independent pieces of information in this estimate of variance.
If you study a system with n parameters xi, i = 1, ..., n, you can represent it in an n-dimensional space. Any point of this space represents a potential state of your system. If your n parameters could vary independently, then your system would be fully described in an n-dimensional hyper-volume. Now, imagine you have one constraint between the parameters (an equation relating your n parameters); then your system would be described by an (n-1)-dimensional hyper-surface. For example, in three-dimensional space, a linear relationship defines a plane, which is 2-dimensional.
In statistics, your n parameters are your n data values. To evaluate the variance, you first need to estimate the mean E(X). So when you evaluate the variance, you have one constraint on your system (the expression of the mean), and only (n-1) degrees of freedom remain.
Therefore, we divide the sum of squared deviations by n-1 rather than by n when we have sample data. On average, deviations around the sample mean are smaller than deviations around the population mean. This is because our sample mean is always in the middle of our sample scores; in fact the minimum possible sum of squared deviations for any sample of numbers is around the mean for that sample of numbers. Thus, if we sum the squared deviations from the sample mean and divide by n, we have an underestimate of the variance in the population (which is based on deviations around the population mean).
If we divide the sum of squared deviations by n-1 instead of n, our estimate is a bit larger, and it can be shown that this adjustment gives us an unbiased estimate of the population variance. However, for large n, say, over 30, it does not make much of a difference whether we divide by n or by n-1.
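A small enumeration can make the bias argument concrete. The sketch below assumes a tiny made-up population of four values and draws every possible sample of size 2 with replacement; it then averages the two candidate estimators over all samples. The population and variable names are illustrative assumptions.

```python
from itertools import product
from statistics import mean

population = [1, 3, 7, 15]          # made-up population
mu = mean(population)
# Population variance: average squared deviation from the population mean.
sigma2 = mean((x - mu) ** 2 for x in population)

def sum_sq_dev(sample):
    """Sum of squared deviations around the SAMPLE mean."""
    m = mean(sample)
    return sum((x - m) ** 2 for x in sample)

n = 2
samples = list(product(population, repeat=n))   # all ordered samples, with replacement

avg_div_n = mean(sum_sq_dev(s) / n for s in samples)
avg_div_n_minus_1 = mean(sum_sq_dev(s) / (n - 1) for s in samples)

print("population variance:        ", sigma2)
print("average of SS/n estimates:  ", avg_div_n)
print("average of SS/(n-1) estimates:", avg_div_n_minus_1)
```

Under these assumptions, dividing by n underestimates the population variance on average, while dividing by n-1 recovers it.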
Degrees of Freedom in ANOVA: You will see the key phrase "degrees of freedom" also appearing in the Analysis of Variance (ANOVA) tables. If I tell you about 4 numbers, but don't say what they are, the average could be anything. I have 4 degrees of freedom in the data set. If I tell you 3 of those numbers, and the average, you can work out the fourth number. The data set, given the average, has 3 degrees of freedom. If I tell you the average and the standard deviation of the numbers, I have given you 2 pieces of information, and reduced the degrees of freedom from 4 to 2. You only need to know 2 of the numbers' values to work out the other 2.
In an ANOVA table, degree of freedom (df) is the divisor in SS/df which will result in an unbiased estimate of the variance of a population.
df = N - k, where N is the sample size, and k is a small number, equal to the number of "constraints", the number of "bits of information" already "used up". Degree of freedom is an additive quantity; total amounts of it can be "partitioned" into various components.
For example, suppose we have a sample of size 13 and calculate its mean, and then the deviations from the mean, only 12 of the deviations are free to vary: once one has found 12 of the deviations, the thirteenth one is determined. Therefore, if one is estimating a population variance from a sample, k = 1.
In bivariate correlation or regression situations, k = 2: the calculation of the sample means of each variable "uses up" two bits of information, leaving N - 2 independent bits of information.
In a one-way analysis of variance (ANOVA) with g groups, there are three ways of using the data to estimate the population variance. If all the data are pooled, the conventional SST/(n-1) would provide an estimate of the population variance.
If the treatment groups are considered separately, the sample means can also be considered as estimates of the population mean, and thus SSb/(g - 1) can be used as an estimate. The remaining ("within-group", "error") variance can be estimated from SSw/(n - g). This example demonstrates the partitioning of df: df total = n - 1 = df(between) + df(within) = (g - 1) + (n - g).
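The partition of the sums of squares and of the degrees of freedom can be checked numerically. The following sketch uses three made-up treatment groups; the data and names are assumptions for illustration only.

```python
# Made-up data for g = 3 treatment groups (unequal sizes are allowed).
groups = {
    "A": [4, 5, 6],
    "B": [7, 9, 8, 8],
    "C": [1, 2, 3],
}

all_values = [x for values in groups.values() for x in values]
n = len(all_values)          # total sample size
g = len(groups)              # number of groups
grand_mean = sum(all_values) / n

# Total, between-group, and within-group sums of squares.
sst = sum((x - grand_mean) ** 2 for x in all_values)
ssb = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2 for v in groups.values())
ssw = sum((x - sum(v) / len(v)) ** 2 for v in groups.values() for x in v)

df_total, df_between, df_within = n - 1, g - 1, n - g

print(f"SST={sst:.3f}  SSb={ssb:.3f}  SSw={ssw:.3f}")
print(f"df: total={df_total}, between={df_between}, within={df_within}")
print("variance estimates:", sst / df_total, ssb / df_between, ssw / df_within)
```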
Therefore, the simple 'working definition' of df is 'sample size minus the number of estimated parameters'. A fuller answer would have to explain why there are situations in which the degrees of freedom is not an integer. After all this is said, the best explanation is mathematical: we use df to obtain an unbiased estimate.
In summary, the concept of degrees of freedom is used for the following two different purposes:
- Parameter(s) of certain distributions, such as the F and t distributions, are called degrees of freedom. Therefore, degrees of freedom could be positive non-integer number(s).
- Degrees of freedom is used to obtain an unbiased estimate for the population parameters.
Outlier Removal
Because of the potentially large variance, outliers could be the outcome of sampling. It's perfectly correct to have such an observation that legitimately belongs to the study group by definition. Lognormally distributed data (such as international exchange rates), for instance, will frequently exhibit such values.
Therefore, you must be very careful and cautious: before declaring an observation "an outlier," find out why and how such an observation occurred. It could even be an error at the data entry stage.
First, construct the BoxPlot of your data. Find the Q1, Q2, and Q3 points, which divide the sample into four equally sized groups (Q2 = median). Let IQR = Q3 - Q1. Outliers are defined as those points outside the interval from Q1 - k*IQR to Q3 + k*IQR. In most cases one sets k = 1.5.
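As a rough sketch of the boxplot rule, the Python function below flags points outside Q1 - k*IQR and Q3 + k*IQR. It relies on `statistics.quantiles` from the standard library; note that quartile conventions differ slightly between packages, so the fences may not match other software exactly. The data set and the function name are made up.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using the k*IQR rule."""
    q1, _q2, q3 = quantiles(data, n=4)   # three cut points: Q1, Q2 (median), Q3
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = [x for x in data if x < lower or x > upper]
    return flagged, (lower, upper)

data = [2, 3, 3, 4, 5, 5, 6, 7, 8, 40]   # made-up sample with one extreme value
outliers, fences = iqr_outliers(data)
print("fences:", fences)
print("flagged:", outliers)
```

A flagged value is only a candidate: as noted above, it should be investigated before it is removed.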
Another alternative is the following algorithm:
a) Compute the standard deviation s of the whole sample.
b) Define a set of limits around the mean: mean + ks and mean - ks (allow the user to enter k; a typical value for k is 2).
c) Remove all sample values outside the limits.
Now, iterate N times through the algorithm, each time replacing the sample set with the reduced sample obtained after applying step (c).
Usually we need to iterate through this algorithm 4 times.
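Steps (a) to (c) can be sketched as follows. This is a minimal illustration, assuming the sample standard deviation (n - 1 divisor) and an early stop once a pass removes nothing; the function name, the stopping rule, and the data are assumptions, not part of the original algorithm description.

```python
from statistics import mean, stdev

def trim_iteratively(data, k=2.0, max_iter=4):
    """Repeatedly drop values outside mean +/- k*s, at most max_iter times."""
    current = list(data)
    for _ in range(max_iter):
        if len(current) < 3:             # too few points to keep trimming
            break
        m, s = mean(current), stdev(current)                        # step (a)
        lower, upper = m - k * s, m + k * s                         # step (b)
        kept = [x for x in current if lower <= x <= upper]          # step (c)
        if len(kept) == len(current):    # nothing removed: stop early
            break
        current = kept
    return current

sample = [10, 11, 9, 10, 12, 11, 10, 9, 11, 50]   # made-up data
print(trim_iteratively(sample))
```

Because each pass shrinks the standard deviation, repeated trimming can also remove legitimate observations, so the caution about investigating each candidate outlier still applies.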
As mentioned earlier, a common "standard" is to flag any observation falling more than 1.5 interquartile ranges (1.5 IQRs) above the third quartile or below the first quartile. The following SPSS program helps you in determining the outliers by listing the data together with their standardized (z) scores, sorted by z score.
$SPSS/OUTPUT=LIER.OUT
TITLE 'DETERMINING IF OUTLIERS EXIST'
DATA LIST FREE FILE='A' / X1
VAR LABELS X1 'INPUT DATA'
LIST CASE CASE=10/VARIABLE=X1/
CONDESCRIPTIVE X1(ZX1)
LIST CASE CASE=10/VARIABLES=X1,ZX1/
SORT CASES BY ZX1(A)
LIST CASE CASE=10/VARIABLES=X1,ZX1/
FINISH
Statistical Summaries
Representative of a Sample: Measures of Central Tendency Summaries
How do you describe the "average" or "typical" piece of information in a set of data? Different procedures are used to summarize the most representative information depending on the type of question asked and the nature of the data being summarized.
Measures of location give information about the location of the central tendency within a group of numbers. The measures of location presented in this unit for ungrouped (raw) data are the mean, the median, and the mode.
Mean: The arithmetic mean (or the average or simple mean) is computed by summing all numbers in an array of numbers (xi) and then dividing by the number of observations (n) in the array.
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
The mean uses all of the observations, and each observation affects the mean. Even though the mean is sensitive to extreme values, i.e., extremely large or small data can cause the mean to be pulled toward the extreme data, it is still the most widely used measure of location. This is due to the fact that the mean has valuable mathematical properties that make it convenient for use with inferential statistical analysis. For example, the sum of the deviations of the numbers in a set of data from the mean is zero, and the sum of the squared deviations of the numbers in a set of data from the mean is the minimum value.
You may use this Applet to compute the mean.
Weighted Mean: In some cases, the data in the sample or population should not be weighted equally, rather each value should be weighted according to its importance.
Median: The median is the middle value in an ordered array of observations. If there is an even number of observations in the array, the median is the average of the two middle numbers. If there is an odd number of data in the array, the median is the middle number.
The median is often used to summarize the distribution of an outcome. If the distribution is skewed, the median and the IQR may be better than other measures to indicate where the observed data are concentrated.
Generally, the median provides a better measure of location than the mean when there are some extremely large or small observations; i.e., when the data are skewed to the right or to the left. For this reason, median income is used as the measure of location for the U.S. household income. Note that if the median is less than the mean, the data set is skewed to the right. If the median is greater than the mean, the data set is skewed to the left.
Mode: The mode is the most frequently occurring value in a set of observations. Why use the mode? The classic example is the shirt/shoe manufacturer who wants to decide what sizes to introduce. Data may have two modes. In this case, we say the data are bimodal, and sets of observations with more than two modes are referred to as multimodal. Note that the mode does not have important mathematical properties for future use. Also, the mode is not a helpful measure of location, because there can be more than one mode or even no mode.
Whenever more than one mode exists, the population from which the sample came may be a mixture of more than one population. Almost all standard statistical analyses are conditioned on the assumption that the population is homogeneous, meaning that its density is unimodal.
Notice that Excel is a very limited statistical software package. For example, it displays only one mode, the first one. Unfortunately, this is very misleading. However, you may find out if there are others by inspection only, as follows: Create a frequency distribution, invoke the menu sequence: Tools, Data analysis, Frequency and follow the instructions on the screen. You will see the frequency distribution and can then find the mode visually. Unfortunately, Excel does not draw a Stem and Leaf diagram. Commercial off-the-shelf software packages, such as SAS and SPSS, display a Stem and Leaf diagram, which is a frequency distribution of a given data set.
Quartiles & Percentiles: When we order the data, for example in ascending order, we may divide the data into quarters using three cut points, Q1, Q2, and Q3, known as quartiles. The first Quartile (Q1) is that value where 25% of the values are smaller and 75% are larger. The second Quartile (Q2) is that value where 50% of the values are smaller and 50% are larger. The third Quartile (Q3) is that value where 75% of the values are smaller and 25% are larger.
Percentiles have a similar concept and therefore, are related, e.g., the 25th percentile corresponds to the first quartile Q1, etc. The advantage of percentiles is that they may be subdivided. The percentiles and quartiles are most conveniently read off a cumulative distribution function.
Selecting Among the Mean, Median, and Mode
It is a common mistake to specify the wrong index for central tendency.
The first consideration is the type of data: if the variable is categorical, the mode is the single measure that best describes the data.
The second consideration in selecting the index is to ask whether the total of all observations is of any interest. If the answer is yes, then the mean is the proper index of central tendency.
If the total is of no interest, then depending on whether the histogram is symmetric or skewed one must use either mean or median, respectively.
In all cases the histogram must be unimodal.
Suppose that four people want to get together to play poker. They live on 1st Street, 3rd Street, 7th Street, and 15th Street. They want to select a house that involves the minimum amount of driving for all parties concerned.
Let's suppose that they decide to minimize the absolute amount of driving. If they met at 1st Street, the amount of driving would be 0 + 2 + 6 + 14 = 22 blocks. If they met at 3rd Street, the amount of driving would be 2 + 0+ 4 + 12 = 18 blocks. If they met at 7th Street, 6 + 4 + 0 + 8 = 18 blocks. Finally, at 15th Street, 14 + 12 + 8 + 0 = 34 blocks.
So the two houses that would minimize the amount of driving would be 3rd or 7th Street. Actually, if they wanted a neutral site, any place on 4th, 5th, or 6th Street would also work.
Note that any value between 3 and 7 could be defined as the median of 1, 3, 7, and 15. So the median is the value that minimizes the absolute distance to the data points.
Now the person at 15th is upset at always having to do more driving. So the group agrees to consider a different rule. They decide to minimize the sum of the squared distances driven. This is the least squares principle. By squaring, we give more weight to a single very long commute than to a bunch of shorter commutes. With this rule, the 7th Street house (36 + 16 + 0 + 64 = 116 square blocks) is preferred to the 3rd Street house (4 + 0 + 16 + 144 = 164 square blocks). If you consider any location, and not just the houses themselves, then the point halfway between 6th and 7th Street (6.5) is the location that minimizes the sum of the squared distances driven.
Find the value of x that minimizes
$$(1 - x)^2 + (3 - x)^2 + (7 - x)^2 + (15 - x)^2.$$
The value that minimizes the sum of squared values is 6.5, which is also equal to the arithmetic mean of 1, 3, 7, and 15. With calculus, it's easy to show that this holds in general.
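The poker example can be checked by brute force. The sketch below evaluates both driving rules on a grid of half-block locations between 1st and 15th Street; the grid and the function names are assumptions made for this illustration.

```python
houses = [1, 3, 7, 15]

def total_abs(x):
    """Total blocks driven if everyone meets at location x."""
    return sum(abs(h - x) for h in houses)

def total_sq(x):
    """Sum of squared distances driven if everyone meets at location x."""
    return sum((h - x) ** 2 for h in houses)

# Candidate meeting points: 1.0, 1.5, 2.0, ..., 15.0
candidates = [i / 2 for i in range(2, 31)]

best_abs = min(total_abs(c) for c in candidates)
abs_minimizers = [c for c in candidates if total_abs(c) == best_abs]
sq_minimizer = min(candidates, key=total_sq)

print("least total driving:", best_abs, "at any of", abs_minimizers)
print("least squared driving at:", sq_minimizer, "with", total_sq(sq_minimizer))
print("mean of the house numbers:", sum(houses) / len(houses))
```

Every point from 3 to 7 ties under the absolute rule (any median), whereas the squared rule has a single minimizer at the mean.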
For moderately asymmetrical distributions the mode, median, and mean approximately satisfy the empirical relationship: mode = 3(median) - 2(mean).
Consider a small sample of scores with an even number of cases, for example, 1, 2, 4, 7, 10, and 12. The median is 5.5, the midpoint of the interval between the scores of 4 and 7.
As we discussed above, it is true that the median is a point around which the sum of absolute deviations is minimized. In this example the sum of absolute deviations is 22. However, it is not a unique point. Any point in the 4 to 7 region will have the same value of 22 for the sum of the absolute deviations.
Indeed, medians are tricky. The 50%-50% (above-below) description is not quite correct. For example, in 1, 1, 1, 1, 1, 1, 8 no value splits the data 50%-50%. By convention the median is 1; however, only about 14% of the data lie strictly above it, while 100% of the data are greater than or equal to the median. This generalizes to other percentiles.
We will make use of this idea in regression analysis. In an analogous argument, the regression line is a unique line which minimizes the sum of the squared deviations from it. In general, there is no unique line which minimizes the sum of the absolute deviations from it.
Quality of a Sample: Measures of Dispersion
Average by itself is not a good indication of quality. You need to know the variance to make any educated assessment. We are reminded of the dilemma of the six-foot tall statistician who drowned in a stream that had an average depth of three feet.
Measures of dispersion are statistical procedures for describing the nature and extent of differences among the information in the distribution. A measure of variability is generally reported with a measure of central tendency.
Statistical measures of variation are numerical values that indicate the variability inherent in a set of data measurements. Note that a small value for a measure of dispersion indicates that the data are concentrated around the mean; therefore, the mean is a good representative of the data set. On the other hand, a large measure of dispersion indicates that the mean is not a good representative of the data set. Also, measures of dispersion can be used when we want to compare the distributions of two or more sets of data. Quality of a data set is measured by its variability: Larger variability indicates lower quality. That is why high variation makes the manager very worried. Your job, as a statistician is to measure the variation, and if it is too high and unacceptable, then it is the job of the technical staff, such as engineers, to fix the process.
The decision situations with flat uncertainty have the largest risk. For simplicity, consider the case when there are only two outcomes one with probability of p. Then, the variation in the outcomes is p(1-p). This variation is the largest if we set p = 50%. That is, equal chance for each outcome. In such a case, the quality of information is at its lowest level. You may like to perform some experiment using this Applet for a good understanding of the above argument.
Remember, quality of information and variation are inversely related. The larger the variation in the data, the lower the quality of the data (i.e., information): the Devil is in the Deviations.
The four most common measures of variation are the range, variance, standard deviation, and coefficient of variation.
Range: The range of a set of observations is the absolute value of the difference between the largest and smallest values in the data set. It measures the size of the smallest contiguous interval of real numbers that encompasses all of the data values. It is not useful when extreme values are present. It is based solely on two values, not on the entire data set. In addition, it cannot be defined for open-ended distributions such as Normal distribution.
Notice that, when dealing with discrete random observations, some authors define the range as: Range = Largest value - Smallest value + 1.
Normal distribution does not have a range. A student said, "since the tails of the normal density function never touch the x-axis, for an observation to contribute to forming such a curve, very large positive and negative values must exist." Indeed, such remote values are always possible, but increasingly improbable. This encapsulates the asymptotic behavior of the normal density very well.
Variance: An important measure of variability is variance. Variance is the average of the squared deviations of each observation in the set from the arithmetic mean of all of the observations.
$$\text{Variance} = S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \quad n \ge 2.$$
The variance is a measure of spread or dispersion among values in a data set. Therefore, the greater the variance, the lower the quality.
The variance is not expressed in the same units as the observations. In other words, the variance is hard to understand because the deviations from the mean are squared, making it too large for logical explanation. This problem can be solved by working with the square root of the variance, which is called the standard deviation.
Standard Deviation: Both variance and standard deviation provide the same information; one can always be obtained from the other. In other words, the process of computing a standard deviation always involves computing a variance. Since standard deviation is the square root of the variance, it is always expressed in the same units as the raw data:
$$S = \sqrt{\text{Variance}} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
For a large data set (more than 30, say) with a roughly bell-shaped distribution, approximately 68% of the data will fall within one standard deviation of the mean, 95% within two standard deviations, and 99.7% (or almost 100%) within three standard deviations (S) of the mean.
You may use this Applet to compute the mean, and standard deviation.
Standard Error: Standard error is a statistic indicating the accuracy of an estimate. That is, it tells us how different the estimate (such as $\bar{x}$) is likely to be from the population parameter (such as $\mu$). It is, therefore, the standard deviation of the sampling distribution of the estimator, such as the $\bar{x}$'s.
Coefficient of Variation: Coefficient of Variation (CV) is the relative deviation with respect to the size of $\bar{x}$:
$$CV = 100 \left| \frac{S}{\bar{x}} \right| \%$$
CV is independent of the unit of measurement. In estimation of a parameter when CV is less than say 10%, the estimate is assumed acceptable. The inverse of CV; namely 1/CV is called the Signal-to-noise Ratio.
The coefficient of variation is used to represent the relationship of the standard deviation to the mean, telling how representative the mean is of the numbers from which it came. It expresses the standard deviation as a percentage of the mean; i.e., it reflects the variation in a distribution relative to the mean.
You may use this Applet to compute the mean, standard deviation and the coefficient of variation.
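To illustrate why the CV is independent of the unit of measurement, the sketch below computes it for the same made-up heights expressed in centimeters and in meters. The helper name and the data are assumptions for this example; the standard deviation uses the n - 1 divisor, as in the variance formula above.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """CV as a percentage: 100 * |S / mean|, with S using the n - 1 divisor."""
    return 100 * stdev(data) / abs(mean(data))

heights_cm = [150, 160, 170, 180, 190]        # made-up data
heights_m = [h / 100 for h in heights_cm]     # same data in another unit

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# The standard deviation changes with the unit, the CV does not.
print("SD in cm:", round(stdev(heights_cm), 3), " SD in m:", round(stdev(heights_m), 5))
print("CV in cm:", round(cv_cm, 2), "%  CV in m:", round(cv_m, 2), "%")
print("signal-to-noise ratio (1/CV, CV as a fraction):", round(100 / cv_cm, 2))
```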
Z Score: how many standard deviations a given point (i.e. observations) is above or below the mean. In other words, a Z score represents the number of standard deviations that an observation (x) is above or below the mean. The larger the Z value, the further away a value will be from the mean. Note that values beyond three standard deviations are very unlikely. Note that if a Z score is negative, the observation (x) is below the mean. If the Z score is positive, the observation (x) is above the mean. The Z score is found as:
$$Z = \frac{x - \bar{x}}{\text{standard deviation of } X}$$
The Z score is a measure of the number of standard deviations that an observation is above or below the mean. Since the standard deviation is never negative, a positive Z score indicates that the observation is above the mean, and a negative Z score indicates that the observation is below the mean. Note that Z is a dimensionless value, and therefore is a useful measure by which to compare data values from two different populations, even those measured in different units.
Z-Transformation: Applying the formula $z = (X - \mu) / \sigma$ will always produce a transformed variable with a mean of zero and a standard deviation of one. However, the shape of the distribution will not be affected by the transformation. If X is not normal then the transformed distribution will not be normal either. In the following SPSS command variable x is transformed to zx.
descriptives variables=x(zx)
You have heard the terms z value, z test, z transformation, and z score. Do all of these terms mean the same thing? Certainly not:
The z value refers to the critical value (a point on the horizontal axis) of the Normal (0, 1) density function, for a given area to the left of that z-value.
The z test refers to the procedures for testing the equality of the mean(s) of one (or two) population(s).
The z score of a given observation x in a sample of size n is simply (x - average of the sample) divided by the standard deviation of the sample.
The z transformation of a set of observations of size n is simply (each observation - average of all observations) divided by the standard deviation among all observations. The aim is to produce a transformed data set with a mean of zero and a standard deviation of one. This makes the transformed set dimensionless and manageable with respect to its magnitudes. It is also used in comparing several data sets measured using different scales of measurement.
Pearson coined the term "standard deviation" sometime near 1900. The idea of using squared deviations goes back to Laplace in the early 1800's.
Finally, notice again that transforming raw scores to z scores does NOT normalize the data.
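A short check of this point: the sketch below z-transforms a small skewed data set and confirms that the mean becomes 0 and the standard deviation 1, while the skewness is unchanged. The skewness helper uses the (n - 1) S^3 denominator; the function names and the data are assumptions for illustration.

```python
from statistics import mean, stdev

def z_scores(data):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

def skewness(data):
    """Sum of cubed deviations divided by (n - 1) * S**3."""
    n, m, s = len(data), mean(data), stdev(data)
    return sum((x - m) ** 3 for x in data) / ((n - 1) * s ** 3)

raw = [1, 2, 3, 6]          # small right-skewed sample
z = z_scores(raw)

print("z scores:", [round(v, 3) for v in z])
print("mean of z:", round(mean(z), 12), " SD of z:", round(stdev(z), 12))
print("skewness before:", round(skewness(raw), 4), " after:", round(skewness(z), 4))
```

The transformation only shifts and rescales the data, so a skewed distribution stays exactly as skewed as it was.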
Guess a Distribution to Fit Your Data: Skewness & Kurtosis
A pair of statistical measures, skewness and kurtosis, is a measuring tool used in selecting a distribution(s) to fit your data. To make an inference with respect to the population distribution, you may first compute skewness and kurtosis from your random sample from the entire population. Then, locating a point with these coordinates on some widely used Skewness-Kurtosis Charts (available from your instructor upon request), guess a couple of possible distributions to fit your data. Finally, you might use the goodness-of-fit test to rigorously come up with the best candidate fitting your data. Removing outliers improves both skewness and kurtosis.
Skewness: Skewness is a measure of the degree to which the sample deviates from symmetry with the mean at the center.
$$\text{Skewness} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)\, S^3}, \quad n \ge 2.$$
Skewness will take on a value of zero when the distribution is a symmetrical curve. A positive value indicates the observations are clustered more to the left of the mean with most of the extreme values to the right of the mean. A negative skewness indicates clustering to the right. In this case we have: Mean ≤ Median ≤ Mode. The reverse order holds for the observations with positive skewness.
Kurtosis: Kurtosis is a measure of the relative peakedness of the curve defined by the distribution of the observations.
$$\text{Kurtosis} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n - 1)\, S^4}, \quad n \ge 2.$$
Standard normal distribution has kurtosis of +3. A kurtosis larger than 3 indicates the distribution is more peaked than the standard normal distribution, and a kurtosis value less than 3 means that the distribution is flatter than the standard normal distribution.
Coefficient of Excess Kurtosis = Kurtosis - 3.
Skewness and kurtosis can be used to check for normality via the Jarque-Bera statistic. For large n (say, over 30), under the normality condition the quantity
$$n \left[ \frac{\text{Skewness}^2}{6} + \frac{(\text{Kurtosis} - 3)^2}{24} \right]$$
follows a chi-square distribution with d.f. = 2.
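The Jarque-Bera quantity can be computed directly from the skewness and kurtosis formulas given in this section. The sketch below is an illustration under stated assumptions: the helper names are made up, the sample is simulated with a fixed seed, and the p-value uses the fact that the upper tail probability of a chi-square distribution with 2 degrees of freedom is exp(-x/2).

```python
import random
from math import exp
from statistics import mean, stdev

def skewness(data):
    n, m, s = len(data), mean(data), stdev(data)
    return sum((x - m) ** 3 for x in data) / ((n - 1) * s ** 3)

def kurtosis(data):
    n, m, s = len(data), mean(data), stdev(data)
    return sum((x - m) ** 4 for x in data) / ((n - 1) * s ** 4)

def jarque_bera(data):
    """Return (JB statistic, approximate p-value from chi-square with 2 df)."""
    n = len(data)
    jb = n * (skewness(data) ** 2 / 6 + (kurtosis(data) - 3) ** 2 / 24)
    p_value = exp(-jb / 2)      # P(chi-square with 2 df > jb)
    return jb, p_value

rng = random.Random(1)                          # fixed seed for repeatability
sample = [rng.gauss(0, 1) for _ in range(200)]  # simulated normal data
jb, p = jarque_bera(sample)
print(f"JB = {jb:.3f}, approximate p-value = {p:.3f}")
```

A small p-value would cast doubt on normality; the approximation is only meant for large samples.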
Further Reading:
Tabachnick B., and L. Fidell, Using Multivariate Statistics, HarperCollins, 1996. Has a good discussion on applications and significance tests for skewness and kurtosis.
Computation of Descriptive Statistics for Grouped/Ungrouped Data
One of the most common ways to describe a single variable is with a frequency distribution. Depending on the particular variable, all of the data values may be represented, or you may group the values into categories first (e.g., with age, it would usually not be sensible to determine the frequencies for each value; rather, the values are grouped into ranges and the frequency of each range is determined). Frequency distributions can be depicted in two ways, as a table or as a graph (this type of graph is often referred to as a histogram or bar chart). Grouped data are derived from raw data and consist of frequencies (counts of raw values) tabulated with the classes in which they occur. The Class Limits represent the largest (Upper) and lowest (Lower) values which the class will contain. The formulas for the descriptive statistics become much simpler for grouped data, as shown below for the Mean, Variance, and Standard Deviation, respectively, where f is the frequency of each class, x is the value representing the class, and n is the sum of all f's:
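$$\bar{x} = \frac{\sum f x}{n}$$
$$S^2 = \frac{\sum f x^2 - \left( \sum f x \right)^2 / n}{n - 1}$$
$$S = \sqrt{S^2}$$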
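As a hedged illustration of descriptive statistics for grouped data, the sketch below works from a made-up frequency table of (class value, frequency) pairs, using one representative value per class (for example the class midpoint, which is an assumption here). It uses the n - 1 divisor, consistent with the variance defined earlier, and cross-checks the result against the expanded raw values.

```python
from math import sqrt
from statistics import mean, variance

# Made-up frequency table: (representative class value x, frequency f)
classes = [(5, 2), (15, 5), (25, 8), (35, 4), (45, 1)]

n = sum(f for _x, f in classes)                  # n is the sum of all f's
sum_fx = sum(f * x for x, f in classes)
sum_fx2 = sum(f * x ** 2 for x, f in classes)

grouped_mean = sum_fx / n
grouped_var = (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
grouped_sd = sqrt(grouped_var)

# Cross-check: expand the table into raw values and use the ungrouped formulas.
expanded = [x for x, f in classes for _ in range(f)]

print("grouped: ", grouped_mean, round(grouped_var, 3), round(grouped_sd, 3))
print("expanded:", mean(expanded), round(variance(expanded), 3))
```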
Numerical Example & Discussions
A Numerical Example: Given the following, small (n = 4) data set, compute the descriptive statistics: x1 = 1, x2 = 2, x3 = 3, and x4 = 6.
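| i | $x_i$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ | $(x_i - \bar{x})^3$ | $(x_i - \bar{x})^4$ |
|---|---|---|---|---|---|
| 1 | 1 | -2 | 4 | -8 | 16 |
| 2 | 2 | -1 | 1 | -1 | 1 |
| 3 | 3 | 0 | 0 | 0 | 0 |
| 4 | 6 | 3 | 9 | 27 | 81 |
| Sum | 12 | 0 | 14 | 18 | 98 |

The mean $\bar{x}$ is 12 / 4 = 3, the variance is $S^2$ = 14 / 3 = 4.67, the standard deviation is $S = (14/3)^{0.5}$ = 2.16, the skewness is $18 / [3 (2.16)^3]$ = 0.5952, and finally, the kurtosis is $98 / [3 (2.16)^4]$ = 1.5.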
You may use this Applet to check your hand computation.
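The hand computation can also be checked with a few lines of Python. This sketch simply applies the formulas of this section (with the n - 1 divisor) to the four data values; the variable names are illustrative.

```python
from math import sqrt

x = [1, 2, 3, 6]
n = len(x)

x_bar = sum(x) / n
dev = [xi - x_bar for xi in x]                 # deviations from the mean

variance = sum(d ** 2 for d in dev) / (n - 1)
sd = sqrt(variance)
skewness = sum(d ** 3 for d in dev) / ((n - 1) * sd ** 3)
kurtosis = sum(d ** 4 for d in dev) / ((n - 1) * sd ** 4)

print("mean     =", x_bar)
print("variance =", round(variance, 2))
print("sd       =", round(sd, 2))
print("skewness =", round(skewness, 4))
print("kurtosis =", round(kurtosis, 4))
```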
A Short Discussion
Deviations about the mean $\mu$ of a distribution are the basis for most of the statistical tests we will learn. Since we are measuring how much a set of scores is dispersed about the mean $\mu$, we are measuring variability. We can calculate the deviations about the mean $\mu$ and express them as variance $\sigma^2$ or standard deviation $\sigma$. It is very important to have a firm grasp of this concept because it will be a central concept throughout your statistics course.
Both variance $\sigma^2$ and standard deviation $\sigma$ measure variability within a distribution. Standard deviation $\sigma$ is a number that indicates how much on average each of the values in the distribution deviates from the mean $\mu$ (or center) of the distribution. Keep in mind that variance $\sigma^2$ measures the same thing as standard deviation $\sigma$ (dispersion of scores in a distribution). Variance $\sigma^2$, however, is the average of the squared deviations about the mean $\mu$. Thus, variance $\sigma^2$ is the square of the standard deviation $\sigma$.
Expected value and variance of $\bar{x}$ are $\mu$ and $\sigma^2 / n$, respectively.
Expected value and variance of $S^2$ are $\sigma^2$ and $2\sigma^4 / (n-1)$, respectively.
$\bar{x}$ and $S^2$ are the best estimators for $\mu$ and $\sigma^2$. They are Unbiased (you may update your estimate); Efficient (they have the smallest variation among other estimators); Consistent (increasing sample size provides a better estimate); and Sufficient (you do not need to have the whole data set; what you need are $\sum x_i$ and $\sum x_i^2$ for estimations). Note also that the above variance of $S^2$ is justified only in the case where the population distribution tends to be normal; otherwise one may use bootstrapping techniques.
In general, it is believed that the mode, median, and mean go from lower to higher in positively skewed data sets, and follow just the opposite pattern in negatively skewed data sets. However, this is not always so. For example, in the following 23 numbers, mean = 2.87 and median = 3, but the data are positively skewed:
4 2 7 6 4 3 5 3 1 3 1 2 4 3 1 2 1 1 5 2 2 3 1
and the following 10 numbers have mean = median = mode = 4, but the data set is left skewed:
1 2 3 4 4 4 5 5 6 6
Note also that most commercial software packages do not compute skewness and kurtosis in the same way as the formulas given above. There is no easy way to determine confidence intervals about a computed skewness or kurtosis value from a small to medium sample. The literature gives tables based on asymptotic methods for sample sets larger than 100 for normal distributions only.
You may have noticed that using the above numerical example on some computer packages such as SPSS, the skewness and the kurtosis are different from what we have computed. For example, the SPSS output for the skewness is 1.190. However, for a large sample size n, the results are nearly identical.
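One plausible explanation for the difference is a small-sample adjustment. The sketch below compares the skewness formula used in this section with a commonly used bias-adjusted version, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations. That this adjusted formula is what a given package reports is an assumption here, although it does reproduce the 1.190 quoted above for the four-value example.

```python
from statistics import mean, stdev

def skewness_simple(data):
    """Formula used in this section: sum of cubed deviations / ((n - 1) * S**3)."""
    n, m, s = len(data), mean(data), stdev(data)
    return sum((x - m) ** 3 for x in data) / ((n - 1) * s ** 3)

def skewness_adjusted(data):
    """Bias-adjusted version (assumed here to be what some packages report)."""
    n, m, s = len(data), mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

small = [1, 2, 3, 6]
print(round(skewness_simple(small), 4), round(skewness_adjusted(small), 3))

# The two formulas differ by the factor n / (n - 2), which approaches 1 as n grows.
large = small * 250          # n = 1000, same shape
print(round(skewness_simple(large), 4), round(skewness_adjusted(large), 4))
```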
Reference and Further Readings:
David H., Early Sample Measures of Variability, Statistical Science, 13, 1998, 368-377. This article provides a good historical account of statistical measures.
Groeneveld R., A class of quantile measures for kurtosis, The American Statistician, 325, Nov. 1998.
Hosking J. R. M., Moments or L moments? An example comparing two measures of distributional shape, The American Statistician, Vol. 46, 186-189, 1992.
Multinomial Distributions: Expected Value, Variance, Standard Deviation, & Coefficient of Variation
A multinomial random variable is an extended binomial. The difference is that in a multinomial case, there are more than two possible outcomes. There is a fixed number of independent outcomes, with a given probability for each outcome. The expected value (i.e., the average) is:
$$\mu = \sum_i x_i \, p_i$$
where $x_i$ is the value of outcome i and $p_i$ is its probability.
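The variance is:
$$\sigma^2 = \sum_i (x_i - \mu)^2 \, p_i$$
where $\mu$ is the expected value, $x_i$ is the value of outcome i, and $p_i$ is its probability.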
The variance is not expressed in the same units as the expected value. So, the variance is hard to understand and explain as a result of the squared term in its computation. This can be alleviated by working with the square root of the variance, which is called the Standard Deviation:
$$\sigma = \sqrt{\text{Variance}}$$
Both variance and standard deviation provide the same information and, therefore, one can always be obtained from the other. In other words, the process of computing standard deviation always involves computing the variance. Since standard deviation is the square root of the variance, it is always expressed in the same units as the expected value.
For a dynamic process, Volatility as a measure of risk takes into account the time period over which the standard deviation is computed. The Volatility measure is defined as the standard deviation divided by the square root of the time duration.
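A minimal sketch of this definition in Python, taking the sentence above literally; the function name and the numbers are hypothetical.

```python
import math

def volatility(std_dev, duration):
    """Standard deviation divided by the square root of the time duration."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return std_dev / math.sqrt(duration)

# Hypothetical: a standard deviation of 12 computed over 4 time periods.
print(volatility(12.0, 4.0))  # 6.0
```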
Coefficient of Variation: The Coefficient of Variation (CV) is the size of the standard deviation relative to the expected value, which is defined as:
$$\mathrm{CV} = 100 \times \left| \frac{\sigma}{\mu} \right| \, \%$$
where $\sigma$ is the standard deviation and $\mu$ is the expected value. Notice that the CV is independent of the unit in which the expected value is measured. The coefficient of variation demonstrates the relationship between standard deviation and expected value, by expressing the risk as a percentage of the expected value. The inverse of CV (namely 1/CV) is called the Signal-to-Noise Ratio.
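The following Python sketch shows one way to compute the CV and its inverse, assuming the CV is reported as a percentage of the absolute expected value and the signal-to-noise ratio is taken as the inverse of the CV written as a fraction. The function names and figures are hypothetical.

```python
def coefficient_of_variation(std_dev, expected_value):
    """CV as a percentage: 100 * |standard deviation / expected value|."""
    if expected_value == 0:
        raise ValueError("CV is undefined when the expected value is zero")
    return 100.0 * abs(std_dev / expected_value)

def signal_to_noise(std_dev, expected_value):
    """Inverse of the CV, with the CV taken as a fraction rather than a percentage."""
    if std_dev == 0:
        raise ValueError("ratio is undefined when the standard deviation is zero")
    return abs(expected_value / std_dev)

# Two hypothetical investments: (expected value, standard deviation) in dollars.
cv_a = coefficient_of_variation(20.0, 40.0)
cv_b = coefficient_of_variation(30.0, 100.0)
print("CV of A:", cv_a, "%  CV of B:", cv_b, "%")

# Changing the unit (dollars to cents) leaves the CV unchanged.
cv_a_cents = coefficient_of_variation(2000.0, 4000.0)
print("CV of A in cents:", cv_a_cents, "%")
```

In this example, investment B has the larger standard deviation but the smaller CV, so it carries less risk per unit of expected value.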
For an application of this section visit Risk Assessment: How Good Is Your Decision?
You may use Multinomial Distributions: Expected Value, Variance, Standard Deviation, & Coefficient of Variation for checking your computation and performing computer-assisted experimentation.
Parameters' Estimation and Quality of a 'Good' Estimate
Estimation is the process by which sample data are used to indicate the value of an unknown quantity in a population. Results of estimation can be expressed as a single value, known as a point estimate, or a range of values, referred to as a confidence interval.
Whenever we use point estimation, we calculate the margin of error associated with that point estimate. For example, for the estimation of the population mean µ, the margin of error is calculated as follows: ±1.96 SE($\bar{x}$), where SE($\bar{x}$) is the standard error of the sample mean $\bar{x}$.
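As a hedged illustration, the margin of error for a sample mean can be sketched in Python. This assumes the standard error is estimated as the sample standard deviation divided by the square root of n, and that 1.96 is the normal multiplier for roughly 95% confidence; for small samples a t multiplier would usually be more appropriate. The function name and the sample values are hypothetical.

```python
import math
import statistics

def margin_of_error(sample, z=1.96):
    """Return z * SE(mean), with SE estimated as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return z * standard_error

# Hypothetical measurements.
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]
mean = statistics.mean(sample)
moe = margin_of_error(sample)
print(f"point estimate: {mean:.2f}, margin of error: ±{moe:.2f}")
```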
In newspaper and television reports on public opinion polls, the margin of error is the margin of "sampling error". There are many nonsampling errors that can and do affect the accuracy of polls. Here we talk about sampling error only. Because subgroups have a larger sampling error than the full sample, a report should include the following statement: "Other sources of error include but are not limited to, individuals refusing to participate in the interview and inability to connect with the selected number. Every feasible effort is made to obtain a response and reduce the error, but the reader (or the viewer) should be aware that some error is inherent in all research."
To estimate means to esteem (to give value to). An estimator is any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean is an estimator of the population mean µ.
Estimators of population parameters are sometimes distinguished from the true value by using the symbol 'hat'. For example, the true population standard deviation σ is estimated (from a sample) by $\hat{\sigma}$, the estimated population standard deviation.
Example: The usual estimator of the population mean is $\bar{x} = \sum x_i / n$, where n is the size of the sample and x1, x2, x3, ..., xn are the values of the sample. If the value of the estimator in a particular sample is found to be 5, then 5 is the estimate of the population mean µ.
A "Good" estimator is the one which provides an estimate with the following qualities:
Unbiasedness: An estimate is said to be an unbiased estimate of a given parameter when the expected value of the estimator can be shown to be equal to the parameter being estimated. For example, the mean of a sample is an unbiased estimate of the mean of the population from which the sample was drawn. Unbiasedness is a good quality for an estimate since, in such a case, a weighted average of several estimates provides a better estimate than any one of those estimates. Therefore, unbiasedness allows us to upgrade our estimates. For example, if your estimates of the population mean µ are, say, 10 and 11.2 from two independent samples of sizes 20 and 30 respectively, then the estimate of the population mean µ based on both samples is [20 (10) + 30 (11.2)] / (20 + 30) = 10.72.
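The pooling of two estimates in the example can be checked with a short Python sketch. It assumes each estimate is weighted by its sample size; the function name `pooled_estimate` is hypothetical.

```python
def pooled_estimate(estimates, sizes):
    """Weighted average of estimates, each weighted by its sample size."""
    if len(estimates) != len(sizes) or not sizes:
        raise ValueError("need one sample size per estimate")
    return sum(e * n for e, n in zip(estimates, sizes)) / sum(sizes)

# Estimates 10 and 11.2 from samples of sizes 20 and 30.
pooled = pooled_estimate([10, 11.2], [20, 30])
print(round(pooled, 2))  # 10.72
```

The pooled value lies closer to 11.2 than to 10 because the larger sample carries more weight.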
Consistency: The standard deviation of an estimate is called the standard error of that estimat