# Statistics and probability

Statistics is about gaining information from sets of data. Sometimes you want to represent a lot of complicated information from a large data set in a way that is easily understood. This is called *descriptive statistics*.

An example of this is the so-called *worm plot* used in cricket: over the course of a cricket match there can be many hundreds of balls and runs. In the worm plot depicted below, England’s performance is described by the blue line, and the West Indies’ by the green line. You can see at a glance that, although there was variation in the run rate, England consistently scored at a higher rate than the West Indies, and so won the match.

Although some information has been lost (you don't know, for instance, from which balls in the overs the runs were taken), this summary graph clearly displays the most meaningful information. The human mind is very visual, which is why graphics such as graphs or pie charts are very good at conveying statistical information.
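A worm plot is built from cumulative totals: the height of each line after a given over is the number of runs scored so far. The short Python sketch below shows that summarising step. The runs-per-over figures are made up purely for illustration and are not taken from the match in the plot.

```python
from itertools import accumulate

def worm(runs_per_over):
    """Return the cumulative run total after each over (the 'worm')."""
    return list(accumulate(runs_per_over))

# Hypothetical runs scored in each of the first six overs (illustrative only).
team_a = [4, 7, 2, 9, 5, 6]
team_b = [3, 2, 6, 1, 4, 3]

for over, (a, b) in enumerate(zip(worm(team_a), worm(team_b)), start=1):
    print(f"after over {over}: team A {a:3d}, team B {b:3d}")
```

Working with one total per over already discards which ball each run came from, which is exactly the kind of information loss a summary graph accepts.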

The other branch of statistics is called *inferential statistics*. This is used to obtain information about a large set of data from a smaller sample. Think of opinion polls. Here, the statistician randomly selects a group of people, a thousand say, and asks them for their opinion, for example whether or not they like the current government. It is then assumed that the opinion of the sample reflects the opinion of the population as a whole.
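How far can a sample of a thousand people be trusted? A common rule of thumb, assuming a simple random sample and the usual normal approximation, puts the 95% margin of error for a sample proportion p at about 1.96 × √(p(1 − p)/n). The sketch below evaluates that formula; the function name is our own.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from n people.

    Assumes a simple random sample and the normal approximation to the binomial.
    """
    return z * math.sqrt(p * (1 - p) / n)

# The worst case is p = 0.5; with 1000 respondents this is about 0.031,
# i.e. roughly plus or minus 3 percentage points.
print(f"{margin_of_error(0.5, 1000):.3f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.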

To be able to do statistics, you first have to learn how to collect, handle and represent data.

### Probability theory

Statistics is intimately linked to probability theory. You can use statistics to work out the probability (the chance) that a certain event will occur. If you want to know the chance that your holiday plane will crash, you think of how many planes crash in a typical year compared with the enormous number of flights that take off. Since this proportion is very small, you deduce that the chance of your plane crashing is small too. You've done a very simple statistical analysis of the data concerning plane crashes and used it to work out a probability.
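The plane-crash reasoning amounts to estimating a probability by a relative frequency: the number of crashes divided by the number of flights over the same period. The figures below are invented placeholders, not real aviation statistics.

```python
def relative_frequency(events, trials):
    """Estimate a probability as the fraction of trials in which the event occurred."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return events / trials

# Hypothetical numbers, for illustration only.
crashes_last_year = 10
flights_last_year = 30_000_000

p = relative_frequency(crashes_last_year, flights_last_year)
print(f"estimated chance per flight: {p:.2e} (about 1 in {round(1 / p):,})")
```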

But things also work the other way around: you can use abstract probabilities to help you with your stats. Say for example you want to test whether a die that is used in a casino is fair. To do this, you throw the die a great number of times and record the outcomes. You then reason like this: if the die is fair, then each number should be equally likely. There are six numbers, so each number should come up in 1/6 of the cases. What you are doing is comparing the actual die with an *ideal* die: if your casino die does give you each number in roughly 1/6 of all throws, you decide that it is fair.
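"Roughly 1/6" can be made precise with Pearson's chi-squared goodness-of-fit test, one standard way (our choice; the text does not name a method) of comparing observed counts with those expected from an ideal die. The throw counts below are hypothetical; 11.07 is the usual 5% critical value of the chi-squared distribution with 5 degrees of freedom.

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: the sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of faces 1..6 from 600 throws of the casino die.
observed = [95, 108, 102, 91, 110, 94]
n = sum(observed)
expected = [n / 6] * 6  # an ideal die gives each face 1/6 of the time

stat = chi_squared_statistic(observed, expected)
CRITICAL_5_PERCENT_DF5 = 11.07  # 6 faces - 1 = 5 degrees of freedom

print(f"chi-squared = {stat:.2f}")
if stat < CRITICAL_5_PERCENT_DF5:
    print("consistent with a fair die")
else:
    print("evidence that the die is unfair")
```

A statistic below the critical value does not prove the die is fair; it only means these throws give no strong evidence against it.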

Probability theory is important when it comes to evaluating statistics. If in an opinion poll you find that 80% of the people you asked liked the current Prime Minister, it is very tempting to conclude that the Prime Minister is very popular. But how do you know that your result did not just occur by chance? You use probability theory to calculate how likely a result like yours would be if the Prime Minister were not in fact popular: if this probability is very small, you can conclude that the Prime Minister really is popular.
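One way to put a number on "by chance" is to assume, for illustration, that opinion is really split 50/50 and compute the binomial probability of seeing at least 80% approval in a random sample of a given size. The 50/50 baseline and the two sample sizes are our assumptions, not figures from the text; the sketch uses exact fractions to avoid rounding problems.

```python
from fractions import Fraction
from math import comb

def prob_at_least(k, n, p=Fraction(1, 2)):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly with fractions."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 8 out of 10 people approving is not that unusual under a 50/50 split...
small = prob_at_least(8, 10)
print(small, float(small))  # 7/128 0.0546875

# ...but 800 out of 1000 would be astronomically unlikely.
large = prob_at_least(800, 1000)
print(f"{float(large):.1e}")
```

The same 80% is weak evidence from ten people and overwhelming evidence from a thousand, which is why sample size matters so much.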

Statistics, and the underlying theory of probability, are obviously useful for opinion pollsters and professional gamblers, but who else uses them? Here are a few examples:

### Medicine

Stats and probability theory are absolutely essential in medicine as they are used to test new drugs and work out the chance that patients develop side effects from the drugs. Tests are performed on large groups of animals or people and stats is the tool needed to evaluate the tests. It's essential to get it right, for obvious reasons. Even doctors and nurses who don't perform the tests themselves need to be well-versed in stats to understand the results and advise their patients accurately.

Stats and probability theory are also used to assess the risk from things like tobacco and alcohol, and to see how a certain gene affects people. How likely is it that a person with that gene develops a certain illness or characteristic?
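A question like "how likely is it that a person with that gene develops the illness?" is usually answered by comparing the proportion who develop it among carriers and among non-carriers. A minimal sketch of that comparison, the relative risk, with invented counts:

```python
def risk(cases, group_size):
    """Proportion of a group that develops the condition."""
    return cases / group_size

# Hypothetical study: illness counts among carriers and non-carriers of a gene.
carriers_ill, carriers_total = 30, 1_000
others_ill, others_total = 10, 1_000

r_carriers = risk(carriers_ill, carriers_total)
r_others = risk(others_ill, others_total)
relative_risk = r_carriers / r_others

print(f"risk with gene: {r_carriers:.1%}, without: {r_others:.1%}")
print(f"relative risk: {relative_risk:.1f}")
```

A relative risk of 3 can sound alarming even though the absolute risk for carriers in this made-up example is only 3%, which is why both figures matter.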

Medical research cannot do without statistics.

### Social and natural sciences

On the face of it, sciences like psychology, sociology or biology do not seem to have much to do with maths or stats. But all of these have one thing in common: the scientist carries out experiments and observations (how many kids in a class are very shy, how many people from a certain social group take drugs, how many trees in a forest are attacked by a parasite) and then interprets these observations. He or she may deduce that children with overpowering parents are particularly shy, that poverty is linked to drug abuse, or that a parasite is always deadly for a tree.

But to make the observations in the first place, the scientist needs statistics. He or she has to make sure that the experiment is set up correctly, know how to collect the data, and use descriptive statistics to represent them in a meaningful way. To interpret the data, the scientist needs inferential statistics.

Anything that involves observing or describing the world around us uses statistics.

### The financial world

Risk assessment is central to the financial world: what is the probability, or risk, of a company going bankrupt, or of interest rates going up? What is the risk of investing in a company, or of taking on a mortgage? The insurance industry is based on the idea of risk: the chance of your house burning down is quite small, but if it does happen, you lose everything. The insurance company weighs the risk of a fire against the cost of a fire and decides what premium to charge you, so that it still makes a profit even though it sometimes pays out huge amounts.
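The insurer's balancing act can be sketched with an expected value: the average payout per policy is the probability of a claim times its cost, and the premium has to exceed that. The probability, the cost and the 25% "loading" below are all hypothetical; real pricing also has to cover expenses and other costs.

```python
def expected_payout(claim_probability, claim_cost):
    """Average amount paid out per policy per year."""
    return claim_probability * claim_cost

def premium(claim_probability, claim_cost, loading=0.25):
    """Expected payout plus a proportional margin ('loading') for costs and profit.

    The 25% default loading is an illustrative assumption, not an industry figure.
    """
    return expected_payout(claim_probability, claim_cost) * (1 + loading)

# Hypothetical: a 1-in-1000 chance per year of a fire costing £200,000.
p_fire, cost = 1 / 1000, 200_000
print(f"expected payout: £{expected_payout(p_fire, cost):,.2f}")
print(f"premium charged: £{premium(p_fire, cost):,.2f}")
```

Across many thousands of policies, the margin above the expected payout is what lets the insurer absorb an occasional huge claim and still make a profit.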

Risk assessment is very complicated, not least because risk is something very subjective. Even a mere rumour that oil prices will rise can wreak havoc on the stock market; the risk doesn’t even have to be real.

A good understanding of risk, and of how it can be described using statistics and probability, is essential for anyone working in the financial world. Finance is all about uncertainty and risk, and how to reduce them. This is why the financial world is becoming more and more mathematical. Employers in this area often prefer mathematicians and statisticians to people with an economics background.

### Politics

Politics is very much about strategy. How should an election campaign be fought? How should a government deal with other powers? How much money should the health service receive? To find a good strategy, politicians need to understand public opinion, know about the structure of society and assess risks. The government employs many statisticians to help them with this. They can conduct and evaluate a census, and work out the risk of there being an epidemic, or of the world economy plunging.

During the Cold War, *game theory*, which is closely related to probability theory, was used to decide whether the US strategy of arming itself to the teeth to deter an attack from the USSR was effective.

### Reliability theory in manufacturing

When you produce a product, be it a car or a light bulb, you want to know how reliable it is. To find out, you take a sample of your light bulbs or cars and test them. Just as in an opinion poll, you can use statistical methods to gain information about the quality of your product from this sample. Reliability theory has become a very important branch within statistics.
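As an illustration of this kind of calculation, the sketch below estimates the mean lifetime of a light bulb from a small test sample, with an approximate 95% confidence interval, and the share of the sample lasting at least 1000 hours. The lifetimes are made up, and the t-based interval assumes lifetimes are roughly normally distributed; 2.262 is the standard two-sided 95% critical value of Student's t with 9 degrees of freedom.

```python
import math
from statistics import mean, stdev

# Hypothetical hours to failure for a test sample of ten light bulbs.
lifetimes = [820, 1430, 990, 1210, 1650, 760, 1100, 1380, 940, 1220]

n = len(lifetimes)
mean_life = mean(lifetimes)
standard_error = stdev(lifetimes) / math.sqrt(n)

# Approximate 95% confidence interval for the mean lifetime (t with n - 1 = 9 d.f.).
T_975_DF9 = 2.262
lower = mean_life - T_975_DF9 * standard_error
upper = mean_life + T_975_DF9 * standard_error

# Simple empirical estimate: fraction of the sample that lasted 1000 hours or more.
share_over_1000 = sum(h >= 1000 for h in lifetimes) / n

print(f"mean life: {mean_life:.0f} h (95% CI roughly {lower:.0f} to {upper:.0f} h)")
print(f"share of sample lasting 1000 h or more: {share_over_1000:.0%}")
```

The width of the interval shows how much uncertainty remains after testing only ten bulbs; a larger sample narrows it.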

### Law

Statistics is often used in law. Suppose that someone working for a company has been accused of falsifying her expense account: she claims that taking a business partner out to dinner cost her £100, when in reality it cost only £50. There are statistical tests that can be used to see whether the numbers in the woman’s expense account are likely to have been made up.
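The text does not say which test is meant. One widely used idea in fraud detection is Benford's law: in many naturally occurring sets of figures, the leading digit d appears with frequency log10(1 + 1/d), so about 30% of values start with 1 and under 5% start with 9, whereas invented figures often deviate from this pattern. The sketch below compares a list of amounts with Benford's frequencies using a chi-squared statistic. The claims are hypothetical, the function names are our own, and 15.51 is the 5% critical value for 8 degrees of freedom.

```python
import math
from collections import Counter

def leading_digit(amount):
    """First non-zero digit of a positive number, e.g. 730 -> 7, 0.042 -> 4."""
    if amount <= 0:
        raise ValueError("amounts must be positive")
    while amount >= 10:
        amount /= 10
    while amount < 1:
        amount *= 10
    return int(amount)

def benford_share(d):
    """Benford's law: expected share of values whose leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def benford_chi_squared(amounts):
    """Chi-squared distance between the observed leading digits and Benford's law."""
    counts = Counter(leading_digit(a) for a in amounts)
    n = len(amounts)
    total = 0.0
    for d in range(1, 10):
        expected = n * benford_share(d)
        total += (counts[d] - expected) ** 2 / expected
    return total

CRITICAL_5_PERCENT_DF8 = 15.51  # 9 possible digits - 1 = 8 degrees of freedom

# Hypothetical expense claims in pounds, invented for illustration;
# none of them starts with 1, 2, 3 or 4.
claims = [58, 720, 9.5, 66, 810, 77, 90, 640, 88, 53, 6.9, 740, 97, 860, 61, 790]

stat = benford_chi_squared(claims)
verdict = "worth a closer look" if stat > CRITICAL_5_PERCENT_DF8 else "no strong evidence"
print(f"chi-squared = {stat:.1f}: {verdict}")
```

With only sixteen claims the chi-squared approximation is rough, and Benford's law only applies to figures spread over several orders of magnitude, so a result like this is a prompt for investigation rather than proof of fraud.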

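But there is also a lot of controversy around statistics and probability in law, because they can be misused. A few years ago, a woman called Sally Clark was jailed for the murder of her two children. She said that they had both died of cot death, but the jury was told by an "expert witness" that the probability of two children dying of cot death in the same family is extremely low, so they decided that she must have killed them. But this reasoning is flawed. The figure was obtained by multiplying the probability of a single cot death by itself, which assumes that two deaths in the same family are independent events, even though shared genetic or environmental factors could make a second death more likely. And even a small probability of two cot deaths has to be weighed against the probability of the alternative explanation, a double murder, which is also extremely rare. The flaws were recognised later, the woman was released, and the "expert witness", a doctor, was struck off the medical register.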
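The figure quoted to the jury was about 1 in 73 million, obtained by squaring an estimated 1 in 8,543 chance of a single cot death in a family with the Clarks' profile. The sketch below reproduces that multiplication and shows how much the answer changes if a second death is not independent of the first. The conditional probability used for that second case is an invented value, chosen only to illustrate the effect.

```python
# Figure quoted at the trial: chance of one cot death in such a family.
p_single = 1 / 8543

# Multiplying the two probabilities assumes the deaths are independent events.
p_two_independent = p_single * p_single
print(f"assuming independence: 1 in {round(1 / p_two_independent):,}")

# Hypothetical: if shared factors made a second cot death in the same family,
# say, 1 in 100 (an invented value), the joint probability is
# P(first) * P(second | first).
p_second_given_first = 1 / 100
p_two_dependent = p_single * p_second_given_first
print(f"with dependence:       1 in {round(1 / p_two_dependent):,}")
```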

### All of us

Finally, we all need a basic understanding of statistics. The newspapers and TV news are full of statistics that we need to understand. Politicians are often accused of "juggling" numbers (manipulating statistics to express what they want them to express), and every citizen has to be aware of this possibility to be able to decide whether or not to believe the politician.

We also need to make decisions based on risk. In recent years, there has been a lot of discussion about an alleged connection between the MMR vaccine and a disorder called autism. It can be very difficult to decide whom to believe in these cases, but with a basic understanding of statistics you don’t need to rely on someone else telling you what to do: you can make up your own mind. You don’t need to be an expert; a little basic knowledge can go a long way in understanding the numbers you are bombarded with every day.

Date Published: December 30, 2010