Research Methods In Psychology
Saul Mcleod, PhD
BSc (Hons) Psychology, MRes, PhD, University of Manchester
Saul Mcleod, Ph.D., is a qualified psychology teacher with over 18 years experience of working in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.
Learn about our Editorial Process
Olivia Guy-Evans, MSc
Associate Editor for Simply Psychology
BSc (Hons) Psychology, MSc Psychology of Education
Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.
Research methods in psychology are systematic procedures used to observe, describe, predict, and explain behavior and mental processes. They include experiments, surveys, case studies, and naturalistic observations, ensuring data collection is objective and reliable to understand and explain psychological phenomena.
Hypotheses are statements about the prediction of the results, that can be verified or disproved by some kind of investigation.
There are four types of hypotheses :
- Null Hypotheses (H0 ) – these predict that no difference will be found in the results between the conditions. Typically these are written ‘There will be no difference…’
- Alternative Hypotheses (Ha or H1) – these predict that there will be a significant difference in the results between the two conditions. This is also known as the experimental hypothesis.
- One-tailed (directional) hypotheses – these state the specific direction the researcher expects the results to move in, e.g. higher, lower, more, less. In a correlation study, the predicted direction of the correlation can be either positive or negative.
- Two-tailed (non-directional) hypotheses – these state that a difference will be found between the conditions of the independent variable but does not state the direction of a difference or relationship. Typically these are always written ‘There will be a difference ….’
All research has an alternative hypothesis (either a one-tailed or two-tailed) and a corresponding null hypothesis.
Once the research is conducted and results are found, psychologists must accept one hypothesis and reject the other.
So if a difference is found, the Psychologist would accept the alternative hypothesis and reject the null. The opposite applies if no difference is found.
Sampling is the process of selecting a representative group from the population under study.
A sample is the participants you select from a target population (the group you are interested in) to make generalisations about.
Representative means the extent to which a sample mirrors a researcher’s target population and reflects its characteristics.
Generalisability means the extent to which their findings can be applied to the larger population of which their sample was a part.
- Volunteer sample : where participants pick themselves through newspaper adverts, noticeboards or online.
- Opportunity sampling : also known as convenience sampling , uses people who are available at the time the study is carried out and willing to take part. It is based on convenience.
- Random sampling : when every person in the target population has an equal chance of being selected. An example of random sampling would be picking names out of a hat.
- Systematic sampling : when a system is used to select participants. Picking every Nth person from all possible participants. N = the number of people in the research population / the number of people needed for the sample.
- Stratified sampling : when you identify the subgroups and select participants in proportion to their occurrences.
- Snowball sampling : when researchers find a few participants, and then ask them to find participants themselves and so on.
- Quota sampling : when researchers will be told to ensure the sample fits with certain quotas, for example they might be told to find 90 participants, with 30 of them being unemployed.
Experiments always have an independent and dependent variable .
- The independent variable is the one the experimenter manipulates (the thing that changes between the conditions the participants are placed into). It is aassumed to have a direct effect on the dependent variable.
- The dependent variable is the thing being measured, or the results of the experiment.
Operationalization of variables means making them measurable/quantifiable. We must use operationalization to ensure that variables are in a form that can be easily tested.
For instance, we can’t really measure ‘happiness’ but we can measure how many times a person smiles within a two hour period.
By operationalizing variables, we make it easy for someone else to replicate our research. Remember, this is important because we can check if our findings are reliable.
Extraneous variables are all variables, which are not the independent variable, but could affect the results of the experiment.
It can be a natural characteristic of the participant, such as intelligence levels, gender, or age for example, or it could be a situational feature of the environment such as lighting or noise.
Demand characteristics are a type of extraneous variable that occurs if the participants work out the aims of the research study, they may begin to behave in a certain way.
For example, in Milgram’s research , critics argued that participants worked out that the shocks were not real and they administered them as they thought this was what was required of them.
Extraneous variables must be controlled so that they do not affect (confound) the results.
Randomly allocating participants to their conditions or using a matched pairs experimental design can help to reduce participant variables.
Situational variables are controlled by using standardized procedures, ensuring every participant in a given condition is treated in the same way
Experimental design refers to how participants are allocated to each condition of the independent variable, such as a control or experimental group.
- Independent design ( between-groups design ): each participant is selected for only one group. With the independent design, the most common way of deciding which participants go into which group is by means of randomization.
- Matched participants design : each participant is selected for only one group, but the participants in the two groups are matched for some relevant factor or factors (e.g. ability; sex; age).
- Repeated measures design ( within groups) : each participant appears in both groups, so that there are exactly the same participants in each group.
- The main problem with the repeated measures design is that there may well be order effects. Their experiences during the experiment may change the participants in various ways.
- They may perform better when they appear in the second group because they have gained useful information about the experiment or about the task. On the other hand, they may perform less well on the second occasion because of tiredness or boredom.
- Counterbalancing is the best way of preventing order effects from disrupting the findings of an experiment, and involves ensuring that each condition is equally likely to be used first and second by the participants
If we wish to compare two groups with respect to a given independent variable, it is essential to make sure that the two groups do not differ in any other important way.
All experimental methods involve an iv (independent variable) and dv (dependent variable)..
- Lab Experiments are conducted in a well-controlled environment, not necessarily a laboratory, and therefore accurate and objective measurements are possible. The researcher decides where the experiment will take place, at what time, with which participants, in what circumstances, using a standardized procedure.
- Field experiments are conducted in the everyday (natural) environment of the participants. The experimenter still manipulates the IV, but in a real-life setting. It may be possible to control extraneous variables, though such control is more difficult than in a lab experiment.
- Natural experiments are when a naturally occurring IV is investigated that isn’t deliberately manipulated, it exists anyway. Participants are not randomly allocated, and the natural event may only occur rarely.
Case studies are in-depth investigations of a person, group, event, or community. It uses information from a range of sources, such as from the person concerned and also from their family and friends.
Many techniques may be used such as interviews, psychological tests, observations and experiments. Case studies are generally longitudinal: in other words, they follow the individual or group over an extended period of time.
Case studies are widely used in psychology and among the best-known ones carried out were by Sigmund Freud . He conducted very detailed investigations into the private lives of his patients in an attempt to both understand and help them overcome their illnesses.
Case studies provide rich qualitative data and have high levels of ecological validity. However, it is difficult to generalize from individual cases as each one has unique characteristics.
Correlation means association; it is a measure of the extent to which two variables are related. One of the variables can be regarded as the predictor variable with the other one as the outcome variable.
Correlational studies typically involve obtaining two different measures from a group of participants, and then assessing the degree of association between the measures.
The predictor variable can be seen as occurring before the outcome variable in some sense. It is called the predictor variable, because it forms the basis for predicting the value of the outcome variable
Relationships between variables can be displayed on a graph or as a numerical score called a correlation coefficient.
- If an increase in one variable tends to be associated with an increase in the other, then this is known as a positive correlation .
- If an increase in one variable tends to be associated with a decrease in the other, then this is known as a negative correlation .
- A zero correlation occurs when there is no relationship between variables.
After looking at the scattergraph, if we want to be sure that a significant relationship does exist between the two variables, a statistical test of correlation can be conducted, such as Spearman’s rho.
The test will give us a score, called a correlation coefficient . This is a value between 0 and 1, and the closer to 1 the score is, the stronger the relationship between the variables. This value can be both positive e.g. 0.63, or negative -0.63.
A correlation between variables, however, does not automatically mean that the change in one variable is the cause of the change in the values of the other variable. A correlation only shows if there is a relationship between variables.
Correlation does not always prove causation, as a third variable may be involved.
Interviews are commonly divided into two types: structured and unstructured.
- Structured interviews are formal. The interview situation is standardized as far as possible. Structured interviews are formal, like job interviews. A fixed, predetermined set of questions is put to every participant in the same order and in the same way. Responses are recorded on a questionnaire, and the researcher presets the order and wording of questions, and sometimes the range of alternative answers. The interviewer stays within their role and maintains social distance from the interviewee.
- Unstructured interviews are informal, like casual conversations. A general conversation normally precedes them, and the researcher deliberately adopts an informal approach in an attempt to break down social barriers. There are no set questions, and the participant can raise whatever topics he/she feels are relevant and ask them in their own way. Questions are posed about participants’ answers to the subject Unstructured interviews are most useful in qualitative research to analyze attitudes and values. Though they rarely provide a valid basis for generalization, their main advantage is that they enable the researcher to probe social actors’ subjective point of view.
Questionnaires can be thought of as a kind of written interview. They can be carried out face to face, by telephone, or post.
The choice of questions is important because of the need to avoid bias or ambiguity in the questions, ‘leading’ the respondent or causing offense.
- Open questions are designed to encourage a full, meaningful answer using the subject’s own knowledge and feelings. They provide insights into feelings, opinions, and understanding. Example: “How do you feel about that situation?”
- Closed questions can be answered with a simple “yes” or “no” or specific information, limiting the depth of response. They are useful for gathering specific facts or confirming details. Example: “Do you feel anxious in crowds?”
- Postal questionnaires seem to offer the opportunity of getting around the problem of interview bias by reducing the personal involvement of the researcher. Its other practical advantages are that it is cheaper than face-to-face interviews and can be used to contact many respondents scattered over a wide area relatively quickly.
There are different types of observation methods :
- Covert observation is where the researcher doesn’t tell the participants that they are being observed until after the study is complete. There could be ethical problems or deception and consent with this particular observation method.
- Overt observation is where a researcher tells the participants that they are being observed and what they are being observed for.
- Controlled : behavior is observed under controlled laboratory conditions (e.g., Bandura’s Bobo doll study).
- Natural : Here, spontaneous behavior is recorded in a natural setting.
- Participant : Here, the observer has direct contact with the group of people they are observing. The researcher becomes a member of the group they are researching.
- Non-participant (aka “fly on the wall): The researcher does not have direct contact with the people being observed. The observation of participants’ behavior is from a distance
A pilot study is a small scale preliminary study conducted in order to evaluate the feasibility of the key s teps in a future, full-scale project.
A pilot study is an initial run-through of the procedures to be used in an investigation; it involves selecting a few people and trying out the study on them. It is possible to save time, and in some cases, money, by identifying any flaws in the procedures designed by the researcher.
A pilot study can help the researcher spot any ambiguities (i.e. unusual things) or confusion in the information given to participants or problems with the task devised.
Sometimes the task is too hard, and the researcher may get a floor effect, because none of the participants can score at all or can complete the task – all performances are low.
The opposite effect is a ceiling effect, when the task is so easy that all achieve virtually full marks or top performances and are “hitting the ceiling”.
In cross-sectional research , a researcher compares multiple segments of the population at the same time
Sometimes we want to see how people change over time, as in studies of human development and lifespan. Longitudinal research is a research design in which data-gathering is administered repeatedly over an extended period of time.
In cohort studies , the participants must share a common factor or characteristic such as age, demographic, or occupation. A cohort study is a type of longitudinal study in which researchers monitor and observe a chosen population over an extended period.
Triangulation means using more than one research method to improve the validity of the study.
Reliability is a measure of consistency, if a particular measurement is repeated and the same result is obtained then it is described as being reliable.
- Test-retest reliability : assessing the same person on two different occasions which shows the extent to which the test produces the same answers.
- Inter-observer reliability : the extent to which there is an agreement between two or more observers.
A meta-analysis is a systematic review that involves identifying an aim and then searching for research studies that have addressed similar aims/hypotheses.
This is done by looking through various databases and then decisions are made about what studies are to be included/excluded.
Strengths: Increases the conclusions’ validity as they’re based on a wider range.
Weaknesses: Research designs in studies can vary so they are not truly comparable.
A researcher submits an article to a journal. The choice of the journal may be determined by the journal’s audience or prestige.
The journal selects two or more appropriate experts (psychologists working in a similar field) to peer review the article without payment. The peer reviewers assess: the methods and designs used, originality of the findings, the validity of the original research findings and its content, structure and language.
Feedback from the reviewer determines whether the article is accepted. The article may be: Accepted as it is, accepted with revisions, sent back to the author to revise and re-submit or rejected without the possibility of submission.
The editor makes the final decision whether to accept or reject the research report based on the reviewers comments/ recommendations.
Peer review is important because it prevent faulty data from entering the public domain, it provides a way of checking the validity of findings and the quality of the methodology and is used to assess the research rating of university departments.
Peer reviews may be an ideal, whereas in practice there are lots of problems. For example, it slows publication down and may prevent unusual, new work being published. Some reviewers might use it as an opportunity to prevent competing researchers from publishing work.
Some people doubt whether peer review can really prevent the publication of fraudulent research.
The advent of the internet means that a lot of research and academic comment is being published without official peer reviews than before, though systems are evolving on the internet where everyone really has a chance to offer their opinions and police the quality of research.
Types of Data
- Quantitative data is numerical data e.g. reaction time or number of mistakes. It represents how much or how long, how many there are of something. A tally of behavioral categories and closed questions in a questionnaire collect quantitative data.
- Qualitative data is virtually any type of information that can be observed and recorded that is not numerical in nature and can be in the form of written or verbal communication. Open questions in questionnaires and accounts from observational studies collect qualitative data.
- Primary data is first-hand data collected for the purpose of the investigation.
- Secondary data is information that has been collected by someone other than the person who is conducting the research e.g. taken from journals, books or articles.
Validity means how well a piece of research actually measures what it sets out to, or how well it reflects the reality it claims to represent.
Validity is whether the observed effect is genuine and represents what is actually out there in the world.
- Concurrent validity is the extent to which a psychological measure relates to an existing similar measure and obtains close results. For example, a new intelligence test compared to an established test.
- Face validity : does the test measure what it’s supposed to measure ‘on the face of it’. This is done by ‘eyeballing’ the measuring or by passing it to an expert to check.
- Ecological validit y is the extent to which findings from a research study can be generalized to other settings / real life.
- Temporal validity is the extent to which findings from a research study can be generalized to other historical times.
Features of Science
- Paradigm – A set of shared assumptions and agreed methods within a scientific discipline.
- Paradigm shift – The result of the scientific revolution: a significant change in the dominant unifying theory within a scientific discipline.
- Objectivity – When all sources of personal bias are minimised so not to distort or influence the research process.
- Empirical method – Scientific approaches that are based on the gathering of evidence through direct observation and experience.
- Replicability – The extent to which scientific procedures and findings can be repeated by other researchers.
- Falsifiability – The principle that a theory cannot be considered scientific unless it admits the possibility of being proved untrue.
A significant result is one where there is a low probability that chance factors were responsible for any observed difference, correlation or association in the variables tested.
If our test is significant, we can reject our null hypothesis and accept our alternative hypothesis.
If our test is not significant, we can accept our null hypothesis and reject our alternative hypothesis. A null hypothesis is a statement of no effect.
In Psychology, we use p < 0.05 (as it strikes a balance between making a type I and II error) but p < 0.01 is used in tests that could cause harm like introducing a new drug.
A type I error is when the null hypothesis is rejected when it should have been accepted (happens when a lenient significance level is used, an error of optimism).
A type II error is when the null hypothesis is accepted when it should have been rejected (happens when a stringent significance level is used, an error of pessimism).
- Informed consent is when participants are able to make an informed judgment about whether to take part. It causes them to guess the aims of the study and change their behavior.
- To deal with it, we can gain presumptive consent or ask them to formally indicate their agreement to participate but it may invalidate the purpose of the study and it is not guaranteed that the participants would understand.
- Deception should only be used when it is approved by an ethics committee, as it involves deliberately misleading or withholding information. Participants should be fully debriefed after the study but debriefing can’t turn the clock back.
- All participants should be informed at the beginning that they have the right to withdraw if they ever feel distressed or uncomfortable.
- It causes bias as the ones that stayed are obedient and some may not withdraw as they may have been given incentives or feel like they’re spoiling the study. Researchers can offer the right to withdraw data after participation.
- Participants should all have protection from harm . The researcher should avoid risks greater than those experienced in everyday life and they should stop the study if any harm is suspected. However, the harm may not be apparent at the time of the study.
- Confidentiality concerns the communication of personal information. The researchers should not record any names but use numbers or false names though it may not be possible as it is sometimes possible to work out who the researchers were.
- SAVE ARTICLE
Overview – Research Methods
Research methods are how psychologists and scientists come up with and test their theories. The A level psychology syllabus covers several different types of studies and experiments used in psychology as well as how these studies are conducted and reported:
- Types of psychological studies (including experiments , observations , self-reporting , and case studies )
- Scientific processes (including the features of a study , how findings are reported , and the features of science in general )
- Data handling and analysis (including descriptive statistics and different ways of presenting data ) and inferential testing
Note: Unlike all other sections across the 3 exam papers, research methods is worth 48 marks instead of 24. Not only that, the other sections often include a few research methods questions, so this topic is the most important on the syllabus!
Example question: Design a matched pairs experiment the researchers could conduct to investigate differences in toy preferences between boys and girls. [12 marks]
Types of study
There are several different ways a psychologist can research the mind, including:
Each of these methods has its strengths and weaknesses. Different methods may be better suited to different research studies.
The experimental method looks at how variables affect outcomes. A variable is anything that changes between two situations ( see below for the different types of variables ). For example, Bandura’s Bobo the doll experiment looked at how changing the variable of the role model’s behaviour affected how the child played.
Experiments can be designed in different ways, such as:
- Independent groups: Participants are divided into two groups. One group does the experiment with variable 1, the other group does the experiment with variable 2. Results are compared.
- Repeated measures: Participants are not divided into groups. Instead, all participants do the experiment with variable 1, then afterwards the same participants do the experiment with variable 2. Results are compared.
A matched pairs design is another form of independent groups design. Participants are selected. Then, the researchers recruit another group of participants one-by-one to match the characteristics of each member of the original group. This provides two groups that are relevantly similar and controls for differences between groups that might skew results. The experiment is then conducted as a normal independent groups design.
Types of experiment
Laboratory vs. field experiment.
Experiments are carried out in two different types of settings:
- E.g. Bandura’s Bobo the doll experiment or Asch’s conformity experiments
- E.g. Bickman’s study of the effects of uniforms on obedience
Strengths of laboratory experiment over field experiment:
The controlled environment of a laboratory experiment minimises the risk of other variables outside the researchers’ control skewing the results of the trial, making it more clear what (if any) the causal effects of a variable are. Because the environment is tightly controlled, any changes in outcome must be a result of a change in the variable.
Weaknesses of laboratory experiment over field experiment:
However, the controlled nature of a laboratory experiment might reduce its ecological validity . Results obtained in an artificial environment might not translate to real-life. Further, participants may be influenced by demand characteristics : They know they are taking part in a test, and so behave how they think they’re expected to behave rather than how they would naturally behave.
Natural and quasi experiment
Natural experiments are where variables vary naturally. In other words, the researcher can’t or doesn’t manipulate the variables . There are two types of natural experiment:
- E.g. studying the effect a change in drug laws (variable) has on addiction
- E.g. studying differences between men (variable) and women (variable)
The observational method looks at and examines behaviour. For example, Zimbardo’s prison study observed how participants behaved when given certain social roles.
An observational study will use behavioural categories to prioritise which behaviours are recorded and ensure the different observers are consistent in what they are looking for.
For example, a study of the effects of age and sex on stranger anxiety in infants might use the following behavioural categories to organise observational data:
Rather than writing complete descriptions of behaviours, the behaviours can be coded into categories. For example, IS = interacted with stranger, and AS = avoided stranger. Researchers can also create numerical ratings to categorise behaviour, like the anxiety rating example above.
Inter-observer reliability : In order for observations to produce reliable findings, it is important that observers all code behaviour in the same way. For example, researchers would have to make it very clear to the observers what the difference between a ‘3’ on the anxiety scale above would be compared to a ‘7’. This inter-observer reliability avoids subjective interpretations of the different observers skewing the findings.
Event and time sampling
Because behaviour is constant and varied, it may not be possible to record every single behaviour during the observation period. So, in addition to categorising behaviour , study designers will also decide when to record a behaviour:
- Event sampling: Counting how many times the participant behaves in a certain way.
- Time sampling: Recording participant behaviour at regular time intervals. For example, making notes of the participant’s behaviour after every 1 minute has passed.
Note: Don’t get event and time sampling confused with participant sampling , which is how researchers select participants to study from a population.
Types of observation
Naturalistic vs. controlled.
Observations can be made in either a naturalistic or a controlled setting:
- E.g. setting up cameras in an office or school to observe how people interact in those environments
- E.g. Ainsworth’s strange situation or Zimbardo’s prison study
Covert vs. overt
Observations can be either covert or overt :
- E.g. setting up hidden cameras in an office
- E.g. Zimbardo’s prison study
Participant vs. non-participant
In observational studies, the researcher/observer may or may not participate in the situation being observed:
- E.g. in Zimbardo’s prison study , Zimbardo played the role of prison superintendent himself
- E.g. in Bandura’s Bobo the doll experiment and Ainsworth’s strange situation , the observers did not interact with the children being observed
Self-report methods get participants to provide information about themselves. Information can be obtained via questionnaires or interviews .
Types of self-report
A questionnaire is a standardised list of questions that all participants in a study answer. For example, Hazan and Shaver used questionnaires to collate self-reported data from participants in order to identify correlations between attachment as infants and romantic attachment as adults.
Questions in a questionnaire can be either open or closed :
- >8 hours
- E.g. “How did you feel when you thought you were administering a lethal shock?” or “What do you look for in a romantic partner and why?”
Strengths of questionnaires:
- Quantifiable: Closed questions provide quantifiable data in a consistent format, which enables to statistically analyse information in an objective way.
- Replicability: Because questionnaires are standardised (i.e. pre-set, all participants answer the same questions), studies involving them can be easily replicated . This means the results can be confirmed by other researchers, strengthening certainty in the findings.
Weaknesses of questionnaires:
- Biased samples: Questionnaires handed out to people at random will select for participants who actually have the time and are willing to complete the questionnaire. As such, the responses may be biased towards those of people who e.g. have a lot of spare time.
- Dishonest answers: Participants may lie in their responses – particularly if the true answer is something they are embarrassed or ashamed of (e.g. on controversial topics or taboo topics like sex)
- Misunderstanding/differences in interpretation: Different participants may interpret the same question differently. For example, the “are you religious?” example above could be interpreted by one person to mean they go to church every Sunday and pray daily, whereas another person may interpret religious to mean a vague belief in the supernatural.
- Less detail: Interviews may be better suited for detailed information – especially on sensitive topics – than questionnaires. For example, participants are unlikely to write detailed descriptions of private experiences in a questionnaire handed to them on the street.
In an interview , participants are asked questions in person. For example, Bowlby interviewed 44 children when studying the effects of maternal deprivation.
Interviews can be either structured or unstructured :
- Structured interview: Questions are standardised and pre-set. The interviewer asks all participants the same questions in the same order.
- Unstructured interview: The interviewer discusses a topic with the participant in a less structured and more spontaneous way, pursuing avenues of discussion as they come up.
Interviews can also be a cross between the two – these are called semi-structured interviews .
Strengths of interviews:
- More detail: Interviews – particularly unstructured interviews conducted by a skilled interviewer – enable researchers to delve deeper into topics of interest, for example by asking follow-up questions. Further, the personal touch of an interviewer may make participants more open to discussing personal or sensitive issues.
- Replicability: Structured interviews are easily replicated because participants are all asked the same pre-set list of questions. This replicability means the results can be confirmed by other researchers, strengthening certainty in the findings.
Weaknesses of interviews:
- Lack of quantifiable data: Although unstructured interviews enable researchers to delve deeper into interesting topics, this lack of structure may produce difficulties in comparing data between participants. For example, one interview may go down one avenue of discussion and another interview down a different avenue. This qualitative data may make objective or statistical analysis difficult.
- Interviewer effects : The interviewer’s appearance or character may bias the participant’s answers. For example, a female participant may be less comfortable answering questions on sex asked by a male interviewer and and thus give different answers than if she were asked by a female interviewer.
Note: This topic is A level only, you don’t need to learn about case studies if you are taking the AS exam only.
Case studies are detailed investigations into an individual, a group of people, or an event. For example, the biopsychology page describes a case study of a young boy who had the left hemisphere of his brain removed and the effects this had on his language skills.
In a case study, researchers use many of the methods described above – observation , questionnaires , interviews – to gather data on a subject. However, because case studies are studies of a single subject, the data they provide is primarily qualitative rather than quantitative . This data is then used to build a case history of the subject. Researchers then interpret this case history to draw their conclusions.
Types of case study
Typical vs. unusual cases.
Most case studies focus on unusual individuals, groups, and events.
Many case studies are longitudinal . This means they take place over an extended time period, with researchers checking in with the subject at various intervals. For example, the case study of the boy who had his left hemisphere removed collected data on the boy’s language skills at ages 2.5, 4, and 14 to see how he progressed.
Strengths of case studies:
- Provides detailed qualitative data: Rather than focusing on one or two aspects of behaviour at a single point in time (e.g. in an experiment ), case studies produce detailed qualitative data.
- Allows for investigation into issues that may be impractical or unethical to study otherwise. For example, it would be unethical to remove half a toddler’s brain just to experiment , but if such a procedure is medically necessary then researchers can use this opportunity to learn more about the brain.
Weaknesses of case studies:
- Lack of scientific rigour: Because case studies are often single examples that cannot be replicated , the results may not be valid when applied to the general population.
- Researcher bias: The small sample size of case studies also means researchers need to apply their own subjective interpretation when drawing conclusions from them. As such, these conclusions may be skewed by the researcher’s own bias and not be valid when applied more generally. This criticism is often directed at Freud’s psychoanalytic theory because it draws heavily on isolated case studies of individuals.
This section looks at how science works more generally – in particular how scientific studies are organised and reported . It also covers ways of evaluating a scientific study.
Study features and design
Studies will usually have an aim . The aim of a study is a description of what the researchers are investigating and why . For example, “to investigate the effect of SSRIs on symptoms of depression” or “to understand the effect uniforms have on obedience to authority”.
Studies seek to test a hypothesis . The experimental/alternate hypothesis of a study is a testable prediction of what the researchers expect to happen.
- E.g. “That SSRIs will reduce symptoms of depression” or “subjects are more likely to comply when orders are issued by someone wearing a uniform”.
- E.g. “That SSRIs have no effect on symptoms on depression” or “subject conformity will be the same when orders are issued by someone wearing a uniform as when orders are issued by someone bot wearing a uniform”
Either the experimental/alternate hypothesis or the null hypothesis will be supported by the results of the experiment.
It’s often not possible or practical to conduct research on everyone your study is supposed to apply to. So, researchers use sampling to select participants for their study.
- E.g. all humans, all women, all men, all children, etc.
- E.g. 10,000 humans, 200 women from the USA, children at a certain school
For example, the target population (i.e. who the results apply to) of Asch’s conformity experiments is all humans – but Asch didn’t conduct the experiment on that many people! Instead, Asch recruited 123 males and generalised the findings from this sample to the rest of the population.
Researchers choose from different sampling techniques – each has strengths and weaknesses.
The random sampling method involves selecting participants from a target population at random – such as by drawing names from a hat or using a computer program to select them. This method means each member of the population has an equal chance of being selected and thus is not subject to any bias.
Strengths of random sampling:
- Unbiased: Selecting participants by random chance reduces the likelihood that researcher bias will skew the results of the study.
- Representative: If participants are selected at random – particularly if the sample size is large – it is likely that the sample will be representative of the population as a whole. For example, if the ratio of men:women in a population is 50:50 and participants are selected at random, it is likely that the sample will also have a ratio of men to women that is 50:50.
Weaknesses of random sampling:
- Impractical: It’s often impractical/impossible to include all members of a target population for selection. For example, it wouldn’t be feasible for a study on women to include the name of every woman on the planet for selection. But even if this was done, the randomly selected women may not agree to take part in the study anyway.
The systematic sampling method involves selecting participants from a target population by selecting them at pre-set intervals. For example, selecting every 50th person from a list, or every 7th, or whatever the interval is.
Strengths of systematic sampling:
- Unbiased and representative: Like random sampling , selecting participants according to a numerical interval provides an objective means of selecting participants that prevents researcher bias being able to skew the sample. Further, because the sampling method is independent of any particular characteristic (besides the arbitrary characteristic of the participant’s order in the list) this sample is likely to be representative of the population as a whole.
Weaknesses of systematic sampling:
- Unexpected bias: Some characteristics could occur more or less frequently at certain intervals, making a sample that is selected based on that interval biased. For example, houses tend to be have even numbers on one side of a road and odd numbers on the other. If one side of the road is more expensive than the other and you select every 4th house, say, then you will only select even numbers from one side of the road – and this sample may not be representative of the road as a whole.
The stratified sampling method involves dividing the population into relevant groups for study, working out what percentage of the population is in each group, and then randomly sampling the population according to these percentages.
For example, let’s say 20% of the population is aged 0-18, and 50% of the population is aged 19-65, and 30% of the population is aged >65. A stratified sample of 100 participants would randomly select 20x 0-18 year olds, 50x 19-65 year olds, and 30x people over 65.
Strengths of stratified sampling:
- Representative: The stratification is deliberately designed to yield a sample that is representative of the population as a whole. You won’t get people with certain characteristics being over- or under-represented within the sample.
- Unbiased: Because participants within each group are selected randomly , researcher bias is unable to skew who is included in the study.
Weaknesses of stratified sampling:
- Requires knowledge of population breakdown: Researchers need to accurately gauge what percentage of the population falls into what group. If the researchers get these percentages wrong, the sample will be biased and some groups will be over- or under-represented.
Opportunity and volunteer sampling
The opportunity and volunteer sampling methods:
- E.g. Approaching people in the street and asking them to complete a questionnaire.
- E.g. Placing an advert online inviting people to complete a questionnaire.
Strengths of opportunity and volunteer sampling:
- Quick and easy: Approaching participants ( opportunity sampling) or inviting participants ( volunteer sampling) is quick and straightforward. You don’t have to spend time compiling details of the target population (like in e.g. random or systematic sampling ), nor do you have to spend time dividing participants according to relevant categories (like in stratified sampling ).
- May be the only option: With natural experiments – where a variable changes as a result of something outside the researchers’ control – opportunity sampling may be the only viable sampling method. For example, researchers couldn’t randomly sample 10 cities from all the cities in the world and change the drug laws in those cities to see the effects – they don’t have that kind of power. However, if a city is naturally changing its drug laws anyway, researchers could use opportunity sampling to study that city for research.
Weaknesses of opportunity and volunteer sampling:
- Unrepresentative: The pool of participants will likely be biased towards certain kinds of people. For example, if you conduct opportunity sampling on a weekday at 10am, this sample will likely exclude people who are at work. Similarly, volunteer sampling is likely to exclude people who are too busy to take part in the study.
Independent vs. dependent variables
If the study involves an experiment , the researchers will alter an independent variable to measure its effects on a dependent variable :
- E.g. In Bickman’s study of the effects of uniforms on obedience , the independent variable was the uniform of the person giving orders.
- E.g. In Bickman’s study of the effects of uniforms on obedience , the dependent variable was how many people followed the orders.
Extraneous and confounding variables
In addition to the variables actually being investigated ( independent and dependent ), there may be additional (unwanted) variables in the experiment. These additional variables are called extraneous variables .
Researchers must control for extraneous variables to prevent them from skewing the results and leading to false conclusions. When extraneous variables are not properly controlled for they are known as confounding variables .
For example, if you’re studying the effect of caffeine on reaction times, it might make sense to conduct all experiments at the same time of day to prevent this extraneous variable from confounding the results. Reaction times change throughout the day and so if you test one group of subjects at 3pm and another group right before they go to bed, you may falsely conclude that the second group had slower reaction times.
Operationalisation of variables
Operationalisation of variables is where researchers clearly and measurably define the variables in their study.
For example, an experiment on the effects of sleep ( independent variable ) on anxiety ( dependent variable ) would need to clearly operationalise each variable. Sleep could be defined by number of hours spent in bed, but anxiety is a bit more abstract and so researchers would need to operationalise (i.e. define) anxiety such that it can be quantified in a measurable and objective way.
If variables are not properly operationalised, the experiment cannot be properly replicated , experimenters’ subjective interpretations may skew results, and the findings may not be valid .
A pilot study is basically a practice run of the proposed research project. Researchers will use a small number of participants and run through the procedure with them. The purpose of this is to identify any problems or areas for improvement in the study design before conducting the research in full. A pilot study may also give an early indication of whether the results will be statistically significant .
For example, if a task is too easy for participants, or it’s too obvious what the real purpose of an experiment is, or questions in a questionnaire are ambiguous, then the results may not be valid . Conducting a pilot study first may save time and money as it enables researchers to identify and address such issues before conducting the full study on thousands of participants.
Features of a psychological report.
The report of a psychological study (research paper) typically contains the following sections in the following order:
- Title: A short and clear description of the research.
- Abstract: A summary of the research. This typically includes the aim and hypothesis , methods, results, and conclusion.
- Introduction: Funnel technique: Broad overview of the context (e.g. current theories, previous studies, etc.) before focusing in on this particular study, why it was conducted, its aims and hypothesis .
- Study design: This will explain what method was used (e.g. experiment or observation ), how the study was designed (e.g. independent groups or repeated measures ), and identification and operationalisation of variables .
- Participants: A description of the target population to be studied, the sampling method , how many participants were included.
- Equipment used: A description of any special equipment used in the study and how it was used.
- Standardised procedure: A detailed step-by-step description of how the study was conducted. This allows for the study to be replicated by other researchers.
- Controls : An explanation of how extraneous variables were controlled for so as to generate accurate results.
- Results: A presentation of the key findings from the data collected. This is typically written summaries of the raw data ( descriptive statistics ), which may also be presented in tables , charts, graphs , etc. The raw data itself is typically included in appendices.
- Discussion: An explanation of what the results mean and how they relate to the experimental hypothesis (supporting or contradicting it), any issues with how results were generated, how the results fit with other research, and suggestions for future research.
- Conclusion: A short summary of the key findings from the study.
- Book: Milgram, S., 2010. Obedience to Authority . 1st ed. Pinter & Martin.
- Journal article: Bandura, A., Ross, D. and Ross, S., 1961. Transmission of Aggression through Imitation of Aggressive Models . The Journal of Abnormal and Social Psychology, 63(3), pp.575-582.
- Appendices: This is where you put any supporting materials that are too detailed or long to include in the main report. For example, the raw data collected from a study, or the complete list of questions in a questionnaire .
Peer review is a way of assessing the scientific credibility of a research paper before it is published in a scientific journal. The idea with peer review is to prevent false ideas and bad research from being accepted as fact.
It typically works as follows: The researchers submit their paper to the journal they want it to be published in, and the editor of that journal sends the paper to expert reviewers (i.e. psychologists who are experts in that area – the researchers’ ‘peers’) who evaluate the paper’s scientific validity. The reviewers may accept the paper as it is, accept it with a few changes, reject it and suggest revisions and resubmission at a later date, or reject it completely.
There are several different methods of peer review:
- Open review: The researchers and the reviewers are known to each other.
- Single-blind: The researchers do not know the names of the reviewers. This prevents the researchers from being able to influence the reviewer. This is the most common form of peer review.
- Double-blind: The researchers do not know the names of the reviewers, and the reviewers do not know the names of the researchers. This additionally prevents the reviewer’s bias towards the researcher from influencing their decision whether to accept their paper or not.
Criticisms of peer review:
- Bias: There are several ways peer review can be subject to bias. For example, academic research (particularly in niche areas) takes place among a fairly small circle of people who know each other and so these relationships may affect publication decisions. Further, many academics are funded by organisations and companies that may prefer certain ideas to be accepted as scientifically legitimate, and so this funding may produce conflicts of interest.
- Doesn’t always prevent fraudulent/bad research from being published: There are many examples of fraudulent research passing peer review and being published (see this Wikipedia page for examples).
- Prevents progress of new ideas: Reviewers of papers are typically older and established academics who have made their careers within the current scientific paradigm. As such, they may reject new or controversial ideas simply because they go against the current paradigm rather than because they are unscientific.
- Plagiarism: In single-blind and double-blind peer reviews, the reviewer may use their anonymity to reject or delay a paper’s publication and steal the good ideas for themself.
- Slow: Peer review can mean it takes months or even years between the researcher submitting a paper and its publication.
In psychological studies, ethical issues are questions of what is morally right and wrong. An ethically-conducted study will protect the health and safety of the participants involved and uphold their dignity, privacy, and rights.
To provide guidance on this, the British Psychological Association has published a code of human research ethics :
- Participants are told the project’s aims , the data being collected, and any risks associated with participation.
- Participants have the right to withdraw or modify their consent at any time.
- Researchers can use incentives (e.g. money) to encourage participation, but these incentives can’t be so big that they would compromise a participant’s freedom of choice.
- Researchers must consider the participant’s ability to consent (e.g. age, mental ability, etc.)
- Prior (general) consent: Informing participants that they will be deceived without telling them the nature of the deception. However, this may affect their behaviour as they try to guess the real nature of the study.
- Retrospective consent: Informing participants that they were deceived after the study is completed and asking for their consent. The problem with this is that if they don’t consent then it’s too late.
- Presumptive consent: Asking people who aren’t participating in the study if they would be willing to participate in the study. If these people would be willing to give consent, then it may be reasonable to assume that those taking part in the study would also give consent.
- Confidentiality: Personal data obtained about participants should not be disclosed (unless the participant agreed to this in advance). Any data that is published will not be publicly identifiable as the participant’s.
- Debriefing: Once data gathering is complete, researchers must explain all relevant details of the study to participants – especially if deception was involved. If a study might have harmed the individual (e.g. its purpose was to induce a negative mood), it is ethical for the debrief to address this harm (e.g. by inducing a happy mood) so that the participant does not leave the study in a worse state than when they entered.
Study results are reliable if the same results can be consistently replicated under the same circumstances. If results are inconsistent then the study is unreliable.
Note: Just because a study is reliable, its results are not automatically valid . A broken tape measure may reliably (i.e. consistently) record a person’s height as 200m, but that doesn’t mean this measurement is accurate.
There are several ways researchers can assess a study’s reliability:
Test-retest is when you give the same test to the same person on two different occasions. If the results are the same or similar both times, this suggests they are reliable.
For example, if your study used scales to measure participants’ weight, you would expect the scales to record the same (or a very similar) weight for the same person in the morning as in the evening. If the scales said the person weighed 100kg more later that same day, the scales (and therefore the results of the study) would be unreliable.
Inter-observer reliability is a way to test the reliability of observational studies .
For example, if your study required observers to assess participants’ anxiety levels, you would expect different observers to grade the same behaviour in the same way. If one observer rated a participant’s behaviour a 3 for anxiety, and another observer rated the exact same behaviour an 8, the results would be unreliable.
Inter-observer reliability can be assessed mathematically by looking for correlation between observers’ scores. Inter-observer reliability can be improved by setting clearly defined behavioural categories .
Study results are valid if they accurately measure what they are supposed to. There are several ways researchers can assess a study’s validity:
- E.g. let’s say you come up with a new test to measure participants’ intelligence levels. If participants scoring highly on your test also scored highly on a standardised IQ test and vice versa, that would suggest your test has concurrent validity because participants’ scores are correlated with a known accurate test.
- E.g. a study that measures participants’ intelligence levels by asking them when their birthday is would not have face validity. Getting participants to complete a standardised IQ test would have greater face validity.
- E.g. let’s say your study was supposed to measure aggression levels in response to someone annoying. If the study was conducted in a lab and the participant knew they were taking part in a study, the results probably wouldn’t have much ecological validity because of the unrealistic environment.
- E.g. a study conducted in 1920 that measured participants’ attitudes towards social issues may have low temporal validity because societal attitudes have changed since then.
Control of extraneous variables
There are several different types of extraneous variables that can reduce the validity of a study. A well-conducted psychological study will control for these extraneous variables so that they do not skew the results.
Demand characteristics are extraneous variables where the demands of a study make participants behave in ways they wouldn’t behave outside of the study. This reduces the study’s ecological validity .
For example, if a participant guesses the purpose of an experiment they are taking part in, they may try to please the researcher by behaving in the ‘right’ way rather than the way they would naturally. Alternatively, the participant might rebel against the study and deliberately try to sabotage it (e.g. by deliberately giving wrong answers).
In some study designs, researchers can control for demand characteristics using single- blind methods. For example, a drug trial could give half the participants the actual drug and the other half a placebo but not tell participants which treatment they received. This way, both groups will have equal demand characteristics and so any differences between them should be down to the drug itself.
Investigator effects are another extraneous variable where the characteristics of the researcher affect the participant’s behaviour. Again, this reduces the study’s ecological validity .
Many characteristics – e.g. the researcher’s age, gender, accent, what they’re wearing – could potentially influence the participant’s responses. For example, in an interview about sex, females may feel less comfortable answering questions asked by a male interviewer and thus give different answers than if they were asked by a female. The researcher’s biases may also come across in their body language or tone of voice, affecting the participant’s responses.
In some study designs, researchers can control for demand characteristics using double- blind methods. In a double-blind drug trial, for example, neither the participants nor the researchers know which participants get the actual drug and which get the placebo. This way, the researcher is unable to give any clues (consciously or unconsciously) to participants that would affect their behaviour.
Participant variables are differences between participants. These can be controlled for by random allocation .
For example, in an experiment on the effect of caffeine on reaction times, participants would be randomly allocated into either the caffeine group or the non-caffeine group. A non -random allocation method, such as allocating caffeine to men and placebo to women, could mean variables in the allocation method (in this case gender) skew the results. When participants are randomly allocated, any extraneous variables (e.g. gender in this case) will be allocated evenly between each group and so not skew the results of one group more than the other.
Situational variables are the environment the experiment is conducted in. These can be controlled for by standardisation .
For example, all the tests of caffeine on reaction times would be conducted in the same room, at the same time of day, using the same equipment, and so on to prevent these features of the environment from skewing the results.
In a repeated measures experiment, researchers may use counterbalancing to control for the order in which tasks are completed.
For example, half of participants would do task A followed by task B, and the other half would do task B followed by task A.
Implications of psychological research for the economy
Psychological research often has practical applications in real life. The following are some examples of how psychological findings may affect the economy:
- Attachment : Bowlby’s maternal deprivation hypothesis suggests that periods of extended separation between mother and child before age 3 are harmful to the child’s psychological development. And if mothers stay at home during this period, they can’t go out to work. However, some more recent research challenges Bowlby’s conclusions, suggesting that substitutes (e.g. the father , or nursery care) can care for the child, allowing the mother to go back to work sooner and remain economically active.
- Depression : Psychological research has found effective therapies for treating depression, such as cognitive behavioural therapy and SSRIs. The benefits of such therapies – if they are effective – are likely to outweigh the costs because they enable the person to return to work and pay taxes, as well avoiding long-term costs to the health service.
- OCD : Similar to above: Drug therapies (e.g. SSRIs) and behavioural approaches (e.g. CBT) may alleviate OCD symptoms, enabling OCD sufferers to return to work, pay taxes, and avoid reliance on healthcare services.
- Memory : Public money is required to fund police investigations. Psychological tools, such as the cognitive interview , have improved the accuracy of eyewitness testimonies, which equates to more efficient use of police time and resources.
Features of science
Theory construction and hypothesis testing.
Science works by making empirical observations of the world, formulating hypotheses /theories that explain these observations, and repeatedly testing these hypotheses /theories via experimentation.
- E.g. A tape measure provides a more objective measurement of something compared to a researcher’s guess. Similarly, a set of scales is a more objective way of determining which of two objects is heavier than a researcher lifting each up and giving their opinion.
- E.g. Burger (2009) replicated Milgram’s experiments with similar results.
- E.g. The hypothesis that “water boils at 100°c” could be falsified by an experiment where you heated water to 999°c and it didn’t boil. In contrast, “everything doubles in size every 10 seconds” could not be falsified by any experiment because whatever equipment you used to measure everything would also double in size.
- Freud’s psychodynamic theories are often criticised for being unfalsifiable: There’s not really any observations that could disprove them because every possible behaviour (e.g. crying or not crying) could be explained as the result of some unconscious thought process.
Philosopher Thomas Kuhn argues that science is not as unbiased and objective as it seems. Instead, the majority of scientists just accept the existing scientific theories (i.e. the existing paradigm) as true and then find data that supports these theories while ignoring/rejecting data that refutes them.
Rarely, though, minority voices are able to successfully challenge the existing paradigm and replace it with a new one. When this happens it is a paradigm shift . An example of a paradigm shift in science is that from Newtonian gravity to Einstein’s theory of general relativity.
Data handling and analysis
Types of data, quantitative vs. qualitative.
Data from studies can be quantitative or qualitative :
- Quantitative: Numerical
- Qualitative: Non-numerical
For example, some quantitative data in the Milgram experiment would be how many subjects delivered a lethal shock. In contrast, some qualitative data would be asking the subjects afterwards how they felt about delivering the lethal shock.
Strengths of quantitative data / weaknesses of qualitative data:
- Can be compared mathematically and scientifically: Quantitative data enables researchers to mathematically and objectively analyse data. For example, mood ratings of 7 and 6 can be compared objectively, whereas qualitative assessments such as ‘sad’ and ‘unhappy’ are hard to compare scientifically.
Weaknesses of quantitative data / strengths of qualitative data:
- Less detailed: In reducing data to numbers and narrow definitions, quantitative data may miss important details and context.
Although the detail of qualitative data may be valuable, this level of detail can also make it hard to objectively or mathematically analyse. Content analysis is a way of analysing qualitative data. The process is as follows:
- E.g. A bunch of unstructured interviews on the topic of childhood
- E.g. Discussion of traumatic events, happy memories, births, and deaths
- E.g. Researchers listen to the unstructured interviews and count how often traumatic events are mentioned
- Statistical analysis is carried out on this data
Primary vs. secondary
Researchers can produce primary data or use secondary data to achieve the research aims of their study:
- Primary data: Original data collected for the study
- Secondary data: Data from another study previously conducted
A meta-analysis is a study of studies. It involves taking several smaller studies within a certain research area and using statistics to identify similarities and trends within those studies to create a larger study.
We have looked at some examples of meta-analyses elsewhere in the course such as Van Ijzendoorn’s meta-analysis of several strange situation studies and Grootheest et al’s meta-analysis of twin studies on OCD .
A good meta-analysis is often more reliable than a regular study because it is based on a larger data set, and any issues with one single study will be balanced out by the other studies.
Measures of central tendency: mean, median, mode.
Mean , median , and mode are measures of central tendency . In other words, they are ways of reducing large data sets into averages .
The mean is calculated by adding all the numbers in a set together and dividing the total by the number of numbers.
- Example set: 22, 78, 3, 33, 90
- The mean is 45.2
- Uses all data in the set.
- Accurate: Provides a precise number based on all the data in a set.
- E.g.: 1, 3, 2, 5, 9, 4, 913 <- the mean is 133.9, but the 913 could be a measurement error or something and thus the mean is not representative of the data set
The median is calculated by arranging all the numbers in a set from smallest to biggest and then finding the number in the middle. Note: If the total number of numbers is odd, you just pick the middle one. But if the total number of numbers is even, you take the mid-point between the two numbers in the middle.
- Example set: 20, 66, 85, 45, 18, 13, 90, 28, 9
- 9, 13, 18, 20, 28 , 45, 66, 85, 90
- The median is 28
- Won’t be skewed by freak scores (unlike the mean).
- E.g.: 1, 1, 3 , 9865, 67914 <- 3 is not really representative of the larger numbers in the set.
- Less accurate/sensitive than the mean.
The mode is calculated by counting which is the most commonly occurring number in a set.
- Example set: 7, 7, 20 , 16, 1, 20 , 25, 16, 20 , 9
- There are two 7’s, but three 20’s
- The mode is 20
- Makes more sense for presenting the central tendency in data sets with whole numbers. For example, the average number of limbs for a human being will have a mean of something like 3.99, but a mode of 4.
- Does not use all the data in a set.
- A data set may have more than one mode.
Measures of dispersion: Range and standard deviation
Range and standard deviation are measures of dispersion . In other words, they quantify how much scores in a data set vary .
The range is calculated by subtracting the smallest number in the data set from the largest number.
- Example set: 59, 8, 7, 84, 9, 49, 14, 75, 88, 11
- The largest number is 88
- The smallest number is 7
- The range is 81
- Easy and quick to calculate: You just subtract one number from another
- Accounts for freak scores (highest and lowest)
- Can be skewed by freak scores: The difference between the biggest and smallest numbers can be skewed by a single anomalous result or error, which may give an exaggerated impression of the data distribution compared to standard deviation .
- 4, 4, 5, 5, 5, 6, 6, 7, 19
- 4, 16, 16, 17, 17, 17, 18, 19 19
The standard deviation (σ) is a measure of how much numbers in a data set deviate from the mean (average). It is calculated as follows:
- Example data set: 59, 79, 43, 42, 81, 100, 38, 54, 92, 62
- Calculate the mean (65)
- -6, 14, -22, -23, 16, 35, -27, -11, 27, -3
- 36, 196, 484, 529, 256, 1225, 729, 121, 729, 9
- The standard deviation is 20.77
Note: This method of standard deviation is based on the entire population. There is a slightly different method for calculating based on a sample where instead of dividing by the number of numbers in the second to last step, you divide by the number of numbers-1 (in this case 4314/9=479.333). This gives a standard deviation of 21.89.
- Is less skewed by freak scores: Standard deviation measures the average difference from the mean and so is less likely to be skewed by a single freak score (compared to the range ).
- Takes longer to calculate than the range .
A percentage (%) describes how much out of 100 something occurs. It is calculated as follows:
- Example: 63 out of a total of 82 participants passed the test
- 76.8% of participants passed the test
To calculate a percentage change, work out the difference between the original number and the after number, divide that difference by the original number, then multiply the result by 100:
- Example: He got 80 marks on the test but after studying he got 88 marks on the test
- His test score increased by 10% after studying
Normal and skewed distributions
A data set that has a normal distribution will have the majority of scores on or near the mean average. A normal distribution is also symmetrical: There are an equal number of scores above the mean as below it. In a normal distribution, scores become rarer and rarer the more they deviate from the mean.
An example of a normal distribution is IQ scores. As you can see from the histogram below, there are as many IQ scores below the mean as there are above the mean :
When plotted on a histogram , data that follows a normal distribution will form a bell-shaped curve like the one above.
Skewed distributions are caused by outliers: Freak scores that throw off the mean . Skewed distributions can be positive or negative :
- Mean > Median > Mode
- Mean < Median < Mode
Correlation refers to how closely related two (or more) things are related. For example, hot weather and ice cream sales may be positively correlated: When hot weather goes up, so do ice cream sales.
Correlations are measured mathematically using correlation coefficients (r). A correlation coefficient will be anywhere between +1 and -1:
- r=+1 means two things are perfectly positively correlated: When one goes up , so does the other by the same amount
- r=-1 means two things perfectly negatively correlated: When one goes up , the other goes down by the same amount
- r=0 means two things are not correlated at all: A change in one is totally independent of a change in the other
The following scattergrams illustrate various correlation coefficients:
Presentation of data
For example, the behavioural categories table above presents the raw data of each student in this made-up study. But in the results section, researchers might include another table that compares average anxiety rating scores for males and females.
For example, each dot on the correlation scattergram opposite could represent a student. The x-axis could represent the number of hours the student studied, and the y-axis could represent the student’s test score.
For example, the results of Loftus and Palmer’s study into the effects of different leading questions on memory could be presented using the bar chart above. It’s not like there are categories in-between ‘contacted’ and ‘hit’, so the bars have gaps between them (unlike a histogram ).
A histogram is a bit like a bar chart but is used to illustrate continuous or interval data (rather than discrete data or whole numbers).
Because the data on the x axis is continuous, there are no gaps between the bars.
For example, the line graph above illustrates 3 different people’s progression in a strength training program over time.
For example, the frequency with which different attachment styles occurred in Ainsworth’s strange situation could be represented by the pie chart opposite.
Probability and significance.
The point of inferential testing is to see whether a study’s results are statistically significant , i.e. whether any observed effects are as a result of whatever is being studied rather than just random chance.
For example, let’s say you are studying whether flipping a coin outdoors increases the likelihood of getting heads. You flip the coin 100 times and get 52 heads and 48 tails. Assuming a baseline expectation of 50:50, you might take these results to mean that flipping the coin outdoors does increase the likelihood of getting heads. However, from 100 coin flips, a ratio of 52:48 between heads and tails is not very significant and could have occurred due to luck. So, the probability that this difference in heads and tails is because you flipped the coin outside (rather than just luck) is low.
Probability is denoted by the symbol p . The lower the p value, the more statistically significant your results are. You can never get a p value of 0, though, so researchers will set a threshold at which point the results are considered statistically significant enough to reject the null hypothesis . In psychology, this threshold is usually <0.05, which means there is a less than 5% chance the observed effect is due to luck and a >95% chance it is a real effect.
Type 1 and type 2 errors
When interpreting statistical significance, there are two types of errors:
- E.g. The p threshold is <0.05, but the researchers’ results are among the 5% of fluke outcomes that look significant but are just due to luck
- E.g. The p threshold is set too low (e.g. <0.01), and the data falls short (e.g. p=<0.02)
Increasing the sample size reduces the likelihood of type 1 and type 2 errors.
Key maths skills made easy!
Types of statistical test
Note: The inferential tests below are needed for A level only, if you are taking the AS exam , you only need to know the sign test .
There are several different types of inferential test in addition to the sign test . Which inferential test is best for a study will depend on the following three criteria:
- Whether you are looking for a difference or a correlation
- E.g. at the competition there were 8 runners, 12 swimmers, and 6 long jumpers (it’s not like there are in-between measurements between ‘swimmer’ and ‘runner’)
- E.g. First, second, and third place in a race
- E.g. Ranking your mood on a scale of 1-10
- E.g. Weights in kg
- E.g. Heights in cm
- E.g. Times in seconds
- Whether the experimental design is related (i.e. repeated measures ) or unrelated (i.e. independent groups )
The following table shows which inferential test is appropriate according to these criteria:
Note: You won’t have to work out all these tests from scratch, but you may need to:
- Say which of the statistical tests is appropriate (i.e. based on whether it’s a difference or correlation; whether the data is nominal, ordinal, or interval; and whether the data is related or unrelated).
- Identify the critical value from a critical values table and use this to say whether a result (which will be given to you in the exam) is statistically significant.
The sign test
The sign test is a way to calculate the statistical significance of differences between related pairs (e.g. before and after in a repeated measures experiment ) of nominal data. If the observed value (s) is equal or less than the critical value (cv), the results are statistically significant.
Example: Let’s say we ran an experiment on 10 participants to see whether they prefer movie A or movie B .
- n = 9 (because even though there are 10 participants, one participant had no change so we exclude them from our calculation)
- In this case our experimental hypothesis is two-tailed: Participants may prefer movie A or movie B
- (The null hypothesis is that participants like both movies equally)
- In this case, let’s say it’s 0.1
- The experimental hypothesis is two-tailed
- So, in this example, our critical value (cv) is 1
- In this example, there are 2 As, so our observed value (s) is 2
- In this example, the observed value (2) is greater than the critical value (1) and so the results are not statistically significant. This means we must accept the null hypothesis and reject the experimental hypothesis .