It is important to know the differences between types of data, because different statistical methods only work with specific types of data. There are many ways to gather data, and some are better than others. After looking at data types, we will talk about different types of data collection, some sampling methods, and briefly introduce some popular experimental designs in statistics.
1. Types of data
1) Independent (= Explanatory) vs. Dependent (= Response)
Example: What combination of diet and exercise will lead to losing the most weight? Three different diets and two different exercise routines were used, with 10 subjects trying each diet-exercise combination. Whether or not a subject lost weight was recorded.
- Independent: exercise and diet. There are two explanatory variables here, as both diet and exercise can be used to explain the response.
- Dependent: whether or not the subject lost weight. This is the outcome being predicted.
2) Qualitative (= Categorical) vs. Quantitative (= Numerical)
– Qualitative: sex, academic year…
– Quantitative: height, weight, number…
3) Continuous (= measured; impossible to count) vs. Discrete (= countable)
– Continuous: can take any value in a range, so it can be expressed as a decimal or fraction; age, height, weight
– Discrete: takes only separate, countable values. Discrete data are a little more difficult to work with because many methods assume a continuous variable. For example, normality tests are generally not suitable for discrete data.
2. Types of data collection
1) Experimental: most biological or medical experiments
– Researchers decide what the independent variables are, control how they are applied, and decide how to measure the dependent variables.
2) Observational: surveys
– Researchers choose which independent variables to record but do not assign or control them; they simply observe the independent and dependent variables as they occur.
– Because nothing is controlled, the apparent relationship between the independent and dependent variables may be affected by unmeasured variables.
3. Design a simple experiment
1) Population vs. Sample
Population and Parameters
– population: all subjects of interest
– parameters: values of interest of the population, such as the average or maximum / represented by Greek letters
* Population parameters are rarely known and are usually estimated from sample statistics.
Samples and Statistics
– samples: subsets of the population of interest
– statistics: values calculated from samples / represented by Latin letters, often with a bar or caret (hat) on top of them.
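To make the parameter vs. statistic distinction concrete, here is a minimal Python sketch. The population of heights, its size, and the sample size are all made-up assumptions for illustration; in practice we almost never have the whole population available.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 subjects (simulated).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from the whole population (rarely possible in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample and used to estimate the parameter.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"population mean (mu)  = {mu:.2f}")
print(f"sample mean (x-bar)   = {x_bar:.2f}")
```

The two printed values should be close but not identical, which is the point: the sample mean is only an estimate of the population mean.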
2) Bias
- Sampling bias: occurs when the sample was not random, or when not every subject of the population had a chance of being selected.
- Response bias: occurs when the subject is influenced to respond a certain way. It is most often discussed for surveys, but response bias can exist even in experimental studies.
- Nonresponse bias: occurs when subjects don’t respond. Nonresponse bias is mainly a concern if a large proportion of those sampled doesn’t respond.
3) Sampling methods
- Simple random sample: Each subject of the population has an equal chance of being selected. However, this does not guarantee that we will avoid bias.
- Stratified: We subset the population into groups, called strata. These strata are mutually exclusive and are determined based on some quality that is easily defined. To obtain the sample, we’ll randomly select a number of subjects from each stratum.
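- Cluster: The population is divided into groups called clusters, which are usually based on convenience. To gather data, we randomly select clusters and measure all subjects in the selected clusters.
- Convenience: Subjects are chosen because they are easy to reach rather than by a random process, so this method is especially prone to sampling bias.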
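The four sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 120 students, the `year` stratum, the `classroom` cluster, and all function names are hypothetical and not part of the original notes.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: 120 students, each with an academic year (stratum)
# and a classroom (cluster). Names and sizes are made up for illustration.
years = ["freshman", "sophomore", "junior", "senior"]
population = [
    {"id": i, "year": years[i % 4], "classroom": i // 10}
    for i in range(120)
]

def simple_random_sample(pop, n):
    """Every subject has the same chance of being selected."""
    return random.sample(pop, n)

def stratified_sample(pop, key, n_per_stratum):
    """Randomly select n_per_stratum subjects from each stratum."""
    strata = {}
    for subject in pop:
        strata.setdefault(subject[key], []).append(subject)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample

def cluster_sample(pop, key, n_clusters):
    """Randomly select whole clusters and keep every subject in them."""
    clusters = sorted({subject[key] for subject in pop})
    chosen = set(random.sample(clusters, n_clusters))
    return [subject for subject in pop if subject[key] in chosen]

def convenience_sample(pop, n):
    """Take whoever is easiest to reach (here: the first n). Not random."""
    return pop[:n]

srs = simple_random_sample(population, 12)
strat = stratified_sample(population, "year", 3)
clus = cluster_sample(population, "classroom", 2)
conv = convenience_sample(population, 12)
```

Note that the stratified sample always contains exactly three students from each year, while the simple random sample may over- or under-represent a year by chance.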
4) Experimental design
A good experimental design will try to capture all known sources of variation. This will help us in understanding how well independent variables predict dependent variables.
- Treatment: the levels (or combinations of levels) of the independent variables that are applied to the experimental units.
- Experimental unit: the smallest possible unit to which the treatment can be applied.
- Replication: when multiple experimental units are given the same treatment.
Let’s talk about commonly used experimental designs.
- Completely randomized design (CRD): all experimental units are randomly assigned to the treatments, regardless of any factors not involved in the experiment.
-> Example: Consider an experiment where we’re trying to determine how much water is needed to grow tomatoes. We have 16 pods with one tomato plant in each. We decide to try four different amounts of water (2 oz, 4 oz, 6 oz, and 8 oz). At the end of the study, the yield of tomatoes will be measured. Given no other information, what type of design is this?
- Completely randomized block design (CRBD): a CRD with a block (or group) added to the design, so that treatments are randomly assigned within each block. The blocks in a CRBD are usually organic in nature, meaning that the researcher cannot assign a subject to a block; rather, the subject comes from that block/group.
-> Example: Consider an experiment where 10 mice were exposed to radiation from cell phones and 10 mice were not. Also, assume that the mice were of four different breeds. Here, breed is the blocking factor.
- Factorial: A factorial design has more than one independent variable. With two independent variables, for example, we are interested not only in each variable on its own but also in how the two act together (called interaction).
-> Example: Determining which combination of diet (Atkins, Paleo, McDonald’s only) and exercise (yoga, high cardio) provides the most weight loss.
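The random assignment behind these designs can be sketched in Python as follows. This is a rough illustration, not a full analysis. The CRD part assumes 4 replications per water amount (16 pods / 4 amounts). The blocked part uses a hypothetical balanced layout of 4 breeds with 4 mice each, which differs from the 20 mice in the example above, and the breed labels are made up.

```python
import random
from itertools import product

random.seed(1)  # fixed seed so the sketch is reproducible

# Completely randomized design: 16 pods, 4 water amounts, 4 replications each
# (the number of replications per amount is an assumption).
water_oz = [2, 4, 6, 8]
pods = list(range(1, 17))
treatments = water_oz * 4       # each amount appears 4 times
random.shuffle(treatments)      # random assignment of amounts to pods
crd = dict(zip(pods, treatments))

# Blocked design: randomize treatments separately within each block.
# Hypothetical layout: 4 breeds (blocks) with 4 mice each.
breeds = ["A", "B", "C", "D"]   # made-up breed labels
blocked = {}
for breed in breeds:
    mice = [f"{breed}{k}" for k in range(1, 5)]
    arms = ["radiation", "control"] * 2   # 2 mice per arm in every block
    random.shuffle(arms)
    blocked.update(zip(mice, arms))

# Factorial design: every combination of the levels of each factor.
diets = ["Atkins", "Paleo", "McDonald's only"]
exercises = ["Yoga", "High cardio"]
combinations = list(product(diets, exercises))   # 3 x 2 = 6 combinations
subjects_needed = len(combinations) * 10         # 10 subjects per combination

print(crd)
print(blocked)
print(combinations, subjects_needed)
```

Shuffling within each breed guarantees that every block receives both treatments, which is what lets the breed-to-breed variation be separated from the treatment effect.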