Planning a successful clinical study begins with some central questions:

- What question(s) do we want to answer?
- What statistical methods will help us answer these questions?
- What data do we need to collect?
- How will we collect it?
- From whom and from how many?
- What other factors might influence our results?

The answers to all these questions are influenced by an understanding of statistics. When considering your next study, be sure to work with an established, trustworthy, and professional clinical testing team that actively involves a qualified clinical statistician in the planning process.

Most of us will never become experts in statistics; still, careful attention to the principles of statistics helps support the success of a clinical trial. A key component of early planning is to develop a clinical trial that offers the most accurate information. You’ll be rewarded with a more efficient study that provides more compelling data than if statistical analysis had been merely a final step in the study process.

A qualified professional clinical statistician can guide the study planning team in deciding how many subjects to recruit and can help recommend target demographics for the subject population. Because critical decisions about the study population are a key initial step in clinical trial planning, the statistician should be involved in the planning process early.

## Types of Data

Let’s begin with the most basic underpinning for any discussion of statistics and clinical study planning. When planning a clinical study, it’s important to know what kinds of data will be helpful as well as how it will be collected and used. There are many types of data, and some data fits into more than one data category. However, most data can be identified by type in a simple organization like this:

### Quantitative data

Data that can be measured or counted, then expressed numerically. It answers questions like how many, how much, how often, or how big, and it can be used in mathematical calculations. There are two basic types of quantitative data.

#### Continuous data

Continuous data is quantitative data that can take any of an infinite number of possible values within a range, such as measurements of temperature, speed, or height.

#### Discrete data

Discrete data is quantitative data that can be counted individually, such as the number of cars in a garage or the number of pitches thrown in a baseball game.

### Qualitative data

Qualitative data is data that is expressed in words and sometimes in numbers that can’t be used in mathematical calculations. Qualitative data is also known as **categorical data** because it can be organized into categories instead of being measured numerically. There are two basic types of qualitative, or categorical, data.

#### Nominal data

Nominal data is qualitative data that describes characteristics by name, such as hair color, marital status, or ethnicity.

#### Ordinal data

Ordinal data is qualitative data that describes an element’s position in a given sequence, such as grade in school, satisfaction on a scale of 1 to 5, or ranking in a tournament.

Again, qualitative data is sometimes expressed in numbers, but these numbers aren’t useful in mathematical calculations. (For example, three second-graders don’t equal a sixth-grader!) However, qualitative data can be very useful in statistical analysis.
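As a rough illustration of why ordinal values are summarized by counts and position rather than by arithmetic, the sketch below tallies a set of satisfaction responses and reports the middle category. The responses and the helper name `summarize_ordinal` are hypothetical, invented for this example.

```python
from collections import Counter
from statistics import median_low

# Hypothetical satisfaction responses on a 1-to-5 ordinal scale (illustrative only).
responses = [5, 4, 4, 2, 5, 3, 4, 1, 4, 5]

def summarize_ordinal(values):
    """Return the count of each category and the middle (median) category."""
    counts = dict(sorted(Counter(values).items()))
    # median_low always returns a value that actually occurs in the data,
    # so the result is a real category rather than an average of two.
    return counts, median_low(values)

counts, middle = summarize_ordinal(responses)
print(counts)   # how many subjects chose each category
print(middle)   # the category chosen by the "middle" subject
```

Counts and the median category respect the ordering of the scale without assuming that the distance between "2" and "3" equals the distance between "4" and "5".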

## Special Concerns for Qualitative Data

Qualitative data isn’t always subject to objective evaluation and often is derived from opinion. For the purposes of clinical trials, qualitative data may be collected in study intake documents or in subject surveys, diaries, and questionnaires.

One technique used for questionnaire development is the Likert Scale, which asks subjects to respond to statements by selecting answers from a scale including values like “Strongly Agree,” “Agree,” and so on. We cannot control subjects’ candor on these scales. There is also a risk of central tendency bias, a response bias in which people tend to avoid strongly positive or negative answers.

However, there are study planning and data analysis techniques to help control for these would-be weaknesses and strengthen the results of studies that must rely on qualitative data.

## Hypothesis Testing

One central concept in clinical statistics is hypothesis testing. In a seemingly counterintuitive twist, hypothesis testing begins with a “null hypothesis” (expressed mathematically as H0) or the premise that the product does not perform—that there is *no difference* between user experience with and without the product. Then a study is designed to test this hypothesis.

Continuing in the vein of counterintuition, the four possible outcomes in hypothesis testing are described like this:

### True Positive

The product actually makes a difference in the user experience (H0 is false), and the test agrees (rejects H0).

### True Negative

The product actually makes no difference in the user experience (H0 is true) and the test agrees (fails to reject H0).

### False Positive

The product actually makes no difference (H0 is true), but the test shows a difference (rejects H0). This is called a **Type I Error**.

### False Negative

The product actually makes a difference (H0 is false), but the test does not show a difference (fails to reject H0). This is known as a **Type II Error**.

**To summarize:** Type I errors in clinical testing indicate a product effect that does not actually occur, while Type II errors fail to indicate a product effect that does occur. Either type of error can be the more problematic one, depending on the product being tested and the specific characterization of the null hypothesis.
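The two error types can be made concrete with a small simulation. The sketch below is a minimal illustration, not a recommended analysis: it assumes normally distributed measurements with a known standard deviation of 1, a two-sided z-test, a rejection threshold of 0.05, 30 subjects per group, and a true product effect of 1.0 when H0 is false. All of these values and the function names are assumptions chosen for the example.

```python
import random
from statistics import NormalDist, mean

def rejects_null(group_a, group_b, sigma=1.0, alpha=0.05):
    """Two-sided z-test for a difference in means, assuming sigma is known."""
    n_a, n_b = len(group_a), len(group_b)
    standard_error = sigma * (1 / n_a + 1 / n_b) ** 0.5
    z = (mean(group_a) - mean(group_b)) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_difference, n=30, trials=2000, seed=1):
    """Fraction of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n)]
        treated = [rng.gauss(true_difference, 1.0) for _ in range(n)]
        rejections += rejects_null(treated, control)
    return rejections / trials

# H0 is true (no real difference): every rejection is a false positive.
type_1_rate = rejection_rate(true_difference=0.0)
# H0 is false (real difference of 1.0): every non-rejection is a false negative.
type_2_rate = 1 - rejection_rate(true_difference=1.0)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions the Type I rate should land near the chosen threshold, while the Type II rate depends on the size of the real effect and the number of subjects.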

## Study Populations

Clinical studies contain various “populations” (also called sets), which are essentially different ways of sorting data by the subjects’ relationship with the test product and the study procedure. We offer these terms so you’ll recognize them when you hear them mentioned in talks about your study.

## Significance, Power, Variation, and Sample Size

**Statistical significance** is usually assessed with a *p*-value: the probability of observing a result at least as extreme as the one seen if the null hypothesis were true. The significance threshold chosen before the study (often 0.05) sets the accepted chance of a Type I error. **Statistical power** (usually expressed as a percentage) indicates the test’s ability to avoid a Type II error. If statistical significance is desired in a study, the ideal scenario is to obtain a *p*-value as small as possible while also having high statistical power. Variability in data is expressed as **standard deviation**. **Sample size** influences all of these aspects of clinical evaluation and is a critical foundation of quality study design. Sample size is sometimes influenced by other considerations as well, such as the need to avoid bias.

A sample size that is too small can result in insufficient data for reliable analysis; a sample size that is larger than necessary can result in wasted resources. A number of statistical tools help determine the ideal sample size for a given study.
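One widely used starting point, shown here only as a sketch, is the normal-approximation formula for comparing the means of two equal-sized groups: n per group = 2 × ((z for significance + z for power) × standard deviation / expected difference)². The function name, the default threshold of 0.05, and the default power of 80% are assumptions for illustration; a statistician would choose the method and inputs that fit the actual study design.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, sigma, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided, two-group
    comparison of means (normal approximation, equal group sizes)."""
    if effect <= 0 or sigma <= 0:
        raise ValueError("effect and sigma must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # controls Type I error
    z_power = NormalDist().inv_cdf(power)          # controls Type II error
    n = 2 * ((z_alpha + z_power) * sigma / effect) ** 2
    return math.ceil(n)

# Expected difference of 0.5 units with a standard deviation of 1.0.
print(sample_size_per_group(effect=0.5, sigma=1.0))
# A smaller expected difference or a noisier measurement needs more subjects.
print(sample_size_per_group(effect=0.25, sigma=1.0))
```

The formula shows how the pieces interact: more variability, a smaller expected effect, a stricter threshold, or higher desired power all push the required sample size up.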

## Bias

In discussing bias related to clinical study data collection, human factors need particular attention. **Selection bias** occurs when researchers assign a subject to a group based on subjective (and likely subconscious) observations about the subject. **Expectancy bias** occurs when researchers are more likely to observe the results they expect to see.

Bias can also be introduced by the study subject. **Social desirability bias** occurs when subjects provide responses that they think will be viewed favorably by researchers and peers. **Response bias** occurs when subjects respond with only information they want to share.

**Confounding bias** is one of several forms of study bias that do not arise from the human condition. Confounding bias occurs when study data is distorted by an extraneous, unintended variable.

These are just a few of the many forms of bias that can improperly influence clinical study results. Qualified clinical statisticians can help the clinical study planning team to avoid bias from the outset.

The **Safety Analysis Population** includes all study subjects who received at least one exposure to the test product. The **Per Protocol Population** (PP) includes subjects who completed the study according to the instructions. The **Clinically Evaluable Population** includes those whose participation resulted in data that was able to be included in the study results.

The largest study set is the **Intent-to-Treat Population** (ITT), defined as all subjects involved in the study, regardless of randomized group assignment and level of study completion.

## Randomization

You may have noticed above that not all subjects receive the same test product. Randomization is the process that objectively assigns clinical trial subjects, or test sites, to one group or another within the same trial in order to help avoid bias in study data. In a very simple example, a study may require that some subjects be assigned to use Shampoo A and some to use Shampoo B.

**Simple randomization** can be achieved by methods as commonplace as flipping a coin or drawing an odd- or even-numbered card from a deck, but the very simplicity of such methods can result in unbalanced groups within a study.

There are several other, more robust randomization techniques. **Block randomization** keeps subgroup sizes balanced. **Stratified randomization** results in subgroups that are balanced in both size and attributes. **Covariate adaptive randomization** assigns individuals to a group for the purpose of balancing that group’s attributes.
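The difference between simple and block randomization can be sketched in a few lines. This is an illustration only, assuming two groups labeled "A" and "B" and a block size of 4; the function names are hypothetical and this is not a substitute for validated randomization software.

```python
import random

def simple_randomization(n_subjects, rng):
    """Assign each subject independently, like flipping a coin."""
    return [rng.choice(["A", "B"]) for _ in range(n_subjects)]

def block_randomization(n_subjects, rng, block_size=4):
    """Assign subjects in shuffled blocks that each hold equal numbers of A and B."""
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for two groups")
    assignments = []
    while len(assignments) < n_subjects:
        block = ["A", "B"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    # If n_subjects is not a multiple of block_size, the last block is cut short.
    return assignments[:n_subjects]

rng = random.Random(2024)  # fixed seed so the example is repeatable
simple = simple_randomization(20, rng)
blocked = block_randomization(20, rng)

print("simple: ", simple.count("A"), "in A,", simple.count("B"), "in B")
print("blocked:", blocked.count("A"), "in A,", blocked.count("B"), "in B")
```

With 20 subjects and blocks of 4, the blocked assignment always ends with 10 subjects per group, while the simple assignment can drift away from an even split.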

Randomization software is available to quickly and effectively randomize subjects in a way that best supports study objectives. The professional statistician on your clinical study planning team can help if you want to know more about randomization.

## Blinding (or Masking)

Even more important in avoiding bias in clinical studies is the process of blinding, also known as masking. Blinding is particularly important when study outcomes include comparisons of qualitative data (see Part I of this series for more on data types) such as comfort or pain relief.

In a blinded study, subjects don’t know which group they have been assigned to. A subject may be randomized to use either Shampoo A or Shampoo B. In a **single-blinded study**, the subject will not know which shampoo he or she is using. In a **double-blinded study**, neither the subject nor the researcher knows which group the subject is assigned to. Not all studies need to be blinded, and some cannot be blinded at all.

## Reducing Bias Clarifies Results

Because both clinical study subjects and research professionals are human, bias prevention must be built into every clinical study. There are many forms of bias that can influence the objectivity of study results at any stage of the trial. Some forms of bias can be reduced in the initial study design; others can be addressed in data analysis.

Talk with the qualified clinical statistician on your professional clinical research planning team about what methods should be used to reduce or eliminate bias in your next study.

This series has discussed the various types of data that can be collected in clinical studies, how sample size can influence the reliability and interpretation of study data, and methods to eliminate or reduce bias in study results. Careful early planning for all of these aspects with a qualified clinical statistician, before a study begins, is essential.

It is also critical to understand, in the planning stages, how study data will be analyzed after the study is completed. Understanding statistical analysis concepts and terminology will help you communicate with the clinical study planning team, particularly the statistician, whose role is central to the planning process.

There are two forms of statistical analysis: descriptive statistics and inferential statistics.

## Descriptive Statistics

Descriptive statistics form the foundation of data analysis for clinical studies. Descriptive analysis allows researchers to get acquainted with the sample data by summarizing important information before further analysis is conducted. Descriptive statistics consider only actual data collected from or about the study subjects, like

- measures of **frequency**, such as counts or percentages of occurrence;
- expressions of **central tendency**, such as mean (average of values), median (central value), and mode (most commonly occurring value);
- descriptions of **data dispersion or variation**, such as range of values and standard deviation; and
- indications of **position**, such as rankings by percentile or along a scale of possible values.
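The measures listed above can be computed directly with Python’s standard library. The sketch below uses ten hypothetical subject scores invented for illustration; the `describe` helper is likewise an assumption of this example.

```python
from collections import Counter
from statistics import mean, median, mode, quantiles, stdev

# Hypothetical scores for ten subjects (illustrative values only).
scores = [3, 5, 4, 4, 2, 5, 4, 3, 4, 6]

def describe(values):
    """Summarize a sample with common descriptive statistics."""
    return {
        "frequency": dict(sorted(Counter(values).items())),  # counts per value
        "mean": mean(values),                                # central tendency
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),                  # dispersion
        "stdev": stdev(values),                              # sample standard deviation
        "quartiles": quantiles(values, n=4),                 # position (cut points)
    }

summary = describe(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that everything here describes only the ten scores in hand; none of it yet says anything about a larger population.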

## Inferential Statistics

Inferential statistics go further and use the sample data to make reasoned, evidence-based conclusions and predictions about larger populations. One important function of inferential statistics is to inform judgments about whether an observed difference between groups is a dependable study result or happened by chance.

### Regression Analysis

Regression analysis is an inferential tool used to determine how a change in one variable relates to a change in another variable, allowing researchers to estimate one value when another is known.

One caution with regression analysis is to avoid confusing correlation with causation. **Correlation** means that a change in one variable is simply linked with a change in another. For example, glove sales and snow shovel sales tend to increase together, so we can say the two are correlated. **Causation** means that a change in one variable caused the change in another.

In the example, the increase in glove sales didn’t cause the increase in snow shovel sales, so no causal relationship exists between the two. Falling temperatures, however, might have a causal relationship with both.
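The glove and snow shovel example can be sketched numerically. The temperature and sales figures below are made up for illustration, and the helper names are hypothetical; the code computes a Pearson correlation coefficient and a least-squares line by hand using only the standard library.

```python
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / (spread_x * spread_y) ** 0.5

def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting y from x."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Hypothetical weekly figures (illustrative only).
temperature_c = [10, 5, 0, -5, -10]
glove_sales = [20, 35, 52, 64, 80]
shovel_sales = [5, 12, 22, 29, 41]

# Gloves and shovels move together, although neither causes the other.
print("gloves vs shovels:", round(pearson_r(glove_sales, shovel_sales), 3))
# Both are strongly (negatively) related to temperature.
print("temperature vs gloves:", round(pearson_r(temperature_c, glove_sales), 3))

slope, intercept = fit_line(temperature_c, glove_sales)
print("estimated glove sales at -2 C:", round(slope * -2 + intercept, 1))
```

A fitted line lets researchers estimate one value from another, but a strong correlation by itself does not show which variable, if either, is the cause.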

### Parametric Data Modeling

Parametric data modeling allows inferences to be made about larger populations based on assumptions about the “shape” of the study data, such as a bell curve. **Non-parametric data modeling**, on the other hand, does not assume a shape for the data but instead allows the data to estimate the model shape.

Parametric modeling is often preferred for its straightforward assumptions and higher statistical power when correctly specified, but the more complex non-parametric modeling avoids the risk of selecting a model that does not accurately reflect population data.
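To give a feel for an approach that does not assume a shape for the data, the sketch below uses a permutation test, one common non-parametric technique chosen here purely as an example. It repeatedly shuffles the group labels and checks how often a difference in means at least as large as the observed one appears by chance. The function name, the number of shuffles, and the sample values are assumptions for illustration.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for a difference in means by
    shuffling group labels; no distribution shape is assumed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    # Adding 1 to both counts keeps the estimate from ever being exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical measurements for two small groups (illustrative only).
product = [11, 12, 13, 14, 15, 16]
control = [1, 2, 3, 4, 5, 6]
print(permutation_p_value(product, control))
```

The trade-off matches the one described above: this approach makes fewer assumptions, but it needs more computation and can be less powerful than a correctly specified parametric model.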

## Start with Statistics

Statistical analysis is abstract, but the realities of statistical analysis have an enormous influence on value and reliability in clinical studies.

When considering your next study, be sure to work with an established, trustworthy, and professional clinical testing team that actively involves a qualified clinical statistician in the planning process. You’ll be rewarded with a more efficient study that provides more compelling data than if the statistical analysis had been merely a final step in the study process.