Introduction
It is commonly assumed that if a medical practice, in particular
homeopathy, claims to have real effects on real illnesses in the
real world, then it ought to be possible to demonstrate those effects
in a placebo-controlled trial. But in Part I of this series, we
found that controlled research itself has certain intrinsic limitations
in its capacity to record real differences between real objects
and real processes in the real world.6 This raises the question,
of course, whether, in a failed trial, the effects claimed by the
physician were not there, or whether, on the contrary, instead of
an axe it would have been prudent for the experimentalist to grind
a new lens for his spectacles – in other words, whether the
controlled trial was inadequately adapted to its task.
In the present paper, we will look at factors that differentiate
the task of research in complex alternative practices from the
challenges encountered in efficacy trials of conventional medicines.
In this connection, it must be conceded that the subject is perhaps
more a curiosity than an essential element serving to advance knowledge,
at least concerning homeopathic medicine, since, increasingly, statistical
research is producing dramatic experimental confirmation of the
efficacy of homeopathic treatments.2,6 Yet, it should be noted,
although the homeopathic side may relish such a dramatic shift in
fortunes in the experimental theatre, the point is that, finally,
there is clear evidence of an emerging agreement in outcomes between
empirical and quantitative practice. This state of affairs reflects
that, at long last, the true “gold standard” in scientific
research is being approached: independent corroboration of findings.
This unfolding, monumental reversal of fortune is truly a stunning,
epochal event in the making. It reflects, on the one hand, the vindication
of empirical practice, warts of bias and all. On the other hand,
it reminds us – for only the latest in a long, long series
of similar experiences – of the seemingly limitless capacity
of the human intellect to delude itself into thinking it had learned,
practically speaking, all there was to learn.
The fall from grace of the hubristically acclaimed “gold
standard” of medical research, the controlled trial, is obviously
and unavoidably implied in the very success of homeopathy. For decades
and even centuries, homeopathy has been the preferred whipping boy
of the cadres of the Cochrane Contraption; yet the developing scientific
and clinical successes of homeopathy reveal that, for those decades
and centuries, those who claimed to speak for “science”
in actuality spoke only for error and prejudice.
This is not the time to explore the ramifications of this long-lasting
exercise in scientific self-deception – an important topic
that will invite us back for a closer look at a later time. For
the moment, however, we will focus our attention on specific technical
problems to be addressed by those proposing to design a truly efficacious
protocol for testing homeopathy, and to place controlled research
into a more appropriately delimited frame of reference, so that
it might better fulfill its promise as a useful tool in the armamentarium
of scientific investigations.
Counting and Controlled Research
The controlled trial has but one purpose: to eliminate bias as
an influence in forming our impression whether a medicine has a
determining role in the healing process. But as by now should be
clear, bias is not the only factor in the experimental situation
that may contribute to inaccurate results. Indeed, this basic fact
has always been clear, even going back to the origins of controlled
research, when modifications to the blinded trial were introduced
– including, for example, randomization and the crossover
design. But these modifications to method only accounted for a limited
number of confounders, those most commonly encountered in conventional
(allopathic) practice.
These modifications, however, did not touch the more complex array
of confounders encompassed by homeopathic medicine, and it is this
that accounts for the persistent failure of controlled research
to accurately gauge efficacy of homeopathic remedies. And this is
the reason that controlled research, so well adapted to the allopathic
model, was doomed to failure when applied to alternative practices,
for, as has been truly said, "The need to perform adjustments
for covariates...weakens the findings [of research]."3(p.95)
In this connection, homeopaths often claim that individualization
is the problem that undermines performance of the RCT in measuring
homeopathy, or length of treatment, or the fact that homeopathy
is an energy medicine. All of these concerns are legitimate, and
yet none are basic. Individualization can be factored into a trial
easily, by simply allowing the therapist to prescribe according
to his usual methods; length of treatment can obviously be accommodated
by permitting treatment to proceed at its own pace, for as long
as required – potentially a costly process, but nevertheless
easy to incorporate into protocol design; and the fact that homeopathy
is an energy medicine shouldn’t impact findings at all –
if we claim an effect, at the end of treatment it should be there,
whether it was achieved via energetic channels or with a hammer
and nails.
The underlying problem is that all of these concerns, and others,
can and do interfere with measurement in an RCT in one trial or
another, depending on protocol design. In short, homeopathic treatment
is so variegated in its nature, that the number and variety of potential
confounders have undermined most efforts to measure homeopathic
effect, because there has never been an effort to approach the problem
of confounding variables systematically. Therefore, in designing
protocols researchers have simply failed to account sufficiently
for the range of potential confounding influences, missing one variable
in this trial, and missing another variable in that trial.
By comparison, in a conventional efficacy trial the arms of the
trial may include balanced populations - or they might not. But
such uncertainty is not sufficient for the purposes of establishing
reliable practices in scientific research. For that end, it is necessary
to guarantee as far as possible that the two arms are balanced,
and in the conventional trial, as noted, from an early date this
was accomplished by endorsement of the systematically applied procedure
of randomization.
What this suggests, indeed what it demands, is that in order to
bring controlled research of homeopathic practice into line with
best scientific practice, it is necessary to devise formal, systematically
applied procedures to control for: length of treatment; individualization;
numbers of symptoms targeted; range of symptoms harvested and not
harvested; influence of dosing regimen (size of dose, frequency
of dose, potency) on participant response; and the like. Lacking
such systematic safeguards, at the least the published report of
such a trial must include a comprehensive evaluation of the effects
of such influences on the final tabulation of trial outcomes; as
is demonstrated in the example provided at the end of this paper,
such factors can obviously have profound effects on reported (perceived)
results.
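As a rough illustration of what a systematically applied check might look like, the following Python sketch records which of the influences listed above a protocol has addressed and reports the ones it has missed. The class name `ProtocolReview`, the list `CONFOUNDERS`, and the sample entry are hypothetical, introduced here only for illustration; they are not drawn from any published trial.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Influences named in the text above; the wording of each entry is an assumption.
CONFOUNDERS = [
    "length of treatment",
    "individualization",
    "number of symptoms targeted",
    "range of symptoms harvested",
    "dosing regimen",
]


@dataclass
class ProtocolReview:
    """Hypothetical record of how a trial protocol handled each confounder."""
    trial: str
    # Maps a confounder name to a short note on how the protocol controlled for it.
    controlled: Dict[str, str] = field(default_factory=dict)

    def unaddressed(self) -> List[str]:
        """Return the listed confounders for which the protocol gives no control."""
        return [c for c in CONFOUNDERS if c not in self.controlled]


# Illustrative entry only: a protocol that handled individualization and nothing else.
review = ProtocolReview(
    trial="example trial",
    controlled={"individualization": "prescriber free to choose the remedy"},
)
print(review.unaddressed())
```

A published report could list the output of such a check alongside its results, so that the reader sees at a glance which influences were left uncontrolled.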
Further, these considerations suggest the question, why hasn’t
this been accomplished already? For 200 years, why have the advocates
of controlled research failed to address the question of systematic
measures to guarantee accuracy of measurement in trials of alternative
practices, in particular homeopathy? The answer is simply that mainstream
(allopathically-grounded) medical practitioners and researchers
disbelieved and even disliked the claims and practices of homeopathic
medicine, and therefore were just not interested in finding research
methods that would confirm those claims - indeed, they were content
simply to accept the findings of controlled research, which confirmed
their own expectations that homeopathy does not work. We are happy
to concede that this was not, for the most part, a conscious conspiracy;
indeed, it is clear that this course of events is the natural outcome
of the normal workings of bias in human commerce - unnoticed and
unintended, as bias always is.
Counting in Homeopathic Research
Although it is something of an oversimplification, it is nevertheless
true that, archetypically, allopathic medicines aim at a single
symptom or syndrome, and expect as a result of their application,
the elimination of that symptom (or syndrome). Further, typically,
the time allotted for the medicine to work is brief. Compare this
to the expectations we have for the action of homeopathic remedies,
which may eliminate a symptom or numerous symptoms, or produce them,
or aggravate existing symptoms, or introduce symptoms that had not
previously been experienced. Furthermore, the time frame for action
of the homeopathic remedy is very broad – certainly extending
to days, weeks, and even months, depending not only on the characteristic
action of the specific remedy, but as well on the susceptibility
of the patient.
Returning to the question of modifications to research methodology,
for the purpose of controlling for variables, it becomes apparent
that strategies such as randomization and crossovers account only
for the simplest situations. Such modifications of method will have
little or no impact on medicinal action such as we find in homeopathic
remedies. Thus, in terms of the caveat proffered by the editors
of the BMJ series, randomization systematically controls for a particular
range of confounding variables; be it noted, however, that no such
systematic modifications to controlled methodology have ever been
introduced in order to control for variables found in alternative
practices. Instead, the literati of the blinded trial seem to think
it quite satisfactory to leave it to each investigator to re-invent
the wheel, every time he sets out to design a new protocol.
Other than being enormously inefficient, this strategy obviously
leaves the door wide open for confounders to undermine research
through an “open admissions policy” regarding ignorance
of practice methods, bias, and even simple lack of imagination.
In short, implementation of the principle of blinded research took
place in the absence of systematization of technique. As illustrated
by the analysis that follows, of a small but representative sampling
of controlled trials in homeopathy, such weaknesses characterized
most controlled trials conducted until recently.2,7
There are many ways to count, but the best of them is to simply
count everything, once. Considering the range of mathematical instruments
strewn about in the statistician’s toolbox, it may seem perverse
to begin this discussion with consideration of such a rudimentary
task. But, if this be perverse, then its justification is to be
found in the surprising inability – or unwillingness –
of some of the most sophisticated of our culture’s intelligentsia
to count from one to two accurately, and in the ingeniously wrought
fabrics within which they stitch their numerical masquerades, putting
on a ballet – to switch metaphors – when a two-step
would do quite nicely.
In any case, the essential element in counting is to count everything,
once. This is not rocket science. But even in this, it does not
take long to discover the fancy footwork that finds other ways to
tabulate outcomes. For example, in Brien et al.1
the experimental group (and the control as well, of course) was
given a list of symptoms and instructed to note when they experienced
any of the symptoms. Out of 10 symptoms in the list, 5 were “true
symptoms” (of Belladonna) and 5 were “false symptoms,”
that is, symptoms that did not appear in the Materia Medica as proving
symptoms for this drug.
The authors surely deserve kudos for the nonchalance with which
they slip in the following ingenious maneuver, designed to justify
subtracting verum responses from the total symptom count produced
by the experimental group:
The primary outcome measure was an individual proving reaction
to Belladonna 30C based on the … proving definition …
as at least two true symptoms on at least 2 consecutive days
with no more than one false symptom during the 21 days of the
study period. (p. 564)
By this nifty little device, the authors succeed in subtracting
two verum responses (the production of two different proving symptoms
of the given remedy), each of which occurred on at least two consecutive
days: in other words, a total of four symptomatic responses are
neatly shuffled off the docket because of the simultaneous presence
of a placebo response!
Aside from the “face absurdity” of this procedure,
a number of unsubstantiated assumptions and arbitrary guidelines
are revealed by even a brief examination of this definition, for
example:
1. Though the authors did not specify, their definition of
proving presumably assumed that the appearance of a false symptom
in a participant’s report demonstrated his susceptibility
to placebo, thus justifying the assumption that his apparent
proving response was also merely placebo, rather than an actual
response to the drug.
Out of a number of objections to this “theory”
that leap to mind, it may simply be observed that everyone in
the world is susceptible to the power of suggestion, but that
does not imply that they do not also react to and obtain benefit
from real medicines!
2. Of course, it is well known clinically that placebo response
often supports and even enhances patient response to medical
interventions. In other words, from
this perspective also, the presence of a placebo response has
no bearing whatsoever on the question of the legitimacy of an
apparent verum response. Indeed, the only way to evaluate the
legitimacy of a verum response is clinically, either through
an exhaustive case analysis, or through labs – but that
would be an “inconvenient” procedure for a researcher
who considered only numbers to represent “evidence.”
3. In any case, we should also note that the definition of
“proving” adopted by these authors creates a situation
in which there could be any number of proving reactions that
never even approach being counted: for example, a participant
could easily produce 5 or more true symptoms, none of which
would be counted, if none of them occurred on consecutive days.
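The counting rule quoted above can be sketched in Python to show how it treats different participants. The sketch assumes one reading of the definition, the one used in this paper: a proving reaction requires at least two distinct true symptoms, each present on at least two consecutive days, and no more than one false symptom over the study period. The function names and the sample diaries are hypothetical.

```python
from typing import Dict, Set


def longest_run(days: Set[int]) -> int:
    """Length of the longest run of consecutive day numbers in `days`."""
    best = 0
    for d in days:
        if d - 1 not in days:  # d is the first day of a run
            length = 1
            while d + length in days:
                length += 1
            best = max(best, length)
    return best


def is_proving_reaction(true_days: Dict[str, Set[int]],
                        false_days: Dict[str, Set[int]]) -> bool:
    """Apply the assumed reading of the trial's proving definition.

    Each dict maps a symptom name to the set of study days it was reported.
    """
    # True symptoms that persisted for at least 2 consecutive days.
    sustained = sum(1 for days in true_days.values() if longest_run(days) >= 2)
    # Distinct false symptoms reported at any time in the study period.
    false_count = sum(1 for days in false_days.values() if days)
    return sustained >= 2 and false_count <= 1


# Two sustained true symptoms plus two false symptoms: not counted.
case_a = is_proving_reaction({"t1": {3, 4}, "t2": {7, 8}},
                             {"f1": {5}, "f2": {12}})
# Five true symptoms, none on consecutive days: not counted.
case_b = is_proving_reaction({"t1": {1}, "t2": {3}, "t3": {5},
                              "t4": {7}, "t5": {9}}, {})
# Two sustained true symptoms plus one false symptom: counted.
case_c = is_proving_reaction({"t1": {3, 4}, "t2": {7, 8}}, {"f1": {5}})
print(case_a, case_b, case_c)  # False False True
```

Cases a and b correspond to the two situations criticized above: true symptoms discarded because of accompanying false symptoms, and true symptoms discarded because they did not fall on consecutive days.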
To be blunt about it, this is not science, nor research, nor medicine.
Frankly, it is unmitigated nonsense, and the fact it is published
under the authoritative-sounding banner of so-called “systematic
research” adds not a jot of credibility to its findings. One
might as well ignore verum responses for any participant who was
observed to have worn a blue shirt on Thursday, for all the relevance
these deliberations have to the question of medical efficacy.
Walach et al.8 run into similar problems, though,
paradoxically, they demonstrate a good awareness of the difficulties
involved in their own research project, and are able to recognize
indications, embedded in the outcomes of their research, suggesting
positive effects of homeopathy: “…the effect is very
small and at the same time it seems to be there.”
This trial was designed to provide a randomized, blinded experiment
on an individual case. Randomization was achieved by giving each
participant both verum (Belladonna) and placebo in a random sequence
(over an 8 week period, 4 weeks of Belladonna and 4 weeks of placebo)
– symptoms presenting during weeks participants received Belladonna
were counted as verum responses; symptoms presenting during weeks
participants received placebo were, correspondingly, counted as
placebo responses.
The experienced homeopath will immediately object that response
to homeopathic remedies typically can be expected to persist, easily,
for days or weeks, and that, therefore, symptoms occurring during
“placebo weeks” represent, or at least could represent,
continuing action of the remedy. In fact, to their credit, the authors
of this paper are aware of this problem, and conclude, therefore,
that since “we cannot exclude the presence of carryover effects
… [we] recommend not to use this kind of randomization design…”
in homeopathic proving trials.
It is interesting to note, however, that if placebo is administered
in the first week of the trial (or during the initial two weeks),
then and only then can we be certain that symptoms produced in that
week (or weeks) were the product of suggestibility. But as soon as verum
is administered to a participant, all subsequent symptomatic responses
– because of what the authors call a “carryover effect”
– must necessarily be counted as verum responses.
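The difference between the two ways of counting can be sketched in Python. The function below tallies one participant's weekly symptom counts twice: once by the treatment given in each week, and once under the carryover argument above, in which every week from the first verum week onward is counted as verum. The function name and the eight-week sequence are hypothetical; the numbers are not taken from the trial.

```python
from typing import Dict, List, Tuple


def tally(weeks: List[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Tally symptoms for one participant.

    `weeks` is a chronological list of (treatment, symptom_count) pairs,
    where treatment is "verum" or "placebo".
    """
    naive = {"verum": 0, "placebo": 0}     # counted by the week's own treatment
    adjusted = {"verum": 0, "placebo": 0}  # counted under the carryover argument
    verum_seen = False
    for treatment, symptoms in weeks:
        naive[treatment] += symptoms
        if treatment == "verum":
            verum_seen = True
        # Once verum has been given, later weeks are attributed to verum.
        adjusted["verum" if verum_seen else "placebo"] += symptoms
    return {"naive": naive, "adjusted": adjusted}


# Hypothetical sequence: 4 verum weeks and 4 placebo weeks in mixed order.
weeks = [("placebo", 1), ("verum", 3), ("placebo", 2), ("verum", 2),
         ("placebo", 2), ("placebo", 1), ("verum", 3), ("verum", 1)]
result = tally(weeks)
print(result["naive"])     # {'verum': 9, 'placebo': 6}
print(result["adjusted"])  # {'verum': 14, 'placebo': 1}
```

In this made-up sequence only the first week remains an unambiguous placebo week, which is why the adjusted placebo count shrinks so sharply.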
Indeed, although the trial is badly flawed as a trial of individual
response – as the authors themselves observe – it has
nevertheless produced a substantial body of evidence - that is,
data - that could be mined for a new perspective on the nature of
homeopathic action. Such a prospect is especially appealing in this
situation, because it is possible, by reconsidering the design in
light of the difficulties introduced by carryover effects, to construct
a series of new hypotheses as to what might be discovered on analysis
of the trial data within a frame more adequately adapted to homeopathic
theory and practice:
Hypothesis 1 - comparing rate of “true
placebo” responses (produced in a placebo week at the
start of the sequence) with rate of verum responses will reflect
that homeopathy outperformed placebo; the difficulty with this
hypothesis is that it assumes adequacy of the dosing routines
for purposes of producing proving reactions, an assumption that
is by no means a foregone conclusion.4
Hypothesis 2 – comparing symptom rate
in carryover weeks, with symptom rate in the initial placebo
week, will show that carryover weeks outperformed initial weeks:
if confirmed, this would represent a proof of both efficacy
of homeopathic remedies and of carryover effects.
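A reanalysis along the lines of the two hypotheses would need per-week symptom rates for three kinds of week: placebo weeks before any verum, verum weeks, and placebo weeks after verum (carryover weeks). The Python sketch below shows one way such rates might be computed. The function name, the category labels, and the two sample sequences are hypothetical; no trial data are used.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Week = Tuple[str, int]  # (treatment, symptom_count)


def weekly_rates(sequences: List[List[Week]]) -> Dict[str, float]:
    """Mean symptoms per week for each kind of week, pooled over participants."""
    totals: Dict[str, int] = defaultdict(int)
    counts: Dict[str, int] = defaultdict(int)
    for weeks in sequences:
        verum_seen = False
        for treatment, symptoms in weeks:
            if treatment == "verum":
                verum_seen = True
                kind = "verum"
            elif verum_seen:
                kind = "carryover"        # placebo week after a verum week
            else:
                kind = "initial_placebo"  # placebo week before any verum
            totals[kind] += symptoms
            counts[kind] += 1
    return {kind: totals[kind] / counts[kind] for kind in counts}


# Two hypothetical participants with shortened sequences.
sequences = [
    [("placebo", 1), ("verum", 3), ("placebo", 2), ("verum", 2)],
    [("placebo", 0), ("placebo", 1), ("verum", 2), ("placebo", 3)],
]
rates = weekly_rates(sequences)
for kind in ("initial_placebo", "verum", "carryover"):
    print(kind, round(rates[kind], 2))
```

Hypothesis 1 compares the "verum" rate with the "initial_placebo" rate; Hypothesis 2 compares the "carryover" rate with the "initial_placebo" rate. Whether any difference is statistically meaningful would still have to be tested separately.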
The Statement of Efficacy: Mapping the Controlled Trial
The essential feature of the allopathic medical system, that adapts
its medicines and methods so nicely to the controlled format, is
found in the fact that, archetypically, the allopathic medicine
targets a single symptom, or a well-defined group of symptoms, the
allopathic disease state. In this context, the problem set before
the experimentalist is simply this: does the medication make the
symptom go away? The answer – “yes” or “no”
– is perfectly matched to the verum/placebo duality of the
blinded trial: does the SSRI reduce depression; does Lipitor reduce
cholesterol; does aspirin reduce pain?
But as soon as the experimental equation grows more complex, the
reasonably perfect match falls apart. For example, if a medicine
is more effective with one class of patient, such as youth or women,
then a trial will return misleading results if the control and experimental
groups are not well-matched in this dimension. Early in its history,
considerations such as these led to the introduction of randomization
and other modifications, or safeguards, to the controlled trial,
to ensure that such variables balanced out between the two arms
of the trial, thus eliminating these confounders as a potential
influence over experimental measurements.
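The balancing effect of randomization described above can be illustrated with a short Python sketch. It assigns a hypothetical population, half of it women, to two arms by shuffling, and then compares the share of women in each arm. The population, the fixed seed, and the function names are assumptions made for the illustration.

```python
import random
from typing import Dict, List, Tuple

Participant = Dict[str, object]


def randomize(participants: List[Participant],
              seed: int = 0) -> Tuple[List[Participant], List[Participant]]:
    """Shuffle participants and split them into two equal-sized arms."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def share(arm: List[Participant], key: str, value: object) -> float:
    """Fraction of an arm whose attribute `key` equals `value`."""
    return sum(1 for p in arm if p[key] == value) / len(arm)


# Hypothetical population of 10,000, alternating women and men.
participants = [{"id": i, "sex": "F" if i % 2 == 0 else "M"}
                for i in range(10000)]
verum_arm, placebo_arm = randomize(participants)
gap = abs(share(verum_arm, "sex", "F") - share(placebo_arm, "sex", "F"))
print(len(verum_arm), len(placebo_arm), round(gap, 3))
```

With a population this large the gap between the arms is small; in small trials chance imbalances are more likely, and randomization balances only in expectation, not in every individual trial.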
In earlier papers,4,5,6 numerous confounding variables
were identified that interfered with trial results in one experiment
or another. In Part I of the present series6 we suggested that,
in view of some of these variables it was essential that the experimentalist
produce a formal “Statement of Efficacy,” specifying
how sensitive a measurement was required to produce reliable (trustworthy,
credible) evidence regarding any particular medical practice. An
example was examined, concerning estimation of the number of symptomatic
responses that might be lost to the experimental count, depending
on a number of factors that combine to affect apparent and real
response rates, including especially the size of dose and the sensitivity
of individual participants. In particular, we explored the effect
of what we called the “complex responder” in reducing
verum performance by masking real responses behind placebo responses.
It is beyond the scope of this paper to propose design modifications
that, comparably to randomization or crossover, would have the effect
of systematically protecting a controlled trial from being influenced
by the effects of these confounders. Indeed, it has to be confessed
that the present author has no suggestions to offer in that connection.
However, as a beginning, I would argue that it is essential to at
least make a conscientious effort to identify those factors that
call into question the credibility of statistical findings. In this
regard, my proposal for a “Statement of Efficacy”, to
be demanded of the research scientist, would have the effect of
“mapping the controlled trial,” that is, identifying
factors that interfere with efficacious application of the placebo
control. As an example, we will explore one case in which adjustments
to the raw count in a homeopathic proving trial may dramatically
affect our conclusions regarding homeopathic efficacy:
Size of dose. A small dose - such as ordinarily
used in homeopathic practice as well as in the homeopathic proving
trial - typically produces a mild effect if it produces any effect
at all. But Hahnemann indicated that individual sensitivity to remedies
varied widely, as 1:1000. Similarly, it is commonly known that not
all participants in clinical provings respond to the experimental
drug, and that those who do respond at different times: some after
a single dose, some after several doses, some only after the size
of the dose has been substantially increased.
Therefore, the experimentalist must establish what percentage of
participants in the verum arm of a trial are likely to respond to
the specific size of dose administered to them during the trial.
Then, the Statement of Efficacy must, in this regard, provide a
statistically derived formula to correct for the induced measuring
error produced by the small dose. For example, let us assume 28
out of 100 participants in the control (placebo) arm of a trial
produce symptomatic responses, and that 30 out of 100 participants
in the verum arm could be expected to respond to the dose administered
during the trial. Then, if 28 of the verum participants actually
do produce a symptomatic response, that is 28 of the 30 expected
responders, or 93%; the statistical implication is that, at sufficiently
large doses, 93% of participants (in the verum arm) would have shown
a symptomatic response.
Therefore, we have the following outcomes (where “vs” = number of
symptoms returned by the verum group and “ps” = number
of symptoms returned by the placebo group):
(fig. 1)
raw count
vs:ps
28:28
Such an outcome traditionally suggests that “verum performed
no better than placebo,” and therefore leads to the conclusion
that the medicine is ineffective against the symptoms under investigation.
However, in this scenario, the raw count of participant responses
does not reflect the fact that a portion of the respondents, according
to theory and clinical experience, would have responded at larger
doses. Therefore, correcting the count statistically, according
to the response rates suggested (hypothetically) at the beginning
of this example, we arrive at this (realizing that our example predicted
30 verum respondents, and that 28 actually did produce a symptomatic
response, which translates to 93 responders out of a group of 100
participants):
(fig. 2)
statistically corrected count
vs:ps
93:28
Et voilà! According to statistically corrected calculations, verum
has significantly outperformed placebo! Furthermore, the corrected
figures offer evidence to confirm the fact that symptom production
by verum is lost to experimental count because of the phenomenon
of the “complex responder.”
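The arithmetic behind figures 1 and 2 can be written out as a short Python sketch. It reproduces the calculation of the example only, under the same hypothetical assumption that 30 of 100 verum participants are able to respond at the dose used in the trial. The function name `corrected_count` is introduced for illustration.

```python
def corrected_count(observed_responders: int,
                    expected_responders: int,
                    arm_size: int = 100) -> int:
    """Scale observed responders by the number expected to respond at the trial dose.

    With 28 observed and 30 expected, the share is 28/30 (about 0.933),
    which is then projected onto the full arm of `arm_size` participants.
    """
    response_share = observed_responders / expected_responders
    return round(response_share * arm_size)


placebo = 28       # ps: responders in the placebo arm (hypothetical)
verum_raw = 28     # vs: responders observed in the verum arm (hypothetical)
verum_corrected = corrected_count(verum_raw, expected_responders=30)

print(f"raw {verum_raw}:{placebo}, corrected {verum_corrected}:{placebo}")
# prints: raw 28:28, corrected 93:28
```

The result depends entirely on the assumed figure of 30 expected responders: a different assumption gives a different corrected count, which is why the Statement of Efficacy would need to justify that figure.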
Summary
In this paper I have examined in detail some of the ways in which
mistakes in trial design dramatically affect experimental outcomes.
I have also offered an example of one way in which such mistakes
can be identified, and their actual effects on outcomes estimated
statistically. The conscientious research scientist will take the
opportunity to re-examine his assumptions, and hopefully move the
theory and art of his methodology into the twenty-first century,
where it belongs, by the development of systematic measures to enhance
the reliability and credibility of outcomes from placebo-controlled
research. It has been roughly 200 years since the introduction of
randomization, used to enhance accuracy of count in placebo controlled
research in conventional medicine; now it is time - at long last
- to provide comparable assurances in our experimental approach
to alternative practices.
References
1. Brien, S., et al. 2003. Ultramolecular homeopathy has no observable
clinical effects. A randomized, double-blind, placebo-controlled
proving trial of Belladonna 30C. Br J Clin Pharmacol, 56:562-568.
2. ENHR. November, 2006. An Overview of Positive Homeopathy Research
and Surveys.
3. Godlee, F., et al., editors. 2004. Clinical Evidence: Mental
Health, Vol. 11. BMJ Publishing Group LTD, London.
4. Shere, N. 2005. Proving Homeopathy. Homeopathy 4 Everyone, April.
http://www.hpathy.com/research/shere_provinghomeopathy.asp
5. Shere, N. 2006a. Is the Randomized Double Blind Placebo Controlled
Trial an Objective Scientific Instrument? Homeopathy 4 Everyone, January.
http://www.hpathy.com/research/shere-double-blind-placebo.asp
6. Shere, N. 2006d. Validating Controlled Research – Part
I: Measuring the Measuring Rod. Homeopathy 4 Everyone, November.
http://www.hpathy.com/research/shere-validating-controlled-research.asp
7. Shere, N. 2006f. Book Review: Homeopathy – The Scientific
Proofs of Efficacy. Homeopathy 4 Everyone, November.
http://www.hpathy.com/bookreviews/guna-homeopathy-scientific-proofs.asp
8. Walach, H., et al. 2003. Effects of Belladonna 12CH and 30CH
in Healthy Volunteers. A Multiple, Single-Case Experiment in Randomization
Design. Monaco International Talks. http://www.giriweb.com/walach.htm
(viewed December 6, 2006).