
Experimental Design


Strange


Someone recently commented on “the rice experiment” but isn’t willing to discuss it.

I thought it would be a good case study for looking at the issues of experimental design and the sort of potential confounding factors that need to be taken into account. 

It seems that a Japanese amateur scientist claimed that saying nice or nasty things to a mixture of rice and water will change the way it decays or ferments. Being nice to the rice results in fermentation and a nice smell; being nasty to (or ignoring) the rice results in black mould and foul smells.

There are lots of examples online, carried out by both sceptics and believers, with variable results.

Here is a typical example from someone who was sceptical but surprised by the results: https://yayyayskitchen.com/2017/02/02/30-days-of-love-hate-and-indifference-rice-and-water-experiment-1/ (just the first search result that came up).

It is easy to be sceptical, as the whole idea seems implausible and the experimental setup used is seriously flawed. For example: there is only one sample for each type of “treatment”; a single experimenter prepares the specimens, talks to them and analyses the results (in other words, the experiment is not blinded); and the results are judged subjectively (colour, smell, etc.).

Well, it would be easy to dismiss this as nonsense, but that isn’t very scientific. (And we wouldn’t want to be accused of being “scared” of a positive result. :))

So here are some suggestions for a more rigorous approach to the experiment. Feel free to suggest improvements. And if anyone wants to carry out the agreed protocol, I would be interested to see the results. It would be fascinating if reasonably well-controlled conditions reproduced the same results. (My initial reaction would be that we need to design a better experiment!)

So, here are some suggestions:

1. Have multiple samples of each type of treatment (treated positively, treated negatively, ignored).

This might need some statistical analysis to determine how many are needed to get a convincing result. But as we don’t (yet) know the strength of the effect we are looking for, that might be hard to do. Maybe start with 10 samples for each treatment?

As the original experimenter claimed that ignored was the worst case, do we need something else as a control? If so what?

2. We need to define the criteria that will be used to measure the results: area of mould if present, types of organisms found, etc. (input from someone with expertise in microbiology needed here!).

We should also include the subjective results (colour, smell) for consistency with the original. These should probably be judged by, say, three different people.
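If we do use multiple judges, it would also be worth checking how well they agree with each other. A minimal sketch of that check in Python (the scores are invented, and the 0-4 smell scale is just an assumption):

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Invented data: 30 samples, each rated 0-4 for smell by three judges.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 5, size=(30, 3))  # rows: samples, cols: judges

# Convert raw ratings to a (samples x categories) count table, then
# compute Fleiss' kappa: 1 = perfect agreement, ~0 = chance-level.
counts, _ = aggregate_raters(ratings)
print(f"Fleiss' kappa: {fleiss_kappa(counts):.2f}")  # near 0 for random scores
```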

3. All samples need to be prepared identically from the same pack of rice (well stirred before use) and container of water.

Should the containers be sterilised - not to avoid contamination, but just to ensure they are all the same? Or maybe just wash them all in the same way, to ensure they are equally contaminated?

4. We then randomise the samples and label each with either a positive word (“love” in the original, I believe), a negative word (“stupid” in the original) or nothing at all.

(A future experiment could randomise the labels and how the samples are treated to see which has more effect. But that seems unnecessarily complex for now.)

All samples are also numbered for later identification.
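For concreteness, here is a minimal sketch of how the randomisation and numbering could be scripted (the word list, the file name and the 10-per-treatment count are just placeholders):

```python
import csv
import random

# The treatments: positive word, negative word, blank (ignored).
# 10 samples per treatment is just a starting guess.
TREATMENTS = ["love", "stupid", ""]
SAMPLES_PER_TREATMENT = 10

assignments = TREATMENTS * SAMPLES_PER_TREATMENT
random.shuffle(assignments)

# The key linking sample numbers to treatments goes in a file that only
# the study coordinator sees until the analysis is complete.
with open("blinding_key.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample_id", "treatment"])
    for sample_id, treatment in enumerate(assignments, start=1):
        writer.writerow([sample_id, treatment])
```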

5. It looks like most people do this with uncovered containers for the rice and water. This makes sense to some extent, because we want to allow yeast, fungi, bacteria or whatever to get to the samples. But the rice and the water aren’t sterile anyway, so it should be OK to cover them.

If the samples are uncovered, there is a risk of extra contamination from the person talking to them. And, perhaps, if you are yelling insults it is more likely for the sample to be contaminated from breath, drops of saliva, etc.

Maybe half the samples should be open and half should be sealed? And/or a screen of some sort between the person speaking and the samples?

6. All samples need to be kept under the same conditions (light, temperature, etc.). This means keeping them close together. I would be worried about the claimed emotional influence “spilling over” to nearby samples, but that doesn’t seem to have been a concern in the experiments done so far, so the [claimed] effect is obviously very targeted.

7. Now the key part. Someone needs to talk to the samples every day, saying nice things, saying nasty things, or ignoring them, as appropriate.

This should definitely be someone who is not involved in the preparation or the analysis of the results.

Should it be someone who really believes in the claimed effect? So they can’t later say that the person doing the talking was not convincing enough, if the results are not positive.

Should it be a person chosen at random?

Should it be multiple people to “average out” the effect?

These interactions should be filmed so that we can analyse (independently) the words used, the amount of time spent on each sample, etc.

8. After the required time (30 days seems to be recommended, but apparently things can start to get pretty smelly before then - perhaps another good reason for keeping the samples sealed!) we do the analysis.

To do this, the nice/nasty/blank labels are removed or covered so the people doing the analysis don’t know how each sample was treated. The results are recorded against the serial number of each sample.

We then collate all the information and see if there is a statistically significant association between the results and the way the samples were treated.
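As a sketch of what that final test might look like if the headline outcome is simply “mouldy or not” (the counts below are invented, purely to show the mechanics):

```python
from scipy.stats import chi2_contingency

# Invented collated results: rows are treatments, columns are counts of
# (mouldy, not mouldy) samples out of 10 per group.
observed = [
    [3, 7],  # "love"
    [5, 5],  # "stupid"
    [4, 6],  # blank / ignored
]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(f"chi-squared = {chi2:.2f}, p = {p_value:.3f}")
# If p is above the pre-agreed threshold (e.g. 0.05), the differences
# between treatments are consistent with chance.
```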

 

Drinks and Nobel Prizes all round when we demonstrate convincingly that the effect is real!

Any comments?


I've watched a lot of these videos on YouTube. My recommendation is to focus on the intensity of the emotions, love or hate. The time duration varies a lot: some people have seen it in as little as a month, while others have taken 170 days. In this case, longer might be better. I mean, if the rice became moldy in one day in your setup, then one day of sending loving emotions to the rice may not have much effect. Also, the idea is to wait for one of the jars of rice to change by a noticeable amount.

I'm curious what's the general plan if the experiment doesn't show the same results as found by dozens of YouTubers? Give up, or ask someone who's had good results to perform the experiment for you?


32 minutes ago, Theoretical said:

I've watched a lot of these videos on YouTube. My recommendation is to focus on the intensity of the emotions, love or hate. The time duration varies a lot: some people have seen it in as little as a month, while others have taken 170 days. In this case, longer might be better. I mean, if the rice became moldy in one day in your setup, then one day of sending loving emotions to the rice may not have much effect. Also, the idea is to wait for one of the jars of rice to change by a noticeable amount.

The 30 days was based on the original experiment.

An alternative would be to stop as soon as the first sign of mould is spotted on any of the samples. The trouble is that might mean there isn't anything to evaluate on the other samples.

33 minutes ago, Theoretical said:

I'm curious what's the general plan if the experiment doesn't show the same results as found by dozens of YouTubers? Give up, or ask someone who's had good results to perform the experiment for you?

Well, I would place more trust in the results of an experiment performed as outlined above than in the obviously flawed, informal approaches I have seen on the web. And, really, that was the point of this thread: to discuss the design of an experiment, what possible biases need to be accounted for and eliminated, etc., rather than speculate about what the results might be.

But if the results were not "good" then one could repeat it with some variations to see if the result changes. But in all cases one would have to stick to the same sort of robust, double-blinded approach.


The only possible issue I see with the rice experiments on youtube is in how they determine when the experiment is finished. Sure, they aren't using any formal method to decide when it's finished, but in nearly all of the cases I've seen on youtube it's pretty obvious. One jar is filled with black moldy rice. The other is near white. According to most of the videos, the hate and ignored jars become dark compared to the love jar. Therefore I would consider using a light meter to detect the overall change.
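For example, a rough software "light meter" could be as simple as averaging the pixel brightness of a photo of each jar taken under identical lighting (a sketch; the file names are placeholders):

```python
import numpy as np
from PIL import Image

# Mean pixel brightness of a photo of each jar, taken under identical
# lighting. Darker (mouldier) rice gives a lower score.
def mean_brightness(path: str) -> float:
    grey = Image.open(path).convert("L")  # 8-bit greyscale, 0 (black) to 255 (white)
    return float(np.asarray(grey).mean())

for jar in ["love.jpg", "hate.jpg", "ignored.jpg"]:
    print(jar, mean_brightness(jar))
```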


1 hour ago, Theoretical said:

The only possible issue I see with the rice experiments on youtube is in how they determine when the experiment is finished.

That seems a relatively minor issue to me. As long as you examine all samples after the same time, I would expect consistent results. Ideally, perhaps, one would check the state of all samples every day and record the changes over time as well as what the changes are. That way you could detect whether the words/emotions directed at the samples changed what was happening or simply delayed a certain type of change.

There are many, far more serious issues that mean I can't take any of the experiments I have seen seriously.


12 hours ago, Strange said:

This might need some statistical analysis to determine how many are needed to get a convincing result. 

You also need to start thinking about what statistical tests you are going to use, and not do the noob biologist thing of designing an experiment, collecting the results and then going to a statistician.

Area of mould would be amenable to ANOVA if we treat it as a continuous variable (or a non-parametric equivalent if the model assumptions are not met). But does the area of mould depend in any way on the area of the rice grains? If so, we might consider ANCOVA and treat rice area as a covariate.

Are you going to analyse the results in a longitudinal fashion? If so, you could use repeated-measures ANOVA or maybe MANOVA.

Types of organisms found would probably also best be measured as a continuous variable, but the question would be how to count them accurately - assuming you don't try to count the quantity of every single organism. The data will likely be Poisson distributed, so maybe Poisson regression: might have to wait to see the data to decide what's best here.

How is smell quantified? If it is subjective, might look into Likert-scale type stuff, but is an electronic nose possible? What is said to be causing the smell - just the bacteria present? If so, we have already covered that. If it is something else, is something like mass spec an option?
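To make the simplest of those options concrete, here is a sketch of a one-way ANOVA on mould area, with Kruskal-Wallis as the non-parametric fallback (the data are invented, purely to show the mechanics):

```python
import numpy as np
from scipy.stats import f_oneway, kruskal

# Invented mould areas (cm^2) for 10 samples in each treatment group.
rng = np.random.default_rng(42)
love = rng.normal(2.0, 0.5, size=10)
hate = rng.normal(2.2, 0.5, size=10)
ignored = rng.normal(2.1, 0.5, size=10)

f_stat, p_anova = f_oneway(love, hate, ignored)  # one-way ANOVA
h_stat, p_kw = kruskal(love, hate, ignored)      # non-parametric fallback
print(f"ANOVA: F = {f_stat:.2f}, p = {p_anova:.3f}")
print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_kw:.3f}")
```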

 

12 hours ago, Strange said:

But as we don’t (yet) know the strength of the effect we are looking for, that might be hard to do. Maybe start with 10 samples for each treatment?

If you don't know the effect size, then let practical decisions guide a pilot study. Is setting up 100 samples per treatment much harder than setting up 10? Presumably it's just an issue of space and resources. Or is there an idea of the 'love' effect being diluted if there are too many samples?


26 minutes ago, Theoretical said:

Can you give an example of how those experiments would give false results?

I mentioned a few in my opening post.

One is that most people do the experiment once, with only one sample treated each way. So you have no way of identifying the probability of the result occurring by chance.

You might say, "ah, but lots of people have done it and got the same result". Except they haven't, necessarily. I only looked at a small number of reports, but they were not consistent. You would have to have rather flexible criteria for deciding that they all came to the same conclusion. Plus there is reporting bias. If someone did it and got uninteresting results (e.g. all samples did exactly the same thing) they might consider this boring and not worth talking about (unless they were a sceptic out to debunk things - but most of the reports I saw were people who generally expected it to work).

This is why it is so important to set out the criteria to be used for judging the results before the experiment. Otherwise it is too easy to subtly adjust how you characterise the results to slightly skew them one way or the other. (Note: I am not suggesting people would do this deliberately [although some might] but just that subtle, unconscious biases can come into play - and we know this from scientific studies!)

This is also why blinded experiments are so important. So, for example, you prepare all the samples and then later, randomly assign them to the different treatment groups. Similarly, you don't let the person analysing each sample know how it was treated. They just have an ID number for the sample. It is only later that you correlate those IDs with the treatments.

Another issue is that we don't know how well controlled the conditions were. In some cases, the sizes of the samples were clearly different (you can see that in the photos at the link I posted; another person explicitly noted this as a potential problem). Is the proportion of rice to water the same in all samples? Are they kept at the same temperature and lighting conditions?

Related to that, there is the possibility of more (or different types of) contamination when someone shouts at a sample than when they whisper to it gently. 

And so on and so on...

(I guess this is why your user name is Theoretical rather than Experimental :))

 

32 minutes ago, Prometheus said:

You also need to start thinking about what statistical tests you are going to use, and not do the noob biologist thing of designing an experiment, collecting the results and then going to a statistician.

...

All good points. And way outside my area of expertise!

4 minutes ago, MigL said:

Does it really matter...
All sake tastes like crap anyway.

Noooooooo!


Strange,

The experiments that don't have equal amounts of rice in the jars, or that leave the jars open, I would consider to be faulty experiments. Most of the experiments I saw on youtube don't seem to fall into that category.

As far as statistics go, I believe that was brought up in my status post: "Try the experiment yourself. Do it a dozen times." IOW, do it however many times is necessary to get good statistics.

 

BTW, yes I know the statistics part is a problem with the youtube videos.


7 minutes ago, Theoretical said:

As far as statistics go, I believe that was brought up in my status post: "Try the experiment yourself. Do it a dozen times." IOW, do it however many times is necessary to get good statistics.

But what exactly is 'good statistics'? Are you just going to collect data until you see results that you like?

Flip a coin enough times and you will get ridiculous runs of heads which the human mind will decide can't be down to luck. You will then discard all the initial 'failed' results because they weren't to your liking - except you will say something changed in the experiment: maybe the person flipping coins was standing on one leg when the run began, so that's what you have to do. Except, of course, it was just luck, and standing on one leg has nothing to do with it. Easy to see with such a contrived example, but extend that to real-world experiments, add in an emotional investment, and all of a sudden you can interpret your results any way you like. It's pareidolia for data: determine your statistical methodology as rigorously as your experimental methodology, otherwise you'll be seeing anything you want.
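To see how easily pure luck produces convincing-looking patterns, here is a quick simulation of that coin-flip example:

```python
import random

# 10,000 fair coin flips; find the longest run of consecutive heads.
random.seed(1)
flips = [random.choice("HT") for _ in range(10_000)]

longest = run = 0
for flip in flips:
    run = run + 1 if flip == "H" else 0
    longest = max(longest, run)

# Expect a longest run of roughly 13 heads - pure luck, but it looks
# anything but random to the human eye.
print(f"Longest run of heads: {longest}")
```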


12 minutes ago, Theoretical said:

ps, please don't say six sigma. There might not be enough rice on Earth. ;)

It'll keep you busy for sure.

 

There are four considerations in sample size calculations. The first two are the difference in means between groups and the variances (standard deviations) of the groups. Typically, the bigger the difference between means and the smaller the variance, the fewer samples you need. There is very little you can do about these - well designed/controlled experiments might reduce the variance a bit.

For the next two you can choose an acceptable level: they are your probability of seeing a difference in means (just by sheer luck) when really there is no difference, and of seeing no difference in means when really there is. (These are known as type 1 and type 2 statistical errors.) They are related, so reducing one increases the other, and we seek an acceptable balance - what's worse, saying there is a difference when there isn't, or saying there isn't when there is? Typically in biology and medicine we say an acceptable probability of a type 1 error is 0.05 and of a type 2 error is 0.2 (usually stated as having a power of 80%). This means that if you repeat an experiment which you know has no difference in means 20 times, you can expect one experiment to (erroneously) say there is a difference in means.
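Putting rough numbers on that (a sketch using statsmodels; the effect size is a pure guess, which is exactly the problem when the strength of the claimed effect is unknown):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size for comparing two treatment groups with a t-test.
n_per_group = TTestIndPower().solve_power(
    effect_size=0.8,  # assumed "large" effect (Cohen's d) - a guess
    alpha=0.05,       # acceptable type 1 error probability
    power=0.8,        # i.e. acceptable type 2 error probability of 0.2
)
print(f"Samples needed per group: {n_per_group:.1f}")  # roughly 26
```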

 

Hope that makes sense: in a rush.


1 hour ago, MigL said:

Does it really matter...
All sake tastes like crap anyway.

I genuinely just exerted cognitive energy to suppress the urge to neg rep you which bloomed within me upon reading this. 

:)


9 hours ago, MigL said:

Does it really matter...
All sake tastes like crap anyway.

 

In view of the subject of this thread, I would be interested in hearing your experimental procedure for comparing the respective tastes of sake and crap.

 

:)

