
Outliers in small sample


GnothiSeauton


Hi all :D

I have a question concerning detecting outliers in small samples.

My results are: 14.1, 4.1, 8.1, 9.1, 8.2, 8.6. I suspect the outliers are 14.1 and 4.1.

Which statistical test should I use to analyze this data? I thought about the extreme studentized deviate test, but it turned out to be intended for large sample sizes.

Thank you so much in advance ^^


Why specifically do you want to detect outliers, and what are you going to do with the data once you identify outliers?

There's no one hard-and-fast test that identifies outliers, because there's no one reason for something to be an outlier. Do you have reason to believe the outlying results are somehow erroneous, or are they as valid as the others?


What are the error characteristics of your measurement system?

You have 6 data points. Are they all supposed to be repeat measurements of one another?

With the information you've posted and only 6 data points it's impossible to give any good advice (with the possible exception of: repeat the experiment some more).


55 minutes ago, Klaynos said:

What are the error characteristics of your measurement system?

You have 6 data points. Are they all supposed to be repeat measurements of one another?

With the information you've posted and only 6 data points it's impossible to give any good advice (with the possible exception of: repeat the experiment some more).

A first class answer.  +1

 

I would add that your description needs to identify these data points properly so the appropriate modelling distribution and sidedness can be chosen.


In my experiment I was observing how much product was formed in an enzymatic reaction at different times.

Experiment was done in 6 parallels. 

Reaction lasted 2 minutes - my results are 14.1, 4.1, 8.1, 9.1, 8.2, 8.6  [g/L]

Reaction lasted 10 minutes - my results are 8, 14.9, 14, 17, 14.4, 15     [g/L]

Reaction lasted 30 minutes - my results are 22, 20, 33, 16, 21.8, 23      [g/L]

I need to construct a graph with concentration on the y axis and time on the x axis.


That's better, but I would start by plotting the course of 6 individual trials of the reaction.

That is, rearrange your data properly so it doesn't look as if you have mixed up the readings from flask 1 with flask 3 etc.

I say this because the first entry in the 2 minute row is 14.1

Yet the first entry in the 10 minute row is 8

suggesting the reaction went backwards.
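
The check described here can be sketched in a few lines of Python. This is only an illustration and it rests on one loud assumption: that position i in each posted row belongs to the same flask, which is exactly what is in doubt. The names `READINGS`, `per_flask` and `decreasing_flasks` are made up for this sketch.

```python
# Readings as posted, one list per sampling time (minutes -> g/L).
# ASSUMPTION: position i in every list belongs to the same flask.
READINGS = {
    2: [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    10: [8, 14.9, 14, 17, 14.4, 15],
    30: [22, 20, 33, 16, 21.8, 23],
}


def per_flask(readings):
    """Transpose the time-indexed rows into one time course per flask."""
    times = sorted(readings)
    return [list(course) for course in zip(*(readings[t] for t in times))]


def decreasing_flasks(readings):
    """Return 1-based flask numbers whose concentration drops between readings."""
    flagged = []
    for number, course in enumerate(per_flask(readings), start=1):
        # Compare each reading with the next one in time.
        if any(later < earlier for earlier, later in zip(course, course[1:])):
            flagged.append(number)
    return flagged


print(decreasing_flasks(READINGS))  # [1, 4]
```

Under that assumption the first column (14.1, 8, 22) and the fourth (9.1, 17, 16) both show a drop, which is more consistent with shuffled or mis-recorded readings than with product disappearing.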

Edited by studiot

I'm a big fan of boxplots for this type of data, although you have small enough samples that you can just plot each point directly.

What kind of analysis do you intend to do once you've removed the outliers? Are you trying to test for significant differences, or model the data somehow?

I ask because, to me as a statistician, there isn't really such a thing as an "outlier". Unless some data points arise from a mistake in the experiment, they're all real measurements from the true distribution of possible outcomes. Removing some data can be valid for analysis, but it depends on what your goals are. I'd be able to give better advice if I knew what you were trying to achieve.
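
For reference, the rule a boxplot conventionally uses to draw individual points beyond the whiskers (1.5 times the interquartile range past the quartiles) can be sketched as below. This is a minimal illustration, not a recommendation to delete anything: with six points the quartiles are very rough, and the choice of the "inclusive" quantile method and the function name `iqr_outliers` are assumptions of this sketch.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles sorts the data itself; n=4 gives Q1, median, Q3.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


two_minutes = [14.1, 4.1, 8.1, 9.1, 8.2, 8.6]  # g/L, as posted
print(iqr_outliers(two_minutes))  # [14.1, 4.1]
```

For the 2 minute data the fences come out at roughly 6.85 and 10.25 g/L, so the two suspected values are the ones a boxplot would draw as separate points. That only says they are worth a second look; whether they are errors still depends on how the experiment was run.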


6 hours ago, Cap'n Refsmmat said:

I'm a big fan of boxplots for this type of data, although you have small enough samples that you can just plot each point directly.

Are you familiar with violin plots?

https://en.m.wikipedia.org/wiki/Violin_plot

http://www.sthda.com/english/wiki/ggplot2-violin-plot-quick-start-guide-r-software-and-data-visualization

 

I can't add anything else to the last two replies to help. 


Well this looks like a typical rate of reaction determination to me, but I'm sorry to say rather sloppily recorded.

In the first place we have 24 not 18 data points, since at time zero there should be no product in any of the 6 reaction flasks, i.e. all the curves must pass through the origin.

Secondly the results are stated as quantities of product, but are recorded as concentrations.

Thirdly, as I have already pointed out, the readings are tabulated in an odd, seemingly impossible, manner. 

If this last issue were sorted out so each reading could be properly attributed to one or other flask of reactants, then I'm sure the curves (they look like a power law to me) would appear more sensible.

We could then propose a rate law and deduce the deviations of each trial from this for statistical analysis.

Since this is about chemical calculations which are rather specialised, perhaps this thread should be drawn to the attention of our chemistry experts.
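
As a rough illustration of the power-law impression only, one could fit c = a * t^b to the mean concentration at each time by least squares on a log-log scale. This is a sketch under stated assumptions, not a proposed rate law: it uses the per-time means of the posted data rather than properly attributed flasks, and the names `READINGS` and `fit_power_law` are hypothetical. A curve of this form passes through the origin automatically whenever b > 0, so the time-zero points need not be entered (and could not be, since log 0 is undefined).

```python
import math
import statistics

# Readings as posted (minutes -> g/L); flask attribution is ignored here.
READINGS = {
    2: [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    10: [8, 14.9, 14, 17, 14.4, 15],
    30: [22, 20, 33, 16, 21.8, 23],
}


def fit_power_law(readings):
    """Fit ln(mean c) = ln(a) + b * ln(t) by ordinary least squares; return (a, b)."""
    xs = [math.log(t) for t in readings]
    ys = [math.log(statistics.mean(values)) for values in readings.values()]
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(y_bar - b * x_bar)
    return a, b


a, b = fit_power_law(READINGS)
print(f"c is roughly {a:.2f} * t^{b:.2f}")
```

With only three time points the fitted exponent (about 0.35 here) is very weakly constrained, so this is a way of looking at the data rather than evidence for any particular mechanism.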

 

Edited by studiot

Thank you all so much for your replies :)

In an ideal situation the concentrations of all 6 parallels at each time should be roughly the same (e.g. after 2 minutes the concentrations in all 6 flasks should be around 8 g/L; after 10 minutes around 16 g/L; after 30 minutes around 20 g/L). My results scatter quite a bit, so I can't deduce much from them.

It is quite possible that I did some of these parallels sloppily or that some of my enzymes denatured. I think the best thing would then be to repeat the experiment with more parallels and more time points.


If I were you I'd plot each parallel individually as lines between points. All on the same plot. 

I'd probably also include a box plot for each time, possibly on the same plot, although that might be too messy.

Those are just the first steps. Next would depend on how they look. 

Caveat here is that I'm a physicist rather than a statistician or chemist. 
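
A minimal sketch of those first steps, assuming matplotlib is available and, importantly, assuming that column i of the posted table is the same flask at every time (which has been questioned in this thread). The names `TIMES`, `ROWS`, `flask_courses`, `plot_trials` and the output file name are made up for this example.

```python
TIMES = [2, 10, 30]  # minutes
ROWS = [  # one row per sampling time, g/L, as posted
    [14.1, 4.1, 8.1, 9.1, 8.2, 8.6],
    [8, 14.9, 14, 17, 14.4, 15],
    [22, 20, 33, 16, 21.8, 23],
]


def flask_courses(rows):
    """ASSUMPTION: column i is the same flask in every row."""
    return [list(course) for course in zip(*rows)]


def plot_trials(times, rows, path="trials.png"):
    """Draw one line per flask plus a box plot per sampling time; save to path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for number, course in enumerate(flask_courses(rows), start=1):
        ax.plot(times, course, marker="o", label=f"flask {number}")
    # One box per time, placed at that time on the x axis.
    ax.boxplot(rows, positions=times, widths=1.5, manage_ticks=False)
    ax.set_xlabel("time [min]")
    ax.set_ylabel("concentration [g/L]")
    ax.legend()
    fig.savefig(path)
    return fig
```

Calling `plot_trials(TIMES, ROWS)` writes the combined figure to `trials.png`; if the boxes make it too cluttered, dropping the `ax.boxplot` line leaves just the six traces.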


9 hours ago, GnothiSeauton said:

Thank you all so much for your replies :)

In an ideal situation the concentrations of all 6 parallels at each time should be roughly the same (e.g. after 2 minutes the concentrations in all 6 flasks should be around 8 g/L; after 10 minutes around 16 g/L; after 30 minutes around 20 g/L). My results scatter quite a bit, so I can't deduce much from them.

It is quite possible that I did some of these parallels sloppily or that some of my enzymes denatured. I think the best thing would then be to repeat the experiment with more parallels and more time points.

Before rushing off to repeat the experiment (and perhaps the mistakes) you should consider the method very carefully.

Firstly, the mechanics of doing the trials.

Are the trials carried out in six different flasks at the same time or is a trial repeated in one flask six times?

Are the flasks clean?
Especially if you repeat in the same flask which risks cross contamination.

How are you measuring product concentration?
How long does it take to make a concentration determination?
You are recording instantaneous concentrations. What difference would a 15 second error in timing make at the 2 min, 10 min and 30 min marks?
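
To put rough numbers on the timing question, here is the 15 second slip expressed as a fraction of the elapsed time at the 2, 10 and 30 minute sampling times in the posted data. It is only arithmetic; how much the concentration changes in those 15 seconds depends on the actual rate law, which is not known here. The function name is made up for this sketch.

```python
def relative_timing_error(error_s, elapsed_min):
    """Timing error as a fraction of the elapsed reaction time."""
    return error_s / (elapsed_min * 60)


for minutes in (2, 10, 30):
    print(f"{minutes:>2} min: {relative_timing_error(15, minutes):.1%}")
# prints 12.5%, 2.5% and 0.8% for 2, 10 and 30 minutes respectively
```

So the same slip matters most at the 2 minute mark, where the reaction is also likely to be running fastest.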

The recorded concentration is only valid if the reaction mixture is homogeneous.
Is it stirred? or how else do you ensure this?

How temperature sensitive is this reaction?
How much heat is evolved?
Are you monitoring to see if all the trials have the same conditions?

How are you noting down the results?
I presume that each column in your table of results is meant to represent a single trial.
If so the result 14,8,22 indicates some sort of recording error.
If you can't sort this out looking back then this trial needs to be discarded - it is worse than an outlier.

Secondly the reaction itself

You say it is a catalysed reaction.
Is it autocatalysed or are you adding a catalyst?

Assume the reaction is [A] + [B] = [C]

Is either [A] or [B] very large compared to the other, so effectively constant?

How about [C] ? is this always small or does the reaction approach completion?

What reaction rate equation are you assuming to give the figures you have stated - 8, 16, 20 g/L?
 

 


My apologies, the last post was a victim of forum timeouts.

The equations should have been

Chemical equation

A + B = C

since you mention only one product I assume it is not a dissociation reaction.

With rate equation

 

 

 

 

Edited by studiot
