Statistical tool to identify outliers in a data set


pavelcherepan

Hi all!

 

At the moment I'm looking at a huge array of geological data and need to analyse it quickly. What I'm mostly trying to do is partition the data set into smaller chunks, while making sure that the particular way the split is done gives me chunks of data with the lowest possible variability.

 

I have some statistical software in the office, but I'm in the middle of nowhere and won't have access to it before the due date.

 

Any thoughts on what statistical measure I can use?

 

Thanks in advance!


I'd use R for the processing (it's free, and RStudio is a pretty good IDE for it).

 

My first attempt here would be to remove anything more than two standard deviations from the mean. That's the traditional first stab at removing outliers.
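
The reply suggests R, but as a rough illustration of the two-standard-deviation rule, here is a minimal sketch in plain Python. The function name `drop_two_sigma_outliers` and the grade values are made up for the example, and the cutoff of two is only the rule of thumb mentioned above, not a recommendation for any particular data set.

```python
from statistics import mean, stdev


def drop_two_sigma_outliers(values, k=2.0):
    """Keep only the values within k sample standard deviations of the mean."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [v for v in values if abs(v - m) <= k * s]


# Made-up grade values with one obviously extreme reading.
grades = [1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 9.5]
cleaned = drop_two_sigma_outliers(grades)
print(cleaned)
```

Note that the mean and standard deviation are themselves pulled around by the outliers, so on a very small sample an extreme value may still survive this filter.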


Thanks for the help, although I think I phrased the discussion title badly.

 

I'll try to be clearer this time. I have a set of points, each with a spatial location and a grade value. I need to separate the entire data set into an arbitrary number of spatially correlated chunks. Obviously, I can do that in many ways. So what statistical measure could I use on the resulting "chunks" of data to compare the variability of grade between the different ways the data can be split?
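
One candidate measure, offered here only as an assumption and not as something anyone in the thread proposed, is the pooled within-chunk variance: the size-weighted average of each chunk's variance. A lower value means the chunks are internally more homogeneous. The sketch below uses made-up grades along a line and a hypothetical helper `pooled_within_variance`.

```python
from statistics import pvariance


def pooled_within_variance(chunks):
    """Size-weighted mean of the population variance inside each chunk."""
    total = sum(len(c) for c in chunks)
    return sum(len(c) * pvariance(c) for c in chunks) / total


# Made-up grades: a low-grade zone followed by a high-grade zone.
grades = [1.0, 1.2, 0.9, 1.1, 5.0, 5.3, 4.8, 5.1]

split_a = [grades[:4], grades[4:]]  # boundary placed between the two zones
split_b = [grades[:2], grades[2:]]  # boundary placed inside the low-grade zone

score_a = pooled_within_variance(split_a)
score_b = pooled_within_variance(split_b)
print(score_a, score_b)  # split_a should score lower, i.e. more homogeneous chunks
```

This measure tends to shrink as the number of chunks grows, so the comparison is fairest between splits that use the same number of chunks.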

 

I hope this makes more sense.


Not exactly, but similar. Going by your picture, I need to separate, for example, those peaks in the bottom-right corner, but as there are various ways I can do it, I need some way to compare whether one is better than another. I tried to keep it simple with the standard deviation, but as the data is not entirely random I get a lower stdev if I use the entire data set. Once I go beyond 20-30 points the stdev levels off, then starts decreasing, and it reaches its lowest value when the entire data set is used.


This is a difficult and modern subject to hit me with on a Sunday morning, especially with so little information about what your goal is, although I understand what you have told us so far.

 

Is this an exercise in geomorphology, terrain analysis, ground feature analysis or what?

 

It looks as though I have guessed correctly about the need for recognising features described by your contours and that you are looking for a way to best partition your 'map' into zones that best show these features?


 

It looks as though I have guessed correctly about the need for recognising features described by your contours and that you are looking for a way to best partition your 'map' into zones that best show these features?

 

Pretty much exactly this.

 

 

Is this an exercise in geomorphology, terrain analysis, ground feature analysis or what?

 

It's more a matter of practical grade control, or rather of trying to show colleagues that the way they are doing things is wrong. I can see intuitively that it's wrong and that I'm capable of doing it better, but I can't figure out what statistical tool could be used on a regular basis to compare the two.


Ok, I understand a bit better now. Sorry.

 

Depending on what your data is and what you're trying to show, would something like hot spot analysis work? http://desktop.arcgis.com/en/arcmap/10.3/tools/spatial-statistics-toolbox/hot-spot-analysis.htm
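
For anyone without ArcGIS to hand: the linked tool is documented as computing the Getis-Ord Gi* statistic, a z-score per point that is strongly positive inside clusters of high values and strongly negative inside clusters of low values. Below is a minimal stdlib-only sketch of that statistic. The binary distance-band weights (1 within `radius`, the point itself included, 0 otherwise), the function name, and the 4 x 4 grid of grades are all assumptions made for illustration, not the tool's defaults.

```python
from math import hypot, sqrt


def getis_ord_gi_star(points, values, radius):
    """Gi* z-score for every point, using binary distance-band weights."""
    n = len(values)
    x_bar = sum(values) / n
    s = sqrt(sum(v * v for v in values) / n - x_bar ** 2)
    scores = []
    for xi, yi in points:
        # Weight 1 for every point within the radius, the point itself included.
        w = [1.0 if hypot(xi - xj, yi - yj) <= radius else 0.0 for xj, yj in points]
        sw = sum(w)
        sw2 = sum(wj * wj for wj in w)
        num = sum(wj * vj for wj, vj in zip(w, values)) - x_bar * sw
        den = s * sqrt((n * sw2 - sw ** 2) / (n - 1))
        # den is zero when all values are equal or every point is a neighbour.
        scores.append(num / den if den > 0 else 0.0)
    return scores


# Made-up 4 x 4 grid: background grade 1.0 with a high-grade corner of 5.0.
points = [(x, y) for x in range(4) for y in range(4)]
values = [5.0 if x >= 2 and y >= 2 else 1.0 for x, y in points]
scores = getis_ord_gi_star(points, values, radius=1.0)
print(scores[points.index((3, 3))], scores[points.index((0, 0))])
```

Points whose score exceeds roughly 1.96 in absolute value are usually read as significant hot or cold spots at the 5% level, although the choice of radius strongly affects the result.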

 

ESRI have a good, long-standing reputation for all sorts of GIS (geographic information systems), though it is nearly twenty years since I last used their products.

 

It is good to be brought up to date.


  • 3 weeks later...

If you are asking how to calculate some kind of "probability", then you need to give us more information and choose a suitable distribution.

Probably a Poisson distribution or some type of continuous distribution (such as beta, gamma, Cauchy, normal, standard normal, etc.) would be suitable.

But I should point out that analysis is quite a different part of mathematics.

A note: these are commonly used in statistical analysis: 1) variance, E[x^2] - (E[x])^2; 2) covariance; 3) correlation k, with -1 <= k <= 1; 4) deviation; 5) standard deviation; 6) trends, etc.
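
To make those quantities concrete, here is a small sketch in plain Python that computes variance, covariance and correlation from their expectation forms, treating E[.] as a simple average over the data. The helper names and the numbers are invented for the example.

```python
from math import sqrt


def mean(xs):
    """E[x] taken as the plain average of the observations."""
    return sum(xs) / len(xs)


def variance(xs):
    """Population variance: E[x^2] - (E[x])^2."""
    m = mean(xs)
    return mean([x * x for x in xs]) - m * m


def covariance(xs, ys):
    """Population covariance: E[xy] - E[x]E[y]."""
    return mean([x * y for x, y in zip(xs, ys)]) - mean(xs) * mean(ys)


def correlation(xs, ys):
    """Correlation coefficient; always lies between -1 and 1 inclusive."""
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # exactly 2 * xs, so the correlation should be 1
print(variance(xs), sqrt(variance(xs)), covariance(xs, ys), correlation(xs, ys))
```

The standard deviation is simply the square root of the variance, which is why the two are listed side by side.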

Edited by blue89
