
Double Precision float Question


Vmedvil


15 minutes ago, Vmedvil said:

How and why does it define these as the infinities?

It was an arbitrary decision by the creators of the IEEE float/double formats.

For example, you (the programmer) can decide for yourself that -128 means -infinity and +127 means +infinity, and that the numbers in between, -127 ... +126, are normal numbers (when working with an 8-bit signed integer). Then overload the operators +, -, *, / and the comparisons to support your custom-made infinities.
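A minimal sketch of that idea, assuming C++11 or later. The names `InfInt8`, `kNegInf` and `kPosInf` are made up for illustration, and only addition and less-than are shown. Note that an ordinary integer comparison already orders the two reserved values correctly, because -128 and +127 are the smallest and largest encodings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reserved encodings, following the convention described above.
constexpr std::int8_t kNegInf = -128;  // means -infinity
constexpr std::int8_t kPosInf = 127;   // means +infinity

// Hypothetical 8-bit signed integer whose extreme values act as infinities.
struct InfInt8 {
    std::int8_t raw;
    bool isInf() const { return raw == kNegInf || raw == kPosInf; }
};

// Addition that propagates infinities and overflows into them.
// Assumption: (+inf) + (-inf) has no sensible answer in this toy type, so the
// caller must not request it (IEEE 754 returns NaN in that case).
inline InfInt8 operator+(InfInt8 a, InfInt8 b) {
    if (a.isInf() || b.isInf()) {
        assert(!(a.isInf() && b.isInf() && a.raw != b.raw));
        return a.isInf() ? a : b;
    }
    const int sum = int(a.raw) + int(b.raw);  // widen so the sum cannot wrap
    if (sum > 126)  return InfInt8{kPosInf};
    if (sum < -127) return InfInt8{kNegInf};
    return InfInt8{static_cast<std::int8_t>(sum)};
}

// The raw comparison is enough: -128 sorts below and +127 above every normal value.
inline bool operator<(InfInt8 a, InfInt8 b) { return a.raw < b.raw; }
```

With this, `InfInt8{100} + InfInt8{50}` yields the +infinity encoding instead of wrapping around.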

 

Edited by Sensei

11 minutes ago, Vmedvil said:

How does it define these as the infinities?

Not sure what you are asking. The IEEE standard wanted a way to represent infinity and chose a couple of encodings for that purpose. As with the NaN values, they would represent valid numbers if they hadn't been reserved for these uses.
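To make the reserved encodings concrete, here is a small sketch that inspects the bits of a double. It assumes double is the IEEE 754 binary64 format (1 sign bit, 11 exponent bits, 52 fraction bits), which holds on most current platforms. The helper names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Return the raw 64-bit encoding of a double.
// Assumption: double is IEEE 754 binary64.
inline std::uint64_t bitsOf(double x) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expected a 64-bit double");
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof bits);  // well-defined way to inspect the bytes
    return bits;
}

constexpr std::uint64_t kExponentMask = 0x7FF0000000000000ULL;  // 11 exponent bits
constexpr std::uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL;  // 52 fraction bits

// Infinity: exponent field all ones, fraction field zero (sign bit picks + or -).
inline bool isInfEncoding(std::uint64_t b) {
    return (b & kExponentMask) == kExponentMask && (b & kFractionMask) == 0;
}

// NaN: exponent field all ones, fraction field non-zero.
inline bool isNanEncoding(std::uint64_t b) {
    return (b & kExponentMask) == kExponentMask && (b & kFractionMask) != 0;
}
```

So +infinity is `0x7FF0000000000000` and -infinity is `0xFFF0000000000000`, while the largest finite double sits one step below at `0x7FEFFFFFFFFFFFFF`.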


2 minutes ago, Strange said:

Not sure what you are asking. The IEEE standard wanted a way to represent infinity and chose a couple of encodings for that purpose. As with the NaN values, they would represent valid numbers if they hadn't been reserved for these uses.

 

3 minutes ago, Sensei said:

It was an arbitrary decision by the creators of the IEEE float/double formats.

For example, you (the programmer) can decide for yourself that -128 means -infinity and +127 means +infinity, and that the numbers in between, -127 ... +126, are normal numbers (when working with an 8-bit signed integer). Then overload the operators +, -, *, / and the comparisons to support your custom-made infinities.

 

Is there a triple float?


25 minutes ago, Strange said:

The standard defines a quadruple precision, but I don't know if anyone implements it. There is also an extended precision, which uses 80 bits and is quite widely supported.

Well, whatever. I will let CS advance. Until then, a double says infinity, so it is only accurate down to about an electron radius, which is roughly half as close as needed (10^-16 vs 10^-35). Once quad floats are used, the result should be a number and not infinity, and that number should be around Volume = (4/3)(1/(tpC)^2)^3, which is 4.4704601196572883072076801920048 * 10^208.

Edited by Vmedvil

You can make a floating-point number of any bit length you wish or need. This is often needed when writing scientific applications. Operations on regular floats/doubles introduce rounding errors, and these accumulate with every operation, so after a while the total error can be quite significant. Therefore scientist-programmers make their own floating-point C++ classes.

 

// SCIFLOAT_CUSTOM is a hypothetical switch: define it once double is no longer precise enough.
#ifndef SCIFLOAT_CUSTOM

// Work with SciFloat (instead of double directly) for as long as you need in the project...
typedef double SciFloat;

#else

// ...then, when low precision starts causing errors, swap in a class of the same name:
class SciFloat
{
    // custom-made float implementation,
    // or a 3rd-party library made by somebody else already (could contain unknown errors, as usual);
    // put them in overloaded operators
};

#endif
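The accumulation of rounding errors mentioned above is easy to see with a small experiment. This is only a sketch, and the function names are made up for illustration. It adds 0.1 to a running total many times and compares the result with the exact product. Because the arithmetic type is a template parameter, the same test would also work for a custom class like SciFloat, as long as that class supports `+=` and conversions to and from double.

```cpp
#include <cmath>

// Add `step` to an accumulator `count` times, using type T for all arithmetic.
template <typename T>
T repeatedSum(T step, long count) {
    T total = 0;
    for (long i = 0; i < count; ++i) {
        total += step;  // each addition rounds to the precision of T
    }
    return total;
}

// Absolute difference between the accumulated sum of 0.1 and the product 0.1 * count.
template <typename T>
double accumulatedError(long count) {
    const T sum = repeatedSum<T>(static_cast<T>(0.1), count);
    return std::fabs(static_cast<double>(sum) - 0.1 * static_cast<double>(count));
}
```

On a typical platform, `accumulatedError<float>(1000000)` is far larger than `accumulatedError<double>(1000000)`, even though both started from "the same" 0.1.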

 

When storing to disk, transferring over the Internet, and so on, you will have to write such a custom object out as a string and parse that string when loading it back, because it is not binary compatible with a regular IEEE float/double.
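A rough sketch of such a string round trip follows. Plain double stands in for the custom type here, since a real custom class would bring its own formatting and parsing routines. The function names `toText` and `fromText` are made up for illustration.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Write a value as decimal text with enough significant digits to recover it exactly.
// Assumption: double is only a stand-in for a custom floating-point class.
inline std::string toText(double x) {
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << x;
    return out.str();
}

// Parse text produced by toText back into a value.
inline double fromText(const std::string& text) {
    return std::stod(text);
}
```

The point of `max_digits10` is that fewer digits could load back as a slightly different value, so the text form has to carry the full precision of whatever type you store.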

Edited by Sensei
