Floating points in JavaScript

Numeric data types

Programming languages usually have more than one numeric data type. Numbers can be categorized mathematically, and each category may have an optimal format for storage in computer memory or for processing under the hood. Furthermore, number representations may have mathematical properties that conflict with the limitations of computer memory. For instance, the decimal representation of one third involves an infinite sequence of the digit 3 after the decimal point, but a computer cannot store an infinite number of digits.

Typical numeric data types in programming languages are, for instance: integers (non-fractional numbers), floating points (for fractional numbers), fixed-point types (e.g. for representing monetary values) and bignum types (arbitrary-precision numbers). Programming languages may even have several specific sub-types of integers, floating points, etc.

JavaScript has only two numeric data types: number and the more recent addition to the language, BigInt. The number data type represents floating-point numbers, or floating points for short, and is the general numeric data type for both integers and fractions. The BigInt data type is a bignum type: an arbitrary-precision type for representing arbitrarily large integers.
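A quick way to see these two types in action is the typeof operator: both integers and fractions report "number", while BigInt values form their own primitive type.

```javascript
// Integers and fractions share the single "number" type.
console.log(typeof 42);        // "number"
console.log(typeof 3.14);      // "number"
// BigInt values (note the n suffix) are a separate primitive type.
console.log(typeof 42n);       // "bigint"
// Number.isInteger distinguishes integral values within the number type.
console.log(Number.isInteger(42));   // true
console.log(Number.isInteger(3.14)); // false
```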

What are floating points?

Just about all programming languages have floating point data types. Floating-point representation allows computers to store a very wide range of numbers, from very small to very large, in an efficient way. But floating points have some pitfalls you should be aware of. The next statement is a famous example of unexpected results that calculations with floating points may yield.

console.log( (0.1 + 0.2) === 0.3 ); // returns: false.

To be able to understand the quirks associated with floating points, we need to dive deeper into the details of how they work.

Decimal numerals

First some general background. A number, i.e. a real number, can be:

an integer (0, 1, −3, ...),

a rational number, i.e. a fraction of two integers (1/2, −3/4, ...), or

an irrational number (π, √2, ...).

Generally we use the decimal (base-10) numeral system to represent numbers, as so-called decimal numerals or decimal numbers. All integers can be precisely represented in a positional numeral system like this. Rational numbers cannot always be precisely represented, and irrational numbers never can. Only decimal fractions (which include all integers) can be written precisely as decimal numerals. Decimal fractions are rational numbers where the denominator is (or can be written as) an integer power of ten:

3 = 3/1 = 30/10

1/2 = 5/10 = 0.5

1/4 = 25/100 = 0.25

1/5 = 2/10 = 0.2

313/100 = 3.13

3/200 = 15/1000 = 0.015

Thus, a non-decimal fraction is any fraction that, when fully reduced, has a denominator with a prime factor that is not a prime factor of the numeral system's base. In the decimal system the base is 10 and its prime factors are 2 and 5 (10 = 2 × 5). For example, 1/6 = 1/(2 × 3) is a non-decimal fraction, since 3 is not a prime factor of 10. Non-decimal fractions cannot be precisely represented as decimal numerals. All non-decimal fractions and all irrational numbers can only be approximated when represented as decimal numerals.

1/3 ≈ 0.333

−1/6 ≈ −0.166667

π ≈ 3.1415927

In the examples above, two rational numbers and an irrational number are approximated by decimal fractions. The decimal expansion (the sequence of digits) is actually infinitely long to the right, but is rounded off to a certain precision, turning it into a decimal fraction.
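These round-offs are visible directly in JavaScript, which prints a rounded decimal expansion for non-decimal fractions and irrational numbers:

```javascript
console.log(1 / 6);   // 0.16666666666666666 (infinite expansion, rounded)
console.log(1 / 3);   // 0.3333333333333333
console.log(1 / 4);   // 0.25 (a decimal fraction: finite expansion)
console.log(Math.PI); // 3.141592653589793 (irrational, rounded)
```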

These round-offs may cause errors big enough to become a problem in your application. When performing mathematical operations on fractions and irrationals represented as decimal numerals, you should be aware of these possible errors. An example:

3 × 1/3 = 3/3 = 1

3 × 0.333333 = 0.999999

The first expression above shows a precise calculation, following strict arithmetic rules for fractions. The second expression converts the fraction to a rounded-off decimal numeral (0.333333) before multiplying. This rounding introduces an error which yields a false result: 1 ≠ 0.999999. The error is small, and becomes smaller when the decimal expansion gets longer (rounding after more digits), but errors may accumulate in iterative calculations, and comparisons may behave unexpectedly (like the code example at the beginning of this chapter).
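The same accumulation happens in JavaScript itself. As a sketch: adding the (internally rounded) value 0.1 ten times does not yield exactly 1, because every single addition is rounded to the nearest representable value.

```javascript
let sum = 0;
for (let i = 0; i < 10; i++) {
  sum += 0.1; // each addition rounds to the nearest representable value
}
console.log(sum);       // 0.9999999999999999
console.log(sum === 1); // false
```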

Scientific notation

Now, floating-point number representation is the digital analogue of scientific notation. Scientific notation is directly derived from the decimal numeral system and is a more compact and, especially for very large or very small numbers, more accessible way to express numbers.

The general form of a number notated in floating-point representation (or scientific notation) is:

significand × base^exponent

Some examples (in base-10):

12 345 678.9 = 1.23456789 × 10^7.

0.000000123456789 = 1.23456789 × 10^−7.

9999 = 9.999 × 10^3 (= 10^4 − 1).

In normalized scientific notation the significand is a number with only one non-zero decimal digit before the decimal point. The examples above are in normalized scientific notation.

Calculators and computer programs often use e-notation as an alternative format of scientific notation with base 10.

1.23 × 10^7 = 1.23e7.
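JavaScript accepts e-notation in number literals and can produce it with the standard Number method toExponential():

```javascript
console.log(1.23e7);                        // 12300000
console.log(1.23e-7);                       // 1.23e-7 (small numbers print in e-notation)
console.log((12345678.9).toExponential());  // "1.23456789e+7"
console.log((12345678.9).toExponential(2)); // "1.23e+7" (rounded to 2 fraction digits)
```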

Floating-point representation allows computers to store a very wide range of numbers, from very small to very large, with an acceptable fixed relative precision, in a fixed format with a limited number of digits. The computer uses base-2 (binary numeral system) though, because the computer's digital electronic circuits under the hood work with two-state electric signals: 0 and 1.

IEEE 754

IEEE 754 is a standard for floating-point arithmetic. JavaScript uses this standard for its number data type. More specifically, JavaScript uses 64-bit IEEE 754, known as the double-precision format, as opposed to single precision (32-bit). In other programming languages, like C, this is known as the double data type. So, JavaScript uses 64 bits (binary digits) to store a floating point: one bit for the sign (positive or negative), 11 bits for the exponent and 52 bits for the significand (aka mantissa). Figure 4 visualizes this for the decimal number 0.1.

0 | 01111111011 | 1001100110011001100110011001100110011001100110011010
sign (1 bit) | exponent (11 bits) | significand (52 bits)

Fig.4 - Binary representation of decimal 0.1 in 64-bit IEEE 754 format.

Used converter: baseconvert.com

The number of bits for the significand (the length of the significand) determines the relative precision (relative to the order of magnitude) with which numbers can be represented. The length of the exponent determines the possible range of orders of magnitude, from very small to very large numbers. The exponent can be positive or negative. The 11 bits store the exponent in the range 1 ... 2046, because 0 and 2047 (2^11 − 1) are reserved for special cases (NaN etc.). A fixed exponent bias of 1023 is subtracted from this range to get an exponent value in the range −1022 ... +1023. An exponent of all zeros combined with a zero significand (all zero bits) represents -0 or +0 (note, however: -0 === +0 && -0 === 0). An exponent of all zeros with a non-zero significand represents a subnormal number (this is how the tiniest values near zero are stored). An exponent of all ones (2047) with a zero significand represents -Infinity or +Infinity. An exponent of all ones (2047) with a non-zero significand represents NaN.

In IEEE 754 a number is stored in normalized scientific form. This means that the separator point (i.e. the decimal point, or in binary, the binary point) sits just after the leftmost digit of the significand. In binary this leftmost digit is always 1. Therefore only the bits to the right of the binary point are stored as the significand. When decoding the number from memory, the implied first bit 1 and the binary point are automatically prepended to the significand.
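You can inspect these 64 bits yourself. The helper below (toBits is a hypothetical name, not a built-in) writes a number into an 8-byte buffer with a DataView and reads the raw bits back:

```javascript
// Sketch: dump the raw IEEE 754 bits of a number (big-endian).
function toBits(x) {
  const view = new DataView(new ArrayBuffer(8));
  view.setFloat64(0, x); // big-endian by default
  let bits = "";
  for (let i = 0; i < 8; i++) {
    bits += view.getUint8(i).toString(2).padStart(8, "0");
  }
  return bits; // 64 characters
}

const b = toBits(0.1);
console.log(b.slice(0, 1));  // sign:        "0"
console.log(b.slice(1, 12)); // exponent:    "01111111011" (1019 − 1023 = −4)
console.log(b.slice(12));    // significand: 52 bits "10011001..." (implied 1 not stored)
```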

Max/min floating points in JavaScript

Given the 52 bits for the significand, the maximum number that can be stored with full integer precision should be:

1.111...111 × 2^52 = 111...111
(implied first 1, followed by 52 ones)   (53 ones)

Note: the significands are written in binary; the base and exponent (2^52) in decimal.

A binary number of 53 ones equals: 2^53 − 1 = 9 007 199 254 740 991 ≈ 9 × 10^15.

console.log(Number.MAX_SAFE_INTEGER); // logs: 9007199254740991
console.log(Number.MIN_SAFE_INTEGER); // logs: -9007199254740991

Two things may arrest your attention:

The maximum and minimum safe numbers are integers, and they are based on the length of the significand. An exponent of 52 moves the binary point all the way to the end, after the rightmost digit of the significand. There are no digits left for a fractional part; hence the maximum and minimum safe numbers are integers. The bigger the exponent, the bigger the order of magnitude and the bigger the number itself, but the fewer digits are available for a possible fractional part, so the lower the absolute precision will be. The relative precision however, relative to the order of magnitude (determined by the exponent), is fixed and is determined by the number of bits available for the significand. Fractional digits (or rightmost digits in general) are more significant to a number of small order of magnitude than to a number of large order of magnitude.

The exponent can be much bigger than 52 and much smaller than −52. So the maximum and minimum representable numbers must be much bigger than MAX_SAFE_INTEGER and much smaller than MIN_SAFE_INTEGER. This is true, but the representation of integers bigger than MAX_SAFE_INTEGER or smaller than MIN_SAFE_INTEGER is not reliable. That is why these minimum and maximum numbers have "SAFE" in their names. Given the exponent range −1022 ... +1023:

Maximum representable number: (2 − 2^−52) × 2^1023 ≈ 2^1024 ≈ 1.8 × 10^308

Minimum representable number: −(2 − 2^−52) × 2^1023 ≈ −1.8 × 10^308

console.log(Number.MAX_VALUE); // logs: 1.7976931348623157e+308

The property Number.MIN_VALUE does not represent the minimum representable number, as you might expect. It represents the smallest positive number, that is, the positive number closest to zero:

2^(−1022 − 52) = 2^−1074 ≈ 5 × 10^−324

console.log(Number.MIN_VALUE); // logs: 5e-324
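Beyond these limits, values overflow to Infinity or underflow to zero:

```javascript
console.log(Number.MAX_VALUE * 2);  // Infinity (overflow)
console.log(Number.MAX_VALUE + 1);  // 1.7976931348623157e+308 (adding 1 is far below the precision)
console.log(-Number.MAX_VALUE * 2); // -Infinity
console.log(Number.MIN_VALUE / 2);  // 0 (underflow: rounds to zero)
```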

But what happens between MAX_SAFE_INTEGER and MAX_VALUE and between MIN_SAFE_INTEGER and -MAX_VALUE? Suppose we add one to MAX_SAFE_INTEGER. This results in a binary number written as a 1 followed by 53 zeros.

(2^53 − 1) + 1 = 1 × 2^53.

This is stored as an implied 1, a significand of 52 zeros and an exponent representing 53. The 53rd zero of the significand is not stored (there are only 52 bits), but it is "appended" by the multiplication by 2^53. So, this still works. But when we add 2 to MAX_SAFE_INTEGER, the 53rd bit of the significand should be 1, which cannot be stored. Thus, MAX_SAFE_INTEGER + 1 "equals" MAX_SAFE_INTEGER + 2. Adding 3 to MAX_SAFE_INTEGER makes the 52nd bit 1, which is stored, and the 53rd bit 0, which is "appended" by the multiplication by 2^53. So this number can be stored correctly again.

console.log( (Number.MAX_SAFE_INTEGER + 1) === (Number.MAX_SAFE_INTEGER + 2) ); // logs: true
console.log( 9007199254740992 === 9007199254740993 ); // logs: true
console.log( (Number.MAX_SAFE_INTEGER + 3) === 9007199254740994 ); // logs: true
console.log( 9007199254740993 === 9007199254740994 ); // logs: false

So, numbers outside the safe range but inside the minimum/maximum range are all processed as finite numbers, but a considerable share of them is not stored correctly. Numbers larger than MAX_VALUE are represented as Infinity. For integers greater than MAX_SAFE_INTEGER or less than MIN_SAFE_INTEGER, the BigInt data type is available.
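The built-in Number.isSafeInteger() tells you whether an integer lies in the reliable range:

```javascript
console.log(Number.isSafeInteger(9007199254740991)); // true  (2^53 − 1)
console.log(Number.isSafeInteger(9007199254740992)); // false (2^53)
console.log(Number.isSafeInteger(3.5));              // false (not an integer)
// Even a literal outside the safe range is rounded before anything runs:
console.log(9007199254740993);                       // 9007199254740992
```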

MAX_SAFE_INTEGER has 16 significant decimal digits. The 16th digit, however, does not always guarantee a precise conversion, as shown above. The number of bits for the significand (the length of the significand) determines the relative precision of the floating-point format: 53 bits, which is equivalent to 15.95... decimal digits.

2^53 = 10^n → n = 53 × log10(2) = 15.95...

Using numbers with more than 15 significant decimal digits in operations may yield improper results. Results of calculations are presented as decimal numbers, rounded to at most 16 to 17 significant decimal digits. Significant digits here do not include leading and trailing zeros (0.0123e4, 1.23, 1.2300e2, 123, 123.0 and 123000 all have 3 significant digits).

Conclusion: be careful with operations that involve numbers (positive or negative) with more than 15 significant decimal digits. Also adding or subtracting numbers of very different scales or magnitudes may give unexpected results. In general: When there are too many significant digits, the number is rounded to match up with the limited number of bits in the significand, which possibly introduces rounding errors.

console.log( 0.30000000000000004 === 0.30000000000000002 ); // logs: true // more than 15 significant digits
console.log( 100_000_000 + 0.000000002 ); // logs: 100000000 // adding numbers of very different orders of magnitude
console.log( 0.3 + 0.00000000000000002 ); // logs: 0.3 // adding numbers of very different orders of magnitude

// In the next script a very small number (smallNum) is added to a 
// very large number (largeNum) a million times.
// This has no effect though. Whether you add it once or a million times,
// every single time the result gets rounded back to 1e9.
let largeNum = 1e9,  // this is 1000000000
    smallNum = 1e-9; // this is 0.000000001
for (let n = 0; n < 1000000; n++) {
	largeNum += smallNum;
}
console.log(largeNum); // logs: 1000000000 // = 1e9.
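One mitigation, shown here as a sketch, is to accumulate the small contributions separately (at a comparable order of magnitude) and add the total to the large number once:

```javascript
// Accumulate the small contributions separately...
let total = 0;
for (let n = 0; n < 1000000; n++) {
  total += 1e-9;
}
// ...then add the accumulated sum to the large number once.
console.log(total);       // ~0.001 (a million times 1e-9)
console.log(1e9 + total); // ~1000000000.001, instead of 1000000000
```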

Numbers with infinitely many significant digits

As we have seen, floating points with too many significant digits need to be rounded. Consequently, numbers with an infinite sequence of digits are always rounded. As discussed before, all irrational numbers and some rational numbers have an infinite sequence of digits after the decimal/binary point. Which rational numbers have an infinite expansion of digits depends on the base of the numeral system they are represented in.

Now, let's go back to the JavaScript statement presented at the beginning of this article: (0.1 + 0.2) !== 0.3. All three fractions involved are decimal fractions. Represented as decimal numerals they have finite sequences of digits, so they are not approximated by rounding, and they can be precisely added and represented. However, numbers are stored as floating points in binary form instead of decimal form, and in binary form all three fractions are approximated by rounding! In binary, only binary fractions (fractions that, when fully reduced, have a denominator that is a power of two) can be precisely represented with a finite number of digits. None of the three fractions is a binary fraction: 0.1 = 1/10, 0.2 = 2/10 and 0.3 = 3/10. This means that the stored bit patterns for both 0.1 and 0.2 are rounded, and after adding them, the bit pattern of the sum is rounded again. This result is then compared with the bit pattern of the other side of the comparison, 0.3, which is rounded only once. Thus, due to different rounding errors, the sum 0.1 + 0.2 evaluates as unequal to the lone 0.3.

Note in the next example that, as mentioned before, using numbers with more than 15 significant decimal digits may cause problems, but results may be rounded to more than 15 significant decimal digits.

console.log( (0.1 + 0.2) === 0.3  ); // logs: false
The actual value of 0.1 + 0.2:

console.log(0.1+0.2); // logs 0.30000000000000004
console.log( (0.3) === 0.30000000000000004 ); // logs: false

console.log( 0.30000000000000004 === 0.30000000000000002 ); // logs: true 
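You can make the stored binary approximations of 0.1, 0.2 and 0.3 visible with toFixed(), here with 20 decimal digits:

```javascript
// The stored binary approximations, shown with 20 decimal digits:
console.log((0.1).toFixed(20));       // "0.10000000000000000555"
console.log((0.2).toFixed(20));       // "0.20000000000000001110"
console.log((0.3).toFixed(20));       // "0.29999999999999998890"
console.log((0.1 + 0.2).toFixed(20)); // "0.30000000000000004441"
```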

Some more examples:

console.log( (0.3 + 0.6) === (0.45 + 0.45) ); // logs: false
console.log( (0.3 + 0.6) >= (1.1 - 0.2) ); // logs: false

console.log(0.3 + 0.6); // logs: 0.8999999999999999
console.log(0.45 + 0.45); // logs: 0.9
console.log(1.1 - 0.2); // logs: 0.9000000000000001

Rounding errors in operations involving non-binary fractions do not always cause flawed results. Rounding errors may cancel each other out.

console.log( (0.5 + 0.1) === 0.6 ); // returns: true.

In the next example all involved fractions are binary fractions, therefore no rounding occurs at all.

console.log( (0.5 + 0.25) === 0.75 ); // logs: true

Irrational numbers, too, have infinitely many significant digits/bits after the decimal/binary point.

console.log( Math.sin(Math.PI) === 0 ); // logs: false
console.log( Math.sin(Math.PI) ); // logs: 1.2246467991473532e-16 // This is 0.00000000000000012246467991473532

Scientific calculators and computer algebra systems usually handle irrational numbers and fractions in an exact mathematical way. Their source code includes an extensive set of algorithms to perform symbolic mathematical operations directly, instead of using approximate floating-point values for each intermediate calculation: 3 × 1/3 = 3/3 = 1 instead of 3 × 0.333333 = 0.999999. A system like this will return exactly 0 for the input sin(π).

Comparing floating point numbers

From what we have learned so far we can conclude: be careful with directly comparing two floating points, for instance in conditional statements or loops. Before comparing, we could round the numbers to a desired number of decimals after the decimal point:

let toNdecimals = 6; // What is an appropriate round-off for your application?
console.log( roundNumber(0.1+0.2) === roundNumber(0.3) ); // logs: true.

console.log( roundNumber(10000000000.1+0.2) === roundNumber(10000000000.3) ); // logs: false.
console.log( roundNumber(10000000000.1+0.2) ); // logs: 10000000000.300001.

function roundNumber(num) {
  return Math.round(num * Math.pow(10, toNdecimals)) / Math.pow(10, toNdecimals);
}

A less ponderous method is to check whether the difference between the two numbers is small enough, that is, whether the two numbers are "close enough" for your application.

let epsilon = 1e-6;
console.log( Math.abs((0.1+0.2) - 0.3) < epsilon ); // logs: true.
console.log(  Math.abs((10000000000.1+0.2) - 10000000000.3) < epsilon ); // logs: false.

The difficulty is finding an appropriate approximation error, often called epsilon, to compare the difference to. An epsilon that is too small makes the comparison always return false; an epsilon that is too big makes your condition test too inaccurate: it returns true even when the two numbers are too different. As mentioned before, the precision of the IEEE 754 floating-point system is relative, due to the fixed number of bits in the significand. Rounding happens in the last bit. For a larger number, the last bit represents a larger value and thus the absolute rounding error becomes larger. Multiple rounding errors in a calculation may still (partly) cancel each other out, and the resulting total error may even be negative; hence the Math.abs() method used in the examples.

console.log( Math.abs((0.1+0.2) - 0.3) ); // logs: 5.551115123125783e-17
console.log( Math.abs((1.1+2.2) - 3.3) ); // logs: 4.440892098500626e-16
console.log( Math.abs((11.1+22.2) - 33.3) ); // logs: 0
console.log( Math.abs((111.1+222.2) - 333.3) ); // logs: 5.684341886080802e-14 // the difference is negative
console.log( Math.abs((1111.1+2222.2) - 3333.3) ); // logs: 4.547473508864641e-13 // the difference is negative
console.log( Math.abs((11111.1+22222.2) - 33333.3) ); // logs: 0
console.log( Math.abs((111111.1+222222.2) - 333333.3) ); // logs: 5.820766091346741e-11

In the above example we see that, in general, the larger the involved numbers are, the larger the difference is. Choosing a very small epsilon, say 1e-16, would work for 0.1 + 0.2 - 0.3, but not for the rest in the example.

In online discussions it is often proposed to use the Number.EPSILON property as a fixed absolute epsilon. Number.EPSILON represents the machine epsilon: the difference between 1 and the smallest floating point greater than 1.

console.log(Number.EPSILON); // logs: 2.220446049250313e-16

The purpose of this "epsilon" is not to serve as a fixed absolute epsilon in comparisons. For many, many applications this "epsilon" would be (way) too small. Forget about the existence of Number.EPSILON; you will probably never need it.

So, we compare an absolute error to an absolute epsilon, while the occurring error is relative to the order of magnitude of the involved numbers. A solution could be to make the difference or the epsilon also relative. You can find functions online that do just that, but they all have issues, even the most clever ones.
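As a sketch of that idea (nearlyEqual is a hypothetical helper, not a built-in), the tolerance can be scaled by the magnitude of the operands:

```javascript
// Compare with a tolerance relative to the operands' magnitude.
// Not edge-case proof: Infinity, NaN and numbers near zero need extra care.
function nearlyEqual(a, b, relEps = 1e-12) {
  const diff = Math.abs(a - b);
  const scale = Math.max(Math.abs(a), Math.abs(b), 1);
  return diff <= relEps * scale;
}

console.log(nearlyEqual(0.1 + 0.2, 0.3));                // true
console.log(nearlyEqual(111111.1 + 222222.2, 333333.3)); // true (absolute error ~6e-11)
console.log(nearlyEqual(0.3, 0.4));                      // false
```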

The easiest and most accessible way may be to simply choose a fixed epsilon that is appropriate for your application. In practice, nine times out of ten the acceptable errors in your application are far larger than the rounding errors that may occur. If the orders of magnitude do not differ too much, you can generally choose a value for epsilon below which a difference between two numbers has no practical meaning in your application. And real nerds choose binary fractions for their epsilons: 0.5 = 2^−1, 0.25 = 2^−2, 0.125 = 2^−3, 0.0625 = 2^−4, 0.0009765625 = 2^−10, etc.

Alternative solutions

Usually the use of floating points does not cause problems, as long as the mentioned precautions are followed. If your application really needs clean decimal calculations, like calculations with money, and simply rounding the results to a fixed number of decimal places when displaying them is not a sufficient solution, you might consider "internally" using integers only, e.g. calculating everything in cents. This avoids binary fractions with infinitely many significant bits. Drawbacks are that the code becomes less comprehensible and that the involved integers are more likely to become too large. The latter can be tackled with the JavaScript BigInt data type, particularly in applications where values greater than 2^53 are reasonably expected.
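A minimal sketch of the integer-cents approach (the variable names and the 21% tax rate are made up for illustration):

```javascript
// All arithmetic happens in whole cents (integers).
const priceCents = 1999;                        // $19.99
const taxCents = Math.round(priceCents * 0.21); // hypothetical 21% tax, rounded to a whole cent
const totalCents = priceCents + taxCents;

console.log(totalCents);                    // 2419
console.log((totalCents / 100).toFixed(2)); // "24.19" (convert to decimal only for display)
```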

There are JavaScript libraries available that provide an imitation decimal number type for arbitrary-precision arithmetic on all decimal numbers (integers and fractions). Only use them, though, if there is no other reasonable solution available and if it is really worth the seriously worse run-time performance. Libraries are:

BigInts in JavaScript

In JavaScript, BigInt is a primitive integer data type, especially for representing arbitrarily large integers greater than 2^53 − 1.

BigInt uses the same literal format as the number type, except with a lowercase letter n suffix, with no decimal point (it is an integer type) and with no e-notation allowed. A BigInt literal cannot have leading zeros (0n itself is valid).

console.log(-123n); // logs: -123n
console.log(0n); // logs: 0n
console.log(123_456_789n); // logs: 123456789n

Arithmetic operations may be used with BigInt values, but the built-in Math object does not work with BigInts. Operations on BigInts return BigInts. A fractional result is truncated, i.e. rounded towards zero. Floating points (number data type) cannot be mixed with BigInts in arithmetic operations, and neither can booleans.

console.log(3n - 2n); // logs: 1n
console.log(3n ** 2n); // logs: 9n
console.log(3n / 2n); // logs: 1n // not 1.5n
console.log(3n + 2n); // logs: 5n
console.log(3n + "hello"); // logs: "3hello" // not "3nhello"

console.log(3n - true); // throws: TypeError
console.log(3n - 2); // throws: TypeError
console.log(Math.pow(3n, 2n)); // throws: TypeError

The examples above could have worked if JS coerced BigInts into numbers. Probably for safety reasons JS does not do this: BigInts can be arbitrarily large, so automatically converting them to floating points might silently change the value of the number. BigInts need to be explicitly converted first (see later).

Coercion of BigInts into booleans cannot lead to a loss of precision. Therefore logical operators and conditional statements work with mixed BigInts and floating points. Also, (non-strict) comparison operators accept mixed operands.

console.log(0n || "hello"); // logs: "hello"
console.log(!3n); // logs: false
console.log(Boolean(!0n)); // logs: true
if (0n) { console.log("This will not be logged.") }

console.log(2n > 1); // logs: true
console.log(2n == 2); // logs: true
console.log(2n === 2); // logs: false // different data types

Explicit conversion from BigInt to floating point and vice versa can be done with the conversion functions Number(bigInt) and BigInt(number) (used without new). Using the unary plus operator to convert a BigInt throws a TypeError exception.

console.log( BigInt(Number.MAX_SAFE_INTEGER) + 2n ); // logs: 9007199254740993n
console.log( +2n ); // throws: TypeError

console.log( BigInt(1.5) ); // throws: RangeError

However, converting between floating points and BigInt values can lead to loss of precision! So, only use BigInt values when values greater than 2^53 are reasonably expected, and once you use them in your application, stick to them. An alternative is to pass a string to BigInt("number") instead of the number itself.

console.log( BigInt(123456789123456789) ); // logs: 123456789123456784n // the floating point already lost precision before passing it to BigInt().
console.log( BigInt("123456789123456789") ); // logs: 123456789123456789n