Les nombres à virgule flottante: simple comme .1 + .2 = .3!

Patrick Fournier

Séminaires AESCM

9 novembre 2022

Contexte et contre-exemples

Tu pensais qu'c'tait ça que c'tait...

Addition de nombre rationels


                julia> 0.5 + 0.25 == 0.75
                true

                julia> 0.1 + 0.2 == 0.3
                false

Polynômes de Wilkinson2

$$ \begin{align*} w_n(x) &= \prod_{k = 1}^n (x - k)\\ &= (x - 1) (x - 2) \cdots (x - n) \end{align*} $$ $ \text{Racines}(w_n(x)) = 1, 2, \ldots, n $ 🤓


                    using Polynomials: Polynomial, fromroots
                    import PolynomialRoots: roots

                    wilkinson(n) = fromroots(1.0:n)

                    roots(poly::Polynomial) = roots(poly.coeffs)


                    julia> round.(roots(wilkinson(3)),
                                        digits = 4)
                    3-element Vector{ComplexF64}:
                    3.0 - 0.0im
                    2.0 + 0.0im
                    1.0 + 0.0im


                  julia> w20 = wilkinson(20);

                  julia> round.(roots(w20), digits = 4)
                  20-element Vector{ComplexF64}:
                  19.9999 - 0.0im
                  19.0006 + 0.0im
                  17.9973 + 0.0im
                  17.0074 + 0.0im
                  15.9855 + 0.0im
                  15.0204 + 0.0im
                  13.9779 + 0.0im
                  13.0184 + 0.0im
                  ⋮


                  julia> w20[19] -= 2^-23

                  julia> round.(roots(w20), digits = 4)
                  20-element Vector{ComplexF64}:
                  20.8469 - 0.0im
                  19.5024 + 1.9403im
                  19.5024 - 1.9403im
                  13.9923 - 2.5188im
                  13.9923 + 2.5188im
                  16.7307 + 2.8126im
                  16.7307 - 2.8126im
                  11.7935 + 1.6521im
                  ⋮


                  julia> round.(abs.(roots(w20)), digits = 4)
                  20-element Vector{Float64}:
                  20.8469
                  19.5987
                  19.5987
                  14.2172
                  14.2172
                  16.9655
                  16.9655
                  11.9086
                  11.9086
                  10.1158
                  ⋮

Suite logistique

$$ x_{n + 1} = r x_n (1 - x_n) $$

Modélise la croissance d'une population
$ r $: potentiel biotique, i.e. # d'enfants
Si $ r = 4 $, deux comportements possibles:
- $ x_0 $ rationel $ \Rightarrow $ périodicité (éventuellement)
- $ x_0 $ irrationel $ \Rightarrow $ chaos 🌪🌪🌪
Donc, $ x_0 \sim \mathcal U(0, 1) \Rightarrow $ chaos (p.s.)


                    f(x) = 4.0 * x * (1.0 - x)

                    function suite_log(n, x0)
                        for _ ∈ 1:n
                            x0 = f(x0)
                        end

                        x0
                    end


                    julia> using Random: Xoshiro
                    julia> x0 = rand(Xoshiro(45))
                    julia> suite_log(10000000, x0)
                    0.0

IEEE 754

Nombre entiers: système binaire pondéré

Ordinateur moderne: 64 bits
$ 2^{64} \approx 10^{19} $ valeurs différentes (un peu plus que # grains de sable sur Terre)
Non signé: 64 bits pour le nombre ($ 0 \text{ à } 2^{64} -1 $)
Signé: 63 bits pour le nombre, 1 bit pour le signe ($ -2^{63} \text{ à } 2^{63} - 1 $)

Exemples:

$ 0_{10} = 0_2 $
$ 42_{10} = 0 \cdots 0101010_2 $
$ -42_{10} = 1 \cdots 1010110_2 $ (complément à 2)

Opérations bit-à-bit

Négation: $\mathord{\sim} 5 = \mathord{\sim} 0 \cdots 101_2 = 1 \cdots 010_2 = -6 $
Conjonction: $ 4 \mathbin{\&} 5 = 100_2 \mathbin{\&} 101_2 = 100_2 = 4 $
Disjonction: $ 4 \mathbin{|} 5 = 100_2 \mathbin{|} 101_2 = 101_2 = 5 $
Xor: $ 4 \mathbin{\verb|^|} 5 = 100_2 \mathbin{\verb|^|} 101_2 = 001_2 = 1 $
Left shift: $ 2 \ll 1 = 10_2 \ll 1 = 100_2 = 4 $
Right shift: $ 12 \gg 2 = 1100_2 \gg 2 = 11_2 = 3 $

Nombres réels: logarithme signé

Réservé 1 bit pour le signe, $ n $ bits pour la partie entière et $ d $ bits pour la partie décimale
Problème: précision fixe pour tout $ \mathbb R $
Solution: logarithme!

Pour un petit nombre $ \tau $, la magnitude de $ A \in \mathbb R $ est $$ L_A = \begin{cases} \log\lvert \tau A \rvert &\text{si } \lvert A \rvert > \frac{1}{\tau}\\ 0 &\text{si } \lvert A \rvert \leq \frac{1}{\tau}\\ \end{cases} $$

Opérations arithmétiques:

$ A \times B $: $ L_A + L_B - L_{\tau} $
$ A + B $: Basé sur l'identité $ A \left( 1 + \frac B A \right)$

3 opérations pour une addition!

Nombres à virgule flottante

Notation scientifique en base $ \beta = 2 $

Standard IEEE 754 pour 64 bits (precision double):

1 bit pour le signe $ s $ ($ 0 \Rightarrow $ positif)
$ p = 53 $ bits pour la mantisse
11 bits pour l'exposant

Mais $ 1 + 53 + 11 = 65 > 64 $ 😱

Technicalités 🥱

Exposant biaisé: soustraire $ 2^{10} - 1 = 1023 $
Bit implicite: en base 2, vaut 1 pour nombre normalisé
Exposant $ e = 0 $ (donc -1023 avec biais) & mantisse $ c \neq 0 $ $ \Rightarrow $ bit implicite = 0; nombre dénormalisé

Nombres spéciaux:

Dans tous les cas, $ e = 1 \cdots 1_2 $

$ +\infty $: $ s = 0, c = 0$
$ -\infty $: $ s = 1, c = 0$
NaNs: $ s \in \{ 0, 1 \}, c \neq 0 $

IEEE 754 exige deux NaNs: qNaN (quiet) et sNan (signaling).

Nombres non représentables: overflow & underflow


                julia> 1e308
                1.0e308

                julia> 1e308 * 2.0
                Inf

                julia> 5e-324 - 2e-324
                5.0e-324

                julia> 5e-324 - 3e-324
                0.0

Nombres non représentables: représentation infinie

Nombre

irrationel $\left( \pi, e, \ldots \right)$: tout $ \beta $
Rationel périodique $\left( \frac 1 3, \ldots \right)$: dépend de $ \beta $

Proposition: $ n = \frac a b $ ($ a $ et $ b $ copremiers) a une représentation finie dans une base $ \beta $ ssi $ \text{facteurs}(b) \subseteq \text{facteurs}(\beta)$.

Comment "représenter" les nombres non représentables?

Erreur absolue

Manière naturelle de mesurer l'erreur d'arrondissement.
Unité: ulp (unit in the last place)

Si $ n $ est représenté par $$ \beta^e \sum_{k = 0}^{p - 1} d_k \beta^{-k}, $$ l'erreur absolue est $$ \beta^{p - 1} \sum_{k = p}^{\infty}d_k \beta^{-k} \text{ ulps}. $$

Fun fact:

Si le résultat d'un calcul est "correct", l'erreur absolue avec la véritable réponse peut aller jusqu'à $ \frac 1 2 $ ulp!

Erreur relative

Pratique pour comparer la précision d'algorithmes
Erreur absolue divisée par $ n $
Question: bornes pour l'erreur relative correspondant à $ \frac 1 2 $ ulp

$ \frac 1 2 $ ulp pour $ n $ en base $ \beta $ avec précision $ p $ correspond à
Pour obtenir cette erreur, $ \beta^e \leq n \leq \beta^{e + 1} $
Donc, les bornes sont

🧑🏽‍🔧🌭🧑🏽‍🔧

$ \epsilon = \frac 1 2 \beta^{1 - p} $ est appelé machine epsilon
Vaut $ 2^{-53} \approx 1.1 \times 10^{-16} $ en précision double

Retour sur les exemples

Addition de nombres rationels

$ 0.5 + 0.25 = \frac 1 2 + \frac{1}{2 \times 2}$
Représentables en base $ \beta = 2 $ 👍
(Aussi en précision $ p = 53 $ 🤫)
$ 0.1 + 0.2 = \frac{1}{2 \times 5} + \frac 1 5$ 👎

Polynômes de Wilkinson

Problème mal conditionné
Petit changement dans un coefficient $ \Rightarrow $ (possiblement) grand changement dans les racines
Pas un problème de représentation: ne peut être réglé en augmentant $ p $ 😢


                    setprecision(500) do
                        w20_big = wilkinson(BigFloat(20))
                        global roots_w20_big = roots(w20_big)

                        w20_big[19] -= 2.0^-23.0
                        global roots_w20_big_pert = roots(w20_big)
                    end


                    julia> round.(roots_w20_big, digits = 4)
                    20-element Vector{Complex{BigFloat}}:
                    20.0 - 0.0im
                    19.0 + 0.0im
                    18.0 + 0.0im
                    17.0 + 0.0im
                    16.0 + 0.0im
                    15.0 + 0.0im
                    14.0 + 0.0im
                    13.0 + 0.0im
                    12.0 + 0.0im
                    11.0 + 0.0im
                    ⋮


                    julia> mags = Float64.(abs.(roots_w20_big_pert))
                    julia> round.(mags, digits = 4)
                    20-element Vector{Float64}:
                    20.8469
                    19.5987
                    19.5987
                    14.2173
                    14.2173
                    16.9655
                    16.9655
                    11.9088
                    11.9088
                    ⋮

Suite logistique

Topologiquement conjuguée au bit shift map (décalage de Bernoulli)
À chaque $ y_0 $ t.q. bsm périodique correspond un $ x_0 $ t.q. la suite logistique l'est aussi

Bit shift map

Soit $ T $ un homomorphisme sur $ \{ 0, 1 \}^{\infty} $ défini par $$ T(d_1, d_2, \cdots) \mapsto (d_2, d_3, \cdots). $$ Par exemple: $$ T(1, 1, 0, 1, \cdots) = (1, 0, 1, \cdots). $$

Dualité entre BSM et [0, 1]

Pour une trajectoire d'un processus de Bernoulli $D = \{d_k \}_{k \in \mathbb N}$, on définit $$ y_n = \sum_{k = n + 1}^{\infty} d_k 2^{-k} $$ une bijection dans $ [0, 1] $
$ y_n \leftrightarrow T^n(d_1, d_2, \ldots)$
On voit que $(D, T(D), T^2(D), \ldots)$ périodique ssi $ y_0 $ rationel

$ x_0 \sim \mathcal U(0, 1) \Rightarrow $ non périodique presque sûrement
Impossible à simuler car nombres à virgule flottante rationels
Toute simulation de la suite logistique est éventuellement périodique
Encore plus fort: converge à 0

Code Julia

Disponible ici!

Références

Goldberg, D. (1991). What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR), 23(1), 5–48. https://doi.org/10.1145/103162.103163
Wilkinson, J. H. (1959). The evaluation of the zeros of ill-conditioned polynomials. Part I. Numerische Mathematik 1959 1:1, 1(1), 150–166. https://doi.org/10.1007/BF01386381