jk-jeon/dragonbox

Do you have an algorithm to deal with ratio and percentage for integers?

Closed this issue · 5 comments

I find situations like percentages very common, but using floating pointers to print them out would be very costly.

Do you have an algorithm to deal with that?

One without precision:

For example, giving input (3,10)
Output:
30%
or
0.3
For example, giving input (3,9)
Output:
33.(3)%
or
0.(3)
For example, giving input (2,14)
Output:
14.(285714)%
or
0.(142857)

Also, with precision version (which is very common in statistics, like keeping two digits decimals):

For example, giving input (2,14), precision 2
Output:
14.26%
or
0.14

Well. I just realized precision is only possible with a few possibilities.

Maybe a roundtrip here is needed too?

For example, giving input (3,10)
Output:
30%
or
0.3
For example, giving input (3,9)
Output:
33.(3)%
or
0.(3)
For example, giving input (2,14)
Output:
14.(285714)%
or
0.(142857)

If you are looking for finding the exact decimal representation in terms of the non-repeating and the repeating digits, it's nothing but Euler's theorem. Basically, what it says is that given a pair of coprime integers $a$ and $n$, there is an exponent $e$ such that $n$ divides $a^{e}-1$, and it also gives a formula for computing this exponent in terms of the so-called Euler totient function $\varphi(n)$.

Here is the explanation on why that's relevant here. Let's say we want to compute $p/q$ for a pair of coprime integers $p$ and $q$. We want to find all the digits appearing in evaluating $p/q$ in decimal. If $q$ has $2$ or $5$ as its prime factors, we pull them out and rewrite it as

$$ \frac{p}{q} = \frac{p}{q'}\cdot \frac{1}{2^{a}5^{b}}. $$

Clearly $1/(2^{a}5^{b})$ can be rewritten as $c/10^{k}$ for some integer $c$ and $k:=\max(a,b)$. Now, since $q'$ is coprime to $10$, Euler's theorem tells us that there exists the smallest exponent $e$ such that $q'$ divides $10^{e}-1$. In other words,

$$ m := \frac{10^{e}-1}{q'} $$

is an integer and we have

$$ \frac{p}{q} = \frac{1}{10^{k}}\cdot \frac{mpc}{10^{e}-1}. $$

Next, the factor $\frac{mpc}{10^{e}-1}$ can be divided into the integer part and the fractional part, i.e., we can write

$$ \frac{p}{q} = \frac{1}{10^{k}}\left(g + \frac{h}{10^{e}-1}\right), $$

where $g$ is an integer and $h$ is an integer in the range $0 \leq h < 10^{e}-1$. What does it mean is that, let's say the decimal expansion of $h/(10^{e}-1)$ is like

$$ 0.x_{1}x_{2}x_{3}x_{4}x_{5}x_{6}\ \cdots\ , $$

then if you multiply $10^{e}$ to it, that is, to move the decimal point to right by $e$ digits,

$$ x_{1}x_{2}\ \cdots\ x_{e}.x_{e+1}x_{e+2}x_{e+3}\ \cdots\ , $$

and then subtract $h/(10^{e}-1)$ from it,

$$\begin{align*} && x_{1}x_{2}\ \cdots\ x_{e}&.x_{e+1}x_{e+2}x_{e+3}\ \cdots\ \\ -&& 0& .x_{1}x_{2}x_{3}\ \cdots\\ \hline \end{align*}$$

then the result must be an integer because

$$ 10^{e}\cdot \frac{h}{10^{e}-1} - \frac{h}{10^{e}-1} = h, $$

which means all the digits in the fractional parts cancel each other. Hence, you should have $x_{1}=x_{e+1}$, $x_{2}=x_{e+2}$, $x_{3}=x_{e+3}$, and so on, i.e., $x_{1}\ \cdots\ x_{e}$ is the repeated digits you are looking for, and this is is precisely the value of the integer $h$. In summary,

  • $g$ is precisely the non-repeating digits,
  • $h$ is precisely the repeating digits,
  • $e$ is the period of repetition, and the number of digits in $h$ might be strictly smaller than $e$ in which case there are leading zeros in the repeated digits,
  • and then you have to shift every digit to right by $k$ digit, where $k$ is the larger one between the number of factors of $2$ or $5$ in $q$.

Well, it doesn't sound simple, right? Especially, I don't think there is a blazingly fast method for finding the smallest exponent $e$. The "obvious" method based on Euler's theorem involves two prime factorizations, one for $q$ to compute the Euler totient function, and another for the totient function $\varphi(q')$ itself in order to figure out the smallest exponent. (The smallest exponent $e$ must be a divisor of $\varphi(q')$, but that's all we know about $e$ as far as I know.) And I can imagine that typically the exponent $e$ will be way larger than $19$ so that $10^{e}$ will be much larger than what a machine word can hold.

My conclusion is this: I haven't thought about this subject more than I wrote here and also don't know about any existing literature on it, but it sounds like a decent research question. It sounds like at least as hard as the arbitrary-precision formatting of floating-point numbers, if not harder.

Also, with precision version (which is very common in statistics, like keeping two digits decimals):

For example, giving input (2,14), precision 2
Output:
14.26%
or
0.14

Again, if the divisor is not an invariant constant, I do not think there are lots of things we can do with that. Floating-point case was easier because the divisor is indeed a known constant, powers of $2$ or $10$.

I'm closing this as it sounds like an out of scope issue, but feel free to reopen if you think it's relevant.

Also, with precision version (which is very common in statistics, like keeping two digits decimals):
For example, giving input (2,14), precision 2
Output:
14.26%
or
0.14

Again, if the divisor is not an invariant constant, I do not think there are lots of things we can do with that. Floating-point case was easier because the divisor is indeed a known constant, powers of 2 or 10.

I'm closing this as it sounds like an out of scope issue, but feel free to reopen if you think it's relevant.

what about just 2 decimal places for percentages for example? It is very common in academic research.

what about just 2 decimal places for percentages for example? It is very common in academic research.

Just multiply 100 to the numerator and divide by the denominator then.

EDIT: Well, actually, if you want the correct round-to-nearest, tie-to-even behavior, then you may need to do a little bit more, like multiply 200 and add 1 and things like that. But basically that's that.