Cube root of a real number

Question

Cube root of a real number

Opened this issue 5 years ago · 34 comments

ivan-pi commented 5 years ago

Related to #150 (non-special mathematical functions)

`cbrt` - Cube root of a real number

Description

Returns the cube root of the real number (x), that is a number (y) such that (y^3 = x).

Syntax

y = cbrt(x)

Arguments

x: A real number (x).

Return value

Returns the value (\sqrt[3]{x}), the result is of the type real and has the same kind as x.

Example

program demo_cbrt
use stdlib_experimental_math, only: cbrt
print *, cbrt(27.), cbrt(-27.)  ! outputs 3, -3
end program

As seen from the discussion on Discourse this function is semantically more accurate than writing x**(1./3.) which only works for positive real numbers and returns NaN otherwise.

A possible extension would be to allow complex arguments, ~~the return value would then be an array with 3 elements for the 3 cube roots~~ (if the number is real and non-zero, there is one real root and a conjugate pair of complex roots; a complex non-zero value will have three distinct cube roots)

Answer 1 · 2020-06-23T16:50:09.000Z

I've already got an implementation ready based upon the NSWC Mathematical Library version and using Fypp for templating of different real kinds.

Answer 2 · 2020-06-23T17:15:41.000Z

Thanks.

A possible extension would be to allow complex arguments, the return value would then be an array with 3 elements for the 3 cube roots (if the number is real and non-zero, there is one real root and a conjugate pair of complex roots; a complex non-zero value will have three distinct cube roots)

I don't use usually complex numbers, neither cube roots. However, to be in agreement with the intrinsic function sqrt, I think it should accept complex arguments. In this case, is a function still possible, or should it be a subroutine?

Answer 3 · 2020-06-23T19:46:00.000Z

Personally, I also don't have a foreseeable usage for complex cube roots.

It should however, be possible to have the the same interface for real and complex cube roots:

interface cbrt
function cbrt_sp(x) result(cbrt)
  complex(sp), intent(in) :: x
  complex(sp) :: cbrt
end function
function cbrt_complex_sp(x) result(cbrt)
  complex(sp), intent(in) :: x
  complex(sp) :: cbrt(3)
end function
end interface

A better option might be to follow the IMSL library CBRT(X) function:

For complex arguments, the branch cut for the cube root is taken along the negative real axis. The argument of the result, therefore, is greater than –π/3 and less than or equal to π/3. The other two roots are obtained by rotating the principal root by 3 π/3 and π/3.

Lahey/Fujitsu Fortran also provides a CBRT function, which returns a single number for real or complex variables.

In a thread on the Intel Fortran Forum, @sblionel shows the compiler does in fact call a special cube root routine for a**(1./3.), but this only works for a positive argument. This makes me wonder whether a simple implementation for real numbers could be simply:

function cbrt_v1(x) result(cbrt)
  real, intent(in) :: x
  real :: cbrt
  if (x >= 0.0) then
    cbrt = x**(1.0/3.0)
  else
    cbrt = -((-x)**(1.0/3.0))
  end if
end function

Digging back further, the behavior of the a**(1./3.) has changed on occasion in the Intel Fortran compiler:

Also the gfortran mailing list contains an interesting discussion on cbrt:

Gfortran and using C99 cbrt for X ** (1./3.)

@certik Do you have an opinion on this one? My understanding is that the goal of providing a CBRT function is to be "mathematically" correct, in the sense that it can also accept negative arguments. The problem however is can the behavior of CBRT differ from X**(1./3.) in terms of representation/accuracy/speed.

Answer 4 · 2020-06-24T05:03:54.000Z

See my comment on precisely this issue (and the subsequent discussion there): symengine/symengine#1644 (comment)

We should do whatever is the most consistent and document it well. It could be that we need two cbrt functions for the two most common conventions.

Answer 5 · 2020-06-24T17:14:12.000Z

How about an optional argument to decide which branch you want?

f = cbrt(z) always has 0 <= arg(f) < 2*pi/3
f = cbrt(z, k) has 2*pi*k/3 <= arg(f) < 2*pi*(k+1)/3

Answer 6 · 2020-06-24T18:38:33.000Z

As for negative real arguments, I am inclined to have cbrt(-x) == -cbrt(x) (for positive x).

Upshot: this is what we learn in school before getting introduced to complex numbers. It's what most people would expect when working with reals only.

Drawback: introduces some "gotcha" potential for people who use mix reals and complex numbers without reading the docs.

Answer 7 · 2020-06-24T19:00:47.000Z

How about an optional argument to decide which branch you want?

f = cbrt(z) always has 0 <= arg(f) < 2*pi/3
f = cbrt(z, k) has 2*pi*k/3 <= arg(f) < 2*pi*(k+1)/3

With an elemental procedure, you could pass an array of k's if you wanted different branches. Not sure, if it is useful in practice.

Answer 8 · 2020-06-24T22:35:19.000Z

The c++ cbrt has the property of cbrt(-x) == -cbrt(x) for positive x, which makes it not compatible with complex functions. It is equivalent to the Surd function in Mathematica, and specifically to the CubeRoot function. So one option is to simply have cbrt to do the same thing, returning a real number.

And if you want the complex number, you can just use **, as in cmplx(x, dp)**(1._dp/3) where x is some real (or complex) variable.

Answer 9 · 2020-06-24T23:59:34.000Z

Or better yet, cast it to complex first. I have in mind:

cbrt(-8.) == -2.
cbrt(cmplx(-8.)) == (1., sqrt(3.))
cbrt(cmplx(-8.), k=1) == (-2., 0.)

Answer 10 · 2020-06-25T05:32:47.000Z

The c++ cbrt has the property of cbrt(-x) == -cbrt(x) for positive x, which makes it not compatible with complex functions. It is equivalent to the Surd function in Mathematica, and specifically to the CubeRoot function. So one option is to simply have cbrt to do the same thing, returning a real number.

And if you want the complex number, you can just use **, as in cmplx(x, dp)**(1._dp/3) where x is some real (or complex) variable.

This is also the behavior of Octave cbrt

Answer 11 · 2020-06-26T13:13:27.000Z

MATLAB on the other hand does not provide a cbrt function (see How do you enter the command for a cube root?). It does however have the following behavior:

>> (8)^(1/3)

ans =

     2

>> (-8)^(1/3)

ans =

   1.0000 + 1.7321i

>> nthroot(-8,3)

ans =

    -2

>> help nthroot
 nthroot Real n-th root of real numbers.
 
    nthroot(X, N) returns the real Nth root of the elements of X.
    Both X and N must be real, and if X is negative, N must be an odd integer.
 
    Class support for inputs X, N:
       float: double, single

Answer 12 · 2020-06-26T15:33:11.000Z

It seems Matlab's ^ operator behaves like Fortran's ** operator for complex numbers. The nthroot seems to be like like Mathematica's Surd.

Answer 13 · 2020-06-28T12:15:43.000Z

Let's keep the ball rolling. The way I see it now, there are essentially two choices:

Simple version (no branch argument) - Code example
With optional branch variable - Code example

Comments:

For a complex root of a real variable the cbrt is more compact than using the power operator

y = cbrt(cmplx(x))
y = cmplx(x)**(1._sp/3)

y = cbrt(z)
y = z**(1._sp/3)

If we go with the 1st option the user can always retrieve the other two complex roots by rotating the principal root by 3π/3 and π/3. This can be documented with an example:

r = cmplx(-1.,sqrt(3.))/2.
j = cmplx(0.,1.)

z = -8 + 0*j

y1 = cbrt(z)       !  1.0000 + 1.7321i 
y2 = y1 * r        ! -2.0000 - 0.0000i
y3 = y1 * conjg(r) !  1.0000 - 1.7321i

With the second option, users who know exactly what they want, can easily recover a complex root from a either real or complex number, e.g.

  write(*,'(A)')  "real to real"
  
  write(*,fmtr) "cbrt( 8._sp) = ", cbrt(8._sp)
  write(*,fmtr) "cbrt(-8._sp) = ", cbrt(-8._sp)

  write(*,'(/,A)')  "real to complex"

  write(*,fmtc) "cbrt(8._sp,k=0) = ", cbrt(8._sp,k=0)
  write(*,fmtc) "cbrt(8._sp,k=1) = ", cbrt(8._sp,k=1)
  write(*,fmtc) "cbrt(8._sp,k=2) = ", cbrt(8._sp,k=2)

  write(*,'(/,A)')  "complex to complex"

  z = cmplx(-8._sp)
  write(*,fmtc) "z = ", z
  write(*,fmtc) "cbrt(z) = ", cbrt(z)
  write(*,fmtc) "cbrt(z,k=0) = ",cbrt(z,k=0)
  write(*,fmtc) "cbrt(z,k=1) = ",cbrt(z,k=1)
  write(*,fmtc) "cbrt(z,k=2) = ",cbrt(z,k=2)
  write(*,fmtc) "cbrt(z,k=3) = ",cbrt(z,k=3)

produces the following output:

real to real
cbrt( 8._sp) =  2.0000
cbrt(-8._sp) = -2.0000

real to complex
cbrt(8._sp,k=0) =  2.0000+0.0000j
cbrt(8._sp,k=1) = -1.0000+1.7321j
cbrt(8._sp,k=2) = -1.0000-1.7321j

complex to complex
z = -8.0000+0.0000j
cbrt(z) =  1.0000+1.7321j
cbrt(z,k=0) =  1.0000+1.7321j
cbrt(z,k=1) = -2.0000-0.0000j
cbrt(z,k=2) =  1.0000-1.7321j
cbrt(z,k=3) =  1.0000+1.7321j

Answer 14 · 2020-06-28T12:44:31.000Z

There is acbrt() function in Java, described here:
https://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Math.html#cbrt(double)

Returns the cube root of a double value. For positive finite x, cbrt(-x) == -cbrt(x); that is, the cube root of a negative value is the negative of the cube root of that value's magnitude. Special cases:
If the argument is NaN, then the result is NaN.
If the argument is infinite, then the result is an infinity with the same sign as the argument.
If the argument is zero, then the result is a zero with the same sign as the argument. 
The computed result must be within 1 ulp of the exact result.

Answer 15 · 2020-06-28T12:53:16.000Z

Thanks @vmagnin, we should also have tests for the special cases.

Answer 16 · 2020-06-28T19:18:19.000Z

@ivan-pi
A simple implementation could be something like:

module functions
    use iso_fortran_env, only: dp=>real64
    implicit none

    contains

    pure real(dp) function cbrt(x)
        real(dp), intent(in) :: x
        cbrt = sign(abs(x)**(1.0_dp / 3.0_dp), x)
    end function
end module functions

program main
    use functions
    real(dp) :: x

    x = 27.0_dp
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    x = -27.0_dp
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    x = -0.0_dp
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)
    x = 0.0_dp
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    ! Infinity:
    x = 1.0_dp/x
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    ! NaN:
    x = sqrt(-x)
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    ! -Infinity:
    x = 0.0_dp
    x = -1.0_dp/x
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)
end program main

which yields the following results in agreement with the Java specifications:

$ gfortran essai_cbrt.f90 && ./a.out
   27.000000000000000        3.0000000000000000        27.000000000000000     
  -27.000000000000000       -3.0000000000000000       -27.000000000000000     
  -0.0000000000000000       -0.0000000000000000       -0.0000000000000000     
   0.0000000000000000        0.0000000000000000        0.0000000000000000     
                  Infinity                  Infinity                  Infinity
                       NaN                       NaN                       NaN
                 -Infinity                 -Infinity                 -Infinity

But I don't know if it would be the fastest implementation. We call three functions in each case: SIGN(), ABS(), **
Treating each case, from the most probables to the least, with an if...else if... structure would perhaps be faster.

Answer 17 · 2020-06-28T21:06:46.000Z

That looks good. My naive implementation would be:

  pure real function cbrt(x)
    real, intent(in) :: x
    if (x >= 0.) then
      cbrt = x**(1./3)
    else
      cbrt = -((-x)**(1./3))
    end if
  end function

Unfortunately, for the value zero it does not preserve the sign:

   27.0000000       3.00000000       27.0000000    
  -27.0000000      -3.00000000      -27.0000000    
  -0.00000000       0.00000000       0.00000000    
   0.00000000       0.00000000       0.00000000    
         Infinity         Infinity         Infinity
              NaN              NaN              NaN
        -Infinity        -Infinity        -Infinity

Concerning speed, I've prepared a small benchmark and the difference is not that large. I've used Fypp to create some simple benchmarking macros:

#:def NTIC(n=1000)
  #:global BENCHMARK_NREPS
  #:set BENCHMARK_NREPS = n
  block
    use, intrinsic :: iso_fortran_env, only: int64, dp => real64
    integer(int64) :: benchmark_tic, benchmark_toc, benchmark_count_rate
    integer(int64) :: benchmark_i
    real(dp) :: benchmark_elapsed
    call system_clock(benchmark_tic,benchmark_count_rate)
    do benchmark_i = 1, ${BENCHMARK_NREPS}$
#:enddef

#:def NTOC(*args)
    #:global BENCHMARK_NREPS
    end do
    call system_clock(benchmark_toc)
    benchmark_elapsed = real(benchmark_toc - benchmark_tic)/real(benchmark_count_rate)
    benchmark_elapsed = benchmark_elapsed/${BENCHMARK_NREPS}$
  #:if len(args) > 0
    ${args[0]}$ = benchmark_elapsed
  #:else
    write(*,*) "Average time is ",benchmark_elapsed," seconds."
  #:endif
  end block
  #:del BENCHMARK_NREPS
#:enddef

module cbrt_mod

  implicit none
  public

contains

  elemental real function cbrt1(x)
    real, intent(in) :: x
    if (x >= 0.) then
      cbrt1 = x**(1./3)
    else
      cbrt1 = -((-x)**(1./3))
    end if
  end function

  elemental real function cbrt2(x)
    real, intent(in) :: x
    cbrt2 = sign(abs(x)**(1.0 / 3.0), x)
  end function

end module

program main

  use cbrt_mod
  implicit none
  integer, parameter :: n = 1000000
  real :: x(n), y(n), z(n)

  call random_number(x)

  @:NTIC(1000)
  y = cbrt1(x)
  @:NTOC()

  @:NTIC(1000)
  z = cbrt2(x)
  @:NTOC()

  ! We need to print something, otherwise the compiler
  ! seems to skip the calculation completely...
  print *, maxval(abs(y-z)), sum(abs(y-z))

end program

Output:

$ fypp cbrt_benchmark.fypp > cbrt_benchmark.f90
$ gfortran -Wall -O3 ./cbrt_benchmark.f90 -o cbrt_benchmark
$ ./cbrt_benchmark 
 Average time is    1.4338764190673828E-002  seconds.
 Average time is    1.6593759536743163E-002  seconds.
   0.00000000       0.00000000

Answer 18 · 2020-06-28T21:14:24.000Z

With the Intel Fortran compiler there is practically no difference:

$ fypp cbrt_benchmark.fypp > cbrt_benchmark.f90
$ ifort -O3 ./cbrt_benchmark.f90 -o cbrt_benchmark
$ ./cbrt_benchmark 
 Average time is   3.396259069442749E-003  seconds.
 Average time is   3.438067913055420E-003  seconds.
  0.0000000E+00  0.0000000E+00

Edit: I realized I was only sampling positive values... If I add an extra line with x = 54*x - 27 after the call to random_number to make the x values span the range [-27,27), I get slightly different timings:

$ fypp cbrt_benchmark.fypp > cbrt_benchmark.f90
$ gfortran -O3 ./cbrt_benchmark.f90 -o cbrt_benchmark
$ time ./cbrt_benchmark 
 Average time is    1.7529647827148439E-002  seconds.
 Average time is    1.6589702606201170E-002  seconds.
   0.00000000       0.00000000    

real	0m34,137s
user	0m34,131s
sys	0m0,004s
$ ifort -O3 ./cbrt_benchmark.f90 -o cbrt_benchmark
$ time ./cbrt_benchmark 
 Average time is   1.533928298950195E-002  seconds.
 Average time is   3.471867084503174E-003  seconds.
  0.0000000E+00  0.0000000E+00

real	0m18,838s
user	0m18,821s
sys	0m0,008s

Your sign/abs/** version looks like the ~~clear~~ winner now. :)

Answer 19 · 2020-06-29T07:11:09.000Z

Very counter-intuitive... We don't know what do exactly the compilers. With -O3 there is probably inlining in cbrt2().

Considering only ifort, my cbrt2() does not change with negative values. Normal.

But why your cbrt1() is x5 longer !? There is a jump to the negative case, and two sign changes, but 5x times longer seems unreasonable... The gfortran behavior seems therefore OK, but ifort ???

It is also amazing that in most cases ifort gives a 5x faster code than gfortran for such simple calculations. Does ifort forces some kind of parallelism inside the processor ? (SSE vectorisation ?)

Perhaps it could be interesting to add -mtune=native -march=native to the gfortran command.

Answer 20 · 2020-06-29T08:54:49.000Z

Precision loss with very big and small values:

    x = 1d300
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

    x = 1d-300
    print *, x, cbrt(x), cbrt(x)*cbrt(x)*cbrt(x)

   1.0000000000000001E+300   9.9999999999998719E+099   9.9999999999996154E+299
   1.0000000000000000E-300   1.0000000000000128E-100   1.0000000000000385E-300

Answer 21 · 2020-06-29T09:02:44.000Z

Which version of ifort are you running @ivan-pi?

Playing around on godbolt.org (https://godbolt.org/z/ZLaUcW) shows ifort vectorizing over 4 elements by calling a cbrtf4 function (presumably from the Intel library) for both implementations. For some reason the branching implementation appears to call cbrtf4 twice?

gfortran (https://godbolt.org/z/PYTBfE) calls powf and compiling with -fopt-info-vec-missed shows that neither implementation is vectorized.

Apart from any clever compiler optimisations, I would expect the sign(abs(x)) implementation to be faster since it does not require a conditional. Branch prediction isn't possible for this random test case, so there will be many branch mispredictions which each stall the CPU pipeline. This may be hurting the vectorized version more in ifort than the unvectorized version in gfortran maybe?

Answer 22 · 2020-06-29T10:13:08.000Z

@LKedward
thank you, very interesting!
Yes, if the sign is always changing ifort can't use vectors with the branching version...
It's amazing to see what such a simple example can reveal !

https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/accessibility/simdmath/ppu_spu/cbrtf4.html

The cbrtf4 function computes the real cube root of each element in the input vectors.

https://www.ibm.com/support/knowledgecenter/en/SSGH4D_15.1.2/com.ibm.xlf151.aix.doc/proguide/mass_simd.html

Answer 23 · 2020-06-29T11:36:13.000Z

It reveals also the limits of benchmarking: one implementation could be better with some compilers, some CPU but not with other (Intel, AMD, ARM...). And worse, one implementation could be better with deterministic algorithm, and another implementation with Monte Carlo algorithms... Those branch prediction mechanisms not only introduce security problems but also make benchmarking very delicate....

Answer 24 · 2020-06-29T11:54:53.000Z

Nice findings!

Indeed, apart from different processors and compiler settings there are also more subtle issues with benchmarking related to noise and measurement statistics and how the process interacts with the operating system. The README of the BenchmarkingTools.jl Julia package contains some information.

As @certik has said in a few earlier issues, but I've come to understand now, it is important we agree on an intuitive API and provide a reference implementation with the correct behavior. Optimized implementations for different platforms will hopefully come in later as more users or even hardware vendors get involved.

Answer 25 · 2020-06-29T13:24:09.000Z

Yes, if the sign is always changing ifort can't use vectors with the branching version

Interestingly, it looks like the ifort version is actually able to vectorise the branching version, using a mask (cbrtf4_mask); but this means calling cbrtf4 twice with branching logic which together probably causes the slow down.

It reveals also the limits of benchmarking: one implementation could be better with some compilers

Yep, this is a good point.

As @certik has said in a few earlier issues, but I've come to understand now, it is important we agree on an intuitive API and provide a reference implementation with the correct behavior. Optimized implementations for different platforms will hopefully come in later as more users or even hardware vendors get involved.

I agree, optimization isn't the focus for stdlib currently, but as @vmagnin has pointed out different numerical implementations may also have different edge-case behaviours such as loss of precision which are worth being aware of.

Answer 26 · 2020-06-29T13:52:39.000Z

different numerical implementations may also have different edge-case behaviours such as loss of precision which are worth being aware of.

It implies that if a "naive" implementation is used at first, its limits should be clearly stated in the source code and documentation.
Precision problems caused the Patriot missile bug: http://www-users.math.umn.edu/~arnold/disasters/patriot.html

Answer 27 · 2020-07-10T06:16:20.000Z

As for implementation, it seems good enough to pass abs(x) to the C math.h cbrt/cbrtf/cbrtl function, and then multiply by the appropriate sign or phase factor, using ieee_copy_sign if we really want to be sure that infinities and signed zeros are handled correctly (modulo whatever floating-point sins the compiler commits in the name of optimization).

Answer 28 · 2020-07-10T08:50:07.000Z

As far as I can understand, the C version already works correctly for negative numbers.
(As a side note: I've tried porting the C version to Fortran: https://gist.github.com/ivan-pi/5cf86ba198bc497331fba3d3a1a07c59 with promising results. There are however issues with portability due to the endianness.)

This leaves us to figure out our own version for complex roots.

Answer 29 · 2020-07-10T10:27:49.000Z

@ivan Jose Pulido Sanchez <ijpulidos@unal.edu.co> I tried your code with this tes codet: do i = 1,100 call random_number( x ) x = x * 10.0_dp**i x3 = cbrt(x) write(*,*) i, x, abs(x - x3**3), abs(x - x3**3)/x enddo and the relative error was either zero or at most 3.1e-16 over the whole range. I guess that this shows that the function is sufficiently accurate. (Note: this restores the original value instead of comparing two different ways of calculating the cube root) Regards, Arjen Op vr 10 jul. 2020 om 10:50 schreef Ivan <notifications@github.com>:

…

As far as I can understand, the C version already works correctly for negative numbers. (As a side note: I've tried porting the C version to Fortran: https://gist.github.com/ivan-pi/5cf86ba198bc497331fba3d3a1a07c59 with promising results.) This leaves us to figure out our own version for complex roots. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#214 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAN6YR55FQ2XQ3QCUZTYO5TR23I45ANCNFSM4OF3DRVQ> .

Answer 30 · 2020-07-10T11:25:45.000Z

Thanks @arjenmarkus for the test! I reran it with the original C libm version by the side:

  interface
    pure function c_cbrt(x) bind(c,name="cbrt")
      use iso_c_binding, only: c_double
      real(c_double), value :: x
      real(c_double) :: c_cbrt
    end function
  end interface

  do i = 1, 100
    call random_number(x)
    x = x*10._dp**i
    x3 = cbrt(x)
    cx3 = c_cbrt(x)
    write(*,*) i, x, abs(x - x3**3)/x, abs(x - cx3**3)/x
  end do

I get some small differences in the last places:

          76   4.0638919578662523E+075   1.9770924779982508E-016   1.9770924779982508E-016
          77   7.6742391032182145E+076   1.6751503544737001E-016   3.3503007089474002E-016
          78   7.0211150593663916E+076   0.0000000000000000        0.0000000000000000
          79   5.1811233562026912E+078   1.5879804862696962E-016   1.5879804862696962E-016
          80   3.6709083775467760E+079   1.7930216590378397E-016   1.7930216590378397E-016
          81   1.2689247125165195E+080   0.0000000000000000        4.1496666677608811E-016
          82   1.9609377628799389E+081   4.2964052674018527E-016   4.2964052674018527E-016
          83   6.0802906214348435E+082   0.0000000000000000        0.0000000000000000
          84   5.4375757115883662E+083   1.9832328300050751E-016   3.9664656600101501E-016
          85   6.0508530767420035E+084   2.8515592178725947E-016   1.4257796089362973E-016
          86   5.3970844933448210E+085   1.2787916059682132E-016   1.2787916059682132E-016
          87   3.7468781425583570E+086   1.4735993185149220E-016   1.4735993185149220E-016
          88   9.1656756575841826E+087   1.9276779266309714E-016   1.9276779266309714E-016
          89   6.2174751629903578E+088   2.2733949308498420E-016   2.2733949308498420E-016
          90   8.4055194736970552E+089   0.0000000000000000        0.0000000000000000
          91   8.1811243675954609E+090   2.2114947934287768E-016   2.2114947934287768E-016
          92   2.1849450318528865E+091   1.6561070122654541E-016   3.3122140245309081E-016
          93   5.0864521082672201E+092   1.1382402387030692E-016   4.5529609548122767E-016
          94   6.6274040704877999E+093   0.0000000000000000        2.7954737753913594E-016
          95   2.1064887821671668E+094   1.7590157075425002E-016   1.7590157075425002E-016
          96   1.6334791213177604E+094   2.2683772368054151E-016   2.2683772368054151E-016
          97   5.3161169157241797E+096   1.7843264361368490E-016   1.7843264361368490E-016
          98   6.4011877353899199E+097   1.1854909860403442E-016   3.5564729581210328E-016
          99   4.0327657707019941E+098   1.5053788475170072E-016   4.5161365425510220E-016
         100   4.7072761867901112E+099   0.0000000000000000        4.1269490362120304E-016

If I output as hexadecimals, I can indeed see some differences do remain, meaning my port is not a perfect match to the C one available on my platform.

Answer 31 · 2020-07-10T11:43:04.000Z

I pretty much doubt you can get closer: the Fortran and C compilers are likely to use slightly different ordering of the machine instructions. I tried with gfortran and Intel Fortran and they also gave slightly different results, but always in the order of the last few bytes. BTW, gfortran complained about overflow in some of the constants, so I had to use -fno-range-check to compile the code. Not a showstopper, but still it might be a complication. Regards, Arjen Op vr 10 jul. 2020 om 13:26 schreef Ivan <notifications@github.com>:

…

Thanks @arjenmarkus <https://github.com/arjenmarkus> for the test! I reran it with the original C libm version by the side: interface pure function c_cbrt(x) bind(c,name="cbrt") use iso_c_binding, only: c_double real(c_double), value :: x real(c_double) :: c_cbrt end function end interface do i = 1, 100 call random_number(x) x = x*10._dp**i x3 = cbrt(x) cx3 = c_cbrt(x) write(*,*) i, x, abs(x - x3**3)/x, abs(x - cx3**3)/x end do I get some small differences in the last places: 76 4.0638919578662523E+075 1.9770924779982508E-016 1.9770924779982508E-016 77 7.6742391032182145E+076 1.6751503544737001E-016 3.3503007089474002E-016 78 7.0211150593663916E+076 0.0000000000000000 0.0000000000000000 79 5.1811233562026912E+078 1.5879804862696962E-016 1.5879804862696962E-016 80 3.6709083775467760E+079 1.7930216590378397E-016 1.7930216590378397E-016 81 1.2689247125165195E+080 0.0000000000000000 4.1496666677608811E-016 82 1.9609377628799389E+081 4.2964052674018527E-016 4.2964052674018527E-016 83 6.0802906214348435E+082 0.0000000000000000 0.0000000000000000 84 5.4375757115883662E+083 1.9832328300050751E-016 3.9664656600101501E-016 85 6.0508530767420035E+084 2.8515592178725947E-016 1.4257796089362973E-016 86 5.3970844933448210E+085 1.2787916059682132E-016 1.2787916059682132E-016 87 3.7468781425583570E+086 1.4735993185149220E-016 1.4735993185149220E-016 88 9.1656756575841826E+087 1.9276779266309714E-016 1.9276779266309714E-016 89 6.2174751629903578E+088 2.2733949308498420E-016 2.2733949308498420E-016 90 8.4055194736970552E+089 0.0000000000000000 0.0000000000000000 91 8.1811243675954609E+090 2.2114947934287768E-016 2.2114947934287768E-016 92 2.1849450318528865E+091 1.6561070122654541E-016 3.3122140245309081E-016 93 5.0864521082672201E+092 1.1382402387030692E-016 4.5529609548122767E-016 94 6.6274040704877999E+093 0.0000000000000000 2.7954737753913594E-016 95 2.1064887821671668E+094 1.7590157075425002E-016 1.7590157075425002E-016 96 1.6334791213177604E+094 2.2683772368054151E-016 2.2683772368054151E-016 97 5.3161169157241797E+096 1.7843264361368490E-016 1.7843264361368490E-016 98 6.4011877353899199E+097 1.1854909860403442E-016 3.5564729581210328E-016 99 4.0327657707019941E+098 1.5053788475170072E-016 4.5161365425510220E-016 100 4.7072761867901112E+099 0.0000000000000000 4.1269490362120304E-016 If I output as hexadecimals, I can indeed see some differences do remain, meaning my port is not a perfect match to the C one available on my platform. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#214 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAN6YR2TSBWQTZVITOB7T53R233EPANCNFSM4OF3DRVQ> .

Answer 32 · 2020-07-10T12:04:21.000Z

BTW, gfortran complained about overflow in some of the constants, so I had to use -fno-range-check to compile the code. Not a showstopper, but still it might be a complication.

I have learned from @kargl that the -fno-range-check flag is not needed with gfortran 10.1.

Answer 33 · 2022-01-19T23:02:57.000Z

Just found this interesting implementation of a cube root from Takuya Ooura.

The usage license is the following:

copyright
    Copyright(C) 1996 Takuya OOURA (email: ooura@mmm.t.u-tokyo.ac.jp).
    You may use, copy, modify this code for any purpose and 
    without fee. You may distribute this ORIGINAL package.

The code in dcbrt.f:

! cubic root function in double precision
!
      function dcbrt(x)
      implicit real*8 (a - h, o - z)
      dimension c(0 : 23)
      parameter (
     &    p2pow16 = 65536.0d0, 
     &    p2pow48 = 281474976710656.0d0)
      parameter (
     &    p2powm16 = 1 / p2pow16, 
     &    p2powm48 = 1 / p2pow48)
      data c / 
     &    1.5319394088521d-3, -1.8843445653409d-2, 
     &    1.0170534986000d-1, -3.1702448761286d-1, 
     &    6.3520892642253d-1, -8.8106985991189d-1, 
     &    1.0517503764540d0, 4.2674123235580d-1, 
     &    1.5079083659190d-5, -3.7095709111375d-4, 
     &    4.0043972242353d-3, -2.4964114079723d-2, 
     &    1.0003913718511d-1, -2.7751961573273d-1, 
     &    6.6256121926465d-1, 5.3766026150315d-1, 
     &    1.4842542902609d-7, -7.3027601203435d-6, 
     &    1.5766326109233d-4, -1.9658008013138d-3, 
     &    1.5755176844105d-2, -8.7413201405100d-2, 
     &    4.1738741349777d-1, 6.7740948115980d-1 / 
      if (x .eq. 0) then
          dcbrt = 0
          return
      end if
      if (x .gt. 0) then
          w = x
          y = 0.5d0
      else
          w = -x
          y = -0.5d0
      end if
      if (w .gt. 8) then
          do while (w .gt. p2pow48)
              w = w * p2powm48
              y = y * p2pow16
          end do
          do while (w .gt. 8)
              w = w * 0.125d0
              y = y * 2
          end do
      else if (w .lt. 1) then
          do while (w .lt. p2powm48)
              w = w * p2pow48
              y = y * p2powm16
          end do
          do while (w .lt. 1)
              w = w * 8
              y = y * 0.5d0
          end do
      end if
      if (w .lt. 2) then
          k = 0
      else if (w .lt. 4) then
          k = 8
      else
          k = 16
      end if
      u = ((((((c(k) * w + c(k + 1)) * w + 
     &    c(k + 2)) * w + c(k + 3)) * w + 
     &    c(k + 4)) * w + c(k + 5)) * w + 
     &    c(k + 6)) * w + c(k + 7)
      dcbrt = y * (u + 3 * u * w / (w + 2 * u * u * u))
      end
!

I haven't tested the accuracy, speed, or behavior for special values. If someone can decipher the algorithm I'd be interested to read the explanation.

Answer 34 · 2022-01-28T03:44:36.000Z

If someone can decipher the algorithm I'd be interested to read the explanation.

Sure: it seems it's a rational function approximation (the last two lines), the c is an array of coefficients. Before that it seems it is adjusting this approximation based on which interval you are on ("argument reduction").