fortran-lang/stdlib

Routines to convert real/complex values to character values

The general problem to solve here is the conversion from intrinsic data types to character values, which in Fortran is only possible through internal IO:

write(string, '(g0)') val

This is error-prone because the character variable must have a fixed length that is sufficient for the output, and the approach cannot be used in a functional way (for example, inside an expression). Such functionality for stdlib was briefly discussed in #69. This issue should work out the details for such routines.
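
To illustrate the functional usage that a bare internal write does not offer, here is a minimal sketch that hides the fixed-size buffer inside a function. The names `to_chars_sketch` and `to_chars` are placeholders rather than an agreed stdlib API, and the buffer length of 128 is an assumption about the widest `(g0)` output.

```fortran
module to_chars_sketch
  implicit none
  private
  public :: to_chars
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Hypothetical helper: convert a real value to a deferred-length string.
  pure function to_chars(val) result(string)
    real(dp), intent(in) :: val
    character(len=:), allocatable :: string
    ! Assumption: 128 characters are enough for any '(g0)' output of a real(dp).
    character(len=128) :: buffer
    write(buffer, '(g0)') val      ! still internal IO, but hidden from the caller
    string = trim(buffer)          ! return only the characters actually used
  end function to_chars
end module to_chars_sketch

program demo_to_chars
  use to_chars_sketch, only: to_chars
  implicit none
  ! The result can be used directly inside an expression.
  print '(a)', 'x = ' // to_chars(0.5d0) // ';'
end program demo_to_chars
```

The exact digits produced by `(g0)` are processor-dependent, so no particular output is assumed here.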

Converting integers and logicals to character values without internal IO is quite straightforward (see #336); reliably converting real and complex values to character sequences is somewhat less so.

To get started, such routines for the real and complex case could be implemented with internal IO until we find a robust way to handle all exceptional values (NaN, Inf, -Inf) and to correctly write the decimal places and exponent for a given real or complex value without resorting to internal IO or excessive use of logarithms and exponentials.
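
As a rough sketch of that interim approach, the exceptional values can be detected with the intrinsic `ieee_arithmetic` module before falling back to internal IO for finite values. The function name `real_to_chars` and the spellings 'NaN', 'Inf' and '-Inf' are assumptions for illustration, and the sketch presumes the processor supports IEEE arithmetic for `real(dp)`.

```fortran
module special_values_sketch
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_is_finite
  implicit none
  private
  public :: real_to_chars
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Hypothetical helper: handle NaN/Inf explicitly, use internal IO otherwise.
  pure function real_to_chars(val) result(string)
    real(dp), intent(in) :: val
    character(len=:), allocatable :: string
    character(len=128) :: buffer   ! assumed long enough for '(g0)'
    if (ieee_is_nan(val)) then
      string = 'NaN'
    else if (.not. ieee_is_finite(val)) then
      if (val > 0.0_dp) then
        string = 'Inf'
      else
        string = '-Inf'
      end if
    else
      write(buffer, '(g0)') val    ! interim fallback: internal IO
      string = trim(buffer)
    end if
  end function real_to_chars
end module special_values_sketch

program demo_special_values
  use, intrinsic :: ieee_arithmetic, only: ieee_value, ieee_quiet_nan, &
      & ieee_positive_inf, ieee_negative_inf
  use special_values_sketch, only: real_to_chars
  implicit none
  print '(a)', real_to_chars(ieee_value(1.0d0, ieee_quiet_nan))     ! NaN
  print '(a)', real_to_chars(ieee_value(1.0d0, ieee_positive_inf))  ! Inf
  print '(a)', real_to_chars(ieee_value(1.0d0, ieee_negative_inf))  ! -Inf
  print '(a)', real_to_chars(1.5d0)   ! finite value, digits are processor-dependent
end program demo_special_values
```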

A few points for discussion:

  • Should the characters be output in the default formatting '(G0)'? This would play the same role as the C++ function std::to_string

  • Should the real/complex string formatting settings be adjustable at a global level using some subroutines?

  • Should conversion of complex numbers to characters be formatted like in scripting languages? (Even if not, I suggest we open a separate issue for a function returning a format specifier for complex numbers.)

    • Python
    >>> a = 2 + 3j
    >>> a
    (2+3j)
    >>> str(a)
    '(2+3j)'
    
    • MATLAB:
    >> a = 2 + 3i
    
    a =
    
       2.0000 + 3.0000i
    
    >> num2str(a)
    
    ans =
    
        '2+3i'
    
    • Julia:
    a = 1 + 2im
    println(string(a))
    
    $julia main.jl
    1 + 2im
    
    • Perl
    use Math::Complex; 
    $a = Math::Complex->new(3,5); 
    $b = Math::Complex->new(2,-2); 
    $c = $a * $b; 
    print "c = $c\n"; 
    
    c = 16+4i 
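
For comparison, a scripting-language style such as `a+bi` could be sketched in Fortran with the `sp` edit descriptor, which forces an explicit sign on the imaginary part. The function name `complex_to_chars` is hypothetical, and internal IO is still used here purely for illustration.

```fortran
module complex_chars_sketch
  implicit none
  private
  public :: complex_to_chars
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Hypothetical helper: format a complex value as "<re><sign><im>i".
  pure function complex_to_chars(z) result(string)
    complex(dp), intent(in) :: z
    character(len=:), allocatable :: string
    character(len=128) :: buffer   ! assumed long enough for two '(g0)' fields
    ! 'sp' switches on the explicit plus sign for the following numeric field.
    write(buffer, '(g0, sp, g0, "i")') real(z), aimag(z)
    string = trim(buffer)
  end function complex_to_chars
end module complex_chars_sketch

program demo_complex_chars
  use complex_chars_sketch, only: complex_to_chars
  implicit none
  print '(a)', complex_to_chars((2.0d0, 3.0d0))
  print '(a)', complex_to_chars((2.0d0, -3.0d0))
end program demo_complex_chars
```

The number of digits printed for each part depends on how the processor implements `(g0)`, so the result will generally be longer than the compact `2+3i` shown by MATLAB.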
    

IMO, (g0) should be the default but there should be an optional argument that allows user-defined formatting.

Would it be an optional argument, to_chars(x [, fmt]), or do we really want two functions, say to_chars (default characters) / to_fchars (formatted characters)?

As an example in fortran-utils @certik had the following:

! Note: dp, str_real_len and str_int are helpers that are not shown in this excerpt.
pure function str_real(r) result(s)
    ! Converts the real number "r" to a string with 6 digits after the decimal point.
    real(dp), intent(in) :: r
    character(len=*), parameter :: fmt="(f0.6)"
    ! The result length is computed from the value and the format
    character(len=str_real_len(r, fmt)) :: s
    write(s, fmt) r
end function

pure function str_real_n(r, n) result(s)
    ! Converts the real number "r" to a string with 'n' digits after the decimal point.
    real(dp), intent(in) :: r
    integer, intent(in) :: n
    ! The format string "(f0.<n>)" is assembled at run time from n
    character(len=str_real_len(r, "(f0." // str_int(n) // ")")) :: s
    write(s, "(f0." // str_int(n) // ")") r
end function

Should the user formatting re-use the Fortran formatting conventions or do we want to adopt something like the Format Specification Mini-Language in Python?

Personally I would be in favor of having a function called format (like the Python .format() or the C++20 std::format) for formatted string conversion, but I'm not sure we can really get there in standard Fortran. We would probably need to limit the number of arguments and use the class(*) approach like in M_msg.
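
To make the class(*) idea concrete, here is a minimal sketch with a fixed upper limit of two arguments. It is not the M_msg implementation; the names `str` and `item`, the supported types, and the single-blank separator are all assumptions chosen for illustration.

```fortran
module format_sketch
  implicit none
  private
  public :: str
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Hypothetical: join up to two values of arbitrary intrinsic type into one string.
  function str(a, b) result(line)
    class(*), intent(in) :: a
    class(*), intent(in), optional :: b
    character(len=:), allocatable :: line
    line = item(a)
    if (present(b)) line = line // ' ' // item(b)
  end function str

  !> Convert a single unlimited polymorphic value by dispatching on its type.
  function item(x) result(s)
    class(*), intent(in) :: x
    character(len=:), allocatable :: s
    character(len=128) :: buffer   ! assumed long enough for one numeric field
    select type (x)
    type is (integer)
      write(buffer, '(i0)') x
      s = trim(buffer)
    type is (real(dp))
      write(buffer, '(g0)') x
      s = trim(buffer)
    type is (logical)
      write(buffer, '(l1)') x
      s = trim(buffer)
    type is (character(len=*))
      s = x
    class default
      s = '?'                      ! unsupported type in this sketch
    end select
  end function item
end module format_sketch

program demo_format
  use format_sketch, only: str
  implicit none
  print '(a)', str('answer:', 42)   ! prints "answer: 42"
  print '(a)', str('flag:', .true.) ! prints "flag: T"
end program demo_format
```

Each additional argument needs another optional dummy, which is why the number of arguments has to be capped with this approach.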

A stdlib_format module would be nice to implement a more widely adopted formatting convention, independently of this issue.

Formatting reals is more complicated than just the decimal places. First, there are plenty of formatters for reals available:

print'(f10.4)', 42.27e-3
print'(d10.4)', 42.27e-3
print'(g10.4e1)', 42.27e-3
print'(e10.4e1)', 42.27e-3
print'(es10.4e1)', 42.27e-3
end

Output:

    0.0423
0.4227D-01
 0.4227E-1
 0.4227E-1
 4.2270E-2

I usually prefer to use es for printing out reals. So in case we allow formatters, we should require passing the complete format string.

For the default conversion, I would strongly prefer that the round trip real -> character -> real is as lossless as possible.
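
A small sketch of how such a round trip could be checked, assuming `real(dp)` is IEEE binary64, for which 17 significant digits are enough to recover the value when the runtime rounds correctly. The format `(es24.16e3)` is an assumption chosen to give 17 significant digits, not a proposed default.

```fortran
program round_trip_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! 1 digit before and 16 after the decimal point: 17 significant digits.
  character(len=*), parameter :: fmt = '(es24.16e3)'
  character(len=24) :: buffer
  real(dp) :: x, y

  x = 0.1_dp + 0.2_dp          ! a value that is not exactly representable
  write(buffer, fmt) x         ! real -> character
  read(buffer, *) y            ! character -> real (list-directed read)

  if (x == y) then
    print '(a)', 'round trip is lossless for ' // trim(adjustl(buffer))
  else
    print '(a)', 'round trip lost information for ' // trim(adjustl(buffer))
  end if
end program round_trip_sketch
```

With fewer digits, for example the 7 significant digits many defaults print, the comparison would generally fail for a value like this.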

Would it be an optional argument, to_chars(x [, fmt]), or do we really want two functions, say to_chars (default characters) / to_fchars (formatted characters)?

I don't have a preference as long as there is a unique name (e.g. to_chars, through an interface block) to call the two functions (e.g. to_dchars and to_fchars).
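
The single-name option could look like the following sketch, in which a generic interface `to_chars` resolves to either a default or a formatted variant. The names `to_dchars` and `to_fchars` are taken from the discussion above; everything else (module name, buffer size, internal IO as the backend) is an assumption.

```fortran
module to_chars_generic_sketch
  implicit none
  private
  public :: to_chars, dp
  integer, parameter :: dp = kind(1.0d0)

  !> One generic name for both the default and the formatted conversion.
  interface to_chars
    module procedure to_dchars
    module procedure to_fchars
  end interface to_chars
contains
  !> Default conversion using '(g0)'.
  pure function to_dchars(x) result(string)
    real(dp), intent(in) :: x
    character(len=:), allocatable :: string
    string = to_fchars(x, '(g0)')
  end function to_dchars

  !> Conversion with a complete, user-supplied format string.
  pure function to_fchars(x, fmt) result(string)
    real(dp), intent(in) :: x
    character(len=*), intent(in) :: fmt
    character(len=:), allocatable :: string
    character(len=128) :: buffer   ! assumed long enough for the requested format
    write(buffer, fmt) x
    string = trim(buffer)
  end function to_fchars
end module to_chars_generic_sketch

program demo_to_chars_generic
  use to_chars_generic_sketch, only: to_chars, dp
  implicit none
  print '(a)', to_chars(42.27e-3_dp)                 ! default formatting
  print '(a)', to_chars(42.27e-3_dp, '(es10.4e1)')   ! user-defined formatting
end program demo_to_chars_generic
```

The two specific procedures are distinguishable because only `to_fchars` has a non-optional character argument; an optional `fmt` on a single function would give the same call syntax with one procedure fewer.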

We might be able to port something like flang's self-contained format-double.c to Fortran to get accurate writing of floating point numbers in stdlib.

https://urbanjost.github.io/M_msg/man3.html
has two routines that might be of interest: fmt and str. There are other conversion routines in M_strings.f90, and M_overload.f90 at the same site overloads INT(), REAL(), and so on so that they take string inputs and convert them to numeric values. It also overloads operators, so that things like 'the value is'//10 automatically produce a string and '30'+1 produces 31, as examples of a few approaches.

Hello, I improved num2str in the forlab package. It can convert integer, real, and complex values into strings with good results. I hope that helps.
This should belong in the stdlib_string.fypp routines.
In addition, I particularly look forward to stdlib_io.fypp adopting the disp routines from forlab, which are easy to use. Their output is as follows:
Test program is here.

 forlab_num2str.fypp :
 ---------------------- 
 num2str(c):
 (1.00000000,1.00000000)
 (1.00,1.00)
 num2str(i):
 2
 2
 num2str(r):
 1.00000000
 1.00
 forlab_disp.fypp :
 ----------------------
 disp(ones(c)):
          (1.000,1.000)           (1.000,1.000)
          (1.000,1.000)           (1.000,1.000)
 disp(ones(i)):
          1           1
          1           1
 disp(ones(r)):
  1.000       1.000
  1.000       1.000
 disp(l)
          T           T           T
          T           T           T
 ----------------------
 disp(ones(c)):
     (0.1000E-03,0.000)           (1.000,1.000)
          (1.000,1.000)           (1.000,1.000)
 disp(ones(i)):
          1           1
          1           1
 disp(ones(r)):
 0.1000E-03   0.000
  1.000       1.000

Thanks @zoziha. In this issue I was looking for a way to convert real and complex numbers to characters without internal IO. It looks like num2str uses internal IO and is therefore more relevant for #435.