emil-e/rapidcheck

string and char questions

miguel-negrao opened this issue · 1 comments

I have some questions about strings and chars.

gen::character sometimes returns the null character \0. Is this expected? [edit] Ignore this one: gen::character does not, but a plain char generator does.

Most of the characters in the strings generated by gen::string are not printable. Is this expected?

�򁓰ӈi#5¢o�kO�)���'���9�'�<E�5��8��'�
string: '>�<0� �M*����oo��l�|M�c'
�&���Z�'('d���0�`�����'
string: 'w]��3��k��(0�/%����ߧ#'߷�=����;���9�27�'
R�����Ŭ4L�����'��r�
string: '�'Ž���m��6(�ӂ���V��uP̦'$0���7� `e����!3a��0�������'
string: '�F*�$
              ��yӍnF��� ��v�|P��S$Y�:O�"U�,�|F����s�4+� 1=��Z����Bz��7����'
string: '�����9�,��@\��&/p20=���1�
Z [��&����+5�5����f�"���(�M�&�+�����'
string: '����*
              :g\���W�7�$���?ݹ?1�(T���0ܻ-ǨI��'
string: '��_��"�q
��#@�[2��|���

Is the generator producing only valid ASCII chars, or any char value at all (all 256 possible values)?

thanks.

I believe this behavior is by design; see here.

Generating strings of specific character classes is easy, however:

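#include <algorithm>    // std::ranges::all_of
#include <cassert>      // assert
#include <cctype>       // std::isalnum and friends
#include <string>
#include <string_view>
#include <rapidcheck.h>
// TEST_CASE below comes from a test framework header (e.g. Catch2 or doctest), not shown here.

using namespace std::string_literals;  // for the "..."s literals below
using rc::check;                       // check(...) below is rc::check

/**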
 * Returns a generator that yields elements of the specified string with equal
 * probability.
 *
 * @param  s  A list of characters to draw from.
 *
 * @return A generator that produces elements of 's' with equal probability.
 */
inline rc::Gen<char> character(std::string_view s)
{
  assert(!s.empty());

  return s.size() == 1 ? rc::gen::just     (s.front())
                       : rc::gen::elementOf(s);
}

/**
 * Returns a generator that yields strings of elements of the given string.
 *
 * @param  s  A list of characters to draw from.
 *
 * @return A generator that produces strings of elements of 's'.
 */
inline rc::Gen<std::string> string(std::string_view s)
{
  return rc::gen::container<std::string>(character(s));
}

const auto blank  = " \t"s;
const auto space  = " \f\n\r\t\v"s;
const auto digit  = "0123456789"s;
const auto lower  = "abcdefghijklmnopqrstuvwxyz"s;
const auto upper  = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"s;
const auto punct  = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"s;
const auto cntrl  = "\00\01\02\03\04\05\06\07\10\11\12\13\14\15\16\17\20\21\22\23\24\25\26\27\30\31\32\33\34\35\36\37\x7F"s;
const auto alpha  = lower + upper;
const auto alnum  = alpha + digit;
const auto graph  = alnum + punct;
const auto print  = graph + ' ';
const auto xdigit = digit + "abcdefABCDEF";

TEST_CASE("blah blah...")
{
  /**
   * Returns 'true' if every character of the given string satisfies the given
   * predicate.
   */
  auto f = [](std::string_view s, int (predicate)(int))
  {
    return std::ranges::all_of(s, predicate);
  };

  check("alnum",  [f](){return f(*string(alnum ), std::isalnum );});
  check("alpha",  [f](){return f(*string(alpha ), std::isalpha );});
  check("blank",  [f](){return f(*string(blank ), std::isblank );});
  check("cntrl",  [f](){return f(*string(cntrl ), std::iscntrl );});
  check("digit",  [f](){return f(*string(digit ), std::isdigit );});
  check("graph",  [f](){return f(*string(graph ), std::isgraph );});
  check("lower",  [f](){return f(*string(lower ), std::islower );});
  check("print",  [f](){return f(*string(print ), std::isprint );});
  check("punct",  [f](){return f(*string(punct ), std::ispunct );});
  check("space",  [f](){return f(*string(space ), std::isspace );});
  check("upper",  [f](){return f(*string(upper ), std::isupper );});
  check("xdigit", [f](){return f(*string(xdigit), std::isxdigit);});
}

This example shows that a string can be treated as a representation of a character class, from which characters, and thus strings, can be generated. From this perspective, string concatenation acts as a set-theoretic union of character classes. Moreover, repeating a character within the string increases the frequency with which it occurs in generated strings.
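
To make the frequency point concrete, here is a minimal sketch in plain standard C++ (no RapidCheck needed). It assumes, as the doc comment on character() above states, that each position of the class string is picked with equal probability. The names drawProbabilities, hexLow and mostlyA are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Hypothetical helper (not part of RapidCheck): the probability of drawing
// each distinct character when one position of 's' is picked uniformly.
inline std::map<char, double> drawProbabilities(std::string_view s)
{
  assert(!s.empty());

  // Count how many positions each character occupies.
  std::map<char, std::size_t> counts;
  for (const char c : s) ++counts[c];

  // A character's probability is its share of the positions.
  std::map<char, double> result;
  for (const auto& [c, n] : counts)
    result[c] = static_cast<double>(n) / static_cast<double>(s.size());
  return result;
}

// Concatenation as union: both operands contribute their characters.
inline const std::string digit  = "0123456789";
inline const std::string hexLow = digit + "abcdef";

// Repetition as weighting: 'a' occupies three of the four positions.
inline const std::string mostlyA = "aaab";
```

With these definitions, drawProbabilities(mostlyA) assigns 'a' three quarters of the draws and 'b' one quarter, while hexLow contains 16 distinct characters that are all equally likely. Note that concatenating two classes that overlap would also double the weight of the shared characters, so deduplicate first if a uniform union is wanted.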

Hope this helps,

Jonathon