dart-lang/sdk

substring does not support UTF-16

maxlapides opened this issue ยท 12 comments

"๐Ÿ•".substring(0, 1);
> "๏ฟฝ"

This issue came up because a user inputted an emoji into a data field. Attempting to render Text("๐Ÿ•".substring(0, 1)) in Flutter results in:

flutter: ══╡ EXCEPTION CAUGHT BY RENDERING LIBRARY ╞═════════════════════════════════════════════════════════
flutter: The following ArgumentError was thrown during performLayout():
flutter: Invalid argument(s): string is not well-formed UTF-16
flutter:
flutter: When the exception was thrown, this was the stack:
flutter: #0      ParagraphBuilder.addText (dart:ui/text.dart:1157:7)
flutter: #1      TextSpan.build 
package:flutter/…/painting/text_span.dart:172
flutter: #2      TextPainter.layout 
package:flutter/…/painting/text_painter.dart:352
flutter: #3      RenderParagraph._layoutText 
...
❯ flutter doctor -v
[✓] Flutter (Channel beta, v1.0.0, on Mac OS X 10.14.2 18C54, locale en-US)
    • Flutter version 1.0.0 at /Users/maxlapides/flutter
    • Framework revision 5391447fae (9 weeks ago), 2018-11-29 19:41:26 -0800
    • Engine revision 7375a0f414
    • Dart version 2.1.0 (build 2.1.0-dev.9.4 f9ebf21297)

[✓] Android toolchain - develop for Android devices (Android SDK 28.0.3)
    • Android SDK at /Users/maxlapides/Library/Android/sdk
    • Android NDK location not configured (optional; useful for native profiling support)
    • Platform android-28, build-tools 28.0.3
    • ANDROID_HOME = /Users/maxlapides/Library/Android/sdk
    • Java binary at: /Applications/Android Studio.app/Contents/jre/jdk/Contents/Home/bin/java
    • Java version OpenJDK Runtime Environment (build 1.8.0_152-release-1248-b01)
    • All Android licenses accepted.

[✓] iOS toolchain - develop for iOS devices (Xcode 10.1)
    • Xcode at /Applications/Xcode.app/Contents/Developer
    • Xcode 10.1, Build version 10B61
    • ios-deploy 1.9.4
    • CocoaPods version 1.6.0.beta.1

[✓] Android Studio (version 3.3)
    • Android Studio at /Applications/Android Studio.app/Contents
    • Flutter plugin version 31.3.3
    • Dart plugin version 182.5124
    • Java version OpenJDK Runtime Environment (build 1.8.0_152-release-1248-b01)

[✓] VS Code (version 1.30.2)
    • VS Code at /Applications/Visual Studio Code.app/Contents
    • Flutter extension version 2.22.1

[✓] Connected device (2 available)
    • SAMSUNG SM G920A • 03157df3c5558209                     • android-arm64 • Android 5.1.1 (API 22)
    • iPhone XS        • 9CFC4B23-BE03-4DC9-B1CD-5E1226F5A183 • ios           • iOS 12.1 (simulator)

• No issues found!

Here's my workaround for now:

String.fromCharCode(str.runes.first)
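
To make the failure mode concrete, here is a minimal sketch of why cutting at code unit 1 produces a malformed string while the runes-based workaround does not. The names `firstCodePoint` and `hasLoneSurrogate` are hypothetical helpers introduced only for illustration; they are not part of the Dart SDK.

```dart
/// Hypothetical helper wrapping the workaround above: returns the first
/// code point of [s] as a string, or '' if [s] is empty.
String firstCodePoint(String s) =>
    s.isEmpty ? '' : String.fromCharCode(s.runes.first);

/// Hypothetical check: true if [s] contains a surrogate code unit that is
/// not part of a valid high/low pair (i.e. not well-formed UTF-16).
bool hasLoneSurrogate(String s) {
  for (var i = 0; i < s.length; i++) {
    final unit = s.codeUnitAt(i);
    final isHigh = unit >= 0xD800 && unit <= 0xDBFF;
    final isLow = unit >= 0xDC00 && unit <= 0xDFFF;
    if (isHigh) {
      // A high surrogate must be followed by a low surrogate.
      if (i + 1 >= s.length) return true;
      final next = s.codeUnitAt(i + 1);
      if (next < 0xDC00 || next > 0xDFFF) return true;
      i++; // skip the low half of a valid pair
    } else if (isLow) {
      return true; // low surrogate with no preceding high surrogate
    }
  }
  return false;
}
```

For an emoji outside the Basic Multilingual Plane such as `'\u{1F600}'`, `length` is 2 while `runes.length` is 1, so `substring(0, 1)` keeps only the high surrogate and `hasLoneSurrogate` reports true for the result, whereas `firstCodePoint` returns the whole emoji.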

A substring method that works on code points rather than UTF-16 code units might be inefficient. E.g., taking a substring near the end of a very long string would be slow, because it would have to iterate through most of the string counting code points.
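
A rough sketch of what such a code-point-based substring could look like, mainly to show where the linear cost comes from. `codePointSubstring` is a hypothetical name, and the sketch does no argument validation beyond what `skip` and `take` already do.

```dart
/// Hypothetical code-point-based substring. [start] and [end] count code
/// points (runes), not UTF-16 code units.
String codePointSubstring(String s, int start, [int? end]) {
  // `runes` is a lazy iterable, so skip() has to decode every code point
  // before [start]. That walk is the linear cost described above.
  final tail = s.runes.skip(start);
  return String.fromCharCodes(end == null ? tail : tail.take(end - start));
}
```

Unlike `substring`, which can jump straight to a code unit offset, this version has to scan from the beginning of the string on every call.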

The substring documentation doesn't make it clear that the method operates on code units. Can we update it to explain this, similarly to the good explanation given on the [] operator?

I am having the exact same problem. In my case, I want to programmatically delete (backspace) characters from a text input that may contain emojis. So far, my workaround is as follows:

var s = "abc😀";
var sRunes = s.runes;
print(String.fromCharCodes(sRunes, 0, sRunes.length-1));

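This only works if users do not enter emojis that have length 4, i.e. emojis made up of more than one code point, because removing the last rune strips only part of such an emoji.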
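
As a sketch of that backspace workaround and its limitation, the helper below drops the last rune of a string. `removeLastCodePoint` is a hypothetical name used only for this illustration.

```dart
/// Hypothetical helper: drops the last code point (rune) of [s].
/// Returns [s] unchanged if it is empty.
String removeLastCodePoint(String s) {
  if (s.isEmpty) return s;
  final runes = s.runes.toList();
  // fromCharCodes accepts a start/end range over the rune list.
  return String.fromCharCodes(runes, 0, runes.length - 1);
}
```

`removeLastCodePoint('abc\u{1F600}')` gives `'abc'`. A flag such as `'\u{1F1FA}\u{1F1F8}'` has length 4 and consists of two runes, so one call leaves the first regional indicator behind instead of deleting the whole emoji.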

FYI, another workaround of mine, https://stackoverflow.com/a/56135774/348719, is currently broken.

I have found another workaround. It is more resource-expensive, but so far it works with all emojis.

In my particular case I wanted to take a substring (e.g. from index 0 to 16) and count each emoji as a single character. However, I was getting only part of the text because of the emojis in it.

My workaround is this one.

Create a function called runeSubstring() like this one:

String runeSubstring({required String input, int start = 0, int? end}) {
  String finalString = ''; // initialize the result
  List<int> individualRunes = input.runes.toList(); // convert the string to a list of runes
  individualRunes.sublist(start, end).forEach((rune) { // "substring" the list
    String character = String.fromCharCode(rune); // convert each rune back to a string
    finalString = finalString + character;
  });
  return finalString; // return the substring
}

and use it like this whenever you need it:

// Not a raw string, so each \ud83d\ude13 pair is decoded into one emoji.
String example = 'Example \ud83d\ude13  Example \ud83d\ude13';
String result = runeSubstring(input: example, start: 0, end: 10);
String resultEnd = runeSubstring(input: example, start: 11); // from 11 to the end

If the text is really big, you should call input.runes.toList() outside the function. Otherwise you pay the cost of converting the text to runes and then to a list every time the function is called.
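
One way to follow that advice, as a sketch only: convert the text to a rune list once and pass the list around. `runeSlice` and `chunkByRunes` are hypothetical names, and `chunkByRunes` assumes `size` is positive.

```dart
/// Hypothetical variant that takes runes already converted by the caller,
/// so no per-call conversion from String to runes is needed.
String runeSlice(List<int> runes, int start, [int? end]) =>
    String.fromCharCodes(runes, start, end);

/// Example caller: converts [text] once, then cuts it into chunks of at
/// most [size] code points. Assumes [size] is greater than zero.
List<String> chunkByRunes(String text, int size) {
  final runes = text.runes.toList(); // converted once, reused per chunk
  return [
    for (var i = 0; i < runes.length; i += size)
      runeSlice(runes, i, i + size < runes.length ? i + size : runes.length),
  ];
}
```

Each slice is still linear in its own length, but the full-string conversion happens only once rather than once per call.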

See #28404 and dart-lang/language#34, long discussions about making correct String manipulation easier. We will probably close this issue as being one example of the bigger issue.

@LiteCatDev String.fromCharCodes takes an Iterable of rune values, so you can simplify your code to:

String runeSubstring({required String input, int start = 0, int? end}) {
  return String.fromCharCodes(input.runes.toList().sublist(start, end));
}

dart-lang/language#685 is our current attempt at supporting this via a package; closing the present issue in favor of that

I have a similar issue to solve and have posted the problem here:

https://stackoverflow.com/questions/68518125/flutter-dart-how-to-trim-if-special-characters-are-present

Please share your input. Thanks.