scelis/twine

HTML tags in strings

jpm-polymorph opened this issue · 48 comments

  • Twine version: unknown - from gem install - 0.10.1 according to generated files

  • Minimal input file (test-twine.txt)

[[Uncategorized]]
  [teststring]
    en = This is a <b>bold</b> html test.
  [testescaped]
    en = `An <b>escaped</b> html string.`
  • Exact twine command that's being run
    twine generate-localization-file test-twine.txt strings.xml --lang en

  • Generated output

<?xml version="1.0" encoding="utf-8"?>
<!-- Android Strings File -->
<!-- Generated by Twine 0.10.1 -->
<!-- Language: en -->
<resources>
	<!-- SECTION: Uncategorized -->
	<string name="teststring">This is a &lt;b>bold&lt;/b> html test.</string>
	<string name="testescaped">An &lt;b>escaped&lt;/b> html string.</string>
</resources>
  • Expected output
<string name="teststring">This is a &lt;b>bold&lt;/b> html test.</string>
<string name="testescaped">An <b>escaped</b> html string.</string>

Awesome, thanks for filling out the issue template 👍 (it's new, that's why I'm excited ;-))

Currenty the backticks only preserve leading and trailing spaces. Why do you want to preserve the formatting tags? And which Android method are you using to read the Strings? getString()?

I use the strings directly in my layout.xml file like so:

            <TextView
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:text="@string/teststring"
                android:paddingBottom="@dimen/mega_margin"
                style="@style/AppTheme.SecondaryText.Small"/>

            <TextView
                android:layout_width="match_parent"
                android:layout_height="wrap_content"
                android:text="@string/testescaped"
                style="@style/AppTheme.SecondaryText.Small"/>

When a the string is surrounded by <b>...</b> it will be typeset in bold.

I see. Do you know of a way to store the string with HTML tag formatting in the XML file so it can be retrieved in code and be used in a layout? As far as I know unescaped tags will be stripped/lost when using getString(). I'd find it strange if one hat to encode the string differently depending on where and how it's used.

I found a nice clear explanation here:
https://stackoverflow.com/questions/3235131/set-textview-text-from-html-formatted-string-resource-in-xml/18199543#18199543

Another solution is to wrap the HTML in <![CDATA[ ...raw html... ]]>. But in that case Twine would also replace the first <.

Awesome find, thank you very much! So the solution would be:

  • leave <b>, <i> and <u> alone
  • leave <!CDATA alone
  • escape all other Tags
  • maybe advise Users to use getText()

right?

That is a possibility, but feels hackish. I would suggest letting the grave (`) escaped strings mean do not escape. And also allowing substrings to be escaped.

In that case you have freedom to do things like:

`   <b>1</b>   ` is < 2 

Mh, I actually thought it'd be a lot cleaner and closer to how it's intended to work on Android. Why do you think that would be hackish?

I just prefer having one generic rule ( don't touch anything inside `` ) above having 4 exceptions ( <b>, <i>, <u>, <!CDATA ). As soon as you start adding exceptions for special cases things can become messy.

@scelis, what's your opinion?

I would rather not make the rules around Twine backtick escaping more complicated, especially as this particular issue seems to only manifest with the Android formatter. I tend to agree with @sebastianludwig's suggestions above, which don't seem hacky to me as they appear to be documented in the Android string resources documentation.

Also, why is the < encoded to &lt; and the > is not encoded to &gt;? That seems very strange to me.

That is as suggested by the Android docs 🤷‍♂️

I think we should have a way to choose if we want the < and & encoded or not. Maybe in some cases I want to run the string through Html.fromHtml() and need an encoded string. But for other strings I want to include them directly in the xml and don't want the html to be encoded. And what if I want to have a visible <b> in my string?

Consider this output containing a mix of encoded and not encoded strings:
<string name="example_html">If you want <b>bold</b> you have to enclose the substring in &lt;b&gt; and &lt;/b&gt; html tags.</string>
Which should render as:
If you want bold you have to enclose the substring in <b> and </b> html tags.

This is maybe a use case that is not so common, but preferably our solution must be versatile enough to support this. Just not encoding a specific list of strings, including <b> will not work in this case.

I don't have an Android dev environment at hand, but shouldn't these (rather exotic) cases be solvable with CDATA tags?

<string name="example_html">If you want <b>bold</b> you have to enclose the substring in <![CDATA[<b> and </b>]]> html tags.</string>

In strings.xml:
<string name="test_string">If you want <b>bold</b> you have to enclose the substring in <![CDATA[<b> and </b>]]> html tags.</string>

  • android:text="@string/test_string": If you want bold you have to enclose the substring in <b> and </b> html tags.
  • tv.setText(getText(R.string.test_string));: If you want bold you have to enclose the substring in <b> and </b> html tags.
  • tv.setText(getString(R.string.test_string));: If you want bold you have to enclose the substring in <b> and </b> html tags.
  • tv.setText(Html.fromHtml(getString(R.string.test_string)));: If you want bold you have to enclose the substring in and html tags.

This is an acceptable solution.

Another tag we should support is <a>.
Using:
<string name="test_string">You can add a <a href="http://www.google.com">link</a> to text too.</string>
Renders "link" underlined and in a different colour, indicating it is clickable.
Adding tv.setMovementMethod(LinkMovementMethod.getInstance()); will then make the text clickable and launch the browser. <a> is also sometimes used for deeplinking into a certain view in your app. Here we should also be careful not to encode the href url (http://url.com?param=1&param2=3).

This feature makes string formatting much more difficult (or impossible).
Let's say I have <string name="test_string">Hello <b>%s</b>.</string>, how am I supposed format it as: Hello world?
Thanks.

@Aqua-Ye We tried to match the behavior and expectations as explained in the Android docs here: http://developer.android.com/guide/topics/resources/string-resource.html#FormattingAndStyling

Can you use:

[test_string]
    en = Hello <b>%@</b>.

And then in your code use: tv.setText(getText(R.string.test_string));

Where do I pass "world" in your snippet?

From the doc, we should pass escaped entities in this case:

In some cases, you may want to create a styled text resource that is also used as a format string. Normally, this doesn't work because the format(String, Object...) and getString(int, Object...) methods strip all the style information from the string. The work-around to this is to write the HTML tags with escaped entities, which are then recovered with fromHtml(String), after the formatting takes place. For example:

<resources>
  <string name="welcome_messages">Hello, %1$s! You have &lt;b>%2$d new messages&lt;/b>.</string>
</resources>

Ok, so then does this work for your scenario?

[welcome_messages]
    en = Hello, %1$@! You have &lt;b>%2$d new messages&lt;/b>.
Resources res = getResources();
String text = res.getString(R.string.welcome_messages, username, mailCount);
CharSequence styledText = Html.fromHtml(text);

Yes. The problem is that you now always generate:

[welcome_messages]
    en = Hello, %1$s! You have <b>%2$d new messages</b>.

getString will always strip out <b> tag as documented.

Ah, sorry, I think I understand now. Perhaps we are being too clever with Twine escaping HTML entities and instead Twine should just pass along whatever it has to the XML file and we let developers handle this themselves when they are creating their strings. @sebastianludwig thoughts?

Take the string:
Jack & Jill
For Android we need to escape to Jack &amp; Jill, but not for iOS.

So not escaping at all won't work either. Perhaps there is some middle ground solution?

@jpm-polymorph I liked the idea of using ` to preserve the original string. Why didn't you go for this solution instead?

What if we continue to escape characters and entities that must be escaped (like &, ', etc), but leave < and > alone? Are there edge-cases that this won't handle?

I'm also not sure that ` preservation works if we want the strings to look one way on iOS and another on Android. How do we accomplish that goal? These Android escaping rules seem really complicated and intense.

Get parameters for a <a href=...> will break then:

<a href="http://www.google.com/?s=house&a=chrome">Click to search house.</a>
becoming
<a href=\"http://www.google.com/?s=house&amp;a=chrome\">Click to search house.</a>

This is becoming quite a popular use case with deep linking support in Android.

Currently I am using two separate strings in my Twine file to support both iOS and Android:
iOS: You can change this in settings.
Android: You can change this in <a href="internal://settings">settings</a>.

This is however more expensive ($$$) to get translated as we are billed per word. For now it is an ok workaround though. The main thing I need is to have unescaped html tags propagating from twine to Android. For these few cases I can have duplicate strings for iOS. Currently I am still editing the exported file by hand every time to check that the escaping didn't break the href.

If we go back to escaping the < to &lt, then using string resource references in your layouts won't work (the starting point of this change). As far as I understand it, if we don't escape the HTML tags, there's no way of replacing placeholders in a string (are we certain about this?), because getString() strips all tags and format in format(getText(), userName, count) does too (yay...).

About the backtick approach: Even though it certainly could be implemented, I strongly dislike the idea of changing your multi platform localization file depending on how it's used in one platform. It just feels wrong.

Another approach: string references in layout XML files can, by definition, only be used for strings without placeholders. If a string contains a placeholder, it must be retrieved and formatted via code. How about:

  • use the current behavior for strings without placeholders (leave styling and link tags alone)
  • escape them, if they contain a placeholder

From a users perspective that would mean:

  • all strings can be used in layout XML files, if it makes sense (they don't contain placeholders)
  • if you want to set such a string via code, use getText (as in "give me the text, I don't want to do anything with it")
  • if you need placeholder substitution, use getString

I don't think having 2 different behaviours of escaping depending if the string contains placeholders is a better solution... It will be even more confusing for developers.
The layout problem is more an Android Studio issue in my opinion, and I don't think twine should try to be smart just to make it work a bit better.

How is using strings in layouts tied to Android Studio? Isn't all that either resolved by the compiler or the Android runtime?

I'm a bit lost, could you summarize your proposed solution again? (If you have one :-))

Sorry I wasn't thinking straight about Android Studio :/
Your solution might be ok then, though it might be a little confusing.

The first answer in this thread seems to work:
https://stackoverflow.com/questions/23503642/how-to-use-formatted-strings-together-with-placeholders-in-android

I tested as follows and can confirm it works.

strings.xml:
<string name="test_string">My <b>%s</b> sentence.</string>

MainActivity.java:

TextView tv = (TextView) findViewById(R.id.helloworld);
tv.setText(Html.fromHtml(String.format(Html.toHtml(new SpannedString(getText(R.string.test_string))), "test")));

Is this sufficient @Aqua-Ye?

That works perfectly @jpm-polymorph! Thanks.

Does this mean that there is a valid workaround and that twine is functioning as best as it can? Or is there something we can do to make twine better for Android strings?

It is still a workaround because this method adds extra "\n\n" at the end of the text, and this is problematic when using with Toast.

That doesn't sound like a twine thing but an android SDK thing. Shouldn't it be easy to trim the string in code?

Yes it is easy. It was just to point out the solution proposed by @jpm-polymorph was still a workaround, and didn't seem ideal in my opinion.

@Aqua-Ye Ok, given all of this knowledge I must ask:

Is there something that Twine can do here to make this better without adding Android-specific syntax rules to the Twine text file?

@scelis Except what we said about generating different strings depending if Twine detects string format, I don't think so, and I can work with what Twine does currently :)

👍

Thanks! Let's go with this for now and we can always revisit in the future if it becomes a bigger issue or if a cleaner solution is discovered.

This still doesn't solve core issues. Why are you string escaping the < only some of the times ? This seems really broken if you want a static string such as <b>APP_Name</b> some random text we can't have a nice solution because the < is preserved and no longer works on android. The android system will always strip the < when trying to get it because it is not escaped. So you can't even do the expected fromHtml(). I don't want to have to inject a %@ and then send in a blank character because this library only escapes if there is a parameter.

Why was this choice made to not follow a common rule? If you read the documentation more closely you'll see they inform you

Normally, this doesn't work because the format(String, Object...) and getString(int, Object...) methods strip all the style information from the string.

So currently the output will not be escaped and there for the android system will strip all style information from your text.

I feel that having to wrap my string with <![CDATA[ .... ]]> really REALLY bad form as what does iOS care about having to put that in our strings and why do we have to confuse our translators with this work around?

This also doesn't support iOS we get really bad results. Currently twice will stripping <![CDATA[ but then it doesn't strip ]]> and so it renders correctly on android but it now breaks our strings on iOS.

The current solution is wrong as we are now injecting android code into a cross project cross discipline file for no real good reason and it also breaks another project...

Hey @ben-goulet, thanks for reaching out. We do not escape the < to support styled string references in layouts. These do not work with escaped styling tags (as far as I know).

Just to be sure we're talking about the same thing: In your Twine file you want to have <b>APP_Name</b> some random text and the Android app should display "APP_Name some random text", right?

If you want to retrieve static text as in <b>APP_Name</b> some random text and display it in a TextView, you should be able to do that with textView.setText(getText(R.string.myStyledStaticString)). If not, which Android Version are you using? And could you provide me with an strings.xml entry and the exact code you're using to set the text on a text view?

format() and getString() should only be used for Strings that contain placeholders. And the CDATA dance should only be necessary, if you want to display the literal <b> and your text view text needs to be "APP_Name some random text".

I'd really love a simpler solution! If you have an idea how to use one way to encode/retrieve strings that supports styling in layout references and code, I'm all ears.

These styled strings shouldn't work in the layouts as the system will strip them out. You must have them string escaped for them appear properly in the layouts. With the current implementation to have this appear correctly we have to handle the strings in two different manners to get the styled strings.

if we do getText() to get a charSet which will take a style tag this will return a different bold style then if you had your text using the tag. We use a custom font so if you use the bold tag it will preserve the font where a charSet will ignore the font causing rendering issues.

Also wrapping it in CData becomes messy as iOS isn't parsed correctly.

These styled strings shouldn't work in the layouts as the system will strip them out. You must have them string escaped for them appear properly in the layouts.

Mmhh..that's not what I'm seeing in my tests. Please see my test project. The styled string

<string name="styled">Hello <b>WORLD</b></string>

is referenced in activity_main.xml as android:text="@string/styled" and is displayed correctly with a bold WORLD. If we would escape the bold tags to

<string name="styled">Hello &lt;b>WORLD&lt;/b></string>

the string gets displayed as Hello <b>WORLD</b> which is not intended. The behaviour might have changed though. I'm testing with API level 27, are you seeing different results?

With the current implementation to have this appear correctly we have to handle the strings in two different manners to get the styled strings.

Yes, that is correct. Strings (styled ones at least) that do not contain placeholders need to be retrieved with getText(), (styled) strings with placeholders need to be retrieved with getString().

if we do getText() to get a charSet which will take a style tag this will return a different bold style then if you had your text using the <b> tag. We use a custom font so if you use the bold tag it will preserve the font where a charSet will ignore the font causing rendering issues.

This is weird. First, when you say 'charSet', do you mean CharSequence, the interface implemented by String? And you are saying that there is a difference in the behaviour of a TextView depending on the parameter type that is passed to .setText()? The actual type of getText() and getString() is String so this shouldn't even matter... Can you provide a minimal Android project showcasing the error?

Also wrapping it in CData becomes messy as iOS isn't parsed correctly.

Totally agreed. And if it currently is, I'd consider it a bug.

@ben-goulet, any updates?

Why only <b>, <i>, <u> and <a> are chosen for this special treatment and not all the tags listed in the official Android docs on the matter? What about <font>, for example?

Hey @doganov, very good point indeed! At the time this was implemented, these were the only supported tags: https://web.archive.org/web/20171001083841/https://developer.android.com/guide/topics/resources/string-resource.html#FormattingAndStyling

I've opened #278 - PRs are welcome ;-)