geonetwork/core-geonetwork

URLs ending in 1, 2, 3 or 5 within text blocks are truncated, making the link invalid

duncanw opened this issue · 1 comments

Describe the bug
If a multi-line text field (e.g. the abstract) in a metadata record contains a URL that ends in 1, 2, 3 or 5 (e.g. https://doi.org/10.21420/TTQ0-SR11), the trailing digit(s) of the URL are omitted from the link when the record is viewed.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Contribute > Add new record
  2. Select the template and group then click Create
  3. In a multi-line text field, add some text containing such a URL, e.g. "This is a link: https://doi.org/10.21420/TTQ0-SR11 [new line]...and it is borked"
  4. Fill in all the mandatory fields
  5. Save and view the new record
  6. The link in the multi-line text field is broken; in this example, the href value will be https://doi.org/10.21420/TTQ0-SR

Expected behavior
The full, correct link should be rendered and clickable in the record view.

Screenshots
In the edit page:
image

In the view page:
image

Inspecting the view page link:
image

Log file
N/A

Desktop:

  • Browser: Microsoft Edge for Business Version 125.0.2535.92 (Official build) (64-bit)
  • GeoNetwork Version: 4.4.2.0
  • Server Application: Docker container with jdk-11.0.22+7 and Jetty 9.4.54.v20240208

Additional context
N/A

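My guess is that this is caused by incorrect XML character encoding in the regex below, from core-geonetwork > web/src/main/webapp/xslt/common/utility-tpl.xsl. The sequences &amp;#123; and &amp;#125; look like they were meant to stand for { and }, but after XML parsing they become the literal text &#123; and &#125;, so the final character class also excludes &, #, 1, 2, 3 and 5 as the last character of a URL: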
<xsl:analyze-string select="$string" regex="(http|https|ftp)://[^\s()&gt;&lt;]+[^\s``!()\[\]&amp;#123;&amp;#125;;:'&apos;&quot;.,&gt;&lt;?«»“”‘’]">
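The effect of that character class can be reproduced outside GeoNetwork. The following is a minimal sketch in Python, not the XSLT itself: it assumes Python's re module treats these character classes the same way the XSLT regex engine does, and the names BUGGY, FIXED and linked_text are made up for illustration. BUGGY is the pattern as it reads after XML decoding; FIXED is one possible correction with plain braces, which has not been confirmed against the GeoNetwork code base. In the stylesheet itself the braces would presumably need to be doubled ({{ and }}), since the regex attribute is an attribute value template.

```python
import re

# Pattern as the regex engine sees it after XML attribute decoding:
# "&amp;#123;" has become the literal text "&#123;", so the last
# character class excludes '&', '#', '1', '2', '3', '5' and ';'.
BUGGY = re.compile(
    r"""(http|https|ftp)://[^\s()><]+[^\s`!()\[\]&#123;&#125;;:'".,><?«»“”‘’]"""
)

# Assumed fix (hypothetical): use the literal braces the author
# presumably intended instead of the mis-encoded entities.
FIXED = re.compile(
    r"""(http|https|ftp)://[^\s()><]+[^\s`!()\[\]{};:'".,><?«»“”‘’]"""
)


def linked_text(pattern, text):
    """Return the first URL the pattern would turn into a link, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


SAMPLE = "This is a link: https://doi.org/10.21420/TTQ0-SR11\n...and it is borked"

print(linked_text(BUGGY, SAMPLE))  # https://doi.org/10.21420/TTQ0-SR
print(linked_text(FIXED, SAMPLE))  # https://doi.org/10.21420/TTQ0-SR11
```

With the buggy pattern the match backtracks past both trailing 1 characters, which gives the same truncated href as in step 6. URLs ending in other digits, such as 4, are matched in full by both patterns.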