silverstripe/silverstripe-blog

Special characters in Category/Tag URLSegment not working

Status: closed · 12 comments

When a Category/Tag for a blog is created through the CMS, URLSegmentFilter is set to allow special characters (unlike the SilverStripe default), so the segment is saved as a URL-encoded string.
However, when the page is visited, getCurrentCategory() searches for the non-URL-encoded string and for the string produced by URLSegmentFilter with special characters disallowed.

As a result, a category named küche is saved with the URLSegment k%C3%BCche. But when visiting /blog/category/k%C3%BCche, $this->request->param('Category') in Blog->getCurrentCategory() actually returns küche (not URL-encoded), so the lookup ->filter('URLSegment', ['küche', 'kueche']) returns nothing.
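The mismatch can be reproduced in plain PHP without SilverStripe. This is a minimal sketch that assumes, as described above, that the stored segment is rawurlencode() of the title and that the route parameter arrives already decoded; the file must be saved as UTF-8 for the byte values to match.

```php
<?php
// Plain-PHP reproduction of the lookup mismatch (no SilverStripe classes).

// What ends up in the database: the multibyte-allowed, URL-encoded segment.
$storedSegment = rawurlencode('küche');      // "k%C3%BCche"

// What the controller sees: the route parameter, already decoded.
$requestParam = rawurldecode('k%C3%BCche');  // "küche"

// The two candidates from the example lookup above.
$candidates = [$requestParam, 'kueche'];

// Neither candidate equals the stored value, so no category is found.
$found = in_array($storedSegment, $candidates, true);
var_dump($found); // bool(false)
```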

I realise there is a reason (#376) for allowing URL-encoded segments, but currently that doesn't actually work.

We should either:

  1. change how the query is handled when visiting the page (e.g. ->filter('URLSegment', ['küche', 'kueche', 'k%C3%BCche']))
  2. change how the URLSegment is created and use URLSegmentFilter like the rest of the CMS (I'd prefer that, but we need to come up with a way to handle empty strings)
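A minimal sketch of option (1), assuming the lookup keeps its current shape and only gains the URL-encoded form as an extra candidate. categorySegmentCandidates() is a hypothetical helper, not part of silverstripe-blog, and $filtered stands in for whatever URLSegmentFilter returns.

```php
<?php
// Hypothetical helper: builds the list of URLSegment values to look up.
function categorySegmentCandidates(string $param, string $filtered): array
{
    return array_values(array_unique([
        $param,              // as received, e.g. "küche"
        $filtered,           // filtered without multibyte, e.g. "kueche"
        rawurlencode($param) // as stored when multibyte is allowed, e.g. "k%C3%BCche"
    ]));
}

$candidates = categorySegmentCandidates('küche', 'kueche');
// The stored, URL-encoded value is now among the candidates.
var_dump(in_array('k%C3%BCche', $candidates, true)); // bool(true)
```

Because rawurlencode() leaves plain ASCII segments such as kueche unchanged, array_unique() keeps the candidate list from growing for existing URLs without special characters.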

Hi @Zauberfisch - thanks for the bug report. We have some functional tests for multibyte URLs, can you see if we have been doing something wrong with the way they’re set up? https://github.com/silverstripe/silverstripe-blog/blob/master/tests/BlogFunctionalTest.php

Actually, I only tested this in 3.x. Looking at the 4.x code, I thought it hadn't been fixed yet, but looking again I found silverstripe/silverstripe-cms#2384, which might have fixed this issue.

I'll do some testing in 4.x later and will report back.
Either way, I'd like to see it fixed in 3.x: I still have sites running 3, and if we choose solution (1) it's a non-breaking patch.

Ah ok, good to know. If the problem still exists in 4.x then we can look at fixing it, but SilverStripe 3 entered limited support in June 2018. This means we'll only be fixing critical bugs and security issues for SilverStripe 3 going forward.

You're welcome to make a pull request to fix the bug yourself, I'll happily review it.

I just stumbled over this bug on a project. It still exists with SilverStripe 4.4 and Blog 3.4 / 3.5.

As a quick workaround, you can add a TextField for the URLSegment and advise users to edit the URLSegment manually for each category with special characters.

class BlogCategoryExtension extends DataExtension {
	public function updateCMSFields(FieldList $f) {
		$f->push((new TextField('URLSegment', 'URL Segment'))->setDescription('Do not use special characters'));
	}
}

For my SilverStripe 3.x deployments, I'm not sure it's worth fixing here. This workaround is "good enough".
But if the bug still exists in 4.x, we definitely need to look into this.

Yeah, this is still an issue with SilverStripe 4. We have a category called "Diversity, Equity, and Inclusion"; the commas completely break the category controller, and the URLSegment has to be adjusted manually for it to work again.

siorp commented

Still not working on SS 4.6. I have a tag "Identità" and the link doesn't work.

@Zauberfisch's workaround didn't work for me on SS 3.5.x.
Even though I had entered the URLSegment manually in the field, the title (Cyrillic in this case) was still used to generate the URLSegment, and it was written as multibyte characters.
My workaround:

class BlogCategoryExtension extends DataExtension {
	// some information/explanations for the editor
	public function updateCMSFields(FieldList $f) {
		// This does not work: even if the URLSegment is entered manually, another one is still generated from the title (with special/multibyte UTF-8 characters, which does not work)
		// $f->push((new TextField('URLSegment', 'URL Segment'))->setDescription('Do not use special characters'));
		// Therefore only show a hint about the automatic conversion
		$f->dataFieldByName('Title')->setDescription('If possible, please do not use special characters');
		$f->push((new ReadonlyField('URLSegment', 'URL Segment'))->setDescription('After saving, reload the page to display the current (corrected) value here for checking'));
		$f->push(new LiteralField('URLSegmentTransliterationInfo','<p>The version of the blog module in use has problems when special characters are entered here.<br>An extension attempts to "romanize" the URL segment.<br>If this does not work, please contact the administrator.</p><p>With these tools, e.g. Cyrillic or Chinese characters can be converted into Latin characters ("romanized") in order to check the output above if necessary:<br><a href="https://www.lexilogos.com/keyboard/russian_conversion.htm" target="_blank">https://www.lexilogos.com/keyboard/russian_conversion.htm</a><br><a href="https://chinese.gratis/tools/zhuyin/" target="_blank">https://chinese.gratis/tools/zhuyin/</a></p>'));
	}

	public function onBeforeWrite() {

		// Re-encode the URLSegment with our Advanced Transliterator to create a URLSegment without UTF-8/multibyte characters
		// The module "derralf/silverstripe-advanced-transliterator" can help to romanize Cyrillic (but not Chinese!)

		$URLSegmentFilter = URLSegmentFilter::create();

		$orig_urlsegment = rawurldecode($this->owner->generateURLSegment());
		$corrected_urlsegment = $URLSegmentFilter->filter($orig_urlsegment);
		// $this->owner->URLSegment = $corrected_urlsegment; // won’t work, leave it out!


		// Write the value directly into the database instead of going via the ORM:
		// otherwise onBeforeWrite is triggered again and overwrites the URLSegment again.
		// From the forum archive:
		// "Try not calling write() as that will trigger the loops again.
		// You can use something like a direct DB::query() to update your database rather than using the ORM."
		// see https://forumarchive.silverstripe.org/community/forums/data-model-questions/show/20309
		// see https://docs.silverstripe.org/en/3/developer_guides/model/sql_query/
		// see zauberfisch https://github.com/silverstripe/silverstripe-blog/issues/605

		$sql_update = SQLUpdate::create('"BlogCategory"')->addWhere(array('ID' => $this->owner->ID));
		// Assigning a single value
		$sql_update->assign('"URLSegment"', $corrected_urlsegment);
		// Perform the update
		$sql_update->execute();
	}
}
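The core of the onBeforeWrite() above is "decode, then filter without multibyte". The plain-PHP sketch below illustrates only that step; asciiSegment() and its small umlaut map are hypothetical stand-ins for URLSegmentFilter (and for a real transliterator), so the framework's actual output may differ.

```php
<?php
// Hypothetical stand-in for URLSegmentFilter with multibyte disabled.
// The umlaut map is illustrative only; it does not romanize Cyrillic or Chinese.
function asciiSegment(string $storedSegment): string
{
    // Undo the URL encoding first, as the extension does with rawurldecode().
    $text = rawurldecode($storedSegment);
    // Transliterate a few known characters.
    $text = strtr($text, [
        'ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue',
        'Ä' => 'ae', 'Ö' => 'oe', 'Ü' => 'ue', 'ß' => 'ss',
    ]);
    $text = strtolower($text);
    // Collapse every run of other characters (spaces, commas, brackets,
    // leftover multibyte bytes) into a single hyphen.
    $text = preg_replace('/[^a-z0-9]+/', '-', $text);
    return trim($text, '-');
}

echo asciiSegment('k%C3%BCche'), "\n"; // kueche
echo asciiSegment(rawurlencode('Bijeenkomsten (opleidingen en masterclasses)')), "\n";
// bijeenkomsten-opleidingen-en-masterclasses
```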

@derralf sorry, perhaps I wasn't clear. I put the text field there so users can manually specify a URLSegment WITHOUT special characters. So if your title is "Küche", I'd advise my editors to write "kueche" in the URLSegment field.

@Zauberfisch I understand that, but it didn't work for me. The URLSegment was still generated from the title in URLSegmentExtension::onBeforeWrite, and my manually entered URLSegment was ignored. Maybe I'm using a different version of the blog module than you, and that's the reason?

RVXD commented

I had to use @derralf's BlogCategoryExtension, which works fine.
My category title contains '(' and ')',
e.g. 'Bijeenkomsten (opleidingen en masterclasses)'.

It would be nice if one could turn off the line '$filter->setAllowMultibyte(true);' in BlogObject.

silverstripe/framework: version: 4.7.3
silverstripe/blog, version: 3.6.0

The category URL is effectively being encoded twice by the BlogController:

// url encode unless it's multibyte (already pre-encoded in the database)
// see https://github.com/silverstripe/silverstripe-cms/pull/2384
if (!$filter->getAllowMultibyte()) {
	$category = rawurlencode($category ?? '');
}

because BlogObject::generateURLSegment() forces the filter used for category URLSegments to allow multibyte characters:

// Setting this to on. Because of the UI flow, it would be quite a lot of work
// to support turning this off. (ie. the add by title flow would not work).
// If this becomes a problem we can approach it then.
// @see https://github.com/silverstripe/silverstripe-blog/issues/376
$filter->setAllowMultibyte(true);
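The double encoding follows from the fact that rawurlencode() is not idempotent: applied to a value that is already percent-encoded, every "%" becomes "%25". A plain-PHP illustration, assuming (as discussed above) that the stored segment is already percent-encoded:

```php
<?php
// Segment as stored for a multibyte-allowed title: "k%C3%BCche".
$stored = rawurlencode('küche');

// Encoding it a second time escapes the "%" signs themselves.
$encodedAgain = rawurlencode($stored); // "k%25C3%25BCche"

// The twice-encoded lookup value no longer matches the stored segment.
var_dump($encodedAgain === $stored); // bool(false)
```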

You can work around this issue by setting default_allow_multibyte on SilverStripe\View\Parsers\URLSegmentFilter to true via YAML.

SilverStripe\View\Parsers\URLSegmentFilter:
  default_allow_multibyte: true

WARNING: This makes every URL segment generated by that filter allow multibyte characters, so it may have unintended consequences.