daomapsieucap/fiber-admin

Clean image file names

daomapsieucap opened this issue · 4 comments

It would be great to have a feature that converts image file names into browser- and server-friendly names without accented characters.

Many customers don't know that they should rename a file before uploading it to the Media Library, so we should cover this case for them. It comes up especially often with Vietnamese customers.

For example: The file name "Báo-cáo-thường-niên-2023.pdf" should be converted to "bao-cao-thuong-nien-2023.pdf" automatically.

This feature should have a setting in the CMS that lets an admin turn it on or off (on by default).

WordPress already has built-in file name sanitizing, but it's not good enough.

Failed cases:

  • "ÐÕçument full of $$$.pdf" -> "DOcument-full-of-.pdf" (one dash remains at the end)
  • "Really%20Ugly%20Filename--That-Is_Too Common…..png" -> "Really20Ugly20Filename--That_-_Is_Too-Common….png.pdf" (too many special chars and dots remain)
  • "Báo-cáo-thường-niên-2023.pdf" -> "Bao-cao-thuong-nien-2023.pdf" (the B is still uppercase)
  • "__example@@@file...name.pdf" -> "example@@@file.name.pdf" (too many @ characters; one extra dot and a trailing _ remain)

So we still need to write our own setting.
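To make the target behavior concrete, here is a minimal sketch in Python (not the plugin's PHP, just for illustration). The function name `clean_file_name` and the small table for letters that have no Unicode decomposition are my own assumptions, and percent-encoded sequences such as %20 are not handled here.

```python
import re
import unicodedata

# Assumption: these letters do not decompose under NFKD, so map them by hand.
_NO_DECOMPOSITION = str.maketrans({"đ": "d", "Đ": "D", "ð": "d", "Ð": "D"})


def _to_ascii_slug(text: str) -> str:
    """Lowercase, strip accents, and turn every other run of chars into one dash."""
    text = text.translate(_NO_DECOMPOSITION)
    # NFKD splits "á" into "a" + combining accent; the ASCII encode drops the accent.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Collapse each run of non-alphanumeric chars (spaces, _, @, $, dots) into one dash.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    return text.strip("-")


def clean_file_name(filename: str) -> str:
    """Hypothetical helper: clean the name but keep the last extension."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No dot at all: the whole string is the stem.
        stem, ext = filename, ""
    stem = _to_ascii_slug(stem)
    ext = _to_ascii_slug(ext)
    return f"{stem}.{ext}" if ext else stem


print(clean_file_name("Báo-cáo-thường-niên-2023.pdf"))
```

With this sketch, "Báo-cáo-thường-niên-2023.pdf" becomes "bao-cao-thuong-nien-2023.pdf", and the trailing dash and leftover @ characters from the failed cases above no longer appear.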

According to this ticket, the WordPress developers added logic for %20 and +, but in this change set we can see that % and + are treated as special chars.

In the code of sanitize_file_name(), we can see that the special-char array has stayed the same since then.

The logic that strips special chars also runs before the replacement of %20, which causes the bug: the % is stripped but the 20 is left behind.

So I think we need to hook into the sanitize_file_name filter to handle that case ourselves.
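The ordering problem is easy to reproduce in a few lines. This is a simplified Python model, not the WordPress code: the two-character `SPECIAL_CHARS` list and both function names are assumptions made for illustration.

```python
# Assumption: only the two characters discussed above are modeled here.
SPECIAL_CHARS = "%+"


def strip_then_replace(name: str) -> str:
    """Buggy order: special chars are removed before %20 and + are replaced."""
    for ch in SPECIAL_CHARS:
        name = name.replace(ch, "")
    # By now no "%" or "+" is left, so these replacements never match.
    return name.replace("%20", "-").replace("+", "-")


def replace_then_strip(name: str) -> str:
    """Fixed order: turn %20 and + into dashes first, then strip what is left."""
    name = name.replace("%20", "-").replace("+", "-")
    for ch in SPECIAL_CHARS:
        name = name.replace(ch, "")
    return name


print(strip_then_replace("Really%20Ugly.png"))  # Really20Ugly.png
print(replace_then_strip("Really%20Ugly.png"))  # Really-Ugly.png
```

A callback on the sanitize_file_name filter would need to do the equivalent of the second function on the original file name, before the % has been stripped.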

@phongkhuu115: We still get an incorrect result when the file name contains a URL-encoded string, for example:
https%3A%2F%2Fgithub.com%2F.pdf --> https-3a-2f-2fgithub-com-2f.pdf
That doesn't make sense, since we already handle the case Really%20Ugly%20Filename--That-Is_Too Common…..png.

Please try to check this case. Thanks!

@daomapsieucap: I have finished the case you mentioned above. I updated the code so that it gets rid of every URL-encoded char. Can you please check it again? Thank you!
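A rough sketch of the idea, written in Python rather than the plugin's PHP and not the actual commit: every percent-escape (% followed by two hex digits) is treated as a separator before the rest of the cleanup runs. The name `drop_percent_escapes` and the exact cleanup rules are assumptions for illustration.

```python
import re

# A percent-escape is "%" followed by exactly two hex digits, e.g. %20 or %3A.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")


def drop_percent_escapes(filename: str) -> str:
    """Hypothetical helper: replace URL-encoded chars with dashes, then tidy up."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No dot at all: the whole string is the stem.
        stem, ext = filename, ""
    # Handle the escapes first, so no stray "20" or "3a" is left behind.
    stem = PERCENT_ESCAPE.sub("-", stem)
    # Collapse every remaining run of non-alphanumeric chars into one dash.
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-").lower()
    ext = ext.lower()
    return f"{stem}.{ext}" if ext else stem


print(drop_percent_escapes("https%3A%2F%2Fgithub.com%2F.pdf"))
```

With this sketch, "https%3A%2F%2Fgithub.com%2F.pdf" becomes "https-github-com.pdf" instead of "https-3a-2f-2fgithub-com-2f.pdf".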