tholu/php-packer

UTF-8 Characters Broken?

sdkcarlos opened this issue · 10 comments

Running the example PHP code:

$jsFilePath = 'javascript/script.js';

$jsCode = file_get_contents($jsFilePath);

$packer = new Packer($jsCode, 'Normal', true, true, true);

$obfuscatedCode = $packer->pack();

with the following JavaScript:

var ArtyomVoicesIdentifiers = {
    // German
    "de-DE": ["Google Deutsch", "de-DE", "de_DE"],
    // Spanish
    "es-ES": ["Google español", "es-ES", "es_ES", "es-MX", "es_MX"],
    // Italian
    "it-IT": ["Google italiano", "it-IT", "it_IT"],
    // Japanese
    "jp-JP": ["Google 日本人", "ja-JP", "ja_JP"],
    // English USA
    "en-US": ["Google US English", "en-US", "en_US"],
    // English UK
    "en-GB": ["Google UK English Male", "Google UK English Female", "en-GB", "en_GB"],
    // Brazilian Portuguese
    "pt-BR": ["Google português do Brasil", "pt-PT", "pt-BR", "pt_PT", "pt_BR"],
    // Portugal Portuguese
    // Note: in desktop, there's no voice for portugal Portuguese
    "pt-PT": ["Google português do Brasil", "pt-PT", "pt_PT"],
    // Russian
    "ru-RU": ["Google русский", "ru-RU", "ru_RU"],
    // Dutch (holland)
    "nl-NL": ["Google Nederlands", "nl-NL", "nl_NL"],
    // French
    "fr-FR": ["Google français", "fr-FR", "fr_FR"],
    // Polish
    "pl-PL": ["Google polski", "pl-PL", "pl_PL"],
    // Indonesian
    "id-ID": ["Google Bahasa Indonesia", "id-ID", "id_ID"],
    // Hindi
    "hi-IN": ["Google हिन्दी", "hi-IN", "hi_IN"],
    // Mandarin Chinese
    "zh-CN": ["Google 普通话(**大陆)", "zh-CN", "zh_CN"],
    // Cantonese Chinese
    "zh-HK": ["Google 粤語(香港)", "zh-HK", "zh_HK"],
    // Native voice
    "native": ["native"]
};

Ends up with weird characters on the output:

eval(function(p,a,c,k,e,d){e=function(c){return(c35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('N O={"w-t":["0 M","w-t","L"],"5-D":["0 J�K","5-D","P","5-Q","V"],"c-i":["0 I","c-i","T"],"R-E":["0 9��S�W","G-E","H"],"3-6":["0 6 8","3-6","U"],"3-u":["0 x 8 1m","0 x 8 1f","3-u","1e"],"1-F":["0 f g q","1-7","1-F","l","1d"],"1-7":["0 f g q","1-7","l"],"k-m":["0 h�1b�h�X�1c","k-m","1g"],"p-j":["0 1h","p-j","1l"],"d-e":["0 1k�1j","d-e","1i"],"n-C":["0 1a","n-C","19"],"B-r":["0 12 11","B-r","10"],"A-z":["0 2�Y��2��2��2��2��","A-z","Z"],"4-v":["0 9��a�13��b��14��s��s��a��b��","4-v","18"],"4-y":["0 17�16��a��9��b��","4-y","15"],"o":["o"]};',62,85,'Google|pt|�|en|zh|es|US|PT|English|�|�|�|it|fr|FR|português|do|�|IT|NL|ru|pt_PT|RU|pl|native|nl|Brasil|ID|�|DE|GB|CN|de|UK|HK|IN|hi|id|PL|ES|JP|BR|ja|ja_JP|italiano|espa�|ol|de_DE|Deutsch|var|ArtyomVoicesIdentifiers|es_ES|MX|jp|�|it_IT|en_US|es_MX|人|к�|��|hi_IN|id_ID|Indonesia|Bahasa|��|�|zh_HK|語�|�|zh_CN|pl_PL|polski|у�|й|pt_BR|en_GB|Female|ru_RU|Nederlands|fr_FR|ais|fran�|nl_NL|Male'.split('|'),0,{}))

Any idea why this happens?

tholu commented

Does it still work or is the code broken as well?

Well, the generated JS code can't be executed because the string produced by Packer contains weird characters (the replacement character �).

It seems to be related to the encoding used internally by Packer, as the string that I'm providing is UTF-8 encoded by default.

tholu commented

Thanks for the feedback, I will try to look into it as soon as I have time. If you can fix it yourself and provide a Pull Request, I'm happy to merge.

This is related to the problem I had.

#10

The solution for you is to change your code as follows.

Change this:

$obfuscatedCode = $packer->pack();

To this:

$obfuscatedCode = $packer->pack();
$obfuscatedCode = utf8_encode($obfuscatedCode);

On non-UTF-8 pages your code should be fine, but when the browser expects UTF-8 and gets a different encoding, the output looks like that and the JavaScript can't execute. The solution is to UTF-8 encode the output.
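As a rough illustration of what that conversion does: PHP's utf8_encode() is documented as converting ISO-8859-1 to UTF-8, so it only helps when the bytes really are ISO-8859-1. The sketch below emulates that in Node.js rather than PHP. latin1ToUtf8 is a hypothetical helper, not part of Packer, and the sample strings are made up for the demonstration.

```javascript
// Hypothetical helper emulating PHP's utf8_encode():
// treat every input byte as an ISO-8859-1 character, then encode as UTF-8.
function latin1ToUtf8(bytes) {
  return Buffer.from(bytes.toString("latin1"), "utf8");
}

// The same text stored in two different encodings.
const latin1Bytes = Buffer.from("español", "latin1"); // ñ is the single byte 0xF1
const utf8Bytes = Buffer.from("español", "utf8");     // ñ is the two bytes 0xC3 0xB1

// ISO-8859-1 input is converted correctly.
const fixed = latin1ToUtf8(latin1Bytes).toString("utf8");

// Input that was already UTF-8 gets encoded a second time.
const doubled = latin1ToUtf8(utf8Bytes).toString("utf8");

console.log(fixed);   // español
console.log(doubled); // espaÃ±ol
```

So this step is only safe when the packed output is known not to be UTF-8 already. Applied to UTF-8 text it turns each non-ASCII character into two wrong ones.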

Your problem was related to the encoding of the output page. But what if you are simply writing the obfuscated code into a JS file? Then the problem persists. Besides, the code in your issue, @C0nw0nk, was very basic and didn't have many special characters, whereas the JS code that I posted here breaks everything.

I tried using the High ASCII encoding of Packer instead of Normal, but I'm getting another issue (due to the same replacement character �): "Warning: ord() expects parameter 1 to be string, array given". This happens with the following code in Symfony:

<?php

namespace AppBundle\Controller;

use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

// Include Packer library
use Tholu\Packer\Packer;

class DefaultController extends Controller
{
    /**
     * @Route("/", name="homepage")
     */
    public function indexAction(Request $request)
    {
        // Path to the script file that I want to obfuscate
        $jsFilePath =  $this->get('kernel')->getRootDir() . '/../web/assets/javascript/script.js';
        
        // Retrieve text of the file (see issue code)
        $jsCode = file_get_contents($jsFilePath);
        
        // Create an instance of the packer
        $packer = new Packer($jsCode , 'High ASCII', true, true, false);
        
        // Retrieve the result; there are always problems with the encoding.
        // Using utf8_encode doesn't work either.
        $obfuscatedCode = $packer->pack();
        
        // Both code1 and code2 have the issue with the special character
        return $this->render("default/index.html.twig", [
            'code1' => utf8_encode($obfuscatedCode),
            'code2' => $obfuscatedCode
        ]);
    }
}

Note that the issue comes only from the Packer library. The error I get is:

Warning: ord() expects parameter 1 to be string, array given (500 Internal Server Error)

Unfortunately I have to maintain other libraries as well and don't have time to understand the library and search for a solution. I was just testing the library to write an article about it for Our Code World, so I'm just in "I found a bug" mode 😆

Have you also tried utf8_encode() on the raw original before it gets passed to ->pack?

Perhaps utf8 encoding the non utf8 characters before they pass through the packer is the solution. I would say keep the utf8_encode on the output just so you don't get the same dilemma I had. That is of course if your output does need to be UTF8 like your title suggests.

tholu commented

@sdkcarlos You should avoid using the "High ASCII" encoding with UTF8 characters in your JS file. It should work if you use the "Normal" encoding and your files are properly encoded in UTF8.
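One way to confirm that a file really is valid UTF-8 before packing it is to decode it strictly and see whether that fails. The sketch below does this in Node.js. isValidUtf8 is a hypothetical helper and the sample buffers are invented for illustration. On the PHP side, mb_check_encoding($jsCode, 'UTF-8') should serve the same purpose.

```javascript
// Hypothetical helper: returns true only if the bytes decode as strict UTF-8.
function isValidUtf8(bytes) {
  try {
    // With fatal: true the decoder throws on malformed input
    // instead of silently inserting the replacement character.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// Same text saved as UTF-8 and as ISO-8859-1.
const savedAsUtf8 = Buffer.from('"es-ES": ["Google español"]', "utf8");
const savedAsLatin1 = Buffer.from('"es-ES": ["Google español"]', "latin1");

console.log(isValidUtf8(savedAsUtf8));   // true
console.log(isValidUtf8(savedAsLatin1)); // false
```

In a real project the buffer would come from reading the script file as raw bytes. If the check fails, the file most likely needs to be re-saved as UTF-8 before it goes through the packer.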

I added your example to the repository (https://github.com/tholu/php-packer/blob/master/tests/test_utf8.js and https://github.com/tholu/php-packer/blob/master/tests/test_utf8.php) and it works perfectly fine for me (no weird characters in the output).

➜  tests git:(master) ✗ php test_utf8.php
<script>eval(function(p,a,c,k,e,d){e=function(c){return(c<a?'':e(parseInt(c/a)))+((c=c%a)>35?String.fromCharCode(c+29):c.toString(36))};if(!''.replace(/^/,String)){while(c--){d[e(c)]=k[c]||e(c)}k=[function(e){return d[e]}];e=function(){return'\\w+'};c=1};while(c--){if(k[c]){p=p.replace(new RegExp('\\b'+e(c)+'\\b','g'),k[c])}}return p}('17 D={"m-l":["0 F","m-l","B"],"5-k":["0 JñN","5-k","L","5-H","I"],"j-i":["0 C","j-i","E"],"O-h":["0 日本人","M-h","Z"],"2-6":["0 6 7","2-6","P"],"2-f":["0 e 7 15","0 e 7 13","2-f","10"],"1-d":["0 cês o b","1-4","1-d","a","Y"],"1-4":["0 cês o b","1-4","a"],"9-n":["0 русский","9-n","T"],"g-p":["0 S","g-p","16"],"y-z":["0 UçV","y-z","W"],"x-w":["0 X","x-w","11"],"u-r":["0 12 14","u-r","Q"],"t-v":["0 हिन्दी","t-v","K"],"3-q":["0 普通话(**大陆)","3-q","G"],"3-A":["0 粤語(香港)","3-A","R"],"8":["8"]};',62,70,'Google|pt|en|zh|PT|es|US|English|native|ru|pt_PT|Brasil|portugu|BR|UK|GB|nl|JP|IT|it|ES|DE|de|RU|do|NL|CN|ID||hi|id|IN|PL|pl|fr|FR|HK|de_DE|italiano|ArtyomVoicesIdentifiers|it_IT|Deutsch|zh_CN|MX|es_MX|espa|hi_IN|es_ES|ja|ol|jp|en_US|id_ID|zh_HK|Nederlands|ru_RU|fran|ais|fr_FR|polski|pt_BR|ja_JP|en_GB|pl_PL|Bahasa|Female|Indonesia|Male|nl_NL|var'.split('|'),0,{}))
;</script>

@C0nw0nk Thanks for your help, you had the right instinct.

late to the game, but here's a script that this PHP implementation borks

Note these method names:
Sha256.Σ0
Sha256.Σ1
Sha256.σ0
Sha256.σ1

they don't survive
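A possible explanation, offered as an assumption rather than a confirmed diagnosis: the unpacker shown earlier in this thread relies on \w+ and \b, and in JavaScript regular expressions \w only covers ASCII letters, digits and underscore. If the packer splits words the same way, Greek letters in identifiers are not treated as part of a word. A minimal sketch (the sample source line is made up):

```javascript
// A line that uses a Greek letter in a method name.
const source = "Sha256.Σ0 = function () {};";

// Split into "words" the way a \w+ based tokenizer would.
const words = source.match(/\w+/g);

// Σ is not matched by \w, so the identifier Σ0 is cut down to "0".
console.log(words); // [ 'Sha256', '0', 'function' ]
```

The name Σ0 is never seen as one token, which would fit with those method names being mangled.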

tholu commented

@bkdotcom Can you open a new issue for that?

@tholu never mind..
http://dean.edwards.name/packer/2/ also breaks it
however
http://dean.edwards.name/packer/ (v3) handles it ok