Snowflake-Labs/schemachange

Foreign character processing

2023ASharathSJ opened this issue · 5 comments

Hi Team,

We have created an Azure CI/CD pipeline for our project's Snowflake deployments using schemachange. The pipeline works flawlessly for all execution scenarios (DDL, DML, etc.).

We have encountered an issue when an insert script contains a foreign character (e.g. French or German). The pipeline fails to read the file and the error below is received:

2023-09-15T12:44:44.0639006Z syntax error line 1 at position 0 unexpected '/'.

This error occurs even when the foreign character is inside a comment or on the 1000th line; we receive the same error either way. I am unable to attach a screenshot for reference, as the upload fails with "Something went really wrong, and we can't process that file. Try again."

Sample Insert script:
INSERT INTO DB.SCHEMA.TABLE VALUES ('TEST ü foreign character', 'TEST ä foreign character ');

Eagerly waiting for a response.

Thank you in advance,
Sharath SJ

We might need to look into the encoding used in the following function.

def apply_change_script(self, script, script_content, change_history_table):
# Define a few other change related variables
checksum = hashlib.sha224(script_content.encode('utf-8')).hexdigest()
execution_time = 0
status = 'Success'

Can you adjust the encoding in that function and see if it helps with the issue? iso-8859-1 might do it.
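To experiment with that suggestion outside the installed package, the file read and checksum can be reproduced with an explicit encoding. A minimal sketch, assuming a hypothetical `read_script` helper (not part of schemachange itself); the checksum line mirrors the `apply_change_script` excerpt above:

```python
import hashlib

def read_script(path, encoding="utf-8"):
    # Hypothetical helper: read a change script with an explicit encoding.
    # Swapping in "iso-8859-1" or "utf-8-sig" here is the experiment being
    # suggested; "utf-8-sig" would also strip a leading byte-order mark.
    with open(path, encoding=encoding) as f:
        return f.read()

def checksum_for(script_content):
    # Mirrors the sha224 checksum computed in apply_change_script.
    return hashlib.sha224(script_content.encode("utf-8")).hexdigest()
```

Reading a script containing "ü" with the right encoding should round-trip the character intact before the checksum is taken.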

We will need to think through a more dynamic way to handle this.

Hi. Thank you for the suggestion.

As per our current design, this customisation gets overridden, so we are unable to try that option. The main reason is that schemachange is reinstalled on every run to ensure the latest version is used. We are checking how to overcome this.

Can you suggest any other approach that would let us apply this change without manual configuration after schemachange is downloaded and installed?

Thank you very much!!!
Sharath

Hi Team,

We have identified that this issue is not due to encoding. It occurs only when the script contains variable replacements. If there are no variable replacements, the foreign characters are processed correctly and inserted into the table.

We have defined variables for the database so that replacements can be applied per environment, such as DEV, TEST, PREPROD and PROD.

Please guide us on how this can be resolved, as variable replacements are needed so the same script can be used for all environments.
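For context, schemachange substitutes variables using Jinja templating, so the replacement step can be reproduced in isolation to check that foreign characters survive it. Below is a rough stdlib-only stand-in for that substitution (the `render_vars` helper and the database name are illustrative, not schemachange APIs):

```python
import re

def render_vars(script, variables):
    # Rough stand-in for schemachange's Jinja-based variable replacement:
    # substitutes {{ name }} placeholders from the supplied dict.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(variables[m.group(1)]), script)

script = "INSERT INTO {{ database }}.SCHEMA.TABLE VALUES ('TEST ü foreign character');"
print(render_vars(script, {"database": "DEV_DB"}))
# → INSERT INTO DEV_DB.SCHEMA.TABLE VALUES ('TEST ü foreign character');
```

If the substitution itself preserves the "ü", the corruption is happening elsewhere in the pipeline (as turned out to be the case here).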

Warm Regards,
Sharath

@2023ASharathSJ Can you post the SQL file excerpt where you reference the variable, and the schemachange call showing how you replace it?

Alternatively, you could try using

schemachange render --vars '{ "var1": "value" }' ./path/to/sql/file/*.sql and inspect what is rendered to troubleshoot your issue.

Thank you @sfc-gh-tmathew for the response. We have identified the root cause: the below property was enabled in the replace-token task:
Write unicode BOM

After this property was disabled, the pipeline execution was successful and the foreign characters were successfully inserted into the tables.

Thank you all for the support.
Sharath S J