arduino/arduino-cli

Spurious compilation failure when sketch code file has "UTF-8 with BOM" encoding

Closed this issue · 4 comments

Describe the problem

Recently I've found that simple alphanumeric multi-line comments (no backslashes etc.) sometimes cause build failures in unpredictable ways. The failures appear to be associated with blank lines between the comment block and the code. Deleting the blank lines always cures the problem, but that is the only reproducible effect.

Examining a hex dump of an offending sketch reveals that in some cases (but not all) the UTF-8 BOM bytes at the start of the file are the cause. If these bytes are removed, the .ino builds just fine.

In other cases there are no such rogue characters yet the problem remains.

To reproduce

The failure appears random in my experience, so I cannot reproduce it reliably.

Expected behavior

I expect a

/* comment */ 

to have no impact on the build.

I expect a

/* multi
    line
    comment */ 

to have no impact on the build.

I expect that added white-space helping the readability of the code will have no impact on the build.

Arduino CLI version

Original report

1.2.0 (Arduino IDE 2.3.6)

Last verified with

504b43e

Operating system

Windows

Operating system version

11

Issue checklist

  • I searched for previous reports in the issue tracker
  • I verified the problem still occurs when using the latest nightly build
  • My report contains all necessary details

Confirmed root cause: UTF-8 BOM

I was able to reproduce this consistently. It’s not random: the failures happen when the .ino file is saved as UTF-8 with BOM.

Environment

Arduino IDE: 2.3.6 (Windows 11)

Board: Arduino Uno — Arduino AVR Boards 1.8.6

FQBN: arduino:avr:uno

Minimal sketch

/* test */

int x = 42;

void setup() {
  Serial.begin(9600);
}

void loop() {
  Serial.println(x);
  delay(1000);
}

Steps to reproduce

  1. Save this sketch as UTF-8 with BOM (VS Code → Save with Encoding → UTF-8 with BOM).

  2. Compile in Arduino IDE.

Actual result

error: stray '\357' in program
error: stray '\273' in program
error: stray '\277' in program

These octal values are the three bytes of the UTF-8 BOM (EF BB BF in hex).
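The link between the compiler's octal escapes and the hex bytes can be checked in a few lines. Python is used here only for illustration; it is not part of the Arduino toolchain.

```python
import codecs

# Octal escapes reported by the compiler, in order of appearance.
stray = [0o357, 0o273, 0o277]

# Convert to hex for comparison with a hex dump of the file.
as_hex = [f"{b:02X}" for b in stray]
print(as_hex)  # ['EF', 'BB', 'BF']

# The same three bytes are the UTF-8 byte order mark.
is_bom = bytes(stray) == codecs.BOM_UTF8
print(is_bom)  # True
```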

Control

Re-saving the file as UTF-8 (without BOM) makes it compile fine.

Notes

  • This explains why the failure feels random: some editors add a BOM silently.
  • The blank line after the comment makes the bug surface consistently.

Proposed fix

When reading .ino files, strip the BOM before preprocessing. For example, skip the first 3 bytes if they are EF BB BF.
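A minimal sketch of that idea, written in Python for brevity. It is not the Arduino CLI patch itself, which is not shown in this thread, and the name `strip_utf8_bom` is hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM; other input is returned unchanged."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data


# Example: the start of the minimal sketch, as saved by an editor that adds a BOM.
source_with_bom = UTF8_BOM + b"/* test */\n\nint x = 42;\n"
cleaned = strip_utf8_bom(source_with_bom)
print(cleaned[:10])  # b'/* test */'
```

Only a prefix match is removed, so a file without a BOM passes through untouched, and the same byte sequence appearing later in the file is left alone.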

Workaround

Save .ino as UTF-8 (no BOM).
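If the editor cannot be configured, an existing file can be re-saved by a small script. The following is a sketch only, assuming the file is valid UTF-8; the helper name `resave_without_bom` is hypothetical.

```python
from pathlib import Path


def resave_without_bom(path: Path) -> bool:
    """Rewrite path as UTF-8 without a BOM. Returns True if a BOM was removed."""
    raw = path.read_bytes()
    # The "utf-8-sig" codec drops a leading BOM if present and otherwise
    # behaves like plain UTF-8. Working on bytes keeps line endings intact.
    clean = raw.decode("utf-8-sig").encode("utf-8")
    if clean == raw:
        return False  # nothing to do, the file had no BOM
    path.write_bytes(clean)
    return True
```

For the sketch from this report, the call would be `resave_without_bom(Path("BomBug/BomBug.ino"))`.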

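Artifacts

PowerShell check of first 3 bytes:

# Show first 3 bytes of the .ino file in hex.
# -AsByteStream requires PowerShell 6 or later; on Windows PowerShell 5.1 use -Encoding Byte instead.
$b = Get-Content .\BomBug\BomBug.ino -AsByteStream -TotalCount 3
$b | ForEach-Object { '{0:X2}' -f $_ }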

Output:

EF
BB
BF

@ritesh006 Thanks for the confirmation.

I’ve updated the patch. Once it is merged, this issue is unlikely to occur again.

@ritesh006 Nice work! Thank you for the time you spent.