Spurious compilation failure when sketch code file has "UTF-8 with BOM" encoding
Closed this issue · 4 comments
Describe the problem
Recently I've found that simple alphanumeric multi-line comments (no backslashes etc.) sometimes cause build failures in unpredictable ways. The failures appear to be associated with blank lines between the comment block and the code. Deleting the blank lines always cures the problem, but that is the only reproducible effect.
Examining a hex dump of an offending sketch reveals that in some cases (but not all) the UTF-8 BOM bytes at the start of the file are the cause. If these bytes are removed, the .ino builds just fine.
In other cases there are no such rogue characters yet the problem remains.
To reproduce
The failure appears random in my experience, so I cannot reproduce it reliably.
Expected behavior
I expect a
/* comment */
to have no impact on the build.
I expect a
/* multi
line
comment */
to have no impact on the build.
I expect that whitespace added to help the readability of the code will have no impact on the build.
Arduino CLI version
Original report
1.2.0 (Arduino IDE 2.3.6)
Operating system
Windows
Operating system version
11
Issue checklist
- I searched for previous reports in the issue tracker
- I verified the problem still occurs when using the latest nightly build
- My report contains all necessary details
Confirmed root cause: UTF-8 BOM
I was able to reproduce this consistently. It’s not random: the failures happen when the .ino file is saved as UTF-8 with BOM.
Environment
Arduino IDE: 2.3.6 (Windows 11)
Board: Arduino Uno — Arduino AVR Boards 1.8.6
FQBN: arduino:avr:uno
Minimal sketch
/* test */
int x = 42;
void setup() {
  Serial.begin(9600);
}
void loop() {
  Serial.println(x);
  delay(1000);
}
Steps to reproduce
- Save this sketch as UTF-8 with BOM (VS Code → Save with Encoding → UTF-8 with BOM).
- Compile in Arduino IDE.
Actual result
error: stray '\357' in program
error: stray '\273' in program
error: stray '\277' in program
These are the 3 BOM bytes (EF BB BF).
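The compiler prints the stray bytes as octal escapes, so the link to EF BB BF is easy to miss. A minimal sketch of the conversion, where `strayBytesAreBom` is a hypothetical helper name used only for illustration:

```cpp
// Hypothetical helper for illustration; not part of any Arduino tool.
// Returns true when three byte values match the UTF-8 BOM (EF BB BF).
constexpr bool strayBytesAreBom(unsigned a, unsigned b, unsigned c) {
    return a == 0xEF && b == 0xBB && c == 0xBF;
}

// A leading 0 makes a C++ integer literal octal, so 0357 is the
// '\357' from the first error message, and so on.
static_assert(strayBytesAreBom(0357, 0273, 0277),
              "octal 357 273 277 equals hex EF BB BF");
```

The `static_assert` is evaluated at compile time, so the file only compiles if the octal and hex values agree.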
Control
Re-save as UTF-8 (without BOM) → compiles fine.
Notes
This explains why the failure feels random: some editors add a BOM silently.
The blank line after the comment makes the bug surface consistently.
Proposed fix
When reading .ino files, strip the BOM before preprocessing. For example, skip the first 3 bytes if they are EF BB BF.
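The idea can be sketched as follows. This is only an illustration of the proposed check, not the actual patch: `stripUtf8Bom` is a hypothetical name, and the real sketch-loading code may be structured differently.

```cpp
#include <string>

// Hypothetical helper (an assumption for illustration, not Arduino CLI code):
// returns the sketch source with a leading UTF-8 BOM removed, if present.
std::string stripUtf8Bom(const std::string& source) {
    // The UTF-8 byte order mark: EF BB BF.
    static const std::string kBom = "\xEF\xBB\xBF";

    // compare() clamps the length to the string size, so inputs shorter
    // than three bytes are handled safely and simply do not match.
    if (source.compare(0, kBom.size(), kBom) == 0) {
        return source.substr(kBom.size());
    }
    return source;  // No BOM: leave the content untouched.
}
```

Only an exact three-byte match at offset 0 is removed, so files without a BOM, and files that merely contain those bytes later on, pass through unchanged.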
Workaround
Save .ino as UTF-8 (no BOM).
Artifacts
PowerShell check of first 3 bytes:
# Show first 3 bytes of the .ino file in hex
# (-AsByteStream requires PowerShell 6+; on Windows PowerShell 5.1 use -Encoding Byte instead)
$b = Get-Content .\BomBug\BomBug.ino -AsByteStream -TotalCount 3
$b | ForEach-Object { '{0:X2}' -f $_ }
Output:
EF
BB
BF
@ritesh006 Thanks for the confirmation.
I’ve updated the patch; once it is merged, this issue is unlikely to occur again.
@ritesh006 Nice work! Thank you for the time you spent.