Source file format incompatible with Cray Compiler (and inconsistent)

Question


Closed this issue 8 years ago · 6 comments

When trying to build with the Cray Compiler, I got the following message:

CC-7 crayc++: ERROR File = ./umesimd/UMESimd.h, Line = 1
  The indicated token is not valid in this context.
  // The MIT License (MIT)
  ^

Looking at the files, there seems to be some inconsistency... and Windows line endings. ;-)

file UMESimd.h 
UMESimd.h: UTF-8 Unicode (with BOM) C++ program text, with CRLF line terminators

file UMESimdTraits.h 
UMESimdTraits.h: ASCII C++ program text, with CRLF line terminators

The problem seems to be the BOM (byte order mark), a three-byte marker (EF BB BF in UTF-8) at the beginning of the file. The Cray compiler does not seem to be able to deal with it. If I open such a file in GNOME's gedit and set the cursor to the beginning of the file, the BOM shows up as an invisible character, i.e. I have to press the right arrow key twice to move the cursor one position to the right. If I delete the first character (no visible change), the file works and the compiler complains with the same message for the next include.
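To check a single file for the BOM without an editor, one can look at its first three bytes. This is only a minimal sketch; has_bom is a hypothetical helper name, not something from the repository:

```shell
#!/bin/bash
# has_bom FILE: succeed if FILE starts with the UTF-8 BOM bytes EF BB BF.
# (has_bom is a hypothetical helper name used for illustration only.)
has_bom() {
    # Dump the first three bytes as hex, strip whitespace, and compare.
    [ "$(head -c 3 < "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

Usage would be something like: has_bom umesimd/UMESimd.h && echo "BOM present"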

I'll report that to Cray too, but I would hope that you can fix the format, or maybe put a script inside the repository if your editor of choice enforces that format.

Answer 1 · 2016-10-20T13:12:25.000Z

There are Windows line endings because I am using Windows ;] The code is supposed to work on Windows as well, so there is no reason why this shouldn't be acceptable. After all, we want to have something portable...

As for the Cray problem: I don't mind changing the file format, as long as it doesn't break anything for the setups that are already tested. However, I don't have access to a Cray setup to do the proper validation.

My proposed solution would be: write a script that helps with your problem. I will try it out and say whether it breaks any of the other configurations. If not, we can apply the modification, or put the script in the repository so that whoever is interested can use it.

Answer 2 · 2016-10-20T14:07:31.000Z

Wrote a script, let it run over the whole repository, now Cray compiles (on Haswell):

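#!/bin/bash
# Strip a leading UTF-8 BOM (bytes EF BB BF) from every non-hidden file
# below the current directory.

TMP_FILE=$(mktemp)

# -print0 with read -d '' keeps file names containing spaces intact.
find . -not -path '*/\.*' -type f \( ! -iname ".*" \) -print0 |
while IFS= read -r -d '' f; do
    echo "removing BOM from $f"
    # Only the first line is touched; all other lines are printed unchanged.
    awk 'NR==1{sub(/^\xef\xbb\xbf/,"")}{print}' "$f" > "$TMP_FILE" && mv "$TMP_FILE" "$f"
done

rm -f "$TMP_FILE"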

Answer 3 · 2016-10-20T14:31:03.000Z

Related: in e.g. UMESimdVecFloat64_8.h, in the comments from line 384 to 412, there seems to be a different kind of space between // and the SSUBV than in the others. Without running the script above, my editor complains about invalid characters and shows the file like this:

        // MADDSA
        UME_FORCE_INLINE SIMDVec_f & adda(SIMDVecMask<8> const & mask, double b) {
            mVec = _mm512_mask_add_pd(mVec, mask.mMask, mVec, _mm512_set1_pd(b));
            return *this;
        }
        //\A0SADDV
        UME_FORCE_INLINE SIMDVec_f sadd(SIMDVec_f const & b) const {
            return add(b);
        }

file UMESimdVecFloat64_8.h
UMESimdVecFloat64_8.h: ISO-8859 C++ program text, with CRLF line terminators
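The \A0 shown by the editor is the byte 0xA0, which is a non-breaking space in ISO-8859-1. A rough way to list the affected lines, assuming the file really is single-byte encoded as reported above (in a UTF-8 file, 0xA0 can also occur inside valid multi-byte characters), might look like this. find_nbsp is a hypothetical helper name:

```shell
#!/bin/bash
# find_nbsp FILE: print every line of FILE (with its line number) that
# contains the byte 0xA0. Hypothetical helper, for illustration only.
find_nbsp() {
    nbsp=$(printf '\240')   # octal 240 is the byte 0xA0
    # LC_ALL=C makes grep match raw bytes; -a stops it from treating
    # the file as binary because of the non-ASCII byte.
    LC_ALL=C grep -an "$nbsp" "$1"
}
```

For example, find_nbsp UMESimdVecFloat64_8.h should list the comment lines in question.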

Answer 4 · 2016-10-20T16:53:23.000Z

Thanks for the script!

I am aware of the bad formatting. I think I messed up in one of the early commits and didn't want to waste the time to fix it then. I'll look into that now.

Answer 5 · 2016-10-21T09:20:08.000Z

This should be fixed now. Can you confirm?

Answer 6 · 2016-10-21T18:09:21.000Z

Confirmed.

If you want to check the encoding (on Linux), just take the script from above and add a line that calls file "$f" inside the loop.
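As a standalone variant of that check, here is a small sketch that skips the loop and lets find call file directly. report_encodings is a hypothetical name, and the exact wording of the output depends on the installed version of file:

```shell
#!/bin/bash
# report_encodings [DIR]: print what `file` detects for every non-hidden
# file below DIR (default: current directory). Hypothetical helper.
report_encodings() {
    find "${1:-.}" -not -path '*/.*' -type f -exec file {} +
}
```

Piping the result through grep, e.g. report_encodings . | grep BOM, should list only the files that still carry a BOM.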