IEDB/TCRMatch

compile error

Closed this issue · 19 comments

Hi @schristley @acrinklaw ,

I cloned the repository and tried to compile it. cmake . runs successfully, but cmake --build . fails with the error message shown below.

[Screenshot: build error output]

My cmake version is 3.20.4. Could you give me a hint on how to fix the problem?

Thanks!

Hi @tianshilu, what version of gcc do you have? This is most likely caused by having a different version of gcc (which implements OpenMP) than the one required.

Hi @acrinklaw
Thanks for your prompt response. I attached my gcc version.

[Screenshot: gcc version output]

No problem at all. I would suggest updating gcc to the highest version you can get on Ubuntu 18.04; when I wrote this I was using gcc 9.X. If it still gives you issues, I will troubleshoot. I will also add some logic to CMake to make sure that everything is compatible for future users. Thanks!
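A compatibility check could also live in the source itself. The following is only a sketch of that idea, not the CMake logic mentioned above. The helper accepts_not_equal_loop_test and the constants kGccMajor and kOpenMPDate are hypothetical names. It rests on an assumption worth verifying against the GCC release notes: that GCC accepts != in an OpenMP loop test from version 9 on, and that any compiler reporting OpenMP 5.0 (201811) or later does too.

```cpp
// Sketch only. Assumption: GCC >= 9, or any compiler that reports
// OpenMP 5.0 (201811) or later, accepts `!=` in an OpenMP loop test.
constexpr bool accepts_not_equal_loop_test(int gcc_major, long openmp_date) {
    return gcc_major >= 9 || openmp_date >= 201811L;
}

// Clang also defines __GNUC__, so it is excluded here on purpose.
#if defined(__GNUC__) && !defined(__clang__)
constexpr int kGccMajor = __GNUC__;
#else
constexpr int kGccMajor = 0;  // not GCC; rely on the OpenMP date alone
#endif

#ifdef _OPENMP
constexpr long kOpenMPDate = _OPENMP;
#else
constexpr long kOpenMPDate = 0;  // built without -fopenmp
#endif

// Uncomment to turn the check into a hard build error:
// static_assert(accepts_not_equal_loop_test(kGccMajor, kOpenMPDate),
//               "this compiler may reject `!=` in OpenMP loops");
```

With the static_assert enabled, an older compiler would stop with a readable message instead of "invalid controlling predicate".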

Thank you for your suggestion. I will upgrade gcc. Will keep you posted. Appreciate your help!

Yeah, I believe that is because I compiled that binary in a different environment than the one you have. It was a mistake on my part - I'm still learning C++ development practices :-) It is getting late here, but tomorrow I will see how people go about properly sharing binaries. For now I still suggest trying to upgrade gcc and compiling TCRMatch.

I upgraded gcc to 9.4.0 and compiled TCRMatch. The same error, "invalid controlling predicate", still occurs. Do you have other suggestions to fix the problem?

Hmm, that is quite strange. The error suggests something is wrong with the parallelized for loops that use OpenMP.
I am unable to recreate it locally. Would you mind attaching the code from tcrmatch.cpp? In addition, can you attach the output from echo |cpp -fopenmp -dM |grep -i open? Thank you
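The preprocessor check above queries whichever cpp is first on the PATH, which is not necessarily the compiler that builds the project. As a complementary sketch, the same information can be read from inside a translation unit, so it reflects the compiler actually used. The helper name compiler_report is hypothetical; __VERSION__ is a GCC/Clang predefined macro and may be absent elsewhere.

```cpp
#include <sstream>
#include <string>

// Returns a one-line description of the compiler and the OpenMP level
// seen by this translation unit. `compiler_report` is a hypothetical helper.
std::string compiler_report() {
    std::ostringstream out;
#ifdef __VERSION__
    out << "compiler: " << __VERSION__;   // version string of the compiler
#else
    out << "compiler: unknown";
#endif
#ifdef _OPENMP
    out << ", OpenMP: " << _OPENMP;       // yyyymm date of the OpenMP spec
#else
    out << ", OpenMP: disabled";          // compiled without -fopenmp
#endif
    return out.str();
}
```

Printing compiler_report() from a small program built with the same g++ -fopenmp command as TCRMatch would show whether the intended compiler is being picked up.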

The output is: #define _OPENMP 201511
It doesn't allow me to attach the .cpp file here, so I pasted the code below. Let me know if this is not convenient for you; I can send you an email instead. Thanks for your help!

#include <array>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <math.h>
#include <omp.h>
#include <sstream>
#include <string>
#include <unistd.h>
#include <vector>

std::array<std::array<float, 20>, 20> k1;
int p_kmin = 1;
int p_kmax = 30;
float p_beta = 0.11387;
// Hardcoded because parsing + computing matrix is annoying
float blm_qij[20][20] = {
{0.0215, 0.0023, 0.0019, 0.0022, 0.0016, 0.0019, 0.003,
0.0058, 0.0011, 0.0032, 0.0044, 0.0033, 0.0013, 0.0016,
0.0022, 0.0063, 0.0037, 0.0004, 0.0013, 0.0051},
{0.0023, 0.0178, 0.002, 0.0016, 0.0004, 0.0025, 0.0027,
0.0017, 0.0012, 0.0012, 0.0024, 0.0062, 0.0008, 0.0009,
0.001, 0.0023, 0.0018, 0.0003, 0.0009, 0.0016},
{0.0019, 0.002, 0.0141, 0.0037, 0.0004, 0.0015, 0.0022,
0.0029, 0.0014, 0.001, 0.0014, 0.0024, 0.0005, 0.0008,
0.0009, 0.0031, 0.0022, 0.0002, 0.0007, 0.0012},
{0.0022, 0.0016, 0.0037, 0.0213, 0.0004, 0.0016, 0.0049,
0.0025, 0.001, 0.0012, 0.0015, 0.0024, 0.0005, 0.0008,
0.0012, 0.0028, 0.0019, 0.0002, 0.0006, 0.0013},
{0.0016, 0.0004, 0.0004, 0.0004, 0.0119, 0.0003, 0.0004,
0.0008, 0.0002, 0.0011, 0.0016, 0.0005, 0.0004, 0.0005,
0.0004, 0.001, 0.0009, 0.0001, 0.0003, 0.0014},
{0.0019, 0.0025, 0.0015, 0.0016, 0.0003, 0.0073, 0.0035,
0.0014, 0.001, 0.0009, 0.0016, 0.0031, 0.0007, 0.0005,
0.0008, 0.0019, 0.0014, 0.0002, 0.0007, 0.0012},
{0.003, 0.0027, 0.0022, 0.0049, 0.0004, 0.0035, 0.0161,
0.0019, 0.0014, 0.0012, 0.002, 0.0041, 0.0007, 0.0009,
0.0014, 0.003, 0.002, 0.0003, 0.0009, 0.0017},
{0.0058, 0.0017, 0.0029, 0.0025, 0.0008, 0.0014, 0.0019,
0.0378, 0.001, 0.0014, 0.0021, 0.0025, 0.0007, 0.0012,
0.0014, 0.0038, 0.0022, 0.0004, 0.0008, 0.0018},
{0.0011, 0.0012, 0.0014, 0.001, 0.0002, 0.001, 0.0014,
0.001, 0.0093, 0.0006, 0.001, 0.0012, 0.0004, 0.0008,
0.0005, 0.0011, 0.0007, 0.0002, 0.0015, 0.0006},
{0.0032, 0.0012, 0.001, 0.0012, 0.0011, 0.0009, 0.0012,
0.0014, 0.0006, 0.0184, 0.0114, 0.0016, 0.0025, 0.003,
0.001, 0.0017, 0.0027, 0.0004, 0.0014, 0.012},
{0.0044, 0.0024, 0.0014, 0.0015, 0.0016, 0.0016, 0.002,
0.0021, 0.001, 0.0114, 0.0371, 0.0025, 0.0049, 0.0054,
0.0014, 0.0024, 0.0033, 0.0007, 0.0022, 0.0095},
{0.0033, 0.0062, 0.0024, 0.0024, 0.0005, 0.0031, 0.0041,
0.0025, 0.0012, 0.0016, 0.0025, 0.0161, 0.0009, 0.0009,
0.0016, 0.0031, 0.0023, 0.0003, 0.001, 0.0019},
{0.0013, 0.0008, 0.0005, 0.0005, 0.0004, 0.0007, 0.0007,
0.0007, 0.0004, 0.0025, 0.0049, 0.0009, 0.004, 0.0012,
0.0004, 0.0009, 0.001, 0.0002, 0.0006, 0.0023},
{0.0016, 0.0009, 0.0008, 0.0008, 0.0005, 0.0005, 0.0009,
0.0012, 0.0008, 0.003, 0.0054, 0.0009, 0.0012, 0.0183,
0.0005, 0.0012, 0.0012, 0.0008, 0.0042, 0.0026},
{0.0022, 0.001, 0.0009, 0.0012, 0.0004, 0.0008, 0.0014,
0.0014, 0.0005, 0.001, 0.0014, 0.0016, 0.0004, 0.0005,
0.0191, 0.0017, 0.0014, 0.0001, 0.0005, 0.0012},
{0.0063, 0.0023, 0.0031, 0.0028, 0.001, 0.0019, 0.003,
0.0038, 0.0011, 0.0017, 0.0024, 0.0031, 0.0009, 0.0012,
0.0017, 0.0126, 0.0047, 0.0003, 0.001, 0.0024},
{0.0037, 0.0018, 0.0022, 0.0019, 0.0009, 0.0014, 0.002,
0.0022, 0.0007, 0.0027, 0.0033, 0.0023, 0.001, 0.0012,
0.0014, 0.0047, 0.0125, 0.0003, 0.0009, 0.0036},
{0.0004, 0.0003, 0.0002, 0.0002, 0.0001, 0.0002, 0.0003,
0.0004, 0.0002, 0.0004, 0.0007, 0.0003, 0.0002, 0.0008,
0.0001, 0.0003, 0.0003, 0.0065, 0.0009, 0.0004},
{0.0013, 0.0009, 0.0007, 0.0006, 0.0003, 0.0007, 0.0009,
0.0008, 0.0015, 0.0014, 0.0022, 0.001, 0.0006, 0.0042,
0.0005, 0.001, 0.0009, 0.0009, 0.0102, 0.0015},
{0.0051, 0.0016, 0.0012, 0.0013, 0.0014, 0.0012, 0.0017,
0.0018, 0.0006, 0.012, 0.0095, 0.0019, 0.0023, 0.0026,
0.0012, 0.0024, 0.0036, 0.0004, 0.0015, 0.0196}};

struct peptide {
std::string seq;
int len;
float aff;
std::vector<int> i;
};

std::vector<std::string> read_IEDB_data() {
std::vector<std::string> iedb_data;
std::ifstream iedb_file("data/IEDB_data.tsv");
std::string line;
while (getline(iedb_file, line)) {
std::stringstream ss(line);
std::string sequence;
ss >> sequence;
if (sequence != "trimmed_seq") {
iedb_data.push_back(sequence);
}
}
return iedb_data;
}

std::array<std::array<float, 20>, 20> fmatrix_k1() {
// Calculates the modified (normalized) blosum62 matrix

int k, j;
float marg[20];
float sum;

// initialize margin array
for (int i = 0; i < 20; i++) {
marg[i] = 0.0;
}
// initialize k1
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 20; j++) {
k1[i][j] = 0.0;
}
}
// normalize matrix by marginal frequencies
for (j = 0; j < 20; j++) {
sum = 0;
for (k = 0; k < 20; k++)
sum += blm_qij[j][k];
marg[j] = sum;
}
// calculate K1
for (j = 0; j < 20; j++) {
for (k = 0; k < 20; k++) {
k1[j][k] = blm_qij[j][k] / (marg[j] * marg[k]);
k1[j][k] = pow(k1[j][k], p_beta);
}
}

return (k1);
}

float k3_sum(peptide pep1, peptide pep2) {
// Recursively calculate Kernel 3 using Kernel 1 lookups
float k2, term, k3 = 0.0;
int start1, start2;
int k, j1, j2;

float k2_prod_save[31][31][31];

for (k = p_kmin; k <= p_kmax; k++) {
for (start1 = 0; start1 <= pep1.len - k; start1++) {
for (start2 = 0; start2 <= pep2.len - k; start2++) {

    j1 = pep1.i[start1 + k - 1];
    j2 = pep2.i[start2 + k - 1];
    term = k1[j1][j2];

    if (k == 1) {
      k2 = term;
    } else {
      k2 = k2_prod_save[k - 1][start1][start2] * term;
    }

    k2_prod_save[k][start1][start2] = k2;
    k3 += k2;
  }
}

}
return (k3);
}

void multi_calc_k3(std::vector<peptide> peplist1,
std::vector<peptide> peplist2, float threshold) {
// Simple method to calculate pairwise TCRMatch scores using two peptide
// vectors
std::vector<std::tuple<std::string, std::string, float>>
results[omp_get_max_threads()];
#pragma omp parallel for
for (int i = 0; i < peplist1.size(); i++) {
for (int j = 0; j < peplist2.size(); j++) {
peptide pep1 = peplist1[i];
peptide pep2 = peplist2[j];
float score = 0.0;
score = k3_sum(pep1, pep2) / sqrt(pep1.aff * pep2.aff);
if (score > threshold) {
int tid = omp_get_thread_num();
results[tid].push_back(make_tuple(pep1.seq, pep2.seq, score));
}
}
}
for (int i = 0; i < omp_get_max_threads(); i++) {
for (auto &tuple : results[i]) {
std::cout << std::fixed << std::setprecision(2) << std::get<0>(tuple)
<< " " << std::get<1>(tuple) << " " << std::get<2>(tuple)
<< std::endl;
}
}
}

// Move this to outside -> import everything you need
int main(int argc, char *argv[]) {
int opt;
int n_threads;
float threshold;
std::string in_file;
int i_flag = -1;
int t_flag = -1;
int thresh_flag = -1;

// Command line argument parsing
while ((opt = getopt(argc, argv, "t:i:s:")) != -1) {
switch (opt) {
case 't':
n_threads = atoi(optarg);
t_flag = 1;
break;
case 'i':
in_file = optarg;
i_flag = 1;
break;
case 's':
threshold = std::stof(optarg);
thresh_flag = 1;
break;
default:
std::cerr << "Usage: ./tcrmatch -i infile_name.txt -t num_threads -s score_threshold"
<< std::endl;
return EXIT_FAILURE;
}
}
// Check that required parameters are there
if (i_flag == -1 || t_flag == -1) {
std::cerr << "Missing mandatory parameters" << std::endl
<< "Usage: ./tcrmatch -i infile_name.txt -t num_threads"
<< std::endl;
return EXIT_FAILURE;
}
if (thresh_flag == -1) {
threshold = .97;
}

std::vector<std::string> iedb_data = read_IEDB_data();
std::ifstream file1(in_file);
std::string line;
std::string alphabet;
std::vector<peptide> peplist1;
std::vector<peptide> peplist2;

omp_set_num_threads(n_threads);

alphabet = "ARNDCQEGHILKMFPSTWYV";
k1 = fmatrix_k1();
while (getline(file1, line)) {
std::vector<int> int_vec;
for (int i = 0; i < line.length(); i++) {
if (alphabet.find(line[i]) == -1) {
std::cerr << "Invalid amino acid found in " << line << " at position "
<< i + 1 << std::endl;
return EXIT_FAILURE;
}
}
peplist1.push_back({line, int(line.length()), -99.9, int_vec});
}
file1.close();

// Calculate the normalization score (aff) (kernel 3 self vs self) list 1
#pragma omp parallel for
for (std::vector<peptide>::iterator it = peplist1.begin();
it != peplist1.end(); it++) {
for (int x = 0; x < it->len; x++) {
it->i.push_back(alphabet.find(it->seq[x]));
}
it->aff = k3_sum(*it, *it);
}

// change to IEDB data
for (std::vector<std::string>::iterator it = iedb_data.begin();
it != iedb_data.end(); it++) {
std::vector<int> int_vec;
for (int i = 0; i < (*it).length(); i++) {
if (alphabet.find((*it)[i]) == -1) {
std::cerr << "Invalid amino acid found in " << *it << " at position "
<< i + 1 << std::endl;
return EXIT_FAILURE;
}
}
peplist2.push_back({*it, int((*it).length()), -99.9, int_vec});
}

// Calculate the normalization score (aff) (kernel 3 self vs self) for list 2
#pragma omp parallel for
for (std::vector<peptide>::iterator it = peplist2.begin();
it != peplist2.end(); it++) {
for (int x = 0; x < it->len; x++) {
it->i.push_back(alphabet.find(it->seq[x]));
}
it->aff = k3_sum(*it, *it);
}
multi_calc_k3(peplist1, peplist2, threshold);

return 0;
}

So it looks like the code is the same as what I am compiling. And your openMP specification is 4.5 which is also what I used.
I think a last step before I do a deep dive into why this might be occurring is to try building without cmake.

Can you try g++ -fopenmp -O3 src/tcrmatch.cpp -o tcrmatch and see if that builds? Then if you do ./tcrmatch you should get
Missing mandatory parameters
Usage: ./tcrmatch -i infile_name.txt -t num_threads
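For reference, the _OPENMP value reported earlier (201511) is a yyyymm date that identifies an OpenMP specification release. A minimal lookup sketch follows; openmp_spec_name is a hypothetical helper, and the table lists the release dates as I know them. One caveat, stated as an assumption: the macro only says which specification the compiler claims overall, so two GCC releases can both report 201511 and still differ in which newer features they accept.

```cpp
#include <string>

// Maps an _OPENMP date macro value to the OpenMP specification version.
// `openmp_spec_name` is a hypothetical helper for illustration.
std::string openmp_spec_name(long date) {
    switch (date) {
        case 200505L: return "2.5";
        case 200805L: return "3.0";
        case 201107L: return "3.1";
        case 201307L: return "4.0";
        case 201511L: return "4.5";
        case 201811L: return "5.0";
        case 202011L: return "5.1";
        case 202111L: return "5.2";
        default:      return "unknown";
    }
}
```

Because of that caveat, g++ --version is the more telling check in a case like this one.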

Thank you for your response! I got the same error.
[Screenshot: compiler error output]

Not sure if it is relevant, but to compile on our HPC system here I had to set CC and CXX to get cmake to use the right compiler; otherwise it kept using the super-old one in /bin.

$ module load gcc/9.1.0
$ export CC=/opt/apps/gcc/9.1.0/bin/gcc
$ export CXX=/opt/apps/gcc/9.1.0/bin/g++
$ cmake .
$ cmake --build .
Scanning dependencies of target tcrmatch
[ 50%] Building CXX object CMakeFiles/tcrmatch.dir/src/tcrmatch.cpp.o
[100%] Linking CXX executable tcrmatch
[100%] Built target tcrmatch

$ ./tcrmatch 
Missing mandatory parameters
Usage: ./tcrmatch -i infile_name.txt -t num_threads

Thanks @schristley. That may be the issue.
@tianshilu can you verify which version of g++ is being used with g++ --version? If it is the proper version (9.X) and you still get the error, I can do one of two things: provide you with the Python version, which has a slightly slower run time, or provide you with a Dockerfile that I know will run.

If you do a google for "pragma omp parallel for invalid controlling predicate" then you will also find various discussions. It's also possible that OpenMP doesn't like the C++ style for iterator; maybe it isn't "basic" enough.

  for (std::vector<peptide>::iterator it = peplist1.begin();

Not sure why it would work for me though
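One way to sidestep the question of whether the iterator loop is "basic" enough is to drive the parallel loop with a plain signed index and a < test, which older OpenMP implementations generally accept. This is a sketch under stated assumptions, not the project's code: Item is a hypothetical stand-in for the peptide struct and fill_lengths is a hypothetical function.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the peptide struct in tcrmatch.cpp.
struct Item {
    std::string seq;
    int len;
};

// Fills in `len` for every element using an index-based parallel loop.
void fill_lengths(std::vector<Item>& items) {
    // Hoist the bound into a signed variable so the loop has the
    // simple "i < n" shape that the OpenMP canonical loop form expects.
    const long n = static_cast<long>(items.size());
#pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        // Each iteration touches only its own element, so no locking is needed.
        items[i].len = static_cast<int>(items[i].seq.size());
    }
}
```

Without -fopenmp the pragma is ignored and the loop simply runs serially, so the function behaves the same either way.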

Yeah, that's why I'm confused as well. When I was first writing this, it seemed that OpenMP 4.5 supported iterators as well as the != operator, which it didn't in the past. I'm also having trouble recreating this, since it compiles and works fine in a very similar environment.

Ah, I bet it's this:

it != peplist1.end();

The != is not a valid operator.

Hmm, ok, then it's likely something with the environment...
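On the != question: as far as I understand the specifications, the OpenMP 4.5 canonical loop form allows only <, <=, >, and >= as the loop test, and != was added in OpenMP 5.0. Random-access iterators as loop variables have been allowed since 3.0. That would explain why the same source builds in one environment and not another. If so, keeping the iterator and swapping != for < should be accepted by both. A minimal sketch, with square_all as a hypothetical example function:

```cpp
#include <vector>

// Squares every element in place. The loop keeps the iterator style but
// uses `<` instead of `!=`, which is valid for random-access iterators.
void square_all(std::vector<float>& values) {
#pragma omp parallel for
    for (std::vector<float>::iterator it = values.begin();
         it < values.end(); ++it) {
        *it = (*it) * (*it);
    }
}
```

In tcrmatch.cpp the analogous change would be to the two loops over peplist1 and peplist2. I have not tested that change against the project itself.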

I have a feeling it is because of g++ versioning. @tianshilu please make sure g++ is up-to-date and 9.X. If this still does not work, email me at acrinklaw@lji.org and I will send you the Python version or Dockerfile, whichever one is easiest for you. Thanks

Thank you so much for your responses, suggestions, and help @schristley @acrinklaw. You are totally right: the problem is the g++ version. I am trying to update g++ on my machine.