jonathf/matlab2cpp

Feature request: enhance variable type guessing when Matlab variables are explicitly declared

kupiqu opened this issue · 3 comments

Hi,

I am trying to automate code conversion from matlab/octave to cpp. Let me put my case in context:

  • Imagine a master program in matlab/octave that calls another matlab/octave function (the worker), which takes most of the execution time.
  • The worker code may vary between successive invocations of the master (although its basic structure remains the same). Therefore the master not only executes the worker, but first checks whether the worker needs to be regenerated, which is automated.

Now I want the master to use matlab2cpp to automate the creation of a cpp worker from the worker written in matlab/octave.

One of the main issues for this is to get the proper types of some variables. I know that they can be set manually, but sometimes they can also be inferred by using the -s argument.

So, this feature request is about enhancing the guess of variables that are explicitly declared in matlab/octave, as this would very much ease the automatic conversion.

To see this better please consider the following example:

%  i = int64(1);
%  x = randn;
%  str = 'hello';
%  v = randn(10,1);
%  M = randn(10,2);
%  s.i = i;
%  s.x = x;
%  s.str = str;
%  s.v = v;
%  s.M = M;
%  c{1} = i;
%  c{2} = x;
%  c{3} = str;
%  c{4} = v;
%  c{5} = M;

% save('test_types');

load('test_types');

i = int64(i);
x = double(x);
str = char(str);
v = double(v);
M = double(M);
s = struct(s);
c = cell(c);

Now imagine that I first create test_types.mat using the commented-out code above and then run the uncommented code. In matlab/octave the variables i to c are explicitly declared. My request is to use that information during the conversion to cpp, which doesn't happen so far; please see below:

// Automatically translated using Matlab2cpp 0.5 on 2016-04-28 16:15:13

#include <armadillo>
using namespace arma ;

int main(int argc, char** argv)
{
  TYPE M, c, i, s, str, v, x ;
  load("test_types") ;
  i = int64(i) ;
  x = double(x) ;
  str = char(str) ;
  v = double(v) ;
  M = double(M) ;
  s = struct(s) ;
  c = cell(c) ;
  return 0 ;
}

If you would think positively about implementing this, please let me know if I could be of any help on this.

A side note:

I realize that the proper types for v and M are harder to infer, as they would be guessed as double instead of vec and mat. For these I would be more explicit when creating the matlab/octave worker and initialize them to the proper dimensions before they take their particular values from the load function.

Even in this case, the enhanced guessing would be highly beneficial, as it could distinguish between vec (v = double(v);) and ivec (v = int64(v);), and similarly between mat and imat.
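To make the request concrete, here is a minimal sketch of how a converter could recognize a self-cast such as `i = int64(i);` and turn it into a type hint. It is not how matlab2cpp works internally; the helper name `guess_type_from_cast` and the cast-to-type table are assumptions for illustration, and only scalar types are covered (struct and cell are left unresolved).

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>

// Hypothetical helper: if `line` has the form `name = cast(name);`, store the
// variable name in `name_out` and return a guessed scalar C++ type.
// Returns an empty string when no guess can be made.
std::string guess_type_from_cast(const std::string& line, std::string& name_out) {
    // Assumed mapping from Matlab cast names to scalar C++ types.
    static const std::map<std::string, std::string> cast_to_type = {
        {"int64", "long long"}, {"int32", "int"},
        {"double", "double"},   {"single", "float"},
        {"char", "std::string"},
    };
    // Matches: <name> = <cast>(<arg>) with an optional trailing semicolon.
    static const std::regex self_cast(
        R"(^\s*([A-Za-z_]\w*)\s*=\s*([A-Za-z_]\w*)\(\s*([A-Za-z_]\w*)\s*\)\s*;?\s*$)");

    std::smatch m;
    if (!std::regex_match(line, m, self_cast)) return "";
    assert(m.size() == 4);  // whole match plus three capture groups
    if (m[1].str() != m[3].str()) return "";  // only self-casts count as declarations

    const auto it = cast_to_type.find(m[2].str());
    if (it == cast_to_type.end()) return "";  // e.g. struct(...) or cell(...)
    name_out = m[1].str();
    return it->second;
}
```

With this sketch, `i = int64(i);` would yield `long long` for `i`, while `s = struct(s);` and `c = cell(c);` return an empty string and would still need a manual annotation.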

I intend to work on Matlab2cpp's ability to determine variable types more automatically. I was thinking that the translator could create new Matlab files from the original ones, which are similar to the originals but also include a function that dumps the types of the workspace variables (a modified whos function).

original matlab file:

a = [1, 4.5];
b = 3;
c = [1, 3; 4, 8];

modified matlab file:

a = [1, 4.5];
b = 3;
c = [1, 3; 4, 8];
whos_f

By running this modified Matlab code in Matlab/Octave I hope to get the data types. The whos_f function writes a file like this:

#command window:
#name, size, class, complex, integer
a, 1x2, double, 0, 0
b, 1x1, double, 0, 1
c, 2x2, double, 0, 1

I ran the code above in the command window. When a script is used, "command window" should be replaced by the name of the script file, and the name of the function should also be included. From this I hope to extract the file, the function and the variable types.

For a we have: a, 1x2, double, 0, 0, which makes it a rowvec, and the second zero indicates that it should be a double type. b and c can be integer or floating point (hard to tell, because they are assigned integer values but could be used as doubles later). Declaring b = 3.0 doesn't change this, because a value is detected as an integer by converting it to an integer and comparing the result with the value before conversion.
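As a rough sketch of the reading step, the following parses one record in the format shown above and maps it to an Armadillo type name. `WhosRecord`, `parse_whos_line` and `armadillo_type` are hypothetical names, the five-field layout is taken from the sample dump, and the mapping is deliberately conservative: it ignores the integer flag and keeps integer-valued variables as double.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One record of the dump: name, size (rows x cols), class, complex, integer.
struct WhosRecord {
    std::string name;
    int rows = 0, cols = 0;
    std::string cls;
    bool is_complex = false;
    bool is_integer = false;
};

static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t");
    return s.substr(b, e - b + 1);
}

// Parse a line such as "a, 1x2, double, 0, 0".
// Returns false for header lines (starting with '#') and malformed input.
bool parse_whos_line(const std::string& line, WhosRecord& rec) {
    if (line.empty() || line[0] == '#') return false;
    std::vector<std::string> f;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, ',')) f.push_back(trim(field));
    if (f.size() != 5) return false;

    const auto x = f[1].find('x');
    if (x == std::string::npos) return false;
    int rows = 0, cols = 0;
    try {
        rows = std::stoi(f[1].substr(0, x));
        cols = std::stoi(f[1].substr(x + 1));
    } catch (const std::exception&) {
        return false;
    }
    if (rows < 0 || cols < 0) return false;

    rec.name = f[0];
    rec.rows = rows;
    rec.cols = cols;
    rec.cls = f[2];
    rec.is_complex = (f[3] == "1");
    rec.is_integer = (f[4] == "1");
    return true;
}

// Conservative mapping: real doubles only, integer flag ignored.
// Anything else keeps the TYPE placeholder for manual annotation.
std::string armadillo_type(const WhosRecord& r) {
    assert(r.rows >= 0 && r.cols >= 0);
    if (r.cls != "double" || r.is_complex) return "TYPE";
    if (r.rows == 1 && r.cols == 1) return "double";
    if (r.rows == 1) return "rowvec";
    if (r.cols == 1) return "vec";
    return "mat";
}
```

For the three sample records this gives rowvec for a, double for b and mat for c. Whether b and c should instead become int and imat is exactly the open question described above, so the integer flag is parsed but not acted on here.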

So the idea is not perfect. Also, variables declared inside for and if statements go out of scope and disappear before the whos_f function dumps the variable information.

As for your idea, I realize I have to write some code for handling save and load. In Matlab you can save and load workspace variables with save("file.mat") and load("file.mat"). In Armadillo it seems you can only load one variable at a time, see Armadillo documentation. So there are things from Matlab that don't carry over to C++ with Armadillo.

So a possible translation of load from Matlab to C++ with Armadillo:

in Matlab:
load("file.mat", "var");

C++ with Armadillo:
Datatype var;
var.load("file.mat");

Also, translating Matlab cells to C++ is not implemented, and I have no clue how to do that.

Very interesting approach. Please see some comments below:

  1. Wrt double vs int when a variable takes an int value: would it be ok if the converter is conservative and assumes variables are just doubles (perhaps raising a warning when variables take int values)?

    The only problem I see with this is when the variable is used as an index afterwards. Could the converter check for this and, if that is the case, use int instead?

  2. Is the problem of for loops and if statements coming from the original whos function? Isn't there a way to work around those cases?

  3. With respect to save/load instructions, please take into consideration the MATIO (MATLAB MAT File I/O) Library: https://github.com/tbeu/matio

    Please see this example:
    https://github.com/kupiqu/matiocpp/blob/master/examples/example_matio.cpp

  4. Related to this and also to how to convert cell arrays, I have recently forked the matiocpp code, which is a cpp wrapper to the matio library that eases load/save instructions as well as variable declarations: https://github.com/kupiqu/matiocpp

    Currently the supported types are: vec, rowvec, ivec, irowvec, mat, imat, Struct and Cell

    Please see this example:
    https://github.com/kupiqu/matiocpp/blob/master/examples/example_matiocpp.cpp

  1. Yes, the problem would be if the variable is used as an index. I think it would be possible to use double and then typecast the value to an integer. Since computed doubles are not always exact (a result that should be 2 may come out as 1.999999 or 2.00001, because adding many floats/doubles accumulates round-off error), I suppose a function will be needed to convert the floating point values to the nearest integer.
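A minimal sketch of such a conversion function, using only the C++ standard library. The name `to_index` and the default tolerance are assumptions for illustration. It rounds to the nearest integer instead of truncating, and rejects values that are not close to a whole number.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>

// Hypothetical helper: convert a double that is meant to hold an index into an
// integer. Rounds to the nearest whole number and throws if the value is
// further than `tol` away from it (or is NaN or infinite).
long long to_index(double value, double tol = 1e-4) {
    assert(tol >= 0.0);
    const double nearest = std::round(value);
    if (!std::isfinite(value) || std::fabs(value - nearest) > tol) {
        throw std::domain_error("value is not close to an integer");
    }
    return static_cast<long long>(nearest);
}
```

A plain cast truncates toward zero, so `static_cast<long long>(1.999999)` gives 1, whereas `to_index(1.999999)` gives 2. Throwing on values such as 2.5 is a design choice; a translator could equally emit a warning and round anyway.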

  2. I was mistaken about the variables declared in if statements going out of scope. I thought that was the case, since I had seen variables in some if statements not included in the output. Take a look at this example (whos is the normal Matlab function):

clear

a = 4.5;

if (a > 0)
    b = 10;
    a = a + b;
else
    c = -2;
    a = a + c;
end

whos

If you run this example, you can see that the else part of the if statement is not entered. Therefore the whos function has no information about the c variable.
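One way to see which variables a runtime dump can miss is to compare it with a static scan of assignment targets. The sketch below is only an illustration of that idea, not part of matlab2cpp: the function name `assigned_names` is hypothetical, and the scan only handles plain `name = ...` statements at the start of a line (no indexing, field access or multiple outputs).

```cpp
#include <cassert>
#include <regex>
#include <set>
#include <sstream>
#include <string>

// Collect every name on the left-hand side of a plain `name = ...` assignment,
// regardless of which branch of an if statement it sits in.
std::set<std::string> assigned_names(const std::string& matlab_source) {
    // An identifier at the start of the line, followed by '=' but not '=='.
    static const std::regex assign(R"(^\s*([A-Za-z_]\w*)\s*=(?!=))");
    std::set<std::string> names;
    std::istringstream in(matlab_source);
    std::string line;
    while (std::getline(in, line)) {
        std::smatch m;
        if (std::regex_search(line, m, assign)) {
            assert(m.size() == 2);  // whole match plus the captured name
            names.insert(m[1].str());
        }
    }
    return names;
}
```

Applied to the example above, the scan finds a, b and c, while whos at run time reports only a and b. The difference (here c) marks variables whose types would still have to be supplied some other way.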

3-4) I will look into this next week, as I will have more time then.