Fix for: Constant propagation for loop variables

As I already commented in #27, I think that the current solution unnecessarily complicates constant propagation for loop variables, since two artifical ast.Call objects (Line 239 and 241) are created, while the actual ast.Call object (Line 237) of the loop is not used at all.

Scalpel/scalpel/SSA/const.py

Lines 235 to 242 in 2ba6cab

    
           if hasattr(stmt.target, "id"):    
        
               left_name = stmt.target.id 
        
               iter_value = stmt.iter 
        
               # make a iter call 
        
               iter_node = ast.Call(ast.Name("iter", ast.Load()), [stmt.iter], []) 
        
               # make a next call  
        
               next_call_node = ast.Call(ast.Name("next", ast.Load()), [iter_node], []) 
        
               const_dict[left_name] = next_call_node

I would like to create a PR, but I do not have the permission. Therefore, I provide my suggested fix here:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     iter_value = stmt.iter 
     const_dict[left_name] = iter_value

or even better:

 if hasattr(stmt.target, "id"):    
     left_name = stmt.target.id 
     const_dict[left_name] = stmt.iter

Corresponding test cases can look like this:

def test_in_range_loop():
    code_str = """for i in range(3):    x = i"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)
    
    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('i', 0)), ast.Call) 
    assert isinstance(const_dict.get(('x', 0)), ast.Name)


def test_for_each_loop():
    code_str = """for x in y:    z = x"""

    cfg = CFGBuilder().build_from_src(name="", src=code_str)

    _, const_dict = SSA().compute_SSA(cfg)
    
    assert const_dict
    assert len(const_dict) == 2
    assert isinstance(const_dict.get(('x', 0)), ast.Call) 
    assert isinstance(const_dict.get(('z', 0)), ast.Name)

We really appreciate your idea and the test cases. But the problem is in the context of a for-loop statement, we can not bind the iterator name to the target variable as the "in" is not equivalent to assignment .

Could you please provide some references that your fix is semantically and syntacally correct? If this is your intuitive idea, we will discuss this later and then decide.

PR will be allowed soon.

Well, I'm not sure if I understand the problem correctly, because I do not know what you mean with "we can not bind the iterator name to the target variable as the "in" is not equivalent to assignment".

To highlight the differences between both variants, let's have a look into the corresponding ast objects. The former one refers to node created by your fix and the latter one refers to my suggestion.

next_call_node:  Call(func=Name(id='next', ctx=Load()), args=[Call(func=Name(id='iter', ctx=Load()), args=[Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])], keywords=[])], keywords=[])
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])

As we can see, we already have the correct ast object when using the iter_value. In my opinion it makes no sense to additionally create two ast objects. This only complicates the entire constant propagation.

To check whether my suggestions works for all in range() variants, I printed out the iter_value.

(1)
for i in range(5):
    print(i)

(2)
for c in range(2, 6):
    print(c)

(3)
for d in range(2, 30, 3):
    print(d)

(1)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=5)], keywords=[])
(2)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=2), Constant(value=6)], keywords=[])
(3)
iter_value: Call(func=Name(id='range', ctx=Load()), args=[Constant(value=2), Constant(value=30), Constant(value=3)], keywords=[])

In all cases, iter_value already contains the correct ast object. Therefore, it is not necessary to additionally create two artificial ast objects.

Thanks for the examples. I agree that my implementation of two nested function calls is not a concise way to fix it. However, the actual problem here is, for the code snippet:

for i in range(5):
    a = i*2

If we want to evaluate the right side assignment for the variable a, replacing the variable i (I mean folding the expression) with an iter object in your case seems not workable and semantically correct because the call itself simply return a range object. In the runtime, the variable i is should be an element from the generator.

The actual purpose of this module is to bind the variable definition and its usage.

What is your opinion for this case?

I think this is a special case, as we have here a variable a that relies on another variable i. However, I do not think that such cases should be handled by your framework, because the constant propagation is designed to return a dictionary of values for all identified variables and not to identify the actual values of the generator.

Therfore, I would expect to get a binary operation object for a in your example. In my opinion, identifying the actual generator value of i is up to the developer who is using your framework.

For example, for the variable i, I also have to process the ast.Call object computed by the constant propagation to get the actual values of the generator.

This makes sense. We will discuss this within the team and then make changes. Thanks very much and I really enjoyed the discussion.

Many thanks for the suggestion@simisimon. The suggested implementation has been pushed.

	if hasattr(stmt.target, "id"):
	left_name = stmt.target.id
	iter_value = stmt.iter
	# make a iter call
	iter_node = ast.Call(ast.Name("iter", ast.Load()), [stmt.iter], [])
	# make a next call
	next_call_node = ast.Call(ast.Name("next", ast.Load()), [iter_node], [])
	const_dict[left_name] = next_call_node