
Measuring Neural Efficiency of Program Comprehension

[![CC BY-SA 4.0][cc-by-sa-shield]][cc-by-sa]

This repository contains additional information to the ESEC/FSE 2017 paper:

Janet Siegmund, Norman Peitek, Chris Parnin, Sven Apel, Johannes Hofmeister, Christian Kästner, Andrew Begel, Anja Bethmann, and André Brechmann. Measuring Neural Efficiency of Program Comprehension. Proceedings of the European Software Engineering Conference and the ACM SIGSOFT International Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 140-150, ACM Press, September 2017 (preprint, digital library)

Appendix: Finding fMRI-Compatible Material

In this appendix, we explain in more detail how we determined the appropriate scrambling degree for the semantic-cues comprehension condition of our study. Specifically, we conducted a series of pilot studies with graduate computer-science students, in which we experimented with different degrees of scrambling.

Scrambling to Japanese Characters

First, we tested a high degree of scrambling by translating the entire source code character-wise into Japanese characters. An example is shown below; we scrambled all characters except for the identifier names serving as beacons. The graduate students, none of whom spoke Japanese, needed considerably longer than 30 seconds to understand the snippets. Furthermore, when we interviewed the participants about how they proceeded, we observed that the Japanese characters clouded semantic-cues comprehension by provoking other, unrelated cognitive processes (e.g., visual search). Thus, this scrambling degree was considerably too high for our study.

パブリック  静的  ボイド  メイン (文字列 [] 引数) {
	型 数字 = 323;
	型 checksum = 0;
	同時に (数字 != 0) {
		checksum = checksum + 数字 % 10;
		数字 = 数字/10;
	}
	システム.でる.印刷行(checksum);
}
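
For reference, a back-translation of this scrambled snippet into plain Java reads as follows (presumably the CrossSum snippet, computing the digit sum of a number; only the identifier checksum appears verbatim above, the remaining names are our translations):

public static void main(String[] args) {
	int number = 323;
	int checksum = 0;
	// sum up the decimal digits of number
	while (number != 0) {
		checksum = checksum + number % 10;
		number = number / 10;
	}
	System.out.println(checksum);
}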

Scrambling with Constant Caesar Shifting

Consequently, in a second, incremental attempt, we reduced the scrambling degree. To this end, we kept the source code in the Latin alphabet, but scrambled it with a Caesar shifting¹. An example is shown below. When testing this scrambling degree, participants did not report any clouding cognitive processes such as visual search. Thus, this lower scrambling degree is more suitable, as it still allows semantic-cues comprehension. However, we found that a constant Caesar shifting allowed the participants to quickly adapt to the scrambling, such that they learned what a scrambled word stood for (e.g., that qom means int).
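
The following sketch illustrates how such a constant Caesar shift can be applied to a snippet; the class, the method name, and the fixed shift of 8 are our own illustrative choices, and keeping beacon identifiers unscrambled is not modeled here:

public class CaesarScrambler {

    // Shift every letter by a fixed offset; digits, operators, and whitespace
    // are kept as-is, so the visual structure of the code is preserved.
    static String scramble(String code, int shift) {
        StringBuilder sb = new StringBuilder(code.length());
        for (char c : code.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + (c - 'a' + shift) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + (c - 'A' + shift) % 26));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // With a constant shift, a keyword such as "int" is scrambled to the
        // same word in every snippet, which let participants adapt over time.
        System.out.println(scramble("int sum = 0;", 8));
    }
}

Because the shift is a parameter, the variable Caesar shifting described in the next section only requires drawing a different shift value for each snippet.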

Scrambling with Variable Caesar Shifting

In the third iteration, we achieved an effective scrambling by varying the Caesar shift for each code snippet. For example, int was scrambled differently in each snippet. The pilot with graduate students showed that this variable Caesar shifting was effective as a scrambling method, since the participants needed no longer than 30 seconds to understand the snippets. Furthermore, when we interviewed the participants afterward, they described that they still applied semantic-cues comprehension. However, one participant mentioned that having code that does not compile was rather confusing (e.g., because int was scrambled to qom). Nevertheless, since the response times and correctness were promising, we decided to evaluate and fine-tune this scrambling degree in a large pilot study. Specifically, we evaluated scrambling Java keywords vs. not scrambling them, along with variations of the two source-code aspects (beacons, layout). An example of a scrambled snippet is shown below.

public float ayyaoAwyyaky(int[] array) {
    int mgqakyy = 0;
    int equ = 0;

    while (mgqakyy < array.length) {
        equ = equ + array[mgqakyy];
        mgqakyy = mgqakyy + 1;
    }

    float awyyaky = equ / (float) mgqakyy;
    return awyyaky;
}  
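
For comparison, an unscrambled counterpart of this snippet would look roughly as follows (it computes the average of an int array, i.e., the ArrayAverage algorithm; this is our reconstruction, and the identifier names are guesses rather than the study's original names):

public float arrayAverage(int[] array) {
    int counter = 0;
    int sum = 0;

    while (counter < array.length) {
        sum = sum + array[counter];
        counter = counter + 1;
    }

    float average = sum / (float) counter;
    return average;
}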

Additional Data: Number of Participants, Tasks & Response Correctness

| # of Sessions | Condition | # of Participant-Task-Tuples | Response Correctness |
|---|---|---|---|
| 14 | semantic-cues: beacons, layout-pretty | 84 | 86% |
| 14 | semantic-cues: beacons, layout-disrupted | 84 | 89% |
| 14 | semantic-cues: no beacons, layout-pretty | 84 | 90% |
| 14 | semantic-cues: no beacons, layout-disrupted | 84 | 92% |
| 14 | control: bottom-up | 42 | 86% |
| 14 | control: syntax | 42 | 71% |

Additional Data: Replication of the Siegmund Study

| Study | # of Participants | # of Sessions | # of Session-Bottom-up-Task-Tuples | # of Session-Syntax-Task-Tuples |
|---|---|---|---|---|
| Siegmund 2014 | 17 | 17 | 204 | 204 |
| FSE17 | 11 | 14 | 42 | 42 |

Additional Data: Brain Activation Location

| Activated Brain Area (Brodmann) | Cluster Size (in Voxels) | TAL Coordinates (X, Y, Z) |
|---|---|---|
| BA6 | 665 | -34, 3, 53 |
| BA21 | 1352 | -56, -39, -3 |
| BA44 | 547 | -46, 11, 22 |
| BA39 left | 1249 | -44, -56, 18 |
| BA39 right | 933 | 50, -53, 11 |

Replication

Experimental Setup of Pre-Pilots for Replication

We provide the setup of both pre-pilots in /pre-pilots. Both use the PROPHET tool.

The first pre-pilot (/pre-pilots/pre-pilot-1/) fine-tuned the scrambling degree (i.e., keywords scrambled vs. not scrambled) in combination with beacons.

In the second pre-pilot (/pre-pilots/pre-pilot-2/) we tested the layout variable and the interdependence between beacons and layout.

Used Training & fMRI Source-Code Snippets

Semantic-cues

You can find all twelve code snippets we used in the folder /snippets/semantic-cues/, in all versions: original, with beacons (BO), without beacons (BS), pretty layout (LO), and disrupted layout (LD).

The selected algorithms are:

| Numerical | Strings |
|---|---|
| ArrayAverage | ContainsSubstrings |
| BinaryToDecimal | CountSameCharsAtSamePosition |
| CrossSum | CountVowels |
| FirstAboveThreshold | IntertwineTwoWords |
| Power | Palindrome |
| SquareRoot | ReverseWord |
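
To give an impression of the size and difficulty of these snippets, the following sketch shows what, for example, a BinaryToDecimal snippet could look like; this is our illustration only, the actual study snippets are the ones in /snippets/semantic-cues/:

// Illustration only: converts a binary string such as "1011" to its decimal value.
public int binaryToDecimal(String binary) {
    int result = 0;
    for (int i = 0; i < binary.length(); i++) {
        result = result * 2;          // shift previous digits one position
        if (binary.charAt(i) == '1') {
            result = result + 1;      // add the current binary digit
        }
    }
    return result;
}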

Control: Bottom-up

You can find all six bottom-up code snippets we used in the folder /snippets/bottom-up/.

  • CommonChars
  • CrossSum
  • DoubleArray
  • Factorial
  • MaxInArray
  • SumUpToN

Control: Syntax

You can find all six syntax error code snippets we used in the folder /snippets/syntax/.

  • Average
  • DoubleArray
  • Power
  • ReverseIntArray
  • ReverseWord
  • Swap

Before fMRI Session: Training with PROPHET

Before each fMRI session, the participant had to view, recall, and review six code snippets. This training was done with PROPHET, which guided the participant through the process.

The selection of the six snippets was randomized for each participant. Thus, which snippets each participant saw in their sessions varies. Below, we show the sequence for one example participant:

  • View BinaryToDecimal snippet
  • Recall the snippet
  • Review BinaryToDecimal snippet
  • View CrossSum snippet
  • Recall the snippet
  • Review CrossSum snippet
  • View ContainSubstring snippet
  • Recall the snippet
  • Review ContainSubstring snippet
  • View CountSameCharsAtSamePosition snippet
  • Recall the snippet
  • Review CountSameCharsAtSamePosition snippet
  • View ArrayAverage snippet
  • Recall the snippet
  • Review ArrayAverage snippet
  • View CountVowels snippet
  • Recall the snippet
  • Review CountVowels snippet

You can find an exemplary PROPHET experiment file under /prophet/training_example.xml.

fMRI Session: Snippet Sequence

The session in the fMRI scanner featured the same six snippets the participant was trained on. Thus, the fMRI session also differed for each participant. The snippet display order was randomized as well. The following example is for one participant with the training from above (LP = pretty layout, LD = disrupted layout, BY = with beacons, BN = without beacons):

Trial 1:

  • Semantic-cues: BinaryToDecimal (LP, BY)
  • Semantic-cues: CrossSum (LD, BY)
  • Semantic-cues: ContainSubstring (LP, BN)
  • Semantic-cues: CountSameCharsAtSamePosition (LD, BN)
  • Bottom-up: DoubleArray

Trial 2:

  • Semantic-cues: CountSameCharsAtSamePosition (LP, BY)
  • Semantic-cues: CountVowels (LD, BY)
  • Semantic-cues: ArrayAverage (LP, BN)
  • Semantic-cues: BinaryToDecimal (LD, BN)
  • Syntax: Swap

Trial 3:

  • Semantic-cues: CountVowels (LP, BY)
  • Semantic-cues: ArrayAverage (LD, BY)
  • Semantic-cues: BinaryToDecimal (LP, BN)
  • Semantic-cues: CrossSum (LD, BN)
  • Bottom-up: SumUpToN

Trial 4:

  • Semantic-cues: ArrayAverage (LP, BY)
  • Semantic-cues: BinaryToDecimal (LD, BY)
  • Semantic-cues: CrossSum (LP, BN)
  • Semantic-cues: ContainSubstring (LD, BN)
  • Syntax: Average

Trial 5:

  • Semantic-cues: ContainSubstring (LP, BY)
  • Semantic-cues: CountSameCharsAtSamePosition (LD, BY)
  • Semantic-cues: CountVowels (LP, BN)
  • Semantic-cues: ArrayAverage (LD, BN)
  • Bottom-up: Factorial

Trial 6:

  • Semantic-cues: CrossSum (LP, BY)
  • Semantic-cues: ContainSubstring (LD, BY)
  • Semantic-cues: CountSameCharsAtSamePosition (LP, BN)
  • Semantic-cues: CountVowels (LD, BN)
  • Syntax: StringReverse

Each snippet was shown for 30 seconds. Between snippets, there was a break of 30 seconds.


¹ Caesar shifting is a substitution cipher in which each letter is replaced by a letter some fixed number of positions down the alphabet.