ds4se/chapters

./venkatesh-prasad-ranganath/embraceDynamicArtifacts.md


timm commented

After review, relabel to 'reviewTwo'. After second review, relabel to 'EditorsComment'.

barik commented

Title of chapter

Embrace dynamic artifacts

URL to the chapter

https://github.com/ds4se/chapters/blob/master/venkatesh-prasad-ranganath/embraceDynamicArtifacts.md

Message?

The message is that we should use dynamic (run-time) artifacts, such as execution logs and call stacks in our software engineering data science activities.

Accessible?

I found the chapter to be quite accessible, with minimal use of jargon. The example of USB drivers is something that most readers will have an understanding of, and domain-specific terms like ASIC are given a corresponding layperson description.

I was a bit thrown off in the first paragraph. My intuition was that "dynamic" meant running the program, and "static" meant the more traditional static analysis type of activities, such as on source code. This intuition seemed to match the rest of the article, which was about USB devices. But with the examples of "version control", "bug reports", and so on -- this didn't seem to cleanly fit this intuition for me. What does it mean for a mailing list to be a static artifact versus a dynamic artifact? This might be as simple as explicitly providing a definition of the two terms so that you and the reader are on the same page, so to speak.

Another inconsistency in terminology is that the article starts off with static and dynamic, but we don't see those terms again for a while -- instead we see the term "interactions". Explicitly connecting the interaction logs back to dynamic artifacts would make this link obvious for the reader, I think.

Other than that, I enjoyed the rest of the article. The USB example was a crisp story that carried through the article. It gave a concrete example of how leveraging dynamic artifacts helped them in a particular situation.

The article also has a good hour-glass model: start general, specific example, and then generalize again (with DebugAdvisor and StackMine).

Size?

The length is good. I didn't find the argument of "Lastly, with the rise of cloud computing, we have the necessary power to plow through heaps of dynamic data" to be compelling. That's true of "static artifacts" as well, and isn't unique to dynamic artifacts. What I think is more compelling is that the cloud gives rise to "telemetry", which enables us to collect data at scale from devices from actual customers. In that sense, we definitely should be embracing that sort of dynamic artifact (and from my own understanding from Microsoft, App Insights and such are enabling that sort of intelligence).

Gotta Mantra?

I liked the mantra. The term "embrace" is colorful.

Best Points

The article has a concrete story that provides a measurable, impactful outcome. I wouldn't change that. I also liked the section headers; they did a good job of sign-posting and facilitating the flow of the story.

Adding Pete Rotella's review.

Title of chapter

Embrace Dynamic Artifacts

URL to the chapter

https://github.com/ds4se/chapters/blob/master/venkatesh-prasad-ranganath/embraceDynamicArtifacts.md

Message?

What is the chapter's clear and approachable take away message?

Static artifacts, such as bug reports, version history, etc., are very commonly used for data science work in software engineering. Dynamic artifacts, such as execution logs, crash dumps, traffic logs, etc., are also very useful and may help, for example, save time/effort in testing large systems.

Accessible?

Is the chapter written for a generalist audience (no excessive use of technical terminology) with a minimum of diagrams and references?
How can it be made more accessible to generalists?

The chapter is written for a general audience. There is little or no jargon used.

Size?

Is the chapter the right length?
Should anything missing be added?
Can anything superfluous be removed (e.g. by deleting some section that does not work so well or by using less jargon, fewer formulae, fewer diagrams, fewer references)?
What are the aspects of the chapter that authors SHOULD change?

The chapter is about the right length. A little more detail would help in the 'Yes, let's observe interactions' section, explaining more about the way the experiment was carried out. (I do wonder, however, whether the 75%-80% bug coverage using the dynamic approach is adequate. Wouldn't we need in the neighborhood of 90% or better?)

Gotta Mantra?

We encouraged (but did not require) the chapter title to be a mantra or something cute/catchy, i.e., some slogan reflecting best practice for data science for SE. If you have a suggestion for a better title, please put it here.

The title is good as-is.

Best Points

What are the best points of the chapter that the authors should NOT change?

The experiment is a good one, as is the description of the DebugAdvisor and StackMine results, but I do wonder if 75% to 80% coverage is adequate. The USB device driver introduction is good and appropriate/important.

Here's an example from personal experience. => Provide some context. Assuming that this happened while you were working at Microsoft, this piece of information will add credibility.

USB driver is tested => the USB driver is tested

Windows USB testing team => the Windows USB testing team [several occurrences]

I'm not sure if I agree with the premise in the introduction: "When we talk about data science in the context of software engineering, we often only consider static sources/artifacts such as source code, version history, bug reports, mailing lists, developer network, and organization structure. Why don't we consider dynamic artifacts such as execution logs, crash/core dumps, call stack, and traffic logs?"

This may be true for academia (simply because it's hard to get dynamic artifacts), but I would argue that the value of dynamic artifacts is more accepted in industry (and your three examples all support my point).

With the book we are aiming at a broad audience: not just academics, but also software engineers and data scientists in industry. I think your examples are relevant even for people who accept the value of dynamic artifacts. Don't ask them to skip the chapter.

So my challenge to you (in addition to addressing the reviews) is to rework the introduction to make the chapter appeal to a broader audience.

@rvprasad Please prepare a new version of your paper by January 13 taking the reviewers' feedback into account. They offer great advice on how to make this a stronger chapter (e.g., terminology, more details).

Addressing @tzimmermsr comments

  • provided context for the personal experience reference.
  • fixed typos pertaining to missing article "the".
  • fixed the intro to pitch dynamic artifacts as a valuable source of info to enable SE tasks.
  • no idea how to get bib-style references in markdown. Any pointers will be helpful. In the meantime, I will use manually crafted references.

Addressing @barik comments

  • changed the first paragraph to describe static and dynamic artifacts.
  • added the words runtime, requests, and responses to explicitly connect interactions to dynamic artifacts
  • the use of cloud computing to deal with the size of dynamic data is made explicit.
  • mentioned telemetry in the context of ease of dynamic data collection

Addressing Rotella's comments @tzimmermsr

  • added minor details about the type of clustering that was used and how the number of clusters was chosen. For details about mining patterns and using patterns as features, we provide references to original works.
  • as for the 75-80% bug coverage, this is what we observed in our experiments when we merely adapted the technique from USB compatibility testing. Without further experimentation, I can only speculate that the coverage could have been higher had we tuned the technique. So, I am stating what was observed.

Thanks to all the reviewers for their feedback. I have tried to incorporate/address your comments/concerns to the best extent possible. Please let me know if you have further inputs.

Thanks @rvprasad !