PRUNERS/archer

How can I contribute?

Closed this issue · 15 comments

I know my way around LLVM and have written a few passes. I also once worked on a paper that dealt with deadlock detection. Please let me know how I can get started.

Sounds great and welcome!

I would first read our IPDPS16 paper and SC14 workshop paper to build up an understanding of our technologies. The citations are at the bottom of README.md. Then, I encourage you to ask any specific questions you may have, and also let us know if you have specific problems you want to address.

A suggestion: could you change the title of the issue to "How can I contribute," and then put your question into the body of the issue?

@SumedhArani: Thanks. Look forward to hearing from you!

I read the papers you published, as cited in your README file. I also took a cursory look at the code and was able to get a sense of what it does.

I would like to contribute to the following point mentioned in your IEEE paper:

we plan to crack open each of these potentially racy regions and apply fine-grained static techniques in order to identify and exclude race-free sub-regions within it.

What do you have in mind for the fine-grained static techniques you intend to apply to identify race-free sub-regions and exclude them?

Before answering this question, let me ask you some questions.

Are you only interested in furthering research, or also in contributing usability improvements, features, etc.?

The reason I am asking is that it would probably be difficult for you to get into the guts of the core techniques and make a direct impact without engaging with us through regularly scheduled conference calls, etc. But as a first step, I have some ideas about adding user interfaces to help extract "unique races" out of many race reports... This has been one of the common pieces of feedback from our users: they could make good use of a scalable display of the race reports generated by the many regression tests of an application. In addition, such a smaller project may give you a better feel for this tool and where our research team is headed.

That'll be a nice start. I'll definitely get started on this smaller project for now.
I'm using macOS Sierra and am having some issues getting the software installed.
Once that is sorted out, I'll work on usability improvements to get the hang of the project.
I have been going through the code base and I feel I'm making good progress.

I'm an undergraduate student and I am definitely interested in furthering research. I once worked on generating traces for improved deadlock detection, based on a research paper.

So, to answer your questions, it's a yes to both. It would be great if we could engage regularly so that you can guide me toward the core techniques and, in the near future, I can make a direct impact there.

Let me know what ideas you have for extracting unique races.

@dongahn
If I understand you correctly, by extracting "unique data races" you mean this: in the race report generated for the sample example in README.md, the number of races reported grows in proportion to the number of threads, yet all of them point to the same error, where multiple threads try to write to the array. Is that right?

If so, based on my preliminary observations, I think we can make use of the line numbers and the operation. If they match, we can check further.

When you say parse the race report, do you suggest redirecting it to an output file and then having a program that parses and analyzes the report?

On the user interface tools, I think the first step would always be able to parse a race report and load each incident into memory...
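A minimal sketch of that first step in Python, assuming the report has been redirected to a text file and follows the usual ThreadSanitizer layout: a `WARNING: ThreadSanitizer: data race` header, access lines such as `Write of size 4 at 0x... by thread T2:`, and `#N function file:line` stack frames separated by blank lines. The names `Access`, `RaceReport`, and `parse_reports` are hypothetical, and the regular expressions are assumptions that would need checking against the actual output.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed line shapes; adjust these patterns to the real tsan output.
HEADER_RE = re.compile(r"WARNING: ThreadSanitizer: data race")
ACCESS_RE = re.compile(
    r"^\s*(?:Previous\s+)?(?P<op>(?:atomic )?(?:read|write)) of size \d+"
    r" at 0x[0-9a-fA-F]+ by (?P<thread>.+?):\s*$",
    re.IGNORECASE,
)
# Simplification: assumes the function name contains no spaces.
FRAME_RE = re.compile(r"^\s*#\d+\s+\S+\s+(?P<loc>\S+)")


@dataclass
class Access:
    op: str                                           # e.g. "read" or "write"
    thread: str                                       # e.g. "thread T2"
    frames: List[str] = field(default_factory=list)   # "file:line", innermost first


@dataclass
class RaceReport:
    accesses: List[Access] = field(default_factory=list)


def parse_reports(text: str) -> List[RaceReport]:
    """Load every data race incident found in `text` into memory."""
    reports: List[RaceReport] = []
    current: Optional[RaceReport] = None   # report being filled
    access: Optional[Access] = None        # access whose stack is being read
    for line in text.splitlines():
        if HEADER_RE.search(line):
            current = RaceReport()
            reports.append(current)
            access = None
            continue
        if current is None:
            continue                       # text before the first report
        m = ACCESS_RE.match(line)
        if m:
            access = Access(op=m.group("op").lower(), thread=m.group("thread"))
            current.accesses.append(access)
            continue
        m = FRAME_RE.match(line)
        if m and access is not None:
            access.frames.append(m.group("loc"))
            continue
        if not line.strip():
            access = None                  # a blank line ends the current stack
    return reports
```

With the incidents in memory as plain objects, later steps such as grouping or filtering no longer depend on the textual layout of the report.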

Have I got the problem statement right?

@dongahn
Do you have a specific language in mind for implementing the parsing of the file containing the reports?

Thanks.
Sumedh A.

I would say Python or some functional language (if you are familiar with any: Racket, Haskell, Scala). I am sure we are all familiar with Python; a functional language would force me to learn it :)

Let's see what @dongahn thinks.

@simoatze I had thought about using Python too!

@dongahn your input is definitely interesting and worth pursuing. I've started exploring and trying to understand how tsan reports races.

I will get back to you with a more definitive action plan very soon.

Thanks,
Sumedh.

I've been going through the source code of tsan's reporting routine. What @dongahn suggests can certainly be done, and it has the added benefit of reducing the data races in situ instead of parsing the reports after they are output.

Based on my current knowledge of the tsan reporting routine and the data races I have seen so far, my plan would be:

  1. To store the immediate return stack, i.e. where the race is taking place, in a data structure with two attributes (read location and write location)
  2. To compare each further data race being reported with the ones stored earlier
  3. If it is a repeat, don't print it
    I was thinking of a dictionary whose key is the read and write location and whose value is a vector of all the repeated instances, and then eventually printing the key, or something along these lines.
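The plan above could be sketched roughly as follows. This is Python for illustration only, since the real change would live inside tsan's reporting routine; `Race` and `RaceDeduplicator` are hypothetical names, and locations are assumed to be `file:line` strings.

```python
from typing import Dict, List, NamedTuple, Tuple


class Race(NamedTuple):
    read_loc: str                 # innermost frame of the read, e.g. "a.c:12"
    write_loc: str                # innermost frame of the write
    thread_ids: Tuple[int, int]   # the two threads involved (assumed field)


class RaceDeduplicator:
    """Keeps every instance, but flags only the first one per location pair."""

    def __init__(self) -> None:
        # key: (read location, write location); value: all instances seen
        self._seen: Dict[Tuple[str, str], List[Race]] = {}

    def report(self, race: Race) -> bool:
        """Record the race; return True only if it should be printed."""
        key = (race.read_loc, race.write_loc)
        first_time = key not in self._seen
        self._seen.setdefault(key, []).append(race)
        return first_time

    def summary(self) -> Dict[Tuple[str, str], int]:
        """Number of instances recorded for each unique location pair."""
        return {key: len(instances) for key, instances in self._seen.items()}
```

A write/write race has no read location, so a fuller version would need a more general pair of access locations as the key.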

I do have a road map (plan) for how to do the above task.
Is it possible to have a more detailed discussion, say via email or Slack, @dongahn @simoatze?

Thanks,
Sumedh Arani.

Sounds like progress!

This sounds right. I think a key is to make this a callback architecture. One can have a default function, as you described, which does a reduction over the code location. But some ppl may want to have a bit finer granularity, differentiating the code locations that came from different call paths... A callback design should allow multilevel reduction, I think.
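One way to picture such a callback design, as a rough Python sketch under assumptions: a race is represented as a pair of call stacks, each a tuple of `file:line` frames with the innermost frame first, and the names `by_location`, `by_call_path`, and `reduce_races` are made up for illustration. Each callback maps a race to a grouping key, and passing several callbacks yields one nesting level per callback.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple, Union

Stack = Tuple[str, ...]            # "file:line" frames, innermost first (assumed)
Race = Tuple[Stack, Stack]         # the two conflicting accesses
KeyFn = Callable[[Race], Hashable]
Grouped = Union[List[Race], Dict[Hashable, "Grouped"]]


def by_location(race: Race) -> Hashable:
    """Default callback: reduce over the code location of each access."""
    return (race[0][0], race[1][0])


def by_call_path(race: Race) -> Hashable:
    """Finer callback: also distinguish the call paths leading there."""
    return race


def reduce_races(races: Sequence[Race], key_fns: Sequence[KeyFn]) -> Grouped:
    """Group races level by level; each callback adds one nesting level."""
    if not key_fns:
        return list(races)
    groups: Dict[Hashable, List[Race]] = {}
    for race in races:
        groups.setdefault(key_fns[0](race), []).append(race)
    return {key: reduce_races(members, key_fns[1:])
            for key, members in groups.items()}
```

Calling `reduce_races(races, [by_location])` gives the coarse default, while `reduce_races(races, [by_location, by_call_path])` first groups by code location and then, within each group, by call path.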

@dongahn Could you please elaborate? I didn't clearly understand this:

But some ppl may want to have a bit finer granularity, differentiating the code locations that came from different call paths

Also, what sort of multilevel reduction do you have in mind here?

A callback design should allow multilevel reduction

The callback design you suggest is definitely advantageous in the situation you describe, but I could not work out which issues you actually had in mind.
I am asking these questions so that I can understand the various scenarios and handle them cleanly before designing a solution.

Thank you,
Sumedh Arani.

OK. Move the unique race extraction to #13 for better tracking.