greenelab/deep-review

Current Section Status

cgreene opened this issue

@agitter is no longer updating the outline; we are no longer accepting new sections.

Of course, you should feel welcome to contribute to sections that already exist. We're also looking for people to take primary responsibility for sections. I'll copy an e-mail from @agitter below that has our most up-to-date status, as far as I know:

As described in the intro https://github.com/greenelab/deep-review/blob/master/sections/02_intro.md we’ve broken the paper into Categorize, Study, and Treat sections. Each of these has been outlined, though we welcome suggestions for new sub-sections. There is also a Discussion of general issues pertinent to all three application areas and future outlook. Here are the primary topics in each section that are unclaimed as far as I know.

Categorize:

  • Finished a first draft of everything

Study:

  • Finished a first draft of everything

Treat:

  • Finished a first draft of everything

Discussion:

  • Finished a first draft of everything (pending pull requests in progress)

#88 and #2 provide some context on our goals for the review and how we hope to differentiate it from existing papers. We don’t want to enumerate all deep learning papers in biomedicine, so some of the Study sub-sections may be cut entirely if there is nothing especially interesting to say about them. To start working on a sub-section, you can create a pull request. #147 is an example of a completed pull request, and #174 is one I’m actively working on where I’m still outlining and searching for relevant literature.

Please let us know if you want to discuss anything else specific or else we can take the discussion to GitHub so that others can contribute.

Edit by @agitter
@cgreene had good suggestions in #200 that are helpful prompts for anyone starting a topic sub-section

  • introductory paragraph to the problem and/or data type
  • places where deep NNs have been successful [with some interpretation as to why]
  • places where deep NNs have not been successful [again maybe with some why]
  • places where deep NNs have not been applied, but you think they should be

Yup. Definitely going to help with evaluation and interpretation sections. I'll work on these in the next 2-3 weeks.

Thanks @cgreene. My list above accounts for sections that are currently being written (#174, #183, #191) and @gailrosen volunteering to write about metagenomics.

@AvantiShri I'm roping you into the Interpretation section for this review. Let's plan and start writing it.

Chatted with @qiyanjun and @jacklanchantin at PSB and they are going to take a stab at "Transcription factors and RNA-binding proteins"

Edit: Conference was actually PSB! :)

I can lead the remaining efforts in the Categorize and Treat sections. I am familiar with these topics and have just made a first attempt at Categorize (sent a PR yesterday).

In addition, I can help @brettbj with his new Data sharing and privacy section later if he needs my help, since this is my primary research area.

I previously helped on the Genomics section (led by @agitter), but have not completed it yet. My main obstacle is how to differentiate our review from existing reviews with extensive coverage of transcription factors, etc. Reading all these alternative reviews took far longer than I expected. External help, as recommended by @cgreene, would be very helpful.

@XieConnect: agree that the sheer number of papers in many of these domains (and the rapid rate of new articles appearing) has become a killer. I suggest perhaps an even further divide-and-conquer approach. @qiyanjun and @jacklanchantin are interested in the TF question at least. Maybe you could strategize on how to most effectively divide the literature.

@XieConnect I also advocate spending time on papers that are especially interesting or relevant to our guiding question. I don't think we should feel compelled to cover every single paper in an area since our goal is to address a specific theme and not enumerate all relevant work.

@cgreene @agitter Great advice. I'll stand by and aid @qiyanjun and @jacklanchantin later if help is needed. For now, I will wrap up the aforementioned healthcare-related sections first.

@gailrosen I'm not sure what our current deadline is (January 15 is not realistic), but thanks a lot for starting this section. When it's ready for review and comments please proceed with the pull request.

@blengerich will work on 'categorizing patients for clinical decision making' in the Treat section.

I added some prompts from @cgreene to the original post

@blengerich Are you still interested in drafting part of the Treat section?

I'm new to this! Happy to work on splicing and single-cell sections if no one's working on them right now.

@bdo311 both those sections are free, and I added you to the list in the first post. Please check out some of the suggested prompts there if you haven't already as you think about organizing the sections.

We're making a serious effort to have a first draft of most of the sections within the next week or so. I updated the outline in the first post to show what has been drafted and what remains untouched.

I'll argue that anything that hasn't been drafted in ~6 months isn't exciting enough to be considered "transformative". These topics can be alluded to in passing and covered very briefly with a couple of sentences. If there are any unclaimed topics you would like to "save", please let us know that you have started working on a draft.

@blengerich, you were interested in drafting something for 'categorizing patients for clinical decision making'. Will you have time in the next week to work on that?

@cgreene I mostly focused on the status of the Study section. Please make updates for Categorize if you have any. I think you and I can write most of the Discussion if no one else jumps in to do it.

Hi @agitter, sorry for the delay. I've had a bit of trouble finding successful papers to include in the 'categorizing patients for clinical decision making' section. If you would like, in a few days, I can push a draft with a slightly more pessimistic tone that focuses on the challenges underlying this application. However, if you have other ideas, or anyone else would like to take over the section, I am happy to step aside.

@blengerich We are completely willing to take a pessimistic tone on some of these sections. If the state of the area is that things aren't working yet or haven't made a big difference over previous baselines, then this is the message we should deliver. We can still project an optimistic view of future opportunities, if that's warranted.

We'd be grateful to have anything you can contribute. This is an important topic, and if you aren't able to contribute I'm not sure that we have anyone else who can step in before the deadline. Thanks!

Thanks for the feedback. I have a draft in progress and will push it in a couple of days.

I think I can do the remainder of the "study" subsections -- a lot of the content will potentially overlap things that have already been written so I'll either keep it short or try to reorganize.

@bdo311 Thanks for the additional help. Do you think there is a strong message to deliver in these remaining areas beyond what has been covered in other recent reviews? There have been a lot of deep learning papers in miRNA binding prediction and epigenetics, and we shouldn't try to present all of them. I think it would be most valuable to focus on whether neural networks are being applied to the right problems (e.g. we had earlier discussions on predicting enhancer locations versus enhancer targets), offer such improved performance that new types of biological conclusions can be drawn, have architectures that are particularly well-suited for the data types (beyond 1D convolutions on sequence), etc.

For variant detection, I suggested #159 and #171 because #159 considers an unusual type of transfer learning that uniquely takes advantage of networks pre-trained for very different problems. That's something that could not be replicated with a different type of classifier. #171 (and its predecessor #99) provides a counterpoint to #159.

@agitter I will think about those over the coming weekend. My initial feeling is still that most improvements in accuracy are incremental and the real benefit lies in interpretation and integration of datasets -- which we've talked about in some of the sections we've written for the 04_study.md. Variant detection should be a different story though and I'll read that carefully.

I updated the outline again today.

@blengerich thanks for writing one of the remaining Treat sections. My Treat section contribution on ligand-based chemical screening should be coming this weekend.

@jacklanchantin we have several open TODOs on the first draft of the TF binding section. Do you think you'll have time to work on those in the next week or two per the updated timeline in #310? Specifically, I would like to see us be more critical about what constitutes state-of-the-art results and how impactful deep learning has been in this area. Some evaluation strategies make it seem as if the TF binding prediction problem is solved, and others show much more pessimistic performance. There are also a few specific papers we wanted to cover, and maybe even others that aren't in the TODOs.

@jisraeli offered to help with a few remaining sections, especially evaluation. You can see the basic outline we have in 06_discussion.md. Some of the problems with ROC have also come up in individual domains, such as my draft of #313. It would be great to pull in any lessons learned from the DREAM challenge if there is anything we can reference, even a stable URL.

Note that 04_study.md also has a first draft on TF binding. Because you've worked on that topic, it would be great to have your revisions there. My comments directly above summarize some of the open TODOs and there are many related papers listed as GitHub issues in this repo. I'm also wondering if the GitHub URL is the best DragoNN reference or if we should use something else.

There are some contribution suggestions here. You don't have to use the reference tags.

@agitter, I should be able to finish those TODOs by the 24th. I am leaving in a few days for ICLR in France and have some things to do before I leave, but I should be able to work on it this Thursday.

@jacklanchantin thanks. It may be a good idea to coordinate with @jisraeli, who may also make a round of edits.

Re evaluation section: I can't comment much on the DREAM challenge until the results are published. But there are enough papers out now that close examination of their supplementary sections will reveal where the TFBS deep learning field stands, so that should be sufficient for this discussion.

Re referencing DragoNN: we are aiming to put up the manuscript on bioRxiv this month, so we will have something to cite.

@jisraeli I agree we'll be okay without DREAM if those results aren't available. There are indeed plenty of other sources to draw upon, e.g., I believe you've commented on the supplement of #258.

We are hoping to get a preprint of the DREAM paper out by mid June. If the review is not accepted by then, it would be great to cite the discussion of performance there.

@akundaje That timing should work well. My best guess for our timeline is an initial submission in a few weeks, which means we should still be in review or making revisions when the DREAM preprint comes out. We can add it during our revisions.

Btw, here are some slides from my CSHL Sysbio talk on the DREAM Challenge https://drive.google.com/file/d/0B_ssVVyXv8ZSYWIyWnppRk5ZMDQ/view?usp=sharing

In particular, see slides 20-24 for the dramatic differences between performance measures, as expected.

@akundaje I looked at your slides, and if I'm reading it correctly, number 30 is quite profound. For the purposes of this review, it would be hard for us to claim deep learning for TF binding has revolutionized predictive performance if a much simpler model can beat it in the DREAM setting. We'll definitely want to incorporate the DREAM preprint when it is out.

@qiyanjun Thanks for offering to help. DeepChrome is currently referenced in the Gene Expression subsection but not discussed in great detail.

@akundaje This is all very interesting. Even without the DREAM preprint, it would be great to include some of this conversation into the first draft of the review using support from existing literature as much as possible. Then we could add the DREAM reference and make small changes without changing the entire section during revision. I expect many readers will have some familiarity with DeepBind and DeepSEA but not have a deep understanding of the complexity of the domain and the limitations of standard CNNs.

@jisraeli or @jacklanchantin do you have any interest in this type of edit?

I started writing something on Discussion: transfer learning (#129, #330, #331, #332) and am also considering adding a paragraph on multi-modal/integrative DL in the same section (#14, #110, #112, #238).

Thanks @alxndrkalinin, noted in the first post. I think we can be brief about image-to-image transfer learning and don't need to cite too many primary papers. #47 did a good job with it already.

Hello, as discussed in #317 I will try to add a 'Drug repositioning' subsection with a brief overview of deep learning applications in the Treat section. I'll be covering #38, #113, #317, #333 and a few other papers for general context/background.

I'd also like to slightly modify the relevant paragraph in the Introduction to better match this subsection.

Can someone please clarify which papers you were considering for 'Effects of drugs on transcriptomic responses'? I would expect this to substantially overlap with drug repositioning, so I'd be happy to review these papers, see if I've missed something and, if possible, merge everything into a single subsection.

EDIT: Was it maybe #24? If so, I can see it's now been included in Study (Gene expression). I would then suggest removing 'Effects of drugs on transcriptomic responses' from Treat, unless there was anything else?

@enricoferrero Thanks. I support editing the intro, but for practical purposes I suggest not editing that text directly until we figure out what we're doing with #246. Perhaps add a TODO in Treat that we need to update the intro.

The transcriptomic sub-section was indeed focused on #24 so I agree we can remove it.

I've unfortunately been really busy the past 2 weeks, but just submitted an initial draft of the variant calling section. I think I can do a brief review of microRNA binding and whether that is a good focus for deep learning research -- otherwise, likely not going to have time for anything else.

@bdo311 excellent, you've been a huge help. I'll review #344, and I agree that a short overview of miRNA binding is appropriate.

See #347 for a draft for Transfer learning section. Review is requested.

Next I'm going to add a couple of things on imaging to Categorize:Imaging applications and Study:Morphological phenotypes as discussed with @cgreene before.

I suggest keeping important empty sections such as evaluation, data limitations, and code, data, and model sharing. Even if the assigned authors don't have time, I believe we can write at least something and not leave them out completely.

Thanks @alxndrkalinin. I agree those Discussion sections are too important to drop. Those may be the only exception to the April 24 drop deadline, though one of us will need to draft something soon.

I agree with @agitter : I think the dropping at this stage would be in treat/categorize/study. I agree miRNA binding would be great @bdo311 if you have time to contribute it.

@agitter sorry it took me a while to get back to you. I added the TODO revisions for the TFBS section. What's the best way to go about submitting the request since the old one already got merged?

@jacklanchantin - I think the best workflow is to pull / update your fork's master to the current master. Make a branch off of it. Make the changes to the branch. Then file a pull request back to greenelab's master.

@cgreene I think (?) I did it.

Looks like your branch is up to date with master:
https://github.com/jacklanchantin/deep-review

But I think your pull request is going back into your own master:
https://github.com/jacklanchantin/deep-review/pull/1/files

Can you file the pull request against the greenelab master? That'll make it show up here. Thanks!

I'm really sorry I'm so late on the response. I didn't realize I submitted it to my own fork by accident. Hopefully it's not too late.

Unless there is any particular reason not to, I'd like to add #342 to the EHR section in Categorize. I'll aim to open a small pull request over the next few days.

@enricoferrero : please do - thanks!

I'm working on code/data/model sharing and the conclusion. I'd like to get a lot of feedback once I have a rough conclusion ready.

I finished fleshing out the interpretability section. I am sorry about the delay; I had a flare-up of RSI and had to code using speech recognition software. Travis says the build is failing but isn't giving details in the log. Any suggestions? My pull request is here: #368. The markdown at least seems to be rendering correctly. My style violations are all to do with trailing whitespace, which is mostly a consequence of the speech recognition software.

Thank you very much for the contribution, and sorry to hear about the RSI. I'll head to #368 to check out the build errors.

With the exception of the Evaluation section, I'm not anticipating any more external contributions. Please let us know if you are working on something. We plan to finish an end-to-end draft very soon so we can start having co-authors review, revise, and approve the submission.

We're close to having a complete draft. Here are the remaining major TODOs I'm aware of:

  • TF binding pull request #356. @jacklanchantin can you make these changes today or tomorrow?
  • Two Discussion pull requests #367 and #368 that are in progress
  • Evaluation section from @jisraeli, which will be coming soon
  • Need to write an abstract
  • Need to introduce the Study and Treat sections
  • Need to merge the two versions of the Introduction

@cgreene, do you have time to work on any of these? Anyone else want to grab one? If not, I plan to address them myself as time permits.

Once we close these, I'll ask all authors to review the full draft so we can move toward submission.

@jacklanchantin Do you still have the local commits on your machine? You should be able to push them if they still exist.

I'll draft the abstract.

Thanks @cgreene! @agapow found some additional typos in these comments that I didn't patch, and there were lingering suggestions from #246 and #363

Abstract submitted.