Cloud-based sandbox for text analytics

Primary language: HTML. License: MIT.

nlp-sandbox

Developing a continuous benchmarking environment for NLP de-identification (de-id) methods.

Project vision and values

The widespread adoption of Electronic Health Records (EHRs) has enabled secondary use of EHR data for clinical research and healthcare delivery. Because much of the detailed patient information is recorded in clinical narratives, extracting that information and integrating it with structured EHR data is critical for EHR-based studies. However, protected health information (PHI) in clinical narratives is a barrier to conducting EHR-based clinical research and to sharing research data across sites.

Vision

  • Create a cloud-based environment that enables the systematic validation of text analytics tools to solve specific tasks (i.e. the “NLP Sandbox”).
  • Populate the “NLP Sandbox” with appropriate reference data sets to be used in shared validation tasks.
  • Engage CTSA hubs to contribute tools and methods to the project and to demonstrate their performance, reproducibility, and rigor in such a shared environment.

Related Cores

  • Tools and Cloud Infrastructure
  • Next Generation Data Sharing
  • Informatics Maturity and Best Practices

Contact person

Point person (github handle) | Site | Program Director
Justin Guinney (@jguinney) | Sage Bionetworks | Melissa Haendel (@mellybelly)

Leads

Project scientific leadership (should be 1-3 persons).

Leads (github handle) | Site
Thomas Schaffter (@tschaffter) | Sage Bionetworks
James Eddy (@jaeddy) | Sage Bionetworks

Team members

Members (github handle) | Site
Thomas Schaffter (@tschaffter) | Sage Bionetworks
Yao Yan (@yy6linda) | Sage Bionetworks
Yooree Chae (@ychae) | Sage Bionetworks
James Eddy (@jaeddy) | Sage Bionetworks
Justin Guinney (@jguinney) | Sage Bionetworks
George Kowalski (@gkowalski) | MCW
Bradley Taylor (@btaylormcw) | MCW
Tom Dillon (@tmdillon) | WashU

Resources

Resource | Link | Site
GitHub team | nlp-team | CD2H
GitHub project | data2health/projects/7 | CD2H
Google folder | NLP Sandbox | CD2H
Slack channel | CD2H workspace / nlp-sandbox | CD2H

Access to resources is limited to onboarded participants (CD2H Onboarding Form).

Get involved

We encourage the community to get involved. Please open tickets or leave comments.

References

  1. NLP Sandbox - CD2H Phase III Project Proposal
  2. https://github.com/data2health/nlp-review