Register [here](http://www.eventbrite.co.uk/e/contentmine-chemistry-hack-tickets-18534620549) (registration is FREE, places limited to 25)
==============
Please bring laptops, and [pre-load software](https://github.com/ContentMine/vms/blob/master/installation_intructions.md).
18 September 2015 | 19 September 2015 |
---|---|
Training Workshop & Publisher Panel Session | Hackday |
9:00 - 18:00 | 10:00 - 17:00 |
[@chemcambridge](https://twitter.com/chemcambridge)
Contact us via [@TheContentMine](https://twitter.com/TheContentMine) or contact@contentmine.org
- Peter Murray-Rust @petermurrayrust
- Judith Rommel [@jbr_science](https://twitter.com/jbr_science)
- Jenny Molloy @jenny_molloy
- The ContentMine Team [@TheContentMine](https://twitter.com/TheContentMine)
Please read the [Pre-workshop Installation Instructions](https://github.com/ContentMine/vms/blob/master/installation_intructions.md).
We would also appreciate your feedback.
Ever found that the key data you want is published in a text-based PDF journal?
- ...found yourself manually downloading 100 papers click-by-click?
- ...redrawing structures/spectra/graphs so you can recompute/analyze them?
- ...retyping data from tables?
- ...wished that a computer could do the really boring discovery and retrieval of the data in the literature?
We all have, but new approaches are solving these problems. That's why content mining (also known as text and data mining, TDM) is one of the most exciting areas in scientific data. It has even been intensively debated in the European Parliament and Commission. The UK is leading the way with new exemptions from copyright, so universities like Cambridge are ideal places to learn and develop the new techniques.
The workshop will bring together:
- scientists with a need to discover data, especially in chemistry, materials and molecular bioscience, both experimental and computational
- scientific publishers
- library staff
- technology developers.
We'll show how Open software can be used to:
- crawl the literature effectively using search APIs
- scrape all the content from publisher web pages (supplemental data, structures)
- normalize PDFs into semantic HTML
- run search plugins to discover particular.
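The last two steps in the list above (working from HTML, then searching it) can be illustrated with a minimal sketch. This is not the ContentMine software and does not use its API: it relies only on the Python standard library, and the sample HTML, the term list and the function names `html_to_text` and `find_terms` are all hypothetical, chosen for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML fragment as one whitespace-normalized string.

    Note: text nodes are joined with spaces, so a word split by inline
    tags (e.g. H<sub>2</sub>O) will come out with spaces inside it.
    """
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def find_terms(text, terms):
    """Count case-insensitive whole-word matches of each term in text."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        hits = pattern.findall(text)
        if hits:
            counts[term] = len(hits)
    return counts


# Hypothetical input: a fragment standing in for a paper already in HTML.
SAMPLE_HTML = """
<div><h2>Methods</h2>
<p>The sample was dissolved in <i>ethanol</i> and washed with acetone.</p>
<p>Ethanol was removed under reduced pressure.</p></div>
"""

# Hypothetical dictionary of terms to look for.
SOLVENTS = ["ethanol", "acetone", "toluene"]

if __name__ == "__main__":
    text = html_to_text(SAMPLE_HTML)
    print(find_terms(text, SOLVENTS))
```

A dictionary lookup like this is the simplest possible search; real plugins would typically need to handle synonyms, formulae and structure, which this sketch does not attempt.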
The first day will include overviews, installation of technology [1], a panel of experts drawn from the participants discussing policy and practice, and a hands-on introduction. The second day will be a project-based hack where small groups will tackle their own communal problems. The event is sponsored by the EPSRC-IAA Knowledge Transfer Fund of the Chemistry Department. Facilitators are from Chemistry and Plant Sciences. Coffee, lunches and a Friday dinner are provided.
[1] All essential technology is Open and comes from contentmine.org, an Open project funded by the Shuttleworth Foundation.

18 September 2015: Training Workshop & Publisher Panel Session

Times | Session |
---|---|
9:00 | Introductions |
9:15 | What is content mining? |
9:30 | Think like a content miner |
 | Scraping and the anatomy of scrapers |
11:00 | Preparations for panel discussion with publishers |
12:30 | Lunch |
13:30 | Publishers Q&A |
15:30 | Tea time |
16:00 | Entity recognition using AMI |
18:00 onwards | Informal social event (dinner). Reservation to be confirmed at Browns from 18:00 onwards. |

19 September 2015: Hackday

Times | Session |
---|---|
10:00 | **Hacking in teams working on AMICHEM, Chemical tagger,...** |
12:30 | Lunch |
13:30 | **Hacking in teams working on AMICHEM, Chemical tagger,...** |
15:30 | Coffee Break |
16:00 | Presentation of hackday projects |
16:30 | Panel discussion on accelerating uptake of content mining. |
17:00 | Event close |

This two-day event is intended for researchers or research-related staff who are not currently heavily involved in text and data mining but have at least some pre-existing computational skills. At a minimum, we expect familiarity with a command line interface and basic coding abilities in some language.