Indexing of bed files for region search

Question

Parul-Kudtarkar opened this issue 7 years ago · 4 comments

Hi,

Peak indexing of BED files slows down significantly as more experimental datasets are added. This might not be evident on servers that have been running for a long time, since I believe re-indexing is done only for newer datasets. However, every time a new server is launched, a complete indexing run takes place, which is slow because of the larger dataset (peak files). Is there a workaround for this issue?

Thank you!
Parul Kudtarkar

Answer 1 · 2017-11-01T23:33:01.000Z

You can run a separate EC2 instance with Elasticsearch installed on it and edit the security group rules to allow your instances to talk to it over the 9200-9300 port range. The peak indexer then uses the remote machine as specified here: https://github.com/ENCODE-DCC/encoded/blob/master/buildout.cfg#L89. New instances can then connect to the machine that already has the indexed data.
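
Before pointing the indexer at the remote machine, it may help to confirm that the security group rules actually let the new instance reach it. The snippet below is only a minimal sketch using the Python standard library, not part of encoded; the function names are made up for illustration, and it assumes Elasticsearch listens on its default ports (9200 for HTTP, 9300 for transport).

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_elasticsearch_ports(host, ports=(9200, 9300), timeout=3.0):
    """Map each port to whether it is reachable from this machine.

    Hypothetical helper: 9200/9300 are the Elasticsearch defaults and are
    assumed here; adjust if the remote machine is configured differently.
    """
    return {port: is_port_open(host, port, timeout) for port in ports}
```

Run from the new instance, `check_elasticsearch_ports("<private IP of the Elasticsearch machine>")` should report both ports as reachable once the security group allows the traffic. A `False` entry usually points at the security group or at Elasticsearch not being bound to a non-local address.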

Answer 2 · 2017-11-01T23:38:24.000Z

Thank you so much, @Bek, for the quick response.

Answer 3 · 2017-11-02T00:58:00.000Z

Great! This works

Answer 4 · 2017-11-07T19:44:37.000Z

@Bek, a quick question: the peak_indexer.py and region_search.py scripts would be local to the EC2 instance running those scripts, and not to the machine with the indexed data, right?

Thank you!
Parul Kudtarkar