chembl/chembl_webresource_client

HttpApplicationError when downloading data from chembl document

itssahil opened this issue · 1 comments

I am trying to download data from the ChEMBL database with the simple code below, but I get the following error when I request records for the journal "Bioorg. Med. Chem."

Code:
from chembl_webresource_client.new_client import new_client
document = new_client.document
docs = document.filter(journal="Bioorg. Med. Chem.").only('document_chembl_id')
compound_record = new_client.compound_record
records = compound_record.filter(document_chembl_id__in=[doc['document_chembl_id'] for doc in docs]).only(['document_chembl_id', 'molecule_chembl_id'])
records

Error:
HttpApplicationError Traceback (most recent call last)
/cluster/app/Python/3.7.2/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()

/cluster/app/Python/3.7.2/lib/python3.7/site-packages/IPython/lib/pretty.py in pretty(self, obj)
400 if cls is not object
401 and callable(cls.__dict__.get('__repr__')):
--> 402 return _repr_pprint(obj, self, cycle)
403
404 return _default_pprint(obj, self, cycle)

/cluster/app/Python/3.7.2/lib/python3.7/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
695 """A pprint that just redirects to the normal repr function."""
696 # Find newlines and replace them with p.break_()
--> 697 output = repr(obj)
698 for idx,output_line in enumerate(output.splitlines()):
699 if idx:

~/.local/lib/python3.7/site-packages/chembl_webresource_client/query_set.py in __repr__(self)
76 return '{0} resource'.format(self.model.name)
77 clone = self._clone()
---> 78 data = list(clone[:Settings.Instance().REPR_OUTPUT_SIZE])
79 length = len(self)
80 if length > Settings.Instance().REPR_OUTPUT_SIZE:

~/.local/lib/python3.7/site-packages/chembl_webresource_client/query_set.py in __next__(self)
125
126 def __next__(self):
--> 127 return self.next()
128
129 #-----------------------------------------------------------------------------------------------------------------------

~/.local/lib/python3.7/site-packages/chembl_webresource_client/query_set.py in next(self)
111 return None
112 if not self.chunk and not self.current_index:
--> 113 self.chunk = self.query.get_page()
114 if not self.chunk or self.current_index >= len(self.chunk):
115 self.chunk = self.query.next_page()

~/.local/lib/python3.7/site-packages/chembl_webresource_client/url_query.py in get_page(self)
392 self.logger.info('From cache: {0}'.format(res.from_cache if hasattr(res, 'from_cache') else False))
393 if not res.ok:
--> 394 handle_http_error(res)
395 if self.frmt == 'json':
396 json_data = res.json()

~/.local/lib/python3.7/site-packages/chembl_webresource_client/http_errors.py in handle_http_error(request)
111 exception_class = status_to_exception.get(request.status_code, BaseHttpException)
112 if request.text:
--> 113 raise exception_class(request.url, request.text)
114 raise exception_class(request.url, request.content)
115

HttpApplicationError: Error for url https://www.ebi.ac.uk/chembl/api/data/compound_record.json, server response: <!doctype html>

<title>Server error &lt; EMBL-EBI</title>

Something has gone wrong with our web server

Our web server says this is a 500 internal server error: the request cannot be carried out by the server.
This problem means that the service you are trying to access is currently unavailable. We're very sorry.

Please try again but if it keeps happening, you can contact us and we will try to help you.


Sorry for the delayed reply. It seems the server is timing out because this is a pretty big query (>6000 document IDs). I would recommend sending smaller batches in the second query.

from chembl_webresource_client.new_client import new_client
document = new_client.document
docs = document.filter(journal="Bioorg. Med. Chem.").only('document_chembl_id')
compound_record = new_client.compound_record
doc_ids = [doc['document_chembl_id'] for doc in docs]

# send only the first 1000 document IDs in this request
records = compound_record.filter(document_chembl_id__in=doc_ids[0:1000]).only(['document_chembl_id', 'molecule_chembl_id'])
records
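
To cover all of the document IDs rather than only the first 1000, the same idea can be wrapped in a loop. The following is a minimal sketch, assuming a batch size of 1000 is small enough for the server to answer in time (that size may need tuning). `chunked`, `fetch_in_batches`, and `chembl_fetch_batch` are hypothetical helper names, not part of chembl_webresource_client.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    doc_ids: Sequence[str],
    fetch_batch: Callable[[Sequence[str]], Iterable[dict]],
    batch_size: int = 1000,
) -> List[dict]:
    """Call `fetch_batch` once per slice of `doc_ids` and collect the results."""
    records: List[dict] = []
    for batch in chunked(doc_ids, batch_size):
        records.extend(fetch_batch(batch))
    return records


def chembl_fetch_batch(batch: Sequence[str]) -> List[dict]:
    """Query compound_record for one batch of document IDs (needs network access)."""
    # Imported lazily so the helpers above can be used without the client installed.
    from chembl_webresource_client.new_client import new_client

    query = new_client.compound_record.filter(
        document_chembl_id__in=list(batch)
    ).only(['document_chembl_id', 'molecule_chembl_id'])
    # The query set is lazy; converting it to a list triggers the HTTP requests.
    return list(query)
```

With `doc_ids` built as above, `records = fetch_in_batches(doc_ids, chembl_fetch_batch, batch_size=1000)` would issue one query per 1000 IDs. If a batch still fails with a server error, a smaller `batch_size` is the first thing to try.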