nomad-coe/nomad

AssertionError: archive query was not stopped before running out of things to query


While using the ArchiveQuery class I'm getting the following error. Any idea why this is happening? I'm new to this. I am using nomad-lab==0.10.4.

Traceback (most recent call last):
  File "c:\Users\Hasan Sayeed\Documents\hasan\SSE DISCOVER\nomad-examples-main\all_formula.py", line 44, in <module>
    for i, result in enumerate(tqdm(query)):
  File "C:\Users\Hasan Sayeed\anaconda3\envs\liml\lib\site-packages\tqdm\std.py", line 1180, in __iter__
  File "C:\Users\Hasan Sayeed\anaconda3\envs\liml\lib\_collections_abc.py", line 1043, in __iter__
    v = self[i]
  File "C:\Users\Hasan Sayeed\anaconda3\envs\liml\lib\site-packages\nomad\client.py", line 528, in __getitem__
    self.call_api()
  File "C:\Users\Hasan Sayeed\anaconda3\envs\liml\lib\site-packages\nomad\client.py", line 481, in call_api
    assert False, 'archive query was not stopped before running out of things to query'
AssertionError: archive query was not stopped before running out of things to query

The line for i, result in enumerate(tqdm(query)): is raising the error. The code I'm using is something like this:

from nomad.client import ArchiveQuery
from tqdm import tqdm

# elements that must not appear in the results
excluded_elements = [
    "He", "Ar", "Ne", "Xe", "Rn", "U", "Th", "Rn", "Tc", "Po", "Pu", "Pa",
    ]

# elements that must appear in the results
included_elements = [
    "Kr"
    ]
# %% query NOMAD database
query = ArchiveQuery(
    # url="http://nomad-lab.eu/prod/rae/api",
    query={"$and": {"domain": "dft", "atoms": included_elements, "$not": {"atoms": excluded_elements}}},
    required={
        "section_metadata": {"calc_id": "*", "formula": "*"},
    },
    per_page=457,
    max=None,
)

calc_ids = []
formulas = []
for i, result in enumerate(tqdm(query)):
    
    if result.section_metadata is not None:
        # Checking if nested attribute exists
        calc_ids.append(result.section_metadata.calc_id)
        formulas.append(result.section_metadata.formula)
    else:
        calc_ids.append(None)
        formulas.append(None)

@TLCFEM Can you have a look and respond?

This is the old 0.x version, let me find the code first.

The query above can retrieve only 2623 of the 2639 entries reported by the search, probably because of some mismatch in the Elasticsearch (es) database. Because the two numbers differ, the process is unable to exit cleanly.
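To illustrate how such a mismatch could trigger the assertion, here is a minimal sketch. It is not the actual nomad client code: `fake_api`, `REPORTED_TOTAL`, and `STORED` are hypothetical stand-ins that reuse the two numbers from this thread. A loop that trusts the reported total keeps asking for pages after the data runs out, while a loop that also stops on an empty page exits cleanly.

```python
REPORTED_TOTAL = 2639        # count reported by the search (from this thread)
STORED = list(range(2623))   # entries that can actually be retrieved


def fake_api(offset, per_page):
    """Hypothetical stand-in for one paginated API call."""
    return STORED[offset:offset + per_page]


def fetch_trusting_total(per_page):
    """Loop until the reported total is reached; fails if data runs out first."""
    results = []
    while len(results) < REPORTED_TOTAL:
        page = fake_api(len(results), per_page)
        assert page, 'ran out of things to query before reaching the total'
        results.extend(page)
    return results


def fetch_until_empty(per_page):
    """Stop at the reported total or at the first empty page, whichever comes first."""
    results = []
    while len(results) < REPORTED_TOTAL:
        page = fake_api(len(results), per_page)
        if not page:
            break
        results.extend(page)
    return results


retrieved = fetch_until_empty(per_page=457)
print(len(retrieved))
```

Calling `fetch_trusting_total(457)` on the same data raises an `AssertionError`, which mirrors the failure mode described above under these assumptions.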

@hasan-sayeed The archive query module was rewritten last year, try the new version of the package if you can.


endpoint: https://nomad-lab.eu/prod/rae/api/archive/query

curl: curl -X POST "https://nomad-lab.eu/prod/rae/api/archive/query" -H "accept: application/json" -H "Content-Type: application/json" -d "{ \"query\":{ \"$and\":[ { \"$and\":{ \"domain\":\"dft\", \"atoms\":[ \"Kr\" ], \"$not\":{ \"atoms\":[ \"He\", \"Ar\", \"Ne\", \"Xe\", \"Rn\", \"U\", \"Th\", \"Rn\", \"Tc\", \"Po\", \"Pu\", \"Pa\" ] } } }, { \"dft.quantities\":[ \"calc_id\", \"section_metadata\", \"formula\" ] }, { \"domain\":\"dft\" } ], \"upload_id\":[ \"-5WEQAwUSa6qNSrtS0YFrg\", \"0-sUPGbCT4WTrNYF4-xkZQ\", \"0-tMqNL2SeSD52QbIqS6HQ\", \"1CFTsyFoRHuDrpYmuEwYCw\", \"1dcza3VyR4CZrYlDDxvp9A\", \"1sJnuLHzTgqKJTy2ED4Ccg\", \"2BDahabrRruTFIuNtEWuZg\", \"3KZBVXpdQ2au-USEG3FwKw\", \"4AKh_pFmSbyUoWphewbpqQ\", \"5A1BV5LBTbGfuvTFh6F0GQ\", \"5szJqC2FT5OB9p7fLiK_tw\", \"63q7h8s8Q-i6V5nSLka5rw\", \"6d0atLpgTkS5T3Mo0B6XUg\", \"7dhd5feNRs-bHkoiZZT7_g\", \"7jo7hwe4TZWwSCP_V6n1lA\", \"85r-kk1jRDqHciRlTFYWaw\", \"87KqS7FoSjqwavM97n1gKQ\", \"8bKbFFuQSLKzOs1fqI1LKQ\", \"9RQY9goSQ5mxDkxK55IM-g\", \"9ZfzmWU1TSqhaB7ODxYxPg\", \"AB5-pX7TQguSsbDsbe7zSQ\", \"AGeD22S7TuCM0TGpN8uAoQ\", \"ATy4zekAQgW-Nlu-0-87Vg\", \"C7QPEANxRZyUNN6dQad6JA\", \"C8ONoVsgRNqNB_tlktyM2w\", \"CGfKMPo4RnKz7cBUeg6hHA\", \"CK5d9zL0R_aSDMsfcE1XBQ\", \"CbcV5E5sT56tBx1zm02_sQ\", \"CzHuvadpTzq0Q0OIue8zFA\", \"EdOAcYNYTRS27KxbLpeQ_A\", \"Eu_2B_sqSWCF8j90_J_noQ\", \"F6FpfGPiRSaYxoQqgfNL3A\", \"GHo0Jhq_QeiZWBwwaaN2xA\", \"HNiiLJx-RKKmt1a7SHf_Lw\", \"HdxBRNzHRUmjslEiA5X2QA\", \"Hnn2jc8eRRGG-Z8nMtn2Bg\", \"IDD9DWq5RZOiPuwGBlL-FA\", \"IG_G7iC3T2O9EmwstA84rg\", \"JvdvikbhQp673R4ucwQgiA\", \"KGjBTj5RS6aRbun4xjI9vw\", \"LKCl_Lx4Sau5OlTgy2cFhg\", \"L_LSn0amQlKS6Q1DCni4nQ\", \"MF3kN7zlT-q942wC1G33Jw\", \"N6rQDIFJT36PituBeUrqEA\", \"NMZHoyxFT-24w9HTXWmjwg\", \"QX3_VJ6GQxeXsViuxMFMmA\", \"QmF_fe-LRbKHB7Q4E_IkFQ\", \"SSNM3hfoS36fUpAJgeja0A\", \"Sv0sxBCMTH-thjqog0duBQ\", \"V0VeoQtWTs-wuoSuf7OfGw\", \"V_yLS8GuSVi4WBEcImF5JA\", \"VeoMtyTSRrSs4CIddGqHnQ\", \"VsrtDJIRRAqmtZ2PYH_-GQ\", \"WQgXb3oxSYuOrApLWoiw6w\", 
\"X9YPeM80Q8CrIyRoshkThg\", \"XUEzfFbkS7GCyfvK9hMaPw\", \"XaeVWNeQT4-bREGGOOPxCw\", \"YDXZgPooRb-31Niq48ODPA\", \"YSDtEt1iS6e59M2GSBIbHg\", \"YU_MRCWmQGeQv1ppfi0NMw\", \"YaAffqKMTWajmMCmhx_Euw\", \"_ijarO1hQCqsTpOyb62WXw\", \"b0EK1FueRny3TUzfhCp4hw\", \"b5BQDtBcQFKChE0PsrjIAw\", \"bA32e7cGRcakHNURF4gyfw\", \"bsyaWhwoSlipvKoMZaCRDg\", \"dffybQ5jTsi_3g_iTIvfEw\", \"e-UOMrI-QDGModmBnxIaBA\", \"eJ2MMA9VTfGOkBr5DNNXxA\", \"eJexi2o0RGmZ-HvagQ0DMg\", \"gr1xwRnQQ8yBfXUxkg3clw\", \"hMs410WsTYqW4TD7bl9oww\", \"hXjxFrSARi6vr8ZsckYj7Q\", \"hiSoHoH4QOel0HmSbYs58Q\", \"hyn062OhTxuVIXHALACrgg\", \"ijAHprsWQLmy0MBIKdg_8w\", \"izLzrw0-TZiD8nPFwsBSCw\", \"kMRH5OqXQbyMfWGop7iNNQ\", \"lIFjElE2QIugwAnJ70O70w\", \"lgTUnpdGTNuFamjKt9oDSA\", \"lvkJgbIwReWa1KpnpnKslg\", \"m824S2hdSyKx5JJTavPPOw\", \"mYVFFThpQVKnW7sulH9QlA\", \"nHo8kktjSviBVsqMt0BLuw\", \"o02al1UpTqWi7vTk6DhOTA\", \"qD9p5359QUSEL0o4L9NvEA\", \"qxaE3d-xSYai6Wc0dF31pA\", \"tZwGlGgwTyiD1NyWNp-Fxg\", \"uCSlNMEWQfOS5TMvsQXz1A\", \"uVFWiBJDTTC0CwVOmv6ktg\", \"w7Wcfk1NRF6xH2aQ6PWH8Q\", \"wKiS4s-NS9Ky5o1OtYKI2w\", \"xlIKWbFDTv2YyoNGCLp8sw\", \"yNgGjXLoQ3CnZxcLTxO7JQ\", \"zPDpvd6rSaWovlbK5KWnWg\", \"zZrgusawTCWOuuT0C18R_g\" ] }, \"required\":{ \"section_metadata\":{ \"calc_id\":\"*\", \"formula\":\"*\" } }, \"raise_errors\":false, \"aggregation\":{ \"per_page\":4000 }}"

payload:

{
  "query":{
    "$and":[
      {
        "$and":{
          "domain":"dft",
          "atoms":[
            "Kr"
          ],
          "$not":{
            "atoms":[
              "He",
              "Ar",
              "Ne",
              "Xe",
              "Rn",
              "U",
              "Th",
              "Rn",
              "Tc",
              "Po",
              "Pu",
              "Pa"
            ]
          }
        }
      },
      {
        "dft.quantities":[
          "calc_id",
          "section_metadata",
          "formula"
        ]
      },
      {
        "domain":"dft"
      }
    ],
    "upload_id":[
      "-5WEQAwUSa6qNSrtS0YFrg",
      "0-sUPGbCT4WTrNYF4-xkZQ",
      "0-tMqNL2SeSD52QbIqS6HQ",
      "1CFTsyFoRHuDrpYmuEwYCw",
      "1dcza3VyR4CZrYlDDxvp9A",
      "1sJnuLHzTgqKJTy2ED4Ccg",
      "2BDahabrRruTFIuNtEWuZg",
      "3KZBVXpdQ2au-USEG3FwKw",
      "4AKh_pFmSbyUoWphewbpqQ",
      "5A1BV5LBTbGfuvTFh6F0GQ",
      "5szJqC2FT5OB9p7fLiK_tw",
      "63q7h8s8Q-i6V5nSLka5rw",
      "6d0atLpgTkS5T3Mo0B6XUg",
      "7dhd5feNRs-bHkoiZZT7_g",
      "7jo7hwe4TZWwSCP_V6n1lA",
      "85r-kk1jRDqHciRlTFYWaw",
      "87KqS7FoSjqwavM97n1gKQ",
      "8bKbFFuQSLKzOs1fqI1LKQ",
      "9RQY9goSQ5mxDkxK55IM-g",
      "9ZfzmWU1TSqhaB7ODxYxPg",
      "AB5-pX7TQguSsbDsbe7zSQ",
      "AGeD22S7TuCM0TGpN8uAoQ",
      "ATy4zekAQgW-Nlu-0-87Vg",
      "C7QPEANxRZyUNN6dQad6JA",
      "C8ONoVsgRNqNB_tlktyM2w",
      "CGfKMPo4RnKz7cBUeg6hHA",
      "CK5d9zL0R_aSDMsfcE1XBQ",
      "CbcV5E5sT56tBx1zm02_sQ",
      "CzHuvadpTzq0Q0OIue8zFA",
      "EdOAcYNYTRS27KxbLpeQ_A",
      "Eu_2B_sqSWCF8j90_J_noQ",
      "F6FpfGPiRSaYxoQqgfNL3A",
      "GHo0Jhq_QeiZWBwwaaN2xA",
      "HNiiLJx-RKKmt1a7SHf_Lw",
      "HdxBRNzHRUmjslEiA5X2QA",
      "Hnn2jc8eRRGG-Z8nMtn2Bg",
      "IDD9DWq5RZOiPuwGBlL-FA",
      "IG_G7iC3T2O9EmwstA84rg",
      "JvdvikbhQp673R4ucwQgiA",
      "KGjBTj5RS6aRbun4xjI9vw",
      "LKCl_Lx4Sau5OlTgy2cFhg",
      "L_LSn0amQlKS6Q1DCni4nQ",
      "MF3kN7zlT-q942wC1G33Jw",
      "N6rQDIFJT36PituBeUrqEA",
      "NMZHoyxFT-24w9HTXWmjwg",
      "QX3_VJ6GQxeXsViuxMFMmA",
      "QmF_fe-LRbKHB7Q4E_IkFQ",
      "SSNM3hfoS36fUpAJgeja0A",
      "Sv0sxBCMTH-thjqog0duBQ",
      "V0VeoQtWTs-wuoSuf7OfGw",
      "V_yLS8GuSVi4WBEcImF5JA",
      "VeoMtyTSRrSs4CIddGqHnQ",
      "VsrtDJIRRAqmtZ2PYH_-GQ",
      "WQgXb3oxSYuOrApLWoiw6w",
      "X9YPeM80Q8CrIyRoshkThg",
      "XUEzfFbkS7GCyfvK9hMaPw",
      "XaeVWNeQT4-bREGGOOPxCw",
      "YDXZgPooRb-31Niq48ODPA",
      "YSDtEt1iS6e59M2GSBIbHg",
      "YU_MRCWmQGeQv1ppfi0NMw",
      "YaAffqKMTWajmMCmhx_Euw",
      "_ijarO1hQCqsTpOyb62WXw",
      "b0EK1FueRny3TUzfhCp4hw",
      "b5BQDtBcQFKChE0PsrjIAw",
      "bA32e7cGRcakHNURF4gyfw",
      "bsyaWhwoSlipvKoMZaCRDg",
      "dffybQ5jTsi_3g_iTIvfEw",
      "e-UOMrI-QDGModmBnxIaBA",
      "eJ2MMA9VTfGOkBr5DNNXxA",
      "eJexi2o0RGmZ-HvagQ0DMg",
      "gr1xwRnQQ8yBfXUxkg3clw",
      "hMs410WsTYqW4TD7bl9oww",
      "hXjxFrSARi6vr8ZsckYj7Q",
      "hiSoHoH4QOel0HmSbYs58Q",
      "hyn062OhTxuVIXHALACrgg",
      "ijAHprsWQLmy0MBIKdg_8w",
      "izLzrw0-TZiD8nPFwsBSCw",
      "kMRH5OqXQbyMfWGop7iNNQ",
      "lIFjElE2QIugwAnJ70O70w",
      "lgTUnpdGTNuFamjKt9oDSA",
      "lvkJgbIwReWa1KpnpnKslg",
      "m824S2hdSyKx5JJTavPPOw",
      "mYVFFThpQVKnW7sulH9QlA",
      "nHo8kktjSviBVsqMt0BLuw",
      "o02al1UpTqWi7vTk6DhOTA",
      "qD9p5359QUSEL0o4L9NvEA",
      "qxaE3d-xSYai6Wc0dF31pA",
      "tZwGlGgwTyiD1NyWNp-Fxg",
      "uCSlNMEWQfOS5TMvsQXz1A",
      "uVFWiBJDTTC0CwVOmv6ktg",
      "w7Wcfk1NRF6xH2aQ6PWH8Q",
      "wKiS4s-NS9Ky5o1OtYKI2w",
      "xlIKWbFDTv2YyoNGCLp8sw",
      "yNgGjXLoQ3CnZxcLTxO7JQ",
      "zPDpvd6rSaWovlbK5KWnWg",
      "zZrgusawTCWOuuT0C18R_g"
    ]
  },
  "required":{
    "section_metadata":{
      "calc_id":"*",
      "formula":"*"
    }
  },
  "raise_errors":false,
  "aggregation":{
    "per_page":4000
  }
}

Hi Theodore! Thank you for responding. Yes, I tried the new version (1.0.8), but it gave me the error below. It seems that 'resource' is a Unix-specific module, and I'm running this on my Windows machine. Is that the reason for this error, @TLCFEM?

ModuleNotFoundError                       Traceback (most recent call last)
d:\RW\SSE DISCOVER\nomad-examples-main\all_formula.ipynb Cell 1 in <cell line: 4>()
      2 from tqdm import tqdm
      3 # import nomad.client
----> 4 from nomad.client import ArchiveQuery
      6 # exclude noble gases and certain radioactive elements, source: https://github.com/sparks-baird/mat_discover/blob/4e65b710b948c7ce269cc1741c12e219507aa2dd/mat_discover/utils/generate_elasticity_data.py#L74-L76
      7 # fmt: off
      8 excluded_elements = [
      9     "He", "Ar", "Ne", "Xe", "Rn", "U", "Th", "Rn", "Tc", "Po", "Pu", "Pa",
     10     ]

File c:\Users\hasan\miniconda3\lib\site-packages\nomad\client\__init__.py:19, in <module>
      1 #
      2 # Copyright The NOMAD Authors.
      3 #
   (...)
     16 # limitations under the License.
     17 #
---> 19 from .archive import ArchiveQuery
     20 from .api import Auth
     21 from .upload import upload_file

File c:\Users\hasan\miniconda3\lib\site-packages\nomad\client\archive.py:29, in <module>
     26 from keycloak import KeycloakOpenID
...
---> 56 import resource
     57 import os
     59 from nomad import config

ModuleNotFoundError: No module named 'resource'

This was fixed last year; please apply the following patch.

diff --git a/nomad/utils/__init__.py b/nomad/utils/__init__.py
index b33effa77..6b9379260 100644
--- a/nomad/utils/__init__.py
+++ b/nomad/utils/__init__.py
@@ -53,7 +53,6 @@ import collections
 import logging
 import inspect
 import orjson
-import resource
 import os
 
 from nomad import config
@@ -248,7 +247,10 @@ def timer(logger, event, method='info', lnr_event: str = None, log_memory: bool
         The method yields a dictionary that can be used to add further log data.
     '''
     def get_rss():
-        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        if os.name != 'nt':
+            import resource
+            return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        return 0
 
     kwargs = dict(kwargs)
     start = time.time()
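
The guard used in the patch can also be tried on its own, outside NOMAD. The sketch below is a standalone copy of the patched `get_rss` helper with a docstring added; it assumes nothing beyond the standard library. Note that the unit of `ru_maxrss` differs between platforms (kilobytes on Linux, bytes on macOS), and that the Windows fallback of 0 only means "not measured".

```python
import os


def get_rss():
    """Return the peak resident set size, or 0 where `resource` is unavailable."""
    if os.name != 'nt':
        # `resource` is a Unix-only standard-library module, so import it lazily
        # instead of at module level, where it would break imports on Windows.
        import resource
        return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return 0


peak = get_rss()
print(peak)
```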

It worked! Thank you.

I have one more question. What's the difference between these two URLs?

http://nomad-lab.eu/prod/v1/api/v1/entries/archive/query
and
https://nomad-lab.eu/prod/rae/api/archive/query

For the same query, I'm getting a different number of materials from these two URLs.

https://nomad-lab.eu/prod/rae/api/archive/query queries the old version of the project.

http://nomad-lab.eu/prod/v1/api/v1/entries/archive/query is the current version (v1), which is the one we support.

I'm not sure whether they run two different indexing databases, but I presume the former returns fewer results, which should be a subset of the response returned by the newer one.

The former one is actually returning more materials! I'm querying for all the materials that contain "Li" but do not contain "He", "Ne", "Ar", "Kr", "Xe", "Rn", "U", "Th", "Rn", "Tc", "Po", "Pu", "Pa". The former URL gives me 506776 entries, while the latter gives me 469571.

Then I have no idea, but this may explain why the older version can only return 2623 of the 2639 entries. I think additional time should be spent on pinning down exactly which 16 entries those are, and why they exist in the index but cannot be retrieved.
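One possible way to pin down the missing entries would be to compare the IDs the search reports with the IDs that were actually retrieved. The sketch below uses made-up placeholder IDs and a hypothetical helper, `find_missing`; in practice the two lists would have to come from the search results and from the IDs collected while iterating over the archive query.

```python
def find_missing(indexed_ids, retrieved_ids):
    """Return IDs present in the index but absent from the retrieved set, sorted."""
    return sorted(set(indexed_ids) - set(retrieved_ids))


# Placeholder IDs for illustration only; these are not real NOMAD entries.
indexed = ["calc-a", "calc-b", "calc-c", "calc-d"]
retrieved = ["calc-a", "calc-c"]

missing = find_missing(indexed, retrieved)
print(missing)
```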

Gotcha! Thanks

I was just trying to run the ArchiveQuery example I found here: https://github.com/nomad-coe/nomad/blob/develop/examples/archive/archive_query.py. But it gives me the error below. Any idea how to solve it?

RuntimeError                              Traceback (most recent call last)
d:\RW\SSE DISCOVER\nomad-examples-main\all_formula.ipynb Cell 2 in <cell line: 27>()
      6 from nomad.metainfo import units
      8 query = ArchiveQuery(
      9     query={
     10         'results.method.simulation.program_name': 'VASP',
   (...)
     24         }
     25     })
---> 27 for result in query.download(10):
     28     calc = result.workflow[0].calculation_result_ref
     29     formula = calc.system_ref.chemical_composition_reduced

File c:\Users\hasan\miniconda3\lib\site-packages\nomad\client\archive.py:431, in ArchiveQuery.download(self, number)
    428     number = self.fetch()
    429 elif pending_size < number:
    430     # if not sufficient fetched entries, fetch first
--> 431     self.fetch(number - pending_size)
    433 print('Downloading required data...')
    435 return asyncio.run(self._download_async(number))

File c:\Users\hasan\miniconda3\lib\site-packages\nomad\client\archive.py:406, in ArchiveQuery.fetch(self, number)
    394 '''
    395 Fetch uploads from remote.
...
     34         "asyncio.run() cannot be called from a running event loop")
     36 if not coroutines.iscoroutine(main):
     37     raise ValueError("a coroutine was expected, got {!r}".format(main))

RuntimeError: asyncio.run() cannot be called from a running event loop

Do I need a specific version? I tried nest-asyncio 1.5.4 through 1.5.6, and each gives me the same error.
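For context, the error itself can be reproduced with the standard library alone: `asyncio.run()` refuses to start when an event loop is already running in the same thread, which is the situation inside a notebook cell. The sketch below shows the failure and one generic workaround, running the coroutine on a fresh loop in a worker thread. This is an assumption-laden illustration, not a fix confirmed for `ArchiveQuery.download`; `fetch_value` and `run_in_thread` are hypothetical names.

```python
import asyncio
import threading


async def fetch_value():
    # Hypothetical stand-in for a coroutine-based download.
    await asyncio.sleep(0)
    return 42


def run_in_thread(coro_factory):
    """Run a coroutine to completion on a fresh event loop in a worker thread."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = asyncio.run(coro_factory())
        except BaseException as exc:
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def notebook_like_cell():
    # A loop is already running here, as it is inside a Jupyter cell.
    coro = fetch_value()
    try:
        asyncio.run(coro)
        nested_error = None
    except RuntimeError as exc:
        nested_error = str(exc)
        coro.close()  # avoid a "coroutine was never awaited" warning
    # The worker thread has no running loop, so asyncio.run() is allowed there.
    return nested_error, run_in_thread(fetch_value)


nested_error, value = asyncio.run(notebook_like_cell())
print(nested_error)
print(value)
```

The worker thread blocks the calling cell until the coroutine finishes, which is acceptable for a short script but would stall other tasks on the outer loop.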