Error in setup
azvaska opened this issue
Hello, I followed the README, but once I ran the command
docker exec -it dataact-broker-backend python dataactcore/scripts/initialize.py -i
this error appeared:
2022-04-07 21:34:18,828 INFO:dataactvalidator.scripts.load_cfda_data:Fetching CFDA file from new-url.com/cfda.csv
Traceback (most recent call last):
File "dataactcore/scripts/initialize.py", line 249, in <module>
main()
File "dataactcore/scripts/initialize.py", line 180, in main
load_domain_value_files(validator_config_path, args.force)
File "dataactcore/scripts/initialize.py", line 93, in load_domain_value_files
load_cfda_program(base_path)
File "/data-act/backend/dataactvalidator/scripts/load_cfda_data.py", line 82, in load_cfda_program
r = requests.get(S3_CFDA_FILE, allow_redirects=True)
File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 76, in get
return request('get', url, params=params, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
prep = self.prepare_request(req)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 466, in prepare_request
hooks=merge_hooks(request.hooks, self.hooks),
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 316, in prepare
self.prepare_url(url, params)
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 390, in prepare_url
raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL 'new-url.com/cfda.csv': No schema supplied. Perhaps you meant http://new-url.com/cfda.csv?
Reading the code, it looks like I need to set these:
usas_public_reference_url: new-url.com
usas_public_submissions_url: new-url.com
but I don't know what values to use, since usaspending doesn't seem to host these files.
Also, since I don't have direct access to all the data, how can I update my DB with new data without having to rebuild it from historical data every time?
Hi @azvaska, we keep these fields defaulted to new-url.com with the intention of having our operation scripts populate them on deploy, since they can change over time. We can update our config examples to use the proper URL, but in the meantime you can use https://files.usaspending.gov/reference_data for usas_public_reference_url, and that should get you going. Apologies for the inconvenience.
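The first traceback is requests refusing a URL with no scheme (MissingSchema), so the setting needs the full https:// prefix, not just a host name. As a minimal pre-flight sketch, the hypothetical helper below rejects a usas_public_reference_url value that lacks a scheme and builds the CFDA file URL from it. The /cfda.csv suffix is an assumption inferred from the log line "Fetching CFDA file from new-url.com/cfda.csv", not taken from the broker source.

```python
from urllib.parse import urlparse


def cfda_file_url(reference_base: str) -> str:
    """Return the CFDA file URL for a usas_public_reference_url value.

    Hypothetical helper: the '/cfda.csv' suffix is inferred from the log line
    'Fetching CFDA file from new-url.com/cfda.csv', not from the broker code.
    """
    parsed = urlparse(reference_base)
    # urlparse('new-url.com') yields an empty scheme and netloc; that is the
    # same situation requests reports as MissingSchema.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            "usas_public_reference_url needs a full URL such as "
            f"'https://files.usaspending.gov/reference_data'; got {reference_base!r}"
        )
    # Drop a trailing slash so the result never contains '//cfda.csv'.
    return reference_base.rstrip("/") + "/cfda.csv"
```

With this check, cfda_file_url("new-url.com") fails early with a readable ValueError, while the suggested https://files.usaspending.gov/reference_data base yields a URL that requests can at least parse.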
I modified the settings, and now it's giving me this error:
2022-04-08 14:09:59,682 WARNING:dataactcore.interfaces.db:No current_app, falling back to non-threadsafe database connection
2022-04-08 14:09:59,682 INFO:dataactvalidator.scripts.load_tas:Working with local cars_tas.csv
2022-04-08 14:10:00,399 WARNING:dataactcore.interfaces.db:No current_app, falling back to non-threadsafe database connection
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3540, in _ensure_valid_index
value = Series(value)
File "/usr/local/lib/python3.7/site-packages/pandas/core/series.py", line 316, in __init__
data = SingleBlockManager(data, index, fastpath=True)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/managers.py", line 1516, in __init__
block = make_block(block, placement=slice(0, len(axis)), ndim=1)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 3284, in make_block
return klass(values, ndim=ndim, placement=placement)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 2792, in __init__
super().__init__(values, ndim=ndim, placement=placement)
File "/usr/local/lib/python3.7/site-packages/pandas/core/internals/blocks.py", line 128, in __init__
"{mgr}".format(val=len(self.values), mgr=len(self.mgr_locs))
ValueError: Wrong number of items passed 25, placement implies 0
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "dataactcore/scripts/initialize.py", line 249, in <module>
main()
File "dataactcore/scripts/initialize.py", line 182, in main
load_tas_lookup()
File "dataactcore/scripts/initialize.py", line 66, in load_tas_lookup
load_tas()
File "/data-act/backend/dataactvalidator/scripts/load_tas.py", line 235, in load_tas
update_tas_lookups(sess, tas_file, update_missing=update_missing, metrics=metrics_json)
File "/data-act/backend/dataactvalidator/scripts/load_tas.py", line 133, in update_tas_lookups
old_data['display_tas'] = old_data.apply(lambda x: concat_display_tas_dict(x), axis=1)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3487, in __setitem__
self._set_item(key, value)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3563, in _set_item
self._ensure_valid_index(value)
File "/usr/local/lib/python3.7/site-packages/pandas/core/frame.py", line 3543, in _ensure_valid_index
"Cannot set a frame with no defined index "
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
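The second traceback fails while assigning old_data.apply(..., axis=1) to a new column. One plausible explanation, not confirmed in this thread: on a freshly initialized database the existing TAS lookup table is empty, so old_data has columns but no rows, and DataFrame.apply on an empty frame can return a copy of that empty frame instead of a Series (this happens when the row function fails on the all-NaN probe row pandas passes it). Assigning a 25-column frame to a single column would match the "Wrong number of items passed 25" message. A minimal guard under that assumption is sketched below; format_display_tas and the column names agency_identifier and main_account_code are hypothetical stand-ins for concat_display_tas_dict and the real TAS fields.

```python
import pandas as pd


def format_display_tas(row: pd.Series) -> str:
    # Hypothetical stand-in for concat_display_tas_dict; the real broker
    # function builds the TAS display string from more components.
    return f"{row['agency_identifier']}-{row['main_account_code']}"


def add_display_tas(old_data: pd.DataFrame) -> pd.DataFrame:
    """Add a display_tas column, tolerating a frame with no rows."""
    result = old_data.copy()
    if result.empty:
        # Nothing to format: add an empty object column rather than calling
        # apply, which on a frame with no rows may return a DataFrame.
        result["display_tas"] = pd.Series(dtype=object)
    else:
        # One formatted string per row, returned as a Series.
        result["display_tas"] = result.apply(format_display_tas, axis=1)
    return result
```

With the guard, an empty input simply gains an empty display_tas column, and non-empty input is still formatted row by row.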
I basically want to run the usaspending.gov API locally, but I don't understand how to update it with the newest data. I can see where you are pulling it from (https://www.usaspending.gov/about),
but I don't see any links to CSVs or anything like that on the website.