CoronaWhy/task-geo

Test fails because ['sub_region', 'city'] not in dataset. What to do?

ilovechai opened this issue · 7 comments

Description

I wrote a test for the hdx_formatter. The test failed because check_dataset_format checks the order of the columns in the dataset. The dataset that is fetched does not have the 'sub_region' and 'city' columns, so the check fails.

@ManuelAlvarezC What should be done in this case? Should I create empty columns, or should I remove the hard check for the two columns from the check_dataset_format function?

Hi @cryptox31, and thanks for reporting.

Could you attach a snippet of code replicating what you were doing and the traceback you got?
If you could also include the output of print(data.columns) for your data, that would be a plus.

Thanks.

PS. About your question, what we should do is make sure you are reporting a legit bug and then fix it.

@ManuelAlvarezC
Code snippet (if you could, please also answer the question in the # Setup section):

from unittest import TestCase

import pandas as pd

from task_geo.data_sources.hdx import hdx_formatter
from task_geo.testing import check_dataset_format


class TestHdxApi(TestCase):

    def test_validate_formatter(self):
        """ Validate formatter result according to Data Model"""
        # Setup
        raw = pd.read_csv('../../fixtures/hdx_fixture.csv') # Any idea why content root path does not work? 

        # Run
        data = hdx_formatter.acap_formatter(raw)
        print(data.columns)

        # Check.
        check_dataset_format(data)

Columns in dataset:

# Order.  V.           V.  
Index(['country', 'region', 'iso', 'category', 'measure', 'targeted_pop_group',
       'comments', 'non_compliance', 'date_implemented', 'source',
       'source_type', 'entry_date', 'link'],
      dtype='object')

This is the error:


Failure
Traceback (most recent call last):
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "/Users/krishna.sheth@ibm.com/Desktop/MyCodeWorkspace/CoronaWhy/task-geo/tests/data_sources/hdx/test___init__.py", line 21, in test_validate_formatter
    check_dataset_format(data)
  File "/Users/krishna.sheth@ibm.com/Desktop/MyCodeWorkspace/CoronaWhy/task-geo/task_geo/testing/__init__.py", line 34, in check_dataset_format
    assert (locations == sorted(locations)), message
AssertionError: The correct ordening of the columns is "country, region, sub_region, city"


Assertion failed


Ran 1 test in 0.051s

FAILED (failures=1)

Process finished with exit code 1

Ok @cryptox31.

I just found the bug and fixed it. I will link this issue in the PR I am about to open.

About your installation issues, can you describe how you run the tests?

On the PR linked above, I can run the tests without issue with:
make test
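On the fixture path question in the snippet: a relative path passed to pd.read_csv is resolved against the current working directory of the process, not against the test file, so '../../fixtures/hdx_fixture.csv' only works when the tests are launched from one particular directory. A minimal sketch of one way around this, assuming the fixtures live two directories above the test module's directory as the relative path in the snippet suggests (the helper name fixture_path is hypothetical, not part of task_geo):

```python
from pathlib import Path


def fixture_path(test_file, name):
    """Build the path to a fixture relative to the test module, not the working directory.

    Assumes the layout implied by '../../fixtures/<name>' in the snippet:
    the fixtures directory sits two levels above the test module's directory.
    """
    test_dir = Path(test_file).parent       # e.g. tests/data_sources/hdx
    tests_root = test_dir.parents[1]        # e.g. tests
    return tests_root / 'fixtures' / name
```

Inside the test this would be used as raw = pd.read_csv(fixture_path(__file__, 'hdx_fixture.csv')), which gives the same result whether the tests are started from the repository root, from an IDE, or via make test.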

@ManuelAlvarezC The #47 PR has all the columns 'country', 'region', 'sub_region', 'city' and that's why you cannot see the error there.

The check_dataset_format(data) function breaks if fewer than 4 of the specified columns are present in the dataset.

@cryptox31, not exactly.

The issue is that a bug caused the exception to be raised in almost every case, and that has now been fixed. While fixing it, I checked that your particular case worked, and it did, although I didn't add tests for it, which may have misled you.

I have already made the test and will upload my PR ASAP.
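To illustrate the behavior discussed here, below is a minimal sketch of an order check that only looks at the location columns actually present, so a dataset without 'sub_region' and 'city' still passes. This is not the code in task_geo.testing; the function name check_location_order and the exact rule (relative order only) are assumptions made for illustration.

```python
# Expected relative order, taken from the assertion message in the traceback.
LOCATION_COLUMNS = ['country', 'region', 'sub_region', 'city']


def check_location_order(columns):
    """Assert that the location columns found in `columns` keep the expected relative order.

    `columns` is any iterable of column names, for example data.columns.
    Location columns that are absent are simply ignored (assumed behavior).
    """
    present = [column for column in columns if column in LOCATION_COLUMNS]
    expected = [column for column in LOCATION_COLUMNS if column in present]
    message = 'The correct ordering of the columns is "{}"'.format(
        ', '.join(LOCATION_COLUMNS))
    assert present == expected, message
```

With the columns printed earlier in this thread ('country', 'region', 'iso', ...), check_location_order(data.columns) passes, while a dataset that puts 'region' before 'country' raises an AssertionError.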

@ManuelAlvarezC Did you upload your PR for the testing fix mentioned in #41 (comment)?