ActivitySim/populationsim

ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int' in setup_data_structures.py

werdnabae opened this issue · 1 comment

I'm trying to synthesize a population for the entire state of California using PopulationSim, but I'm getting this error:

Traceback (most recent call last):
  File "C:/Users/andre/OneDrive/Documents/populationsim/california/run_populationsim.py", line 39, in <module>
    sys.exit(run(args))
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\activitysim\cli\run.py", line 284, in run
    pipeline.run(models=config.setting("models"), resume_after=resume_after)
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 650, in run
    run_model(model)
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\activitysim\core\pipeline.py", line 502, in run_model
    orca.run([step_name])
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\orca\orca.py", line 2177, in run
    step()
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\orca\orca.py", line 973, in __call__
    return self._func(**kwargs)
  File "C:\Users\andre\OneDrive\Documents\populationsim\populationsim\steps\setup_data_structures.py", line 353, in setup_data_structures
    = build_grouped_incidence_table(incidence_table, control_spec, seed_geography)
  File "C:\Users\andre\OneDrive\Documents\populationsim\populationsim\steps\setup_data_structures.py", line 229, in build_grouped_incidence_table
    hh_incidence_table['group_id'] = hh_incidence_table[hh_groupby_cols].merge(
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\frame.py", line 9843, in merge
    return merge(
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 162, in merge
    return op.get_result(copy=copy)
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 809, in get_result
    join_index, left_indexer, right_indexer = self._get_join_info()
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 1065, in _get_join_info
    (left_indexer, right_indexer) = self._get_join_indexers()
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 1038, in _get_join_indexers
    return get_join_indexers(
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 1665, in get_join_indexers
    zipped = zip(*mapped)
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 1662, in <genexpr>
    _factorize_keys(left_keys[n], right_keys[n], sort=sort, how=how)
  File "C:\Users\andre\anaconda3\envs\popsim\lib\site-packages\pandas\core\reshape\merge.py", line 2428, in _factorize_keys
    llab = rizer.factorize(lk)  # type: ignore[arg-type]
  File "pandas\_libs\hashtable_class_helper.pxi", line 3045, in pandas._libs.hashtable.Int64Factorizer.factorize
ValueError: Buffer dtype mismatch, expected 'const int64_t' but got 'int'
Closing remaining open files:output\pipeline.h5...done
ERROR - activitysim run encountered an unrecoverable error

Process finished with exit code 1
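The traceback ends inside pandas' `Int64Factorizer.factorize`, which expects 64-bit integer merge keys but received a C `int` (32-bit) buffer. One possible reading is that this is an assumption, not something confirmed in this thread. Under that reading, some of the grouping columns merged at line 229 of `setup_data_structures.py` (`hh_incidence_table[hh_groupby_cols].merge(`) ended up as 32-bit integers. That can happen on Windows, where NumPy's default integer was 32 bits before NumPy 2.0. Below is a minimal sketch of the "assign a group id by merging against unique key combinations" pattern, with an explicit int64 cast. The helper `assign_group_ids` is hypothetical and is not PopulationSim's actual code.

```python
import numpy as np
import pandas as pd


def assign_group_ids(hh_table: pd.DataFrame, groupby_cols: list) -> pd.Series:
    """Hypothetical helper: give each distinct combination of groupby_cols a group_id.

    Assumption: the key columns are integer-valued. They are cast to int64
    up front so both sides of the merge share one 64-bit dtype, regardless
    of the platform's default integer width.
    """
    keys = hh_table[groupby_cols].astype(np.int64)

    # One row per distinct key combination, numbered 0..n-1 in first-seen order.
    groups = keys.drop_duplicates().reset_index(drop=True)
    groups["group_id"] = np.arange(len(groups), dtype=np.int64)

    # A left merge keeps the household row order; reattach the original index.
    merged = keys.merge(groups, on=groupby_cols, how="left")
    return pd.Series(merged["group_id"].to_numpy(), index=hh_table.index, name="group_id")
```

Because the cast happens before the merge, the merge keys on both sides are int64 even if the input columns were read as int32. Whether this matches the real failure depends on the actual dtypes of `hh_groupby_cols` in the California run.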

I previously synthesized a population for just Alameda County, California, with the same control variables, and that run succeeded. I checked the input data files, and both sets are in the same format. Could I be getting this error for all of California because the input data is too large, i.e., my machine is running out of memory? For reference, I'm running this on my personal computer with 16 GB of memory, but I plan to move to an AWS server soon.
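"Same format" can still hide a difference in column dtypes, and the memory question can be checked directly. Below is a minimal sketch, assuming both input tables (for example, the Alameda and California seed households) are already loaded as pandas DataFrames. It reports columns that appear in only one table, columns whose dtypes differ, and each table's approximate in-memory size. The function `compare_inputs` is hypothetical.

```python
import pandas as pd


def compare_inputs(small: pd.DataFrame, large: pd.DataFrame) -> dict:
    """Hypothetical diagnostic: compare two input tables that should share a format.

    Reports columns present in only one table, shared columns whose dtypes
    differ, and the approximate in-memory size of each table in megabytes.
    """
    shared = small.columns.intersection(large.columns)
    dtype_diffs = {
        col: (str(small[col].dtype), str(large[col].dtype))
        for col in shared
        if small[col].dtype != large[col].dtype
    }
    return {
        "only_in_small": sorted(set(small.columns) - set(large.columns)),
        "only_in_large": sorted(set(large.columns) - set(small.columns)),
        "dtype_diffs": dtype_diffs,
        # deep=True counts the actual contents of object (string) columns.
        "small_mb": small.memory_usage(deep=True).sum() / 1e6,
        "large_mb": large.memory_usage(deep=True).sum() / 1e6,
    }
```

In practice each table would come from something like `pd.read_csv(path)`. A non-empty `dtype_diffs` would point at a dtype problem rather than a memory problem.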

I have started running the code on a more powerful server, and I am no longer getting this error.