A work-in-progress modified implementation of HiPool to support experiments on CuRIAM.
HiPool (short for Hierarchical Pooling) is described in the paper "HiPool: Modeling Long Documents Using Graph Neural Networks" from ACL 2023.
This is not the original HiPool repo, and I am not an author of the HiPool paper. Please see the original repo linked below.
- HiPool paper: https://aclanthology.org/2023.acl-short.16/
- Original HiPool repo: https://github.com/irenezihuili/hipool
- HiPool's implementation is based on: https://github.com/helmy-elrais/RoBERT_Recurrence_over_BERT/blob/master/train.ipynb
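HiPool-style models first split a long document into fixed-size, overlapping chunks that fit a BERT-class encoder before building the hierarchical graph. A minimal sketch of that chunking step (the function name and sizes here are illustrative, not taken from this repo):

```python
def chunk_tokens(token_ids, chunk_len=512, overlap=50):
    """Split a token-id sequence into overlapping chunks.

    Illustrative helper: chunk_len/overlap values are typical for
    BERT-sized encoders, not the repo's actual configuration.
    """
    if chunk_len <= overlap:
        raise ValueError("chunk_len must exceed overlap")
    stride = chunk_len - overlap  # how far each chunk's start advances
    chunks = []
    for start in range(0, len(token_ids), stride):
        chunks.append(token_ids[start:start + chunk_len])
        if start + chunk_len >= len(token_ids):
            break  # the last chunk already reaches the end of the document
    return chunks

chunks = chunk_tokens(list(range(1000)), chunk_len=512, overlap=50)
# Chunks start at positions 0, 462, 924; each adjacent pair shares 50 tokens.
```

Each chunk is then encoded independently, and the per-chunk embeddings become nodes in the hierarchical graph that HiPool pools over.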
- Create conda/mamba environment.

```shell
mamba env create -f environment.yml
mamba activate hipool
```

- Install hipool locally.

```shell
pip install --upgrade build
pip install -e .
```
- Download datasets.
[work-in-progress]
This repo uses jaxtyping and typeguard to enforce correct tensor dimensions at runtime. If you see an unfamiliar type annotation or a decorator like in the example below, it's there for type checking.
```python
@jaxtyped(typechecker=typechecker)
def some_function(x: Float[torch.Tensor, "10 768"]):
    pass
```
I recommend taking a look at the jaxtyping docs.
- Some long documents are too large for GPU VRAM right now
- Batching should currently handle single documents, but this needs testing
- Eval needs its final pieces put together and then needs to be tested
- Decide on consistent variable names for type annotations
```bibtex
@inproceedings{li2023hipool,
  title={HiPool: Modeling Long Documents Using Graph Neural Networks},
  author={Li, Irene and Feng, Aosong and Radev, Dragomir and Ying, Rex},
  booktitle={Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)},
  year={2023},
  url={https://aclanthology.org/2023.acl-short.16/}
}
```