A helper to search, filter, and download papers from the ACL Anthology (https://aclanthology.org/).

acl-anthology-helper

License: MIT

Main Features

  • Retrieve papers from the ACL Anthology.
    Retrieve directly from the ACL Anthology website.
    e.g. Retriever.acl(2021, ConfConsts.LONG)
    Download the metadata of all papers to a local MySQL database.
    e.g.
    db = AnthologyMySQL(cache_enable=True)
    db.create_tables()
    db.load_data() # load data and put into database
  • Use ABuilder to support chained query operations for MySQL.
    e.g.
    data = ABuilder().table('paper').where({"year": ["in", years_limit]}).where({"venue": ["in", venue_limit]}).query()
  • Filter papers by keyword.
    e.g. filtered = papers.filter('title', 'xxx') | papers.filter('abstract', 'xxx')
    e.g. filtered = papers.and_containing_filter(attr, [keyword1, keyword2])
  • Download papers.
    e.g. downloader.multi_download(filtered, download_path)
  • Local cache available.
  • Log available.
  • Statistics available (currently only the total number of papers is counted).
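
To make the keyword-filter semantics above concrete, here is a minimal, self-contained sketch. It is not the project's implementation: the Paper and PaperList classes are hypothetical stand-ins, and case-insensitive substring matching is an assumption. It only illustrates how filter(...) | filter(...) can act as an OR over two attributes, and how and_containing_filter can require every keyword.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; the real project's paper class may differ.
    title: str
    abstract: str = ""


class PaperList:
    """Hypothetical container mimicking the filter calls listed above."""

    def __init__(self, papers: Iterable[Paper]):
        self.papers: List[Paper] = list(papers)

    def filter(self, attr: str, keyword: str) -> "PaperList":
        # Keep papers whose attribute contains the keyword (case-insensitive).
        kw = keyword.lower()
        return PaperList(p for p in self.papers if kw in getattr(p, attr).lower())

    def and_containing_filter(self, attr: str, keywords: List[str]) -> "PaperList":
        # Keep papers whose attribute contains every keyword.
        kws = [k.lower() for k in keywords]
        return PaperList(
            p for p in self.papers if all(k in getattr(p, attr).lower() for k in kws)
        )

    def __or__(self, other: "PaperList") -> "PaperList":
        # Union that preserves order and drops duplicates.
        seen = set()
        merged = []
        for p in self.papers + other.papers:
            if p not in seen:
                seen.add(p)
                merged.append(p)
        return PaperList(merged)

    def __len__(self) -> int:
        return len(self.papers)


papers = PaperList([
    Paper("Neural Text Generation", "We survey decoding methods."),
    Paper("Parsing with Transformers", "A study of text generation baselines."),
    Paper("Machine Translation", "Low-resource translation."),
])
# OR: keyword in the title or in the abstract.
either = papers.filter("title", "generation") | papers.filter("abstract", "generation")
# AND: every keyword must appear in the title.
both = papers.and_containing_filter("title", ["neural", "generation"])
```

In this sketch, either holds the first two papers and both holds only the first one.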

Get Started

  • First, MySQL is required (I use MySQL 8).
    Configure your MySQL database and add a src/configuration/mysql_cfg.py file.
    An example src/configuration/mysql_cfg.py is as follows:
class MySQLCFG(object):
    HOST = 'localhost'
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"

Also create the corresponding database on your MySQL server.
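
As a rough illustration of how such a configuration class might be consumed, the sketch below turns it into driver connection arguments and into a statement that creates the database. The helper names connection_kwargs and create_database_sql are hypothetical and not part of this project; the keyword names follow the style used by common MySQL drivers such as PyMySQL, and the utf8mb4 character set is an assumption.

```python
class MySQLCFG(object):
    # Same shape as the example configuration; values are placeholders.
    HOST = "localhost"
    PORT = 3306
    USER = "root"
    PASSWORD = "xxx"
    DB = "xxx"


def connection_kwargs(cfg) -> dict:
    # Hypothetical helper: keyword arguments in the form accepted by
    # common MySQL drivers (for example, PyMySQL's connect()).
    return {
        "host": cfg.HOST,
        "port": cfg.PORT,
        "user": cfg.USER,
        "password": cfg.PASSWORD,
        "database": cfg.DB,
    }


def create_database_sql(cfg) -> str:
    # Backticks quote the identifier; doubling escapes a backtick in the name.
    name = cfg.DB.replace("`", "``")
    return f"CREATE DATABASE IF NOT EXISTS `{name}` CHARACTER SET utf8mb4"


kwargs = connection_kwargs(MySQLCFG)
statement = create_database_sql(MySQLCFG)
```

The statement would need to be run over a connection opened without the database argument, since the database does not exist yet at that point.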

  • Secondly, if you want to use ABuilder:
    You need to create a tasks/database.py with your MySQL configuration.
    You can refer to the homepage of ABuilder.

In the latest version, tasks/database.py reads its settings from the configuration above, so you no longer need to create this file.

  • Download and decompress the code, open a terminal, and change to the root directory.
    Then run:
pip install -r requirements.txt
cd tasks
python basic_task.py

Running basic_task first downloads all papers within a given time span from the ACL Anthology to the local disk, and then searches the papers by the keywords you enter.
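
The store-then-search flow can be sketched without a MySQL server by using the standard-library sqlite3 module as a stand-in. This is only an illustration under stated assumptions: the paper table layout, the column names, and the search function are hypothetical and do not reflect the project's actual schema or code.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paper (title TEXT, year INTEGER, venue TEXT)")
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?)",
    [
        ("Neural Text Generation", 2021, "acl"),
        ("Text Generation with Plans", 2019, "emnlp"),
        ("Dependency Parsing", 2021, "acl"),
    ],
)


def search(conn, years, venues, keyword):
    # Assumes non-empty lists; one placeholder per value keeps the
    # query parameterized instead of formatting values into the SQL.
    year_marks = ", ".join("?" for _ in years)
    venue_marks = ", ".join("?" for _ in venues)
    sql = (
        f"SELECT title FROM paper WHERE year IN ({year_marks}) "
        f"AND venue IN ({venue_marks}) AND title LIKE ? ORDER BY title"
    )
    params = [*years, *venues, f"%{keyword}%"]
    return [row[0] for row in conn.execute(sql, params)]


# Restrict to a year span and venues, then search titles by keyword.
titles = search(conn, [2020, 2021], ["acl", "emnlp"], "generation")
```

Here the year and venue restrictions play the role of the "in" conditions in the ABuilder example, and the LIKE clause plays the role of the keyword search.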

Note

1. Comments

I developed this project with Python 3.6; it does not support Python 2.

2023.6.14: The code was updated to support the latest ACL Anthology pages. The current Python version is 3.10.

2023.7.2: Updated the README.

2. A survey paper was written with this tool

@article{tang2022recent,
  title={Recent advances in neural text generation: A task-agnostic survey},
  author={Tang, Chen and Guerin, Frank and Li, Yucheng and Lin, Chenghua},
  journal={arXiv preprint arXiv:2203.03047},
  year={2022}
}

3. Others

homepage

The ACL Anthology lists many conferences, each with its own content.

Choose one to see its list of papers.