To the unfortunate souls who have a postal account
A Python tool to parse PDF documents from Poste Italiane and convert them into structured JSON or CSV data. It automatically identifies the document type and validates financial data to ensure integrity.
- Automatic Document Detection: Identifies the document type (e.g., BancoPosta statement, Postepay report) from the PDF content.
- Data Validation: Performs validation checks on account statements to ensure balances and totals match the transactional data.
- Multi-Page Transaction Parsing: Handles transaction descriptions that continue from one page to the next.
- Multiple Output Formats: Export extracted data to JSON (default) or CSV formats.
- Batch Processing: Analyze a single PDF or an entire directory of documents at once.
- Easily Extendable: The design makes it simple to adapt the parser for future changes in PDF layouts or to support new document types.
- Estratto Conto BancoPosta
- Rendiconto Postepay Evolution
- Lista Movimenti Postepay Evolution
- Clone the repository:
git clone https://github.com/genbs/poste-italiane-parser.git
cd poste-italiane-parser
- Install the required dependencies:
pip install -r requirements.txt
Download the documents you wish to analyze from your Poste Italiane online account, then run the script from your terminal with the following options:
- -p, --path (Required): Path to the PDF file or a directory containing PDF files.
- -f, --format (Optional): Output format (json or csv). Defaults to json.
- -o, --output (Optional): Path for the output file or directory. By default, output is saved to the same directory as the input.
- -v, --verbose (Optional): Enable verbose logging for debugging purposes.
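For readers who want to see how options like these map onto Python's standard argparse module, here is a minimal sketch. It only mirrors the flags documented above; the function name build_arg_parser is hypothetical and this is not necessarily how main.py defines its interface.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Parse Poste Italiane PDF documents into JSON or CSV."
    )
    # --path is documented as required
    parser.add_argument("-p", "--path", required=True,
                        help="PDF file or directory containing PDF files")
    # --format accepts json or csv and defaults to json
    parser.add_argument("-f", "--format", choices=["json", "csv"], default="json",
                        help="Output format (default: json)")
    # --output is optional; None here stands for "same directory as the input"
    parser.add_argument("-o", "--output", default=None,
                        help="Output file or directory")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose logging")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_arg_parser().parse_args(["--path", "statement.pdf", "-f", "csv"])
print(args.path, args.format, args.output, args.verbose)
```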
# Extract data from a single PDF to a JSON file
python main.py --path "path/to/documents/statement.pdf"
# Extract data from a single PDF to CSV, specifying an output file
python main.py --path "path/to/documents/postepay_report.pdf" --format csv --output "output/report_data.csv"
# Extract data from all PDFs in a directory and save to an output folder
python main.py --path "path/to/documents/" -o "out/"
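The single-file and directory cases above could be handled along the following lines. This is a rough sketch under stated assumptions: collect_pdfs and default_output_path are hypothetical helpers, and naming the output file after the input PDF with the format as its extension is an assumption, not documented behavior.

```python
from pathlib import Path
from typing import List, Optional


def collect_pdfs(path: str) -> List[Path]:
    """Return the PDFs to process: every .pdf in a directory, or the single path given."""
    p = Path(path)
    if p.is_dir():
        return sorted(f for f in p.iterdir()
                      if f.is_file() and f.suffix.lower() == ".pdf")
    return [p]


def default_output_path(pdf: Path, fmt: str = "json",
                        output_dir: Optional[str] = None) -> Path:
    """Place the output next to the input unless an output directory is given.

    Assumption: the output file reuses the PDF's base name.
    """
    target_dir = Path(output_dir) if output_dir else pdf.parent
    return target_dir / f"{pdf.stem}.{fmt}"


for pdf in collect_pdfs("path/to/documents/statement.pdf"):
    print(default_output_path(pdf, "csv", "out"))
```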
You can also import and use the parser directly in your Python projects.
Install the package:
pip install poste_italiane_parser
Use it in your script:
from poste_italiane_parser import PosteItalianeParser
file_path = "path/to/your/statement.pdf"
try:
    data = PosteItalianeParser(file_path)
    # Print some of the extracted data
    print(f"Document Type: {data['document_type']}")
    print(f"Holder: {data['holder']}")
    print(f"Final Balance: {data['final_balance']}")
except ValueError as e:
    print(f"Error: {e}")
except FileNotFoundError:
    print(f"Error: The file was not found at {file_path}")
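Once the data is available in your script, you may want to post-process the transactions yourself. The sketch below writes a transactions list to CSV text with the standard csv module. It works on a plain dict that follows the output structure documented in this README; transactions_to_csv and the sample values are invented for illustration and are not part of the package.

```python
import csv
import io

# Column order follows the transaction fields documented in this README
FIELDS = ["accounting_date", "value_date", "description",
          "debits", "credits", "value"]


def transactions_to_csv(data: dict) -> str:
    """Serialize data["transactions"] to CSV text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for tx in data.get("transactions", []):
        writer.writerow(tx)  # missing keys are written as empty cells
    return buffer.getvalue()


# Invented sample data, shaped like the documented output
sample = {
    "transactions": [
        {
            "accounting_date": "2024-01-02 00:00:00",
            "value_date": "2024-01-02 00:00:00",
            "description": "Sample payment",
            "debits": 12.5,
            "credits": 0.0,
            "value": -12.5,
        }
    ]
}
print(transactions_to_csv(sample))
```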
The result of parsing
{
  "generated_at": "string | null",
  "document_type": "ESTRATTO_CONTO | LISTA_MOVIMENTI | RENDICONTO",
  "currency": "string",
  "initial_balance": "float | null",
  "final_balance": "float | null",
  "iban": "string | null",
  "holder": "string",
  "card_number": "string | null",
  "account_number": "string | null",
  "period": {
    "start_date": "string",
    "end_date": "string"
  },
  "customer": {
    "name": "string",
    "street": "string | null",
    "city": "string | null"
  },
  "transactions": [
    {
      "accounting_date": "string",
      "value_date": "string",
      "description": "string",
      "debits": "float",
      "credits": "float",
      "value": "float"
    }
  ]
}
Note: Dates are formatted as YYYY-MM-DD HH:MM:SS, and all monetary values are floats.
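As an illustration of working with these fields, the sketch below parses a date string in the stated format and checks that the balances agree with the transactions. The rule used (initial balance plus credits minus debits equals final balance, with debits stored as positive amounts) is an assumption made for this example, and parse_date and balances_match are hypothetical names; the parser's own validation may differ.

```python
import math
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # matches the note above


def parse_date(value: str) -> datetime:
    """Convert a date string from the output into a datetime object."""
    return datetime.strptime(value, DATE_FORMAT)


def balances_match(data: dict, tolerance: float = 0.005) -> bool:
    """Check initial + credits - debits against the final balance.

    Assumption: debits are positive amounts. Values are floats, so the
    comparison uses a small tolerance instead of exact equality.
    """
    if data.get("initial_balance") is None or data.get("final_balance") is None:
        return True  # nothing to compare; treated as passing in this sketch
    credits = math.fsum(tx["credits"] for tx in data["transactions"])
    debits = math.fsum(tx["debits"] for tx in data["transactions"])
    expected = data["initial_balance"] + credits - debits
    return abs(expected - data["final_balance"]) <= tolerance


# Invented sample data
statement = {
    "initial_balance": 100.0,
    "final_balance": 119.75,
    "transactions": [
        {"accounting_date": "2024-01-02 00:00:00", "credits": 50.0, "debits": 0.0},
        {"accounting_date": "2024-01-05 00:00:00", "credits": 0.0, "debits": 30.25},
    ],
}
print(parse_date(statement["transactions"][0]["accounting_date"]).date())
print(balances_match(statement))
```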
This repository does not include test PDFs to avoid committing sensitive personal data. Instead, tests are designed to run against result files.
To run the test suite, you must first create a [my-test-name].test.json file for each test case. This file is JSON-formatted and contains the expected output. Here is an example of how to structure your test result file:
{
  "path": "tests/xxx.pdf",
  "currency": "EURO",
  "generated_at": "xxx",
  "account_number": "xxxx",
  "period_start_date": "xxx",
  "period_end_date": "xxx",
  "holder": "xxx xxx",
  "customer_name": "xxx xxx",
  "customer_street": "xxx",
  "customer_city": "xxx",
  "initial_balance": 0,
  "final_balance": 0,
  "card_number": "",
  "iban": "xxxx",
  "transactions": [
    {
      "accounting_date": "xxx",
      "value_date": "xxx",
      "description": "xxx",
      "credits": 0,
      "debits": 0
    }
  ]
}
For the transactions, you can include all expected ones or just a subset.
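One way a subset comparison like this can work is sketched below: each expected transaction only needs to match some parsed transaction on the fields it specifies. How the project's test suite actually performs the comparison is not stated here, so transaction_matches and missing_transactions are hypothetical helpers.

```python
from typing import Dict, List


def transaction_matches(expected: Dict, actual: Dict) -> bool:
    """True if every field given in the expected entry equals the parsed value."""
    return all(actual.get(key) == value for key, value in expected.items())


def missing_transactions(expected: List[Dict], parsed: List[Dict]) -> List[Dict]:
    """Return the expected transactions that match no parsed transaction."""
    return [e for e in expected
            if not any(transaction_matches(e, a) for a in parsed)]


# Invented sample data: the expected list covers only one of two parsed rows
parsed = [
    {"accounting_date": "2024-01-02 00:00:00", "description": "Sample payment",
     "credits": 0.0, "debits": 10.0, "value": -10.0},
    {"accounting_date": "2024-01-05 00:00:00", "description": "Sample transfer",
     "credits": 25.0, "debits": 0.0, "value": 25.0},
]
expected = [{"description": "Sample payment", "credits": 0, "debits": 10.0}]

print(missing_transactions(expected, parsed))
```

An empty result means every expected transaction was found, even though the parsed list contains more rows than the expected one.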
Once your test result files are set up, run the tests with the verbose flag:
python -m unittest tests/test_PosteItalianeParser.py -v
Contributions are welcome. Please feel free to submit a pull request or open an issue for bugs, feature requests, or improvements.
This project is licensed under the MIT License. See the LICENSE file for details.