How to get the summary if the model output consists of int and str?
simon5u opened this issue · 1 comment
simon5u commented
Describe the bug
torchinfo.py", line 448, in traverse_input_data
result = aggregate(
TypeError: unsupported operand type(s) for +: 'int' and 'str'
It seems that torchinfo.py cannot aggregate model data that mixes types such as int and str. The error is raised in this branch of traverse_input_data:
    elif isinstance(data, Iterable) and not isinstance(data, str):
        aggregate = aggregate_fn(data)
        result = aggregate(
            [traverse_input_data(d, action_fn, aggregate_fn) for d in data]
        )
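The failure can be reproduced without torch or LAVIS. The sketch below is a simplified stand-in for the branch quoted above, not torchinfo's actual code: plain numbers play the role of tensors, and it assumes that the aggregate function is the built-in sum and that non-tensor leaves such as strings are returned unchanged. The names traverse and try_traverse are hypothetical.

```python
from collections.abc import Iterable


def traverse(data, action_fn, aggregate_fn):
    """Simplified stand-in for traverse_input_data (numbers act as tensors)."""
    if isinstance(data, (int, float)):
        return action_fn(data)
    if isinstance(data, Iterable) and not isinstance(data, str):
        aggregate = aggregate_fn(data)
        return aggregate([traverse(d, action_fn, aggregate_fn) for d in data])
    # Assumption: anything else (for example a str) is passed through as-is.
    return data


def try_traverse(data):
    """Run the traversal with sum as the aggregate and report any TypeError."""
    try:
        return traverse(data, action_fn=lambda x: x, aggregate_fn=lambda _: sum)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_traverse([1, 2]))            # numeric leaves only
print(try_traverse([8, "a caption"]))  # a str leaf reaches sum()
```

Under these assumptions, a single string anywhere in the nested structure is enough to make the aggregation fail, because sum starts from the integer 0 and then tries to add the string to it.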
To Reproduce
Steps to reproduce the behavior:
- Install the LAVIS package from https://github.com/salesforce/LAVIS, with these versions:
  salesforce-lavis 1.0.0
  transformers 4.25.0
- Run the following code to get the summary:
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess
# load sample image
raw_image = Image.open("docs/_static/merlion.png").convert("RGB")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# loads BLIP caption base model, with finetuned checkpoints on MSCOCO captioning dataset.
# this also loads the associated image processors
model, vis_processors, _ = load_model_and_preprocess(name="blip_caption", model_type="base_coco", is_eval=True, device=device)
# preprocess the image
# vis_processors stores image transforms for "train" and "eval" (validation / testing / inference)
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
# generate caption
output = model.generate({"image": image})
# ['a large fountain spewing water into the air']
from torchinfo import summary
text_input = ["a large statue of a person spraying water from a fountain"]
samples = {"image": image, "text_input": text_input}
summary(model, input_data=[samples])
Expected behavior
torchinfo should produce the model summary without raising a TypeError.
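One way the expected behavior could be reached is an aggregate that skips non-numeric leaves instead of adding them. The sketch below only illustrates that idea; numeric_sum is a hypothetical helper, not part of torchinfo and not a confirmed fix.

```python
from numbers import Number


def numeric_sum(values):
    """Sum only the numeric entries; str and other non-numeric leaves are ignored."""
    return sum(
        v for v in values
        if isinstance(v, Number) and not isinstance(v, bool)
    )


# Sizes mixed with a caption string, as in the failing case.
print(numeric_sum([8, "a large fountain spewing water into the air", 4]))
```

The trade-off is that strings contribute nothing to the reported size, which seems acceptable if the summary is only meant to account for tensor memory.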
JDRanpariya commented
I'm having the same issue. Is there any update on this?