Open
Labels
bug (Something isn't working)
Description
Describe the bug
The official MedSigLIP example explicitly resizes input images to 448×448 outside of the processor, and this step is missing from the current implementation. The processor does not resize the inputs; it only adds padding and constructs the vision-language sequence. It also does not throw an error as long as the input image size is a multiple of 16, so the missing resize can go unnoticed.
Check their official demonstration:
https://huggingface.co/google/medsiglip-448
To be specific, they have the following lines:
def resize(image):
    return Image.fromarray(
        tf_resize(
            images=image, size=[448, 448], method='bilinear', antialias=False
        ).numpy().astype(np.uint8)
    )

resized_imgs = [resize(img) for img in imgs]
texts = [
    "a photo of an arm with no rash",
    "a photo of an arm with a rash",
    "a photo of a leg with no rash",
    "a photo of a leg with a rash"
]
inputs = processor(text=texts, images=resized_imgs, padding="max_length", return_tensors="pt").to(device)
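Since the processor accepts other sizes silently, one possible stopgap is a guard that fails loudly when an image has not been resized yet. This is only a minimal sketch: `check_image_sizes` and `EXPECTED_SIZE` are hypothetical names, not part of MedSigLIP or `transformers`, and the helper assumes each image exposes a PIL-style `.size` attribute returning `(width, height)`.

```python
from typing import Iterable, Tuple

# Resolution used in the model card example (assumed to be the required input size).
EXPECTED_SIZE: Tuple[int, int] = (448, 448)


def check_image_sizes(images: Iterable, expected: Tuple[int, int] = EXPECTED_SIZE) -> None:
    """Raise ValueError if any image is not already at the expected (width, height).

    Hypothetical helper: assumes each image has a PIL-style `.size` attribute.
    """
    for index, image in enumerate(images):
        size = tuple(image.size)
        if size != expected:
            raise ValueError(
                f"image {index} has size {size}, expected {expected}; "
                "resize it before calling the processor"
            )
```

Calling `check_image_sizes(resized_imgs)` right before `processor(...)` would turn the silent mismatch into an explicit error. It does not replace the resize step itself.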