[Bug]: Missing resizing for MedSigLIP patch encoder #26

@CSMYang

Description

Describe the bug

MedSigLIP's official demo performs explicit input image resizing to 448×448 outside of the processor, and this step is missing from the current implementation. The processor function does not resize the inputs; it only adds padding and constructs a vision-language sequence. It also raises no error for mis-sized inputs as long as the image dimensions are a multiple of 16 (the patch size), so the problem passes silently.

Check their official demonstration:
https://huggingface.co/google/medsiglip-448

Specifically, the demo contains the following lines (imports reconstructed from context; `tf_resize` is assumed to be an alias for `tf.image.resize`):

import numpy as np
import tensorflow as tf
from PIL import Image

tf_resize = tf.image.resize  # alias assumed from the demo context

def resize(image):
    return Image.fromarray(
        tf_resize(
            images=image, size=[448, 448], method='bilinear', antialias=False
        ).numpy().astype(np.uint8)
    )


resized_imgs = [resize(img) for img in imgs]

texts = [
    "a photo of an arm with no rash",
    "a photo of an arm with a rash",
    "a photo of a leg with no rash",
    "a photo of a leg with a rash"
]

inputs = processor(text=texts, images=resized_imgs, padding="max_length", return_tensors="pt").to(device)
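For reference, a minimal sketch of the missing pre-resize step without a TensorFlow dependency, using PIL's bilinear resampling. This is an assumption, not the library's method: bilinear resampling here approximates, but is not pixel-identical to, the demo's `tf.image.resize(..., method='bilinear', antialias=False)`; the function name `resize_for_medsiglip` is hypothetical.

```python
from PIL import Image
import numpy as np

TARGET_SIZE = (448, 448)  # MedSigLIP's expected input resolution (width, height)

def resize_for_medsiglip(image: Image.Image) -> Image.Image:
    """Resize an image to 448x448 with bilinear interpolation.

    Hypothetical helper approximating the official demo's TensorFlow
    resize; exact pixel values may differ slightly.
    """
    return image.resize(TARGET_SIZE, resample=Image.BILINEAR)

# Example: a 512x384 dummy image is brought to 448x448 before being
# handed to the processor, which itself performs no resizing.
img = Image.fromarray(np.zeros((384, 512, 3), dtype=np.uint8))
resized = resize_for_medsiglip(img)
```

A step like this could either be documented as a required preprocessing step or folded into the processor itself so that non-448×448 inputs are handled instead of passing through silently.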

Metadata

Labels: bug (Something isn't working)