Commit graph

2 commits

Author SHA1 Message Date
Derek Higgins
87b8530e3f fix: handle encoding errors when adding files to vector store
- Add try-catch block around data.decode() to handle UnicodeDecodeError
- Implement UTF-8 fallback when detected encoding fails
- re-raise origional exception if fallback fails
- add unit tests

Fixes #2572: UnicodeDecodeError when uploading files with problematic encodings

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-07-03 17:36:33 +01:00
Kevin Postlethwait
a57985eeac
fix: add check for interleavedContent (#1973)
# What does this PR do?
Checks for RAGDocument of type InterleavedContent

I noticed when stepping through the code that the supported types for
`RAGDocument` included `InterleavedContent` as a content type. This type
is not checked against before putting the `doc.content` is regex matched
against. This would cause a runtime error. This change adds an explicit
check for type.

The only other part that I'm unclear on is how to handle the
`ImageContent` type since this would always just return `<image>` which
seems like an undesired behavior. Should the `InterleavedContent` type
be removed from `RAGDocument` and replaced with `URI | str`?

## Test Plan


[//]: # (## Documentation)

---------

Signed-off-by: Kevin <kpostlet@redhat.com>
2025-05-06 09:55:07 -07:00