- Update mllama to take the cross-attention state as embeddings in a batch, more similar to how Llava handles it. This improves integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses Clip and Mllama similarly.

Co-authored-by: Michael Yang <mxyng@pm.me>
| Patch |
|---|
| 0001-cuda.patch |
| 0002-pretokenizer.patch |
| 0003-metal.patch |
| 0004-ggml-metal.patch |
| 0005-embeddings.patch |
| 0006-clip-unicode.patch |
| 0007-solar-pro.patch |
| 0008-conditional-fattn.patch |
| 0009-blas.patch |
| 0010-add-mllama-support.patch |
| 0011-add-unpad-operator.patch |
| 0012-fix-deepseek-deseret-regex.patch |