ollama

Jongwook Choi 12e8c12d2b Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261) When CUDA peer access is enabled, multi-gpu inference will produce garbage output. This is a known bug of llama.cpp (or nvidia). Until the upstream bug is fixed, we can disable CUDA peer access temporarily to ensure correct output. See #961.		2023-11-24 14:05:57 -05:00
ggml@9e232f0234	subprocess llama.cpp server (#401)	2023-08-30 16:35:03 -04:00
gguf@0b871f1a04	update llama.cpp	2023-11-21 09:50:02 -08:00
patches	update llama.cpp	2023-11-21 09:50:02 -08:00
generate_darwin_amd64.go	consistent cpu instructions on macos and linux	2023-11-22 16:26:46 -05:00
generate_darwin_arm64.go	update llama.cpp	2023-11-21 09:50:02 -08:00
generate_linux.go	Disable CUDA peer access as a workaround for multi-gpu inference bug (#1261)	2023-11-24 14:05:57 -05:00
generate_windows.go	restore building runner with `AVX` on by default (#900)	2023-10-27 12:13:44 -07:00