vllm/docs/source
Latest commit 2ff767b513: Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)
Author: Adrian Abeyta
Date: 2024-04-03 14:15:55 -07:00
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>, HaiShaw <hixiao@gmail.com>, AdrianAbeyta <Adrian.Abeyta@amd.com>, Matthew Wong <Matthew.Wong2@amd.com>, root <root@gt-pla-u18-08.pla.dcgpu>, mawong-amd <156021403+mawong-amd@users.noreply.github.com>, ttbachyinsda <ttbachyinsda@outlook.com>, guofangze <guofangze@kuaishou.com>, Michael Goin <mgoin64@gmail.com>, jacobthebanana <50071502+jacobthebanana@users.noreply.github.com>, Woosuk Kwon <woosuk.kwon@berkeley.edu>
Name              Last commit message                                             Last commit date
assets            fix document error for value and v_vec illustration (#3421)    2024-03-15 16:06:09 -07:00
dev               [Doc] Add docs about OpenAI compatible server (#3288)          2024-03-18 22:05:34 -07:00
getting_started   [Hardware][Intel] Add CPU inference backend (#3634)            2024-04-01 22:07:30 -07:00
models            [Model] Add support for Qwen2MoeModel (#3346)                   2024-03-28 15:19:59 +00:00
quantization      Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)   2024-04-03 14:15:55 -07:00
serving           Usage Stats Collection (#2852)                                  2024-03-28 22:16:12 -07:00
conf.py           [Doc] Fix vLLMEngine Doc Page (#3791)                           2024-04-02 09:49:37 -07:00
index.rst         Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290)   2024-04-03 14:15:55 -07:00