vllm/layers at 5467ac319636245ded483b31967ac43e543c5fa3 - vllm

bnellnm 5467ac3196 [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)		2024-06-09 16:23:30 -04:00
fused_moe	[Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047)	2024-06-09 16:23:30 -04:00
ops	[Mypy] Part 3 fix typing for nested directories for most of directory (#4161)	2024-04-22 21:32:44 -07:00
quantization	[Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale (#5353)	2024-06-08 13:54:05 -04:00
__init__.py	Change the name to vLLM (#150)	2023-06-17 03:07:40 -07:00
activation.py	[Misc] Add CustomOp interface for device portability (#5255)	2024-06-05 09:18:19 -07:00
layernorm.py	[Misc] Add CustomOp interface for device portability (#5255)	2024-06-05 09:18:19 -07:00
linear.py	[Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776)	2024-06-01 14:51:10 -06:00
logits_processor.py	[Misc] Skip for logits_scale == 1.0 (#5291)	2024-06-05 15:19:02 -07:00
pooler.py	[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734)	2024-05-11 11:30:37 -07:00
rejection_sampler.py	[Core][2/N] Model runner refactoring part 2. Combine prepare prefill / decode to a single API (#4681)	2024-05-15 14:00:10 +09:00
rotary_embedding.py	[Misc] Add CustomOp interface for device portability (#5255)	2024-06-05 09:18:19 -07:00
sampler.py	[CORE] Improvement in ranks code (#4718)	2024-05-12 17:47:47 -07:00
vocab_parallel_embedding.py	[Core] Change LoRA embedding sharding to support loading methods (#5038)	2024-06-06 19:07:57 -07:00