| Name | Latest commit | Date |
|------|---------------|------|
| `core` | [FIX] Fix styles in automatic prefix caching & add a automatic prefix caching benchmark (#3158) | 2024-03-03 14:37:18 -08:00 |
| `engine` | Add health check, make async Engine more robust (#3015) | 2024-03-04 22:01:40 +00:00 |
| `entrypoints` | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| `lora` | [Neuron] Support inference with transformers-neuronx (#2569) | 2024-02-28 09:34:34 -08:00 |
| `model_executor` | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| `transformers_utils` | Support starcoder2 architecture (#3089) | 2024-02-29 00:51:48 -08:00 |
| `worker` | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| `__init__.py` | Bump up to v0.3.3 (#3129) | 2024-03-01 12:58:06 -08:00 |
| `block.py` | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| `config.py` | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| `logger.py` | Make vLLM logging formatting optional (#2877) | 2024-02-20 14:38:55 -08:00 |
| `outputs.py` | Add metrics to RequestOutput (#2876) | 2024-02-20 21:55:57 -08:00 |
| `py.typed` | Add py.typed so consumers of vLLM can get type checking (#1509) | 2023-10-30 14:50:47 -07:00 |
| `sampling_params.py` | [Fix] Don't deep-copy LogitsProcessors when copying SamplingParams (#3099) | 2024-02-29 19:20:42 +00:00 |
| `sequence.py` | Push logprob generation to LLMEngine (#3065) | 2024-03-04 19:54:06 +00:00 |
| `test_utils.py` | Use CuPy for CUDA graphs (#2811) | 2024-02-13 11:32:06 -08:00 |
| `utils.py` | [Minor fix] The domain dns.google may cause a socket.gaierror exception (#3176) | 2024-03-04 19:17:12 +00:00 |