| File | Last commit | Date |
|---|---|---|
| __init__.py | Change the name to vLLM (#150) | 2023-06-17 03:07:40 -07:00 |
| cache_engine.py | [Model] Jamba support (#4115) | 2024-07-02 23:11:29 +00:00 |
| cpu_model_runner.py | [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) | 2024-07-31 10:38:45 +08:00 |
| cpu_worker.py | [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) | 2024-07-26 13:50:10 -07:00 |
| embedding_model_runner.py | [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) | 2024-08-09 13:55:13 -07:00 |
| enc_dec_model_runner.py | [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) | 2024-08-14 17:55:42 +00:00 |
| model_runner_base.py | [BugFix] Fix use of per-request seed with pipeline parallel (#6698) | 2024-07-30 10:40:08 -07:00 |
| model_runner.py | [Core] Fix tracking of model forward time in case of PP>1 (#7440) | 2024-08-16 13:46:01 -07:00 |
| neuron_model_runner.py | [Bugfix] update neuron for version > 0.5.0 (#7175) | 2024-08-15 09:44:14 -07:00 |
| neuron_worker.py | [Bugfix] update neuron for version > 0.5.0 (#7175) | 2024-08-15 09:44:14 -07:00 |
| openvino_model_runner.py | [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) | 2024-07-31 10:38:45 +08:00 |
| openvino_worker.py | [core][distributed] support n layers % pp size != 0 (#6115) | 2024-07-03 16:40:31 -07:00 |
| tpu_model_runner.py | [TPU] Use mark_dynamic to reduce compilation time (#7340) | 2024-08-10 18:12:22 -07:00 |
| tpu_worker.py | [TPU] Set per-rank XLA cache (#7533) | 2024-08-14 14:47:51 -07:00 |
| utils.py | [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) | 2024-08-06 16:51:47 -04:00 |
| worker_base.py | [misc][plugin] add plugin system implementation (#7426) | 2024-08-13 16:24:17 -07:00 |
| worker.py | [misc] use nvml to get consistent device name (#7582) | 2024-08-16 21:15:13 -07:00 |
| xpu_model_runner.py | [VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126) | 2024-08-14 17:55:42 +00:00 |
| xpu_worker.py | [ci] set timeout for test_oot_registration.py (#7082) | 2024-08-02 10:03:24 -07:00 |