| Name | Last commit message | Last commit date |
| --- | --- | --- |
| fp8 | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| production_monitoring | [Doc]Replace deprecated flag in readme (#4526) | 2024-05-29 22:26:33 +00:00 |
| api_client.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| aqlm_example.py | AQLM CUDA support (#3287) | 2024-04-23 13:59:33 -04:00 |
| gradio_openai_chatbot_webserver.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llava_example.py | [Core] Consolidate prompt arguments to LLM engines (#4328) | 2024-05-28 13:29:31 -07:00 |
| llm_engine_example.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| logging_configuration.md | [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) | 2024-05-01 17:34:40 -07:00 |
| multilora_inference.py | [CI] Try introducing isort. (#3495) | 2024-03-25 07:59:47 -07:00 |
| offline_inference_arctic.py | [Model] Snowflake arctic model implementation (#4652) | 2024-05-09 22:37:14 +00:00 |
| offline_inference_distributed.py | [Doc] Update Ray Data distributed offline inference example (#4871) | 2024-05-17 10:52:11 -07:00 |
| offline_inference_embedding.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| offline_inference_neuron.py | [Hardware][Neuron] Refactor neuron support (#3471) | 2024-03-22 01:22:17 +00:00 |
| offline_inference_openai.md | [Frontend] Support OpenAI batch file format (#4794) | 2024-05-15 19:13:36 -04:00 |
| offline_inference_with_prefix.py | [Bugfix] Set enable_prefix_caching=True in prefix caching example (#3703) | 2024-03-28 16:26:30 -07:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_chat_completion_client.py | Add example scripts to documentation (#4225) | 2024-04-22 16:36:54 +00:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_embedding_client.py | [Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) | 2024-05-11 11:30:37 -07:00 |
| openai_example_batch.jsonl | [docs] Fix typo in examples filename openi -> openai (#4864) | 2024-05-17 00:42:17 +09:00 |
| save_sharded_state.py | [Core] Implement sharded state loader (#4690) | 2024-05-15 22:11:54 -07:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Fix Baichuan chat template (#3340) | 2024-03-15 21:02:12 -07:00 |
| template_chatglm2.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatglm.jinja | Add chat templates for ChatGLM (#3418) | 2024-03-14 23:19:22 -07:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_falcon_180b.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_falcon.jinja | Add chat templates for Falcon (#3420) | 2024-03-14 23:19:02 -07:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| tensorize_vllm_model.py | [Frontend] [Core] perf: Automatically detect vLLM-tensorized model, update tensorizer to version 2.9.0 (#4208) | 2024-05-13 14:57:07 -07:00 |
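
Several of the scripts listed above (offline_inference.py and its variants) follow the same offline-generation pattern: build an `LLM`, define `SamplingParams`, and call `generate` on a batch of prompts without starting an API server. Below is a minimal sketch of that pattern, assuming vLLM's public `LLM`/`SamplingParams` API; the prompts and model name are illustrative choices, not taken from any particular example file.

```python
# Minimal sketch of the offline-inference pattern shared by the
# offline_inference*.py examples (model name and prompts are illustrative).
from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Load a model and run batched generation locally, without an API server.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")
```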