| Name | Last commit message | Last commit date |
| --- | --- | --- |
| production_monitoring | Refactor Prometheus and Add Request Level Metrics (#2316) | 2024-01-31 14:58:07 -08:00 |
| api_client.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| gradio_openai_chatbot_webserver.py | Add gradio chatbot for openai webserver (#2307) | 2024-01-11 19:45:56 -08:00 |
| gradio_webserver.py | Remove deprecated parameter: concurrency_count (#2315) | 2024-01-03 09:56:21 -08:00 |
| llm_engine_example.py | Refactor LLMEngine demo script for clarity and modularity (#1413) | 2023-10-30 09:14:37 -07:00 |
| multilora_inference.py | multi-LoRA as extra models in OpenAI server (#2775) | 2024-02-17 12:00:48 -08:00 |
| offline_inference_distributed.py | Add one example to run batch inference distributed on Ray (#2696) | 2024-02-02 15:41:42 -08:00 |
| offline_inference_neuron.py | [Neuron] Support inference with transformers-neuronx (#2569) | 2024-02-28 09:34:34 -08:00 |
| offline_inference_with_prefix.py | Add Automatic Prefix Caching (#2762) | 2024-03-02 00:50:01 -08:00 |
| offline_inference.py | [Quality] Add code formatter and linter (#326) | 2023-07-03 11:31:55 -07:00 |
| openai_chatcompletion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| openai_completion_client.py | lint: format all python file instead of just source code (#2567) | 2024-01-23 15:53:06 -08:00 |
| template_alpaca.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_baichuan.jinja | Add baichuan chat template jinjia file (#2390) | 2024-01-09 09:13:02 -08:00 |
| template_chatml.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |
| template_inkbot.jinja | Support chat template and echo for chat API (#1756) | 2023-11-30 16:43:13 -08:00 |