| Name | Last commit message | Last commit date |
| --- | --- | --- |
| master | Add CUDA graph-based all reduce launcher (#26) | 2023-04-05 11:16:57 -07:00 |
| models | Optimize data movement (#20) | 2023-04-02 00:30:17 -07:00 |
| worker | Add CUDA graph-based all reduce launcher (#26) | 2023-04-05 11:16:57 -07:00 |
| block.py | Support beam search & parallel generation (#7) | 2023-03-10 09:58:21 -08:00 |
| sampling_params.py | FastAPI-based working frontend (#10) | 2023-03-29 14:48:56 +08:00 |
| utils.py | FastAPI-based working frontend (#10) | 2023-03-29 14:48:56 +08:00 |