| Author | Commit | Message | Date |
|---|---|---|---|
| Zhuohan Li | e3f00d191e | Modify README to include info on loading LLaMA (#18) | 2023-04-01 01:07:57 +08:00 |
| Woosuk Kwon | 80a2f812f1 | Implement LLaMA (#9) (Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>) | 2023-03-30 12:25:32 +08:00 |
| Zhuohan Li | 721fa3df15 | FastAPI-based working frontend (#10) | 2023-03-29 14:48:56 +08:00 |
| Zhuohan Li | 2f49f15585 | Support tensor parallel (#2) | 2023-03-21 13:45:42 -07:00 |
| Woosuk Kwon | e9d3f2ff77 | Add memory analyzer & automatically configure KV cache size (#6) | 2023-03-11 23:23:14 -08:00 |
| Woosuk Kwon | 3e9f991d6a | Use FlashAttention for multi_query_kv_attention (#4) | 2023-03-01 21:13:08 -08:00 |
| Woosuk Kwon | c84c708a1d | Add README | 2023-02-24 12:04:49 +00:00 |
| Woosuk Kwon | e7d9d9c08c | Initial commit | 2023-02-09 11:24:15 +00:00 |