Commit Graph

  • e8ddc08ec8
    [BUG FIX] Upgrade fschat version to 0.2.23 (#650) YHPeter 2023-08-02 17:05:59 -0400
  • 1b0bd0fe8a
    Add Falcon support (new) (#592) Zhuohan Li 2023-08-02 14:04:39 -0700
  • 20044cab7a
    Fix log message in scheduler (#652) Lily Liu 2023-08-02 13:35:10 -0700
  • 64f23c2900
    fix Baichuan position embeddings, which differ between the 7B and 13B models (#643) Song 2023-08-02 13:22:51 +0800
  • d4c7755ca8
    fix Baichuan-7B tensor parallelism (#598) Qing 2023-08-02 06:41:36 +0800
  • aa39e42c5a
    fix doc (#622) Chaofan Lin 2023-08-01 04:11:57 +0800
  • 953f28cf9a
    fix ModuleNotFoundError (#599) Fang li 2023-07-30 11:52:41 +0800
  • c0d00f5be6
    [Fix] fix import error of RayWorker (#604) (#605) Xudong Zhang 2023-07-28 14:37:40 +0800
  • 58a072be15
    [Fix] Add model sequence length into model config (#575) Zhuohan Li 2023-07-25 23:46:30 -0700
  • 82ad323dee
    [Fix] Add chat completion Example and simplify dependencies (#576) Zhuohan Li 2023-07-25 23:45:48 -0700
  • df5dd3c68e
    Add Baichuan-7B to README (#494) Zhuohan Li 2023-07-25 15:25:12 -0700
  • 2d867b55fa
    fixed "tensor parallel is not defined" error (#564) MoeedDar 2023-07-25 22:16:51 +0100
  • d7a1c6d614
    Fix paged attention testing. (#495) Tao Peng 2023-07-25 12:01:56 +0800
  • 7d5a155e4a
    [Fix] Fix GPTBigCode for distributed execution (#503) Zhuohan Li 2023-07-24 18:36:33 -0700
  • 1dde34e0f8
    Fix "GPTJConfig has no attribute rotary" error (#532) leegohi04517 2023-07-25 02:29:30 +0800
  • 6fc2a38b11
    Add support for LLaMA-2 (#505) Zhuohan Li 2023-07-20 11:38:27 -0700
  • c487a221ee
    Fix bad assert in initialize_cluster if PG already exists (#526) Antoni Baum 2023-07-19 23:17:12 -0700
  • 9925c17940
    Ray placement group support (#397) Antoni Baum 2023-07-19 22:49:31 -0700
  • 8c4b2592fb
    fix: enable trust-remote-code in api server & benchmark. (#509) Ricardo Lu 2023-07-20 08:06:15 +0800
  • cf21a9bd5c
    support trust_remote_code in benchmark (#518) WRH 2023-07-20 08:02:40 +0800
  • 16c3e295a8
    fix(ray_utils): ignore re-init error (#465) Massimiliano Pronesti 2023-07-20 02:01:19 +0200
  • bda41c70dd
    hotfix: attention ALiBi without head mapping (#496) Song 2023-07-19 02:31:48 +0800
  • 453bafb96f
    Merge pull request #498 from MoeedDar/main Lily Liu 2023-07-18 09:22:56 -0700
  • 328d231c17
    Fixed old name reference for max_seq_len MoeedDar 2023-07-18 16:47:59 +0100
  • b4b195b360
    fix max seq len (#489) Lily Liu 2023-07-17 23:20:20 -0700
  • 20b0d88d16
    Add support for Baichuan (#365) codethazine 2023-07-17 21:50:55 +0100
  • 2bdea7ac11
    [Fix] Fix the condition of max_seq_len (#477) Zhuohan Li 2023-07-17 00:33:48 -0400
  • 58df2883cb
    [Doc] Add doc for running vLLM on the cloud (#426) Zhanghao Wu 2023-07-16 13:37:14 -0700
  • 6d7d95a70a
    Offload port selection to OS (#467) Zhangir Azerbayev 2023-07-16 02:11:02 -0400
  • 96853af5a8
    Optimize MQA Kernel (#452) Zhuohan Li 2023-07-14 20:06:40 -0400
  • dbed69058c
    Fix the KeyError when loading bloom-based models (#441) Wen Sun 2023-07-14 12:58:09 +0800
  • 7b6ae94059
    add vocab padding for LLaMA (supports WizardLM) (#411) panda 2023-07-14 11:56:22 +0800
  • c6dfc3cdbe
    Fix handling of special tokens in decoding. (#418) xcnick 2023-07-12 23:14:56 +0800
  • 51be365143
    fix: freeze pydantic to v1 (#429) Keming 2023-07-12 23:10:55 +0800
  • c894836108
    [Model] Add support for GPT-J (#226) Andre Slavescu 2023-07-08 20:55:16 -0400
  • 75beba29b5
    Don't try to load training_args.bin (#373) Fazlul Shahriar 2023-07-08 18:26:28 -0400
  • ddfdf470ae
    Add trust_remote_code arg to get_config (#405) Woosuk Kwon 2023-07-08 15:24:17 -0700
  • b6fbb9a565
    Sort the outputs before return (#402) Woosuk Kwon 2023-07-08 14:48:18 -0700
  • 2179e4f4c5
    avoid python list copy in sequence initialization (#401) Lily Liu 2023-07-08 12:42:08 -0700
  • a945fcc2ae
    Add trust-remote-code flag to handle remote tokenizers (#364) codethazine 2023-07-07 20:04:58 +0200
  • be54f8e5c4
    [Fix] Change /generate response-type to json for non-streaming (#374) Nicolas Frenay 2023-07-06 20:15:17 -0500
  • b396cb4998
    fix: send [DONE] only once in streaming responses. (#378) Ricardo Lu 2023-07-07 09:08:40 +0800
  • 1c395b4eaa
    Bump up the version (#300) Woosuk Kwon 2023-07-04 21:41:53 -0700
  • 3d64cf019e
    [Server] Use the fastchat.model.model_adapter.get_conversation_template method to get the model template (#357) akxxsb 2023-07-05 12:39:59 +0800
  • 98fe8cb542
    [Server] Add option to specify chat template for chat endpoint (#345) Zhuohan Li 2023-07-03 23:01:56 -0700
  • ffa6d2f9f9
    [Docs] Fix typo (#346) Woosuk Kwon 2023-07-03 16:51:47 -0700
  • 404422f42e
    [Model] Add support for MPT (#334) Woosuk Kwon 2023-07-03 16:47:53 -0700
  • 7717d0838b
    Fix an endless loop issue when engine_step throws a RuntimeError (#339) coolcloudcol 2023-07-04 06:22:28 +0800
  • 42e0c1df78
    [Quality] Add CI for formatting (#343) Zhuohan Li 2023-07-03 14:50:56 -0700
  • e41f06702c
    Add support for BLOOM (#331) Woosuk Kwon 2023-07-03 13:12:35 -0700
  • d6fa1be3a8
    [Quality] Add code formatter and linter (#326) Zhuohan Li 2023-07-03 11:31:55 -0700
  • 0ffded812a
    [Fix] Better error message for batched prompts (#342) Zhuohan Li 2023-07-03 09:27:31 -0700
  • 0bd2a573a5
    Allow sending a list of strings as the prompt to the OpenAI demo endpoint /v1/completions (#323) Michele Catalano 2023-07-03 18:17:50 +0200
  • 49b26e2cec
    feat: add ChatCompletion endpoint in OpenAI demo server. (#330) Ricardo Lu 2023-07-03 13:54:33 +0800
  • dafd924c1f
    Raise error for long prompt (#273) Lily Liu 2023-06-30 18:48:49 -0700
  • 598dc4b79a
    [Fix] Weight loading for GPTBigCode (#313) Zhuohan Li 2023-06-29 22:14:17 -0700
  • 85de093472
    [Fix] Do not pin memory when in WSL (#312) Zhuohan Li 2023-06-29 15:00:21 -0700
  • f72297562f
    Add news for the vllm+skypilot example (#314) Zhanghao Wu 2023-06-29 12:32:37 -0700
  • 9d27b09d12
    Update README.md (#306) Bayang 2023-06-29 14:52:15 +0100
  • 998d9d1509
    [Tokenizer] Add tokenizer mode (#298) Woosuk Kwon 2023-06-28 14:19:22 -0700
  • 425040d4c1
    remove floats == 0 comparison (#285) Lily Liu 2023-06-28 14:11:51 -0700
  • 4338cc4750
    [Tokenizer] Add an option to specify tokenizer (#284) Woosuk Kwon 2023-06-28 09:46:58 -0700
  • bdd6b4c8bc
    Add LLM.set_tokenizer (#283) Jishnu Ray Chowdhury 2023-06-28 02:28:29 -0500
  • 2b7d3aca2e
    Update setup.py (#282) Cody Yu 2023-06-27 14:34:23 -0700
  • 4026a049d3
    expand coverage of gpt2 model loading (#271) twaka 2023-06-27 22:27:41 +0900
  • 43710e8d09
    [Fix] Fix default port number in benchmark scripts (#265) Zhuohan Li 2023-06-26 13:15:35 -0700
  • 526df28fb2
    [BugFix] Fix a bug in counting running sequences (#266) Woosuk Kwon 2023-06-26 13:09:02 -0700
  • 2cf1a333b6
    [Doc] Documentation for distributed inference (#261) Zhuohan Li 2023-06-26 11:34:23 -0700
  • 0b7db411b5
    [Bug] Fix the OOM condition for CPU cache (#260) Zhuohan Li 2023-06-26 11:16:13 -0700
  • 471a7a4566
    Make compatible with the Decapoda Research LLaMA HF version (#251) BasicCoder 2023-06-27 00:23:57 +0800
  • 6214dd6ce9
    Update README.md (#236) Lianmin Zheng 2023-06-25 16:58:06 -0700
  • 0603379863
    fix incorrect use of getattr to get a dict value (#232) metacryptom 2023-06-25 13:00:24 +0800
  • 665c48963b
    [Docs] Add GPTBigCode to supported models (#213) Woosuk Kwon 2023-06-22 15:05:11 -0700
  • 298695b766
    GPTBigCode (StarCoder, SantaCoder Support) (#209) Michael Feil 2023-06-22 19:49:27 +0200
  • 83658c8ace
    Bump up version to 0.1.1 (#204) Zhuohan Li 2023-06-22 15:33:32 +0800
  • 1d24ccb96c
    [Fix] Better error message when there is OOM during cache initialization (#203) Zhuohan Li 2023-06-22 15:30:06 +0800
  • 14f0b39cda
    [Bugfix] Fix a bug in RequestOutput.finished (#202) Woosuk Kwon 2023-06-22 00:17:24 -0700
  • 2e0d314384
    fix-ray (#193) Zhuohan Li 2023-06-22 00:21:41 +0800
  • 67d96c29fb
    Use slow tokenizer for OpenLLaMA models (#168) Woosuk Kwon 2023-06-19 23:19:47 -0700
  • 033f5c78f5
    Remove e.g. in README (#167) Zhuohan Li 2023-06-20 14:00:28 +0800
  • 794e578de0
    [Minor] Fix URLs (#166) Woosuk Kwon 2023-06-19 22:57:14 -0700
  • caddfc14c1
    [Minor] Fix icons in doc (#165) Woosuk Kwon 2023-06-19 20:35:38 -0700
  • fc72e39de3
    Change image urls (#164) Zhuohan Li 2023-06-20 11:15:15 +0800
  • b7e62d3454
    Fix repo & documentation URLs (#163) Woosuk Kwon 2023-06-19 20:03:40 -0700
  • 364536acd1
    [Docs] Minor fix (#162) Woosuk Kwon 2023-06-19 19:58:23 -0700
  • 0b32a987dd
    Add and list supported models in README (#161) Zhuohan Li 2023-06-20 10:57:46 +0800
  • 570fb2e9cc
    [PyPI] Fix package info in setup.py (#158) Woosuk Kwon 2023-06-19 18:05:01 -0700
  • a255885f83
    Add logo and polish readme (#156) Zhuohan Li 2023-06-19 16:31:13 +0800
  • 5822ede66e
    Add performance figures for dark mode (#160) Woosuk Kwon 2023-06-18 23:46:24 -0700
  • 0370afa2e5
    Remove benchmark_async_llm_server.py (#155) Zhuohan Li 2023-06-19 11:12:37 +0800
  • 7e2a913c64
    [Minor] Fix CompletionOutput.__repr__ (#157) Woosuk Kwon 2023-06-18 19:58:25 -0700
  • 3f92038b99
    Add comments on swap space (#154) Woosuk Kwon 2023-06-18 11:39:35 -0700
  • dcda03b4cb
    Write README and front page of doc (#147) Woosuk Kwon 2023-06-18 03:19:38 -0700
  • bf5f121c02
    Reduce GPU memory utilization to make sure OOM doesn't happen (#153) Zhuohan Li 2023-06-18 17:33:50 +0800
  • bec7b2dc26
    Add quickstart guide (#148) Zhuohan Li 2023-06-18 01:26:12 +0800
  • 0b98ba15c7
    Change the name to vLLM (#150) Woosuk Kwon 2023-06-17 03:07:40 -0700
  • e5464ee484
    Rename servers to engines (#152) Zhuohan Li 2023-06-17 17:25:21 +0800
  • bab8f3dd0d
    [Minor] Fix benchmark_throughput.py (#151) Woosuk Kwon 2023-06-16 21:00:52 -0700
  • eedb46bf03
    Rename servers and change port numbers to reduce confusion (#149) Zhuohan Li 2023-06-17 00:13:02 +0800
  • 311490a720
    Add script for benchmarking serving throughput (#145) Woosuk Kwon 2023-06-14 19:55:38 -0700