Commit Graph

69 Commits

Author SHA1 Message Date
youkaichao
18b296fdb2
[core] remove beam search from the core (#9105) 2024-10-07 05:47:04 +00:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion (#8839) 2024-10-01 16:04:42 -07:00
Nick Hill
4b377d6feb
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829) 2024-09-26 16:46:43 -07:00
youkaichao
0250dd68c5
re-implement beam search on top of vllm core (#8726)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
2024-09-23 22:08:12 -07:00
jiqing-feng
5f7bb58427
Fix typical acceptance sampler with correct recovered token ids (#8562) 2024-09-23 12:32:27 -07:00
Lily Liu
c6bd70d772
[SpecDec][Misc] Cleanup, remove bonus token logic. (#8701) 2024-09-22 12:34:14 -07:00
youkaichao
0faab90eb0
[beam search] add output for manually checking the correctness (#8684) 2024-09-20 19:55:33 -07:00
Cyrus Leung
0455c46ed4
[Core] Factor out common code in SequenceData and Sequence (#8675) 2024-09-21 02:30:39 +00:00
Lily Liu
e6a26ed037
[SpecDecode][Kernel] Flashinfer Rejection Sampling (#7244) 2024-09-01 21:23:29 -07:00
Cyrus Leung
51f86bf487
[mypy][CI/Build] Fix mypy errors (#7929) 2024-08-27 23:47:44 -07:00
Peng Guanwen
f710fb5265
[Core] Use flashinfer sampling kernel when available (#7137)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-08-19 03:24:03 +00:00
SangBin Cho
ff7ec82c4d
[Core] Optimize SPMD architecture with delta + serialization optimization (#7109) 2024-08-18 17:57:20 -07:00
jon-chuang
50b8d08dbd
[Misc/Testing] Use torch.testing.assert_close (#7324) 2024-08-16 04:24:04 +00:00
William Lin
57b7be0e1c
[Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) 2024-08-09 05:42:45 +00:00
Lily Liu
5c60c8c423
[SpecDecode] [Minor] Fix spec decode sampler tests (#7183) 2024-08-06 10:40:32 -07:00
Nick Hill
5cf9254a9c
[BugFix] Fix use of per-request seed with pipeline parallel (#6698) 2024-07-30 10:40:08 -07:00
Peng Guanwen
db9e5708a9
[Core] Reduce unnecessary compute when logprobs=None (#6532) 2024-07-29 16:47:31 +00:00
Thomas Parnell
d4201e06d5
[Bugfix] Make spec. decode respect per-request seed. (#6034)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-07-18 19:22:08 -07:00
Thomas Parnell
4ef95b0f06
[Bugfix] use float32 precision in samplers/test_logprobs.py for comparing with HF (#6409)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-07-15 13:14:49 -04:00
sroy745
80ca1e6a3a
[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348) 2024-07-01 00:33:05 -07:00
Roger Wang
736ed38849
[CI/Build] Fix Args for _get_logits_warper in Sampler Test (#5922) 2024-06-27 11:43:04 -07:00
Thomas Parnell
e5150f2c28
[Bugfix] Added test for sampling repetition penalty bug. (#5659)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2024-06-19 06:03:55 +00:00
sroy745
fa9e385229
[Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) 2024-06-17 21:29:09 -05:00
Cyrus Leung
0e9164b40a
[mypy] Enable type checking for test directory (#5017) 2024-06-15 04:45:31 +00:00
youkaichao
8ea5e44a43
[CI/Test] improve robustness of test (vllm_runner) (#5357)
[CI/Test] improve robustness of test by replacing del with context manager (vllm_runner) (#5357)
2024-06-08 08:59:20 +00:00
youkaichao
9fb900f90c
[CI/Test] improve robustness of test (hf_runner) (#5347)
[CI/Test] improve robustness of test by replacing del with context manager (hf_runner) (#5347)
2024-06-07 22:31:32 -07:00
zifeitong
974fc9b845
[Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226) 2024-06-04 19:37:28 -07:00
Simon Mo
ed59a7ed23
Update test_ignore_eos (#4898) 2024-06-02 02:21:53 +00:00
Cyrus Leung
5ae5ed1e60
[Core] Consolidate prompt arguments to LLM engines (#4328)
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-05-28 13:29:31 -07:00
Cyrus Leung
350f9e107f
[CI/Build] Move test_utils.py to tests/utils.py (#4425)
Since #4335 was merged, I've noticed that the definition of ServerRunner in the tests is the same as in the test for OpenAI API. I have moved the class to the test utilities to avoid code duplication. (Although it only has been repeated twice so far, I will add another similar test suite in #4200 which would duplicate the code a third time)

Also, I have moved the test utilities file (test_utils.py) to under the test directory (tests/utils.py), since none of its code is actually used in the main package. Note that I have added __init__.py to each test subpackage and updated the ray.init() call in the test utilities file in order to relative import tests/utils.py.
2024-05-13 23:50:09 +09:00
Chang Su
e254497b66
[Model][Misc] Add e5-mistral-7b-instruct and Embedding API (#3734) 2024-05-11 11:30:37 -07:00
Woosuk Kwon
190bc838e1
[Misc] Remove unnecessary ModelRunner imports (#4703) 2024-05-09 00:17:17 -07:00
Cody Yu
f942efb5a3
[Dynamic Spec Decoding] Auto-disable by the running queue size (#4592)
Co-authored-by: Cade Daniel <edacih@gmail.com>
2024-05-08 21:44:00 +00:00
SangBin Cho
3521ba4f25
[Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) 2024-05-03 10:20:12 -07:00
SangBin Cho
6f1df80436
[Test] Add ignore_eos test (#4519) 2024-05-01 08:45:42 -04:00
Nick Hill
81661da7b2
[BugFix] Fix min_tokens when eos_token_id is None (#4389)
Co-authored-by: DefTruth <31974251+deftruth@users.noreply.github.com>
2024-04-27 09:52:46 -07:00
SangBin Cho
603ad84815
[Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) 2024-04-26 13:02:02 +00:00
Cade Daniel
62b8aebc6f
[Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) 2024-04-23 08:02:36 +00:00
Antoni Baum
69e1d2fb69
[Core] Refactor model loading code (#4097) 2024-04-16 11:34:39 -07:00
Nick Hill
e46a60aa4c
[BugFix] Fix handling of stop strings and stop token ids (#3672) 2024-04-11 15:34:12 -07:00
Travis Johnson
0258b7a94b
[Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty (#3876)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2024-04-10 01:39:56 -07:00
胡译文
b3104b2a10
[Bugfix] Fix logits processor when prompt_logprobs is not None (#3899) 2024-04-10 00:09:36 -07:00
SangBin Cho
26422e477b
[Test] Make model tests run again and remove --forked from pytest (#3631)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2024-03-28 21:06:40 -07:00
Nick Hill
dfeb2ecc3a
[Misc] Include matched stop string/token in responses (#2976)
Co-authored-by: Sahil Suneja <sahilsuneja@gmail.com>
2024-03-25 17:31:32 -07:00
Simon Mo
f408d05c52
hotfix isort on logprobs ranks pr (#3622) 2024-03-25 11:55:46 -07:00
Travis Johnson
c13ad1b7bd
feat: implement the min_tokens sampling parameter (#3124)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-03-25 10:14:26 -07:00
Swapnil Parekh
819924e749
[Core] Adding token ranks along with logprobs (#3516)
Co-authored-by: Swapnil Parekh <swapnilp@ibm.com>
2024-03-25 10:13:10 -07:00
SangBin Cho
01bfb22b41
[CI] Try introducing isort. (#3495) 2024-03-25 07:59:47 -07:00
Woosuk Kwon
925f3332ca
[Core] Refactor Attention Take 2 (#3462) 2024-03-25 04:39:33 +00:00
Roy
f1c0fc3919
Migrate logits computation and gather to model_runner (#3233) 2024-03-20 23:25:01 +00:00