Hi,

I've noticed that the provided pred.py script is only compatible with chat models. If I wanted to use a non-chat model (for example, Llama2-7b instead of Llama2-7b-chat), how would I do this? I added llama2 to the JSON files in the config folder and attempted to modify the generation code from:

to:

However, the model's predictions are always 'null', and vLLM raises no error. For example:

INFO 08-06 04:46:11 [async_llm.py:269] Added request cmpl-ba91b440856f49ec9aa181e79e9dbb21-0.
INFO: 127.0.0.1:47438 - "POST /v1/completions HTTP/1.1" 200 OK
INFO 08-06 04:46:11 [logger.py:41] Received request cmpl-cab8ab6cf4bb4fe180efb27e5dfb59bf-0: prompt: 'those corridors and he could have killed Frank without realising he’d got the wrong man. As it happens, we only have Derek’s word for it that Stefan ever went into the room.\n</text>\n\nWhat is the correct answer to this question: Please try to deduce the true story based on the evidence currently known. Who murdered Frank Parris in your deduction?\nChoices:\n(A) Aiden MacNeil\n(B) Martin Williams\n(C) Stefan Codrescu\n(D) Lisa Treherne\n\nFormat your response as follows: "The correct answer is (insert answer here)".', params: SamplingParams(n=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=0.1, top_p=0.9, top_k=0, min_p=0.0, seed=None, stop=[], stop_token_ids=[], bad_words=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=128, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None, guided_decoding=None, extra_args=None), prompt_token_ids: [1, 386, 852, 1034, 2429, 943, 322, 540, 1033, 505, 9445, 4976, 1728, 1855, 5921, 540, 30010, 29881, 2355, 278, 2743, 767, 29889, 1094, 372, 5930, 29892, 591, 871, 505, 360, 20400, 30010, 29879, 1734, 363, 372, 393, 21512, 3926, 3512, 964, 278, 5716, 29889, 13, 829, 726, 29958, 13, 13, 5618, 338, 278, 1959, 1234, 304, 445, 1139, 29901, 3529, 1018, 304, 21049, 346, 278, 1565, 5828, 2729, 373, 278, 10757, 5279, 2998, 29889, 11644, 13406, 287, 4976, 1459, 3780, 297, 596, 21049, 428, 29973, 13, 15954, 1575, 29901, 13, 29898, 29909, 29897, 319, 3615, 4326, 8139, 309, 13, 29898, 29933, 29897, 6502, 11648, 13, 29898, 29907, 29897, 21512, 315, 397, 690, 4979, 13, 29898, 29928, 29897, 29420, 6479, 2276, 484, 13, 13, 5809, 596, 2933, 408, 4477, 29901, 376, 1576, 1959, 1234, 338, 313, 7851, 1234, 1244, 29897, 1642], prompt_embeds shape: None, lora_request: None.
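To clarify the kind of change I mean, here is a minimal sketch of the two request/response shapes (the model names and sampling values below are placeholders, not my exact code): the chat endpoint takes a `messages` list and returns text under `choices[0].message.content`, while the plain completions endpoint takes a raw `prompt` string and returns text under `choices[0].text`.

```python
# Hypothetical sketch of the chat -> completions change; model names and
# sampling parameters are placeholders, not the exact code from pred.py.

def build_chat_payload(prompt: str, model: str = "llama2-7b-chat") -> dict:
    # /v1/chat/completions expects a "messages" list of role/content dicts.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "temperature": 0.1,
        "top_p": 0.9,
    }


def build_completion_payload(prompt: str, model: str = "llama2-7b") -> dict:
    # /v1/completions takes the raw "prompt" string instead of "messages".
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0.1,
        "top_p": 0.9,
    }


def extract_chat_text(response: dict) -> str:
    # Chat responses nest the generated text under message.content ...
    return response["choices"][0]["message"]["content"]


def extract_completion_text(response: dict) -> str:
    # ... while completion responses put it directly under "text".
    return response["choices"][0]["text"]
```

The extraction helpers just make explicit that the two endpoints return the generated text in different places, which is the part of the code I changed.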
I run it as follows: