
Releases: mscheong01/llama.cpp

b3659

03 Sep 06:39
48baa61
server : test script : add timeout for all requests (#9282)
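The timeout change guards the server test suite against a hung endpoint. A minimal sketch of the pattern, assuming a hypothetical `fetch` helper and an illustrative default value (neither name is from the PR's actual code):

```python
import urllib.request

DEFAULT_TIMEOUT = 10  # seconds; illustrative default, not the PR's value

def fetch(url: str, timeout: float = DEFAULT_TIMEOUT):
    # Every request carries a timeout so a stalled server fails the test
    # quickly instead of blocking the CI job indefinitely.
    return urllib.request.urlopen(url, timeout=timeout)
```

With a per-request timeout, a stalled response raises a timeout exception that the test harness can report as a failure rather than hanging forever.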

b3592

16 Aug 05:52
2a24c8c
Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)

* Add nemotron GGUF conversion & inference support

* Fix formatting issues

* Remove unnecessary write_tensors()

* Update convert_hf_to_gguf.py

Co-authored-by: compilade <[email protected]>

* Update src/llama.cpp

Co-authored-by: compilade <[email protected]>

* Address comments by @compilade

* Replace ggml_mul_mat()->llm_build_lora_mm()

* Remove mutable variable

* Use  for bias tensors

* Cover corner case for rope_scaling not in config.json

---------

Co-authored-by: compilade <[email protected]>

b3557

09 Aug 13:37
3071c0a
llava : support MiniCPM-V-2.5 (#7599)

* init

* rename

* add run android for termux in readme

* add android readme

* add instructions in readme

* change name in readme

* Update README.md

* fixed line

* add result in readme

* random pos_embed

* add positions index

* change for ollama

* change for ollama

* better pos_embed in clip

* support ollama

* update CMakeLists

* update CMakeLists

* rename wrapper

* clear code

* replace and organize code

* add link

* sync master

* fix warnings

* fix warnings

* fix bug in bicubic resize when the image needs to be resized smaller

* address review comments

* address review comments

* put all code into llava dir

* fix quality problem in pr code

* change n_layer

* add space in "-1"

* imitate reshape bug of python code

* fix bug in clip

* fix issues for merging

* fix llama-minicpmv-cli in cmake file

* change pr readme

* fix code review

* remove the directory entry at line 33 of the top-level CMakeLists.txt (in the main dir, not in examples)

* fix cmakefile

* add warn

* fix KEY_HAS_MINICPMV_PROJ

* move load_image_size into clip_ctx

* remove the extern "C", MINICPMV_API

* fix uhd code for review comment

* delete minicpmv-wrapper in pr

* remove uhd_image_embed

* Modify 2 notes

* clip : style changes

* del common.h in clip

* fix Type-Check error

* fix Type-Check error

* fix Type-Check error

* fix Type-Check error

* fix makefile error

* fix ubuntu-make error

* try fix clip

* try fix 1

---------

Co-authored-by: Hongji Zhu <[email protected]>
Co-authored-by: harvestingmoon <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>

b2607

04 Apr 16:26
c666ba2
build CI: Name artifacts (#6482)

Name the artifacts in the build CI so that they get uploaded with separate names, instead of all being put into the same `artifact` ZIP.

It might be possible to further simplify the packing step (in future PRs).

b2527

25 Mar 11:38
ad3a050
Server: clean up OAI params parsing function (#6284)

* server: clean up oai parsing function

* fix response_format

* fix empty response_format

* minor fixes

* add TODO for logprobs

* update docs
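For context, OpenAI-compatible clients send `response_format` as an object such as `{"type": "json_object"}`, and it may also be absent or empty. A hedged sketch of the kind of parsing this cleanup targets (the function name, return values, and error message are illustrative, not the server's actual code):

```python
def parse_response_format(body: dict):
    # response_format may be absent, an empty object, or {"type": ...};
    # all three cases must be handled without crashing.
    fmt = body.get("response_format")
    if not fmt:
        return None  # no output constraint requested
    fmt_type = fmt.get("type")
    if fmt_type == "json_object":
        return "json"
    raise ValueError(f"unsupported response_format type: {fmt_type!r}")
```

The empty-object case is exactly the "fix empty response_format" commit above: a bare `{}` should behave like an absent field, not an error.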

b2520

24 Mar 12:42
ea279d5
ci : close inactive issue, increase operations per run (#6270)

b2480

21 Mar 07:32
c5b8595
Add nvidia and amd backends (#6157)

b2393

11 Mar 03:08
3814a07
[SYCL] Add support for SYCL Nvidia target (#5738)

* Add support for nvidia target in CMake

* Update sycl read-me for Nvidia target

* Fix errors

b2357

07 Mar 10:42
2002bc9
server : refactor (#5882)

* server : refactoring (wip)

* server : remove llava/clip objects from build

* server : fix empty prompt handling + all slots idle logic

* server : normalize id vars

* server : code style

* server : simplify model chat template validation

* server : code style

* server : minor

* llama : llama_chat_apply_template support null buf

* server : do not process embedding requests when disabled

* server : reorganize structs and enums + naming fixes

* server : merge oai.hpp in utils.hpp

* server : refactor system prompt update at start

* server : disable cached prompts with self-extend

* server : do not process more than n_batch tokens per iter

* server: tests: embeddings use a real embeddings model (#5908)

* server, tests : bump batch to fit 1 embedding prompt

* server: tests: embeddings fix build type Debug is randomly failing (#5911)

* server: tests: embeddings, use different KV Cache size

* server: tests: embeddings, fixed prompt do not exceed n_batch, increase embedding timeout, reduce number of concurrent embeddings

* server: tests: embeddings, no need to wait for server idle as it can time out

* server: refactor: clean up http code (#5912)

* server : avoid n_available var

ggml-ci

* server: refactor: better http codes

* server : simplify json parsing + add comment about t_last

* server : rename server structs

* server : allow to override FQDN in tests

ggml-ci

* server : add comments

---------

Co-authored-by: Pierrick Hymbert <[email protected]>

b2355

07 Mar 01:49
e04e04f
ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906)

Fixes #5694
Fixes ggerganov/whisper.cpp#1894