Releases · mscheong01/llama.cpp

03 Sep 06:39

48baa61

b3659 Latest

server : test script : add timeout for all requests (#9282)

Assets 19

cudart-llama-bin-win-cu11.7.1-x64.zip (293 MB, 2024-09-03T06:39:49Z)
cudart-llama-bin-win-cu12.2.0-x64.zip (413 MB, 2024-09-03T06:39:55Z)
llama-b3659-bin-macos-arm64.zip (49.8 MB, 2024-09-03T06:40:04Z)
llama-b3659-bin-macos-x64.zip (51.4 MB, 2024-09-03T06:40:05Z)
llama-b3659-bin-ubuntu-x64.zip (55.3 MB, 2024-09-03T06:40:07Z)
llama-b3659-bin-win-avx-x64.zip (7.76 MB, 2024-09-03T06:40:08Z)
llama-b3659-bin-win-avx2-x64.zip (7.75 MB, 2024-09-03T06:40:09Z)
llama-b3659-bin-win-avx512-x64.zip (7.76 MB, 2024-09-03T06:40:09Z)
llama-b3659-bin-win-cuda-cu11.7.1-x64.zip (145 MB, 2024-09-03T06:40:10Z)
llama-b3659-bin-win-cuda-cu12.2.0-x64.zip (144 MB, 2024-09-03T06:40:14Z)
Source code (zip) (2024-09-02T20:08:38Z)
Source code (tar.gz) (2024-09-02T20:08:38Z)

16 Aug 05:52

github-actions

b3592

2a24c8c

Add Nemotron/Minitron GGUF Conversion & Inference Support (#8922)

* Add nemotron GGUF conversion & inference support

* Fix formatting issues

* Remove unnecessary write_tensors()

* Update convert_hf_to_gguf.py

Co-authored-by: compilade

* Update src/llama.cpp

Co-authored-by: compilade

* Address comments by @compilade

* Replace ggml_mul_mat()->llm_build_lora_mm()

* Remove mutable variable

* Use  for bias tensors

* Cover corner case for rope_scaling not in config.json

---------

Co-authored-by: compilade

Assets 19

09 Aug 13:37

github-actions

b3557

3071c0a

llava : support MiniCPM-V-2.5 (#7599)

* init

* rename

* add run android for termux in readme

* add android readme

* add instructions in readme

* change name in readme

* Update README.md

* fixed line

* add result in readme

* random pos_embed

* add positions index

* change for ollama

* change for ollama

* better pos_embed in clip

* support ollama

* update cmakelist

* update cmakelist

* rename wrapper

* clear code

* replace and organize code

* add link

* sync master

* fix warnings

* fix warnings

* fix bug in bicubic resize when the image needs to be resized smaller

* receive review comments and modify

* receive review comments and modify

* put all code into llava dir

* fix quality problem in pr code

* change n_layer

* add space in "-1"

* imitate reshape bug of python code

* fix bug in clip

* fix issues for merging

* fix llama-minicpmv-cli in cmake file

* change pr readme

* fix code review

* remove the directory on line 33 of /cmakelists.txt (in the main dir, not in the example)

* fix cmakefile

* add warn

* fix KEY_HAS_MINICPMV_PROJ

* remove load_image_size into clip_ctx

* remove the extern "C", MINICPMV_API

* fix uhd code for review comment

* delete minicpmv-wrapper in pr

* remove uhd_image_embed

* Modify 2 notes

* clip : style changes

* del common.h in clip

* fix Type-Check error

* fix Type-Check error

* fix Type-Check error

* fix Type-Check error

* fix makefile error

* fix ubuntu-make error

* try fix clip

* try fix 1

---------

Co-authored-by: Hongji Zhu
Co-authored-by: harvestingmoon
Co-authored-by: Georgi Gerganov

Assets 20

04 Apr 16:26

github-actions

b2607

c666ba2

build CI: Name artifacts (#6482)

Name the artifacts in the build CI, so that they get uploaded with separate names, instead of all put into the same `artifact` ZIP.

It might be possible to further simplify the packing step (in future PRs).

Assets 2

25 Mar 11:38

github-actions

b2527

ad3a050

Server: clean up OAI params parsing function (#6284)

* server: clean up oai parsing function

* fix response_format

* fix empty response_format

* minor fixes

* add TODO for logprobs

* update docs

Assets 18

24 Mar 12:42

github-actions

b2520

ea279d5

ci : close inactive issue, increase operations per run (#6270)

Assets 18

21 Mar 07:32

github-actions

b2480

c5b8595

Add nvidia and amd backends (#6157)

Assets 15

11 Mar 03:08

github-actions

b2393

3814a07

[SYCL] Add support for SYCL Nvidia target (#5738)

* Add support for nvidia target in CMake

* Update sycl read-me for Nvidia target

* Fix errors

Assets 14

07 Mar 10:42

github-actions

b2357

2002bc9

server : refactor (#5882)

* server : refactoring (wip)

* server : remove llava/clip objects from build

* server : fix empty prompt handling + all slots idle logic

* server : normalize id vars

* server : code style

* server : simplify model chat template validation

* server : code style

* server : minor

* llama : llama_chat_apply_template support null buf

* server : do not process embedding requests when disabled

* server : reorganize structs and enums + naming fixes

* server : merge oai.hpp in utils.hpp

* server : refactor system prompt update at start

* server : disable cached prompts with self-extend

* server : do not process more than n_batch tokens per iter

* server: tests: embeddings use a real embeddings model (#5908)

* server, tests : bump batch to fit 1 embedding prompt

* server: tests: embeddings fix build type Debug is randomly failing (#5911)

* server: tests: embeddings, use different KV Cache size

* server: tests: embeddings, fixed prompt do not exceed n_batch, increase embedding timeout, reduce number of concurrent embeddings

* server: tests: embeddings, no need to wait for server idle as it can time out

* server: refactor: clean up http code (#5912)

* server : avoid n_available var

ggml-ci

* server: refactor: better http codes

* server : simplify json parsing + add comment about t_last

* server : rename server structs

* server : allow to override FQDN in tests

ggml-ci

* server : add comments

---------

Co-authored-by: Pierrick Hymbert

Assets 14

07 Mar 01:49

github-actions

b2355

e04e04f

ggml : use SYS_get_cpu if SYS_getcpu is not defined (#5906)

Fixes #5694
Fixes ggerganov/whisper.cpp#1894
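
The fallback named in the title can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: it assumes a Linux/glibc target, and the helper name `current_cpu_and_node` is hypothetical. The idea is that some older headers expose the syscall number as `SYS_get_cpu`, so the familiar name is aliased to it before use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#if defined(__linux__)
#include <sys/syscall.h>
#include <unistd.h>
#endif

/* If only the older spelling is defined, alias the usual name to it. */
#if !defined(SYS_getcpu) && defined(SYS_get_cpu)
#define SYS_getcpu SYS_get_cpu
#endif

/* Hypothetical helper (not part of ggml): returns 0 and fills *cpu and
 * *node on success, or -1 if the call fails or is unavailable. */
static int current_cpu_and_node(unsigned *cpu, unsigned *node) {
    assert(cpu != NULL && node != NULL);
#if defined(SYS_getcpu)
    /* Direct syscall, for C libraries that lack a getcpu() wrapper. */
    return syscall(SYS_getcpu, cpu, node, NULL) == 0 ? 0 : -1;
#else
    (void) cpu;
    (void) node;
    return -1; /* no syscall number available on this platform */
#endif
}
```

A caller would pass two `unsigned` variables and check the return value before trusting them; on platforms where neither macro exists, the helper simply reports failure.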

Assets 14
