Freeze after offloading layers to GPU #3135

Closed
@CRD716

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

llama.cpp does not freeze and continues to run normally, without interfering with basic Windows operations.

Current Behavior

llm_load_print_meta: arch           = llama
llm_load_print_meta: vocab type     = SPM
llm_load_print_meta: n_vocab        = 32000
llm_load_print_meta: n_merges       = 0
llm_load_print_meta: n_ctx_train    = 4096
llm_load_print_meta: n_ctx          = 4096
llm_load_print_meta: n_embd         = 8192
llm_load_print_meta: n_head         = 64
llm_load_print_meta: n_head_kv      = 8
llm_load_print_meta: n_layer        = 80
llm_load_print_meta: n_rot          = 128
llm_load_print_meta: n_gqa          = 8
llm_load_print_meta: f_norm_eps     = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: n_ff           = 28672
llm_load_print_meta: freq_base      = 10000.0
llm_load_print_meta: freq_scale     = 1
llm_load_print_meta: model type     = 70B
llm_load_print_meta: model ftype    = mostly Q5_K - Medium
llm_load_print_meta: model size     = 68.98 B
llm_load_print_meta: general.name   = LLaMA v2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: LF token  = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.23 MB
llm_load_tensors: using CUDA for GPU acceleration
llm_load_tensors: mem required  = 35995.03 MB (+ 1280.00 MB per state)
llm_load_tensors: offloading 18 repeating layers to GPU
llm_load_tensors: offloaded 18/83 layers to GPU
llm_load_tensors: VRAM used: 10500 MB

llama.cpp then freezes and stops responding, with Task Manager showing 0% CPU and GPU load. The process also somehow cannot be killed from Task Manager, so I have to hard-reset my computer to end it. It causes general system instability as well; I am writing this with my desktop blacked out and File Explorer frozen.

Environment and Context

Windows 10
128 GB RAM
Threadripper 3970X
RTX 2080 Ti
CMake 3.27.4
CUDA 12.2

Failure Information (for bugs)

Steps to Reproduce

Run a model with cuBLAS.
(My exact command: main -ngl 18 -m E:\largefiles\LLAMA-2\70B\uni-tianyan-70b.Q5_K_M.gguf --color -c 4096 --temp 0.6 --repeat_penalty 1.1 -n -1 --interactive-first)
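
For completeness, a minimal sketch of the full build-and-run sequence (assuming a standard cuBLAS build: LLAMA_CUBLAS was the CMake flag for enabling CUDA support at the time of this report, and the binary path below is illustrative):

cmake -B build -DLLAMA_CUBLAS=ON
cmake --build build --config Release
.\build\bin\Release\main.exe -ngl 18 -m E:\largefiles\LLAMA-2\70B\uni-tianyan-70b.Q5_K_M.gguf --color -c 4096 --temp 0.6 --repeat_penalty 1.1 -n -1 --interactive-first

The -ngl 18 flag requests 18 layers on the GPU, which matches the "offloaded 18/83 layers to GPU" line in the log above.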

Failure Logs

I'd love to attach them, but File Explorer stopped working. I'll try running it again tomorrow and upload the log before everything freezes.
