
Commit 82e8c4a
Update example
1 parent bdbc286

4 files changed: +153, −4853 lines


README.md
Lines changed: 3 additions & 2 deletions

@@ -1,5 +1,6 @@
 # Ensemble Inference for LLMs
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb)
+
 Niimi, J. (2025) "A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification" In Proceedings of the 30th International Conference on Natural Language & Information Systems (NLDB 2025)
 
 ## Overview
@@ -22,7 +23,7 @@ import ensemble_inference as ens
 This approach can be implemented in any LLMs; however, the models with wide pretraining and instruction-tuning are highly recommended. This example adopts `Llama-3-8B-Instruct`.
 
 ### You can refer sample on Google Colab
-[https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb)
+[https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb)
 
 ## Reference
 ```
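For context, the `import ensemble_inference as ens` shown in the hunk header refers to the repository's ensemble strategy: running the same classification prompt through the LLM several times with stochastic sampling and aggregating the outputs. A minimal sketch of that idea, assuming a hypothetical `classify_once` callable (not part of the repository's API) that returns one label per sampled generation:

```python
from collections import Counter

def ensemble_classify(prompt, classify_once, n_runs=5):
    """Majority vote over repeated stochastic generations.

    `classify_once` is a hypothetical callable (prompt -> label) standing in
    for a single sampled LLM inference; the repository's actual interface
    may differ.
    """
    labels = [classify_once(prompt) for _ in range(n_runs)]
    # Take the most common label across runs as the final prediction,
    # which stabilizes the result against sampling variance.
    return Counter(labels).most_common(1)[0][0]
```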

ensemble_inference.py
Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ def load_model(model_id='meta-llama/Meta-Llama-3-8B-Instruct', load_in_4bit=True
     if not torch.cuda.is_available():
         raise ValueError('Quantization with BitsAndBytes requires CUDA.')
     from transformers import BitsAndBytesConfig
-    bnb_config = BitsAndBytesConfig(load_in_4bit=True)
+    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
     tokenizer = AutoTokenizer.from_pretrained(model_id)
     model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map='auto')
     return model, tokenizer
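The one-line change pins the compute dtype for 4-bit inference: without `bnb_4bit_compute_dtype`, `BitsAndBytesConfig` defaults to float32 matmuls, which are slower on GPU. A minimal sketch of calling the updated loader (the prompt and generation settings below are illustrative, not from the repository):

```python
import torch
from ensemble_inference import load_model

# Loads Meta-Llama-3-8B-Instruct quantized to 4-bit; matmuls now run in
# float16 per the commit's bnb_4bit_compute_dtype change.
model, tokenizer = load_model(load_in_4bit=True)

# Illustrative single forward pass; the repo's inference helpers may differ.
inputs = tokenizer(
    "Classify the sentiment: I loved this movie.",
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```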
