
Commit 82e8c4a
Update example
1 parent bdbc286

4 files changed: +153, −4853 lines


README.md
Lines changed: 3 additions & 2 deletions

@@ -1,5 +1,6 @@
 # Ensemble Inference for LLMs
-[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb)
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb)
+
 Niimi, J. (2025) "A Simple Ensemble Strategy for LLM Inference: Towards More Stable Text Classification" In Proceedings of the 30th International Conference on Natural Language & Information Systems (NLDB 2025)
 
 ## Overview
@@ -22,7 +23,7 @@ import ensemble_inference as ens
 This approach can be implemented in any LLMs; however, the models with wide pretraining and instruction-tuning are highly recommended. This example adopts `Llama-3-8B-Instruct`.
 
 ### You can refer sample on Google Colab
-[https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/example.ipynb)
+[https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb](https://colab.research.google.com/github/jniimi/ensemble_inference/blob/main/sample.ipynb)
 
 ## Reference
 ```
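For context, the `import ensemble_inference as ens` shown in the hunk header refers to the repository's ensemble strategy: running the same classification prompt through the LLM several times with stochastic sampling and aggregating the outputs. A minimal sketch of that idea, assuming a hypothetical `classify_once` callable (not part of the repository's API) that returns one label per sampled generation:

```python
from collections import Counter

def ensemble_classify(prompt, classify_once, n_runs=5):
    """Majority vote over repeated stochastic generations.

    `classify_once` is a hypothetical callable (prompt -> label) standing in
    for a single sampled LLM inference; the repository's actual interface
    may differ.
    """
    labels = [classify_once(prompt) for _ in range(n_runs)]
    # Take the most common label across runs as the final prediction,
    # which stabilizes the result against sampling variance.
    return Counter(labels).most_common(1)[0][0]
```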

ensemble_inference.py
Lines changed: 1 addition & 1 deletion

@@ -12,7 +12,7 @@ def load_model(model_id='meta-llama/Meta-Llama-3-8B-Instruct', load_in_4bit=True
     if not torch.cuda.is_available():
         raise ValueError('Quantization with BitsAndBytes requires CUDA.')
     from transformers import BitsAndBytesConfig
-    bnb_config = BitsAndBytesConfig(load_in_4bit=True)
+    bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
     tokenizer = AutoTokenizer.from_pretrained(model_id)
     model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map='auto')
     return model, tokenizer
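The one-line change pins the compute dtype for 4-bit inference: without `bnb_4bit_compute_dtype`, `BitsAndBytesConfig` defaults to float32 matmuls, which are slower on GPU. A minimal sketch of calling the updated loader (the prompt and generation settings below are illustrative, not from the repository):

```python
import torch
from ensemble_inference import load_model

# Loads Meta-Llama-3-8B-Instruct quantized to 4-bit; matmuls now run in
# float16 per the commit's bnb_4bit_compute_dtype change.
model, tokenizer = load_model(load_in_4bit=True)

# Illustrative single forward pass; the repo's inference helpers may differ.
inputs = tokenizer(
    "Classify the sentiment: I loved this movie.",
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```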
