When running the Lit-LLaMA adapter script with --quantize llm.int8, I got this error:
ImportError: cannot import name 'Linear8bitLt' from 'lit_llama.quantization'
The first step is to check whether bitsandbytes itself is working, by running:
python -m bitsandbytes
This printed:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...
packages/bitsandbytes/functional.py", line 12, in <module>
from scipy.stats import norm
ModuleNotFoundError: No module named 'scipy'
The traceback shows the real problem: scipy is not installed, so bitsandbytes fails to import. The fix is to install scipy:
pip install scipy
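A quick way to catch this kind of missing dependency up front is to ask Python whether each package can be found, without actually importing it. The following is only a minimal sketch: the helper name `missing_modules` and the list of packages checked are my own choices for illustration, not part of Lit-LLaMA or bitsandbytes.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed list for illustration: the two packages involved in this error.
missing = missing_modules(["scipy", "bitsandbytes"])
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are installed.")
```

This only tells you whether the packages are present. It does not verify that bitsandbytes can find CUDA, so the `python -m bitsandbytes` check is still worth running afterwards.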
Then I ran the bitsandbytes check again:
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run
python -m bitsandbytes
and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...
+++++++++++++ /usr/local/cuda/lib64 CUDA PATHS +++++++++++++
++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.9']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Running a quick check that:
+ library is importable
+ CUDA function is callable
WARNING: Please be sure to sanitize sensible info from any such env vars!
SUCCESS!
Installation was successful!
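If you want to script this check (for example in a setup script), one option is to run the same command from Python and look for the success line in its output. This is a rough sketch under assumptions: the function names are hypothetical, and the marker string is simply the last line of the log above, which may differ in other bitsandbytes versions.

```python
import subprocess
import sys


def diagnostic_passed(output: str) -> bool:
    """True if the output contains the success line seen in the log above."""
    # Assumption: this exact message is printed on success; other versions may differ.
    return "Installation was successful!" in output


def run_bitsandbytes_check() -> bool:
    """Run `python -m bitsandbytes` with the current interpreter and check the result."""
    result = subprocess.run(
        [sys.executable, "-m", "bitsandbytes"],
        capture_output=True,
        text=True,
    )
    # The banner and any traceback may land on either stream, so inspect both.
    return diagnostic_passed(result.stdout + result.stderr)
```

Calling `run_bitsandbytes_check()` returns `True` only when the success line appears, so a missing scipy (as in the first run) would make it return `False`.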
With that, the problem is solved. The original command now runs:
python generate/adapter.py --prompt "Recommend a movie to watch on the weekend." --quantize llm.int8
Loading model ...
bin /home/dev/anaconda3/envs/lit/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so
Time to load model: 13.57 seconds.
I would definitely recommend watching the movie Titanic, which is a classic and will leave you feeling both sad and happy. It's a love story that's both heartbreaking and inspiring and has some of the best acting in a film. The soundtrack is also wonderful and really adds to the story.
Time for inference: 5.11 sec total, 12.72 tokens/sec
Memory used: 7.82 GB