
Solve ImportError: cannot import name 'Linear8bitLt' from 'lit_llama.quantization'

When running generate/adapter.py with --quantize llm.int8 in Lit-LLaMA, I got this error:

ImportError: cannot import name 'Linear8bitLt' from 'lit_llama.quantization'

The first step is to check whether bitsandbytes itself works, by running:

python -m bitsandbytes

This printed:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...
packages/bitsandbytes/functional.py", line 12, in <module>
    from scipy.stats import norm
ModuleNotFoundError: No module named 'scipy'
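One plausible way a missing dependency can surface as "cannot import name" rather than "No module named" is a guarded optional import. The sketch below assumes (this is not verified here) that the quantization module wraps its bitsandbytes import in a try/except and defines Linear8bitLt only when that import succeeds. The names fake_quantization and some_missing_backend_xyz are hypothetical stand-ins, not part of Lit-LLaMA or bitsandbytes.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for a module that guards an optional dependency.
MODULE_SOURCE = textwrap.dedent(
    """
    try:
        import some_missing_backend_xyz as backend  # hypothetical, not installed
    except ImportError:
        backend = None

    if backend is not None:
        class Linear8bitLt:  # only defined when the backend imports cleanly
            pass
    """
)


def import_error_message() -> str:
    """Write the stand-in module to a temp dir, import from it, return the error text."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_quantization.py").write_text(MODULE_SOURCE)
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            from fake_quantization import Linear8bitLt  # noqa: F401
        except ImportError as err:
            return str(err)
        finally:
            # Undo the path and module-cache changes made for the demo.
            sys.path.remove(tmp)
            sys.modules.pop("fake_quantization", None)
    return ""


print(import_error_message())
```

The module itself imports without complaint because the failure is swallowed, so the only visible symptom is the missing class name. That is why checking the underlying package directly is a useful first step.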

So the root cause is that scipy, which bitsandbytes imports, is not installed. The fix is to install it:

pip install scipy
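To confirm from Python that the packages can now be found, a minimal sketch using the standard library's importlib.util.find_spec is shown below. The helper name missing_modules is my own, and the list of modules checked is taken from the traceback above; it only tests whether a module can be located, not whether it imports without errors.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# scipy is the module named in the traceback; bitsandbytes is the package under test.
print(missing_modules(["scipy", "bitsandbytes"]))
```

An empty list means both packages are visible to the active interpreter. If a name is still listed, pip may have installed into a different environment than the one running the script.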

Then I re-ran the bitsandbytes check:

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
...

+++++++++++++ /usr/local/cuda/lib64 CUDA PATHS +++++++++++++


++++++++++++++++++++++++++ OTHER +++++++++++++++++++++++++++
COMPILED_WITH_CUDA = True
COMPUTE_CAPABILITIES_PER_GPU = ['8.9']
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
++++++++++++++++++++++ DEBUG INFO END ++++++++++++++++++++++
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Running a quick check that:
    + library is importable
    + CUDA function is callable


WARNING: Please be sure to sanitize sensible info from any such env vars!

SUCCESS!
Installation was successful!

With that, the problem is solved, and the original command runs:

python generate/adapter.py --prompt "Recommend a movie to watch on the weekend." --quantize llm.int8
Loading model ...

bin /home/dev/anaconda3/envs/lit/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so

Time to load model: 13.57 seconds.

I would definitely recommend watching the movie Titanic, which is a classic and will leave you feeling both sad and happy. It's a love story that's both heartbreaking and inspiring and has some of the best acting in a film. The soundtrack is also wonderful and really adds to the story.

Time for inference: 5.11 sec total, 12.72 tokens/sec
Memory used: 7.82 GB
