Categories: Machine Learning

Zeppelin and Spark installation on Ubuntu 24.04

To install the latest Zeppelin (I recommend 0.11.0, since 0.11.1 has bugs) and Spark 3.5.1, we need to go through several steps.

1. Required packages
sudo apt-get install openjdk-11-jdk build-essential 

Alternatively, you can install an older Java version if you run into errors:

sudo apt-get install openjdk-8-jdk-headless

2. Spark installation

wget -c https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz

sudo tar -xvvf spark-3.5.1-bin-hadoop3.tgz -C /opt/

sudo mv /opt/spark-3.5.1-bin-hadoop3 /opt/spark
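
Optionally, you can also put Spark on your PATH and verify the install before wiring it into Zeppelin. This is my own suggested check, not a required step; it assumes the /opt/spark path from above:

echo 'export SPARK_HOME=/opt/spark' >> ~/.bashrc
echo 'export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
# should print version 3.5.1 in the banner
spark-submit --version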

3. Zeppelin installation

wget -c https://downloads.apache.org/zeppelin/zeppelin-0.11.0/zeppelin-0.11.0-bin-all.tgz

sudo tar -xvvf zeppelin-0.11.0-bin-all.tgz -C /opt/

sudo mv /opt/zeppelin-0.11.0-bin-all /opt/zeppelin

4. Install Mambaforge

wget -c https://github.com/conda-forge/miniforge/releases/download/24.1.2-0/Mambaforge-24.1.2-0-Linux-x86_64.sh

bash Mambaforge-24.1.2-0-Linux-x86_64.sh -b -p ~/anaconda3

~/anaconda3/bin/mamba init

source ~/.bashrc

mamba create -n test python=3.10
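
You can double-check the new environment's interpreter path now, since the same path is used in the Zeppelin interpreter settings below (a quick sanity check, not a required step):

~/anaconda3/envs/test/bin/python --version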

5. Run and Configure Zeppelin

sudo /opt/zeppelin/bin/zeppelin-daemon.sh start
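
Zeppelin serves its UI on port 8080 by default, so after starting the daemon you can confirm it is up (my own verification step, assuming a default zeppelin-site configuration):

sudo /opt/zeppelin/bin/zeppelin-daemon.sh status
# then open http://localhost:8080 in your browser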

Then go to the Interpreter settings (top-right menu) and search for Spark. Make the following adjustments:

  1. SPARK_HOME = /opt/spark
  2. PYSPARK_PYTHON = /home/dev/anaconda3/envs/test/bin/python
  3. PYSPARK_DRIVER_PYTHON = /home/dev/anaconda3/envs/test/bin/python

Now you're done. Enjoy using Zeppelin on your Ubuntu machine!

Categories: Machine Learning

Fix Zeppelin Spark-interpreter-0.11.1.jar and Scala

When installing Zeppelin 0.11.1 and Spark 3.3.3 via Docker from my GitHub repo ( https://github.com/yodiaditya/docker-rapids-spark-zeppelin ), I received this error:

Caused by: org.apache.zeppelin.interpreter.InterpreterException: Fail to open SparkInterpreter
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:140)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	... 12 more
Caused by: scala.reflect.internal.FatalError: Error accessing /opt/zeppelin/interpreter/spark/._spark-interpreter-0.11.1.jar
	at scala.tools.nsc.classpath.AggregateClassPath.$anonfun$list$3(AggregateClassPath.scala:113)

Apparently, the solution is very simple. The '._*' files are AppleDouble metadata files that macOS adds when creating archives, and Scala's classpath scanner chokes on them because they are not valid jars.

All you need to do is delete the files causing the problem. In my case:

rm /opt/zeppelin/interpreter/spark/._spark-interpreter-0.11.1.jar

rm /opt/zeppelin/interpreter/spark/scala-2.12/._spark-scala-2.12-0.11.1.jar
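
If more of these macOS metadata files are scattered around the install, a broader sweep works too. A sketch, assuming the same /opt/zeppelin install path:

sudo find /opt/zeppelin -type f -name '._*' -delete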

Categories: Machine Learning

Solve Pandas Error: PerformanceWarning: DataFrame is highly fragmented.

When running toPandas() or another operation, I received this warning:

/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series

(the same warning repeats once for every column being converted)

The quick solution is to enable Arrow-based conversion:

spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
# On older Spark versions (< 3.0), use the legacy key instead:
# spark.conf.set("spark.sql.execution.arrow.enabled", "true")
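
The same setting can also be applied at submit time instead of inside the notebook. A sketch, where job.py is a placeholder for your own PySpark script:

spark-submit --conf spark.sql.execution.arrow.pyspark.enabled=true job.py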

Categories: Ubuntu

Solve /sbin/ldconfig.real: /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8 is not a symbolic link

I received this error when installing packages on Ubuntu 23.10. To solve it, fix the cuDNN installation as follows:

  1. Check the ~/.bashrc
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64/

2. Copy cuDNN the right way

If you can't find a 'lib64' folder inside the cuDNN archive, just rename 'lib' to 'lib64'. The -a flag below preserves symlinks, which is what keeps libcudnn_ops_infer.so.8 a symbolic link instead of a regular file:

sudo cp -av include/cudnn*.h /usr/local/cuda/include
sudo cp -av lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

This should fix the "not a symbolic link" problem!
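
To confirm the fix, re-run ldconfig and check that the library is now a proper symlink (my own verification step):

sudo ldconfig
ls -l /usr/local/cuda/lib64/libcudnn_ops_infer.so.8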

Categories: Anaconda

Install Miniforge / Mamba to replace Anaconda on Ubuntu

Moving to Miniforge / Mamba makes package installation much faster. The first step is to uninstall Anaconda from your machine.

Reverse the Anaconda init scripts

conda activate
conda init --reverse --all

Remove Anaconda folders

rm -rf ~/anaconda3
rm -rf ~/.conda
rm -rf ~/.condarc

The Miniforge / Mamba Installation

wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh 

Load the environment

source ~/.bashrc
conda install conda-libmamba-solver
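
To make conda actually use the libmamba solver by default, one extra step I'd suggest (not in the original notes):

conda config --set solver libmamba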

Now you are good!

Categories: Networking

Solve Alpine APKINDEX.tar.gz "no such file" temporary error

I was running a Dockerfile build, and when it reached the part that fetches APKINDEX.tar.gz from these repositories,

# Add additional repos for apk to use
RUN echo http://dl-cdn.alpinelinux.org/alpine/v3.3/main > /etc/apk/repositories; \
    echo http://dl-cdn.alpinelinux.org/alpine/v3.3/community >> /etc/apk/repositories

I got this error:

RUN apk --update add wget tar bash coreutils procps openssl:                                                          
0.503 fetch http://dl-cdn.alpinelinux.org/alpine/v3.3/main/x86_64/APKINDEX.tar.gz                                                      
5.507 ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.3/main: temporary error (try again later)                                         
5.507 WARNING: Ignoring APKINDEX.5a59b88b.tar.gz: No such file or directory                                                            
5.507 fetch http://dl-cdn.alpinelinux.org/alpine/v3.3/community/x86_64/APKINDEX.tar.gz                                                 
10.51 ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.3/community: temporary error (try again later)
10.51 WARNING: Ignoring APKINDEX.7c1f02d6.tar.gz: No such file or directory
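
In my experience, this "temporary error" from apk usually means DNS resolution failed inside the Docker build rather than the repository being down. A hedged fix is to give the Docker daemon an explicit DNS server and restart it (the DNS addresses are just examples):

# note: this overwrites any existing daemon.json
sudo tee /etc/docker/daemon.json <<'EOF'
{
  "dns": ["8.8.8.8", "1.1.1.1"]
}
EOF
sudo systemctl restart docker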

Categories: Ubuntu

Fix DuckDB out of memory error Export Database Parquet

I'm using the latest DuckDB 0.10.0 and received an out-of-memory error when exporting a database in Parquet format. The same export with CSV works fine.

The memory configuration was also set, like so:

SET memory_limit = '50GB';
SET max_memory = '50GB';
PRAGMA memory_limit='50GB';

This still triggered OOM. The only solution that worked was disabling insertion-order preservation, which frees DuckDB from buffering rows just to keep them in order:

SET preserve_insertion_order = false;
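
Putting it together, the whole export can be run from the shell in one shot. A minimal sketch; mydb.duckdb and export_dir are my placeholder names:

duckdb mydb.duckdb "SET preserve_insertion_order = false; EXPORT DATABASE 'export_dir' (FORMAT PARQUET);"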

Hope this helps you solve memory errors when using DuckDB.

Categories: Anaconda

Fix TensorBoard in VSCode repeatedly prompting "could not install tensorboard package"

TensorBoard and VSCode are already well integrated. However, there is a slight problem when running it with the latest version.

Even though TensorBoard is installed, VSCode keeps prompting to install it. The TensorBoard session package prompt keeps reappearing, always with the same result:

Could not install tensorboard. If pip is not available, please use the package manager of your choice to manually install this library into your Python environment

Apparently, the major culprit is VSCode using a different Python interpreter than the kernel selected in the notebook. In this case, I'm using Anaconda with a specific environment that already has TensorBoard installed. The solution is very straightforward: open the command palette (Ctrl+Shift+P), run "Python: Select Interpreter", and pick the same Anaconda environment the notebook kernel uses.
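
To confirm the environment really has TensorBoard before pointing VSCode at it, you can query it directly (the env name "myenv" is a placeholder; mine was an existing Anaconda environment):

~/anaconda3/envs/myenv/bin/python -m pip show tensorboard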

Categories: Networking

Solve ASUS WRX80 SAGE "ensure to connect the 8-pin power please enter setup to recover bios setting" fatal error

This is the most frustrating problem I have encountered using the ASUS Pro WS WRX80E-SAGE SE WIFI motherboard. The issue appeared when I changed the BIOS settings to enable the "SR-IOV" feature, hoping to solve USB devices not being detected and to avoid adding "pci=nommconf" to GRUB.

Once I rebooted, it suddenly showed the AMI Megatrends screen, where everything initialized properly, and the last message was "ensure to connect the 8-pin power please enter setup to recover bios setting fatal error". There was a BIOS prompt at the beginning to press F2 or Del, but it was unresponsive and went back to the AMI page.

Categories: Tensorflow

Solve TFX pip installation too long and slow

When installing TFX, pip install tfx raised a ResolutionTooDeep error. During installation, pip kept backtracking through many different versions of each package.

To solve this problem, I created requirements.txt files with the option to install only the necessary packages, or all packages as produced by pip freeze.

There are three options: TFX 1.10, 1.13, and the latest TFX 1.14.0.

All the package lists can be found here:

https://github.com/yodiaditya/datascience/tree/main/tfx
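
If you don't want to use the frozen lists, pinning the TFX version up front also cuts down the resolver backtracking. A sketch, not the exact commands from the repo:

pip install tfx==1.14.0
# or reproduce an environment from one of the frozen lists:
pip install -r requirements.txt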

I hope this helps you solve TFX pip installation issues!