Month: January 2024

Fix PySpark contains a task of very large size. The maximum recommended task size is 1000 KiB

Post author By yodi
Post date January 30, 2024
No Comments on Fix PySpark contains a task of very large size. The maximum recommended task size is 1000 KiB

If you got this error when running your notebook with warning

contains a task of very large size. The maximum recommended task size is 1000 KiB

This mean PySpark warning you to increase the partition or parallelism (and might memory as well).

Example code to configure it, where you can adjust based on your workstation memory. In my case, is 192GB is my max memory

import os
os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 192g --executor-memory 16g pyspark-shell'

# add this one in your spark configuration
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

Full implementation

import os
from pyspark import SparkContext

os.environ['PYSPARK_SUBMIT_ARGS'] = '--driver-memory 192g --executor-memory 16g --executor-cores 10 pyspark-shell'
os.environ['PYARROW_IGNORE_TIMEZONE'] = '1'

builder = SparkSession.builder
builder = builder.config("spark.driver.maxResultSize", "5G")

spark = builder.master("local[*]").appName("FMClassifier_MovieLens").getOrCreate()
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

Additional bonus description if you would like to increase the instances as well


spark = SparkSession.builder.config('spark.executor.instances', 4).getOrCreate()
spark.conf.set("spark.sql.analyzer.failAmbiguousSelfJoin", "false")

Tags PySpark contains a task of very large size

Ubuntu

Install CUDA 11 on Ubuntu 23.10

Post author By yodi
Post date January 18, 2024
No Comments on Install CUDA 11 on Ubuntu 23.10

To solve Driver or CUDA 11 installation error in Ubuntu 23.10, the answer is to ensure its using the compatible GCC version. By default installation, it will using GCC 13 which is not working when compiling CUDA or NVIDIA Drivers (required GCC 10). Installing CUDA 11 is important to run Tensorflow that haven’t fully adapted with CUDA 12.

Failed to verify gcc version. See log at /var/log/cuda-installer.log for details.

First step to fixing this problem is to uninstall any nvidia and cuda installation made previously

sudo apt autoremove cuda* nvidia* --purge

Next, install GCC 10

MAX_GCC_VERSION=10
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-$MAX_GCC_VERSION $MAX_GCC_VERSION

Next, choose the GCC 10 as default by running this command

sudo update-alternatives --config gcc

Now, you all set! You can start to do installation of CUDA 11 in Ubuntu 23.10. Make sure to un-check the driver installation part (where we will install it later)

sudo ./cuda_11.8.0_520.61.05_linux.run

Next, I’m using Ubuntu NVIDIA default installation. So, I revert back the GCC to version 13, using the same command

sudo update-alternatives --config gcc

sudo apt install nvidia-driver-525

You can repeat the process like CUDNN, TensorRT and others installation following my previous article here

Install Transformers Pytorch Tensorflow Ubuntu 2023

Finally, make sure if anything broken with NVCC, is to switch the GCC version to 10, not 13.

Tags install cuda 11 in ubuntu 23.10

Ubuntu

Fix VSCode open Large file by increase Memory

Post author By yodi
Post date January 17, 2024
No Comments on Fix VSCode open Large file by increase Memory

When opening Netflix data around 1GB, the VSCode is crashed. My memory are pretty much 30% usage and have plenty room to open this 1GB file.

To fix this, either run from terminal

code --max-memory=12288mb

Or right click the menu in Ubuntu, and replace the launcher with this.

code  --max-memory=12288mb --unity-launch %F

Tags Fix VSCode open Large file crash

Fix Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS

Post author By yodi
Post date January 11, 2024
No Comments on Fix Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS

When you got error “Failed to load implementation from:dev.ludovic.netlib.blas.VectorBLAS” when running ALS training, the quickfix for Intel MKL are

sudo ln -s /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_rt.so /usr/local/lib/libblas.so.3
sudo ln -s /opt/intel/oneapi/mkl/2023.2.0/lib/intel64/libmkl_rt.so /usr/local/lib/liblapack.so.3

Or follow this: https://spark.apache.org/docs/latest/ml-linalg-guide.html

Tags from:dev.ludovic.netlib.blas.VectorBLAS, VectorBLAS

Tensorflow

Solve No builder could be found in the director tensorflow dataset

Post author By yodi
Post date January 10, 2024
No Comments on Solve No builder could be found in the director tensorflow dataset

When downloading movielens dataset from Tensorflow dataset, I got this error

No builder could be found in the director tensorflow dataset

The quick solution is to upgrade

pip install --upgrade tfds-nightly

Tags No builder could be found in the director tensorflow dataset