Categories
Ubuntu

Sync Obsidian with Google Drive on Ubuntu

Mounting an Obsidian vault stored in Google Drive through the GNOME Online Accounts option does not work. The reason is that Nautilus shows the files under cryptic hash names (literally how Google Drive stores our files) with no translation back to the real file names. Also, with this approach the files are usually streamed rather than persisted locally.

Alternatively, mounting Google Drive with google-drive-ocamlfuse solves this problem. Here are 4 quick steps to do it.

1. Install google-drive-ocamlfuse

sudo add-apt-repository ppa:alessandro-strada/ppa
sudo apt-get update
sudo apt-get install google-drive-ocamlfuse

2. Enable your Google Drive API (important!)

Go to https://console.cloud.google.com/apis/api/drive.googleapis.com and enable the API.

Categories
Google Cloud

The DatastoreGrpcStub requires a local gRPC installation, which is not found

When running an App Engine app locally with `dev_appserver.py app.yaml`, I got this error:

/usr/lib/google-cloud-sdk/platform/google_appengine/google/protobuf/internal/api_implementation.py:100: UserWarning: Selected implementation upb is not available. Falling back to the python implementation.
  warnings.warn('Selected implementation upb is not available. '
INFO     2024-07-05 15:09:52,906 <string>:234] Using Cloud Datastore Emulator.
We are gradually rolling out the emulator as the default datastore implementation of dev_appserver.
If broken, you can temporarily disable it by --support_datastore_emulator=False
Read the documentation: https://cloud.google.com/appengine/docs/standard/python/tools/migrate-cloud-datastore-emulator
Help us validate that the feature is ready by taking this survey: https://goo.gl/forms/UArIcs8K9CUSCm733
Report issues at: https://issuetracker.google.com/issues/new?component=187272

INFO     2024-07-05 15:09:52,909 <string>:316] Skipping SDK update check.
WARNING  2024-07-05 15:09:52,910 <string>:325] The default encoding of your local Python interpreter is set to 'utf-8' while App Engine's production environment uses 'ascii'; as a result your code may behave differently when deployed.
INFO     2024-07-05 15:09:52,961 datastore_emulator.py:152] Starting Cloud Datastore emulator at: http://localhost:36569
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/dev_appserver.py", line 103, in <module>
    _run_file(__file__, globals())
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/dev_appserver.py", line 99, in _run_file
    _execfile(_PATHS.script_file(script_name), globals_)
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/dev_appserver.py", line 81, in _execfile
    exec(open(fn).read(), scope)
  File "<string>", line 638, in <module>
  File "<string>", line 626, in main
  File "<string>", line 393, in start
  File "<string>", line 746, in create_api_server
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/stub_util.py", line 166, in setup_stubs
    datastore_grpc_stub_class(os.environ['DATASTORE_EMULATOR_HOST']))
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/datastore_grpc_stub.py", line 83, in __init__
    raise RuntimeError('The DatastoreGrpcStub requires a local gRPC '
RuntimeError: The DatastoreGrpcStub requires a local gRPC installation, which is not found.
INFO     2024-07-05 15:09:55,335 datastore_emulator.py:158] Cloud Datastore emulator responded after 2.374175 seconds
Exception ignored in: <function DatastoreEmulator.__del__ at 0x79fb5032ecb0>
Traceback (most recent call last):
  File "/usr/lib/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/cloud_emulators/datastore/datastore_emulator.py", line 207, in __del__
AttributeError: 'NoneType' object has no attribute 'warning'

To solve this problem, simply pass the argument that disables the Datastore emulator:

dev_appserver.py app.yaml --support_datastore_emulator=False
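
The traceback suggests that dev_appserver could not import a Python gRPC module. Below is a minimal sketch for checking that, assuming the stub looks for a module named `grpc` (my assumption; the log above does not name the module). The helpers `has_module` and `grpc_status` are hypothetical names, and the check must run with the same interpreter that runs dev_appserver.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` is importable here."""
    return importlib.util.find_spec(name) is not None


def grpc_status() -> str:
    # "grpc" is the import name assumed in this sketch; adjust if needed.
    if has_module("grpc"):
        return "gRPC found: the Datastore emulator stub may work"
    return "gRPC missing: keep --support_datastore_emulator=False"
```

Calling `print(grpc_status())` tells you whether the flag above is still needed on your machine.
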
Categories
Devops

Solve environment is externally managed Pip install

When installing Python packages with pip on a Google Cloud VM, I got this error:

error: externally-managed-environment

× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    
    See /usr/share/doc/python3.11/README.venv for more information.

note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.

To solve this problem, simply remove the marker file that triggers the check:

sudo rm /usr/lib/python3.11/EXTERNALLY-MANAGED
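
Before deleting the file, it can help to see whether the marker is really what blocks pip. The sketch below assumes the PEP 668 layout mentioned in the hint above: the `EXTERNALLY-MANAGED` file sits in the interpreter's stdlib directory, and it is not consulted inside a virtual environment. The function names are hypothetical.

```python
import sys
import sysconfig
from pathlib import Path


def externally_managed_marker() -> Path:
    """Path where the EXTERNALLY-MANAGED marker is expected to live."""
    return Path(sysconfig.get_path("stdlib")) / "EXTERNALLY-MANAGED"


def in_virtualenv() -> bool:
    """True when the current interpreter runs inside a venv."""
    return sys.prefix != sys.base_prefix


def pip_would_refuse() -> bool:
    """Rough guess: pip refuses if the marker exists and no venv is active."""
    return externally_managed_marker().is_file() and not in_virtualenv()
```

If `pip_would_refuse()` returns `False` inside a virtual environment created with `python3 -m venv path/to/venv`, as the error message itself suggests, the marker file can stay in place.
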

Categories
Spark

Fix problem Cannot run program “null/bin/spark-submit”: error=2, No such file or directory at org.apache.zeppelin.interpreter

When running a fresh Zeppelin installation, I got an error like this:

org.apache.zeppelin.interpreter.InterpreterException: java.io.IOException: Fail to detect scala version, the reason is:Cannot run program "null/bin/spark-submit": error=2, No such file or directory
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:128)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:270)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:428)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:68)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:186)
	at org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:135)
	at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:186)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Fail to detect scala version, the reason is:Cannot run program "null/bin/spark-submit": error=2, No such file or directory
	at org.apache.zeppelin.interpreter.launcher.SparkInterpreterLauncher.buildEnvFromProperties(SparkInterpreterLauncher.java:139)
	at org.apache.zeppelin.interpreter.launcher.StandardInterpreterLauncher.launchDirectly(StandardInterpreterLauncher.java:76)
	at org.apache.zeppelin.interpreter.launcher.InterpreterLauncher.launch(InterpreterLauncher.java:106)
	at org.apache.zeppelin.interpreter.InterpreterSetting.createInterpreterProcess(InterpreterSetting.java:856)
	at org.apache.zeppelin.interpreter.ManagedInterpreterGroup.getOrCreateInterpreterProcess(ManagedInterpreterGroup.java:66)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getOrCreateInterpreterProcess(RemoteInterpreter.java:103)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.internal_create(RemoteInterpreter.java:153)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.open(RemoteInterpreter.java:125)

To solve this, first delete zeppelin/conf/interpreter.json. Don't worry, it will be regenerated when you start zeppelin-daemon.sh.

Next, start Zeppelin, go to the Interpreter page, and set the SPARK_HOME path in both the spark and spark-submit settings. Don't forget to set the Python path as well. Hope this solves your problem!
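
The `null/bin/spark-submit` in the error presumably comes from an unset SPARK_HOME being joined with `bin/spark-submit`. A minimal sketch of the same lookup in Python, useful for checking a candidate SPARK_HOME before typing it into Zeppelin; `find_spark_submit` and `check_environment` are hypothetical helper names, not part of Zeppelin or Spark.

```python
import os
from pathlib import Path
from typing import Optional


def find_spark_submit(spark_home: Optional[str]) -> Path:
    """Return <spark_home>/bin/spark-submit, or raise with a clear reason."""
    if not spark_home:
        # An empty or missing value is the likely source of "null/bin/...".
        raise ValueError("SPARK_HOME is not set")
    candidate = Path(spark_home) / "bin" / "spark-submit"
    if not candidate.is_file():
        raise FileNotFoundError(f"{candidate} does not exist")
    return candidate


def check_environment() -> Path:
    """Run the same check against the SPARK_HOME environment variable."""
    return find_spark_submit(os.environ.get("SPARK_HOME"))
```

For example, `find_spark_submit("/opt/spark")` returns the script path when Spark is unpacked there and raises `FileNotFoundError` otherwise.
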

Categories
Machine Learning

Zeppelin and Spark installation on Ubuntu 24.04

To install Zeppelin (I recommend 0.11.0, since 0.11.1 has bugs) and Spark 3.5.1, we need to go through several steps:

1. Required packages

sudo apt-get install openjdk-11-jdk build-essential

Or, if you run into errors, install the previous Java version instead:

sudo apt-get install openjdk-8-jdk-headless

2. Spark installation

wget -c https://dlcdn.apache.org/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3.tgz

sudo tar -xvvf spark-3.5.1-bin-hadoop3.tgz -C /opt/

sudo mv /opt/spark-3.5.1-bin-hadoop3 /opt/spark

3. Zeppelin installation

wget -c https://downloads.apache.org/zeppelin/zeppelin-0.11.0/zeppelin-0.11.0-bin-all.tgz

sudo tar -xvvf zeppelin-0.11.0-bin-all.tgz -C /opt/

sudo mv /opt/zeppelin-0.11.0-bin-all /opt/zeppelin

4. Install MambaForge

wget -c https://github.com/conda-forge/miniforge/releases/download/24.1.2-0/Mambaforge-24.1.2-0-Linux-x86_64.sh

bash Mambaforge-24.1.2-0-Linux-x86_64.sh -b -p ~/anaconda3

~/anaconda3/bin/mamba init

source ~/.bashrc

mamba create -n test python=3.10

5. Run and Configure Zeppelin

sudo /opt/zeppelin/bin/zeppelin-daemon.sh start

Then go to the Interpreter settings (in the top-right menu), search for Spark, and adjust these properties:

  1. SPARK_HOME = /opt/spark
  2. PYTHON = /home/dev/anaconda3/envs/test/bin/python
  3. PYTHON_DRIVER = /home/dev/anaconda3/envs/test/bin/python

Now you are done. Enjoy using Zeppelin on Ubuntu!

Categories
Machine Learning

Fix Zeppelin Spark-interpreter-0.11.1.jar and Scala

When installing Zeppelin 0.11.1 and Spark 3.3.3 with the Docker setup from my GitHub repository ( https://github.com/yodiaditya/docker-rapids-spark-zeppelin ), I received this error:

Caused by: org.apache.zeppelin.interpreter.InterpreterException: Fail to open SparkInterpreter
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:140)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:70)
	... 12 more
Caused by: scala.reflect.internal.FatalError: Error accessing /opt/zeppelin/interpreter/spark/._spark-interpreter-0.11.1.jar
	at scala.tools.nsc.classpath.AggregateClassPath.$anonfun$list$3(AggregateClassPath.scala:113)

The solution turns out to be very simple.

All you need to do is delete the files that cause the problem. In my case:

rm /opt/zeppelin/interpreter/spark/._spark-interpreter-0.11.1.jar

rm /opt/zeppelin/interpreter/spark/scala-2.12/._spark-scala-2.12-0.11.1.jar
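
Files whose names start with `._` are typically AppleDouble metadata files that macOS writes next to the real file when copying or archiving, so more of them may be hiding in other interpreter folders. A minimal sketch for listing them before deleting anything; `find_appledouble_files` is a hypothetical helper, and the assumption is that every `._*` file under the Zeppelin interpreter directory is safe to remove.

```python
from pathlib import Path
from typing import List


def find_appledouble_files(root: str) -> List[Path]:
    """List regular files under `root` whose names start with '._'."""
    return sorted(p for p in Path(root).rglob("._*") if p.is_file())
```

Running `find_appledouble_files("/opt/zeppelin/interpreter")` and reviewing the result shows which files to pass to `rm`.
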
Categories
Machine Learning

Solve Pandas Error: PerformanceWarning: DataFrame is highly fragmented.

When running toPandas() or another operation, I received this warning, repeated many times:

/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series
/usr/lib/spark/python/pyspark/sql/pandas/conversion.py:186: PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df[column_name] = series

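The quick solution is to enable Arrow-based conversion:

# Use Arrow when converting a Spark DataFrame with toPandas()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
# Older key name, deprecated since Spark 3.0:
# spark.conf.set("spark.sql.execution.arrow.enabled", "true")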
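
If Arrow cannot be enabled, the message can also be hidden, since it is a warning rather than an error. The sketch below uses only the standard `warnings` module and matches on the start of the message text shown above. The function name is hypothetical, and note that this hides the warning without removing the fragmentation itself.

```python
import warnings


def silence_fragmentation_warning() -> None:
    """Ignore warnings whose message starts with the pandas fragmentation text."""
    warnings.filterwarnings(
        "ignore",
        message=r"DataFrame is highly fragmented",
    )
```

Call `silence_fragmentation_warning()` once before `toPandas()`; other warnings are still shown.
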
Categories
Ubuntu

Solve /sbin/ldconfig.real: /usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8 is not a symbolic link

I received this error when installing packages on Ubuntu 23.10. To solve it, redo the cuDNN installation steps correctly.

1. Check the ~/.bashrc

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.8/lib64/

2. Copy CUDNN the right way

sudo cp -av include/cudnn*.h /usr/local/cuda/include
sudo cp -av lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*

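The -a flag makes cp preserve the symbolic links between the library files instead of copying them as regular files. This should fix the "not a symbolic link" problem!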
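
To confirm the result, the sketch below reports which `libcudnn*` entries in a directory are symlinks. It assumes that ldconfig wants the short names such as `libcudnn_ops_infer.so.8` to be links to the fully versioned files; `cudnn_symlink_report` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def cudnn_symlink_report(lib_dir: str) -> Dict[str, bool]:
    """Map each libcudnn* file name in `lib_dir` to True if it is a symlink."""
    return {p.name: p.is_symlink() for p in sorted(Path(lib_dir).glob("libcudnn*"))}
```

With a correct copy, `cudnn_symlink_report("/usr/local/cuda/lib64")` should show `True` for the `.so` and `.so.8` names and `False` only for the fully versioned files.
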

Categories
Anaconda

Install Miniforge / Mamba to replace Anaconda in Ubuntu

Moving to Miniforge / Mamba makes package installation faster. The first step is to uninstall Anaconda from your machine.

Reverse the changes made by the Anaconda init scripts

conda activate
conda init --reverse --all

Remove Anaconda folders

rm -rf ~/anaconda3
rm -rf ~/.conda
rm -rf ~/.condarc

Install Miniforge / Mamba

wget -c https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

Load the environment

source ~/.bashrc
conda install conda-libmamba-solver

Now you are good to go!

Categories
Networking

Solve alpine APKINDEX.tar.gz no such file temporary error

I was building a Dockerfile, and the build reached the step that fetches APKINDEX.tar.gz from these repositories:

# Add additional repo's for apk to use
RUN echo http://dl-cdn.alpinelinux.org/alpine/v3.3/main > /etc/apk/repositories; \
    echo http://dl-cdn.alpinelinux.org/alpine/v3.3/community >> /etc/apk/repositories

I got this error:

RUN apk --update add wget tar bash coreutils procps openssl:                                                          
0.503 fetch http://dl-cdn.alpinelinux.org/alpine/v3.3/main/x86_64/APKINDEX.tar.gz                                                      
5.507 ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.3/main: temporary error (try again later)                                         
5.507 WARNING: Ignoring APKINDEX.5a59b88b.tar.gz: No such file or directory                                                            
5.507 fetch http://dl-cdn.alpinelinux.org/alpine/v3.3/community/x86_64/APKINDEX.tar.gz                                                 
10.51 ERROR: http://dl-cdn.alpinelinux.org/alpine/v3.3/community: temporary error (try again later)
10.51 WARNING: Ignoring APKINDEX.7c1f02d6.tar.gz: No such file or directory