Databricks instructions not working #246

Closed
jacobtomlinson opened this issue Jun 14, 2023 · 2 comments · May be fixed by #247
Labels
bug Something isn't working doc Improvements or additions to documentation platform/databricks

Comments


jacobtomlinson commented Jun 14, 2023

The instructions for launching on Databricks have started failing.

On step 5 of section 2, "Create and launch your cluster", the cluster spins up for a while and then fails with the following error:

Spark error:
Spark encountered an error on startup. This issue can be caused by invalid Spark configurations or malfunctioning [init scripts](https://docs.databricks.com/clusters/init-scripts.html#global-and-cluster-named-init-script-logs). Please refer to the Spark driver logs to troubleshoot this issue, and contact Databricks if the problem persists.

Internal error message: Spark error: Driver down cause: driver state change (exit code: 10)

The container image I created by following the instructions can be found at jacobtomlinson/rapids_databricks:23-06-nightly.

@jacobtomlinson jacobtomlinson added bug Something isn't working doc Improvements or additions to documentation platform/databricks labels Jun 14, 2023
@jacobtomlinson

Interestingly, I'm seeing the same problem using only the base image that Databricks provides (databricksruntime/gpu-conda:cuda11), without even adding RAPIDS.

[screenshot omitted]


jacobtomlinson commented Aug 10, 2023

I'm exploring alternative approaches to getting things running on Databricks. Here's a quick summary of the state of each approach.

1. Add RAPIDS conda env to Databricks container

Our current instructions add the RAPIDS conda environment to the Databricks container; however, as this issue shows, the resulting image fails to launch.

2. Make RAPIDS/Merlin container compatible with Databricks

By following these instructions from Databricks we should be able to ensure our container images run on Databricks. However, even after following all of the instructions, the container fails to start with the same error as above.
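As a debugging aid, one could at least verify from inside the candidate image that the basic userland tools Databricks containers are expected to ship are actually on PATH. This is a sketch; the exact tool list below is an assumption paraphrased from the Databricks custom-container requirements, not the authoritative list:

```python
import shutil

# Hedged sanity check, intended to be run inside the candidate container:
# the Databricks launcher relies on a handful of standard tools being
# available. The list here is an assumption; consult the Databricks custom
# container docs for the authoritative requirements.
REQUIRED_TOOLS = ["bash", "sudo", "ps", "ip"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

for tool in REQUIRED_TOOLS:
    path = shutil.which(tool)
    print(f"{tool}: {path or 'MISSING'}")
```

If anything is reported MISSING, installing those packages in the Dockerfile would be the first thing to try before digging into the Spark driver logs.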

3. Launch Databricks ML runtime and install RAPIDS in a notebook

We could avoid using a custom image altogether: launch a GPU single-node cluster and install RAPIDS via pip once it is running. However, to get a GPU node you have to select the ML Runtime, which bundles TensorFlow, and TensorFlow currently cannot coexist with cudf in the same environment due to package conflicts.

!pip install cudf-cu11 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
!pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com
!pip install cugraph-cu11 --extra-index-url=https://pypi.nvidia.com
import cudf
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/databricks/python/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
Full traceback
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-3853275431558241>, line 1
----> 1 import cudf

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/__init__.py:76
     74 from cudf.core.tools.datetimes import DateOffset, date_range, to_datetime
     75 from cudf.core.tools.numeric import to_numeric
---> 76 from cudf.io import (
     77     from_dlpack,
     78     read_avro,
     79     read_csv,
     80     read_feather,
     81     read_hdf,
     82     read_json,
     83     read_orc,
     84     read_parquet,
     85     read_text,
     86 )
     87 from cudf.options import describe_option, get_option, set_option
     88 from cudf.utils.dtypes import _NA_REP

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/io/__init__.py:8
      6 from cudf.io.hdf import read_hdf
      7 from cudf.io.json import read_json
----> 8 from cudf.io.orc import read_orc, read_orc_metadata, to_orc
      9 from cudf.io.parquet import (
     10     ParquetDatasetWriter,
     11     merge_parquet_filemetadata,
   (...)
     14     write_to_dataset,
     15 )
     16 from cudf.io.text import read_text

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/io/orc.py:14
     12 from cudf.api.types import is_list_like
     13 from cudf.utils import ioutils
---> 14 from cudf.utils.metadata import (  # type: ignore
     15     orc_column_statistics_pb2 as cs_pb2,
     16 )
     19 def _make_empty_df(filepath_or_buffer, columns):
     20     orc_file = orc.ORCFile(filepath_or_buffer)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:7
      1 # flake8: noqa
      2 # fmt: off
      3 # -*- coding: utf-8 -*-
      4 # Generated by the protocol buffer compiler.  DO NOT EDIT!
      5 # source: cudf/utils/metadata/orc_column_statistics.proto
      6 """Generated protocol buffer code."""
----> 7 from google.protobuf.internal import builder as _builder
      8 from google.protobuf import descriptor as _descriptor
      9 from google.protobuf import descriptor_pool as _descriptor_pool

ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/databricks/python/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
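The traceback suggests the import is resolving to the system protobuf under /databricks/python rather than the pip-installed one, and google.protobuf.internal.builder only exists in protobuf 3.20 and later. A small diagnostic along these lines could confirm that (this is a sketch based on the traceback, not something verified on Databricks; upgrading protobuf may in turn break the preinstalled TensorFlow):

```python
import importlib.util

def protobuf_builder_status():
    """Report which protobuf copy Python resolves and whether it carries
    google.protobuf.internal.builder (added in protobuf 3.20), which the
    generated cudf modules import."""
    try:
        spec = importlib.util.find_spec("google.protobuf")
    except ModuleNotFoundError:
        spec = None
    if spec is None:
        return "protobuf not installed"
    try:
        builder = importlib.util.find_spec("google.protobuf.internal.builder")
    except ModuleNotFoundError:
        builder = None
    if builder is None:
        # Likely an older protobuf shadowing the pip-installed one.
        return ("builder missing (protobuf < 3.20?) -> "
                "try: pip install --upgrade 'protobuf>=3.20'")
    return f"builder present; protobuf resolved from {spec.origin}"

print(protobuf_builder_status())
```

If the status reports the builder module missing, force-upgrading protobuf in the notebook-scoped environment is a plausible (untested) workaround.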

4. Follow the Spark RAPIDS instructions

The Spark RAPIDS docs suggest using a .jar file to install cudf. We could document that approach.

https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html
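For reference, that approach boils down to attaching the spark-rapids plugin jar to the cluster and enabling it via Spark config, roughly along these lines (a sketch only; the exact keys and values should be taken from the linked getting-started guide):

```
spark.plugins com.nvidia.spark.SQLPlugin
spark.task.resource.gpu.amount 0.1
spark.rapids.memory.pinnedPool.size 2G
spark.rapids.sql.concurrentGpuTasks 2
```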
