Databricks instructions not working #246

Closed
jacobtomlinson opened this issue Jun 14, 2023 · 2 comments · May be fixed by #247
Labels
bug Something isn't working doc Improvements or additions to documentation platform/databricks

Comments


jacobtomlinson commented Jun 14, 2023

The instructions for launching on Databricks have started failing.

On step 5 of section 2, "Create and launch your cluster", the cluster spins up for a while and then fails with the following error:

Spark error:
Spark encountered an error on startup. This issue can be caused by invalid Spark configurations or malfunctioning [init scripts](https://docs.databricks.com/clusters/init-scripts.html#global-and-cluster-named-init-script-logs). Please refer to the Spark driver logs to troubleshoot this issue, and contact Databricks if the problem persists.

Internal error message: Spark error: Driver down cause: driver state change (exit code: 10)

The container image I created by following the instructions can be found at jacobtomlinson/rapids_databricks:23-06-nightly.

@jacobtomlinson jacobtomlinson added bug Something isn't working doc Improvements or additions to documentation platform/databricks labels Jun 14, 2023
@jacobtomlinson

Interestingly, I'm seeing the same problem using only the base image that Databricks provides (databricksruntime/gpu-conda:cuda11), without even adding RAPIDS.

[screenshot omitted]


jacobtomlinson commented Aug 10, 2023

I'm exploring alternative approaches to getting things running on Databricks. Here's a quick summary of the state of each approach.

1. Add RAPIDS conda env to Databricks container

Our current instructions add the RAPIDS conda environment to the Databricks container; however, as this issue shows, the resulting image fails to launch.

2. Make RAPIDS/Merlin container compatible with Databricks

By following these instructions from Databricks we should be able to ensure our container images run on Databricks. However, even after following all of the instructions, the container fails to start with the same error as above.
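As a debugging aid, one could at least verify from inside the candidate image that the basic userland tools Databricks containers are expected to ship are actually on PATH. This is a sketch; the exact tool list below is an assumption paraphrased from the Databricks custom-container requirements, not the authoritative list:

```python
import shutil

# Hedged sanity check, intended to be run inside the candidate container:
# the Databricks launcher relies on a handful of standard tools being
# available. The list here is an assumption; consult the Databricks custom
# container docs for the authoritative requirements.
REQUIRED_TOOLS = ["bash", "sudo", "ps", "ip"]

def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

for tool in REQUIRED_TOOLS:
    path = shutil.which(tool)
    print(f"{tool}: {path or 'MISSING'}")
```

If anything is reported MISSING, installing those packages in the Dockerfile would be the first thing to try before digging into the Spark driver logs.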

3. Launch Databricks ML runtime and install RAPIDS in a notebook

We could avoid using a custom image altogether: launch a GPU single-node cluster and install RAPIDS via pip once it is running. However, to get a GPU node you have to select the ML Runtime, which bundles TensorFlow, and TensorFlow currently cannot coexist with cudf in the same environment due to package conflicts.

!pip install cudf-cu11 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
!pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com
!pip install cugraph-cu11 --extra-index-url=https://pypi.nvidia.com
import cudf
ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/databricks/python/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
Full traceback
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
File <command-3853275431558241>, line 1
----> 1 import cudf

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/__init__.py:76
     74 from cudf.core.tools.datetimes import DateOffset, date_range, to_datetime
     75 from cudf.core.tools.numeric import to_numeric
---> 76 from cudf.io import (
     77     from_dlpack,
     78     read_avro,
     79     read_csv,
     80     read_feather,
     81     read_hdf,
     82     read_json,
     83     read_orc,
     84     read_parquet,
     85     read_text,
     86 )
     87 from cudf.options import describe_option, get_option, set_option
     88 from cudf.utils.dtypes import _NA_REP

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/io/__init__.py:8
      6 from cudf.io.hdf import read_hdf
      7 from cudf.io.json import read_json
----> 8 from cudf.io.orc import read_orc, read_orc_metadata, to_orc
      9 from cudf.io.parquet import (
     10     ParquetDatasetWriter,
     11     merge_parquet_filemetadata,
   (...)
     14     write_to_dataset,
     15 )
     16 from cudf.io.text import read_text

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/io/orc.py:14
     12 from cudf.api.types import is_list_like
     13 from cudf.utils import ioutils
---> 14 from cudf.utils.metadata import (  # type: ignore
     15     orc_column_statistics_pb2 as cs_pb2,
     16 )
     19 def _make_empty_df(filepath_or_buffer, columns):
     20     orc_file = orc.ORCFile(filepath_or_buffer)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-34172b24-5027-409d-964c-6b0b5464f983/lib/python3.10/site-packages/cudf/utils/metadata/orc_column_statistics_pb2.py:7
      1 # flake8: noqa
      2 # fmt: off
      3 # -*- coding: utf-8 -*-
      4 # Generated by the protocol buffer compiler.  DO NOT EDIT!
      5 # source: cudf/utils/metadata/orc_column_statistics.proto
      6 """Generated protocol buffer code."""
----> 7 from google.protobuf.internal import builder as _builder
      8 from google.protobuf import descriptor as _descriptor
      9 from google.protobuf import descriptor_pool as _descriptor_pool

ImportError: cannot import name 'builder' from 'google.protobuf.internal' (/databricks/python/lib/python3.10/site-packages/google/protobuf/internal/__init__.py)
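The traceback suggests the import is resolving to the system protobuf under /databricks/python rather than the pip-installed one, and google.protobuf.internal.builder only exists in protobuf 3.20 and later. A small diagnostic along these lines could confirm that (this is a sketch based on the traceback, not something verified on Databricks; upgrading protobuf may in turn break the preinstalled TensorFlow):

```python
import importlib.util

def protobuf_builder_status():
    """Report which protobuf copy Python resolves and whether it carries
    google.protobuf.internal.builder (added in protobuf 3.20), which the
    generated cudf modules import."""
    try:
        spec = importlib.util.find_spec("google.protobuf")
    except ModuleNotFoundError:
        spec = None
    if spec is None:
        return "protobuf not installed"
    try:
        builder = importlib.util.find_spec("google.protobuf.internal.builder")
    except ModuleNotFoundError:
        builder = None
    if builder is None:
        # Likely an older protobuf shadowing the pip-installed one.
        return ("builder missing (protobuf < 3.20?) -> "
                "try: pip install --upgrade 'protobuf>=3.20'")
    return f"builder present; protobuf resolved from {spec.origin}"

print(protobuf_builder_status())
```

If the status reports the builder module missing, force-upgrading protobuf in the notebook-scoped environment is a plausible (untested) workaround.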

4. Follow the Spark RAPIDS instructions

The Spark RAPIDS docs suggest using a .jar file to install cudf. We could document that approach.

https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html
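For reference, that approach boils down to attaching the spark-rapids plugin jar to the cluster and enabling it via Spark config, roughly along these lines (a sketch only; the exact keys and values should be taken from the linked getting-started guide):

```
spark.plugins com.nvidia.spark.SQLPlugin
spark.task.resource.gpu.amount 0.1
spark.rapids.memory.pinnedPool.size 2G
spark.rapids.sql.concurrentGpuTasks 2
```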
