
Some queries return JSONResultBatch result type, causing errors #21

Open
noyoshi opened this issue Oct 22, 2021 · 4 comments

@noyoshi

noyoshi commented Oct 22, 2021

It appears some queries return results as a JSONResultBatch, which causes issues because the connector assumes the result format will be Arrow.

Specific error:

RuntimeError: Received unknown result batch type <class 'snowflake.connector.result_batch.JSONResultBatch'>

The query in this case was SHOW TABLES;. I'm not sure whether Snowflake documents which result types a query can return; if it does, perhaps we could add validation to the connector to check for the different types (more than just Arrow)?

I validated that a normal query, SELECT * FROM TABLE LIMIT 100, returned the proper result and worked.

@jrbourbeau
Member

Thanks @noyoshi for trying dask-snowflake out and raising an issue. Indeed, it appears Snowflake will return JSONResultBatch objects in some scenarios. Perhaps @sfc-gh-mkeller has thoughts on how we should handle both JSONResultBatch and ArrowResultBatch objects. For example, today we're using the ArrowResultBatch.to_pandas method to retrieve a pandas DataFrame for the underlying batch -- what is the recommended way of doing this for a JSONResultBatch?

@sfc-gh-mkeller

This is caused by the fact that Snowflake cannot always provide results in Arrow format: currently only SELECT statements can generate Arrow results, and that is the only format our connector supports turning into DataFrames.
One workaround would be to execute your query and then use a result scan on its results, like:

import snowflake.connector

with snowflake.connector.connect(
    ...
) as cnx:
    with cnx.cursor() as cur:
        # Run the non-SELECT statement first ...
        cur.execute("SHOW TABLES")
        # ... then SELECT its output through RESULT_SCAN, which can be
        # returned in Arrow format
        cur.execute(f"select * from table(result_scan('{cur.sfqid}'));")
        print(cur.fetchall())

@jrbourbeau
Member

Thanks for the additional context @sfc-gh-mkeller -- I hadn't realized that only SELECT statements were supported for turning results into DataFrames.

I could be missing something, but it looks like the snippet you posted might cause the entire query result to be loaded into memory -- is that the case?

Is there a standard way to convert a JSONResultBatch into a JSON blob? I'm wondering if we could convert these result batches to JSON and then use pandas' read_json function to convert them into a DataFrame.
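For what it's worth, the read_json route could look something like the following minimal sketch. The rows and column names here are made-up stand-ins for what iterating a JSONResultBatch would yield, since real batches require a live Snowflake connection:

```python
import io
import json

import pandas as pd

# Hypothetical rows and column names, standing in for what a
# JSONResultBatch yields when iterated (plain Python row tuples).
rows = [("TABLE_A", 10), ("TABLE_B", 20)]
columns = ["name", "row_count"]

# Serialize the rows to a JSON blob, then let pandas parse it back
# into a DataFrame via read_json.
blob = json.dumps([dict(zip(columns, row)) for row in rows])
df = pd.read_json(io.StringIO(blob))

print(df)
```

One caveat with this approach: read_json infers dtypes from the JSON values, which may not match what the Arrow converter would produce (e.g. for timestamps or decimals).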

@sfc-gh-mkeller

That's right, @jrbourbeau -- the snippet was only meant to show how to work around this limitation. I didn't mean for you to use it exactly that way, my apologies!

Not currently, unless you convert the Python results into DataFrames yourself. I have never used read_json, but if it's that simple then you could try patching the following function: https://github.com/snowflakedb/snowflake-connector-python/blob/ffdd6b3339aa71885878d047141fe9a77c4a4ae3/src/snowflake/connector/result_batch.py#L525
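A dispatcher along those lines might look like this rough sketch. Note that `batch_to_pandas` and its type-name check are assumptions for illustration, not the connector's actual API; only the class names ArrowResultBatch and JSONResultBatch come from snowflake.connector.result_batch:

```python
import pandas as pd


def batch_to_pandas(batch, column_names):
    """Hypothetical helper: use the Arrow fast path when available and
    fall back to building a DataFrame from raw rows for JSON batches.

    ArrowResultBatch exposes to_pandas(); a JSONResultBatch is iterable
    over plain Python row tuples, so we construct the DataFrame
    ourselves (possibly yielding different dtypes than the Arrow
    converter would).
    """
    if type(batch).__name__ == "ArrowResultBatch":
        return batch.to_pandas()
    # JSON path: materialize the rows and build the frame manually
    return pd.DataFrame(list(batch), columns=column_names)
```

The type-name check is deliberately loose here; a real patch inside the connector could dispatch with isinstance against the concrete batch classes instead.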

I'd imagine that this might return different data types than what our Arrow converter produces, but if it works at all I'd be happy to have us take it over and add a way to do this natively in our connector.
