DEPR: raise deprecation warning in numpy ufuncs on DataFrames if not aligned + fallback to <1.2.0 behaviour #39239
this is not timely
pls wait for 1.2.2 if you must
@jreback could you at least read what it is about and give your opinion about that? Because it reverts behavior that was changed in 1.2.0, 1.2.1 is exactly the right time to do it.
i have to say i am not against the deprecation itself
i like the change and these should align - and i suppose can be deprecated
but the timing is terrible and this is a large amount of code
pls just wait till 1.2.2
it is not timely to do things at the last minute. we have had several last-minute changes in the past which have been a disaster. -1000 on merging this now
Thanks for the feedback @simonjayhawkins, pushed an update
we are pretty much on top of the regressions from 1.2, so if a short delay lets us clear the list, it may be worth considering. but agreed, this should not be rushed
FWIW I'm leaning towards keeping the breaking change, based on the consistency-with-Series argument and so that users don't have to change their code to avoid the warnings.
pandas/core/arraylike.py
Outdated
    # if at least one is not aligned -> warn and fallback to array behaviour
    if non_aligned:
        warnings.warn(
            "Calling a ufunc on non-aligned DataFrames/Series. Currently, the "
because the Series behavior is different, this warning could be misleading?
Hmm, yes. The most explicit is "non-aligned DataFrames or DataFrame/Series combination" or something like that, but I wanted to keep it shorter.
I agree the current wording can be misleading though (although you will of course never see the warning with only Series)
this would close?
Yes, this closes that issue, will update the top post.
If delaying the release by 1 or 2 days helps get this merged, I think that is worth it.
Maybe add something in 1.2 release notes after
i'll add the blocker tag here until there is consensus to re-open the 1.2.1 milestone for new issues/PRs, e.g. #39253, that need not block but could potentially be completed before this.
.. code-block:: python

    >>> df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
this is an incorrect format
.. code-block:: python

    >>> df1 + df2
make an actual ipython block
I need to use some plain code-blocks, since part of the example shows old behaviour (or behaviour that will change in the future), so I prefer to use code-blocks for all examples, for consistency within this section
we use ipython blocks everywhere, pls do this
would like to change these to be consistent
.. code-block:: python

    >>> np.add(df1, np.asarray(df2))
use an actual ipython format
pandas/core/arraylike.py
Outdated
    from pandas.core.generic import NDFrame
    from pandas.core.internals import BlockManager

    cls = type(self)

    is_ndframe = [isinstance(x, NDFrame) for x in inputs]
why would you do this? simply check is_series. this is amazingly confusing.
What is `is_series`?
we have dataframes and series
Yes, and `NDFrame` is the parent class for both? Do you want me to put `isinstance(x, (Series, DataFrame))` instead of `isinstance(x, NDFrame)`?
yes i think it's more clear
Note that below in this `array_ufunc` function, we are also using `NDFrame` for this purpose
so rename this to `is_series_or_frame`, i think that is more clear
I renamed it now to `n_alignable`, because `alignable` is the variable name that is already used below, for consistency. It also matches the explanation in the comment (which says this is Series or DataFrame).
(but I can also rename it to `n_series_or_frame` if you prefer)
pandas/core/arraylike.py
Outdated
    "Calling a ufunc on non-aligned DataFrames (or DataFrame/Series "
    "combination). Currently, the indices are ignored and the result "
    "takes the index/columns of the first DataFrame. In the future "
    "(pandas 2.0), the DataFrames/Series will be aligned before "
don't need to mention the version
would not mention here
@jorisvandenbossche did you see #39239 (comment)? (also, since this PR is the blocker, you could maybe update the release date in the notes too, to save an extra ci/backport cycle)
@simonjayhawkins yeah, I saw that, thanks for the reminder, as I still need to do that.
Good idea. What's your current idea about the timeline? (e.g. try to merge this PR this evening, and start the release process tomorrow morning? In that case I'll pick tomorrow's date)
Actually, we don't have subsections yet in the deprecations section in v1.2.0.rst, so I just clarified the original whatsnew note as you suggested @simonjayhawkins
I normally like to start the release nearer to the start of the day, but could get going on the final pre-release checks #38721 (comment) as soon as this is backported. I think the date should match the tag (and we discussed templating this #21050 (comment)), which may not match the github release if the release spans a couple of days. If we're not sure, maybe best to leave it out of this PR, and I could expedite the change by not waiting for ci to complete (i normally wait for ci to complete, which can add a couple of hours to the release process, for the change to master and again for the backport PR)
pandas/core/arraylike.py
Outdated
    """
    Helper to check if a DataFrame is aligned with another DataFrame or Series.
    """
    from pandas.core.frame import DataFrame
might as well just import from pandas here; put the import at the top of the file if you can (not sure if you can). also maybe can use ABCDataFrame
`pandas.core.frame.py` imports from this file, so I don't think I can move the import to the top of the file
i get that you cannot put the import at the top. However, when inside the function the style is to use `from pandas import DataFrame`
OK, changed the imports
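For readers following this thread, a helper of the kind under review might look roughly like the sketch below. It is not the code from the PR: the name `is_aligned` is hypothetical, it uses only public pandas API, and treating a Series as aligned when its index equals the frame's columns is an assumption based on how DataFrame/Series arithmetic matches labels by default.

```python
import pandas as pd


def is_aligned(frame, other):
    """Return True if `other` carries the same labels as `frame`.

    Hypothetical sketch, not the helper from the PR.
    """
    if isinstance(other, pd.DataFrame):
        # Two DataFrames are aligned when both axes carry identical labels.
        return frame.index.equals(other.index) and frame.columns.equals(other.columns)
    if isinstance(other, pd.Series):
        # Assumption: a Series is matched against the frame's columns.
        return frame.columns.equals(other.index)
    # Scalars and plain arrays carry no labels, so nothing can be misaligned.
    return True


df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])
ser = pd.Series([10, 20], index=["a", "b"])

print(is_aligned(df1, df1.copy()))  # same index and columns
print(is_aligned(df1, df2))         # same columns, different index
print(is_aligned(df1, ser))         # Series index matches the columns
```

The import here is at module level only because the sketch lives outside pandas; inside `pandas/core/arraylike.py` the circular import discussed above is why the import sits inside the function.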
pandas/core/arraylike.py
Outdated
    is_ndframe = [isinstance(x, NDFrame) for x in inputs]
    is_frame = [isinstance(x, DataFrame) for x in inputs]

    if (sum(is_ndframe) >= 2) and (sum(is_frame) >= 1):
this condition is impossible to reason about. pls make it simpler. you just want to know if you have 2 or more dataframes right? (or series)? if so, just say that
No, I want to know if there are at least two alignable objects (DataFrame or Series) and at least one DataFrame, which is what the above line does, and which is what is explained on the line just below. I can try to clarify that comment if something is not clear about that?
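To make the condition in this reply concrete, here is a minimal sketch using public pandas classes. The function name `needs_alignment_check` is hypothetical and the counters borrow the `n_alignable` / `n_frames` naming floated in this thread; it is not the code that was merged.

```python
import numpy as np
import pandas as pd


def needs_alignment_check(*inputs):
    """Hypothetical helper mirroring the condition described above."""
    # Alignable objects are Series or DataFrame instances.
    n_alignable = sum(isinstance(x, (pd.Series, pd.DataFrame)) for x in inputs)
    n_frames = sum(isinstance(x, pd.DataFrame) for x in inputs)
    # At least two alignable inputs, at least one of which is a DataFrame.
    return n_alignable >= 2 and n_frames >= 1


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
ser = pd.Series([1, 2], index=["a", "b"])
arr = np.ones((2, 2))

for label, args in [
    ("frame, frame", (df, df)),
    ("frame, series", (df, ser)),
    ("series, series", (ser, ser)),
    ("frame, ndarray", (df, arr)),
]:
    print(label, needs_alignment_check(*args))
```

Two Series are excluded on purpose: as noted earlier in the conversation, the Series behaviour is different, so the check only applies when a DataFrame is involved.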
try to simplify.
Sorry, Jeff, if you don't give me a clue about what exactly is unclear for you or about how you would do it differently, I have no idea how to improve this. The code reflects exactly what I just explained it needs to check, and it is explained in the line below as well.
Would e.g. changing `sum(is_frame)` into a variable `n_frames` help? (and moving the sum to the list comprehension where `is_frame` is now defined)
well, the problem is that this is getting so complicated that you need to comment. I honestly don't think it is worth making this much change at this late hour.
if you want to do it for 1.2.2, or better yet 1.3, ok
waiting for the nth change is extremely painful and disruptive.
these are supposed to be lightweight backports. this is turning into a nightmare.
this is likely going to be extremely fragile and break again, and will then have to be patched again.
Waiting for 1.2.2 or 1.3 is not going to make this change any simpler if you don't help me find out what you don't like about it.
> waiting for the nth change is extremely painful and disruptive.
What is this about?
> these are supposed to be lightweight backports. this is turning in to a nightmare.
The change in this PR is a rather clean additional check in the `array_ufunc` function, to use a different code path in certain cases. It touches almost no existing code, so I would say it is a clean patch to backport.
ok i suggested a couple of things to make it more clear. if you can fix the docs as suggested ok to merge.
ok lgtm on code / tests. 2 doc comments.
AFAIK the remaining comment is a doc comment on the whatsnew notes, and since this is a somewhat subjective style discussion / not critical IMO, I am going to take the liberty to merge this, so @simonjayhawkins can start the release process early in the day once this is backported and builds have passed. @simonjayhawkins I also updated the date in the release notes here.
@meeseeksdev backport to 1.2.x
…n DataFrames if not aligned + fallback to <1.2.0 behaviour (#39288) Co-authored-by: Joris Van den Bossche
@jorisvandenbossche pls follow up and fix the docs. it's not a style issue; rather, this is completely inconsistent with the current docs. we NEVER use this style - always the ipython docs style
Closes #39184
This is obviously a last-minute change, but if people agree on the deprecation, I think we should try to include it in v1.2.1. I think my patch is relatively safe, since converting the input to numpy arrays (what I am now doing manually as a fallback) is also what happened before `DataFrame.__array_ufunc__` was added.
It adds quite a few lines of code, but most of it is simple, if somewhat verbose, checking for the exact case.
The specific tests I added were verified to pass on pandas 1.1.5, so they codify the previous behaviour (minus the warnings).
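As an illustration of the two behaviours discussed in this PR, the sketch below spells each one out explicitly so that the result does not depend on which pandas version is installed. It reuses the `df1` from the whatsnew snippet; the contents of `df2` and the variable names `positional` and `aligned` are assumptions made for the example, not taken from the PR.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[0, 1])
# Assumed second frame: same columns, index shifted by one.
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=[1, 2])

# Keep the pre-1.2.0 behaviour: drop df2's labels and add by position.
# The result takes the index and columns of df1.
positional = np.add(df1, np.asarray(df2))

# Opt in to label-based behaviour: align first (outer join by default),
# then apply the ufunc to the two aligned frames.
left, right = df1.align(df2)
aligned = np.add(left, right)

print(positional)
print(aligned)
```

Only index label `1` exists in both frames, so the aligned result holds values for that row and missing values for labels `0` and `2`.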
cc @TomAugspurger