Preserving column order for categoricals #27
Comments
@josef-pkt: I think this is the issue that you were trying to think of today that involved setting up column order.

For reference, the issue is that patsy does impose some constraints on column order: specifically, it groups together terms so that those which contain the same combination of continuous factors go together, and then within each group it puts lower-order interactions before higher-order interactions.

The user request was that they wanted to do a type-I anova, and statsmodels only supported (don't know if this is still true?) type-I anovas where each column was entered from left to right. So in particular they wanted to have some categorical terms, then some continuous terms, then some categorical terms, which violates that "grouping" constraint above.

My current feeling (also expressed more thoughtfully in that thread) is that the best solution is just for the statsmodels type-I anova code to support explicit specification of what order you want to enter the terms in. Doing this in patsy is hard because (a) I actually think the current behaviour is nicer for most use cases, so am reluctant to de-optimize user experience in general just to improve type-I anovas (which are almost never the right thing anyway, and rarely used outside of introductory classes), and (b) it's not clear that patsy can fix this entirely, since in general type-I anovas might want almost any ordering of columns, and patsy can't really support that without extreme contortions. Allowing |
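To make the grouping constraint concrete, here is a minimal pure-Python sketch of the ordering rule as described in the comment above. It is not patsy's actual implementation: the function name `order_terms`, the tuple representation of terms, and the choice to put the purely categorical group first (with other groups in order of first appearance) are all assumptions made for illustration.

```python
# Illustrative sketch only: mimics the ordering rule described in this thread,
# not patsy's real code. All names here are hypothetical.
def order_terms(terms, continuous):
    """Reorder terms by (continuous-factor group, interaction order).

    terms: list of tuples of factor names, in the order the user wrote them.
    continuous: set of factor names that are continuous (numeric).
    """
    groups = {}  # frozenset of continuous factors -> terms in that group
    for term in terms:
        key = frozenset(f for f in term if f in continuous)
        groups.setdefault(key, []).append(term)

    # Assumption: the group with no continuous factors comes first; the
    # remaining groups keep their order of first appearance (stable sort
    # over an insertion-ordered dict).
    keys = sorted(groups, key=lambda k: len(k) > 0)

    ordered = []
    for key in keys:
        # Within a group, lower-order interactions come before higher-order
        # ones; ties keep the user's order because sorted() is stable.
        ordered.extend(sorted(groups[key], key=len))
    return ordered


# The user wanted: categorical a, continuous x, categorical b, then interactions.
requested = [("a",), ("x",), ("b",), ("a", "b"), ("a", "x")]
result = order_terms(requested, continuous={"x"})
print(result)
```

Under these assumptions `b` and `a:b` are pulled ahead of `x`, so the requested "categorical, continuous, categorical" entry order cannot survive the grouping step.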
Yes, I think that's what I remembered. I don't really know the details, but there are also other use cases where column order is relevant. One is in handling multicollinearity, where R does pivoting, and statsmodels will also do a sequential check for perfect correlation. For type 1 ANOVA: |
I guess column order affects results in multicollinearity cases, but do Patsy does provide the ability to look up terms by name, so I guess you On Tue, Apr 14, 2015 at 6:53 PM, Josef Perktold [email protected]
Nathaniel J. Smith -- http://vorpus.org |
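The "look up terms by name" idea could be used roughly like this: given a mapping from term name to the slice of design-matrix columns it occupies, a caller builds whatever column entry order it wants without patsy having to change its layout. This is only a sketch. The helper `columns_in_term_order` and the example layout are hypothetical, and while I believe patsy exposes a mapping of this shape as `design_info.term_name_slices`, treat that as an assumption and check the patsy docs.

```python
# Hypothetical helper: turns an explicit term order into a column order.
def columns_in_term_order(term_slices, term_order):
    """Return column indices arranged in the requested term order.

    term_slices: mapping of term name -> slice of design-matrix columns.
    term_order: term names in the order they should be entered.
    """
    missing = [name for name in term_order if name not in term_slices]
    if missing:
        raise KeyError(f"unknown terms: {missing}")

    columns = []
    for name in term_order:
        s = term_slices[name]
        columns.extend(range(s.start, s.stop))
    return columns


# Made-up layout: categorical a (2 columns) and b (1 column) grouped first,
# continuous x last, as the grouping rule would arrange them.
term_slices = {
    "Intercept": slice(0, 1),
    "a": slice(1, 3),
    "b": slice(3, 4),
    "x": slice(4, 5),
}

# Entry order wanted for a sequential (type-I) analysis: a, then x, then b.
entry_order = ["Intercept", "a", "x", "b"]
column_order = columns_in_term_order(term_slices, entry_order)
print(column_order)
```

A type-I anova routine could then index the design matrix with `column_order` (or walk the terms in `entry_order`) instead of assuming left-to-right entry.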
https://groups.google.com/forum/#!topic/pystatsmodels/ZvsyZag3xaw