Adding Standard Error of Prediction #566
Hi Hayden, I think this would be an awesome addition to pyfixest! I think the best place to add it would be via the `predict` method. We could add an argument, e.g. named "ci", that is False by default but, if set to True, returns the prediction interval along with the predicted values. What do you think? I agree that the key step will be to compute the full vcov matrix, including all estimated fixed effects. Do you already have an idea how to best build this based on the existing code base? I've thought about it for the last few minutes and am not sure if there is a "quick win", but I'll have to think about this more over the next few days. Maybe one option would be to build a first MVP implementation that only works without fixed effects? Anyways, I'd be super happy to have this feature! Cc'ing @b-knight as this might be of interest for him as well =)
Hey Alex, glad you think this would be a worthwhile addition! I agree with you that the calculation of the full covariance matrix is the challenging aspect of this task. I don't have an idea how to do it right now, but I view it as an opportunity to learn more about how covariance matrices work. The nice thing is that we can compare results to the existing packages that already offer this functionality to validate/double check. I guess there are two ways to approach the task:

1. Start with a quick win: an MVP that only supports prediction standard errors for models without fixed effects.
2. Tackle the hard part first: computing the full covariance matrix including all estimated fixed effects.
In general, my software engineer instincts tell me to get the hard part done first, but I am ok with either approach here and will cede to your opinion as to which order you prefer to do things. Also, as a quick aside, the functionality I am proposing would not actually calculate CIs on the prediction; it would calculate the standard error of the prediction, from which intervals can easily be computed. If you like, we can also add functionality for CIs (once we have the prediction error it is simple arithmetic), but I haven't seen other packages actually offer this last step built in. I usually just do it manually once I have the prediction error.
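For reference, that last step really is just arithmetic. A minimal sketch, assuming the predictions and their standard errors are already available as numpy arrays (the function name and signature here are illustrative, not existing pyfixest API):

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(y_hat, se_pred, level=0.95):
    """Turn point predictions and their standard errors into a two-sided interval."""
    # normal critical value, e.g. roughly 1.96 for a 95% interval;
    # a t critical value could be used instead for small samples
    z = norm.ppf(1 - (1 - level) / 2)
    return y_hat - z * se_pred, y_hat + z * se_pred
```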
There is also a related issue in the fixest GitHub repository which may have some pertinent discussion.
Sounds good to start with SEs, but I think support for CIs would be cool as well! In general, I'd be happy with either approach 1 or 2. I always try to go for the quick wins first, as I am motivated by getting stuff done / building new features 😄

Have you thought about the design? How do you think about the strategy of adding an inference boolean to `predict()`? I think the method could look something along these lines:

```python
def predict(..., add_inference):
    ...
    # after computing y_hat, at the very end
    if add_inference:
        # store model objects that will be overwritten
        old_X = self._X.copy()
        old_Y = self._Y.copy()
        # etc
        self._u_hat = y_hat - self._Y
        # update all other attributes required for updating via the vcov method
        # note: we need un-demeaned Y and X, but the stored self._Y and self._X are demeaned
        # maybe add options to feols, "add_undemeaned_Y" / "add_undemeaned_X", that
        # if True, store self._X_undemeaned and self._Y_undemeaned?
        self._X = [X, D]  # make this sparse
        self._Y = Y
        self._tXX = self._X.T @ self._X
        # self._scores = ...
        vcov = self.vcov()
        # compute se, ci of the prediction interval; return a pd.DataFrame with
        # columns: y_hat, se, ci_lower, ci_upper
        # restore old model objects
        self._X = old_X
        self._Y = old_Y
        # etc
```

What do you think @greenguy33 ?
Oh, and I just went through the very detailed fixest thread :D Are you using fixest & prediction intervals / errors in a similar way for your own research? Fwiw, reading the article, I learned that fixest only supports `se.fit = TRUE` when the fixed effects are included as regular factors, not via the fixed-effects syntax:

```r
library("fixest")
data(mtcars)
head(mtcars)

fit = feols(mpg ~ wt + as.factor(disp), data = mtcars)
predict(fit, se.fit = TRUE) |> head()
#                        fit   se.fit
# Mazda RX4         21.36122 1.397038
# Mazda RX4 Wag     20.63878 1.397038
# Datsun 710        22.80000 1.732790
# Hornet 4 Drive    21.40000 1.732790
# Hornet Sportabout 16.68415 1.272143
# Valiant           18.10000 1.732790

fit = feols(mpg ~ wt | disp, data = mtcars)
predict(fit, se.fit = TRUE) |> head()
# Error in predict.fixest(fit, se.fit = TRUE) :
#   The standard-errors (SEs) of the prediction cannot be
#   computed in the presence of fixed-effects. To obtain the
#   SEs, you would need to include the FEs as standard factors
#   in the model.
```
Hey Alex,
Just for some context, standard errors on fitted values require you to know the entire covariance matrix of coefficients (including fixed effects). You don't need it for inference on individual coefficients (from the FWL theorem), but you do for the fitted values, since they are a function of the full vector of coefficients.

Derivation: the fitted value at a row $x_i$ (containing the covariates and the fixed-effect dummies) is $\hat{y}_i = x_i' \hat{\beta}$, so that the VCOV of the fitted value is $\mathbb{V}(\hat{y}_i) = x_i' \, \mathbb{V}(\hat{\beta}) \, x_i$, where the inner part $\mathbb{V}(\hat{\beta})$ is the full VCOV of the regression coefficients.
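For concreteness, a small numpy sketch of that formula, assuming one already has the full design matrix `Z` (covariates plus fixed-effect dummies) and the full coefficient VCOV `V` in memory; this is not how pyfixest stores things internally:

```python
import numpy as np

def se_of_fitted_values(Z, V):
    """Standard error of each fitted value: sqrt of the diagonal of Z V Z'."""
    # z_i' V z_i computed row by row, avoiding the full n x n matrix
    var_fit = np.einsum("ij,jk,ik->i", Z, V, Z)
    return np.sqrt(var_fit)
```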
Thanks @kylebutts! We discussed this a bit and thought it would likely be a good strategy to implement prediction intervals for models without fixed effects first (which we try to do in this PR), and then to tackle a more fundamental refactor of pyfixest's internals to support fixed effects as well. I think that to get the full vcov matrix, we might have to update all vcov methods for Feols, which might turn out to be quite involved 😅
Pyfixest currently has a `predict()` function, but there is currently no functionality to provide estimates of the variance of the predictions. (Some) other tools do include this, such as:

- Statsmodels: the `get_prediction()` function returns a `PredictionResults` object that includes an estimate of the variance of the prediction as the attribute `var_pred_mean`.
- Stata: includes the `stdp` option for generating the standard error of the prediction (see the bottom of pg. 1899 of this link).

This functionality would be useful for users who want to compute prediction intervals to estimate uncertainty in the prediction. If folks are interested in this feature, I am happy to work on adding this to Pyfixest. It would require calculating the full covariance matrix including all fixed effects. Then the variance of each prediction is simply `X * M * X'`, where X is a row of covariate data to predict on and M is the full covariance matrix.