Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LPDID: Setting Windows with Monthly Data #750

Open
escherpf opened this issue Dec 16, 2024 · 13 comments
Open

LPDID: Setting Windows with Monthly Data #750

escherpf opened this issue Dec 16, 2024 · 13 comments
Labels
bug Something isn't working

Comments

@escherpf
Copy link

LPDID currently requires that date variable inputs (tname and gname) be in the format YYYYMM if one is using monthly data. This seems to cause some weirdness with setting the pre- and post-windows when one has monthly data spanning more than a year. In particular, with dates in the YYYYMM format (e.g., 202312 and 202401) the jump from December of one calendar year to January of the next is obviously not 1. If I set my pre- and post-window to values less than (minus and plus) 12, there's no problem: the command runs. But setting them to something like plus/minus 20 (again, in terms of months) and LPDID will not run and I get an error message like the one displayed below. On the other hand, if I set it to something large enough to cover the jump from one calendar year to the next (e.g., 96) the command runs and produces coefficients for all 96 leads and lags (I have about 28 years of data).

Is there something I'm missing here, or is this an known issue? Or just something that will persist until datetime formats are supported?

image

@s3alfisc
Copy link
Member

Hi, that's a bug and simply stems from the fact that the DID API isn't too well designed regarding the time variables. I'll have to take a closer look at this. Thanks for reporting!

@escherpf
Copy link
Author

escherpf commented Dec 16, 2024

Thank you for the quick response! One other little thing to note on LPDID, of which you are likely aware, is that running iplot after estimation alters the LPDID object--specifically the index--so that I cannot re-run iplot without re-estimating the model (since I'm not sure how to manipulate the LPDID object--unlike a data frame I'm not able to set a new index).

Basically, running iplot inserts a new index in the LPDID object:

image
image

Thank you again!

@s3alfisc s3alfisc added the bug Something isn't working label Dec 16, 2024
@s3alfisc
Copy link
Member

Thanks for reporting - looks like another bug! What is it that you are re-running? Is it running one of etable / summary / tidy / vcov post estimation that alters the model output? Could you provide a reproducible example with the df_het example data set?

@escherpf
Copy link
Author

It occurs when I run iplot (In my original message I had put "iplot" in angle brackets, and I just now noticed that doing so suppressed that bit of text...a key piece of information! Sorry about that!).

I will come up with a reproducible example using the df_het data set. Thank you!

@escherpf
Copy link
Author

url = "https://raw.githubusercontent.com/py-econometrics/pyfixest/master/pyfixest/did/data/df_het.csv"
df_het = pd.read_csv(url)
fit = pf.lpdid(
    df_het,
    yname="dep_var",
    idname="unit",
    tname="year",
    gname="g",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=False
)
fit.tidy().head()
fit.iplot(figsize= [1200, 400], coord_flip=False).show()
fit.tidy().head()
fit.iplot(figsize= [1200, 400], coord_flip=False).show()

The first tidy() call, produces the following output

image

The second tidy() call, after running iplot(), produces

image

The second iplot() call, generates the following error:

AttributeError: 'DataFrame' object has no attribute 'dtype'"

@s3alfisc
Copy link
Member

s3alfisc commented Dec 17, 2024

Thanks @escherpf , I can reproduce this. The second evaluation of

fit.tidy().head()
fit.iplot(figsize= [1200, 400], coord_flip=False).show()

produces

AttributeError: 'DataFrame' object has no attribute 'dtype'

As a matter of fact, it seems that calling iplot() twice seems to trigger the error:

fit = pf.lpdid(
    df_het,
    yname="dep_var",
    idname="unit",
    tname="year",
    gname="g",
    vcov={"CRV1": "state"},
    pre_window=-20,
    post_window=20,
    att=False
)
fit.iplot(figsize= [1200, 400], coord_flip=False).show()
fit.iplot(figsize= [1200, 400], coord_flip=False).show()
# AttributeError: 'DataFrame' object has no attribute 'dtype'

I'll take a look at this in the next days.

@escherpf
Copy link
Author

As a matter of fact, it seems that calling iplot() twice seems to trigger the error:

That's correct. Calling iplot changes the index of LPDID object, which is what appears to be causing the error on the second iplot call.

Thank you @s3alfisc !

@s3alfisc
Copy link
Member

The iplot error does not occur with did2s:

fit = pf.did2s(
    df_het,
    yname="dep_var",
    first_stage="~ 0 | unit + year",
    second_stage="~i(rel_year, ref=-1.0)",
    treatment="treat",
    cluster="state",
)
fit.iplot(figsize= [1200, 400], coord_flip=False).show()
fit.iplot(figsize= [1200, 400], coord_flip=False).show()

@s3alfisc
Copy link
Member

The issue with iplot() and reindexing should be solved after merging #754, was an easy fix =)

@s3alfisc
Copy link
Member

s3alfisc commented Dec 19, 2024

Currently doing some research on how the leading did libraries handle time / date data.

  • did allows only numeric input: link

@s3alfisc
Copy link
Member

Btw I need to add the name of the required input data type for tname, gname, etc to the docs:
image

@s3alfisc
Copy link
Member

Generally I think that the error message "needs to be of format YYYYMMDD" doesn't make sense, the only think that is really required is that users pass a numeric and that gname and tname are in the same scale to do this comparison

self._data["ATT"] = (self._data[self._tname] >= self._data[self._gname]) * (

And we could also verify that dates are on the same scale by checking that values of the gname variable are included in tname or 0 or infinity?

@s3alfisc
Copy link
Member

Made some changes in this PR: #756 . Could you check if these make sense for you @escherpf?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants