Proposal: modify the posterior `.json` files to specify what data from the dataframe is actually used as an input to the model.

Rationale:
Some models only use a subset of their data. For example, `earn-height` uses the `earnings` data:

Of this data, `earn-height` only uses a subset: `N, earn, height`. This is fine for Stan, which will automatically discard data that doesn't match variables defined in the `data` block.

Unfortunately, this is frustrating when trying to port PosteriorDB models to other PPLs. Many PPLs (notably Turing, but I think also PyMC, NumPyro, Gen, and so on) use some sort of overloaded function definition to define a probabilistic program, e.g.:
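
As a loose analogy in plain Python rather than any particular PPL, a model written as a function with named data arguments rejects a data dictionary that carries extra entries. The function body, the data values, and the `unused_column` key below are made up for this sketch; only the argument names `N`, `earn`, and `height` come from the issue.

```python
def earn_height(N, earn, height):
    """Stand-in for a PPL model whose arguments are its data inputs."""
    # A real model would declare priors and a likelihood here.
    return {"N": N, "earn": earn, "height": height}


# Made-up values; "unused_column" stands for any entry the model does not use.
data = {
    "N": 3,
    "earn": [50000.0, 60000.0, 30000.0],
    "height": [74, 66, 64],
    "unused_column": [1, 2, 1],
}

try:
    earn_height(**data)
    call_failed = False
    error_message = ""
except TypeError as err:
    # Python refuses keyword arguments the function does not declare.
    call_failed = True
    error_message = str(err)
```

Unlike Stan's `data` block, the call fails outright instead of silently dropping the extra entry.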
In this setup, the data arguments need to exactly match the columns of the dataframe, and so the dataframe must be filtered beforehand to extract the relevant columns. To make this easier, it would be helpful to have a column in the dataframe specifying `data-used`.

Example addition:
would become:
This change would only need to occur for models where the provided dataframe is a superset of the data the model actually uses.
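
To show how a port might consume such a field, here is a minimal Python sketch. It assumes a `data-used` key holding a list of variable names. The surrounding JSON layout, the `select_used_data` helper, and the data values are hypothetical and not part of PosteriorDB.

```python
import json

# Hypothetical posterior entry: only the proposed "data-used" field is shown.
posterior_entry = json.loads('{"data-used": ["N", "earn", "height"]}')


def select_used_data(data, entry):
    """Return only the entries of `data` listed under "data-used".

    If the field is absent, the data is returned unchanged, which covers
    models that already use everything they are given.
    """
    used = entry.get("data-used")
    if used is None:
        return dict(data)
    missing = [name for name in used if name not in data]
    if missing:
        raise KeyError(f"data-used names not found in data: {missing}")
    return {name: data[name] for name in used}


# Made-up values; "unused_column" stands for any entry the model does not use.
data = {
    "N": 3,
    "earn": [50000.0, 60000.0, 30000.0],
    "height": [74, 66, 64],
    "unused_column": [1, 2, 1],
}

filtered = select_used_data(data, posterior_entry)
```

The filtered dictionary can then be splatted into a model function (`model(**filtered)` in Python) without a manual per-model column list.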