Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New helper function repair_variable_names #318

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jgabry
Copy link
Member

@jgabry jgabry commented Nov 17, 2023

closes #317

Summary

repair_variable_names converts names using periods (theta.1.1) to square brackets (`theta[1,1])

Copyright and Licensing

By submitting this pull request, the copyright holder is agreeing to
license the submitted work under the following licenses:

@jgabry jgabry requested a review from paul-buerkner November 17, 2023 18:13
@@ -173,6 +175,31 @@ set_variables <- function(x, variables, ...) {
return(x)
}

#' @rdname set_variables
#' @param old_variables (character) Variable names to repair. Should be variable
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I went with old_variables as the argument name, but happy to change this based on feedback.

Comment on lines +193 to +195
stop_no_call(
"All names in 'old_variables' must contain at least one '.' in the name."
)
Copy link
Member Author

@jgabry jgabry Nov 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity I added an error if all the names don't have a . in it. It would be strange to pass in a vector like c("theta.1", "theta[2]") so I don't think we need to write out the logic to handle mixed cases like that. But let me know if you disagree and I can add that logic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems very unhelpful if I have some variables which are scalars. I can no longer just pass the entire list to this function

I believe the set of transformations here is idempotent, so if someone passes theta[1], nothing will be changed, so why not allow it

@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (c6ab463) 95.60% compared to head (0ebe8e9) 95.61%.

❗ Current head 0ebe8e9 differs from pull request most recent head 00d32c6. Consider uploading reports for the commit 00d32c6 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #318   +/-   ##
=======================================
  Coverage   95.60%   95.61%           
=======================================
  Files          47       47           
  Lines        3686     3694    +8     
=======================================
+ Hits         3524     3532    +8     
  Misses        162      162           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jgabry
Copy link
Member Author

jgabry commented Nov 17, 2023

Maybe repair isn't the best name actually? @avehtari suggested convert in #317 (comment). I'm open to other suggestions too.

@paul-buerkner
Copy link
Collaborator

paul-buerkner commented Nov 17, 2023

I like "repair". Having the dots instead of [] is kind of a thing we don't really want. So we are repairing them to look reasonable again. But I also don't have strong opinions. So completely fine by me if we call it differently.

@mjskay
Copy link
Collaborator

mjskay commented Nov 18, 2023

Re: function name: is the intention to create a function that specifically converts from Stan names, or is the intention that this function might (in the future) support other non-posterior-friendly naming conventions? I think that should influence the naming; e.g. is this "convert_from_stan_names" (though perhaps less verbose) or the more generic "repair_variable_names" or "convert_variable_names"? Or is it "repair_dotted_indices" or "repair_dotted_index_names" or something like that?

@jgabry
Copy link
Member Author

jgabry commented Nov 22, 2023

Good point @mjskay. If this will only ever be used to convert Stan names we could include that in the function name. What do others think?

@avehtari
Copy link
Collaborator

I think it would be best to make the function specific for Stan naming conventions, and then add other functions if we add support for other packages. The name "convert_from_stan_names" would be then very descriptive, and I don't know how to make it shorter

Furthermore, in #317 @WardBrian wrote

Complex variables use .real and .imag as their CSV names in Stan. So something like a complex_vector[2] foo has names foo.1.real,foo.1.imag,foo.2.real,foo.2.imag

Tuples use : to separate "slots" in a tuple. So a tuple(real, int) a is printed as a:1,a:2. This stacks naturally with the . for arrays of tuples/tuples of arrays.

If we want to support these, too, I assume that we would like to have something like
"foo.1.real" -> "foo_real[1]"
and
"a:1" -> "a_1"

And the natural complex type support is a separate issue #319 and PR #321

@paul-buerkner
Copy link
Collaborator

I agree with Jonah and Aki.

@WardBrian
Copy link
Member

If you did want to support tuple names, this essentially just needs to split on : and loop over the resulting parts. See the code that does something similar in Stan: https://github.com/stan-dev/stan/blob/667d5b2d7c5ea1072011919a1acd76a9d3061d01/src/stan/io/stan_csv_reader.hpp#L16

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a repair_variable_names function
6 participants