Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Go Case Conversion #98

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

sergiosalvatore
Copy link
Contributor

  • provide the ability to augment or replace the set of initialisms that are used
  • provide an EncodeCasingFunc for the Go style
  • introduce the concept of "atoms" so that Go will not attempt to break up names that we want to be kept together.

@sergiosalvatore sergiosalvatore force-pushed the impl-encode-go-camel-case branch from 102af63 to fdd3c0f Compare October 4, 2024 21:07
Copy link

@silvenac silvenac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just nits

tagformat/caseconversion/case_conversion.go Outdated Show resolved Hide resolved

// SetInitialisms replaces the set of initialisms used by the GoCaseConverter
// with the argument.
func (g *GoCaseConverter) SetInitialisms(initialisms []string) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to set or recommend any limits to max number of initialism or atoms? (stealing your "we should have limits on everything" philosophy 😉)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good thought -- I just put in some protections against short initialisms or atoms that would certainly cause problems. I think, given that Dials is a library, for it to impose limits on the number of initialisms or atoms it's probably a step too far.

// AddInitialisms adds the passed initialisms to the set of initialisms.
func (g *GoCaseConverter) AddInitialism(initialism ...string) {
g.initialisms = append(g.initialisms, initialism...)
sort.Strings(g.initialisms)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we don't set any limits on how long the lists of initialisms are, maybe we should insert at the right sort index, but probably this is a micro-optimization. also do we care about duplicates?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that users should not be calling AddInitialism in a loop and instead would just call it once in main at startup to configure the extra things they want to add.

@sergiosalvatore sergiosalvatore force-pushed the impl-encode-go-camel-case branch from fdd3c0f to 925f319 Compare October 7, 2024 14:32
Copy link
Contributor

@dfinkel dfinkel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple comments

continue
}
chunkLen := 2
for chunkLen <= len(word) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't need to iterate over sizes like this.

The doc for SearchStrings says:

The return value is the index to insert x if x is not present

... so the longest prefix present will be a nearby predecessor to the index that's actually returned.
I think it would be better to iterate backwards from the returned index until either the offset is either no longer a prefix or you hit the beginning of the list. (assuming that it isn't an exact match)

(If we're going to do exact matches after case-conversion, we might as well use a set, and get O(1) checks for the inner loop.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point -- I think the longest prefix (or an exact match) will necessarily be the entry in the position at one less than the return value of SearchStrings or 0 if there are no prefixes. I've updated the code to use this logic and it appears to work as expected with the fewest loop iterations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point I convinced myself that the longest prefix would be in the previous position, but while writing that comment, I had trouble convincing myself of it again (which is why I was thinking about iterating backwards)

e.g. if we're looking for abcd and there are entries, a and aa, then aa will be in the preceding position, but we need to grab a, which would be another position earlier.

If we enforce that entries aren't prefixes of one another, then that simplification works, though. (I think I originally thought about this scheme in the context of Fresnel's topic prefix ACLs, where it makes no sense for them to be prefixes of eachother (but, in hindsight we probably aren't validating that 🫤 ))

Comment on lines 238 to 256
} else if initialismOffset < len(g.initialisms) && g.initialisms[initialismOffset] == maybeInitialism {
if i == 0 {
b.WriteString(w)
} else {
b.WriteString(maybeInitialism)
}
} else {
if i == 0 {
b.WriteString(w)
} else {
b.WriteString(cases.Title(language.English, cases.NoLower).String(w))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like you can pull the i == 0 check here up into the outer if/else cascade and simplify both branches.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While there is some duplication in statements I feel like the original way I wrote it is easier to understand -- but I've updated it nonetheless to simplify the branches.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, it might warrant a comment.
It is a little odd to see that i == 0 check at the beginning now.

@sergiosalvatore sergiosalvatore force-pushed the impl-encode-go-camel-case branch 2 times, most recently from b16213c to cca74ee Compare October 18, 2024 19:24
- provide the ability to augment or replace the set of initialisms that
  are used
- provide an EncodeCasingFunc for the Go style
- introduce the concept of "atoms" so that Go will not attempt to break
  up names that we want to be kept together.
@sergiosalvatore sergiosalvatore force-pushed the impl-encode-go-camel-case branch from cca74ee to c275b0a Compare December 2, 2024 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants