Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support diagnostics in lockfile parsers #186

Closed
wants to merge 1 commit into from
Closed

feat: support diagnostics in lockfile parsers #186

wants to merge 1 commit into from

Conversation

G-Rath
Copy link
Collaborator

@G-Rath G-Rath commented Feb 4, 2023

This is something I probably should have done from the start, and is based on Diagnostics from Terraform providers.

My main motivations are:

  1. there's a lot of output when running the tests (when a test fails)
  2. writing to stderr like this is a bit of a smell, which came about because I wanted to avoid requiring an arbitrary object (Reporter) for those using the lockfile as a library
  3. function signatures cannot be changed without a new major, but parsers might need to surface other auxiliary info in future, and with Add io.Reader variants to lockfile package #176 every parser is going to have a pair function, so the work involved in this change can grow substantially

I'm opening this as a draft because frankly there's quite a lot of refactoring to do, turning all the existing tests into table-based tests (which should also help with landing #176 too), and some discussion points that we can have in the meantime (including "do other people think this is a good idea?").

For now I've mostly just focused on updating the individual parsers and converting their tests, but I'm aware some thinking will need to be done about what happens to Parse and co - there's a few different approaches which include "leave it for now", but I'm leaving that to last.

(also, I've not yet decided between diag, diags, or diagnostics - it should probably be the middle one tbh)

@G-Rath
Copy link
Collaborator Author

G-Rath commented Feb 4, 2023

@picatz interested in if you have any thoughts here, since you're the main person I know that is using the lockfile package

@G-Rath
Copy link
Collaborator Author

G-Rath commented Feb 5, 2023

I ended up writing a script to easily convert the parser tests to be table-based - it should be useful even if this doesn't end up getting landed.

@picatz
Copy link

picatz commented Feb 6, 2023

👋 Hello @G-Rath, I think reporting diagnostics is a good idea conceptually, but I haven't dug through the implementation details yet! I'll find some time this week to do so and report back any other thoughts I have.

@picatz
Copy link

picatz commented Feb 15, 2023

Hey @G-Rath, just to follow up: I wasn't able to find time last week. But, this is something I'm definitely still interested in!

Copy link

@picatz picatz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spent some time this afternoon looking over the implementation. Left some thoughts I had based on my initial impressions.

Overall, I think this definitely looks like the right direction!

return parseFileAndPrintDiag(pathToLockfile, ParseApkInstalledWithDiagnostics)
}

func ParseApkInstalledWithDiagnostics(pathToLockfile string) ([]PackageDetails, Diagnostics, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, thinking about a separate Diagnostics return value: in HCL/Terraform-land, the diagnostics are returned like an error, not a separate thing. Diagnostics carry a severity that determines if they're either a warning or an error.

The convention becomes to use the diags.HasErrors() method to check if the returned diagnostics have any errors that need to be dealt with; otherwise you know the rest are just warnings that might be useful to log, or show the user, but not halt further execution.

While this is a minor detail, I think the pattern is useful in these situations.

pkgDetails, diags := ParseApkInstalledWithDiagnostics(pathToLockfile)
if diags.HasErrors() {
  // add info and return an `error` if appropriate
  return fmt.Errorf("failed to parse APK file %q in %q: %w", pathToLockfile, containerImage, diags.Err())
}

Either way, I want diagnostics! But, I figured I'd at least bring this up if you want to discuss it further 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm that's interesting - I think it makes sense to consider because there's no real downside so long as diags.Err() exposes the underlying error? (actually, the main one I can think of is that IDEs like Goland would not tell you that you need to check if there was an error, which I'm pretty sure it does based on if the return is an error).

Though I'm not sure if there's an advantage to returning an object vs a slice?

Comment on lines +12 to +14
func (diag *Diagnostics) Warn(warn string) {
diag.Warnings = append(diag.Warnings, warn)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming there are ever cases where the warnings are duplicated, the consumer will need to consider de-duplication to present them to users. Not sure if it's worth it, but maybe a map[string]struct{} or something like that could provide that out the box? What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel at this point it's best to leave that to the consumer to handle as there's no defined spec around these warnings and hopefully we shouldn't need to expand this further, but also if we do it should be easy to add a WarningsMap or something.

os.Stderr,
"warning: malformed DPKG status file. Found no package name in record. File: %s\n",
diag.Warn(fmt.Sprintf(
"warning: malformed DPKG status file. Found no package name in record. File: %s",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I don't think we need to the warning: prefix.

The type information alludes to that being lockfile.Diagnostics.Warnings at the moment. This seems like a display thing that an application would add to an unstructured output like STDERR.

As a library consumer, I would know it is a diagnostic warning, and I can choose the display/styling in a CLI or whatever. I would much rather want to see a warning string like:

found malformed DPKG status file %q

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree, I just didn't change it because this way the output with this change is 1:1 with what it is currently - I know this sort of CLI output shouldn't be considered breaking to change, but I think it's nice to start with being as close to 1:1 as possible as a way of proving its stability.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Feb 25, 2023

btw I hadn't realised this but go 1.20 added support for multiple errors, which I'm wondering might be a suitable alternative to this - while it's a little more cumbersome because it'd mean having to more deeply check the error value, it would avoid having to change the signature making it arguably less breaking given that there's a couple of ways diagnostics could be implemented (as showcased in this comment) 🤔

@picatz
Copy link

picatz commented Feb 27, 2023

While it's a little more cumbersome because it'd mean having to more deeply check the error value, it would avoid having to change the signature.

🤔 FWIW, I feel like diagnostics and errors are related, but subtlety different. Diagnostics can contain zero or more errors. The idea of a separate type to describe them feels more appropriate to me, at the cost of being slightly different from before.

@spencerschrock
Copy link
Contributor

Just chiming in that being able to suppress the os.Stderr writes would be beneficial for Scorecard.

We currently call DoScan here when checking for vulns. When we scan public repos, we sometimes encounter malformed yarn.lock files that write hundred of lines to os.Stderr due to #102.

I don't think this PR is enough to change the DoScan behavior (and that seems outside the scope of this PR), but I'm excited for the work being put in here.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Feb 27, 2023

@spencerschrock thanks for that info - as I said on #102 and #142 that specific instance of malformed yarn.locks is very fixable and while I really would like to know how those locks are being generated I'm happy to land the fix if you're hitting them often.

It would probably be good if you could open a new issue with some further details and we can go from there.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Mar 2, 2023

tbh I wish I thought of this sooner, but probably the easiest short-term solution could actually be to have a global var like this:

var HandleParserWarning = func (warning string) {
	_, _ = fmt.Fprintf(os.Stderr, warning)
}

// parse-maven-lock.go
HandleParserWarning(fmt.Sprintf(
	"Failed to resolve version of %s: property \"%s\" could not be found",
	mld.GroupID+":"+mld.ArtifactID,
	results[1],
))

(it could also have a "maybeFilePath" argument too 🤔 )

Then tools like scorecard could assign a function to do whatever they want, and it should be pretty easy to move away from later (it should be easy to continue to call the function along with whatever replacement solution comes along, and arguably it could just not be called anymore so long the function itself still exists).

@oliverchang @another-rex given it sounds like scorecard in particular are having a bit of a hard time with how uncontrollable these outputs are, does that sound like an acceptable middleground for now to make their lives a bit easier?

@oliverchang
Copy link
Collaborator

tbh I wish I thought of this sooner, but probably the easiest short-term solution could actually be to have a global var like this:

var HandleParserWarning = func (warning string) {
	_, _ = fmt.Fprintf(os.Stderr, warning)
}

// parse-maven-lock.go
HandleParserWarning(fmt.Sprintf(
	"Failed to resolve version of %s: property \"%s\" could not be found",
	mld.GroupID+":"+mld.ArtifactID,
	results[1],
))

(it could also have a "maybeFilePath" argument too 🤔 )

Then tools like scorecard could assign a function to do whatever they want, and it should be pretty easy to move away from later (it should be easy to continue to call the function along with whatever replacement solution comes along, and arguably it could just not be called anymore so long the function itself still exists).

@oliverchang @another-rex given it sounds like scorecard in particular are having a bit of a hard time with how uncontrollable these outputs are, does that sound like an acceptable middleground for now to make their lives a bit easier?

@spencerschrock how much of a problem is this for Scorecard right now? Is this something that needs a more immediate fix, or something that can wait a bit for a larger, more substantial refactoring?

@G-Rath Something like HandleParserWarning does seem like a nice easy way to handle a lot of these problems, including the ones you brought up as the original motivation for this PR. Another nice way to generalize this might be to just expose a WarningLogWriter io.Writer instead.

It's also a nice way of reducing the variations of each parser that we want to support. That said, I'm a little wary of adding new interfaces if we're going to refactor all of this again later. If we go with this I think we would have to commit to this approach for achieving the goals in the PR for the forseeable future.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Mar 3, 2023

@oliverchang fwiw this is also impacting my ability to generally work on the scanner, because these warnings also get printed out by the test suite so I have to sift through a fair amount of output every time I make a change (which'll only get worse as more stuff is added) 🙃

While I agree with generally trying to avoid a lot of big refactors, from what we've been talking about offline and what you've said over on issues like #176 it seems like it there's a fair bit to unpack before you'll feel confident doing such a refactor.

@oliverchang
Copy link
Collaborator

oliverchang commented Mar 3, 2023

@oliverchang fwiw this is also impacting my ability to generally work on the scanner, because these warnings also get printed out by the test suite so I have to sift through a fair amount of output every time I make a change (which'll only get worse as more stuff is added) 🙃

While I agree with generally trying to avoid a lot of big refactors, from what we've been talking about offline and what you've said over on issues like #176 it seems like it there's a fair bit to unpack before you'll feel confident doing such a refactor.

Thinking more, either way the approach you suggested sounds good to me, with the possible exception of exposing an io.Writer to write logs to instead of a function. I don't think we need to wait for a more substantial refactor to address this.

@another-rex do you have any thoughts?

@another-rex
Copy link
Collaborator

Doing this quick compromise sounds good to me, and both having io.Writer or setting a full function seem fine, though having full function makes it slightly easier to do additional error handling if required.

@spencerschrock
Copy link
Contributor

@spencerschrock how much of a problem is this for Scorecard right now? Is this something that needs a more immediate fix, or something that can wait a bit for a larger, more substantial refactoring?

Minimal. The vast majority of the output was taken care of by #250 (thanks @G-Rath).

@oliverchang
Copy link
Collaborator

Doing this quick compromise sounds good to me, and both having io.Writer or setting a full function seem fine, though having full function makes it slightly easier to do additional error handling if required.

I think an io.Writer expresses the intention the best here -- as an abstraction dealing with a log output sink. We can still have custom io.Writer implementations if more involved error handling needs to be done.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Mar 21, 2023

hmm I agree with Rex at this point, because I'd expect using a writer means lockfile then has to decide how to format the messages whereas with a function you could pass something like an error or an interface with extra detail, and then the consuming library can decide what to do with things.

But I think I'll probably have a better sense once I see an implementation of either solution.

@G-Rath
Copy link
Collaborator Author

G-Rath commented Jul 11, 2023

I'm going to close this for now since it requires rebasing + further discussion anyway, and the changes will still be retained by GitHub in case we want to dig up the implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants