feat: support diagnostics in `lockfile` parsers #186

G-Rath · 2023-02-04T20:30:29Z

This is something I probably should have done from the start, and is based on Diagnostics from Terraform providers.

My main motivations are:

there's a lot of output when running the tests (when a test fails)
writing to stderr like this is a bit of a smell, which came about because I wanted to avoid requiring an arbitrary object (Reporter) for those using the lockfile as a library
function signatures cannot be changed without a new major, but parsers might need to surface other auxiliary info in future, and with Add io.Reader variants to lockfile package #176 every parser is going to have a pair function, so the work involved in this change can grow substantially

I'm opening this as a draft because frankly there's quite a lot of refactoring to do, turning all the existing tests into table-based tests (which should also help with landing #176 too), and some discussion points that we can have in the meantime (including "do other people think this is a good idea?").

For now I've mostly just focused on updating the individual parsers and converting their tests, but I'm aware some thinking will need to be done about what happens to Parse and co - there's a few different approaches which include "leave it for now", but I'm leaving that to last.

(also, I've not yet decided between diag, diags, or diagnostics - it should probably be the middle one tbh)

G-Rath · 2023-02-04T20:31:42Z

@picatz interested in if you have any thoughts here, since you're the main person I know that is using the lockfile package

G-Rath · 2023-02-05T00:48:53Z

I ended up writing a script to easily convert the parser tests to be table-based - it should be useful even if this doesn't end up getting landed.

picatz · 2023-02-06T17:33:33Z

👋 Hello @G-Rath, I think reporting diagnostics is a good idea conceptually, but I haven't dug through the implementation details yet! I'll find some time this week to do so and report back any other thoughts I have.

picatz · 2023-02-15T17:49:23Z

Hey @G-Rath, just to follow up: I wasn't able to find time last week. But, this is something I'm definitely still interested in!

picatz

Spent some time this afternoon looking over the implementation. Left some thoughts I had based on my initial impressions.

Overall, I think this definitely looks like the right direction!

picatz · 2023-02-22T20:23:26Z

pkg/lockfile/apk-installed.go

+	return parseFileAndPrintDiag(pathToLockfile, ParseApkInstalledWithDiagnostics)
+}
+
+func ParseApkInstalledWithDiagnostics(pathToLockfile string) ([]PackageDetails, Diagnostics, error) {


FWIW, thinking about a separate Diagnostics return value: in HCL/Terraform-land, the diagnostics are returned like an error, not a separate thing. Diagnostics carry a severity that determines if they're either a warning or an error.

The convention becomes to use the diags.HasErrors() method to check if the returned diagnostics have any errors that need to be dealt with; otherwise you know the rest are just warnings that might be useful to log, or show the user, but not halt further execution.

While this is a minor detail, I think the pattern is useful in these situations.

pkgDetails, diags := ParseApkInstalledWithDiagnostics(pathToLockfile) if diags.HasErrors() { // add info and return an `error` if appropriate return fmt.Errorf("failed to parse APK file %q in %q: %w", pathToLockfile, containerImage, diags.Err()) }

Either way, I want diagnostics! But, I figured I'd at least bring this up if you want to discuss it further 😄

hmm that's interesting - I think it makes sense to consider because there's no real downside so long as diags.Err() exposes the underlying error? (actually, the main one I can think of is that IDEs like Goland would not tell you that you need to check if there was an error, which I'm pretty sure it does based on if the return is an error).

Though I'm not sure if there's an advantage to returning an object vs a slice?

picatz · 2023-02-22T20:29:09Z

pkg/lockfile/diagnostics.go

+func (diag *Diagnostics) Warn(warn string) {
+	diag.Warnings = append(diag.Warnings, warn)
+}


Assuming there are ever cases where the warnings are duplicated, the consumer will need to consider de-duplication to present them to users. Not sure if it's worth it, but maybe a map[string]struct{} or something like that could provide that out the box? What do you think?

I feel at this point it's best to leave that to the consumer to handle as there's no defined spec around these warnings and hopefully we shouldn't need to expand this further, but also if we do it should be easy to add a WarningsMap or something.

picatz · 2023-02-22T20:35:22Z

pkg/lockfile/dpkg-status.go

-				os.Stderr,
-				"warning: malformed DPKG status file. Found no package name in record. File: %s\n",
+			diag.Warn(fmt.Sprintf(
+				"warning: malformed DPKG status file. Found no package name in record. File: %s",


🤔 I don't think we need to the warning: prefix.

The type information alludes to that being lockfile.Diagnostics.Warnings at the moment. This seems like a display thing that an application would add to an unstructured output like STDERR.

As a library consumer, I would know it is a diagnostic warning, and I can choose the display/styling in a CLI or whatever. I would much rather want to see a warning string like:

found malformed DPKG status file %q

Totally agree, I just didn't change it because this way the output with this change is 1:1 with what it is currently - I know this sort of CLI output shouldn't be considered breaking to change, but I think it's nice to start with being as close to 1:1 as possible as a way of proving its stability.

G-Rath · 2023-02-25T20:26:30Z

btw I hadn't realised this but go 1.20 added support for multiple errors, which I'm wondering might be a suitable alternative to this - while it's a little more cumbersome because it'd mean having to more deeply check the error value, it would avoid having to change the signature making it arguably less breaking given that there's a couple of ways diagnostics could be implemented (as showcased in this comment) 🤔

picatz · 2023-02-27T19:06:30Z

While it's a little more cumbersome because it'd mean having to more deeply check the error value, it would avoid having to change the signature.

🤔 FWIW, I feel like diagnostics and errors are related, but subtlety different. Diagnostics can contain zero or more errors. The idea of a separate type to describe them feels more appropriate to me, at the cost of being slightly different from before.

spencerschrock · 2023-02-27T23:06:45Z

Just chiming in that being able to suppress the os.Stderr writes would be beneficial for Scorecard.

We currently call DoScan here when checking for vulns. When we scan public repos, we sometimes encounter malformed yarn.lock files that write hundred of lines to os.Stderr due to #102.

I don't think this PR is enough to change the DoScan behavior (and that seems outside the scope of this PR), but I'm excited for the work being put in here.

G-Rath · 2023-02-27T23:18:45Z

@spencerschrock thanks for that info - as I said on #102 and #142 that specific instance of malformed yarn.locks is very fixable and while I really would like to know how those locks are being generated I'm happy to land the fix if you're hitting them often.

It would probably be good if you could open a new issue with some further details and we can go from there.

G-Rath · 2023-03-02T22:47:54Z

tbh I wish I thought of this sooner, but probably the easiest short-term solution could actually be to have a global var like this:

var HandleParserWarning = func (warning string) {
	_, _ = fmt.Fprintf(os.Stderr, warning)
}

// parse-maven-lock.go
HandleParserWarning(fmt.Sprintf(
	"Failed to resolve version of %s: property \"%s\" could not be found",
	mld.GroupID+":"+mld.ArtifactID,
	results[1],
))

(it could also have a "maybeFilePath" argument too 🤔 )

Then tools like scorecard could assign a function to do whatever they want, and it should be pretty easy to move away from later (it should be easy to continue to call the function along with whatever replacement solution comes along, and arguably it could just not be called anymore so long the function itself still exists).

@oliverchang @another-rex given it sounds like scorecard in particular are having a bit of a hard time with how uncontrollable these outputs are, does that sound like an acceptable middleground for now to make their lives a bit easier?

oliverchang · 2023-03-03T04:02:16Z

tbh I wish I thought of this sooner, but probably the easiest short-term solution could actually be to have a global var like this:
var HandleParserWarning = func (warning string) {
	_, _ = fmt.Fprintf(os.Stderr, warning)
}

// parse-maven-lock.go
HandleParserWarning(fmt.Sprintf(
	"Failed to resolve version of %s: property \"%s\" could not be found",
	mld.GroupID+":"+mld.ArtifactID,
	results[1],
))
(it could also have a "maybeFilePath" argument too 🤔 )

Then tools like scorecard could assign a function to do whatever they want, and it should be pretty easy to move away from later (it should be easy to continue to call the function along with whatever replacement solution comes along, and arguably it could just not be called anymore so long the function itself still exists).

@oliverchang @another-rex given it sounds like scorecard in particular are having a bit of a hard time with how uncontrollable these outputs are, does that sound like an acceptable middleground for now to make their lives a bit easier?

@spencerschrock how much of a problem is this for Scorecard right now? Is this something that needs a more immediate fix, or something that can wait a bit for a larger, more substantial refactoring?

@G-Rath Something like HandleParserWarning does seem like a nice easy way to handle a lot of these problems, including the ones you brought up as the original motivation for this PR. Another nice way to generalize this might be to just expose a WarningLogWriter io.Writer instead.

It's also a nice way of reducing the variations of each parser that we want to support. That said, I'm a little wary of adding new interfaces if we're going to refactor all of this again later. If we go with this I think we would have to commit to this approach for achieving the goals in the PR for the forseeable future.

G-Rath · 2023-03-03T04:20:01Z

@oliverchang fwiw this is also impacting my ability to generally work on the scanner, because these warnings also get printed out by the test suite so I have to sift through a fair amount of output every time I make a change (which'll only get worse as more stuff is added) 🙃

While I agree with generally trying to avoid a lot of big refactors, from what we've been talking about offline and what you've said over on issues like #176 it seems like it there's a fair bit to unpack before you'll feel confident doing such a refactor.

oliverchang · 2023-03-03T04:22:49Z

@oliverchang fwiw this is also impacting my ability to generally work on the scanner, because these warnings also get printed out by the test suite so I have to sift through a fair amount of output every time I make a change (which'll only get worse as more stuff is added) 🙃

While I agree with generally trying to avoid a lot of big refactors, from what we've been talking about offline and what you've said over on issues like #176 it seems like it there's a fair bit to unpack before you'll feel confident doing such a refactor.

Thinking more, either way the approach you suggested sounds good to me, with the possible exception of exposing an io.Writer to write logs to instead of a function. I don't think we need to wait for a more substantial refactor to address this.

@another-rex do you have any thoughts?

another-rex · 2023-03-03T04:46:25Z

Doing this quick compromise sounds good to me, and both having io.Writer or setting a full function seem fine, though having full function makes it slightly easier to do additional error handling if required.

spencerschrock · 2023-03-03T05:42:59Z

@spencerschrock how much of a problem is this for Scorecard right now? Is this something that needs a more immediate fix, or something that can wait a bit for a larger, more substantial refactoring?

Minimal. The vast majority of the output was taken care of by #250 (thanks @G-Rath).

oliverchang · 2023-03-20T05:17:44Z

Doing this quick compromise sounds good to me, and both having io.Writer or setting a full function seem fine, though having full function makes it slightly easier to do additional error handling if required.

I think an io.Writer expresses the intention the best here -- as an abstraction dealing with a log output sink. We can still have custom io.Writer implementations if more involved error handling needs to be done.

G-Rath · 2023-03-21T21:28:08Z

hmm I agree with Rex at this point, because I'd expect using a writer means lockfile then has to decide how to format the messages whereas with a function you could pass something like an error or an interface with extra detail, and then the consuming library can decide what to do with things.

But I think I'll probably have a better sense once I see an implementation of either solution.

G-Rath · 2023-07-11T03:44:00Z

I'm going to close this for now since it requires rebasing + further discussion anyway, and the changes will still be retained by GitHub in case we want to dig up the implementation.

G-Rath mentioned this pull request Feb 5, 2023

feat: support passing io.Reader to lockfile parsers #189

Closed

picatz reviewed Feb 22, 2023

View reviewed changes

G-Rath mentioned this pull request Mar 2, 2023

feat: improved error message when pom dependency version not found #253

Merged

feat: support diagnostics in lockfile parsers

faa9d00

G-Rath closed this Jul 11, 2023

spencerschrock mentioned this pull request Aug 18, 2023

BUG: Vulnerabilities check "Failed to resolve version" ossf/scorecard#3074

Open

G-Rath deleted the use-context branch January 22, 2024 19:22

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: support diagnostics in `lockfile` parsers #186

feat: support diagnostics in `lockfile` parsers #186

G-Rath commented Feb 4, 2023 •

edited

Loading

G-Rath commented Feb 4, 2023

G-Rath commented Feb 5, 2023

picatz commented Feb 6, 2023

picatz commented Feb 15, 2023

picatz left a comment

picatz Feb 22, 2023

G-Rath Feb 22, 2023

picatz Feb 22, 2023

G-Rath Feb 22, 2023

picatz Feb 22, 2023

G-Rath Feb 22, 2023

G-Rath commented Feb 25, 2023 •

edited

Loading

picatz commented Feb 27, 2023

spencerschrock commented Feb 27, 2023

G-Rath commented Feb 27, 2023

G-Rath commented Mar 2, 2023

oliverchang commented Mar 3, 2023

G-Rath commented Mar 3, 2023

oliverchang commented Mar 3, 2023 •

edited

Loading

another-rex commented Mar 3, 2023

spencerschrock commented Mar 3, 2023

oliverchang commented Mar 20, 2023

G-Rath commented Mar 21, 2023

G-Rath commented Jul 11, 2023

feat: support diagnostics in lockfile parsers #186

feat: support diagnostics in lockfile parsers #186

Conversation

G-Rath commented Feb 4, 2023 • edited Loading

G-Rath commented Feb 4, 2023

G-Rath commented Feb 5, 2023

picatz commented Feb 6, 2023

picatz commented Feb 15, 2023

picatz left a comment

Choose a reason for hiding this comment

picatz Feb 22, 2023

Choose a reason for hiding this comment

G-Rath Feb 22, 2023

Choose a reason for hiding this comment

picatz Feb 22, 2023

Choose a reason for hiding this comment

G-Rath Feb 22, 2023

Choose a reason for hiding this comment

picatz Feb 22, 2023

Choose a reason for hiding this comment

G-Rath Feb 22, 2023

Choose a reason for hiding this comment

G-Rath commented Feb 25, 2023 • edited Loading

picatz commented Feb 27, 2023

spencerschrock commented Feb 27, 2023

G-Rath commented Feb 27, 2023

G-Rath commented Mar 2, 2023

oliverchang commented Mar 3, 2023

G-Rath commented Mar 3, 2023

oliverchang commented Mar 3, 2023 • edited Loading

another-rex commented Mar 3, 2023

spencerschrock commented Mar 3, 2023

oliverchang commented Mar 20, 2023

G-Rath commented Mar 21, 2023

G-Rath commented Jul 11, 2023

feat: support diagnostics in `lockfile` parsers #186

feat: support diagnostics in `lockfile` parsers #186

G-Rath commented Feb 4, 2023 •

edited

Loading

G-Rath commented Feb 25, 2023 •

edited

Loading

oliverchang commented Mar 3, 2023 •

edited

Loading