Prevent leading and trailing whitespace in string values, and whitespace-only string values #238

Open
wants to merge 1 commit into
base: 5.0.1
from

Conversation

david-waltermire
Collaborator

Added patterns to prevent string values from starting or ending with whitespace. These strings also cannot consist solely of whitespace.

Resolves #232

@david-waltermire david-waltermire linked an issue Jun 22, 2023 that may be closed by this pull request
@david-waltermire david-waltermire force-pushed the issue232-prevent-whitespace branch from e2641ea to ebc9ccf Compare June 22, 2023 19:58
@david-waltermire
Collaborator Author

On the 6/22/2023 QWG call we had a passionate conversation about allowing or preventing leading and trailing whitespace, as well as whitespace-only values.

The current JSON schemas allow whitespace-only values. There was strong consensus that whitespace-only values should not be allowed. This can be enforced with the pattern \\S, which requires at least one non-whitespace character in the value. There was also strong consensus on the call to implement this pattern in v5.0.1.

The current JSON schemas allow leading whitespace. There was fairly good consensus around not allowing leading whitespace going forward. This can be enforced with the pattern ^\\S, which requires the value to start with a non-whitespace character. No decision was made regarding addressing this in v5.0.1.

The current JSON schemas allow trailing whitespace. Consensus around trailing whitespace was difficult to reach, with valid arguments for and against. No decision was made regarding addressing this in v5.0.1.
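
To make the three cases above concrete, here is a minimal Python sketch of how the patterns behave. The first two patterns are the ones quoted above (written `\\S` and `^\\S` in JSON, i.e. `\S` and `^\S` once unescaped). The trailing-whitespace pattern is not given in this thread and is only an assumption for illustration, as are the names `check` and the three constants. JSON Schema `pattern` is an unanchored ECMA-262 regex, so `re.search` is used here; Python's `re` may differ from ECMA-262 in edge cases.

```python
import re

# Patterns quoted in the discussion, unescaped from their JSON string form.
NOT_ONLY_WHITESPACE = re.compile(r"\S")      # at least one non-whitespace character
NO_LEADING_WHITESPACE = re.compile(r"^\S")   # first character is non-whitespace

# Assumption: the thread does not give a trailing pattern. This is one option.
# \Z is used instead of $ because Python's $ also matches before a final newline.
NO_TRAILING_WHITESPACE = re.compile(r"\S\Z")


def check(value: str) -> dict:
    """Report which of the three constraints a string value satisfies."""
    return {
        "not_only_whitespace": bool(NOT_ONLY_WHITESPACE.search(value)),
        "no_leading_whitespace": bool(NO_LEADING_WHITESPACE.search(value)),
        "no_trailing_whitespace": bool(NO_TRAILING_WHITESPACE.search(value)),
    }


if __name__ == "__main__":
    for sample in ["abc", " abc", "abc\n", "   "]:
        print(repr(sample), check(sample))
```

Note that an empty string fails all three checks in this sketch, and that the leading and trailing patterns each imply the whitespace-only check, so a schema applying either of them would not need `\S` separately.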

Arguments for allowing trailing whitespace revolve around the impact to content producers when restricting it.

  • When hand creating content, some content creators insert a space by habit.
  • Some tools do not trim leading/trailing whitespace or add it automatically.
  • When copying tool output, some tools emit trailing whitespace (e.g. newlines), which might get copied along with it.

Arguments for disallowing trailing whitespace revolve around the impact to consumers having to deal with it.

  • A secondary parse is required to trim leading/trailing whitespace. This is bad for performance and use of the raw data.
  • There are many more consumers than producers. The burden should be on the fewest impacted, which are the producers.

A middle ground could be to have the CVE services strip leading and trailing whitespace when the record is published. This needs to be discussed with the AWG.
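
As a rough illustration of what such a publish-time normalization might look like, the sketch below walks a parsed JSON record and trims every string value. This is not how the CVE services behave today; the function name `strip_strings` and the sample record are hypothetical, and a real implementation would likely limit trimming to specific fields.

```python
from typing import Any


def strip_strings(node: Any) -> Any:
    """Return a copy of a parsed JSON value with every string value trimmed.

    Object keys are left untouched; only values are trimmed.
    """
    if isinstance(node, str):
        return node.strip()  # removes leading and trailing Unicode whitespace
    if isinstance(node, list):
        return [strip_strings(item) for item in node]
    if isinstance(node, dict):
        return {key: strip_strings(value) for key, value in node.items()}
    return node  # numbers, booleans, and null pass through unchanged


if __name__ == "__main__":
    # Hypothetical, heavily abbreviated record for illustration only.
    record = {
        "containers": {
            "cna": {
                "title": " Example title \n",
                "descriptions": [{"lang": "en", "value": "\tExample description. "}],
            }
        }
    }
    print(strip_strings(record))
```

Trimming turns a whitespace-only value into an empty string, so this approach would still need a separate rule for values that end up empty.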

Open Questions:

  • To what extent does leading/trailing whitespace appear in the CVE list? How small or large is the problem?
  • What is the behavior regarding whitespace in the CVE clients, e.g. Vulnogram and other CVE record generation tools?
  • How will a restrictive approach to leading/trailing whitespace affect CVE record production?
  • When would be a good time to roll out any of the options discussed above?

@jayjacobs
Collaborator

jayjacobs commented Jun 27, 2023

Re: To what extent does leading/trailing whitespace appear in the CVE list? How small or large is the problem?

"whitespace" here is space, tabs, newlines, form feeds, and any character in the Unicode Z Category (which includes a variety of space characters and other separators.)

containers.cna.descriptions.value:

  • 205259 (99.1%) have no leading or trailing whitespace
  • 1284 (0.6%) have trailing whitespace
  • 34 (0.02%) have leading whitespace
  • 527 (0.25%) have both leading and trailing whitespace.

containers.cna.title (23,287 records):

  • 22744 (97.7%) have no leading or trailing whitespace
  • 374 (1.6%) have trailing whitespace
  • 162 (0.7%) have leading whitespace
  • 7 (0.03%) have both leading and trailing whitespace.

containers.cna.affected.vendor (128,812 records):

  • 128467 (99.73%) have no leading or trailing whitespace
  • 292 (0.23%) have trailing whitespace
  • 49 (0.04%) have leading whitespace
  • 4 have both leading and trailing whitespace

containers.cna.references.url (814403 records):

  • all 814403 have no leading or trailing whitespace.

containers.cna.references.name (494,228 records):

  • only 3 have trailing whitespace, the others have no leading or trailing whitespace

containers.cna.references.refsource (80 records):

  • all 80 have no leading or trailing whitespace.

containers.cna.references.tags (1,278,641 records):

  • all 1278641 have no leading or trailing whitespace.

containers.cna.problemTypes.cweId (34861 records):

  • all 34861 have no leading or trailing whitespace.

containers.cna.problemTypes.description (78180 records):

  • 77383 (98.98%) have no leading or trailing whitespace
  • 666 (0.85%) have trailing whitespace
  • 50 (0.06%) have leading whitespace
  • 78 (0.10%) have both leading and trailing whitespace
  • 3 (<0.01%) have values with only whitespace

containers.cna.metrics.[*].vectorString (37534 records):

  • all 37534 have no leading or trailing whitespace.
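
For anyone who wants to reproduce tallies like the ones above, here is a minimal sketch of a classifier that follows the whitespace definition given at the top of this comment (space, tabs, newlines, form feeds, and the Unicode Z category). It is not the script that produced these numbers. The names `is_whitespace`, `classify`, and `tally` are hypothetical, and treating carriage return and vertical tab as whitespace is an assumption.

```python
import unicodedata
from collections import Counter

# Assumption: "\r" and "\v" are counted as whitespace alongside the listed characters.
ASCII_WHITESPACE = set(" \t\n\r\f\v")


def is_whitespace(ch: str) -> bool:
    """True for ASCII whitespace or any character in Unicode category Z (Zs, Zl, Zp)."""
    return ch in ASCII_WHITESPACE or unicodedata.category(ch).startswith("Z")


def classify(value: str) -> str:
    """Bucket a value as empty, only, both, leading, trailing, or none."""
    if not value:
        return "empty"
    if all(is_whitespace(ch) for ch in value):
        return "only"
    leading = is_whitespace(value[0])
    trailing = is_whitespace(value[-1])
    if leading and trailing:
        return "both"
    if leading:
        return "leading"
    if trailing:
        return "trailing"
    return "none"


def tally(values) -> Counter:
    """Count how many values fall into each bucket."""
    return Counter(classify(v) for v in values)


if __name__ == "__main__":
    samples = ["ok", " lead", "trail\n", " both ", "   ", "nbsp\u00a0"]
    print(tally(samples))
```

Feeding `tally` the values of one field (for example every `containers.cna.title` in the CVE list) would give counts per bucket, which can then be divided by the total to get percentages.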

@chandanbn
Collaborator

Given this can be a disruptive change, we will target it for 5.1.0.

Development

Successfully merging this pull request may close these issues.

Prevent descriptions from containing only whitespace
3 participants