Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java: Fix incorrect CSV models; add validation predicate #7034

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Marcono1234
Copy link
Contributor

@Marcono1234 Marcono1234 commented Nov 2, 2021

Adds a proof of concept query predicate for validating the data of CSV models, detecting incorrect or improvable CSV rows.
For example:

  • non-existent type or member (found the incorrect java.net;URI;false;toASCIIString row)
  • types with subtypes not marked to check subtypes
  • final or static members marked to check for subtypes

I have marked this pull request as draft for now to get some feedback; maybe there is already an effort to add such validation (or it already exists elsewhere?).

Usage (VS Code extension):

  1. Load the CodeQL database of the project whose CSV models should be checked, e.g. Guava or OpenJDK
  2. Select CsvDataValidation::checkRowData and run "CodeQL: Quick Evaluation"

In the following unchecked checkboxes represent not yet implemented checks.

Reported errors:

  • duplicate rows (with different signatures, or different subtypes value)
  • unparseable row (only basic check; was most likely already covered)
  • subtypes=true for Annotated
  • non-empty name or signature for Annotated
  • empty name for non-Annotated
  • row referring to non-existent type or member
  • Annotated for non-annotation type
  • incorrect taint and flow inputs and outputs e.g. index out of range
  • inputs and outputs mismatch with specified signature, e.g. specifying signature () but using Argument[0]
  • using Argument[-1] as input or output when signature only matches static methods
  • using ReturnValue as output for constructor (should use Argument[-1] instead?) or method without return value

Reported warnings:

  • subtypes=true for non-subclassable type
  • subtypes=true for non-overridable member
  • subtypes=false for interface
  • subtypes=false for method which has an override
  • subtypes=false for type which has publicly visible subtype
  • row for non-publicly visible type
  • row for non-publicly visible member
  • row for overriding method when row of same kind for overridden method with subtypes=true already exists
  • Missing modelled callables
    • row exists for callable with specific signature, but no row exists for other publicly visible callable with same name declared in same type or subtype (have to match input and output columns reasonably; for Argument[>=0] make sure parameter types are non-primitive or wrapper (since they are usually not modelled))
    • if taint models for type variable of generic type exist (e.g. for Collection<E>), then should consider all publicly visible 'factory methods' which have a type variable, use that for a parameter (directly or in parameterized type, array, ...) and return (subtype of) generic type mentioned above (e.g. Collections.singletonList(E) -> List<E>)
    • if taint models for return type of method of generic type exists and return type (directly or in parameterized type, array, ...) includes type variable, then should model other publicly visible method declared in same type or subtype which also have type variable as result (for subtypes this has to consider the resolved type variable, e.g. MyList<X> implements List<X> should track methods using X)

Reported notes:

  • subtypes=true for type without public subtypes
  • subtypes=false for abstract type which can be overridden
  • row for overriding method instead of modelling overridden method

@github-actions github-actions bot added the Java label Nov 2, 2021
@Marcono1234 Marcono1234 force-pushed the marcono1234/csv-model-validation branch 2 times, most recently from de04b0c to 7530ea3 Compare November 2, 2021 16:41
@Marcono1234 Marcono1234 force-pushed the marcono1234/csv-model-validation branch from 7530ea3 to 141701d Compare November 3, 2021 15:31
@github-actions
Copy link
Contributor

github-actions bot commented Nov 3, 2021

⚠️ The head of this PR and the base branch were compared for differences in the framework coverage reports. The generated reports are available in the artifacts of this workflow run. The differences will be picked up by the nightly job after the PR gets merged.

Click to show differences in coverage

java

Generated file changes for java

  • Changes to framework-coverage-java.rst:
-    `Apache HttpComponents <https://hc.apache.org/>`_,"``org.apache.hc.core5.*``, ``org.apache.http``",5,136,28,,,3,,,,25
+    `Apache HttpComponents <https://hc.apache.org/>`_,"``org.apache.hc.core5.*``, ``org.apache.http``",5,135,28,,,3,,,,25
-    `Spring <https://spring.io/>`_,``org.springframework.*``,29,469,91,,,,19,14,,29
+    `Spring <https://spring.io/>`_,``org.springframework.*``,29,469,90,,,,18,14,,29
-    Totals,,175,5341,408,13,6,10,107,33,1,66
+    Totals,,175,5340,407,13,6,10,106,33,1,66
  • Changes to framework-coverage-java.csv:
- org.apache.http,27,3,70,,,,,,,,,,,25,,,,,,,2,,3,62,8
+ org.apache.http,27,3,69,,,,,,,,,,,25,,,,,,,2,,3,61,8
- org.springframework.jdbc.core,10,,,,,,,,,,,,,,,10,,,,,,,,,
+ org.springframework.jdbc.core,9,,,,,,,,,,,,,,,9,,,,,,,,,

@smowton
Copy link
Contributor

smowton commented Nov 9, 2021

Some of the no-such-member stuff might duplicate https://github.com/github/codeql/blob/main/java/ql/src/utils/flowtestcasegenerator/GenerateFlowTestCase.qll#L20 -- worth looking at whether those can be rationalised? Otherwise looks useful, thanks!

@smowton
Copy link
Contributor

smowton commented Nov 9, 2021

The change in the interface of some predicates will also affect the test-case generator -- would appreciate it if you could try running https://github.com/github/codeql/blob/main/java/ql/src/utils/flowtestcasegenerator/GenerateFlowTestCase.py to fix this if necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants