Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Javascript: How to define an own type and mark its attributes and types #12524

Open
ttttmr opened this issue Mar 15, 2023 · 12 comments
Open

Javascript: How to define an own type and mark its attributes and types #12524

ttttmr opened this issue Mar 15, 2023 · 12 comments
Labels
JS question Further information is requested

Comments

@ttttmr
Copy link

ttttmr commented Mar 15, 2023

I want to define a Foo type in codeql, which has its own attributes and types, which is convenient for subsequent type analysis

For example, I want to find the intAdd function, the wrong call to pass the parameter is not a number

// some function will return "Foo"
function getFoo() {
  return {
    id: 123,
    name: "foo",
    data: { xxx: "xxx" },
  };
}
function createFoo() {
  return {
    id: 123,
    name: "foo",
    data: { xxx: "xxx" },
  };
}
function getFoos() {
  return [
    {
      id: 123,
      name: "foo",
      data: { xxx: "xxx" },
    },
    {
      id: 123,
      name: "foo",
      data: { xxx: "xxx" },
    },
  ];
}

// use the Foo
let a = {};
let f1 = getFoo();
let fs = getFoos();
intAdd(123); // good
intAdd(f1.id); // good
intAdd(f1.name); // bad
intAdd(f1.data); // bad
fs.forEach((f) =>{
    intAdd(f.name); // bad
}); 
@ttttmr ttttmr added the question Further information is requested label Mar 15, 2023
@smowton
Copy link
Contributor

smowton commented Mar 15, 2023

How about something like:

import javascript

class Config extends DataFlow::Configuration {

  Config() {
    this = "fdksjfds"
  }

  override predicate isSource(DataFlow::Node n) {
    n.asExpr() instanceof StringLiteral or n.asExpr() instanceof ObjectExpr
  }

  override predicate isSink(DataFlow::Node n) {
    n.asExpr() = any(CallExpr ce | ce.getCalleeName() = "intAdd").getAnArgument()
  }

}

from Config c, DataFlow::PathNode source, DataFlow::PathNode sink
where c.hasFlowPath(source, sink)
select source.getNode(), sink.getNode()

This identifies the three bad cases you've shown in the example; is it useful to your more general problem?

@ttttmr
Copy link
Author

ttttmr commented Mar 16, 2023

The data of the Foo object in the example is static, but the actual situation is obtained from various interfaces or even network APIs
I tried asExpr() or analyze().getAType(), but the data type cannot be determined for these data, and there will be many false positives

@ttttmr
Copy link
Author

ttttmr commented Mar 16, 2023

Specifically, some json api will contain specific data structures, and I know their properties and types from the interface definition document

At the same time, in the js code, there is no explicit conversion of the returned json into a class object, just very primitive processing

I want to mark these specific fetch api calls so that codeql can recognize the data type inside

@smowton
Copy link
Contributor

smowton commented Mar 16, 2023

analyze().getAType() is indeed not strong enough for the example you give. Does using a dataflow configuration as suggested work for you?

You could identify these particular APIs as your sources instead of the ObjectExpr, StringLiteral or other known-bad-type entities, but you would still need to track data-flow in order to find out where objects of the type you're interested in may get to.

@ttttmr
Copy link
Author

ttttmr commented Mar 16, 2023

dataflow can only solve the data flow from the object to the target function, but cannot solve the problem, too many false positives

For example, there are user and Msg objects

User {
   id int
   name string
}
Msg {
     id int
     from User
}

Then there is some json api

/get User{}
/list []User{}
/msg Msg
const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id)
func(user.name)
users.forEach(func(user) {
    func(user.id)
    func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id)
func(msg.from.name)

If you use data flow analysis, all func will be matched, but I only want to find that the calling parameter is User.id
In fact, User objects may be nested in more complex api results, and analysis will be very difficult
It should be said that it is data flow analysis with custom data structure

@smowton
Copy link
Contributor

smowton commented Mar 16, 2023

In your example using fetchapi, what do you want to flag? All the func calls whose arguments are not .id fields that you expect to be ints?

@ttttmr
Copy link
Author

ttttmr commented Mar 16, 2023

I just want to find the func whose parameter is User.id

const user = fetchapi("/get")
const users = fetchapi("/list")
const msg = fetchapi("/msg")
func(user.id) // match
func(user.name)
users.forEach(func(user) {
    func(user.id) // match
    func(user.name)
})
func(msg.id)
func(msg.from)
func(msg.from.id) // match
func(msg.from.name)

@smowton
Copy link
Contributor

smowton commented Mar 16, 2023

Deliberate non-match on msg.id?

@intrigus-lgtm
Copy link
Contributor

@smowton If I understood it currently, @ttttmr only wants to find access to id in "type" User and msg.id while being an id, is defined in "type" Msg.

I didn't invest much time thinking about the problem, but couldn't this potentially be solved using type-tracking?

@ttttmr
Copy link
Author

ttttmr commented Mar 17, 2023

@intrigus-lgtm yes you understand right

type-tracking can indeed track the accurate User object, but it does not perfectly solve my problem

When an interface supports parameters int/int array, the corresponding return value will also become User/User array

For example, the interface is like this

user = apicall(uid)
users = apicall([uid1,uid2])

To keep track of User objects, the ql could be something like

// `this` is the apicall (CallNode)
getUser(){
    // There are also some other methods of checking types that can be used here
    if this.getArgument(0).analyze().getAType() == TTNumber()
        result = this
    else
        // Get every user in users
        result = getAnEnumeratedArrayElement(this)
}

But this type check is useless, because in many cases codeql does not know the data type of the parameter

For example, pass in the previous User.Id again

users = apicall([uid1,uid2])
users.forEach(e => {
    newuser = apicall(e.Id) // codeql cannot check the type of e.Id
})

But i know e.Id is number, not array, how can i tell codeql this

@asgerf
Copy link
Contributor

asgerf commented Mar 17, 2023

I'd suggest something like this

class TypeMismatchConfiguration extends DataFlow::Configuration {
  TypeMismatchConfiguration() { this = "TypeMismatchConfiguration" }

  override predicate isSource(DataFlow::Node node) { node instanceof MyApiCall }

  override predicate isSink(DataFlow::Node node) { node = any(MyApiCall c).getAnArgument() }

  override predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
    // Step through arrays and promises
    TaintTracking::arrayStep(pred, succ)
    or
    TaintTracking::promiseStep(pred, succ)
    or
    // Step through properties that are not expected to contain numbers
    exists(DataFlow::PropRead read |
      pred = read.getBase() and
      succ = read and
      not read.getPropertyName() = "id"
    )
  }
}

This uses a data-flow configuration (not taint-tracking) but then opts into a few taint steps that are handy for this purpose. We simply step through arrays regardless of whether the source returned an array or not.

It is best to try and design the data-flow problem in a way that it needs as little type information as possible in order to do the right thing.

@ttttmr
Copy link
Author

ttttmr commented Mar 22, 2023

I think I need to enhance on top of type-tracking, not only the property of a specific type of object, but also the type of the property

@sidshank sidshank added the JS label Oct 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
JS question Further information is requested
Projects
None yet
Development

No branches or pull requests

5 participants