Why your code is a graph



Graph structures and how they are used in security code analysis

Graph structures are a natural representation of many types of data. They are a good way to represent relationships between objects, such as the connections between users of a social media site or the distances between locations.

Today, let’s explore graphs and their use in security code analysis. We’ll then see how to use graphs to analyze code for vulnerabilities.

A brief introduction to graphs

You have probably already heard of graph structures and graph databases. In mathematics, a graph is a set of nodes and the edges that connect those nodes, like this graph here:

The nodes represent objects, and the edges that connect them represent their relationships. For example, the relationships between users on a social media site can be represented as a graph like this:

Edges can be directed, and each edge can carry a value, such as a label or a number, that describes the nature of the relationship. Using graph structures, we can represent complex relationships and interactions between different objects. For example, we can also represent which users are blocking other users on a website.

This is how graph databases work. Once objects and their relationships are represented in a graph, you can answer questions like these by querying the graph and traversing the relevant edges: How many users are blocking user C? How many friends does user B have?
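To make the idea of querying a graph concrete, here is a minimal Python sketch of a directed graph with labeled edges. The `LabeledGraph` class, the users A to D, and the specific friend and block relationships are all assumptions made up for illustration; a real graph database offers a far richer query language.

```python
from collections import defaultdict


class LabeledGraph:
    """Directed graph whose edges carry a label such as 'friend' or 'blocks'."""

    def __init__(self):
        # label -> source node -> set of target nodes
        self._out = defaultdict(lambda: defaultdict(set))
        # label -> target node -> set of source nodes
        self._in = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src, label, dst):
        self._out[label][src].add(dst)
        self._in[label][dst].add(src)

    def successors(self, node, label):
        """Nodes reached by following `label` edges out of `node`."""
        return set(self._out[label][node])

    def predecessors(self, node, label):
        """Nodes that have a `label` edge pointing at `node`."""
        return set(self._in[label][node])


graph = LabeledGraph()
# Friendship is mutual, so it is stored as two directed edges (hypothetical data).
for a, b in [("A", "B"), ("B", "C"), ("B", "D")]:
    graph.add_edge(a, "friend", b)
    graph.add_edge(b, "friend", a)
# Blocking only goes one way (hypothetical data).
graph.add_edge("A", "blocks", "C")
graph.add_edge("D", "blocks", "C")

friends_of_b = graph.successors("B", "friend")
blockers_of_c = graph.predecessors("C", "blocks")
print(len(friends_of_b), "friends of B:", sorted(friends_of_b))
print(len(blockers_of_c), "users block C:", sorted(blockers_of_c))
```

Both questions become a one-step traversal: follow outgoing "friend" edges from B, or incoming "blocks" edges into C.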

Using graphs to represent code

Relationships in code can also be represented as graphs. For example, Abstract Syntax Trees (ASTs) are graphs that represent the syntactic structure of program code.

Here is an abstract syntax tree for the code of Euclid’s algorithm, taken from https://en.wikipedia.org/wiki/Abstract_syntax_tree.

This pseudocode represents the logical flow of Euclid’s algorithm.

while b ≠ 0
    if a > b
        a := a − b
    else
        b := b − a
return a

You can see that the AST translates the program code into a graph structure while preserving the individual elements of its syntax.
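If you want to inspect an AST yourself, Python's standard `ast` module can build one. The sketch below assumes a Python rendering of the subtraction-based Euclid's algorithm (the function name `gcd` and the helper `node_types` are illustrative choices) and prints the tree as an indented outline.

```python
import ast

# Subtraction-based Euclid's algorithm, written as Python for illustration.
SOURCE = """
def gcd(a, b):
    while b != 0:
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
"""

tree = ast.parse(SOURCE)


def node_types(node, depth=0):
    """Yield (depth, node type name) for every AST node, parents before children."""
    yield depth, type(node).__name__
    for child in ast.iter_child_nodes(node):
        yield from node_types(child, depth + 1)


outline = list(node_types(tree))
for depth, name in outline:
    print("  " * depth + name)
```

The outline shows a `While` node containing an `If` node, which in turn contains the two `Assign` nodes: the nesting of the source code becomes parent and child edges in the tree.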

There are other graph representations of code as well. Control Flow Graphs (CFGs) represent the order in which code is executed and the conditions that must be met for a given piece of code to execute. In a CFG, each node represents an instruction (or block of instructions) of the program, while the edges represent the flow of control between them. With a CFG, you can walk the graph to determine which instructions can be executed at different points during execution. Here are two examples of CFGs that represent (a) an if-then-else structure and (b) a while loop.

Retrieved from JMP EAX – Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=34222288
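Walking a CFG can be sketched with a plain adjacency list. The graph below is a hand-written, hypothetical CFG for a loop of the form "while b != 0: if a > b: ... else: ...; return a"; the block names and the `reachable` helper are assumptions for illustration, not the output of any tool.

```python
# Each key is a block of instructions; each value lists the blocks
# that control can flow to next (hypothetical, hand-built CFG).
cfg = {
    "entry":        ["while b != 0"],
    "while b != 0": ["if a > b", "return a"],    # condition true / false
    "if a > b":     ["a = a - b", "b = b - a"],  # then branch / else branch
    "a = a - b":    ["while b != 0"],            # back edge to the loop header
    "b = b - a":    ["while b != 0"],            # back edge to the loop header
    "return a":     [],                          # exit block, no successors
}


def reachable(cfg, start):
    """Return every block that can execute after `start` (iterative depth-first walk)."""
    seen, stack = set(), list(cfg[start])
    while stack:
        block = stack.pop()
        if block not in seen:
            seen.add(block)
            stack.extend(cfg[block])
    return seen


print(sorted(reachable(cfg, "a = a - b")))
print(sorted(reachable(cfg, "return a")))
```

Because of the back edge, every block in the loop body can run again after `a = a - b`, while nothing runs after `return a`.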

Another useful graph representation of code is the Program Dependency Graph (PDG). PDGs represent the data and control dependencies of a piece of code. In a PDG, data dependency edges capture how a piece of data influences other nodes, and control dependency edges capture how predicates influence other nodes. In this context, predicates are logical expressions that evaluate to true or false and are used to direct the execution path. Take a look at this snippet of code:

void example(){
    int x = 1;
    int y = 2;
    if (x > 1){
        int z = x + y;
    }
}

It can be represented by the following PDG. In this graph, D(variable) marks a data dependency on that variable and C(predicate) marks a control dependency on that predicate. You can see that the declaration of the integer z depends on whether x > 1 is true, and also on the values of x and y.

Using PDGs, we can see how data flows through code and whether a particular piece of data can reach a specific instruction.
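As a rough sketch, the dependencies described for the example() snippet can be written down as a list of labeled edges and queried in Python. The edge list is my reading of the description above (including the assumption that the predicate x > 1 is itself data-dependent on x), and the helper names are hypothetical.

```python
# (source, target, kind, label): kind "D" is a data dependency on a variable,
# kind "C" is a control dependency on a predicate. Hand-built for illustration.
pdg_edges = [
    ("int x = 1", "if (x > 1)", "D", "x"),
    ("int x = 1", "int z = x + y", "D", "x"),
    ("int y = 2", "int z = x + y", "D", "y"),
    ("if (x > 1)", "int z = x + y", "C", "x > 1"),
]


def dependencies(node):
    """Direct dependencies of `node` as (source, kind, label) tuples."""
    return [(src, kind, label) for src, dst, kind, label in pdg_edges if dst == node]


def influenced_by(node):
    """Every node that `node` influences, directly or transitively."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for src, dst, _kind, _label in pdg_edges:
            if src == current and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


for src, kind, label in dependencies("int z = x + y"):
    print(f"{kind}({label}) from {src}")
print(sorted(influenced_by("int x = 1")))
```

Asking what a node influences is the same question a security tool asks when it checks whether a piece of data can reach a specific instruction.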

As you can see, different graph representations of code are optimized for different purposes. Many security analysis tools use one of these graphs, or a combination of them, to map relationships in code. For example, PDGs can help us determine how attacker-controlled data can travel to security-sensitive functions, and whether that data passes through validation or transformation along the way.

But the real power of representing code as graphs appears when you combine the different representations. The AST, CFG, and PDG all represent the same code from different points of view, and some properties of the program are easier to express in one representation than in the others. Combining these graphs allows you to build a master graph that merges the different perspectives and takes them all into account. This is what the Code Property Graph (CPG) is. Using the CPG, we can express complex behaviors of a program in a single graph.

Illustration of a code property graph from the original article “Modeling and Discovering Vulnerabilities with Code Property Graphs”, where an abstract syntax tree, control flow graph, and program dependency graph are merged to get a representation to query the code [1].
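The merging idea can be sketched by storing all three kinds of edges over one shared set of nodes. This toy `PropertyGraph` class and its edge kinds ("AST", "CFG", "PDG_DATA", "PDG_CONTROL") are assumptions for illustration only; they do not reflect the schema used by the paper or by any real tool.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy merged graph: one node set, several edge kinds."""

    def __init__(self):
        self._edges = defaultdict(list)  # (source, kind) -> [targets]

    def add(self, kind, src, dst):
        self._edges[(src, kind)].append(dst)

    def out(self, node, kind):
        """Targets of `kind` edges leaving `node`."""
        return list(self._edges.get((node, kind), []))


merged = PropertyGraph()
# Syntax: the if statement contains its predicate and its body.
merged.add("AST", "if", "x > 1")
merged.add("AST", "if", "int z = x + y")
# Control flow: the order in which the statements execute.
merged.add("CFG", "int x = 1", "int y = 2")
merged.add("CFG", "int y = 2", "x > 1")
merged.add("CFG", "x > 1", "int z = x + y")
# Dependencies: which statements use x and y, and what the predicate guards.
merged.add("PDG_DATA", "int x = 1", "x > 1")
merged.add("PDG_DATA", "int x = 1", "int z = x + y")
merged.add("PDG_DATA", "int y = 2", "int z = x + y")
merged.add("PDG_CONTROL", "x > 1", "int z = x + y")

# One query that mixes perspectives: nodes that sit syntactically inside
# the if statement AND depend on the value assigned to x.
inside_if = set(merged.out("if", "AST"))
uses_x = set(merged.out("int x = 1", "PDG_DATA"))
print(sorted(inside_if & uses_x))
```

The final query needs both syntax and data dependency information, which is awkward to answer from any single representation but straightforward once the edges live in one graph.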

Using graphs to automate security code analysis

Once we have a graph representing the behavior of the program, patterns in the code can often be expressed as patterns in the graph. This means that we can search the source code for patterns by querying the graph.

Vulnerabilities often share common patterns in code. For example, the signature of an XXE vulnerability is passing user-supplied XML to a parser without disabling DTDs or external entities. (Learn more about XXEs here.) Using a CPG, we can search for these exact patterns in code. This makes the CPG an intuitive yet powerful vulnerability discovery engine, because finding each vulnerability is just a graph traversal.

Let’s take a look at some examples of querying the CPG for patterns that indicate vulnerabilities. Joern (https://joern.io/) is an open source tool that uses the CPG to perform source code analysis. Let’s walk through a few Joern queries to see how the CPG can simplify the logical process of finding vulnerabilities in code.

For example, detecting a command injection vulnerability in a Java application with Joern looks like this. In these Joern shell queries, we first ask the graph for any identifiers whose type name contains “HttpServletRequest” to find the user-controlled HTTP parameters in the program. We then find all the calls to the exec or eval methods, the functions in Java that execute system commands. Finally, we ask whether this user input can reach the dangerous functions as an input argument. Together, these three queries identify whether user input is passed to functions that execute arbitrary system commands.

> def source = cpg.identifier.typeFullName(".*HttpServletRequest.*")
> def sink = cpg.call("exec|eval").argument
> sink.reachableBy(source)
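Conceptually, a source-to-sink check like this is a reachability question over data flow edges. The Python sketch below illustrates only that idea on a tiny hand-made data flow graph; the node names and the `reachable_by` helper are hypothetical, and this is not how Joern implements its data flow analysis.

```python
from collections import deque

# Hypothetical data flow edges: value at key flows into each listed node.
data_flow = {
    "request.getParameter('host')": ["host"],
    "host": ["'ping ' + host"],
    "'ping ' + host": ["exec argument (call 1)"],
    "'ls'": ["exec argument (call 2)"],  # constant string, not user-controlled
}


def reachable_by(sinks, sources, flows):
    """Return the sinks that at least one source can reach via data flow edges."""
    reached, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in flows.get(node, []):
            if nxt not in reached:
                reached.add(nxt)
                queue.append(nxt)
    return [sink for sink in sinks if sink in reached]


sources = ["request.getParameter('host')"]
sinks = ["exec argument (call 1)", "exec argument (call 2)"]
tainted_sinks = reachable_by(sinks, sources, data_flow)
print(tainted_sinks)
```

Only the first exec call is reported, because the second one receives a constant string that no user-controlled source can influence.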

Next, let’s look for patterns that indicate a reflected XSS. The first query here looks for user-supplied URL parameters. The second query finds where the program prints data into HTTP responses. Finally, the last query checks whether any of the URL parameters are passed to the print statements. Together, these three queries identify whether a URL input parameter is reflected in an HTTP response, the hallmark of reflected XSS vulnerabilities.

> def source = cpg.call.name("getParameter")
> def sink = cpg.call(".*print.*").where(x => x.reachableBy(cpg.identifier.typeFullName(".*HttpServletRequest.*"))).argument
> sink.reachableBy(source)

This is the basis of the ShiftLeft code analysis platform. We have built a tool that translates code written in different languages into a CPG. Then, using the CPG, we scan your codebase to precisely find vulnerabilities that are reachable by attackers, minimizing false positives and reducing scan time. To learn more about the background, invention, and technology of the CPG, read Dr. Fabian Yamaguchi’s post here: https://blog.shiftleft.io/semantic-code-property-graphs-and-security-profiles-b3b5933517c1.

Incorporating frequent security scans into today’s fast-paced software development processes is not impossible. Using CPGs to analyze code means better speed, accuracy, and completeness of testing. If you want to learn more about ShiftLeft’s security platform, ShiftLeft CORE, visit us here.

Thanks for reading! What’s the hardest part of developing secure software for you? I’d love to know. Feel free to connect on Twitter @vickieli7.

References

[1] Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck. Modeling and Discovering Vulnerabilities with Code Property Graphs. Proc. of the 35th IEEE Symposium on Security and Privacy (S&P), 2014.

*** This is a Security Bloggers Network syndicated blog from ShiftLeft Blog – Medium, written by Vickie Li. Read the original post at: https://blog.shiftleft.io/why-your-code-is-a-graph-f7b980eab740

