Variant analysis of the ‘Sequoia’ bug
I imagine we’ve all heard about the recent “Sequoia” bug discovered by the Qualys Research Team, identified as CVE-2021-33909. It’s a fascinating bug caused by a size_t to int conversion. According to the analysis, seq_dentry passes its size_t size to the dentry_path function, which expects a signed integer, so the value is implicitly converted from size_t to int. Assuming the architecture is 32 bits, a size_t can hold values from 0 to 4294967295 since it is unsigned, but an int can only hold values from -2147483648 to 2147483647 because it is signed (meaning it can also be negative). A size larger than 2147483647 therefore arrives in dentry_path as a negative buflen, which results in an out-of-bounds access during the pointer arithmetic that’s done there with p = buf + buflen;.
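To make the conversion concrete, here is a small user-space sketch of it. This is not the kernel code: the helper names to_signed_len and end_offset are made up for illustration, and the result of converting an out-of-range value to int is implementation-defined in C (GCC and Clang wrap it modulo 2^32, which is the behaviour assumed here).

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for handing a size_t to an int parameter, the way
 * seq_dentry hands its size to dentry_path. */
int to_signed_len(size_t size)
{
    /* Implementation-defined when size > INT_MAX; GCC and Clang wrap. */
    return (int)size;
}

/* Models the offset that `p = buf + buflen;` would apply. It returns the
 * offset as a number instead of forming the pointer, because building an
 * out-of-bounds pointer would be undefined behaviour in this demo. */
long long end_offset(size_t size)
{
    return (long long)to_signed_len(size);
}
```

With these helpers, a size of 4096 stays 4096, while a size of INT_MAX + 1 comes out negative on the compilers mentioned above, so the computed end pointer would land before the start of the buffer rather than at its end.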
This specific bug is interesting because the circumstances it appears in are quite common in the Linux kernel. To examine this further I decided to employ CodeQL. So, precisely what is CodeQL? CodeQL is a query language from Semmle/GitHub/Microsoft that drives a semantic analysis engine for static code analysis. In CodeQL, code is treated as data. Security vulnerabilities, bugs, and other issues are modelled as queries that are executed against databases extracted from the code. Queries that find potential vulnerabilities report their results at the matching locations in the source files. As a result, it is a tremendously powerful tool for variant analysis.
Alright, let’s look at the CodeQL query I wrote.
/**
 * @author Jordy Zomer
 * @name unsigned to signed used in pointer arithmetic
 * @description finds unsigned to signed conversions used in pointer arithmetic, potentially causing an out-of-bound access
 * @id cpp/sign-conversion-pointer-arithmetic
 * @kind problem
 * @problem.severity warning
 * @tags reliability
 *       security
 *       external/cwe/cwe-787
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.security.Overflow

from FunctionCall call, Function f, Parameter p, DataFlow::Node sink, PointerArithmeticOperation pao
where
  f = call.getTarget() and
  p = f.getAParameter() and
  p.getUnspecifiedType().(IntegralType).isSigned() and
  call.getArgument(p.getIndex()).getUnspecifiedType().(IntegralType).isUnsigned() and
  // Here we check if the argument is an operand in an expression that does pointer arithmetics
  pao.getAnOperand() = sink.asExpr() and
  DataFlow::localFlow(DataFlow::parameterNode(p), sink)
select call, "This call: $@ passes an unsigned int to a function that requires a signed int: $@. And then used in pointer arithmetic: $@", call, call.toString(), f, f.toString(), sink, sink.toString()
So what we do here is obtain a FunctionCall to a Function that has a parameter of a signed integer type. Following that, we keep only the calls that pass an unsigned value for that parameter, despite the fact that the function expects a signed integer. After that, we use the DataFlow library to track, via local data flow inside the called function, whether this parameter ends up as an operand in pointer arithmetic. Running this query on the Linux kernel database successfully identifies the Sequoia vulnerability as well as hundreds of additional instances that may be vulnerable.
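As a rough illustration, a reduced C example of the pattern this query is written to match could look like the following. The functions tail_of and caller are invented for this sketch and do not exist in the kernel; I would expect the query to report the call inside caller, but I have not run it against this exact snippet.

```c
#include <stddef.h>

/* Callee: takes a signed length and uses it in pointer arithmetic.
 * The use of `buflen` in `buf + buflen` is the sink. */
char *tail_of(char *buf, int buflen)
{
    return buf + buflen;
}

/* Caller: passes an unsigned size_t where an int is expected. The
 * implicit size_t-to-int conversion at this call is what gets reported. */
char *caller(char *buf, size_t size)
{
    return tail_of(buf, size);
}
```

For small sizes the two functions behave identically; the problem only shows up once size no longer fits in an int.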
Because there are so many results, I decided to refine the query slightly, so I added three filters to narrow down the criteria.
- Establish whether there is a size check where the source is more than something
- Determine whether the sink is smaller than something
- Identify whether the source is a constant.
I configured it such that it only displayed results if none of these filters matched. Below you will find the updated query:
/**
 * @author Jordy Zomer
 * @name unsigned to signed used in pointer arithmetic
 * @description finds unsigned to signed conversions used in pointer arithmetic, potentially causing an out-of-bound access
 * @id cpp/sign-conversion-pointer-arithmetic
 * @kind problem
 * @problem.severity warning
 * @tags reliability
 *       security
 *       external/cwe/cwe-787
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.security.Overflow

from FunctionCall call, Function f, Parameter p, DataFlow::Node sink, PointerArithmeticOperation pao
where
  f = call.getTarget() and
  p = f.getAParameter() and
  p.getUnspecifiedType().(IntegralType).isSigned() and
  call.getArgument(p.getIndex()).getUnspecifiedType().(IntegralType).isUnsigned() and
  pao.getAnOperand() = sink.asExpr() and
  // determine whether there is not a check where the `Sink` < "something"
  not exists(Operation a | guardedLesser(a, sink.asExpr())) and
  // establish whether there is not a size check where the `Source` > "something"
  not exists(Operation b | guardedGreater(b, call.getArgument(p.getIndex()))) and
  // identify whether the `Source` is not constant
  not call.getArgument(p.getIndex()).isConstant() and
  DataFlow::localFlow(DataFlow::parameterNode(p), sink)
select call, "This call: $@ passes an unsigned int to a function that requires a signed int: $@. And then used in pointer arithmetic: $@", call, call.toString(), f, f.toString(), sink, sink.toString()
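To give a feel for what the three filters are aiming at, here is a sketch of the code shapes they describe, going by the comments in the query. All function names are hypothetical, and whether guardedLesser and guardedGreater from the Overflow library recognise each shape exactly as written depends on the CodeQL version, so treat the labels in the comments as intent rather than verified query output.

```c
#include <stddef.h>

/* Sink compared against a limit before the pointer arithmetic
 * (the "sink is smaller than something" shape). */
char *end_of_bounded(char *buf, int len, int cap)
{
    if (len < cap)
        return buf + len;
    return NULL;
}

/* Unguarded callee shared by the callers below. */
char *end_of(char *buf, int len)
{
    return buf + len;
}

/* Unsigned argument handed to the bounded callee above. */
char *sink_checked(char *buf, size_t size)
{
    return end_of_bounded(buf, size, 16);
}

/* Source checked to be greater than something before the call. */
char *source_checked(char *buf, size_t size)
{
    if (size > 0)
        return end_of(buf, size);
    return buf;
}

/* Source is a constant, so its value is known at the call site. */
char *source_constant(char *buf)
{
    return end_of(buf, 8u);
}

/* None of the three shapes applies: this is the kind of call the
 * refined query is meant to keep reporting. */
char *unfiltered(char *buf, size_t size)
{
    return end_of(buf, size);
}
```

Note that a check of this kind only makes a finding less interesting for triage; it does not by itself prove that the converted value fits in an int.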
Going through the results yielded the following issues and associated patches:
- https://lkml.org/lkml/2021/7/26/434
- https://lkml.org/lkml/2021/7/26/480
- https://lkml.org/lkml/2021/7/26/481
- https://lkml.org/lkml/2021/7/27/394
- https://lkml.org/lkml/2021/7/27/360
- https://lkml.org/lkml/2021/7/31/121
- https://lkml.org/lkml/2021/7/31/126
- https://lkml.org/lkml/2021/7/31/144
Due to the large number of results, we didn’t check whether every finding was truly vulnerable; we simply wanted the code to be obviously secure. Furthermore, this is a work in progress, so expect additional patches soon. If you wish to help fix these findings, please feel free to reach out to me at jordy [at] pwning.systems and I’ll provide you with the results.
Because of the nature of this query, it may be a good idea for the GitHub Security Lab team to use it on LGTM, as this type of bug may occur in any C application. CodeQL’s potential as a static analysis tool is obvious. I sincerely hope that it will be used in other research and projects.
I’d like to thank Greg and the other developers that contributed for their fantastic collaboration and insights. It was a huge amount of fun!
Cheers!