oss-sec mailing list archives

CVE-2023-1281, CVE-2023-1829: Linux kernel: Vulnerabilities in the tcindex classifier

From: valis <sec@valis.email>
Date: Wed, 12 Apr 2023 01:03:04 +0200
Hi,

I have recently discovered two security issues in the tcindex
classifier (part of the network QoS subsystem of the Linux kernel):

CVE-2023-1281 Race condition leading to an use-after-free when
updating imperfect hash filters
CVE-2023-1829 Use-after-free when deleting a perfect hash filter

# Impact / mitigation:

Both of these vulnerabilities can be used for local privilege escalation.

The attacker needs CAP_NET_ADMIN to create/change classifiers, so disabling
unprivileged user namespaces should mitigate both issues.

# Versions affected

CVE-2023-1281 was introduced by:

commit 9b0d4446b56904b59ae3809913b0ac760fa941a6
Author: Jiri Pirko <jiri () mellanox com>
Date:   Fri Aug 4 14:29:15 2017 +0200

    net: sched: avoid atomic swap in tcf_exts_change

and the earliest affected release is 4.14

CVE-2023-1829 has been there since the beginning of the git history
(2.6.12-rc2) and became exploitable after net/sched netlink interface
was exposed to user namespaces in version 3.8

# Fixes

CVE-2023-1281 was fixed in:

commit ee059170b1f7e94e55fa6cadee544e176a6e59c2
Author: Pedro Tammela <pctammela () mojatatu com>
Date:   Thu Feb 9 11:37:39 2023 -0300

    net/sched: tcindex: update imperfect hash filters respecting rcu


In the case of CVE-2023-1829, kernel maintainers have decided to
remove the entire tcindex classifier due to its lack of known users
combined with a high number of issues present in the code.

This change was committed in:

commit 8c710f75256bb3cf05ac7b1672c82b92c43f3d28
Author: Jamal Hadi Salim <jhs () mojatatu com>
Date:   Tue Feb 14 08:49:14 2023 -0500

    net/sched: Retire tcindex classifier

and has been backported to all -stable kernels.

# Technical details

## CVE-2023-1281 Race condition leading to an use-after-free when
updating imperfect hash filters

Tcindex classifier handles extensions (like actions) in its .change
function (tcindex_change) differently than most classifiers.

.change handler is called by tc_new_tfilter() when creating a new filter,
but also when modifying an existing filter.

In that latter case the .change handler has to create new extension objects
based on user-provided data, modify or replace the existing filter and
finally free memory used by the old filter's extensions.

Let's first look at how most classifiers handle changing an existing
filter. Here's an example from basic_change():

...

[1] err = tcf_exts_init(&fnew->exts, net, TCA_BASIC_ACT, TCA_BASIC_POLICE);

...

[2]     err = basic_set_parms(net, tp, fnew, base, tb, tca[TCA_RATE], flags,
                              extack);
...
[3]     *arg = fnew;

        if (fold) {
                idr_replace(&head->handle_idr, fnew, fnew->handle);
                list_replace_rcu(&fold->link, &fnew->link);
                tcf_unbind_filter(tp, &fold->res);
                tcf_exts_get_net(&fold->exts);
[4]             tcf_queue_work(&fold->rwork, basic_delete_filter_work);


bool tcf_queue_work(struct rcu_work *rwork, work_func_t func)
{
        INIT_RCU_WORK(rwork, func);
[5]     return queue_rcu_work(tc_filter_wq, rwork);
}

static void __basic_delete_filter(struct basic_filter *f)
{
        tcf_exts_destroy(&f->exts);
        tcf_em_tree_destroy(&f->ematches);
        tcf_exts_put_net(&f->exts);
        free_percpu(f->pf);
        kfree(f);
}

static void basic_delete_filter_work(struct work_struct *work)
{
        struct basic_filter *f = container_of(to_rcu_work(work),
                                              struct basic_filter,
                                              rwork);
        rtnl_lock();
        __basic_delete_filter(f);
        rtnl_unlock();
}


First, a new filter is allocated in fnew and extension structures are
initialized at [1].
Then new extensions are configured in basic_set_parms based on user data
provided through netlink, and the existing filter is replaced with a newly
created filter at [3].

Finally, the old filter is queued to be deleted by tcf_queue_work().

What's important here is that tcf_queue_work() uses queue_rcu_work(), which
means that basic_delete_filter_work() will only be called after the RCU
grace period.
This prevents the old filter's extensions from being freed while still used
during basic_classify() (.classify runs in a RCU read-side critical
section).

Now let's look at a vulnerable tcindex code - it doesn't use
tcf_queue_work(), but instead destroys old extensions with a
tcf_exts_change() function:

tcindex_set_parms() is called from tcindex_change() and handles most of the
work:

struct tcindex_filter_result *r = *arg;

...
[1] err = tcf_exts_init(&e, net, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);

...
[2] err = tcf_exts_validate(net, tp, tb, est, &e, flags, extack);

..
[3] r = tcindex_lookup(cp, handle) ? : &new_filter_result

[4] tcf_exts_change(&r->exts, &e);

...

void tcf_exts_change(struct tcf_exts *dst, struct tcf_exts *src)
{
#ifdef CONFIG_NET_CLS_ACT
        struct tcf_exts old = *dst;

        *dst = *src;
        tcf_exts_destroy(&old);
#endif
}

void tcf_exts_destroy(struct tcf_exts *exts)
{
#ifdef CONFIG_NET_CLS_ACT
        if (exts->actions) {
                tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
                kfree(exts->actions);
        }
        exts->nr_actions = 0;
#endif
}

First, new extensions are allocated as variable e, and initialized at [1]
and [2] (tcf_exts_validate also allocates action objects if requested by
the user).

At [3] if a filter with the same hash params already exists, it's directly
assigned to variable r.

Finally, r->exts is passed to tcf_exts_change and kfree() is called on
r->exts->actions in tcf_exts_destroy()

At the same time this filter 'r' is still linked to its parent class and is
in active use by the qdisc.

There is no locking to prevent
tcindex_change()->tcf_exts_change()->tcf_exts_destroy() being called while
tcindex_classify() is running and executing actions.

int tcindex_classify(struct sk_buff *skb, const struct tcf_proto *tp,
struct tcf_result *res)
{
...

        return tcf_exts_exec(skb, &f->exts, res);
}

static inline int
tcf_exts_exec(struct sk_buff *skb, struct tcf_exts *exts,
              struct tcf_result *res)
{
#ifdef CONFIG_NET_CLS_ACT
        return tcf_action_exec(skb, exts->actions, exts->nr_actions, res);
#endif
        return TC_ACT_OK;
}

If tcf_exts_destroy() is called while another thread is inside
tcf_action_exec (which is a non-trivial function that calls multiple action
handlers in a loop), we get use-after-free on exts->actions array or/and
individual tc_action objects.


## CVE-2023-1829 Use-after-free when deleting a perfect hash filter

There are 2 different hashing methods implemented in tcindex:
"perfect" and "imperfect" hashes.

Perfect hashes are used for a smaller range of input keys and will be
chosen if the user provides small enough mask/hash parameters when
creating the classifier. By default imperfect hashes are used.

It turns out that perfect hash implementation has several issues,
especially when used with extensions (like actions).

The vulnerability lies is in the tcindex_delete() function and is
quite easy to understand:

static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
                          bool rtnl_held, struct netlink_ext_ack *extack)
{
        struct tcindex_data *p = rtnl_dereference(tp->root);
        struct tcindex_filter_result *r = arg;
        struct tcindex_filter __rcu **walk;
        struct tcindex_filter *f = NULL;

        pr_debug("tcindex_delete(tp %p,arg %p),p %p\n", tp, arg, p);
[1]     if (p->perfect) {
                if (!r->res.class)
                        return -ENOENT;
        } else {
                int i;

                for (i = 0; i < p->hash; i++) {
                        walk = p->h + i;
                        for (f = rtnl_dereference(*walk); f;
                             walk = &f->next, f = rtnl_dereference(*walk)) {
                                if (&f->result == r)
                                        goto found;
                        }
                }
                return -ENOENT;

found:
[2]                rcu_assign_pointer(*walk, rtnl_dereference(f->next));
        }
        tcf_unbind_filter(tp, &r->res);
        /* all classifiers are required to call tcf_exts_destroy() after rcu
         * grace period, since converted-to-rcu actions are relying on that
         * in cleanup() callback
         */
        if (f) {
                if (tcf_exts_get_net(&f->result.exts))
                        tcf_queue_work(&f->rwork, tcindex_destroy_fexts_work);
                else
                        __tcindex_destroy_fexts(f);
        } else {
                tcindex_data_get(p);

                if (tcf_exts_get_net(&r->exts))
                        tcf_queue_work(&r->rwork, tcindex_destroy_rexts_work);
                else
                        __tcindex_destroy_rexts(r);
        }
}

We can see that for imperfect hashes the filter associated with result
r is removed from the hash table (2), but for perfect hash (1)
basically nothing is done to delete/deactivate the filter.

Because 'f' is never set in case of imperfect hashes,
tcindex_destroy_rexts_work() will be called:

static void tcindex_destroy_rexts_work(struct work_struct *work)
{
        struct tcindex_filter_result *r;

        r = container_of(to_rcu_work(work),
                         struct tcindex_filter_result,
                         rwork);
        rtnl_lock();
        __tcindex_destroy_rexts(r);
        rtnl_unlock();
}

static void __tcindex_destroy_rexts(struct tcindex_filter_result *r)
{
        tcf_exts_destroy(&r->exts);
        tcf_exts_put_net(&r->exts);
        tcindex_data_put(r->p);
}

which leads to tcf_exts_destroy():

void tcf_exts_destroy(struct tcf_exts *exts)
{
#ifdef CONFIG_NET_CLS_ACT
        if (exts->actions) {
                tcf_action_destroy(exts->actions, TCA_ACT_UNBIND);
                kfree(exts->actions);
        }
        exts->nr_actions = 0;
#endif
}

For perfect hash filters that means a use-after-free, as destroyed
extensions are still used by the active filters.


Best regards,

valis
Current thread:

CVE-2023-1281, CVE-2023-1829: Linux kernel: Vulnerabilities in the tcindex classifier valis (Apr 11)