Changelog in Linux kernel 7.1.3

9p: avoid putting oldfid in p9_client_walk() error path [+ + +]

Author: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
Date:   Thu May 28 13:39:16 2026 +0800

    9p: avoid putting oldfid in p9_client_walk() error path
    
    commit 1a3860d46e3eb47dbd60339783cdad7904486b9f upstream.
    
    When p9_client_walk() is called with clone set to false, fid aliases
    oldfid. If the walk subsequently fails after the request has been sent,
    the error path jumps to clunk_fid, which currently calls p9_fid_put(fid)
    unconditionally.
    
    This drops a reference to oldfid even though ownership of oldfid remains
    with the caller. If this is the last reference, oldfid can be clunked and
    destroyed while the caller still expects it to be valid. A later use or
    put of oldfid can then trigger a use-after-free or refcount underflow.
    
    Fix this by only putting fid in the clunk_fid error path when it does not
    alias oldfid, matching the existing guard in the error path below.
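
    Illustration only, not part of the upstream patch: a minimal userspace
    model of the guard described above. struct fid, fid_put() and
    walk_error_path() are hypothetical stand-ins for struct p9_fid,
    p9_fid_put() and the clunk_fid error path; the real reference counting
    is more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct p9_fid: only a reference count. */
struct fid {
	int refs;
};

static void fid_put(struct fid *f)
{
	if (f)
		f->refs--;
}

/*
 * Model of the error path of a failed walk. With clone == false the walk
 * reuses oldfid, so fid and oldfid are the same object and the caller
 * still owns that reference.
 */
static void walk_error_path(struct fid *oldfid, struct fid *newfid, bool clone)
{
	struct fid *fid = clone ? newfid : oldfid;

	if (fid != oldfid)	/* only drop a reference this path owns */
		fid_put(fid);
}
```

    With clone == false the caller's reference on oldfid is left untouched;
    with clone == true only the newly created fid is put.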
    
    This can be triggered when a multi-component walk is split into multiple
    p9_client_walk() calls and a later non-cloning walk fails. A reproducer
    and refcount warning logs are available on request.
    
    Fixes: b48dbb998d70 ("9p fid refcount: add p9_fid_get/put wrappers")
    Cc: stable@vger.kernel.org
    Reported-by: Yuxiang Yang <yangyx22@mails.tsinghua.edu.cn>
    Reported-by: Ao Wang <wangao@seu.edu.cn>
    Reported-by: Xuewei Feng <fengxw06@126.com>
    Reported-by: Qi Li <qli01@tsinghua.edu.cn>
    Reported-by: Ke Xu <xuke@tsinghua.edu.cn>
    Assisted-by: GLM 5.1
    Signed-off-by: Yizhou Zhao <zhaoyz24@mails.tsinghua.edu.cn>
    Message-ID: <20260528053918.53550-1-zhaoyz24@mails.tsinghua.edu.cn>
    Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

apparmor: advertise the tcp fast open fix is applied [+ + +]

Author: John Johansen <john.johansen@canonical.com>
Date:   Mon Jun 22 16:34:13 2026 -0700

    apparmor: advertise the tcp fast open fix is applied
    
    commit 2f6701a5ce6257ae7a64ddc6d89d0a08d2a034f8 upstream.
    
    The fix for tcp-fast-open ensures that the connect permission is being
    mediated correctly but it didn't add an artifact to the feature set to
    advertise the fix is available. Add an artifact so that the test suite
    can identify if the fix has not been properly applied or a new
    unexpected regression has occurred.
    
    Fixes: 4d587cd8a7215 ("apparmor: mediate the implicit connect of TCP fast open sendmsg")
    Signed-off-by: John Johansen <john.johansen@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

apparmor: fix use-after-free in rawdata dedup loop [+ + +]

Author: Ruslan Valiyev <linuxoid@gmail.com>
Date:   Tue May 26 00:04:46 2026 +0200

    apparmor: fix use-after-free in rawdata dedup loop
    
    commit 6f060496d03e4dc560a40f73770bd08335cb7a27 upstream.
    
    aa_replace_profiles() walks ns->rawdata_list to dedup the incoming
    policy blob against entries already attached to existing profiles.
    Per the kernel-doc on struct aa_loaddata, list membership does not
    hold a reference: profiles hold pcount, and when the last pcount
    drops, do_ploaddata_rmfs() is queued on a workqueue that takes
    ns->lock and removes the entry. Between dropping the last pcount
    and the workqueue running, an entry remains on the list with
    pcount == 0.
    
    aa_get_profile_loaddata() is an unconditional kref_get() on
    pcount, so when the dedup loop hits such an entry, refcount
    hardening reports
    
      refcount_t: addition on 0; use-after-free.
    
    inside aa_replace_profiles(), and the poisoned counter then
    trips "saturated" and "underflow" warnings on the subsequent
    uses of the same loaddata.
    
    Before commit a0b7091c4de4 ("apparmor: fix race on rawdata
    dereference") the dedup path used a get_unless_zero-style helper
    on a single counter, so the existing "if (tmp)" guard was
    meaningful. The split-refcount refactor introduced
    aa_get_profile_loaddata(), which has plain kref_get() semantics,
    and the guard quietly became a no-op.
    
    Introduce aa_get_profile_loaddata_not0(), matching the existing
    _not0 convention used by aa_get_profile_not0(), and use it for
    the rawdata_list dedup lookup so dying entries are skipped.
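
    Illustration only, not the upstream code: a userspace sketch of the
    difference between an unconditional get and a "not0" get, using C11
    atomics. struct loaddata, loaddata_get() and loaddata_get_not0() are
    hypothetical names that merely model the pcount behaviour described
    above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of an entry whose pcount may already be 0. */
struct loaddata {
	atomic_uint pcount;
};

/* Unconditional get: also increments a counter that already hit 0. */
static void loaddata_get(struct loaddata *d)
{
	atomic_fetch_add(&d->pcount, 1);
}

/* Get-unless-zero: refuses to take a reference on a dying entry. */
static bool loaddata_get_not0(struct loaddata *d)
{
	unsigned int old = atomic_load(&d->pcount);

	while (old != 0) {
		/* on failure, old is reloaded with the current value */
		if (atomic_compare_exchange_weak(&d->pcount, &old, old + 1))
			return true;
	}
	return false;
}
```

    A lookup loop that uses the not0 variant can simply skip entries for
    which it returns false.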
    
    Reproduced on x86_64 with v7.1-rc5 in QEMU+KVM running Ubuntu
    24.04 + stress-ng 0.17.06:
    
      stress-ng --apparmor 1 --klog-check --timeout 60s
    
    Without this patch the three refcount_t warnings fire within a
    few seconds. With it the same 60 s run is clean. Coverage is a
    smoke-test only; a longer soak with CONFIG_KASAN, CONFIG_KCSAN
    and CONFIG_PROVE_LOCKING would be welcome from anyone with the
    cycles.
    
    Fixes: a0b7091c4de4 ("apparmor: fix race on rawdata dereference")
    Reported-by: Colin Ian King <colin.i.king@gmail.com>
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221513
    Cc: stable@vger.kernel.org
    Signed-off-by: Ruslan Valiyev <linuxoid@gmail.com>
    Signed-off-by: John Johansen <john.johansen@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

apparmor: mediate the implicit connect of TCP fast open sendmsg [+ + +]

Author: Bryam Vargas <hexlabsecurity@proton.me>
Date:   Mon Jun 22 15:57:38 2026 -0500

    apparmor: mediate the implicit connect of TCP fast open sendmsg
    
    commit 4d587cd8a72155089a627130bbd4716ec0856e21 upstream.
    
    sendmsg()/sendto() with MSG_FASTOPEN is a combination of connect(2) and
    write(2): it opens the connection in the SYN. apparmor_socket_sendmsg()
    only checks AA_MAY_SEND, so a profile that grants send but denies connect
    lets a confined task open an outbound TCP/MPTCP connection that connect(2)
    would have refused, bypassing connect mediation.
    
    Mediate the implicit connect when MSG_FASTOPEN is set and a destination
    is supplied. Add it to apparmor_socket_sendmsg() (not the shared
    aa_sock_msg_perm() helper, which recvmsg also uses) and call aa_sk_perm()
    directly, mirroring the selinux and tomoyo fixes. sk_is_tcp() does not
    cover MPTCP fast open, so the SOCK_STREAM/IPPROTO_MPTCP arm is explicit.
    
    Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)")
    Cc: stable@vger.kernel.org
    Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
    Signed-off-by: John Johansen <john.johansen@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:24 2026 +0200

    batman-adv: bla: annotate lasttime access with READ/WRITE_ONCE
    
    commit 98b0fb191c878a64cbaebfe231d96d57576acf8c upstream.
    
    The lasttime field for claim, backbone_gw, and loopdetect tracks the
    jiffies value of the most recent activity and is used to detect timeouts.
    These accesses are not consistently protected by a lock, so
    READ_ONCE/WRITE_ONCE must be used to prevent data races caused by compiler
    optimizations.
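
    Illustration only: simplified userspace stand-ins for READ_ONCE() and
    WRITE_ONCE() applied to a jiffies-like timestamp. The macros below are
    a reduced approximation of the kernel ones (they rely on the GCC/Clang
    __typeof__ extension); struct claim and the helper names are
    hypothetical.

```c
/* Reduced approximations of the kernel macros: one volatile access each. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical object with a lockless "last activity" timestamp. */
struct claim {
	unsigned long lasttime;
};

static void claim_touch(struct claim *c, unsigned long now)
{
	WRITE_ONCE(c->lasttime, now);
}

static int claim_timed_out(struct claim *c, unsigned long now,
			   unsigned long timeout)
{
	/* read the field exactly once, then work on the local copy */
	unsigned long last = READ_ONCE(c->lasttime);

	return (long)(now - last) > (long)timeout;
}
```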
    
    Cc: stable@kernel.org
    Fixes: 23721387c409 ("batman-adv: add basic bridge loop avoidance code")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: dat: prevent false sharing between VLANs [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:39 2026 +0200

    batman-adv: dat: prevent false sharing between VLANs
    
    commit 20d7658b74169f86d4ac01b9185b3eadddf71f28 upstream.
    
    The local hash of DAT entries is supposed to be VLAN (VID) aware. But
    the adding to the hash and the search in the hash were not checking the VID
    information of the hash entries. The entries would therefore only be
    correctly separated when batadv_hash_dat() didn't select the same buckets
    for different VIDs.
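
    Illustration only: a sketch of a VID-aware match inside one hash
    bucket. struct dat_entry here is a hypothetical two-field model and
    dat_bucket_find() is not a batman-adv function; the point is only that
    both the IP and the VID take part in the comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of a DAT entry. */
struct dat_entry {
	uint32_t ip;
	uint16_t vid;
};

static bool dat_entry_match(const struct dat_entry *e, uint32_t ip,
			    uint16_t vid)
{
	return e->ip == ip && e->vid == vid;
}

/* Linear search through one bucket; returns NULL when nothing matches. */
static const struct dat_entry *dat_bucket_find(const struct dat_entry *bucket,
					       size_t n, uint32_t ip,
					       uint16_t vid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (dat_entry_match(&bucket[i], ip, vid))
			return &bucket[i];
	}
	return NULL;
}
```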
    
    Cc: stable@kernel.org
    Fixes: be1db4f6615b ("batman-adv: make the Distributed ARP Table vlan aware")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: ensure bcast is writable before modifying TTL [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:28 2026 +0200

    batman-adv: ensure bcast is writable before modifying TTL
    
    commit 4cd6d3a4b96a8576f1fed8f9f9f17c2dc2978e0c upstream.
    
    Before batman-adv is allowed to write to an skb, it either has to have its
    own copy of the skb or use skb_cow() to ensure that the data part is not
    shared.
    
    The old implementation used a shared queue and created copies before
    attempting to write to it. But with the new implementation, the broadcast
    packet is already modified when it gets received, potentially writing to
    shared buffers in the process.
    
    Adding a skb_cow() right before this operation avoids this and can at the
    same time prepare it for the modifications required to rebroadcast the
    packet.
    
    Cc: stable@kernel.org
    Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: fix (m|b)cast csum after decrementing TTL [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:29 2026 +0200

    batman-adv: fix (m|b)cast csum after decrementing TTL
    
    commit e728bbdf32660c8f32b8f5e8d09427a2c131ad60 upstream.
    
    The broadcast and multicast packets can be received at the same time by the
    local system and forwarded to other nodes. Both are simply decrementing the
    TTL at the beginning of the receive path - independent of chosen paths
    (receive/forward). But such a modification of the data conflicts with the
    hw csum. This is not a problem when the packet is directly forwarded but
    can cause errors in the local receive path.
    
    Such a problem can then trigger a "hw csum failure". The receiver path must
    therefore ensure that the csum is fixed for each modification of the
    payload before batadv_interface_rx() is reached.
    
    Since all batman-adv packet types with a ttl have it as u8 at offset 2, a
    helper can be used for all of them. But it is only used at the moment for
    batadv_bcast_packet and batadv_mcast_packet because they are the only ones
    which deliver the packet locally but unconditionally modify the TTL.
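
    Illustration only, not the helper from the upstream patch: a userspace
    sketch of how a ones' complement sum over a packet can be fixed up
    incrementally after the byte at offset 2 (the TTL) is decremented. The
    function names and the big-endian 16-bit word model are assumptions
    made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones' complement sum over the big-endian 16-bit words of buf. */
static uint16_t csum_full(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return csum_fold16(sum);
}

/* Decrement the TTL at offset 2 and return the adjusted sum. */
static uint16_t ttl_dec_fix_csum(uint8_t *pkt, uint16_t sum)
{
	uint16_t old_word = (uint16_t)((pkt[2] << 8) | pkt[3]);
	uint16_t new_word;

	pkt[2]--;	/* caller is expected to have validated the TTL */
	new_word = (uint16_t)((pkt[2] << 8) | pkt[3]);

	/* sum' = sum + ~old + new, in ones' complement arithmetic */
	return csum_fold16((uint32_t)sum + (uint16_t)~old_word + new_word);
}
```

    For the sample bytes 01 02 32 04 aa bb, the adjusted sum equals the sum
    recomputed from scratch over the modified buffer.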
    
    Cc: stable@kernel.org
    Fixes: 3f69339068f9 ("batman-adv: bcast: queue per interface, if needed")
    Fixes: 07afe1ba288c ("batman-adv: mcast: implement multicast packet reception and forwarding")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: frag: avoid underflow of TTL [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:31 2026 +0200

    batman-adv: frag: avoid underflow of TTL
    
    commit 493d9d2528e1a09b090e4b37f0f553def7bd5ce9 upstream.
    
    Packets with a TTL are using it to limit the number of times this packet
    can be forwarded. But for batadv_frag_packet, the TTL was always only
    reduced but it was never evaluated. It could even underflow without any
    effect.
    
    Check the TTL in batadv_frag_skb_fwd() before attempting to prepare it for
    forwarding. This keeps it in sync with the not fragmented unicast packet.
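
    Illustration only: a sketch of a check-before-decrement on a u8 TTL.
    ttl_forward() is a hypothetical helper, and the "less than 2" threshold
    is an assumption modelled on the usual "drop when the TTL would reach
    zero" rule, not a quote from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns false when the packet must not be forwarded any further.
 * Without the check, decrementing a TTL of 0 would wrap to 255.
 */
static bool ttl_forward(uint8_t *ttl)
{
	if (*ttl < 2)
		return false;

	(*ttl)--;
	return true;
}
```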
    
    Cc: stable@kernel.org
    Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: frag: ensure fragment is writable before modifying TTL [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:30 2026 +0200

    batman-adv: frag: ensure fragment is writable before modifying TTL
    
    commit b7293c6e8c15b2db77809b25cf8389e35331b27a upstream.
    
    Before batman-adv is allowed to write to an skb, it either has to have its
    own copy of the skb or use skb_cow() to ensure that the data part is not
    shared. But batadv_frag_skb_fwd() modifies the TTL even when it is shared.
    
    Adding a skb_cow() right before this operation avoids this and can at the
    same time prepare it for the modifications required to forward the
    fragment.
    
    Cc: stable@kernel.org
    Fixes: 610bfc6bc99b ("batman-adv: Receive fragmented packets and merge")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: gw: don't deselect gateway with active hardif [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:27 2026 +0200

    batman-adv: gw: don't deselect gateway with active hardif
    
    commit df97a7107b16375a10a36d7a63e9b4291a8ac680 upstream.
    
    The batadv_hardif_cnt() was previously checking if there is a
    batadv_hard_iface->mesh_iface which has the same mesh_iface. And since
    batadv_hardif_disable_interface() was resetting the
    batadv_hard_iface->mesh_iface after this check, it had to verify whether
    *1* interface was still part of the mesh_iface before it started the
    gateway deselection.
    
    But now that batadv_hardif_cnt() is checking the lower interfaces of
    mesh_iface and batadv_hardif_disable_interface() already removed the
    interface via netdev_upper_dev_unlink() earlier in this function, the check
    must make sure that *0* interfaces can be found by batadv_hardif_cnt()
    before the selected gateway is deselected. Otherwise the deselection would
    already happen one batadv_hard_iface too early.
    
    Because a 0 hardif count from batadv_hardif_cnt() is equal to an empty
    list, it is possible to replace the counting with a simple list_empty().
    
    Cc: stable@kernel.org
    Fixes: 7dc284702bcd ("batman-adv: store hard_iface as iflink private data")
    Reviewed-by: Nora Schiffer <neocturne@universe-factory.net>
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: prevent ELP transmission interval underflow [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:25 2026 +0200

    batman-adv: prevent ELP transmission interval underflow
    
    commit 5e50d4b8ae3ea622122d3c6a38d7f6fe68dfddca upstream.
    
    batadv_v_elp_start_timer() enqueues a delayed work. The time when it starts
    is randomly chosen between (elp_interval - BATADV_JITTER) and
    (elp_interval + BATADV_JITTER). The configured elp_interval must therefore
    be larger than or equal to BATADV_JITTER to avoid an underflow of the
    unsigned integer. If this happened, a "fast" ELP interval would turn into
    a "day long" delay.
    
    At the same time, it must not be larger than the maximum value the variable
    can store.
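
    Illustration only: a sketch of the unsigned underflow and of a range
    check that avoids it. JITTER_MS is an assumed stand-in for
    BATADV_JITTER (the value 20 is an assumption), and the max parameter is
    a placeholder for whatever limit the storing variable imposes.

```c
#include <stdbool.h>
#include <stdint.h>

#define JITTER_MS 20u	/* assumed stand-in for BATADV_JITTER */

/* Lower bound of the random delay; wraps around for small intervals. */
static uint32_t elp_delay_min(uint32_t interval)
{
	return interval - JITTER_MS;
}

/* Accept only intervals that can neither underflow nor exceed max. */
static bool elp_interval_valid(uint32_t interval, uint32_t max)
{
	return interval >= JITTER_MS && interval <= max;
}
```

    An interval of 10 would produce a lower bound of 4294967286 instead of
    a negative number, which is why such values have to be rejected.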
    
    Cc: stable@kernel.org
    Fixes: a10800829040 ("batman-adv: Add elp_interval hardif genl configuration")
    [ Context ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: add only finished tp_vars to lists [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:23 2026 +0200

    batman-adv: tp_meter: add only finished tp_vars to lists
    
    commit 15ccbf685222274f5add1387af58c2a41a95f81e upstream.
    
    When the receiver variables (aka "session") are initialized, then they are
    added to the list of sessions before the timer is set up. A RCU protected
    reader could therefore find the entry and run mod_setup before
    batadv_tp_init_recv() finished the timer initialization.
    
    The same is true for batadv_tp_start(), which must first initialize the
    finish_work and the test_length to avoid a similar problem.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: annotate last_recv_time access with READ/WRITE_ONCE [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:34 2026 +0200

    batman-adv: tp_meter: annotate last_recv_time access with READ/WRITE_ONCE
    
    commit d67c728f07fca2ee6ffdc6dd4421cf2e8691f4d1 upstream.
    
    The last_recv_time field for batadv_tp_receiver tracks the jiffies value of
    the most recent activity and is used to detect timeouts. These accesses are
    not consistently protected by a lock, so READ_ONCE/WRITE_ONCE must be used
    to prevent data races caused by compiler optimizations.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: avoid divide-by-zero for dec_cwnd [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:20 2026 +0200

    batman-adv: tp_meter: avoid divide-by-zero for dec_cwnd
    
    commit 33ccd52f3cc9ed46ce395199f89aa3234dc83314 upstream.
    
    The cwnd is always MSS <= cwnd <= 0x20000000. But the calculation in
    batadv_tp_update_cwnd() assumes unsigned 32 bit arithmetic.
    
        ((mss * 8) ** 2) / (cwnd * 8)
    
    In case cwnd is actually 0x20000000, it will be shifted by 3 bits to the
    left and end up at 0x100000000 or U32_MAX + 1. It will therefore wrap
    around and be 0 - resulting in:
    
        ((mss * 8) ** 2) / 0
    
    This is of course invalid and cannot be calculated. The calculation must
    be simplified to avoid this overflow:
    
       (mss ** 2) * 8 / cwnd
    
    It will keep the precision enhancement from the scaling (by 8) but avoid
    the overflow in the divisor.
    
    In theory, there could still be an overflow in the dividend. It is at the
    moment fixed to BATADV_TP_PLEN in batadv_tp_recv_ack() - so it is not an
    imminent problem. But allowing it to use the whole u32 range would
    mean that it can still use up to 67 bits. To keep this calculation safe for
    32 bit arithmetic, mss must never use more than floor((32 - 3) / 2) bits -
    or in other words: must never be larger than 16383.
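
    Illustration only: the two formulas from above in plain u32
    arithmetic. old_divisor() and dec_cwnd_step() are hypothetical names
    and do not reproduce the surrounding logic of batadv_tp_update_cwnd().

```c
#include <stdint.h>

/* Divisor of the old formula: (cwnd * 8), which can wrap to 0. */
static uint32_t old_divisor(uint32_t cwnd)
{
	return cwnd << 3;
}

/*
 * Simplified formula: (mss ** 2) * 8 / cwnd.
 * Safe in 32 bits as long as mss <= 16383 and cwnd != 0.
 */
static uint32_t dec_cwnd_step(uint32_t mss, uint32_t cwnd)
{
	return ((mss * mss) << 3) / cwnd;
}
```

    For cwnd == 0x20000000 the old divisor is 0, while the simplified form
    still divides by the original, non-zero cwnd. For small values both
    forms agree, e.g. mss == cwnd == 1400 gives 11200 either way.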
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: avoid window underflow [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:19 2026 +0200

    batman-adv: tp_meter: avoid window underflow
    
    commit 765947b81fb54b6ebb0bc1cfe55c0fa399e002b8 upstream.
    
    In batadv_tp_avail(), win_left is calculated with 32-bit unsigned
    arithmetic: win_left = win_limit - tp_vars->last_sent;
    
    During Fast Recovery, cwnd is inflated and last_sent advances rapidly. When
    Fast Recovery ends, cwnd drops abruptly back to ss_threshold. If the newly
    shrunk win_limit is less than last_sent, the unsigned subtraction will
    underflow, wrapping to a massive positive value. Instead of returning that
    the window is full (unavailable), it returns that the sender can continue
    sending.
    
    To handle this situation, the window's end sequence number (win_limit)
    must be compared with the last sent sequence number. If it is before the
    last sent sequence number, then more acks are needed before the
    transmission can be started again.
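
    Illustration only: a sketch of the underflow and of a guarded variant.
    win_left_naive(), win_left_checked() and seq_before() are hypothetical
    helpers; the signed-difference comparison is the common idiom for
    wrapping sequence numbers and relies on two's complement conversion.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a lies before b, also across a wrap-around. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Plain subtraction: wraps to a huge value when win_limit < last_sent. */
static uint32_t win_left_naive(uint32_t win_limit, uint32_t last_sent)
{
	return win_limit - last_sent;
}

/* Reports an exhausted window when its end is already behind last_sent. */
static uint32_t win_left_checked(uint32_t win_limit, uint32_t last_sent)
{
	if (seq_before(win_limit, last_sent))
		return 0;

	return win_limit - last_sent;
}
```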
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: fix fast recovery precondition [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:21 2026 +0200

    batman-adv: tp_meter: fix fast recovery precondition
    
    commit 2b0d08f08ed3b2174f05c43089ec65f3543a025b upstream.
    
    The fast recovery precondition checks if the recover (initialized to
    BATADV_TP_FIRST_SEQ) is bigger than the received ack. But since recover is
    only updated when this check is successful, it will never enter the fast
    recovery mode.
    
    According to RFC6582 Section 3.2 step 2, the check should actually be
    different:
    
    > When the third duplicate ACK is received, the TCP sender first
    > checks the value of recover to see if the Cumulative
    > Acknowledgment field covers more than recover
    
    The precondition must therefore check if recover is smaller than the
    received ack - basically swapping the operands of the current check.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: handle overlapping packets [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:36 2026 +0200

    batman-adv: tp_meter: handle overlapping packets
    
    commit cbde75c38b21f022891525078622587ad557b7c1 upstream.
    
    If the size of the packets changes during the transmission, it can happen
    that some retries of packets overlap. In this case, precise comparisons of
    sequence numbers by the receiver would be wrong. It is then necessary to
    check whether the range from the start sequence number to the end sequence
    number ("seqno + length") contains a new range.
    
    If this is the case then this is enough to accept this packet. In all other
    cases, the packet still has to be dropped (and not acked).
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    [ Switch to pre-splitted tp_vars structure names ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: handle seqno wrap-around for fast recovery detection [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:22 2026 +0200

    batman-adv: tp_meter: handle seqno wrap-around for fast recovery detection
    
    commit f54c85ed42a1b27a516cf2a4728f5a612b799e07 upstream.
    
    The recover variable and the last_sent sequence number are initialized on
    purpose as a really high value which will wrap-around after the first 2000
    bytes. The fast recovery precondition must therefore not use simple integer
    comparisons but use helpers which are aware of the sequence number
    wrap-arounds.
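
    Illustration only: why a plain integer comparison fails once the
    sequence number has wrapped. FIRST_SEQ is an assumed stand-in for
    BATADV_TP_FIRST_SEQ (chosen here as U32_MAX - 2000 to match the
    "first 2000 bytes" remark above); seq_before() is a hypothetical
    helper using the usual signed-difference idiom.

```c
#include <stdbool.h>
#include <stdint.h>

#define FIRST_SEQ (UINT32_MAX - 2000u)	/* assumed initial value */

/* Naive comparison: breaks as soon as b has wrapped around. */
static bool plain_before(uint32_t a, uint32_t b)
{
	return a < b;
}

/* Wrap-around aware comparison of two sequence numbers. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}
```

    With recover == FIRST_SEQ and an ack 3000 bytes later (which wraps to
    999), only the wrap-around aware helper reports recover as being
    before the ack.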
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: initialize dec_cwnd explicitly [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:18 2026 +0200

    batman-adv: tp_meter: initialize dec_cwnd explicitly
    
    commit febfb1b86224489535312296ecfa3d4bf467f339 upstream.
    
    When batadv_tp_update_cwnd() is called, dec_cwnd is increased. But dec_cwnd
    is only initialized (to 0) when a duplicate Ack was received or when cwnd
    is below the ss_threshold.
    
    Just initialize dec_cwnd during the initialization to avoid any potential
    access of uninitialized data.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: initialize dup_acks explicitly [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:17 2026 +0200

    batman-adv: tp_meter: initialize dup_acks explicitly
    
    commit b2b68b32a715e0328662801576974aa37b942b00 upstream.
    
    When an ack with a sequence number equal to the last_acked is received, the
    dup_acks counter is increased to decide whether fast retransmit should be
    performed. Only when the sequence numbers are not equal, the dup_acks is
    set to the initial value (0).
    
    But if the initial packet would have the sequence number
    BATADV_TP_FIRST_SEQ, dup_acks would not be initialized and atomic_inc would
    operate on an undefined starting value. It is therefore required to have it
    explicitly initialized during the start of the sender session.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: initialize last_recv_time during init [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:26 2026 +0200

    batman-adv: tp_meter: initialize last_recv_time during init
    
    commit 811cb00fa8cdc3f0a7f6eefc000a6888367c8c8f upstream.
    
    The last_recv_time is the most important indicator for a receiver session
    to figure out whether a session timed out or not. But this information was
    only initialized after the session was added to the tp_receiver_list and
    after the timer was started.
    
    In the worst case, the timer (function) could have tried to access this
    information before the actual initialization was reached. Like the rest of
    the variables of the tp_meter receiver session, this field has to be filled
    out before any other (parallel running) context has the chance to access
    it.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    [ Context ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: keep unacked list in ascending ordered [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:16 2026 +0200

    batman-adv: tp_meter: keep unacked list in ascending ordered
    
    commit 5aa8651527ea0b610e7a09fb3b8204c1398b9525 upstream.
    
    When batadv_tp_handle_out_of_order inserts a new entry in the list of
    unacked (out of order) packets, it searches from the entry with the newest
    sequence number towards the oldest sequence number. If an entry is found
    which is older than the new entry, the new entry has to be added after the
    found one to keep the ascending order.
    
    But list_add_tail() was used for this operation, and this function adds an
    entry _before_ another one. As a result, the list would contain a lot of
    swapped sequence numbers. The consumer of this list
    (batadv_tp_ack_unordered()) would then fail to correctly ack packets.
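
    Illustration only: a userspace re-creation of the two list primitives
    and of a backwards search that inserts a new entry. The list helpers
    mimic the semantics of the kernel's list_add() (insert after) and
    list_add_tail() (insert before); struct unacked, unacked_insert() and
    unacked_seqno_at() are hypothetical, and sequence number wrap-around
    is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_insert(struct list_head *n, struct list_head *prev,
			struct list_head *next)
{
	next->prev = n;
	n->next = next;
	n->prev = prev;
	prev->next = n;
}

/* Same semantics as the kernel's list_add(): n goes right AFTER pos. */
static void list_add(struct list_head *n, struct list_head *pos)
{
	list_insert(n, pos, pos->next);
}

/* Same semantics as list_add_tail(): n goes right BEFORE pos. */
static void list_add_tail(struct list_head *n, struct list_head *pos)
{
	list_insert(n, pos->prev, pos);
}

struct unacked {
	struct list_head list;	/* first member, so casts below are valid */
	uint32_t seqno;
};

/* Search from newest to oldest; add_after selects the correct variant. */
static void unacked_insert(struct list_head *head, struct unacked *new_un,
			   bool add_after)
{
	struct list_head *pos;

	for (pos = head->prev; pos != head; pos = pos->prev) {
		if (((struct unacked *)pos)->seqno < new_un->seqno)
			break;
	}

	if (add_after)
		list_add(&new_un->list, pos);
	else
		list_add_tail(&new_un->list, pos);
}

/* Sequence number of the idx-th entry, counted from the list head. */
static uint32_t unacked_seqno_at(const struct list_head *head, size_t idx)
{
	const struct list_head *pos = head->next;

	while (idx--)
		pos = pos->next;
	return ((const struct unacked *)pos)->seqno;
}
```

    Inserting 10, 30 and 20 in that order yields 10, 20, 30 with the
    insert-after variant and 30, 20, 10 with the insert-before variant.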
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: prevent parallel modifications of last_recv [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:35 2026 +0200

    batman-adv: tp_meter: prevent parallel modifications of last_recv
    
    commit 6dde0cfcb36e4d5b3de35b75696937478441eed4 upstream.
    
    When last_recv is updated to store the last received sequence number, it
    is assumed that nothing modifies it in parallel while:
    
    * check for outdated packets is done
    * out of order check is performed (and packets are stored in out-of-order
      queue)
    * the out-of-order queue was searched for closed gaps
    * sequence number for next ack is calculated
    
    None of that was actually protected. It could therefore happen that
    last_recv was updated multiple times in parallel and the final sequence
    number was calculated with deltas which had no connection to the sequence
    number they were added to.
    
    Lock this whole region with the same lock which was already used to protect
    the unacked (out-of-order) list.
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    [ Switch to pre-splitted tp_vars structure names ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tp_meter: restrict number of unacked list entries [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:33 2026 +0200

    batman-adv: tp_meter: restrict number of unacked list entries
    
    commit e7c775110e1858e5a7471a23a9c9658c0af9df89 upstream.
    
    When the unacked_list is unbounded, an attacker could send messages with
    small lengths and appropriate seqno + gaps to force the receiver to
    allocate more and more unacked_list entries. In the end, this either causes
    an out-of-memory situation or increases the management overhead for the
    (large) list so much that significant portions of CPU cycles are wasted in
    searching through the list.
    
    When limiting the list to a specific number, it is important to still
    correctly add a new entry to the list. But if the list became larger than
    the limit, the last entry of the list (with the highest seqno) must be
    dropped to still allow the earlier seqnos to finish and therefore to
    continue the process. Otherwise, the process might get stuck with too high
    seqnos which are not handled by batadv_tp_ack_unordered().
    
    Cc: stable@kernel.org
    Fixes: 33a3bb4a3345 ("batman-adv: throughput meter implementation")
    [ Switch to pre-splitted tp_vars structure names ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tt: don't merge change entries with different VIDs [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:37 2026 +0200

    batman-adv: tt: don't merge change entries with different VIDs
    
    commit f08e06c2d5c3e2434e7c773f2213f4a7dce6bc1e upstream.
    
    batadv_tt_local_event() merges/cancels events for the same client which
    would conflict or be duplicates. The matching of the queued events only
    compares the MAC address - the VLAN ID stored in each event is ignored.
    
    If a MAC now appears on multiple VIDs, the two ADD change events (for
    VID 1 and VID 2) would be merged into a single VID event. The remote can
    therefore not calculate the correct TT table and desyncs. A full
    translation table exchange is required to recover from this state.
    
    A check of VID is therefore necessary to avoid such wrong merges/cancels.
    
    Cc: stable@kernel.org
    Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tt: track roam count per VID [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:38 2026 +0200

    batman-adv: tt: track roam count per VID
    
    commit 12407d5f61c2653a64f2ff4b22f3c267f8420ef1 upstream.
    
    batadv_tt_check_roam_count() is supposed to track roaming of a TT entry.
    But TT entries are for a MAC + VID. The VID was completely missed and thus
    leads to incorrect detection of ROAM counts when a client MAC exists in
    multiple VLANs.
    
    Cc: stable@kernel.org
    Fixes: c018ad3de61a ("batman-adv: add the VLAN ID attribute to the TT entry")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

batman-adv: tvlv: avoid race of cifsnotfound handler state [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:41 2026 +0200

    batman-adv: tvlv: avoid race of cifsnotfound handler state
    
    commit edb557b2ba38fea2c5eb710cf366c797e187218c upstream.
    
    TVLV handlers can have the flag BATADV_TVLV_HANDLER_OGM_CIFNOTFND set to
    signal that the OGM handler should be called (with NULL for data) when the
    specific TVLV container was not found in the OGM. This is used by:
    
    * DAT
    * GW
    * Multicast (OGM + Tracker)
    
    Whether the handler had been executed was recorded in the shared struct
    batadv_tvlv_handler. But TVLV processing is started without any lock.
    Multiple parallel contexts processing TVLVs would therefore overwrite each
    other's BATADV_TVLV_HANDLER_OGM_CALLED flag in the shared
    batadv_tvlv_handler.
    
    Drop the shared BATADV_TVLV_HANDLER_OGM_CALLED flag and instead determine,
    per TVLV buffer, whether a matching container was present by scanning the
    packet's buffer.
    
    Cc: stable@kernel.org
    Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
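
The per-buffer scan mentioned in the commit above might look roughly like the following user-space sketch. It assumes a container header of one type byte, one version byte and a 2-byte big-endian length; the function name and the truncation handling are illustrative assumptions, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed header: type (1) + version (1) + length (2, big endian). */
#define SKETCH_TVLV_HDR_LEN 4

/* Return true if a container with this type and version is in the buffer. */
static bool tvlv_container_present(const uint8_t *buf, size_t buf_len,
                                   uint8_t type, uint8_t version)
{
    size_t off = 0;

    while (buf_len - off >= SKETCH_TVLV_HDR_LEN) {
        uint8_t cont_type = buf[off];
        uint8_t cont_version = buf[off + 1];
        size_t cont_len = ((size_t)buf[off + 2] << 8) | buf[off + 3];

        off += SKETCH_TVLV_HDR_LEN;
        if (cont_len > buf_len - off)
            break; /* truncated container: stop scanning */

        if (cont_type == type && cont_version == version)
            return true;

        off += cont_len; /* skip the payload of a non-matching container */
    }
    return false;
}
```

Because the result depends only on the packet being processed, parallel contexts no longer need to share any "handler was called" state.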

batman-adv: tvlv: enforce 2-byte alignment [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:40 2026 +0200

    batman-adv: tvlv: enforce 2-byte alignment
    
    commit 32a6799255525d6ea4da0f7e9e0e521ad9560a46 upstream.
    
    The fields of an aggregated OGM(v2) are accessed assuming (at least) 2-byte
    alignment, so a following OGM must start at an even offset. As the header
    length is even, an odd tvlv_len would misalign it and trigger unaligned
    accesses on strict-alignment architectures.
    
    Such a misaligned TVLV/OGM/OGMv2 is not created by a normal participant in
    the mesh. Therefore, reject such malformed packets.
    
    Cc: stable@kernel.org
    Fixes: ef26157747d4 ("batman-adv: tvlv - basic infrastructure")
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
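
As a rough illustration of the alignment rule in the commit above: with an even header length, the next aggregated OGM starts at an even offset only if tvlv_len is even. The helper below is a hypothetical sketch; its name, its parameters and the -1 return convention are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Offset of the OGM following the one at `offset`, or -1 when tvlv_len is
 * odd and the packet should be rejected as malformed.
 */
static long next_ogm_offset(size_t offset, size_t hdr_len, uint16_t tvlv_len)
{
    if (tvlv_len & 1)
        return -1;
    return (long)(offset + hdr_len + tvlv_len);
}
```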

batman-adv: v: prevent OGM aggregation on disabled hardif [+ + +]

Author: Sven Eckelmann <sven@narfation.org>
Date:   Fri Jun 26 18:12:32 2026 +0200

    batman-adv: v: prevent OGM aggregation on disabled hardif
    
    commit d11c00b95b2a3b3934007fc003dccc6fdcc061ad upstream.
    
    When an interface gets disabled, the worker is correctly disabled by
    batadv_hardif_disable_interface() -> ... -> batadv_v_ogm_iface_disable().
    In this process, the skb aggr_list is also freed.
    
    But batadv_v_ogm_send_meshif() can still queue new skbs (via
    batadv_v_ogm_queue_on_if()) to the aggr_list. This will only stop after all
    cores can no longer find the RCU protected list of hard interfaces. These
    queued skbs will never be freed or consumed by batadv_v_ogm_aggr_work.
    
    The batadv_v_ogm_iface_disable() function must block
    batadv_v_ogm_queue_on_if() to avoid leaking skbs.
    
    Cc: stable@kernel.org
    Fixes: f89255a02f1d ("batman-adv: BATMAN_V: introduce per hard-iface OGMv2 queues")
    [ Context ]
    Signed-off-by: Sven Eckelmann <sven@narfation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

blk-cgroup: fix UAF in __blkcg_rstat_flush() [+ + +]

Author: Michal Koutný <mkoutny@suse.com>
Date:   Thu Feb 5 23:54:23 2026 +0800

    blk-cgroup: fix UAF in __blkcg_rstat_flush()
    
    commit 0ab5ee5a1badb58cbb2242617cb01a4972b1f2a2 upstream.
    
    When multiple blkgs in the same blkcg are released concurrently,
    a use-after-free can occur. The race happens when one blkg's
    __blkcg_rstat_flush() removes another blkg's iostat entries via
    llist_del_all(). The second blkg sees an empty list and proceeds
    to free itself while the first is still iterating over its entries.
    
    Move the flush from __blkg_release() (RCU callback) to blkg_release()
    (before call_rcu). This ensures the RCU grace period waits for any
    concurrent flush's rcu_read_lock() section to complete before freeing.
    
    Cc: stable@vger.kernel.org
    Cc: Jay Shin <jaeshin@redhat.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Waiman Long <longman@redhat.com>
    Fixes: 20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
    Reported-by: coregee2000@gmail.com
    Closes: https://lore.kernel.org/linux-block/CAHPqNmwT9oRpem3J3erS_W0uSQND47LGGSBsNxP8E6uSUish1w@mail.gmail.com/
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Tested-by: Jose Fernandez (Anthropic) <jose.fernandez@linux.dev>
    Link: https://patch.msgid.link/20260205155425.342084-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: Avoid mounting the bdev pseudo-filesystem in userspace [+ + +]

Author: Denis Arefev <arefev@swemel.ru>
Date:   Thu May 21 10:28:56 2026 +0300

    block: Avoid mounting the bdev pseudo-filesystem in userspace
    
    commit f73aa66dffcb8e61e78f01b56163ec16a15d06d2 upstream.
    
    The bdev pseudo-filesystem is an internal kernel filesystem with which
    userspace should not interfere. Unregister it so that userspace cannot
    even attempt to mount it.
    
    This fixes a bug [1] that occurs when attempting to access files,
    because the move_mount() system call uses pointers declared in the
    inode_operations structure, which for the bdev pseudo-filesystem are
    always NULL (inode->i_op = &empty_iops).
    
    [1]
    
     BUG: kernel NULL pointer dereference, address: 0000000000000000
     #PF: supervisor instruction fetch in kernel mode
     #PF: error_code(0x0010) - not-present page
     PGD 23380067 P4D 23380067 PUD 23381067 PMD 0
     Oops: 0010 [#1] PREEMPT SMP KASAN NOPTI
     CPU: 2 PID: 17125 Comm: syz-executor.0 Not tainted 6.1.155-syzkaller-00350-g84221fde2681 #0
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
     RIP: 0010:0x0
    
     Call Trace:
     <TASK>
     lookup_open.isra.0+0x700/0x1180 fs/namei.c:3460
     open_last_lookups fs/namei.c:3550 [inline]
     path_openat+0x953/0x2700 fs/namei.c:3780
     do_filp_open+0x1c5/0x410 fs/namei.c:3810
     do_sys_openat2+0x171/0x4d0 fs/open.c:1318
     do_sys_open fs/open.c:1334 [inline]
     __do_sys_openat fs/open.c:1350 [inline]
     __se_sys_openat fs/open.c:1345 [inline]
     __x64_sys_openat+0x13c/0x1f0 fs/open.c:1345
     do_syscall_x64 arch/x86/entry/common.c:51 [inline]
     do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
     entry_SYSCALL_64_after_hwframe+0x6e/0xd8
    
    Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Link: https://lore.kernel.org/all/20131010004732.GJ13318@ZenIV.linux.org.uk/T/#
    Cc: stable@vger.kernel.org
    Signed-off-by: Denis Arefev <arefev@swemel.ru>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://patch.msgid.link/20260521072857.5078-1-arefev@swemel.ru
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: invalidate cached plug timestamp after task switch [+ + +]

Author: Usama Arif <usama.arif@linux.dev>
Date:   Tue Jun 16 07:15:18 2026 -0700

    block: invalidate cached plug timestamp after task switch
    
    commit fad156c2af227f42ca796cbb20ddc354a6dd9932 upstream.
    
    blk_time_get_ns() caches ktime_get_ns() in current->plug->cur_ktime
    and marks the task with PF_BLOCK_TS. That cache is only valid while the
    task keeps running; if the task is switched out, wall-clock time
    advances and the cached value must not be reused when the task runs again.
    
    The existing invalidation covers explicit plug flushes through
    __blk_flush_plug(), and the schedule() / rtmutex paths through
    sched_update_worker(). It does not cover in-kernel preemption paths such
    as preempt_schedule(), preempt_schedule_notrace(), and
    preempt_schedule_irq(), which enter __schedule(SM_PREEMPT) directly and
    return without calling sched_update_worker().
    
    As a result, a task preempted while holding a plug with PF_BLOCK_TS set
    can reuse a stale plug->cur_ktime after it is scheduled back in. blk-iocost
    then consumes that stale timestamp through ioc_now(), producing stale vnow
    values for throttle decisions, and through ioc_rqos_done(), inflating
    on-queue time and feeding false missed-QoS samples into vrate
    adjustment.
    
    Move the schedule-side invalidation to finish_task_switch(), which runs
    for the scheduled-in task after every actual context switch regardless
    of which schedule entry point was used. Keep __blk_flush_plug() as the
    explicit flush/finish-plug invalidation path, and remove only the
    PF_BLOCK_TS handling from sched_update_worker().
    
    Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
    Cc: stable@vger.kernel.org
    Signed-off-by: Usama Arif <usama.arif@linux.dev>
    Link: https://patch.msgid.link/20260616141604.328820-3-usama.arif@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
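
The caching rule from the commit above can be modelled in user space as follows. Everything here is a hypothetical stand-in: fake_clock_ns replaces ktime_get_ns(), the ts_valid field plays the role of PF_BLOCK_TS, and sketch_task_switched_in() stands for the invalidation that the commit moves to finish_task_switch().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical clock source standing in for ktime_get_ns(). */
static uint64_t fake_clock_ns;

/* Hypothetical per-task state: a cached timestamp plus a validity flag. */
struct sketch_task {
    bool ts_valid;
    uint64_t cached_ns;
};

/* Return the cached time, reading the clock only when the cache is invalid. */
static uint64_t sketch_time_get_ns(struct sketch_task *t)
{
    if (!t->ts_valid) {
        t->cached_ns = fake_clock_ns;
        t->ts_valid = true;
    }
    return t->cached_ns;
}

/* Called for the scheduled-in task after every context switch. */
static void sketch_task_switched_in(struct sketch_task *t)
{
    t->ts_valid = false;
}
```

Invalidating at the single point every context switch passes through avoids having to cover each schedule entry point separately.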

bpf: use kvfree() for replaced sysctl write buffer [+ + +]

Author: Dawei Feng <dawei.feng@seu.edu.cn>
Date:   Wed Jun 3 18:53:16 2026 +0800

    bpf: use kvfree() for replaced sysctl write buffer
    
    commit 4c21b5927d4364bfe7365f2700da5fea0ed0d004 upstream.
    
    proc_sys_call_handler() allocates its temporary sysctl buffer with
    kvzalloc() and passes it to __cgroup_bpf_run_filter_sysctl(). Since
    kvzalloc() may fall back to vmalloc() for large allocations, freeing
    that buffer with kfree() is wrong and can corrupt memory.
    
    Use kvfree() to safely handle both kmalloc and kvzalloc()/vmalloc
    allocations.
    
    The bug was first flagged by an experimental analysis tool we are
    developing for kernel memory-management bugs while analyzing
    v6.13-rc1. The tool is still under development and is not yet publicly
    available. Manual inspection confirms that the bug is still
    present in v7.1-rc5.
    
    Reproduced the bug based on v7.1-rc4 in a QEMU x86_64 guest booted with
    KASAN and CONFIG_FAILSLAB enabled. To exercise the replacement path, the
    test tree also included the accompanying fix for the stale ret == 1
    check in __cgroup_bpf_run_filter_sysctl(). The reproducer confines
    failslab injections to the proc_sys_call_handler() range, uses
    stacktrace-depth=32, and injects fail-nth=1 while writing 8191 bytes to
    /proc/sys/kernel/domainname from a task in the target cgroup. Under
    that setup, fail-nth=1 triggered the fault:
    
      BUG: unable to handle page fault for address: ffffeb0200024d48
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 0 P4D 0
      Oops: Oops: 0000  SMP KASAN NOPTI
      CPU: 2 UID: 0 PID: 209 Comm: repro_proc_sys_ Not tainted 7.1.0-rc4-00686-g97625979a5d4  PREEMPT(lazy)
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:kfree+0x6e/0x510
      ...
      Call Trace:
       <TASK>
       ? __cgroup_bpf_run_filter_sysctl+0x626/0xc30
       __cgroup_bpf_run_filter_sysctl+0x74d/0xc30
       ? __pfx___cgroup_bpf_run_filter_sysctl+0x10/0x10
       ? srso_return_thunk+0x5/0x5f
       ? __kvmalloc_node_noprof+0x345/0x870
       ? proc_sys_call_handler+0x250/0x480
       ? srso_return_thunk+0x5/0x5f
       proc_sys_call_handler+0x3a2/0x480
       ? __pfx_proc_sys_call_handler+0x10/0x10
       ? srso_return_thunk+0x5/0x5f
       ? selinux_file_permission+0x39f/0x500
       ? srso_return_thunk+0x5/0x5f
       ? lock_is_held_type+0x9e/0x120
       vfs_write+0x98e/0x1000
       ...
       </TASK>
    
    With this fix applied on top of the same test setup, rerunning the
    reproducer with fail-nth=1 yields no corresponding Oops reports.
    
    Fixes: 4508943794ef ("proc: use kvzalloc for our kernel buffer")
    Cc: stable@vger.kernel.org
    
    Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
    Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
    Acked-by: Yonghong Song <yonghong.song@linux.dev>
    Signed-off-by: Zilin Guan <zilin@seu.edu.cn>
    Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn>
    Link: https://lore.kernel.org/r/20260603105317.944304-3-dawei.feng@seu.edu.cn
    Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: nx - fix nx_crypto_ctx_exit argument [+ + +]

Author: Sam James <sam@gentoo.org>
Date:   Mon May 25 08:56:19 2026 +0100

    crypto: nx - fix nx_crypto_ctx_exit argument
    
    commit 4e67f504ee9ded15e256b64f4fde150e917381d7 upstream.
    
    nx_crypto_ctx_shash_exit() calls nx_crypto_ctx_exit() with
    crypto_shash_ctx(...), but crypto_shash_ctx() returns an nx_crypto_ctx *,
    not a crypto_tfm *.
    
    Fix the type in nx_crypto_ctx_exit and drop the bogus crypto_tfm_ctx
    call.
    
    This fixes the following oops:
    
      BUG: Unable to handle kernel data access at 0xc0403effffffffc8
      Faulting instruction address: 0xc000000000396cb4
      Oops: Kernel access of bad area, sig: 11 [#15]
      Call Trace:
       nx_crypto_ctx_shash_exit+0x24/0x60
       crypto_shash_exit_tfm+0x28/0x40
       crypto_destroy_tfm+0x98/0x140
       crypto_exit_ahash_using_shash+0x20/0x40
       crypto_destroy_tfm+0x98/0x140
       hash_release+0x1c/0x30
       alg_sock_destruct+0x38/0x60
       __sk_destruct+0x48/0x2b0
       af_alg_release+0x58/0xb0
       __sock_release+0x68/0x150
       sock_close+0x20/0x40
       __fput+0x110/0x3a0
       sys_close+0x48/0xa0
       system_call_exception+0x140/0x2d0
       system_call_common+0xf4/0x258
    
    .. which came from hardlink(1) opportunistically using AF_ALG.
    
    The same problem exists with nx_crypto_ctx_skcipher_exit getting a context
    it wasn't expecting, but apparently nobody hit that for years.
    
    Cc: Eric Biggers <ebiggers@kernel.org>
    Cc: stable@vger.kernel.org
    Fixes: bfd9efddf990 ("crypto: nx - convert AES-ECB to skcipher API")
    Fixes: 9420e628e7d8 ("crypto: nx - Use API partial block handling")
    Acked-by: Breno Leitao <leitao@debian.org>
    Reviewed-by: Eric Biggers <ebiggers@kernel.org>
    Reported-by: Calvin Buckley <calvin@cmpct.info>
    Tested-by: Calvin Buckley <calvin@cmpct.info>
    Suggested-by: Brad Spengler <brad.spengler@opensrcsec.com>
    Signed-off-by: Sam James <sam@gentoo.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

err.h: use __always_inline on all error pointer helpers [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue May 26 12:18:41 2026 +0200

    err.h: use __always_inline on all error pointer helpers
    
    commit 94bfc7f3b0c7c33331ba4ff6cc64ff309dfcbce8 upstream.
    
    While testing randconfig builds on s390, I came across a link failure with
    CONFIG_DMA_SHARED_BUFFER disabled:
    
    ERROR: modpost: "dma_buf_put" [drivers/iommu/iommufd/iommufd.ko] undefined!
    
    The problem here is that IS_ERR() is not inlined and dead code elimination
    fails as a consequence.
    
    The err.h helpers all turn into a trivial assignment of a bit mask and
    should never result in a function call, so force them to always be inline.
    This should generally result in better object code aside from avoiding
    the link failure above.
    
    Link: https://lore.kernel.org/20260526101851.2495110-1-arnd@kernel.org
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Nathan Chancellor <nathan@kernel.org>
    Tested-by: Tamir Duberstein <tamird@kernel.org>
    Cc: Alexander Gordeev <agordeev@linux.ibm.com>
    Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com>
    Cc: Ansuel Smith <ansuelsmth@gmail.com>
    Cc: Bjorn Andersson <andersson@kernel.org>
    Cc: Heiko Carstens <hca@linux.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
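
For readers unfamiliar with the error pointer helpers, here is a user-space approximation of what "a trivial assignment of a bit mask" means in the commit above. The lowercase names and the sketch_always_inline macro are assumptions for this example, and the attribute syntax is GCC/Clang specific; the kernel's own err.h definitions differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Force inlining so no out-of-line copy of a helper is ever emitted. */
#define sketch_always_inline inline __attribute__((__always_inline__))

/* Encode a negative errno value in a pointer. */
static sketch_always_inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value stored in an error pointer. */
static sketch_always_inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top SKETCH_MAX_ERRNO addresses. */
static sketch_always_inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}
```

When such a helper is guaranteed to be inlined, the compiler can evaluate a check on a stub that always returns an error pointer or NULL at compile time and drop the dead branch, including calls to symbols that are not built.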

exfat: fix potential use-after-free in exfat_find_dir_entry() [+ + +]

Author: Michael Bommarito <michael.bommarito@gmail.com>
Date:   Wed Apr 22 11:58:44 2026 -0400

    exfat: fix potential use-after-free in exfat_find_dir_entry()
    
    commit 3f5f8ee9917cc2b9076ac533492d8a200edcabb8 upstream.
    
    In exfat_find_dir_entry(), the buffer_head obtained from
    exfat_get_dentry() is released with brelse(bh) before the fall-through
    TYPE_EXTEND branch reads the directory entry through ep (which points
    into bh->b_data):
    
            brelse(bh);
            if (entry_type == TYPE_EXTEND) {
                    ...
                    len = exfat_extract_uni_name(ep, entry_uniname);
                    ...
            }
    
    After brelse() drops our reference, nothing guarantees that the
    underlying page backing bh->b_data remains valid for the subsequent
    exfat_extract_uni_name() read. This is the same pattern fixed in
    commit fc961522ddbd ("exfat: Fix potential use after free in
    exfat_load_upcase_table()").
    
    Move brelse(bh) so it runs after ep is no longer dereferenced on
    each branch.
    
    Confirmed on QEMU x86_64 with CONFIG_KASAN=y + CONFIG_DEBUG_PAGEALLOC=y
    + CONFIG_PAGE_POISONING=y on linux-next, using a crafted exFAT image
    (long filename with same-hash collisions forcing the TYPE_EXTEND path).
    With a debug-only invalidate_bdev() inserted between brelse(bh) and
    the ep read to make the stale-deref window deterministic, the
    unpatched kernel faults:
    
      BUG: KASAN: use-after-free in exfat_find_dir_entry+0x133b/0x15a0
      BUG: unable to handle page fault for address: ffff88801a5fa0c2
      Oops: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN NOPTI
      RIP: 0010:exfat_find_dir_entry+0x1188/0x15a0
    
    With this patch applied, the same instrumented harness completes
    cleanly under the same sanitizer stack. I have not reproduced a
    crash on an uninstrumented kernel under ordinary reclaim; the
    instrumented A/B establishes the lifetime violation and that the
    patch closes it, not an unaided triggerability claim.
    
    Fixes: ca06197382bd ("exfat: add directory operations")
    Cc: stable@vger.kernel.org
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
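
The ordering rule behind the exfat fix above (finish every access through ep before dropping the buffer reference) can be shown with a toy reference-counted buffer. All names below are hypothetical; sketch_buf_put() only mimics the role of brelse() and is not the kernel API.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted buffer standing in for a buffer_head. */
struct sketch_buf {
    int refcount;
    char *data;
};

static struct sketch_buf *sketch_buf_get(const char *contents)
{
    size_t n = strlen(contents) + 1;
    struct sketch_buf *bh = malloc(sizeof(*bh));

    if (!bh)
        return NULL;
    bh->data = malloc(n);
    if (!bh->data) {
        free(bh);
        return NULL;
    }
    memcpy(bh->data, contents, n);
    bh->refcount = 1;
    return bh;
}

/* Drop one reference; the data is freed when the last one goes away. */
static void sketch_buf_put(struct sketch_buf *bh)
{
    if (--bh->refcount == 0) {
        free(bh->data);
        free(bh);
    }
}

/* Copy the name out through ep first, release the buffer afterwards. */
static size_t extract_name(const char *contents, char *out, size_t out_len)
{
    struct sketch_buf *bh;
    const char *ep;
    size_t len;

    if (out_len == 0)
        return 0;
    bh = sketch_buf_get(contents);
    if (!bh)
        return 0;

    ep = bh->data; /* ep points into the buffer's data */
    len = strlen(ep);
    if (len >= out_len)
        len = out_len - 1;
    memcpy(out, ep, len);
    out[len] = '\0';

    sketch_buf_put(bh); /* release only after the last use of ep */
    return len;
}
```

Moving the sketch_buf_put() call above the memcpy() would recreate the bug pattern: ep would then point into memory that may already have been freed.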

f2fs: atomic: fix UAF issue on f2fs_inode_info.atomic_inode [+ + +]

Author: Chao Yu <chao@kernel.org>
Date:   Thu May 21 10:15:05 2026 +0800

    f2fs: atomic: fix UAF issue on f2fs_inode_info.atomic_inode
    
    commit e0288584baa5dc41df4a829a023c4c1b33fe53d7 upstream.
    
    - ioctl(F2FS_IOC_GARBAGE_COLLECT_RANGE)         - shrink
     - f2fs_gc
      - gc_data_segment
       - ra_data_block(cow_inode)
        - mapping = F2FS_I(inode)->atomic_inode->i_mapping
        : f2fs_is_cow_file(cow_inode) is true
                                                     - f2fs_evict_inode(atomic_inode)
                                                      - clear_inode_flag(fi->cow_inode, FI_COW_FILE)
                                                      - F2FS_I(fi->cow_inode)->atomic_inode = NULL
                                                      ...
                                                      - truncate_inode_pages_final(atomic_inode)
        - f2fs_grab_cache_folio(mapping)
        : create folio in atomic_inode->mapping
                                                      - clear_inode(atomic_inode)
                                                       - BUG_ON(atomic_inode->i_data.nrpages)
    
    Take a reference on fi->atomic_inode before using its mapping during
    garbage collection; otherwise the race above leads to a use-after-free.
    
    Cc: stable@kernel.org
    Cc: Daeho Jeong <daehojeong@google.com>
    Cc: Sunmin Jeong <s_min.jeong@samsung.com>
    Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
    Fixes: f18d00769336 ("f2fs: use meta inode for GC of COW file")
    Signed-off-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: bound i_inline_xattr_size for non-inline-xattr inodes [+ + +]

Author: Bryam Vargas <hexlabsecurity@proton.me>
Date:   Thu Jun 11 23:00:36 2026 -0500

    f2fs: bound i_inline_xattr_size for non-inline-xattr inodes
    
    commit 378acf3cf19b6af6cba55e8dd1154c4e1504bae8 upstream.
    
    When the flexible_inline_xattr feature is enabled, do_read_inode() loads
    the on-disk i_inline_xattr_size unconditionally:
    
            if (f2fs_sb_has_flexible_inline_xattr(sbi))
                    fi->i_inline_xattr_size = le16_to_cpu(ri->i_inline_xattr_size);
    
    but sanity_check_inode() only range-checks it when the inode also has the
    FI_INLINE_XATTR flag set.  An inode that carries an inline dentry or inline
    data but not FI_INLINE_XATTR -- the normal layout for an inline
    directory -- therefore keeps a fully attacker-controlled
    i_inline_xattr_size from a crafted image.
    
    get_inline_xattr_addrs() returns that value with no flag gating, so it
    feeds the inode geometry:
    
            MAX_INLINE_DATA()  = 4 * (CUR_ADDRS_PER_INODE - i_inline_xattr_size - 1)
            NR_INLINE_DENTRY() = MAX_INLINE_DATA() * BITS_PER_BYTE / (...)
            addrs_per_page()   = CUR_ADDRS_PER_INODE - i_inline_xattr_size
    
    A large i_inline_xattr_size drives MAX_INLINE_DATA() and NR_INLINE_DENTRY()
    negative, so make_dentry_ptr_inline() sets d->max (int) to a negative
    value.  The inline directory walk then compares an unsigned long bit_pos
    against that negative d->max, which is promoted to a huge unsigned bound,
    and reads far past the inline area:
    
            while (bit_pos < d->max)                /* fs/f2fs/dir.c */
                    ... test_bit_le(bit_pos, d->bitmap) / d->dentry[bit_pos] ...
    
    Mounting a crafted image and reading such a directory triggers an
    out-of-bounds read in f2fs_fill_dentries(); the same underflow also
    corrupts ADDRS_PER_INODE for regular files.
    
    Validate i_inline_xattr_size against MAX_INLINE_XATTR_SIZE whenever the
    flexible_inline_xattr feature is enabled -- i.e. whenever the value is
    loaded from disk and consumed -- and keep the lower MIN_INLINE_XATTR_SIZE
    bound gated on inodes that actually carry an inline xattr, so legitimate
    inodes with i_inline_xattr_size == 0 are still accepted.
    
    Cc: stable@vger.kernel.org
    Fixes: 6afc662e68b5 ("f2fs: support flexible inline xattr size")
    Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
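
Two points from the commit above lend themselves to a short user-space sketch: how a negative int bound turns into a huge unsigned bound in the walk condition, and how the validation is gated. The SKETCH_* limits are placeholder values chosen for the example, not the real MIN/MAX_INLINE_XATTR_SIZE, and both function names are hypothetical.

```c
#include <stdbool.h>

/* Placeholder limits for illustration only. */
#define SKETCH_MIN_INLINE_XATTR_SIZE 6u
#define SKETCH_MAX_INLINE_XATTR_SIZE 900u

/*
 * Mirrors "while (bit_pos < d->max)": the int bound is converted to
 * unsigned long, so a negative max becomes a very large value.
 */
static bool walk_condition_holds(unsigned long bit_pos, int max)
{
    return bit_pos < (unsigned long)max;
}

/*
 * Upper bound applies whenever the feature is enabled (the value is loaded
 * from disk); lower bound applies only to inodes with an inline xattr.
 */
static bool inline_xattr_size_valid(bool flexible_inline_xattr,
                                    bool has_inline_xattr,
                                    unsigned int size)
{
    if (!flexible_inline_xattr)
        return true;
    if (size > SKETCH_MAX_INLINE_XATTR_SIZE)
        return false;
    if (has_inline_xattr && size < SKETCH_MIN_INLINE_XATTR_SIZE)
        return false;
    return true;
}
```

Under these assumptions an inline directory with size 0 is still accepted, while an oversized value is rejected even though the inode has no inline xattr.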

f2fs: fix incorrect FI_NO_EXTENT handling in __destroy_extent_node() [+ + +]

Author: Yongpeng Yang <yangyongpeng@xiaomi.com>
Date:   Mon Apr 27 21:10:51 2026 +0800

    f2fs: fix incorrect FI_NO_EXTENT handling in __destroy_extent_node()
    
    commit 1f70ddb28a3c71df124da5fa4040c808116d6bb9 upstream.
    
    When __destroy_extent_node() sets the inode flag FI_NO_EXTENT, it does
    not reset the length of the largest extent to 0 and update the inode
    folio. Since modifications to the extent tree are disallowed afterward,
    the cached largest extent may become stale. This can trigger the
    following error in xfstests generic/388:
    
    F2FS-fs (dm-0): sanity_check_extent_cache: inode (ino=1761) extent info [220057, 57, 6] is incorrect, run fsck to fix
    
    In the f2fs_drop_inode path, __destroy_extent_node() does not need to
    guarantee that et->node_cnt is 0, because concurrency with writeback
    is expected in this path, and writeback may update the extent cache.
    
    This patch reverts commit ed78aeebef05 ("f2fs: fix node_cnt race between
    extent node destroy and writeback"), and removes the unnecessary zero
    check of et->node_cnt.
    
    Fixes: ed78aeebef05 ("f2fs: fix node_cnt race between extent node destroy and writeback")
    Cc: stable@vger.kernel.org
    Reported-by: Chao Yu <chao@kernel.org>
    Suggested-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: fix missing read bio submission on large folio error [+ + +]

Author: Wenjie Qi <qiwenjie@xiaomi.com>
Date:   Wed May 20 17:52:04 2026 +0800

    f2fs: fix missing read bio submission on large folio error
    
    commit 74c8d2ec95c59a5651ecd975c466998af1961fd4 upstream.
    
    f2fs_read_data_large_folio() can keep a read bio across multiple
    readahead folios.  If a later folio hits an error before any of its
    blocks are added to the bio, folio_in_bio is false and the current error
    path returns immediately after ending that folio.
    
    This can leave the bio accumulated for earlier folios unsubmitted.  Those
    folios then never receive read completion, and readers can wait
    indefinitely on the locked folios.
    
    Route errors through the common out path so any pending bio is submitted
    before returning.  Stop consuming more readahead folios once an error is
    seen, and only wait on and clear the current folio when it was actually
    added to the bio.
    
    Cc: stable@kernel.org
    Fixes: a5d8b9d94e18 ("f2fs: fix to unlock folio in f2fs_read_data_large_folio()")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: fix to do sanity check on f2fs_get_node_folio_ra() [+ + +]

Author: Chao Yu <chao@kernel.org>
Date:   Fri May 22 15:53:29 2026 +0800

    f2fs: fix to do sanity check on f2fs_get_node_folio_ra()
    
    commit 8712353ed80f87271d732297567dcdbe4b84e8c7 upstream.
    
    kernel BUG at fs/f2fs/file.c:845!
    Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
    CPU: 0 UID: 0 PID: 5336 Comm: syz.0.0 Not tainted syzkaller #0 PREEMPT(full)
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
    RIP: 0010:f2fs_do_truncate_blocks+0x1115/0x1140 fs/f2fs/file.c:845
    Code: fc fc 90 0f 0b e8 8b 9d 9a fd 90 0f 0b e8 83 9d 9a fd 48 89 df 48 c7 c6 60 d1 1a 8c e8 54 f1 fc fc 90 0f 0b e8 6c 9d 9a fd 90 <0f> 0b e8 64 9d 9a fd 90 0f 0b 90 e9 93 fd ff ff e8 56 9d 9a fd 90
    RSP: 0018:ffffc9000e4474c0 EFLAGS: 00010283
    RAX: ffffffff842b1d34 RBX: 0000000000000003 RCX: 0000000000100000
    RDX: ffffc9000f03a000 RSI: 0000000000035503 RDI: 0000000000035504
    RBP: ffffc9000e447608 R08: ffff8880123b0000 R09: 0000000000000002
    R10: 00000000fffffffe R11: 0000000000000002 R12: 0000000000000001
    R13: 0000000000000000 R14: 1ffff92001c88ea0 R15: 00000000ffff039c
    FS:  00007f7e02ee36c0(0000) GS:ffff88808c887000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007ff0305c4000 CR3: 0000000012d4c000 CR4: 0000000000352ef0
    Call Trace:
     <TASK>
     f2fs_truncate_blocks+0x10a/0x300 fs/f2fs/file.c:882
     f2fs_truncate+0x471/0x7c0 fs/f2fs/file.c:940
     f2fs_evict_inode+0xa3f/0x1ac0 fs/f2fs/inode.c:907
     evict+0x61e/0xb10 fs/inode.c:841
     f2fs_fill_super+0x5f43/0x78f0 fs/f2fs/super.c:5224
     get_tree_bdev_flags+0x431/0x4f0 fs/super.c:1694
     vfs_get_tree+0x92/0x2a0 fs/super.c:1754
     fc_mount fs/namespace.c:1193 [inline]
     do_new_mount_fc fs/namespace.c:3758 [inline]
     do_new_mount+0x341/0xd30 fs/namespace.c:3834
     do_mount fs/namespace.c:4167 [inline]
     __do_sys_mount fs/namespace.c:4383 [inline]
     __se_sys_mount+0x31d/0x420 fs/namespace.c:4360
     do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
     do_syscall_64+0x15f/0xf80 arch/x86/entry/syscall_64.c:94
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    
            count = ADDRS_PER_PAGE(dn.node_folio, inode);
    
            count -= dn.ofs_in_node;
            f2fs_bug_on(sbi, count < 0);
    
    The fuzz test triggers the f2fs_bug_on() above.
    
    Root cause: in the corrupted inode, there is a direct node which has the
    same ino and nid in its footer, so in f2fs_do_truncate_blocks(), after
    f2fs_get_dnode_of_data() finds such a dnode:
    1) ADDRS_PER_PAGE(dn.node_folio, inode) returns 923, and
    2) dn.ofs_in_node may point into addr[923, 1017],
    so count can become negative and f2fs_bug_on() panics the system.
    
    Introduce NODE_TYPE_NON_IXNODE to indicate that the current node must
    not be an inode or xattr node, and use it in the path below to detect
    an inconsistent node chain in the inode mapping table:
    
    - f2fs_do_truncate_blocks
     - f2fs_get_dnode_of_data
      - f2fs_get_node_folio_ra
       -  __get_node_folio
        - f2fs_sanity_check_node_footer
         - case NODE_TYPE_NON_IXNODE -> check whether it is inode|xnode
    
    Cc: stable@kernel.org
    Reported-by: syzbot+2488d8d751b27f7ce268@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/all/69fa3697.170a0220.59368.0018.GAE@google.com
    Signed-off-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: fix to round down start offset of fallocate for pin file [+ + +]

Author: Sunmin Jeong <s_min.jeong@samsung.com>
Date:   Mon Jun 22 14:28:17 2026 +0900

    f2fs: fix to round down start offset of fallocate for pin file
    
    commit 4275b59673eb60b02eec3997816c83f1f4b909c4 upstream.
    
    Currently, the length of fallocate for a pinned file is section-aligned
    to keep the allocated sections from being selected as GC victims.
    However, when the start offset of fallocate is not section-aligned, the
    allocated sections cannot be fully utilized, because
    f2fs_allocate_pinning_section() allocates a new section after every
    blks_per_sec blocks regardless of the start offset. As a result, several
    unexpected dirty segments may be created, including blocks assigned to
    the pinned file.
    
    To address this issue, round the start offset of fallocate down to a
    section boundary.
    
    The issue can be reproduced as follows:
    
    chunk=$(((2<<20)+4096)) # 2MB + 4KB
    touch test
    f2fs_io pinfile set test
    f2fs_io fallocate 0 0 $chunk test
    f2fs_io fallocate 0 $chunk $chunk test
    f2fs_io fallocate 0 $((chunk*2)) $chunk test
    f2fs_io fiemap 0 $((chunk*3)) test
    
    Fiemap: offset = 0 len = 12288
        logical addr.    physical addr.   length           flags
    0   0000000000000000 000000068c600000 0000000000400000 00001088
    1   0000000000400000 000000003d400000 0000000000001000 00001088
    2   0000000000401000 00000003eb200000 0000000000200000 00001088
    3   0000000000601000 00000005e4200000 0000000000001000 00001088
    4   0000000000602000 0000000605400000 0000000000200000 00001089
    
    Cc: stable@vger.kernel.org
    Fixes: f5a53edcf01e ("f2fs: support aligned pinned file")
    Reviewed-by: Yunji Kang <yunji0.kang@samsung.com>
    Reviewed-by: Yeongjin Gil <youngjin.gil@samsung.com>
    Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
    Signed-off-by: Sunmin Jeong <s_min.jeong@samsung.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
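
The rounding in the commit above can be checked with plain arithmetic. The helpers below are a hypothetical sketch. The 2 MiB section size used in the usage note is an assumption read off the 0x200000 extent lengths in the fiemap output, not a fixed f2fs constant.

```c
#include <stdint.h>

/* Round a byte offset down to the start of its section. */
static uint64_t round_down_to_section(uint64_t offset, uint64_t sec_bytes)
{
    return offset - (offset % sec_bytes);
}

/* Round a byte offset up to the next section boundary. */
static uint64_t round_up_to_section(uint64_t offset, uint64_t sec_bytes)
{
    return ((offset + sec_bytes - 1) / sec_bytes) * sec_bytes;
}
```

For the second fallocate call in the reproducer (offset 2 MiB + 4 KiB), rounding down under this assumption gives a start of exactly 2 MiB, so the allocation begins on a section boundary instead of 4 KiB into one.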

f2fs: keep atomic write retry from zeroing original data [+ + +]

Author: Wenjie Qi <qwjhust@gmail.com>
Date:   Wed May 27 20:06:28 2026 +0800

    f2fs: keep atomic write retry from zeroing original data
    
    commit 6d874b65aadce56ac78f76129dbcfc2599b638f8 upstream.
    
    A partial atomic write reserves a block in the COW inode before reading the
    original data page for the untouched bytes in that page.
    
    If that read fails, write_begin returns an error but leaves the COW inode
    entry as NEW_ADDR. A retry of the same partial write then finds the COW
    entry, treats it as existing COW data, and f2fs_write_begin() zeroes the
    whole folio because blkaddr is NEW_ADDR.
    
    If the retry is committed, the bytes outside the retried write range are
    committed as zeroes instead of preserving the original file contents.
    
    Only use the COW inode as the read source when it already has a real data
    block. If the COW entry is still NEW_ADDR, treat it as a reservation to
    reuse: keep reading the old data from the original inode and avoid
    reserving or accounting the same atomic block again.
    
    Cc: stable@kernel.org
    Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: pass correct iostat type for single node writes [+ + +]

Author: Wenjie Qi <qwjhust@gmail.com>
Date:   Wed May 20 20:07:05 2026 +0800

    f2fs: pass correct iostat type for single node writes
    
    commit fcb05c26c2a67953b420739b85f49386efc9b6c0 upstream.
    
    f2fs_write_single_node_folio() takes an io_type argument, but still
    passes FS_GC_NODE_IO to __write_node_folio() unconditionally.
    
    This was harmless while the helper was only used by
    f2fs_move_node_folio(), whose caller passes FS_GC_NODE_IO. However,
    commit fe9b8b30b971 ("f2fs: fix inline data not being written to disk
    in writeback path") made f2fs_inline_data_fiemap() call the helper with
    FS_NODE_IO for FIEMAP_FLAG_SYNC.
    
    Honor the caller supplied io_type so inline-data FIEMAP sync writeback is
    accounted as normal node IO instead of GC node IO, while the GC path
    continues to pass FS_GC_NODE_IO explicitly.
    
    Cc: stable@kernel.org
    Fixes: fe9b8b30b971 ("f2fs: fix inline data not being written to disk in writeback path")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: read COW data with the original inode during atomic write [+ + +]

Author: Mikhail Lobanov <m.lobanov@rosa.ru>
Date:   Mon Jun 15 14:36:13 2026 +0300

    f2fs: read COW data with the original inode during atomic write
    
    commit a41075acde0124d2f8a5f563068a5d63e8ffd57b upstream.
    
    When updating an atomic-write file, f2fs_write_begin() may read the
    previously written data back from the COW inode:
    prepare_atomic_write_begin() locates the block in the COW inode and sets
    use_cow, and the read bio is then built with the COW inode:
    
            f2fs_submit_page_read(use_cow ? F2FS_I(inode)->cow_inode : inode,
                                  ...);
    
    and f2fs_grab_read_bio() decides whether to schedule fs-layer decryption
    (STEP_DECRYPT) for the bio based on that inode via
    fscrypt_inode_uses_fs_layer_crypto().
    
    However, the folio being filled belongs to the original inode
    (folio->mapping->host == inode), and the data stored in the COW block was
    encrypted (or left as plaintext) using the original inode's context, not
    the COW inode's -- see f2fs_encrypt_one_page(), which keys off
    fio->page->mapping->host.  fscrypt_decrypt_pagecache_blocks() likewise
    operates on folio->mapping->host.
    
    The COW inode is created as a tmpfile in the parent directory and inherits
    its encryption policy from there.  With test_dummy_encryption the newly
    created COW inode gets the dummy policy and becomes encrypted, while a
    pre-existing regular file -- created before the policy applied, e.g.
    already present in the on-disk image -- stays unencrypted.  The read
    path then sets STEP_DECRYPT based on the encrypted COW inode and calls
    fscrypt_decrypt_pagecache_blocks() on a folio whose host (the unencrypted
    original inode) has a NULL ->i_crypt_info, dereferencing it:
    
      Oops: general protection fault, probably for non-canonical address ...
      KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
      RIP: 0010:fscrypt_decrypt_pagecache_blocks+0xa0/0x310
      Workqueue: f2fs_post_read_wq f2fs_post_read_work
      Call Trace:
       fscrypt_decrypt_bio+0x1eb/0x340
       f2fs_post_read_work+0xba/0x140
       process_one_work+0x91c/0x1a40
       worker_thread+0x677/0xe90
       kthread+0x2bc/0x3a0
    
    The COW inode is only needed to locate the on-disk block, and that block
    address is already resolved into @blkaddr by prepare_atomic_write_begin()
    via __find_data_block(cow_inode, ...); f2fs_submit_page_read() then reads
    from that physical @blkaddr directly, so the inode argument only selects
    the post-read crypto context, not which block is fetched.  Reading with
    @inode therefore returns the same (latest, not-yet-committed) COW data,
    while making both the fs-layer decryption decision and the inline crypto
    path use the correct (original inode's) key.
    
    With the COW inode no longer used at the read site, the use_cow flag has no
    remaining consumer; drop it from f2fs_write_begin() and
    prepare_atomic_write_begin().
    
    Fixes: 591fc34e1f98 ("f2fs: use cow inode data when updating atomic write")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mikhail Lobanov <m.lobanov@rosa.ru>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: reject setattr size changes on large folio files [+ + +]

Author: Wenjie Qi <qwjhust@gmail.com>
Date:   Wed Jun 10 22:37:35 2026 +0800

    f2fs: reject setattr size changes on large folio files
    
    commit 242d30bfc0a84b8b5de0a88821b53c9ad7fd31c4 upstream.
    
    F2FS large folios are only enabled for immutable non-compressed files.
    Writable open and writable mmap reject such mappings, but truncate(2)
    through f2fs_setattr() misses the same guard.
    
    If FS_IMMUTABLE_FL is cleared while the inode is still cached, the mapping
    can keep large-folio support and ATTR_SIZE can change i_size. Reject size
    changes in that state.
    
    Cc: stable@kernel.org
    Fixes: 05e65c14ea59 ("f2fs: support large folio for immutable non-compressed case")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: validate ACL entry sizes in f2fs_acl_from_disk() [+ + +]

Author: Zhang Cen <rollkingzzc@gmail.com>
Date:   Mon Jun 15 15:19:54 2026 +0800

    f2fs: validate ACL entry sizes in f2fs_acl_from_disk()
    
    commit c4810ada31e80cbe4011467c4f3b1e93f94134f3 upstream.
    
    f2fs_acl_count() only validates the aggregate ACL xattr length. A
    malformed ACL can still place ACL_USER or ACL_GROUP in a slot that only
    contains struct f2fs_acl_entry_short bytes, and f2fs_acl_from_disk()
    then reads entry->e_id before verifying that a full entry fits.
    
    Require a short entry before reading e_tag and e_perm, and require a
    full entry before reading e_id for ACL_USER and ACL_GROUP. Return
    -EFSCORRUPTED from these new truncated-entry checks, while keeping the
    pre-existing -EINVAL paths unchanged.
    
    Validation reproduced this kernel report:
    KASAN slab-out-of-bounds in __f2fs_get_acl+0x6fb/0x7e0
    RIP: 0033:0x7f4b835ea7aa
    The buggy address belongs to the object at ffff888114589960 which belongs
    to the cache kmalloc-8 of size 8
    The buggy address is located 0 bytes to the right of allocated 8-byte
    region [ffff888114589960, ffff888114589968)
    Read of size 4
    Call trace:
      dump_stack_lvl+0x66/0xa0 (?:?)
      print_report+0xce/0x630 (?:?)
      __f2fs_get_acl+0x6fb/0x7e0 (fs/f2fs/acl.c:169)
      srso_alias_return_thunk+0x5/0xfbef5 (?:?)
      __virt_addr_valid+0x224/0x430 (?:?)
      kasan_report+0xe0/0x110 (?:?)
      __f2fs_get_acl+0x5/0x7e0 (fs/f2fs/acl.c:169)
      __get_acl+0x281/0x380 (?:?)
      vfs_get_acl+0x10b/0x190 (?:?)
      do_get_acl+0x2a/0x410 (?:?)
      do_get_acl+0x9/0x410 (?:?)
      do_getxattr+0xe8/0x260 (?:?)
      filename_getxattr+0xd1/0x140 (?:?)
      do_getname+0x2d/0x2d0 (?:?)
      path_getxattrat+0x16c/0x200 (?:?)
      lock_release+0xc8/0x290 (?:?)
      cgroup_update_frozen+0x9d/0x320 (?:?)
      lockdep_hardirqs_on_prepare+0xea/0x1a0 (?:?)
      trace_hardirqs_on+0x1a/0x170 (?:?)
      _raw_spin_unlock_irq+0x28/0x50 (?:?)
      do_syscall_64+0x115/0x6a0 (arch/x86/entry/syscall_64.c:87)
      entry_SYSCALL_64_after_hwframe+0x77/0x7f (?:?)
    
    Cc: stable@kernel.org
    Fixes: af48b85b8cd3 ("f2fs: add xattr and acl functionalities")
    Assisted-by: Codex:gpt-5.5
    Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
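
The two-stage bounds check described in the ACL fix above (a short entry must fit before the tag is read, a full entry must fit before the id is read) might look roughly like the following userspace sketch. The struct layouts, tag values, error constants and the function name are all assumptions made for illustration; the sketch does not reproduce f2fs_acl_from_disk().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layouts: a short entry has no id, a full entry carries one. */
struct demo_acl_entry_short { uint16_t e_tag; uint16_t e_perm; };
struct demo_acl_entry { uint16_t e_tag; uint16_t e_perm; uint32_t e_id; };

/* Illustrative tag and error values, not taken from kernel headers. */
enum {
	DEMO_ACL_USER_OBJ = 0x01, DEMO_ACL_USER = 0x02,
	DEMO_ACL_GROUP_OBJ = 0x04, DEMO_ACL_GROUP = 0x08,
	DEMO_ACL_MASK = 0x10, DEMO_ACL_OTHER = 0x20
};
#define DEMO_EINVAL       (-22)
#define DEMO_EFSCORRUPTED (-117)

/* Returns the number of entries walked, or a negative demo error code. */
int demo_acl_walk(const unsigned char *buf, size_t len)
{
	const unsigned char *p = buf, *end = buf + len;
	int count = 0;

	while (p < end) {
		struct demo_acl_entry_short s;

		/* A short entry must fit before e_tag/e_perm are read. */
		if ((size_t)(end - p) < sizeof(s))
			return DEMO_EFSCORRUPTED;
		memcpy(&s, p, sizeof(s));

		switch (s.e_tag) {
		case DEMO_ACL_USER_OBJ:
		case DEMO_ACL_GROUP_OBJ:
		case DEMO_ACL_MASK:
		case DEMO_ACL_OTHER:
			p += sizeof(struct demo_acl_entry_short);
			break;
		case DEMO_ACL_USER:
		case DEMO_ACL_GROUP:
			/* A full entry must fit before e_id is read. */
			if ((size_t)(end - p) < sizeof(struct demo_acl_entry))
				return DEMO_EFSCORRUPTED;
			p += sizeof(struct demo_acl_entry);
			break;
		default:
			return DEMO_EINVAL;
		}
		count++;
	}
	return count;
}
```

A buffer whose last slot carries a USER or GROUP tag but only four bytes is rejected by the second check instead of being read past its end.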

f2fs: validate compress cache inode only when enabled [+ + +]

Author: Wenjie Qi <qwjhust@gmail.com>
Date:   Thu May 21 11:16:18 2026 +0800

    f2fs: validate compress cache inode only when enabled
    
    commit 5073c66a96a9c23c0c2533ed4ed06e42f9021208 upstream.
    
    F2FS_COMPRESS_INO() uses NM_I(sbi)->max_nid as the synthetic inode
    number for the compressed page cache inode. That inode only exists when
    the compress_cache mount option is enabled.
    
    When compress_cache is disabled, max_nid is outside the valid inode
    range. A corrupted directory entry that points to ino == max_nid should
    therefore be rejected by f2fs_check_nid_range(). However, is_meta_ino()
    currently treats F2FS_COMPRESS_INO() as a meta inode unconditionally,
    so f2fs_iget() bypasses do_read_inode() and its nid range check, and
    instantiates a fake internal inode instead.
    
    Gate the compressed cache inode case on COMPRESS_CACHE, matching
    f2fs_init_compress_inode(). With compress_cache disabled, ino ==
    max_nid now follows the normal inode path and is rejected as an
    out-of-range nid.
    
    Cc: stable@kernel.org
    Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

f2fs: validate orphan inode entry count [+ + +]

Author: Wenjie Qi <qwjhust@gmail.com>
Date:   Tue May 26 13:35:57 2026 +0800

    f2fs: validate orphan inode entry count
    
    commit 846c499a65816d13f1186e3090e825e8bb8bcb8b upstream.
    
    f2fs_recover_orphan_inodes() trusts the orphan block entry_count when
    replaying orphan inodes from the checkpoint pack. A corrupted entry_count
    larger than F2FS_ORPHANS_PER_BLOCK makes the recovery loop read past the
    ino[] array and interpret footer or following data as inode numbers.
    
    On a crafted image, mounting an unpatched kernel can drive orphan recovery
    into f2fs_bug_on() and panic the kernel. Validate entry_count before
    consuming entries so corrupted checkpoint data fails the mount with
    -EFSCORRUPTED and requests fsck instead.
    
    Set ERROR_INCONSISTENT_ORPHAN as well, so the corruption reason can be
    recorded in the superblock s_errors[] field. This gives fsck a persistent
    hint even though mount-time orphan recovery failure may leave no chance to
    persist SBI_NEED_FSCK through a checkpoint.
    
    Cc: stable@kernel.org
    Fixes: 127e670abfa7 ("f2fs: add checkpoint operations")
    Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
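
The idea of validating entry_count before consuming entries, as described in the orphan inode fix above, can be sketched as follows. The block layout, the tiny per-block capacity, the error constant and the function name are hypothetical; the real orphan block format and recovery loop are not reproduced here.

```c
#include <stdint.h>

#define DEMO_ORPHANS_PER_BLOCK 8	/* hypothetical, kept small on purpose */
#define DEMO_EFSCORRUPTED (-117)	/* illustrative error value */

/* Assumed layout: an array of inode numbers followed by a count. */
struct demo_orphan_block {
	uint32_t ino[DEMO_ORPHANS_PER_BLOCK];
	uint32_t entry_count;
};

/*
 * Copy the recorded inode numbers into out[] and return how many were
 * copied. The count comes from untrusted on-disk data, so it is checked
 * against the array capacity before the loop touches ino[].
 */
int demo_collect_orphans(const struct demo_orphan_block *blk,
			 uint32_t out[DEMO_ORPHANS_PER_BLOCK])
{
	uint32_t i;

	if (blk->entry_count > DEMO_ORPHANS_PER_BLOCK)
		return DEMO_EFSCORRUPTED;

	for (i = 0; i < blk->entry_count; i++)
		out[i] = blk->ino[i];

	return (int)blk->entry_count;
}
```

Without the up-front check, a count of 9 in this sketch would make the loop read one element past ino[], which mirrors the out-of-bounds read the commit message describes.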

fbdev: fbcon: fix out-of-bounds read in err_out of fbcon_do_set_font() [+ + +]

Author: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Date:   Fri Jun 26 00:03:06 2026 +0800

    fbdev: fbcon: fix out-of-bounds read in err_out of fbcon_do_set_font()
    
    commit 8fdc8c2057eea08d40ce2c8eed41ff9e451c65c2 upstream.
    
    When fbcon_do_set_font() fails (e.g., due to a memory allocation failure
    inside vc_resize() under heavy memory pressure), it jumps to the `err_out`
    label to roll back the console state. However, the current rollback logic
    forgets to restore the `hi_font` state, leading to a severe state machine
    corruption.
    
    Earlier in the function, `set_vc_hi_font()` might be called to change
    `vc->vc_hi_font_mask` and mutate the screen buffer. If `vc_resize()`
    subsequently fails, the `err_out` path restores `vc_font.charcount`
    but entirely skips rolling back the `vc_hi_font_mask` and the screen
    buffer.
    
    This mismatch leaves the terminal in a desynchronized state. Because
    `vc_hi_font_mask` remains set, the VT subsystem will still accept
    character indices greater than 255 from userspace and write them to the
    screen buffer. Subsequent rendering calls (e.g., `fbcon_putcs()`) will
    then use these inflated indices to access the reverted, 256-character
    font array, leading to a deterministic out-of-bounds read and potential
    kernel memory disclosure.
    
    Fix this by adding the missing rollback logic for the `hi_font` mask
    and screen buffer in the error path.
    
    Fixes: a5a923038d70 ("fbdev: fbcon: Properly revert changes when vc_resize() failed")
    Cc: stable@vger.kernel.org
    Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
    Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
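
The rollback problem in the fbcon fix above comes down to restoring every piece of state that was changed before the failing step. A minimal model, assuming a hypothetical two-field console state and a simulated resize failure, could look like this; it leaves out the screen buffer conversion that the real error path also has to undo.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced console state. */
struct demo_vc {
	int charcount;		/* glyphs in the active font */
	int hi_font_mask;	/* non-zero only for fonts above 256 glyphs */
};

/*
 * Switch to a font with new_charcount glyphs. resize_fails simulates a
 * failing resize step. On failure the whole snapshot is restored, so the
 * mask can never stay set while charcount is back at its old value.
 */
int demo_set_font(struct demo_vc *vc, int new_charcount, bool resize_fails)
{
	struct demo_vc old = *vc;	/* snapshot of everything changed below */

	vc->charcount = new_charcount;
	vc->hi_font_mask = new_charcount > 256 ? 0x100 : 0;

	if (resize_fails) {
		*vc = old;		/* restore charcount and the mask together */
		return -ENOMEM;
	}
	return 0;
}
```

Restoring only charcount here would reproduce the mismatch from the commit message: a 256-glyph font combined with a mask that still admits indices above 255.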

fbdev: Fix fb_new_modelist to prevent null-ptr-deref in fb_videomode_to_var [+ + +]

Author: Ian Bridges <icb@fastmail.org>
Date:   Wed Jun 24 23:13:12 2026 -0500

    fbdev: Fix fb_new_modelist to prevent null-ptr-deref in fb_videomode_to_var
    
    commit 7f08fc10fa3d3366dc3af723970bd03d7d6d10e3 upstream.
    
    info->var, a framebuffer's current mode, is expected to have a matching
    entry in info->modelist. var_to_display() relies on this and treats a
    failed fb_match_mode() as "This should not happen". fb_set_var() keeps it
    true by adding the mode to the list on every change, and
    do_register_framebuffer() does the same at registration.
    
    store_modes() replaces the modelist from userspace. fb_new_modelist()
    validates the new modes but does not check that info->var still has a
    match. It relies on fbcon_new_modelist() to re-point consoles, but that
    only handles consoles mapped to the framebuffer. With fbcon unbound there
    are none, so info->var is left describing a mode that is no longer in the
    list.
    
    A later console takeover runs var_to_display(), where fb_match_mode()
    returns NULL and leaves fb_display[i].mode NULL. fbcon_switch() passes it
    to display_to_var(), and fb_videomode_to_var() dereferences the NULL mode.
    
    Keep the current mode in the list in fb_new_modelist(), the same way
    fb_set_var() does.
    
    Cc: stable@vger.kernel.org
    Assisted-by: Claude:claude-opus-4-8
    Signed-off-by: Ian Bridges <icb@fastmail.org>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fbdev: fix use-after-free in store_modes() [+ + +]

Author: Ian Bridges <icb@fastmail.org>
Date:   Thu Jun 25 23:50:48 2026 -0500

    fbdev: fix use-after-free in store_modes()
    
    commit 2c1c805c65fb7dc7524e20376d6987721e73a0b1 upstream.
    
    store_modes() replaces a framebuffer's modelist with modes from userspace.
    On success it frees the old modelist with fb_destroy_modelist(). Two
    fields still point into that freed list.
    
    One pointer is fb_display[i].mode, the mode a console is using.
    fbcon_new_modelist() moves these pointers to the new list. It only does so
    for consoles still mapped to the framebuffer. An unmapped console is
    skipped and keeps its stale pointer. Unbinding fbcon, for example, sets
    con2fb_map[i] to -1 but leaves fb_display[i].mode set. An
    FBIOPUT_VSCREENINFO ioctl with FB_ACTIVATE_INV_MODE later reaches
    fbcon_mode_deleted(). That function reads the stale fb_display[i].mode
    through fb_mode_is_equal(). The read is a use-after-free.
    
    The other pointer is fb_info->mode, the current mode. It is set through
    the mode sysfs attribute. store_modes() does not update fb_info->mode, so
    it is left pointing into the freed list. show_mode(), the attribute's read
    handler, dereferences the stale fb_info->mode through mode_string(). The
    read is a use-after-free.
    
    Clear both pointers before freeing the list. Commit a1f305893074 ("fbcon:
    Set fb_display[i]->mode to NULL when the mode is released") added the
    helper fbcon_delete_modelist(). It clears every fb_display[i].mode that
    points into a given list. So far it is called only from the unregister
    path. Call it from store_modes() too, and set fb_info->mode to NULL.
    
    Reported-by: syzbot+81c7c6b52649fd07299d@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=81c7c6b52649fd07299d
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/all/ajjoDhAi2y4ArSlz@dev/
    Assisted-by: Claude:claude-opus-4-8
    Signed-off-by: Ian Bridges <icb@fastmail.org>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fbdev: modedb: fix a possible UAF in fb_find_mode() [+ + +]

Author: Tuo Li <islituo@gmail.com>
Date:   Wed Jun 10 10:50:14 2026 +0800

    fbdev: modedb: fix a possible UAF in fb_find_mode()
    
    commit 85b6256469cebdac395e7447147e06b2e151014f upstream.
    
    If mode_option is NULL, it is assigned from mode_option_buf:
    
      if (!mode_option) {
        fb_get_options(NULL, &mode_option_buf);
        mode_option = mode_option_buf;
      }
    
    Later, name is assigned from mode_option:
    
      const char *name = mode_option;
    
    However, mode_option_buf is freed while name is still in use:
    
      kfree(mode_option_buf);
    
    while name is still accessed by:
    
      if ((name_matches(db[i], name, namelen) ||
    
    Since name aliases mode_option_buf, this may result in a
    use-after-free.
    
    Fix this by extending the lifetime of mode_option_buf until the end of the
    function by using scope-based resource management for cleanup.
    
    Signed-off-by: Tuo Li <islituo@gmail.com>
    Cc: stable@vger.kernel.org # v6.5+
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
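
The lifetime fix in the fb_find_mode() entry above relies on scope-based cleanup. A rough userspace analogue uses the GCC/Clang cleanup attribute, a compiler extension, so the buffer is released only when the function returns. Every name below is hypothetical, and the fallback argument merely stands in for whatever fb_get_options() would supply.

```c
#include <stdlib.h>
#include <string.h>

/* Cleanup handler for the attribute below; free(NULL) is a no-op. */
static void demo_free_charp(char **p)
{
	free(*p);
}

/* Return the index of the first db entry equal to the option, or -1. */
int demo_find_mode(const char *mode_option, const char *fallback,
		   const char *const *db, int dbsize)
{
	/* Freed automatically on every return path of this function. */
	__attribute__((cleanup(demo_free_charp))) char *mode_option_buf = NULL;
	const char *name;
	int i;

	if (!mode_option) {
		mode_option_buf = malloc(strlen(fallback) + 1);
		if (!mode_option_buf)
			return -1;
		strcpy(mode_option_buf, fallback);
		mode_option = mode_option_buf;
	}

	name = mode_option;	/* may alias mode_option_buf */

	for (i = 0; i < dbsize; i++)
		if (strcmp(db[i], name) == 0)
			return i;	/* the buffer is still valid here */

	return -1;
}
```

Because the release is tied to the end of the function's scope, no early free can be placed between the assignment of name and its last use.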

fbdev: modedb: Fix misaligned fields in the 1920x1080-60 mode [+ + +]

Author: Steffen Persvold <spersvold@gmail.com>
Date:   Fri Jun 12 18:40:41 2026 +0200

    fbdev: modedb: Fix misaligned fields in the 1920x1080-60 mode
    
    commit d894c48a57d78206e4df9c90d4acfaf39394806a upstream.
    
    The 1920x1080@60 modedb entry has one too many initializers before
    its sync field: a stray "0" occupies the sync slot, which shifts the
    remaining values by one field. The entry therefore decodes as
    sync = 0, vmode = FB_SYNC_HOR_HIGH_ACT | FB_SYNC_VERT_HIGH_ACT (0x3,
    i.e. FB_VMODE_INTERLACED | FB_VMODE_DOUBLE), and flag =
    FB_VMODE_NONINTERLACED, instead of the intended sync = positive H/V,
    vmode = non-interlaced.
    
    fb_find_mode() then returns a 1920x1080 mode flagged as interlaced +
    doublescan with active-low syncs. Drivers that honour var->vmode and
    var->sync when programming display timing enable doublescan and the
    wrong sync polarity, corrupting the output.
    
    Drop the stray initializer so sync and vmode hold their intended
    values (positive H/V sync, non-interlaced), matching the adjacent
    1920x1200 entry.
    
    Fixes: c8902258b2b8 ("fbdev: modedb: Add 1920x1080 at 60 Hz video mode")
    Cc: stable@vger.kernel.org
    Signed-off-by: Steffen Persvold <spersvold@gmail.com>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
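
How one stray positional initializer shifts the following fields, as described in the 1920x1080-60 entry above, can be shown with a reduced struct. The struct below is a hypothetical four-field cut-down, not struct fb_videomode, and the DEMO_ constants only mirror the numeric values quoted in the commit message.

```c
#include <stdint.h>

#define DEMO_SYNC_HOR_HIGH_ACT   1
#define DEMO_SYNC_VERT_HIGH_ACT  2
#define DEMO_VMODE_NONINTERLACED 0
#define DEMO_VMODE_INTERLACED    1
#define DEMO_VMODE_DOUBLE        2

/* Hypothetical tail of a mode entry: only the last four fields. */
struct demo_videomode {
	uint32_t vsync_len;
	uint32_t sync;
	uint32_t vmode;
	uint32_t flag;
};

/* A stray 0 lands in the sync slot and pushes the rest along by one. */
const struct demo_videomode demo_buggy = {
	5, 0,
	DEMO_SYNC_HOR_HIGH_ACT | DEMO_SYNC_VERT_HIGH_ACT,
	DEMO_VMODE_NONINTERLACED
};

/* Without the stray value each initializer reaches its intended field. */
const struct demo_videomode demo_fixed = {
	5,
	DEMO_SYNC_HOR_HIGH_ACT | DEMO_SYNC_VERT_HIGH_ACT,
	DEMO_VMODE_NONINTERLACED
};
```

In demo_buggy the sync flags end up in vmode, where the value 3 reads as interlaced plus doublescan, while sync itself is 0. Designated initializers (.sync = ..., .vmode = ...) would make this kind of shift harder to introduce, though the sketch keeps the positional style to show the failure.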

fbdev: omap2: fix use-after-free in omapfb_mmap [+ + +]

Author: Hongling Zeng <zenghongling@kylinos.cn>
Date:   Tue Jun 2 16:54:21 2026 +0800

    fbdev: omap2: fix use-after-free in omapfb_mmap
    
    commit 7958e67375aa111522086286bba13cfc0816ce8d upstream.
    
    omapfb_mmap() has a race condition with OMAPFB_SETUP_PLANE ioctl that
    can lead to use-after-free:
    
    The fb_mmap() entry point holds mm_lock but not lock (fb_info->lock),
    while ioctl handlers like OMAPFB_SETUP_PLANE hold lock but not mm_lock.
    This allows concurrent execution.
    
    In omapfb_mmap():
    1. rg = omapfb_get_mem_region(ofbi->region);      // Get old region ref
    2. start = omapfb_get_region_paddr(ofbi);          // Read from NEW region
    3. len = fix->smem_len;                             // Read from NEW region
    4. vm_iomap_memory(vma, start, len);               // Map NEW region memory
    5. atomic_inc(&rg->map_count);                      // Increment OLD region!
    
    Concurrently, OMAPFB_SETUP_PLANE can:
    - Reassign ofbi->region = new_rg
    - Update fix->smem_len
    - OMAPFB_SETUP_MEM then checks NEW region's map_count (0!) and frees it
    
    This leaves userspace with a mapping to freed physical memory.
    
    The fix is to read all required values (start, len) from the same
    region reference (rg) that will have its map_count incremented,
    preventing the region from being freed while still mapped.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Hongling Zeng <zenghongling@kylinos.cn>
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
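
The core of the omapfb_mmap() fix above is to take one snapshot of the region pointer and derive every value from it. The following single-threaded sketch shows that shape only; the types and the function are hypothetical, and the locking and reference counting that the real driver performs are left out.

```c
/* Hypothetical stand-ins for the driver's region and framebuffer state. */
struct demo_region {
	unsigned long paddr;
	unsigned long size;
	int map_count;
};

struct demo_fb {
	struct demo_region *region;	/* may be reassigned by another path */
};

struct demo_mapping {
	unsigned long start;
	unsigned long len;
	struct demo_region *owner;	/* region whose map_count was raised */
};

struct demo_mapping demo_mmap(struct demo_fb *fb)
{
	struct demo_region *rg = fb->region;	/* the only read of fb->region */
	struct demo_mapping m;

	/* start, len and the counted region all come from the same rg. */
	m.start = rg->paddr;
	m.len = rg->size;
	m.owner = rg;
	rg->map_count++;

	return m;
}
```

If fb->region is repointed after the snapshot, the mapping still describes, and is counted against, the region it actually covers.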

fpga: region: fix use-after-free in child_regions_with_firmware() [+ + +]

Author: Wentao Liang <vulab@iscas.ac.cn>
Date:   Wed Apr 8 15:45:34 2026 +0000

    fpga: region: fix use-after-free in child_regions_with_firmware()
    
    commit 54f3c5643ec523a04b6ec0e7c19eb10f5ebebdd3 upstream.
    
    Move of_node_put(child_region) after the error print to avoid accessing
    freed memory when pr_err() references child_region.
    
    Fixes: 0fa20cdfcc1f ("fpga: fpga-region: device tree control for FPGA")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
    [ Yilun: Fix the Fixes tag ]
    Reviewed-by: Xu Yilun <yilun.xu@intel.com>
    Link: https://lore.kernel.org/r/20260408154534.404327-1-vulab@iscas.ac.cn
    Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fscrypt: Fix key setup in edge case with multiple data unit sizes [+ + +]

Author: Eric Biggers <ebiggers@kernel.org>
Date:   Thu Jun 18 11:06:51 2026 -0700

    fscrypt: Fix key setup in edge case with multiple data unit sizes
    
    commit dd015b566d505d698386103e9c80b739c7336eb8 upstream.
    
    The addition of support for customizable data unit sizes introduced an
    edge case where a file's contents can be en/decrypted with the wrong
    data unit size.  It occurs when there are multiple v2 policies that:
    
    - Have *different* data unit sizes, via the log2_data_unit_size field
    
    - Share the same master_key_identifier, contents_encryption_mode, and
      either FSCRYPT_POLICY_FLAG_DIRECT_KEY,
      FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32, or
      FSCRYPT_POLICY_FLAG_IV_INO_LBLK_64
    
    - Are being used on the same filesystem, which also must be mounted with
      the "inlinecrypt" mount option.
    
    Fortunately this edge case doesn't actually occur in practice.  I just
    found it via code review.  But it needs to be fixed regardless.
    
    The bug is caused by the data unit size not being fully considered when
    blk_crypto_keys are cached in mk_direct_keys, mk_iv_ino_lblk_32_keys,
    and mk_iv_ino_lblk_64_keys.  They're differentiated only by master key,
    encryption mode, and flag.  However, each one actually has a data unit
    size too.  Only the first data unit size that is cached is used.
    
    To fix this, start using the data unit size to differentiate the cached
    keys.  For several reasons, including avoiding increasing the size of
    struct fscrypt_master_key, just replace all three arrays with a single
    linked list instead of changing them into two-dimensional arrays.  This
    works well when considering that in practice at most 2 entries are used
    across all three arrays, so it was already mostly wasted space.
    
    For simplicity, make the list also take over the publish/subscribe of
    the prepared key itself.  That is, create separate list nodes for
    blk_crypto_keys vs crypto_skciphers, and add nodes to the list only when
    their key is actually prepared.  (Note that the legacy
    fscrypt_direct_keys table in fs/crypto/keysetup_v1.c already works this
    way.)  This eliminates the need for the additional memory barriers when
    reading and writing the fields of struct fscrypt_prepared_key.
    
    Note that I technically should have included the data unit size in the
    HKDF info string as well.  But it's too late to change that.
    
    Fixes: 5b1188847180 ("fscrypt: support crypto data unit size less than filesystem block size")
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/20260618180652.52742-1-ebiggers@kernel.org
    Signed-off-by: Eric Biggers <ebiggers@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
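
The caching change described in the fscrypt entry above, a single list whose lookup also compares the data unit size, can be sketched as a small find-or-add list. The node layout and function names are hypothetical, key preparation is omitted entirely, and nothing here reflects the actual struct fscrypt_master_key.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache node; a real node would also hold the prepared key. */
struct demo_key_node {
	struct demo_key_node *next;
	int mode;
	int policy_flag;
	int log2_data_unit_size;
};

/* Look up a node; the data unit size is part of the match. */
struct demo_key_node *demo_find_key(struct demo_key_node *head, int mode,
				    int policy_flag, int log2_dus)
{
	for (; head; head = head->next)
		if (head->mode == mode && head->policy_flag == policy_flag &&
		    head->log2_data_unit_size == log2_dus)
			return head;
	return NULL;
}

/* Return the matching node, adding one at the list head if none exists. */
struct demo_key_node *demo_get_key(struct demo_key_node **head, int mode,
				   int policy_flag, int log2_dus)
{
	struct demo_key_node *node;

	node = demo_find_key(*head, mode, policy_flag, log2_dus);
	if (node)
		return node;

	node = malloc(sizeof(*node));
	if (!node)
		return NULL;
	node->mode = mode;
	node->policy_flag = policy_flag;
	node->log2_data_unit_size = log2_dus;
	node->next = *head;
	*head = node;
	return node;
}

/* Release every node in the list. */
void demo_free_keys(struct demo_key_node *head)
{
	while (head) {
		struct demo_key_node *next = head->next;

		free(head);
		head = next;
	}
}
```

Two requests that differ only in data unit size now get distinct nodes, whereas a cache indexed only by mode and flag would hand the second request the first request's entry.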

gcov: use atomic counter updates to fix concurrent access crashes [+ + +]

Author: Konstantin Khorenko <khorenko@virtuozzo.com>
Date:   Mon May 11 12:50:52 2026 +0200

    gcov: use atomic counter updates to fix concurrent access crashes
    
    commit 56cb9b7d96b28a1173a510ab25354b6599ad3a33 upstream.
    
    GCC's GCOV instrumentation can merge global branch counters with loop
    induction variables as an optimization.  In inflate_fast(), the inner copy
    loops get transformed so that the GCOV counter value is loaded multiple
    times to compute the loop base address, start index, and end bound.  Since
    GCOV counters are global (not per-CPU), concurrent execution on different
    CPUs causes the counter to change between loads, producing inconsistent
    values and out-of-bounds memory writes.
    
    The crash manifests during IPComp (IP Payload Compression) processing when
    inflate_fast() runs concurrently on multiple CPUs:
    
      BUG: unable to handle page fault for address: ffffd0a3c0902ffa
      RIP: inflate_fast+1431
      Call Trace:
       zlib_inflate
       __deflate_decompress
       crypto_comp_decompress
       ipcomp_decompress [xfrm_ipcomp]
       ipcomp_input [xfrm_ipcomp]
       xfrm_input
    
    At the crash point, the compiler generated three loads from the same
    global GCOV counter (__gcov0.inflate_fast+216) to compute base, start, and
    end for an indexed loop.  Another CPU modified the counter between loads,
    making the values inconsistent - the write went 3.4 MB past a 65 KB
    buffer.
    
    Add -fprofile-update=prefer-atomic to CFLAGS_GCOV at the global level in
    the top-level Makefile, guarded by a try-run compile test.  The test
    compiles a minimal program with and without -fprofile-update=prefer-atomic
    using the full KBUILD_CFLAGS, then compares undefined symbols in the
    resulting object files.  If prefer-atomic introduces new undefined
    references (such as __atomic_fetch_add_8 on i386 or __aarch64_ldadd8_relax
    on arm64 with outline-atomics), the flag is not added -- the kernel does
    not link against libatomic.
    
    On architectures where GCC inlines 64-bit atomic counter updates (x86_64,
    s390, ...) the test passes and the flag is enabled, preventing the
    compiler from merging counters with loop induction variables and fixing
    the observed concurrent-access crash.
    
    On architectures where the flag would introduce libatomic dependencies, it
    is silently omitted and behaviour is no worse than before this patch.
    
    Move the CFLAGS_GCOV block from its original position (before the arch
    Makefile include) to after the core KBUILD_CFLAGS assignments but before
    the scripts/Makefile.gcc-plugins include.  This placement ensures the
    try-run test sees arch-specific flags (-m32, -march=,
    -mno-outline-atomics) while avoiding GCC plugin flags (-fplugin=) that
    would break the test on clean builds when plugin shared objects do not yet
    exist.
    
    Link: https://lore.kernel.org/20260511105052.417187-2-khorenko@virtuozzo.com
    Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
    Tested-by: Arnd Bergmann <arnd@arndb.de>
    Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com>
    Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
    Cc: Masahiro Yamada <masahiroy@kernel.org>
    Cc: Miguel Ojeda <ojeda@kernel.org>
    Cc: Mikhail Zaslonko <zaslonko@linux.ibm.com>
    Cc: Nathan Chancellor <nathan@kernel.org>
    Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
    Cc: Thomas Weißschuh <linux@weissschuh.net>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

gfs2: fix use-after-free in gfs2_qd_dealloc [+ + +]

Author: Tristan Madani <tristan@talencesecurity.com>
Date:   Fri May 1 11:02:03 2026 +0000

    gfs2: fix use-after-free in gfs2_qd_dealloc
    
    commit f9c9ec2c319f843b70ecdf939d48b52d189bc081 upstream.
    
    gfs2_qd_dealloc(), called as an RCU callback from gfs2_qd_dispose(),
    accesses the superblock object sdp through qd->qd_sbd after freeing qd.
    It does so to decrement sd_quota_count and wake up sd_kill_wait.
    
    However, by the time the RCU callback runs, gfs2_put_super() may have
    already freed sdp via free_sbd().  This can happen when
    gfs2_quota_cleanup() is called during unmount: it disposes of quota
    objects via call_rcu() and then waits on sd_kill_wait with a 60-second
    timeout.  If the timeout expires, or if gfs2_gl_hash_clear() triggers
    additional qd_put() calls that schedule more RCU callbacks after the
    wait completes, gfs2_put_super() will proceed to free the superblock
    while RCU callbacks referencing it are still pending.
    
    Add an rcu_barrier() before free_sbd() in gfs2_put_super() to ensure
    all pending RCU callbacks (including gfs2_qd_dealloc) have completed
    before the superblock is freed.
    
    Fixes: a475c5dd16e5 ("gfs2: Free quota data objects synchronously")
    Reported-by: syzbot+42a37bf8045847d8f9d2@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=42a37bf8045847d8f9d2
    Tested-by: syzbot+42a37bf8045847d8f9d2@syzkaller.appspotmail.com
    Cc: stable@vger.kernel.org
    Signed-off-by: Tristan Madani <tristan@talencesecurity.com>
    Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hdlc_ppp: sync per-proto timers before freeing hdlc state [+ + +]

Author: Fan Wu <fanwu01@zju.edu.cn>
Date:   Wed Jun 17 02:05:18 2026 +0000

    hdlc_ppp: sync per-proto timers before freeing hdlc state
    
    commit c78a4e41ab5ead6193ad8a2dd92e8906bae659fa upstream.
    
    Each PPP control protocol (LCP/IPCP/IPV6CP) embedded in struct ppp
    registers a timer via timer_setup(). That struct ppp is the
    hdlc->state allocation, which detach_hdlc_protocol() frees with kfree()
    in both teardown paths: unregister_hdlc_device() and the re-attach inside
    attach_hdlc_protocol().
    
    The ppp proto never registered a .detach callback, so
    detach_hdlc_protocol() performs no timer synchronization before the
    kfree(). The only cancel, timer_delete(&proto->timer) in ppp_cp_event(),
    is partial (it does not wait for a running callback) and only runs on the
    ->CLOSED transition; ppp_stop()/ppp_close() do not sync either. A
    ppp_timer callback already executing (blocked on ppp->lock) survives the
    kfree and then dereferences proto->state / ppp->lock in freed memory,
    leading to a use-after-free.
    
    Fix this by adding a .detach helper that calls timer_shutdown_sync() on
    every per-proto timer. detach_hdlc_protocol() invokes proto->detach(dev)
    before kfree(hdlc->state), so timer_shutdown_sync() now runs on both free
    paths. timer_shutdown_sync() is used instead of timer_delete_sync()
    because the keepalive path re-arms the timer through
    add_timer()/mod_timer() and shutdown blocks any re-activation during
    teardown.
    
    Initialize the per-protocol timers in ppp_ioctl() when the protocol is
    attached, and remove the now-redundant timer_setup() from ppp_start(), so
    that the timers are initialized exactly once at attach time and
    ppp_timer_release() never operates on uninitialized timer_list
    structures. attach_hdlc_protocol() uses kmalloc() (not kzalloc), so
    struct ppp's protos[i].timer is uninitialized garbage until the first
    timer_setup(); without this init-at-attach, attaching the PPP protocol
    without ever bringing the device up would leave timer_shutdown_sync()
    operating on uninitialized memory in .detach. Moving the init out of
    ppp_start() (which only runs on NETDEV_UP) into the attach path makes the
    initialization unconditional and avoids initializing the same timer_list
    twice.
    
    This bug was found by static analysis.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Fan Wu <fanwu01@zju.edu.cn>
    Link: https://patch.msgid.link/20260617020518.116319-1-fanwu01@zju.edu.cn
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: core: fix adapter registration race [+ + +]

Author: Johan Hovold <johan@kernel.org>
Date:   Mon May 11 16:37:12 2026 +0200

    i2c: core: fix adapter registration race
    
    commit ba14d7cf2fe7284610a29854bdff22b2537d3ce6 upstream.
    
    Adapters can be looked up based on their id using i2c_get_adapter()
    which takes a reference to the embedded struct device.
    
    Make sure that the adapter (including its struct device) has been
    initialised before adding it to the IDR to avoid accessing uninitialised
    data which could, for example, lead to NULL-pointer dereferences or
    use-after-free.
    
    Note that the i2c-dev chardev, which is registered from a bus notifier,
    currently uses i2c_get_adapter() so the adapter needs to be added to the
    IDR before registration.
    
    Fixes: 6e13e6418418 ("i2c: Add i2c_add_numbered_adapter()")
    Cc: stable@vger.kernel.org      # 2.6.22
    Signed-off-by: Johan Hovold <johan@kernel.org>
    Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipv4: account for fraggap on the paged allocation path [+ + +]

Author: Wongi Lee <qw3rtyp0@gmail.com>
Date:   Tue Jun 16 22:38:29 2026 +0900

    ipv4: account for fraggap on the paged allocation path
    
    [ Upstream commit eca856950f7cb1a221e02b99d758409f2c5cec42 ]
    
    In __ip_append_data(), when the paged-allocation branch is taken,
    alloclen and pagedlen are computed as
    
            alloclen = fragheaderlen + transhdrlen;
            pagedlen = datalen - transhdrlen;
    
    datalen already includes fraggap, but the fraggap bytes carried over
    from the previous skb are copied into the new skb's linear area at
    offset transhdrlen by the subsequent skb_copy_and_csum_bits(). The
    linear area is therefore undersized by fraggap bytes while pagedlen is
    overstated by the same amount.
    
    The non-paged branch sets alloclen to fraglen, which already accounts
    for fraggap because datalen does. Bring the paged branch in line by
    adding fraggap to alloclen and subtracting it from pagedlen.
    
    After this adjustment, copy no longer collapses to -fraggap on the
    paged path, so remove the stale comment describing that old arithmetic.
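
The accounting change described above can be sketched in plain userspace C. This is only an illustration of the arithmetic, not the kernel code: the struct and the sketch_* helper names are hypothetical, and the sample values in the note below are made up.

```c
#include <assert.h>

/* Linear and paged byte counts for one skb on the paged path. */
struct sketch_sizes {
    long alloclen;  /* bytes reserved in the linear area */
    long pagedlen;  /* bytes expected to live in page fragments */
};

/* Arithmetic quoted in the commit message (before the fix). */
static inline struct sketch_sizes sketch_paged_old(long fragheaderlen,
                                                   long transhdrlen,
                                                   long datalen)
{
    struct sketch_sizes s;

    s.alloclen = fragheaderlen + transhdrlen;
    s.pagedlen = datalen - transhdrlen;
    return s;
}

/* Adjusted arithmetic: fraggap moves from the paged to the linear count. */
static inline struct sketch_sizes sketch_paged_fixed(long fragheaderlen,
                                                     long transhdrlen,
                                                     long datalen,
                                                     long fraggap)
{
    struct sketch_sizes s;

    /* datalen already includes fraggap, so fraggap cannot exceed it. */
    assert(fraggap >= 0 && fraggap <= datalen - transhdrlen);
    s.alloclen = fragheaderlen + transhdrlen + fraggap;
    s.pagedlen = datalen - transhdrlen - fraggap;
    return s;
}
```

With an assumed 20-byte fragment header, no transport header, fraggap = 8 and datalen = 1008, the old arithmetic reserves 20 linear bytes although 28 are written there, while the adjusted arithmetic reserves 28 and leaves 1000 paged bytes. The sum of both counts is unchanged.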
    
    Fixes: 8eb77cc73977 ("ipv4: avoid partial copy for zc")
    Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
    Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/ajFR1eLAIs42TN3g@DESKTOP-19IMU7U.localdomain
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv6: account for fraggap on the paged allocation path [+ + +]

Author: Wongi Lee <qw3rtyp0@gmail.com>
Date:   Tue Jun 16 22:46:17 2026 +0900

    ipv6: account for fraggap on the paged allocation path
    
    commit 736b380e28d0480c7bc3e022f1950f31fe53a7c5 upstream.
    
    In __ip6_append_data(), when the paged-allocation branch is taken
    (MSG_MORE / NETIF_F_SG / large fraglen), alloclen and pagedlen are
    computed as
    
            alloclen = fragheaderlen + transhdrlen;
            pagedlen = datalen - transhdrlen;
    
    datalen already includes fraggap (datalen = length + fraggap). When
    fraggap is non-zero, this is not the first skb and transhdrlen is zero.
    The fraggap bytes carried over from the previous skb are copied just past
    the fragment headers in the new skb's linear area. The linear area is
    therefore undersized by fraggap bytes while pagedlen is overstated by the
    same amount, and the copy writes past skb->end into the trailing
    skb_shared_info.
    
    An unprivileged user can trigger this via a UDPv6 socket using
    MSG_MORE together with MSG_SPLICE_PAGES.
    
    The bad accounting was introduced by commit 773ba4fe9104 ("ipv6:
    avoid partial copy for zc"). Before commit ce650a166335 ("udp6: Fix
    __ip6_append_data()'s handling of MSG_SPLICE_PAGES"), the negative
    copy value caused -EINVAL to be returned. That later commit allowed
    MSG_SPLICE_PAGES to proceed in this case, making the corruption
    triggerable.
    
    The non-paged branch sets alloclen to fraglen, which already accounts
    for fraggap because datalen does. Bring the paged branch in line by
    adding fraggap to alloclen and subtracting it from pagedlen.
    
    After this adjustment, copy no longer collapses to -fraggap on the
    paged path, so remove the stale comment describing that old arithmetic.
    Since a negative copy is no longer expected for a valid MSG_SPLICE_PAGES
    case, remove the MSG_SPLICE_PAGES exception from the negative copy check.
    
    Fixes: 773ba4fe9104 ("ipv6: avoid partial copy for zc")
    Signed-off-by: Jungwoo Lee <jwlee2217@gmail.com>
    Signed-off-by: Wongi Lee <qw3rtyp0@gmail.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/ajFTqRljatR17fFy@DESKTOP-19IMU7U.localdomain
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove [+ + +]

Author: Qingshuang Fu <fuqingshuang@kylinos.cn>
Date:   Thu Jun 18 10:13:52 2026 +0800

    irqchip/imgpdc: Fix resource leak, add missing chained handler cleanup on remove
    
    commit 37738fdf2ab1e504d1c63ce5bc0aeb6452d8f057 upstream.
    
    The driver allocates domain generic chips using
    irq_alloc_domain_generic_chips() during probe and sets up chained
    handlers using irq_set_chained_handler_and_data(). However, on driver
    removal, the generic chips are not freed and the chained handlers are
    not removed.
    
    The generic chips remain on the global gc_list and may later be accessed by
    generic interrupt chip suspend, resume, or shutdown callbacks after the
    driver has been removed, potentially resulting in a use-after-free and
    kernel crash.
    
    The chained handlers that were installed in probe for peripheral and
    syswake interrupts are also left dangling, which can lead to spurious
    interrupts accessing freed memory.
    
    Fix these issues by:
    
      - Setting the IRQ_DOMAIN_FLAG_DESTROY_GC flag in domain->flags, so the
        core code automatically removes generic chips when irq_domain_remove()
        is called
    
      - Clearing all chained handlers with NULL in pdc_intc_remove()
    
    Fixes: b6ef9161e43a ("irq-imgpdc: add ImgTec PDC irqchip driver")
    Signed-off-by: Qingshuang Fu <fuqingshuang@kylinos.cn>
    Signed-off-by: Thomas Gleixner <tglx@kernel.org>
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/20260618021352.661773-1-fffsqian@163.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

kernel/fork: clear PF_BLOCK_TS in copy_process() [+ + +]

Author: Usama Arif <usama.arif@linux.dev>
Date:   Tue Jun 16 07:15:17 2026 -0700

    kernel/fork: clear PF_BLOCK_TS in copy_process()
    
    commit fd38b75c4b43295b10d69772a46d1c74dbd6fc81 upstream.
    
    PF_BLOCK_TS is only set in blk_time_get_ns() when current->plug is
    non-NULL, and blk_finish_plug() clears it via __blk_flush_plug()
    before NULLing the plug pointer.  copy_process() breaks the
    invariant by inheriting PF_BLOCK_TS from the parent while resetting
    the child's plug to NULL.
    
    Clear PF_BLOCK_TS alongside that assignment so callers can rely on
    "PF_BLOCK_TS set implies current->plug != NULL" and dereference
    current->plug unguarded.
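
The invariant can be modelled with a minimal userspace sketch. The struct, the flag value and the sketch_* helpers are placeholders chosen for illustration. They are not the kernel's task_struct, its PF_BLOCK_TS bit, or copy_process().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PF_BLOCK_TS 0x1u /* placeholder bit, not the kernel's value */

struct sketch_task {
    unsigned int flags;
    void *plug;             /* stands in for current->plug */
};

/* Invariant from the commit message: flag set implies plug != NULL. */
static inline bool sketch_invariant_holds(const struct sketch_task *t)
{
    return !(t->flags & SKETCH_PF_BLOCK_TS) || t->plug != NULL;
}

/* The child starts as a copy of the parent; plug and flag are reset together. */
static inline void sketch_copy_task(struct sketch_task *child,
                                    const struct sketch_task *parent)
{
    assert(child != NULL && parent != NULL);
    *child = *parent;
    child->plug = NULL;
    child->flags &= ~SKETCH_PF_BLOCK_TS;
}
```

Dropping the last statement of sketch_copy_task() reproduces the broken state: a child with the flag set and a NULL plug, for which sketch_invariant_holds() returns false.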
    
    Fixes: 06b23f92af87 ("block: update cached timestamp post schedule/preemption")
    Cc: stable@vger.kernel.org
    Signed-off-by: Usama Arif <usama.arif@linux.dev>
    Link: https://patch.msgid.link/20260616141604.328820-2-usama.arif@linux.dev
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KEYS: fix overflow in keyctl_pkey_params_get_2() [+ + +]

Author: Jarkko Sakkinen <jarkko@kernel.org>
Date:   Mon Jun 1 23:11:54 2026 +0300

    KEYS: fix overflow in keyctl_pkey_params_get_2()
    
    commit cb481e59ea6cae3b7796ac1d7a22b6b24c3f3c0b upstream.
    
    The length for the internal output buffer is calculated incorrectly, which
    can result in an overflow when the caller provides a buffer that is too
    small.
    
    Fix the bug by allocating the internal output buffer with the maximum
    length of the cryptographic primitive instead of the caller-provided size.
    
    Link: https://lore.kernel.org/keyrings/20260531024914.3712130-1-jarkko@kernel.org/
    Cc: stable@vger.kernel.org # v4.20+
    Fixes: 00d60fd3b932 ("KEYS: Provide keyctls to drive the new key type ops for asymmetric keys [ver #2]")
    Reported-by: Alessandro Groppo <ale.grpp@gmail.com>
    Tested-by: Alessandro Groppo <ale.grpp@gmail.com>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

keys: Pin request_key_auth payload in instantiate paths [+ + +]

Author: Shaomin Chen <eeesssooo020@gmail.com>
Date:   Wed Jun 10 13:10:05 2026 +0300

    keys: Pin request_key_auth payload in instantiate paths
    
    commit fd15b457a86939c38aa12116adabd8ff686c5e51 upstream.
    
    A: request_key()       B: KEYCTL_INSTANTIATE_IOV
    ================       =========================
    
    create auth key
    store rka in auth key
    wait for helper
                           get auth key
                           load rka from auth key
                           copy user payload
                           sleep on #PF
    
    helper completed
    detach and free rka
    destroy auth key
                           wake up
                           use rka->target_key
                           **USE-AFTER-FREE**
    
    Give request_key_auth payloads a refcount.  Take a payload reference while
    authkey->sem stabilizes the payload and revocation state.  Hold that
    reference across the instantiate and reject paths.  Drop the auth key
    owning reference from revoke and destroy.
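
The pinning pattern can be sketched as a small single-threaded userspace model. All names here are hypothetical, and a plain int counter stands in for the kernel's atomic reference counting, so this only illustrates the ownership rule, not the locking.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for the request_key_auth payload. */
struct sketch_rka {
    int usage;          /* reference count (non-atomic in this sketch) */
    int *free_count;    /* lets a caller observe when the payload is freed */
};

/* Allocate a payload holding the auth key's owning reference. */
static inline struct sketch_rka *sketch_rka_alloc(int *free_count)
{
    struct sketch_rka *rka = malloc(sizeof(*rka));

    if (rka) {
        rka->usage = 1;
        rka->free_count = free_count;
    }
    return rka;
}

/* Pin the payload before a path that may sleep. */
static inline struct sketch_rka *sketch_rka_get(struct sketch_rka *rka)
{
    assert(rka != NULL && rka->usage > 0);
    rka->usage++;
    return rka;
}

/* Drop one reference; the last put frees the payload. */
static inline void sketch_rka_put(struct sketch_rka *rka)
{
    assert(rka != NULL && rka->usage > 0);
    if (--rka->usage == 0) {
        (*rka->free_count)++;
        free(rka);
    }
}
```

In the scenario above, thread B would call sketch_rka_get() before copying the user payload. When thread A then drops the owning reference, the payload survives until B calls sketch_rka_put().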
    
    [jarkko: Replaced the first two paragraphs of text with an actual
     concurrency scenario.]
    Cc: stable@vger.kernel.org # v5.10+
    Fixes: b5f545c880a2 ("[PATCH] keys: Permit running process to instantiate keys")
    Reported-by: Shaomin Chen <eeesssooo020@gmail.com>
    Closes: https://lore.kernel.org/r/20260519144403.436694-1-eeesssooo020@gmail.com
    Signed-off-by: Shaomin Chen <eeesssooo020@gmail.com>
    Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: fix out-of-bounds read in smb_check_perm_dacl() [+ + +]

Author: Hem Parekh <hemparekh1596@gmail.com>
Date:   Tue Jun 2 16:56:46 2026 -0700

    ksmbd: fix out-of-bounds read in smb_check_perm_dacl()
    
    commit 1ef06004ed4bd6d3ed8c840d9d1a376b66d4935b upstream.
    
    The permission-check ACE walk in smb_check_perm_dacl() validates the ACE
    header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it
    never checks that ace->size is actually large enough to contain
    num_subauth sub-authorities before compare_sids() dereferences them.
    
    CIFS_SID_BASE_SIZE covers the SID header up to but excluding the
    sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header,
    so the existing guards only guarantee the 8-byte SID base, i.e. zero
    sub-authorities. compare_sids() then reads ace->sid.sub_auth[i] for
    i < min(local_sid->num_subauth, ace->sid.num_subauth). The local
    comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid()
    result) always have at least one sub-authority, and an attacker controls
    the ACE revision and authority bytes (which lie within the in-bounds SID
    base), so they can match one of those SIDs and force the sub_auth read.
    
    A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of
    the security descriptor therefore causes a heap out-of-bounds read of up
    to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd
    allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr()
    into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in
    ndr_decode_v4_ntacl()), so the read lands past the allocation. The
    malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL
    is not normalised before being written to the security.NTACL xattr) and
    the read fires on a subsequent SMB2_CREATE access check, making this
    reachable by an authenticated client on a share that uses ACL xattrs.
    
    Add the missing num_subauth-versus-ace_size check, mirroring the
    identical guards already present in the sibling parsers parse_dacl() and
    smb_inherit_dacl().
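
A minimal sketch of such a size check, in userspace C. The constants are assumptions: the 8-byte ACE header is inferred from the "size == 16" example together with the 8-byte SID base, and 15 is assumed for SID_MAX_SUB_AUTHORITIES. The helper name is hypothetical and the actual kernel check may be written differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ACE_HDR_SIZE     8u  /* assumed offsetof(struct smb_ace, sid) */
#define SKETCH_SID_BASE_SIZE    8u  /* SID header without sub_auth[] */
#define SKETCH_SID_MAX_SUB_AUTH 15u /* assumed SID_MAX_SUB_AUTHORITIES */

/* True if an ACE of ace_size bytes can hold num_subauth sub-authorities. */
static inline bool sketch_ace_covers_subauths(size_t ace_size,
                                              unsigned int num_subauth)
{
    /* The on-wire ACE size field is 16 bits wide. */
    assert(ace_size <= 0xffffu);

    if (num_subauth > SKETCH_SID_MAX_SUB_AUTH)
        return false;

    return ace_size >= SKETCH_ACE_HDR_SIZE + SKETCH_SID_BASE_SIZE +
                       sizeof(uint32_t) * num_subauth;
}
```

Under these assumptions the crafted ACE from the description (size 16, num_subauth 1) is rejected, because one sub-authority needs at least 20 bytes.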
    
    Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()")
    Cc: stable@vger.kernel.org
    Signed-off-by: Hem Parekh <hemparekh1596@gmail.com>
    Acked-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: arm64: Omit tag sync on stage-2 mappings of the zero page [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Thu Jun 4 17:11:56 2026 +0200

    KVM: arm64: Omit tag sync on stage-2 mappings of the zero page
    
    commit 2986a625740599fe6e7635b0586fed2a95bcd1f7 upstream.
    
    Commit
    
       f620d66af316 ("arm64: mte: Do not flag the zero page as PG_mte_tagged")
    
    removed the PG_mte_tagged flag from the zero page, but missed a KVM code
    path that may set this flag on the zero page when it is used in a
    stage-2 CoW mapping of anonymous memory.
    
    So disregard the zero page explicitly in sanitise_mte_tags().
    
    Fixes: f620d66af316 ("arm64: mte: Do not flag the zero page as PG_mte_tagged")
    Cc: stable@vger.kernel.org # 5.10.x
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Will Deacon <will@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned() [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Jun 12 15:52:41 2026 -0700

    KVM: Replace guest-triggerable BUG_ON() in ioeventfd datamatch with get_unaligned()
    
    commit f1edbed787ba67988ed34e0132ca128b052b6ce8 upstream.
    
    Drop a BUG_ON() that has been reachable since it was first added, way back
    in 2009, and instead use get_unaligned() to perform potentially-unaligned
    accesses.
    
    For a given store, KVM x86's emulator tracks the entire value in the
    destination operand, x86_emulate_ctxt.dst.  If the destination is memory,
    and the target splits multiple pages and/or is emulated MMIO, then KVM
    handles each fragment independently.  E.g. on a page split starting at page
    offset 0xffc, KVM writes 4 bytes to the first page, then the remaining
    bytes to the second page, using ctxt->dst as the source for both (with
    appropriate offsets).
    
    If the destination splits a page *and* hits emulated MMIO on the second
    page, then KVM will complete the write to the first page, then emulate the
    MMIO access to the second page.  If there is a datamatch-enabled ioeventfd
    at offset 0 of the second page, then KVM will process the remainder of the
    store as a potential ioeventfd signal.
    
    Putting it all together, if the guest emits a store that splits a page
    starting at page offset N, and the second page has a datamatch-enabled
    ioeventfd at offset 0, then KVM will check for datamatch using
    &dst.valptr[N] as the source.  Due to dst (and thus dst.valptr) being
    32-byte aligned, if N is not aligned to @len, the BUG_ON() fires.
    
    E.g. with a 16-byte store at page offset 0xffc, to an ioeventfd of len 8,
    all initial checks in ioeventfd_in_range() will succeed, and the BUG_ON()
    fires due to @val being 4-byte aligned, but not 8-byte aligned.
    
      ------------[ cut here ]------------
      kernel BUG at arch/x86/kvm/../../../virt/kvm/eventfd.c:783!
      Oops: invalid opcode: 0000 [#1] SMP
      CPU: 0 UID: 1000 PID: 615 Comm: repro Not tainted 7.1.0-rc2-ff238429d1ea #365 PREEMPT
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:ioeventfd_write+0x6c/0x70 [kvm]
      Call Trace:
       <TASK>
       __kvm_io_bus_write+0x85/0xb0 [kvm]
       kvm_io_bus_write+0x53/0x80 [kvm]
       vcpu_mmio_write+0x66/0xf0 [kvm]
       emulator_read_write_onepage+0x12a/0x540 [kvm]
       emulator_read_write+0x109/0x2b0 [kvm]
       x86_emulate_insn+0x4f8/0xfb0 [kvm]
       x86_emulate_instruction+0x181/0x790 [kvm]
       kvm_mmu_page_fault+0x313/0x630 [kvm]
       vmx_handle_exit+0x18a/0x590 [kvm_intel]
       kvm_arch_vcpu_ioctl_run+0xc81/0x1c90 [kvm]
       kvm_vcpu_ioctl+0x2d5/0x970 [kvm]
       __x64_sys_ioctl+0x8a/0xd0
       do_syscall_64+0xb7/0x890
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7f19c931a9bf
       </TASK>
      Modules linked in: kvm_intel kvm irqbypass
      ---[ end trace 0000000000000000 ]---
    
    In a perfect world, the fix would be to simply delete the BUG_ON(), as KVM
    x86 doesn't perform alignment checks on "normal" memory accesses at CPL0.
    Sadly, C99 ruins all the fun; while the x86 architecture plays nice,
    dereferencing an unaligned pointer directly is undefined behavior in C,
    e.g. triggers splats when running with CONFIG_UBSAN_ALIGNMENT=y.
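
The difference between a direct dereference and an unaligned-safe load can be sketched in userspace C. The helpers below are hypothetical and use memcpy(), a common portable idiom. They are not a copy of the kernel's get_unaligned() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Load a 64-bit value from a possibly unaligned address without UB. */
static inline uint64_t sketch_get_unaligned_u64(const void *p)
{
    uint64_t v;

    assert(p != NULL);
    /* memcpy() has no alignment requirement, unlike *(const uint64_t *)p. */
    memcpy(&v, p, sizeof(v));
    return v;
}

/* Compare the written value against a datamatch value. */
static inline bool sketch_datamatch_u64(const void *val, uint64_t datamatch)
{
    return sketch_get_unaligned_u64(val) == datamatch;
}
```

A pointer that is 4-byte but not 8-byte aligned, as in the 0xffc example, can be passed to sketch_datamatch_u64() safely, whereas casting it to a uint64_t pointer and dereferencing it would be undefined behavior.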
    
    Fixes: d34e6b175e61 ("KVM: add ioeventfd support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-ID: <20260612225241.678509-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: SVM: Fix page overflow in sev_dbg_crypt() for ENCRYPT path [+ + +]

Author: Ashutosh Desai <ashutoshdesai993@gmail.com>
Date:   Fri May 1 13:35:32 2026 -0700

    KVM: SVM: Fix page overflow in sev_dbg_crypt() for ENCRYPT path
    
    commit 78ee2d50185a037b3d2452a97f3dad69c3f7f389 upstream.
    
    In sev_dbg_crypt(), the per-iteration transfer length is bounded by
    the source page offset (PAGE_SIZE - s_off) but not by the destination
    page offset (PAGE_SIZE - d_off).  When d_off > s_off, the encrypt
    path (__sev_dbg_encrypt_user) performs a read-modify-write using a
    single-page intermediate buffer (dst_tpage):
    
      1. __sev_dbg_decrypt() expands the size to round_up(len + (d_off & 15), 16)
         before issuing the PSP command.  If len + (d_off & 15) > PAGE_SIZE,
         the PSP writes beyond the end of the 4096-byte dst_tpage allocation.
    
      2. The subsequent memcpy()/copy_from_user() into
         page_address(dst_tpage) + (d_off & 15) of 'len' bytes overflows
         by up to 15 bytes under the same condition.
    
    Trigger example: with s_off = 0, d_off = 1 and debug.len = PAGE_SIZE,
    the PSP is instructed to write round_up(4097, 16) = 4112 bytes to
    a 4096-byte buffer.
    
    Fix by also bounding len by (PAGE_SIZE - d_off), the same check that
    sev_send_update_data() already performs for its single-page guest
    region.
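
The length arithmetic can be checked with a small userspace sketch. The sketch_* helpers are hypothetical and assume a 4096-byte page. They model only the size computation described above, not the PSP commands or the kernel's loop.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096ul

static inline unsigned long sketch_min(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

static inline unsigned long sketch_round_up16(unsigned long x)
{
    return (x + 15ul) & ~15ul;
}

/* Per-iteration length bounded by the source page offset only (old). */
static inline unsigned long sketch_len_old(unsigned long s_off,
                                           unsigned long remaining)
{
    assert(s_off < SKETCH_PAGE_SIZE);
    return sketch_min(remaining, SKETCH_PAGE_SIZE - s_off);
}

/* Length additionally bounded by the destination page offset (fixed). */
static inline unsigned long sketch_len_fixed(unsigned long s_off,
                                             unsigned long d_off,
                                             unsigned long remaining)
{
    assert(d_off < SKETCH_PAGE_SIZE);
    return sketch_min(sketch_len_old(s_off, remaining),
                      SKETCH_PAGE_SIZE - d_off);
}

/* Size of the read-modify-write region in the intermediate page. */
static inline unsigned long sketch_rmw_size(unsigned long len,
                                            unsigned long d_off)
{
    return sketch_round_up16(len + (d_off & 15ul));
}
```

For s_off = 0, d_off = 1 and 4096 bytes remaining, the old bound gives len = 4096 and a 4112-byte region, while the fixed bound gives len = 4095 and a 4096-byte region that fits the intermediate page.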
    
     ==================================================================
     BUG: KASAN: slab-use-after-free in sev_dbg_crypt+0x993/0xd10 [kvm_amd]
     Write of size 4095 at addr ff110062293bb009 by task sev_dbg_test/228214
    
     CPU: 96 UID: 0 PID: 228214 Comm: sev_dbg_test Tainted: G     U  W           7.0.0-smp--5ce9b0c48211-dbg #156 PREEMPTLAZY
     Tainted: [U]=USER, [W]=WARN
     Hardware name: Google Astoria/astoria, BIOS 0.20250817.1-0 08/25/2025
     Call Trace:
      <TASK>
      dump_stack_lvl+0x54/0x70
      print_report+0xbc/0x260
      kasan_report+0xa2/0xd0
      kasan_check_range+0x25f/0x2c0
      __asan_memcpy+0x40/0x70
      sev_dbg_crypt+0x993/0xd10 [kvm_amd]
      sev_mem_enc_ioctl+0x33c/0x450 [kvm_amd]
      kvm_vm_ioctl+0x65d/0x6d0 [kvm]
      __se_sys_ioctl+0xb2/0x100
      do_syscall_64+0xe8/0x870
      entry_SYSCALL_64_after_hwframe+0x4b/0x53
      </TASK>
    
     The buggy address belongs to the physical page:
     page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7fe72b6a0 pfn:0x62293bb
     memcg:ff11000112827d82
     flags: 0x1400000000000000(node=1|zone=1)
     raw: 1400000000000000 0000000000000000 dead000000000122 0000000000000000
     raw: 00000007fe72b6a0 0000000000000000 00000001ffffffff ff11000112827d82
     page dumped because: kasan: bad access detected
    
     Memory state around the buggy address:
      ff110062293bbf00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ff110062293bbf80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
     >ff110062293bc000: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
                        ^
      ff110062293bc080: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      ff110062293bc100: fa fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
     ==================================================================
     Disabling lock debugging due to kernel taint
    
    Fixes: 24f41fb23a39 ("KVM: SVM: Add support for SEV DEBUG_DECRYPT command")
    Fixes: 7d1594f5d94b ("KVM: SVM: Add support for SEV DEBUG_ENCRYPT command")
    Cc: stable@vger.kernel.org
    Signed-off-by: Ashutosh Desai <ashutoshdesai993@gmail.com>
    [sean: add sample KASAN splat, Fixes, and stable@]
    Link: https://patch.msgid.link/20260501203537.2120074-2-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level [+ + +]

Author: Sean Christopherson <seanjc@google.com>
Date:   Wed Apr 29 09:34:01 2026 -0700

    KVM: x86/mmu: Ensure hugepage is in by slot before checking max mapping level
    
    commit ef057cbf825e03b63f6edf5980f96abf3c53089d upstream.
    
    When recovering hugepages in the shadow MMU, verify that the base gfn of
    the shadow page is actually contained within the target memslot, *before*
    querying the max mapping level given the shadow page's gfn.  Failure to
    pre-check the validity of the gfn can lead to an out-of-bounds access to
    the slot's lpage_info (which typically manifests as a host #PF because the
    lpage_info is vmalloc'd) if the guest creates a hugepage mapping (in its
    PTEs) that extends "below" the bounds of a memslot.
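
The pre-check amounts to a containment test on the shadow page's base gfn. A minimal userspace sketch follows, in which the struct and helper are hypothetical stand-ins for the memslot fields involved. It does not model lpage_info itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two memslot fields that define its gfn range. */
struct sketch_memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* True if gfn lies in [base_gfn, base_gfn + npages). */
static inline bool sketch_gfn_in_slot(const struct sketch_memslot *slot,
                                      uint64_t gfn)
{
    assert(slot != NULL);
    return gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages;
}
```

For a slot covering gfns 0x300 to 0x3ff, a shadow page whose base gfn is 0x200 (a guest hugepage that starts below the slot) fails the test and would be skipped before any per-slot lookup is made.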
    
    When faulting in memory for a guest, and the size of the guest mapping is
    greater than KVM's (current) max mapping, then KVM will create a "direct"
    shadow page (direct in that there are no gPTEs to shadow, and so the target
    gfn is a direct calculation given the base gfn of the shadow page).  The
    hugepage recovery flow looks for such direct shadow pages, as forcing 4KiB
    mappings when dirty logging generates the guest > host mapping size case.
    When the 4KiB restriction is lifted, then KVM can replace the shadow page
    with a hugepage.
    
    But if KVM originally used a smaller mapping than the guest because the
    range of memory covered by the guest hugepage exceeds the bounds of a
    memslot, then KVM will link a direct shadow page with a gfn that is outside
    the bounds of the memslot being used to fault in memory.  The rmap entry
    added for the leaf mapping is correct and within bounds, but the gfn of the
    leaf SPTE's parent shadow page will be out of bounds.
    
      BUG: unable to handle page fault for address: ffffc90000806ffc
      #PF: supervisor read access in kernel mode
      #PF: error_code(0x0000) - not-present page
      PGD 100000067 P4D 100000067 PUD 1002a7067 PMD 10612f067 PTE 0
      Oops: Oops: 0000 [#1] SMP
      CPU: 13 UID: 1000 PID: 757 Comm: mmu_stress_test Not tainted 7.1.0-rc1-48ce1e26eace-x86_pir_to_irr_comments-vm #341 PREEMPT
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      RIP: 0010:kvm_mmu_max_mapping_level+0x79/0x2b0 [kvm]
      Call Trace:
       <TASK>
       kvm_mmu_recover_huge_pages+0x21b/0x320 [kvm]
       kvm_set_memslot+0x1ee/0x590 [kvm]
       kvm_set_memory_region.part.0+0x3a1/0x4d0 [kvm]
       kvm_vm_ioctl+0x9bf/0x15d0 [kvm]
       __x64_sys_ioctl+0x8a/0xd0
       do_syscall_64+0xb7/0xbb0
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7f21c0f1a9bf
       </TASK>
    
    Don't bother pre-checking the bounds of the potential hugepage, i.e. don't
    check that e.g. sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level + 1) is also
    within the memslot, as the checks performed by kvm_mmu_max_mapping_level()
    are a superset of the basic bounds checks.  I.e. pre-checking the full
    range would be a dubious micro-optimization.
    
    Fixes: 9eba50f8d7fc ("KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs")
    Cc: stable@vger.kernel.org
    Cc: David Matlack <dmatlack@google.com>
    Cc: James Houghton <jthoughton@google.com>
    Cc: Alexander Bulekov <bkov@amazon.com>
    Cc: Fred Griffoul <fgriffo@amazon.co.uk>
    Cc: Alexander Graf <graf@amazon.de>
    Cc: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Filippo Sironi <sironi@amazon.de>
    Cc: Ivan Orlov <iorlov@amazon.co.uk>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: x86: Fix shadow paging use-after-free due to unexpected role [+ + +]

Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Fri Jun 26 13:22:32 2026 +0200

    KVM: x86: Fix shadow paging use-after-free due to unexpected role
    
    commit 81ccda30b4e83d8f5cc4fd50503c44e3a33abfeb upstream.
    
    Commit 0cb2af2ea66ad ("KVM: x86: Fix shadow paging use-after-free due
    to unexpected GFN") fixed a shadow paging mismatch between stored and
    computed GFNs; the bug could be triggered by changing a PDE mapping from
    outside the guest, and then deleting a memslot.  The rmap_remove()
    call would miss entries created after the PDE change because the GFN
    of the leaf SPTE does not match the GFN of the struct kvm_mmu_page.
    
    A similar hole however remains if the modified PDE points to a non-leaf
    page.  In this case the gfn can be made to match, but the role does not
    match: the original large 2MB page creates a kvm_mmu_page with direct=1,
    while the new 4KB mapping needs a kvm_mmu_page with direct=0.  However,
    kvm_mmu_get_child_sp() does not compare the role, and therefore reuses
    the page.
    
    The next step is installing a leaf (4KB) SPTE on the new path which
    records an rmap entry under the gfn resolved by the walk.  But when
    that child is zapped its parent kvm_mmu_page has direct=1 and
    kvm_mmu_page_get_gfn() computes the gfn for the 4KB page as
    sp->gfn + index instead of using sp->shadowed_translation[] (or sp->gfns[]
    in older kernels).  It therefore fails to remove the recorded entry.
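
The mismatch can be illustrated with a simplified userspace model of the gfn lookup. The struct and helper are hypothetical: the direct case uses the "sp->gfn + index" form quoted above, and a plain array stands in for the per-entry translation table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for a shadow page. */
struct sketch_sp {
    bool direct;            /* role bit: gfns are computed, not stored */
    uint64_t gfn;           /* base gfn of the shadow page */
    const uint64_t *gfns;   /* per-entry gfns, used when !direct */
};

/* Resolve the gfn for entry 'index' the way the role dictates. */
static inline uint64_t sketch_sp_get_gfn(const struct sketch_sp *sp,
                                         unsigned int index)
{
    assert(sp != NULL);
    if (sp->direct)
        return sp->gfn + index;

    assert(sp->gfns != NULL);
    return sp->gfns[index];
}
```

If an rmap entry was recorded under the gfn resolved by the guest page table walk but the reused page still has direct set, the lookup at zap time returns sp->gfn + index instead, so the removal targets the wrong gfn and the recorded entry is left behind.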
    
    When the memslot is dropped the shadow page is freed but the rmap
    entry survives, as in the scenario that was already fixed.  Code that
    later walks that gfn (dirty logging, MMU notifier invalidation, and
    so on) dereferences an sptep that lies in the freed page, causing the
    use-after-free.
    
    Fixes: 2032a93d66fa ("KVM: MMU: Don't allocate gfns page for direct mmu pages")
    Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

KVM: x86: hyper-v: Bound the bank index when querying sparse banks [+ + +]

Author: Hyunwoo Kim <imv4bel@gmail.com>
Date:   Sat Jun 6 23:44:52 2026 +0900

    KVM: x86: hyper-v: Bound the bank index when querying sparse banks
    
    commit 4721f8160f17554b003e8928bb61e6c9b2fe92a3 upstream.
    
    When checking if a VP ID is included in a sparse bank set, explicitly check
    that the ID can actually be contained in a sparse bank (the TLFS allows for
    a maximum of 64 banks of 64 vCPUs each).  When handling a paravirtual TLB
    flush for L2, the VP ID is copied verbatim from the enlightened VMCS,
    without any bounds check, i.e. isn't guaranteed to be under the limit of
    4096.
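
A bounded sparse-bank lookup can be sketched in userspace C as follows. The helper names are hypothetical, and the dense storage of valid banks (indexed by the number of valid banks below the target) is an assumption modelled on the sparse set format. This is not the kernel function itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BANKS    64u /* at most 64 banks */
#define SKETCH_VPS_PER_BANK 64u /* of 64 vCPUs each */

static inline unsigned int sketch_popcount64(uint64_t x)
{
    unsigned int n = 0;

    while (x) {
        x &= x - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* True if vp_id is selected by the sparse bank set. */
static inline bool sketch_vp_in_sparse_set(uint32_t vp_id,
                                           uint64_t valid_bank_mask,
                                           const uint64_t *sparse_banks)
{
    uint32_t bank = vp_id / SKETCH_VPS_PER_BANK;
    unsigned int idx;

    assert(sparse_banks != NULL);

    /* Bound check: IDs of 4096 and above cannot be in any bank. */
    if (bank >= SKETCH_MAX_BANKS)
        return false;

    if (!(valid_bank_mask & (1ull << bank)))
        return false;

    /* Only valid banks are stored, so count the valid banks below this one. */
    idx = sketch_popcount64(valid_bank_mask & ((1ull << bank) - 1));
    return (sparse_banks[idx] >> (vp_id % SKETCH_VPS_PER_BANK)) & 1;
}
```

Without the early return, a VP ID of 4096 or more would shift by 64 or more bits, which is undefined in C, and could index past the end of the bank array.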
    
    Failure to check the bounds of the VP ID leads to an out-of-bounds read
    when testing the sparse bank, and super strictly speaking could lead to KVM
    performing an unnecessary TLB flush for an L2 vCPU.
    
      ==================================================================
      BUG: KASAN: use-after-free in hv_is_vp_in_sparse_set+0x85/0x100 [kvm]
      Read of size 8 at addr ffff88811ba5f598 by task hyperv_evmcs/2802
    
      CPU: 12 UID: 1000 PID: 2802 Comm: hyperv_evmcs Not tainted 7.1.0-rc2 #7 PREEMPT
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
      Call Trace:
       <TASK>
       dump_stack_lvl+0x51/0x60
       print_report+0xcb/0x5d0
       kasan_report+0xb4/0xe0
       kasan_check_range+0x35/0x1b0
       hv_is_vp_in_sparse_set+0x85/0x100 [kvm]
       kvm_hv_flush_tlb+0xe9e/0x16c0 [kvm]
       kvm_hv_hypercall+0xe6b/0x1e60 [kvm]
       vmx_handle_exit+0x485/0x1b60 [kvm_intel]
       kvm_arch_vcpu_ioctl_run+0x22e3/0x5070 [kvm]
       kvm_vcpu_ioctl+0x5d0/0x10c0 [kvm]
       __x64_sys_ioctl+0x129/0x1a0
       do_syscall_64+0xb9/0xcf0
       entry_SYSCALL_64_after_hwframe+0x4b/0x53
      RIP: 0033:0x7f0e62d1a9bf
       </TASK>
    
      The buggy address belongs to the physical page:
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffffffffffffffff pfn:0x11ba5f
      flags: 0x4000000000000000(zone=1)
      raw: 4000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      raw: ffffffffffffffff 0000000000000000 00000000ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
    
      Memory state around the buggy address:
       ffff88811ba5f480: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88811ba5f500: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      >ffff88811ba5f580: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                  ^
       ffff88811ba5f600: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff88811ba5f680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      ==================================================================
      Disabling lock debugging due to kernel taint
    
    Opportunistically add a compile time assertion to ensure the maximum number
    of sparse banks exactly matches the number of possible bits in the passed
    in mask.
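    
    The bound check can be illustrated with a small userspace model.  This is
    a sketch under assumptions, not the KVM code: the names vp_in_sparse_set()
    and banks_below() and the array layout are hypothetical, and only the
    limits (64 banks of 64 VPs) come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SPARSE_BANKS 64u   /* TLFS limit quoted in the changelog */
#define VPS_PER_BANK     64u

/* The bank mask must have exactly one bit per possible bank. */
_Static_assert(MAX_SPARSE_BANKS == sizeof(uint64_t) * 8,
               "bank mask width must match the number of banks");

/* Count set bits in mask strictly below bit position nr (nr < 64). */
static unsigned int banks_below(uint64_t mask, unsigned int nr)
{
    unsigned int count = 0;

    for (unsigned int i = 0; i < nr; i++)
        count += (mask >> i) & 1u;
    return count;
}

/*
 * Hypothetical model: sparse_banks[] holds one 64-bit word per bit set in
 * valid_bank_mask, in ascending bank order.
 */
bool vp_in_sparse_set(uint32_t vp_id, uint64_t valid_bank_mask,
                      const uint64_t *sparse_banks)
{
    unsigned int bank = vp_id / VPS_PER_BANK;

    /* The bound check: IDs of 4096 and above cannot be in any bank. */
    if (bank >= MAX_SPARSE_BANKS)
        return false;
    if (!((valid_bank_mask >> bank) & 1u))
        return false;

    return (sparse_banks[banks_below(valid_bank_mask, bank)] >>
            (vp_id % VPS_PER_BANK)) & 1u;
}
```

    In this model a VP ID of 4096 or more is rejected before the mask or the
    bank array is touched, which is the property the fix establishes.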
    
    Cc: stable@vger.kernel.org
    Fixes: c58a318f6090 ("KVM: x86: hyper-v: L2 TLB flush")
    Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
    Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
    Link: https://patch.msgid.link/aiQyZIJtO-2Aj_xN@v4bel
    [sean: add KASAN splat, drop comment, add assert, massage changelog]
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux: Linux 7.1.3 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Sat Jul 4 13:45:09 2026 +0200

    Linux 7.1.3
    
    Link: https://lore.kernel.org/r/20260702155112.964534952@linuxfoundation.org
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: Brett A C Sheffield <bacs@librecast.net>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Link: https://lore.kernel.org/r/20260703072822.817328079@linuxfoundation.org
    Tested-by: Pavel Machek (CIP) <pavel@nabladev.com>
    Tested-by: Brett A C Sheffield <bacs@librecast.net>
    Tested-by: Ronald Warsow <rwarsow@gmx.de>
    Tested-by: Dileep Malepu <dileep.debian@gmail.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Takeshi Ogasawara <takeshi.ogasawara@futuring-girl.com>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Miguel Ojeda <ojeda@kernel.org>
    Tested-by: Barry K. Nathan <barryn@pobox.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

LoongArch: Report dying CPU to RCU in stop_this_cpu() [+ + +]

Author: Huacai Chen <chenhuacai@kernel.org>
Date:   Thu Jun 25 13:03:49 2026 +0800

    LoongArch: Report dying CPU to RCU in stop_this_cpu()
    
    commit f2539c56c74691e7a88af6372ba2b48c06ed2fe4 upstream.
    
    This is a port of MIPS commit 9f3f3bdc6d9dac1 ("MIPS: smp: report dying
    CPU to RCU in stop_this_cpu()"). smp_send_stop() parks all secondary
    CPUs in stop_this_cpu(). The function marks the CPU offline for the
    scheduler via set_cpu_online(false) but never informs RCU, so RCU keeps
    expecting a quiescent state from CPUs that are now spinning forever with
    interrupts disabled.
    
    As long as nothing waits for an RCU grace period after smp_send_stop()
    this is harmless, which is why it went unnoticed. However, since commit
    91840be8f710370 ("irq_work: Fix use-after-free in irq_work_single() on
    PREEMPT_RT"), irq_work_sync() calls synchronize_rcu() on architectures
    without an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt()
    returns false. Any irq_work_sync() issued in the reboot/shutdown/halt
    path after smp_send_stop() then blocks on a grace period that can never
    complete, hanging the reboot:
    
      WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
      ...
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      rcu: Offline CPU 1 blocking current GP.
      rcu: Offline CPU 2 blocking current GP.
      rcu: Offline CPU 3 blocking current GP.
    
    Reproducing this issue requires some hacks, and it was not noticed on
    LoongArch because arch_irq_work_has_interrupt() usually returns true.
    
    Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring
    the generic CPU-hotplug offline path, so RCU stops waiting on the parked
    CPUs and grace periods can still complete. LoongArch shuts down all CPUs
    here without going through the CPU-hotplug mechanism, so this report is
    not otherwise issued.
    
    Cc: <stable@vger.kernel.org>
    Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
    Reviewed-by: Guo Ren <guoren@kernel.org>
    Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mac802154: llsec: add skb_cow_data() before in-place crypto [+ + +]

Author: Doruk Tan Ozturk <doruk@0sec.ai>
Date:   Tue May 26 20:37:26 2026 +0200

    mac802154: llsec: add skb_cow_data() before in-place crypto
    
    commit 84a04eb5b210643bd67aab81ff805d32f62aa865 upstream.
    
    llsec_do_encrypt_unauth(), llsec_do_encrypt_auth(),
    llsec_do_decrypt_unauth(), and llsec_do_decrypt_auth() all perform
    in-place cryptographic transformations on skb data.  They build a
    scatterlist with sg_init_one() pointing into the skb's linear data area
    and then pass the same scatterlist as both src and dst to the crypto API
    (e.g. crypto_skcipher_encrypt/decrypt, crypto_aead_encrypt/decrypt).
    
    On the RX path, __ieee802154_rx_handle_packet() clones the received skb
    before handing it to each subscriber via ieee802154_subif_frame().  The
    cloned skb shares the same underlying data buffer via reference
    counting.  When llsec_do_decrypt() subsequently modifies this shared
    buffer in place, it corrupts data that other clones -- potentially
    belonging to other sockets or subsystems -- still reference.
    
    On the TX path, similar data sharing can occur when an skb's head has
    been cloned (skb_cloned() returns true).
    
    The fix is to call skb_cow_data() before performing any in-place crypto
    operation.  skb_cow_data() ensures that the skb's data area is not
    shared: if the skb head is cloned or the data spans multiple fragments,
    it copies the data into a private buffer that can be safely modified in
    place.  This is the same pattern used by:
    
      - ESP (net/ipv4/esp4.c, net/ipv6/esp6.c)
      - MACsec (drivers/net/macsec.c)
      - WireGuard (drivers/net/wireguard/receive.c)
      - TIPC (net/tipc/crypto.c)
    
    Without this guard, in-place crypto on shared skb data leads to:
      - Silent data corruption of other skb clones
      - Use-after-free when the crypto API scatterwalk writes through a
        page that has already been freed by another clone's kfree_skb()
      - Kernel crashes under concurrent 802.15.4 traffic with security
        enabled (KASAN/KMSAN reports slab-use-after-free)
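    
    The copy-before-write idea can be sketched in userspace as follows.  This
    is a toy model, not the skb API: struct shared_buf, struct toy_skb and
    the toy_*() helpers are hypothetical, and XOR stands in for the real
    cipher.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for shared skb data: a buffer with a reference count. */
struct shared_buf {
    int refs;
    size_t len;
    unsigned char data[16];
};

/* Toy stand-in for an skb: just a pointer to possibly shared data. */
struct toy_skb {
    struct shared_buf *buf;
};

/* Clone: a second toy_skb that references the same data. */
struct toy_skb toy_clone(struct toy_skb *skb)
{
    skb->buf->refs++;
    return (struct toy_skb){ .buf = skb->buf };
}

/* Copy-on-write: give skb a private buffer if the data is shared. */
int toy_cow_data(struct toy_skb *skb)
{
    struct shared_buf *copy;

    if (skb->buf->refs == 1)
        return 0;                       /* already private */
    copy = malloc(sizeof(*copy));
    if (!copy)
        return -1;
    memcpy(copy, skb->buf, sizeof(*copy));
    copy->refs = 1;
    skb->buf->refs--;
    skb->buf = copy;
    return 0;
}

/* In-place "crypto": XOR every byte with a key, as a placeholder. */
int toy_decrypt_in_place(struct toy_skb *skb, unsigned char key)
{
    if (toy_cow_data(skb) < 0)          /* unshare before writing */
        return -1;
    for (size_t i = 0; i < skb->buf->len; i++)
        skb->buf->data[i] ^= key;
    return 0;
}
```

    Transforming a clone in this model leaves the original's bytes untouched,
    because the clone receives a private copy before the first write.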
    
    Found by 0sec (https://0sec.ai) using automated source analysis.
    
    Fixes: 4c14a2fb5d14 ("mac802154: add llsec decryption method")
    Fixes: 03556e4d0dbb ("mac802154: add llsec encryption method")
    Cc: stable@vger.kernel.org
    Reported-by: Doruk Tan Ozturk <doruk@0sec.ai>
    Closes: https://lore.kernel.org/linux-wpan/20260525161806.96158-1-doruk@0sec.ai/
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
    Link: https://lore.kernel.org/20260526183726.56100-1-doruk@0sec.ai
    Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: DEC: Prevent initial console buffer from landing in XKPHYS [+ + +]

Author: Maciej W. Rozycki <macro@orcam.me.uk>
Date:   Wed May 6 23:42:27 2026 +0100

    MIPS: DEC: Prevent initial console buffer from landing in XKPHYS
    
    commit 7fb13fd35110ebe95eb053faf79d018f51144d85 upstream.
    
    In 64-bit configurations, calling the initial console output handler
    from a kernel thread other than the initial one places the stack in the
    XKPHYS 64-bit memory segment.  The buffer allocated on that stack, which
    is passed as the argument for the `%s' output conversion specifier of
    the firmware's printf() entry point, therefore ends up in XKPHYS as
    well.
    
    This 64-bit address will then be truncated by 32-bit firmware, resulting
    in an attempt to access the wrong memory location, which in turn will
    cause all kinds of unpredictable behaviour, such as a kernel crash:
    
      Console: colour dummy device 160x64
      Calibrating delay loop... 49.36 BogoMIPS (lpj=192512)
      pid_max: default: 32768 minimum: 301
      CPU 0 Unable to handle kernel paging request at virtual address 000000000203bd00, epc == ffffffffbfc08364, ra == ffffffffbfc08800
      Oops[#1]:
      CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc2-00254-gfb649bda6f56-dirty #121
      $ 0   : 0000000000000000 0000000000000001 0000000000000023 ffffffff80684ba0
      $ 4   : 000000000203bd00 ffffffffbfc0f3b4 ffffffffffffffff 0000000000000073
      $ 8   : 0a303d7469000000 0000000000000000 0000000000000073 ffffffffbfc0f473
      $12   : 0000000000000002 0000000000000000 ffffffff80684c1c 0000000000000000
      $16   : 0000000000000000 ffffffff80596dc9 0000000000000000 ffffffffbfc09240
      $20   : ffffffff80684c40 ffffffffbfc0f400 000000000000002d 000000000000002b
      $24   : ffffffffffffffbf 000000000203bd00
      $28   : ffffffff805f0000 ffffffff80684b58 0000000000000030 ffffffffbfc08800
      Hi    : 0000000000000000
      Lo    : 0000000000000aa8
      epc   : ffffffffbfc08364 0xffffffffbfc08364
      ra    : ffffffffbfc08800 0xffffffffbfc08800
      Status: 140120e2        KX SX UX KERNEL EXL
      Cause : 00000008 (ExcCode 02)
      BadVA : 000000000203bd00
      PrId  : 00000430 (R4000SC)
      Modules linked in:
      Process swapper (pid: 0, threadinfo=(____ptrval____), task=(____ptrval____), tls=0000000000000000)
      Stack : 0000000000000000 0000000000000000 0000000000000000 0000004d0000004d
              80684cc0806a2a40 80596dc80000004d 8061000000000000 bfc0850c80684c38
              0000000000000000 000000000203bd00 0000000000000000 0000000000000000
              0000000000000000 00000000bfc0f3b4 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000002500000000 0000000000000000 0000000000000000 802c1a7400000000
              0203bd0080596dc8 0203bd4d69000000 6c61632000000018 5f746567646e6172
              6c616320625f6d6f 5f736e5f6d6f7266 206361323778302b 303d74696e726320
              806a0a38806b0000 806a0a38806b0000 00000000806b0000 80683c58806b0000
              ...
      Call Trace:
    
      Code: a082ffff  03e00008  00601021 <80820000> 00001821  10400005  24840001  80820000  24630001
    
      ---[ end trace 0000000000000000 ]---
      Kernel panic - not syncing: Fatal exception in interrupt
    
      KN04 V2.1k    (PC: 0xa0026768, SP: 0x806848e8)
      >>
    
    In this case the pointer in $4 was truncated from 0x980000000203bd00 to
    0x000000000203bd00.
    
    This may happen when no final console driver has been enabled in the
    configuration and consequently the initial console continues being used
    late into bootstrap or with an upcoming change that will switch the zs
    driver to use a platform device, which in turn will make the console
    handover happen only after other kernel threads have already been
    started.
    
    Fix the issue by making the buffer static and initdata, and therefore
    placed in the CKSEG0 32-bit compatibility segment, observing that the
    console output handler is called with the console lock held, implying
    no need for this code to be reentrant.  Add an assertion to verify the
    buffer actually has been placed in a compatibility segment.
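    
    The truncation can be reproduced with plain integer arithmetic.  The
    sketch below is illustrative only; the function names are hypothetical,
    and it assumes the 32-bit firmware keeps the low 32 bits and treats them
    as a sign-extended address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* What 32-bit code sees: the low 32 bits, sign-extended back to 64 bits. */
uint64_t truncate_to_32bit_pointer(uint64_t addr)
{
    uint64_t low = addr & 0xffffffffull;

    if (low & 0x80000000ull)
        low |= 0xffffffff00000000ull;
    return low;
}

/* An address is safe to hand to 32-bit firmware if truncation is lossless. */
bool survives_32bit_truncation(uint64_t addr)
{
    return truncate_to_32bit_pointer(addr) == addr;
}
```

    With the XKPHYS pointer quoted above, 0x980000000203bd00, the first
    function yields 0x000000000203bd00, so the check fails.  An address such
    as 0xffffffff80684ba0 from the register dump passes, since it is
    unchanged by truncation and sign extension.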
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
    Cc: stable@vger.kernel.org # v2.6.12+
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MIPS: smp: report dying CPU to RCU in stop_this_cpu() [+ + +]

Author: Jonas Jelonek <jelonek.jonas@gmail.com>
Date:   Mon Jun 8 09:37:29 2026 +0000

    MIPS: smp: report dying CPU to RCU in stop_this_cpu()
    
    commit 9f3f3bdc6d9dac1a5a8262ee7ad0f2ff1527a7e7 upstream.
    
    smp_send_stop() parks all secondary CPUs in stop_this_cpu(). The function
    marks the CPU offline for the scheduler via set_cpu_online(false) but
    never informs RCU, so RCU keeps expecting a quiescent state from CPUs
    that are now spinning forever with interrupts disabled.
    
    As long as nothing waits for an RCU grace period after smp_send_stop()
    this is harmless, which is why it went unnoticed. Since commit
    91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
    however, irq_work_sync() calls synchronize_rcu() on architectures without
    an irq_work self-IPI, i.e. where arch_irq_work_has_interrupt() returns
    false. That is the asm-generic default used by MIPS. Any irq_work_sync()
    issued in the reboot/shutdown path after smp_send_stop() then blocks on
    a grace period that can never complete, hanging the reboot:
    
      WARNING: CPU: 0 PID: 15 at kernel/irq_work.c:144 irq_work_queue_on
      ...
      rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
      rcu: Offline CPU 1 blocking current GP.
      rcu: Offline CPU 2 blocking current GP.
      rcu: Offline CPU 3 blocking current GP.
    
    This issue was noticed on several Realtek MIPS switch SoCs (MIPS
    interAptiv) and came up during a downstream kernel bump in OpenWrt from
    6.18.33 to 6.18.34, after the backport of the patch to the 6.18 stable
    branch. The patch has also been backported all the way back to 6.1.
    
    Call rcutree_report_cpu_dead() once interrupts are disabled, mirroring the
    generic CPU-hotplug offline path, so RCU stops waiting on the parked CPUs
    and grace periods can still complete. MIPS shuts down all CPUs here
    without going through the CPU-hotplug mechanism, so this report is not
    otherwise issued. Reporting a dying CPU to RCU outside the regular hotplug
    offline path is not unprecedented: arm64 does the same in cpu_die_early().
    There it is an exception for a CPU that was coming online and is aborting
    bringup, rather than the default shutdown action as on MIPS.
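    
    Why the report matters can be shown with a deliberately simplified model
    of a grace period.  This is not how RCU is implemented; struct toy_rcu
    and the toy_*() functions are hypothetical and only capture the idea
    that a grace period waits on every CPU RCU still tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy grace-period state: one bit per CPU that RCU still waits on. */
struct toy_rcu {
    uint32_t tracked;   /* CPUs RCU believes can run read-side code */
    uint32_t pending;   /* CPUs that have not yet reported a quiescent state */
};

/* Start a grace period: every tracked CPU must report in. */
void toy_start_gp(struct toy_rcu *rcu)
{
    rcu->pending = rcu->tracked;
}

/* A running CPU passes through a quiescent state. */
void toy_report_qs(struct toy_rcu *rcu, unsigned int cpu)
{
    rcu->pending &= ~(UINT32_C(1) << cpu);
}

/* Model of telling RCU that a CPU is gone for good. */
void toy_report_cpu_dead(struct toy_rcu *rcu, unsigned int cpu)
{
    rcu->tracked &= ~(UINT32_C(1) << cpu);
    rcu->pending &= ~(UINT32_C(1) << cpu);
}

bool toy_gp_done(const struct toy_rcu *rcu)
{
    return rcu->pending == 0;
}
```

    If CPUs 1 to 3 park without the dead report, a grace period started on
    CPU 0 never completes in this model; once each parked CPU is reported
    dead, only CPU 0 has to pass through a quiescent state.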
    
    Fixes: 91840be8f710 ("irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT")
    CC: stable@vger.kernel.org
    Signed-off-by: Jonas Jelonek <jelonek.jonas@gmail.com>
    Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/tcp-ao: fix use-after-free of key in del_async path [+ + +]

Author: HanQuan <eilaimemedsnaimel@gmail.com>
Date:   Tue Jun 23 01:52:08 2026 +0000

    net/tcp-ao: fix use-after-free of key in del_async path
    
    commit 5ba9950bc9078e19b69cca1e56d1553b125c6857 upstream.
    
    In tcp_ao_delete_key(), the del_async path skips the current_key
    and rnext_key validity checks present in the synchronous path,
    assuming these pointers are always NULL on LISTEN sockets.  However,
    if a key was added with set_current=1/set_rnext=1 while the socket
    was in CLOSE state, current_key and rnext_key will be non-NULL
    after listen() transitions the socket to LISTEN.
    
    When such a key is deleted with del_async=1, hlist_del_rcu() and
    call_rcu() free the key without clearing the dangling pointers.
    After the RCU grace period, getsockopt(TCP_AO_INFO) dereferences
    current_key->sndid and rnext_key->rcvid from freed slab memory.
    
    Clear current_key and rnext_key in the del_async path when they
    reference the key being deleted.
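    
    The pattern of clearing cached pointers before freeing can be sketched
    as below.  The types and toy_delete_key() are hypothetical stand-ins,
    not the TCP-AO structures, and free() replaces the deferred RCU free
    described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct toy_key {
    int sndid;
    int rcvid;
};

/* Toy stand-in for per-socket TCP-AO state. */
struct toy_ao_info {
    struct toy_key *current_key;
    struct toy_key *rnext_key;
};

/*
 * Delete a key. Any cached pointer that references it is cleared first,
 * so nothing can dereference the key after it is freed.
 */
void toy_delete_key(struct toy_ao_info *info, struct toy_key *key)
{
    if (info->current_key == key)
        info->current_key = NULL;
    if (info->rnext_key == key)
        info->rnext_key = NULL;
    free(key);   /* the kernel defers this to an RCU callback instead */
}
```

    A reader that checks the cached pointers for NULL then cannot reach the
    freed key, while pointers to other keys are left alone.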
    
    Fixes: d6732b95b6fb ("net/tcp: Allow asynchronous delete for TCP-AO keys (MKTs)")
    Signed-off-by: HanQuan <eilaimemedsnaimel@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20260623015208.1191687-1-eilaimemedsnaimel@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink [+ + +]

Author: Maoyi Xie <maoyixie.tju@gmail.com>
Date:   Fri Jun 12 16:59:35 2026 +0800

    net: ip_gre: require CAP_NET_ADMIN in the device netns for changelink
    
    commit 8165f7ff57d9667d2bb477ef6af83ede7fed4ad7 upstream.
    
    A tunnel changelink() operates on at most two netns, dev_net(dev) and
    the tunnel link netns t->net. They differ once the device is created in
    or moved to a netns other than the one the request runs in. The rtnl
    changelink path checks CAP_NET_ADMIN only against dev_net(dev), so a
    caller privileged there but not in t->net can rewrite a tunnel that
    lives in t->net.
    
    Add rtnl_dev_link_net_capable() next to rtnl_get_net_ns_capable() in
    net/core/rtnetlink.c. It requires CAP_NET_ADMIN in the link netns and is
    skipped when the link netns is dev_net(dev), where the rtnl path already
    checked it. The other patches in this series use the same helper.
    
    Gate ipgre_changelink() and erspan_changelink() with it, at the top of
    the op before any attribute is parsed, because the parsers update live
    tunnel fields first. ipgre_netlink_parms() sets t->collect_md before
    ip_tunnel_changelink() runs.
    
    Commit 8b484efd5cb4 ("ip6: vti: Use ip6_tnl.net in
    vti6_siocdevprivate().") added the same check on the ioctl path. This
    adds it on RTM_NEWLINK.
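    
    The two-namespace check can be modelled roughly as follows.  This is a
    sketch under assumptions: struct toy_netns and both functions are
    hypothetical, and a per-namespace boolean stands in for the real
    capability lookup.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy namespace: records whether the caller holds CAP_NET_ADMIN in it. */
struct toy_netns {
    bool caller_has_net_admin;
};

/*
 * Model of the extra check: the caller also needs the capability in the
 * tunnel's link netns. When both namespaces are the same, the earlier
 * check against the device netns already covered it.
 */
bool toy_link_net_capable(const struct toy_netns *dev_net,
                          const struct toy_netns *link_net)
{
    if (link_net == dev_net)
        return true;
    return link_net->caller_has_net_admin;
}

/* changelink is allowed only if both checks pass. */
bool toy_changelink_allowed(const struct toy_netns *dev_net,
                            const struct toy_netns *link_net)
{
    return dev_net->caller_has_net_admin &&
           toy_link_net_capable(dev_net, link_net);
}
```

    A caller privileged only in the device netns is refused in this model
    when the tunnel's link netns differs, which is the case the patch closes.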
    
    Reported-by: Xiao Liang <shaw.leon@gmail.com>
    Closes: https://lore.kernel.org/netdev/CABAhCOSzP1vaThGV35_VnsRCb=87_CPjPVsTHbq905k8A+BuUg@mail.gmail.com/
    Fixes: b57708add314 ("gre: add x-netns support")
    Cc: stable@vger.kernel.org
    Signed-off-by: Maoyi Xie <maoyixie.tju@gmail.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
    Link: https://patch.msgid.link/20260612085941.3158249-2-maoyixie.tju@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: skmsg: preserve sg.copy across SG transforms [+ + +]

Author: Yiming Qian <yimingqian591@gmail.com>
Date:   Wed Jun 10 06:21:36 2026 +0000

    net: skmsg: preserve sg.copy across SG transforms
    
    commit 406e8a651a7b854c41fecd5117bb282b3a6c2c6b upstream.
    
    The sk_msg sg.copy bitmap is part of the scatterlist entry ownership
    state. A set bit tells sk_msg_compute_data_pointers() not to expose the
    entry through writable BPF ctx->data. This protects entries backed by
    pages that are not private to the sk_msg, such as splice-backed file
    page-cache pages.
    
    Several sk_msg transform paths move, copy, split, or compact
    msg->sg.data[] entries without moving the matching sg.copy bit. This can
    make an externally backed entry arrive at a new slot with a clear copy
    bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable
    ctx->data and BPF stores can modify the original page cache.
    
    Keep sg.copy synchronized with sg.data[] whenever entries are
    transferred, shifted, split, or copied into a new sk_msg. Clear the bit
    when an entry is replaced by a newly allocated private page or freed.
    This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(),
    sk_msg_xfer(), and tls_split_open_record(), including the partial tail
    entry created during TLS open-record splitting.
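    
    Keeping a per-slot bitmap in step with the array it describes can be
    sketched like this.  The layout is a toy, not struct sk_msg: TOY_MAX_SG,
    struct toy_msg and the toy_*() helpers are hypothetical, and integers
    stand in for scatterlist entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_SG 8

/* Toy sk_msg: entry payloads plus one "copy" bit per slot. */
struct toy_msg {
    int      data[TOY_MAX_SG];  /* stand-in for scatterlist entries */
    uint32_t copy;              /* bit i set: slot i must not be writable */
    int      count;
};

static bool toy_test_copy(const struct toy_msg *m, int i)
{
    return (m->copy >> i) & 1u;
}

static void toy_assign_copy(struct toy_msg *m, int i, bool set)
{
    if (set)
        m->copy |= UINT32_C(1) << i;
    else
        m->copy &= ~(UINT32_C(1) << i);
}

/* Remove slot idx, shifting later entries left together with their bits. */
void toy_shift_left(struct toy_msg *m, int idx)
{
    for (int i = idx; i < m->count - 1; i++) {
        m->data[i] = m->data[i + 1];
        toy_assign_copy(m, i, toy_test_copy(m, i + 1));
    }
    m->count--;
    toy_assign_copy(m, m->count, false);    /* vacated slot is clean */
}

/* Only entries with a clear copy bit may be exposed as writable. */
bool toy_entry_writable(const struct toy_msg *m, int i)
{
    return !toy_test_copy(m, i);
}
```

    If the shift moved only data[] and left the bitmap alone, the protected
    entry would land in a slot whose bit is clear and would look writable.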
    
    Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling")
    Cc: stable@vger.kernel.org
    Reported-by: Yiming Qian <yimingqian591@gmail.com>
    Reported-by: Keenan Dong <keenanat2000@gmail.com>
    Signed-off-by: Yiming Qian <yimingqian591@gmail.com>
    Link: https://patch.msgid.link/20260610062137.49075-1-yimingqian591@gmail.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFS: Prevent resource leak in nfs_alloc_server() [+ + +]

Author: Markus Elfring <elfring@users.sourceforge.net>
Date:   Sun Jun 14 09:56:35 2026 +0200

    NFS: Prevent resource leak in nfs_alloc_server()
    
    commit d189f224308c8ac3feeea8e442c99922bd18f1b2 upstream.
    
    ida_free() was not called after a failed nfs_alloc_iostats() call.
    Add the missing call in the corresponding if branch.
    
    Fixes: 1c7251187dc067a6d460cf33ca67da9c1dd87807 ("NFS: add superblock sysfs entries")
    Cc: stable@vger.kernel.org
    Reported-by: Christophe Jaillet <christophe.jaillet@wanadoo.fr>
    Closes: https://lore.kernel.org/linux-nfs/1c8e10c9-def7-4f0d-8aa1-23c8035a38c8@wanadoo.fr/
    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Fri May 22 10:36:14 2026 -0400

    nfsd: avoid leaking pre-allocated openowner on unconfirmed retry race
    
    commit 57aee7a35bb12753057c5b65d72d1f46c0e95b07 upstream.
    
    When find_or_alloc_open_stateowner() encounters an unconfirmed owner, it
    calls release_openowner() and sets oo = NULL. Control then falls through
    past the `if (oo)` guard -- which would have freed any pre-allocated
    `new` -- and unconditionally executes `new = alloc_stateowner(...)`. If
    `new` was already allocated on a prior iteration, the pointer is
    silently overwritten and the previous allocation (slab object + owner
    name buffer) is leaked.
    
    This requires a race: two NFSv4.0 OPEN threads with the same owner
    string, where a concurrent thread inserts a new unconfirmed owner into
    the hash between retry iterations. The window is narrow but repeatable
    under adversarial conditions.
    
    Fix by adding `goto retry` after `oo = NULL` so the already-allocated
    `new` is reused on the next iteration rather than overwritten.
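    
    The control flow can be replayed with a scripted model.  This is only a
    sketch: toy_leaked_owners() and the lookup script are hypothetical, and
    integer IDs stand in for allocated owners so that no real memory is
    involved.

```c
#include <assert.h>
#include <stdbool.h>

enum lookup_result { FOUND_NONE, FOUND_UNCONFIRMED };

/*
 * Replays a retry loop. script[] lists what each hash lookup finds and must
 * contain a FOUND_NONE after the first allocation so the loop terminates.
 * Returns how many pre-allocated owners were lost by the time one is used.
 */
int toy_leaked_owners(const enum lookup_result *script, bool retry_after_release)
{
    int new_id = 0;         /* 0 means "nothing pre-allocated" */
    int allocated = 0;

    for (int step = 0; ; step++) {
        if (script[step] == FOUND_NONE && new_id)
            return allocated - 1;   /* all but the inserted owner leaked */
        if (script[step] == FOUND_UNCONFIRMED && retry_after_release)
            continue;               /* the fix: look up again, keep new_id */
        new_id = ++allocated;       /* without the fix this overwrites new_id */
    }
}
```

    With the script "nothing found, unconfirmed owner found, nothing found",
    the model loses one allocation without the retry and none with it.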
    
    Reported-by: Chris Mason <clm@meta.com>
    Fixes: 23df17788c62 ("nfsd: perform all find_openstateowner_str calls in the one place.")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: check get_user() return when reading princhashlen [+ + +]

Author: Dominik Woźniak <stalion@gmail.com>
Date:   Thu May 21 17:46:56 2026 +0200

    nfsd: check get_user() return when reading princhashlen
    
    commit e186fa1c057f5eccb22afb1e83e34c0627085868 upstream.
    
    In __cld_pipe_inprogress_downcall(), the get_user() that reads
    princhashlen from the userspace cld_msg_v2 buffer does not check its
    return value. A failing copy leaves princhashlen with uninitialised
    stack contents, which are then used to drive memdup_user() and stored
    as princhash.len on the resulting reclaim record. The other get_user()
    calls in this function all check the return; only this one is missed,
    which is most likely a copy-paste oversight from when v2 upcalls were
    introduced.
    
    Mirror the existing pattern used a few lines above for namelen.
    namecopy is declared with __free(kfree) so the early return cleans up
    the already-allocated buffer automatically.
    
    Fixes: 6ee95d1c8991 ("nfsd: add support for upcall version 2")
    Cc: stable@vger.kernel.org
    Signed-off-by: Dominik Woźniak <stalion@gmail.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: fix dead ACL conflict guard in nfsd4_create [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu May 21 07:50:21 2026 -0400

    nfsd: fix dead ACL conflict guard in nfsd4_create
    
    commit a60f25a800846ab8e5a13f8a9d05111f2aee55a7 upstream.
    
    nfsd4_create() steals create->cr_dpacl/cr_pacl into the local
    nfsd_attrs via the designated initializer, then immediately sets the
    source pointers to NULL. The subsequent conflict guard tests the
    already-nilled source fields, making it permanently dead code:
    
        if (create->cr_acl) {
            if (create->cr_dpacl || create->cr_pacl)  /* always false */
    
    When a client encodes both FATTR4_WORD0_ACL and
    FATTR4_WORD2_POSIX_{DEFAULT,ACCESS}_ACL in the same CREATE fattr
    bitmap, nfsd4_acl_to_attr() overwrites attrs.na_pacl/na_dpacl without
    releasing the originals, leaking two posix_acl slab objects per
    request. Repeated requests cause unbounded slab exhaustion.
    
    Fix by checking attrs.na_dpacl/na_pacl (the stolen values) instead of
    the nilled create->cr_dpacl/cr_pacl, matching the correct pattern
    already used in nfsd4_setattr().
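    
    The dead guard is easy to reproduce in isolation.  In the sketch below
    the struct layouts and toy_acl_conflict() are hypothetical; only the
    field names are borrowed from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_create {
    void *cr_acl;       /* NFSv4 ACL from the request */
    void *cr_dpacl;     /* POSIX default ACL from the request */
    void *cr_pacl;      /* POSIX access ACL from the request */
};

struct toy_attrs {
    void *na_dpacl;
    void *na_pacl;
};

/* Returns true when the request must be rejected as conflicting. */
bool toy_acl_conflict(struct toy_create *create, bool check_stolen)
{
    /* Ownership moves to attrs; the source pointers are cleared. */
    struct toy_attrs attrs = {
        .na_dpacl = create->cr_dpacl,
        .na_pacl  = create->cr_pacl,
    };
    create->cr_dpacl = NULL;
    create->cr_pacl = NULL;

    if (!create->cr_acl)
        return false;
    if (check_stolen)
        return attrs.na_dpacl || attrs.na_pacl;      /* fixed guard */
    return create->cr_dpacl || create->cr_pacl;      /* always false */
}
```

    Testing the cleared source fields can never report a conflict, whereas
    testing the stolen values does so whenever both ACL kinds are present.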
    
    Reported-by: Chris Mason <clm@meta.com>
    Assisted-by: kres:claude-opus-4-6
    Fixes: d2ca50606f5f ("NFSD: Add support for POSIX draft ACLs for file creation")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: fix inverted cp_ttl check in async copy reaper [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu May 21 09:25:40 2026 -0400

    nfsd: fix inverted cp_ttl check in async copy reaper
    
    commit 0150459b05490b88b7e7378a31550a9e07b5517c upstream.
    
    nfsd4_async_copy_reaper() is supposed to keep completed async copy
    state around for NFSD_COPY_INITIAL_TTL (10) laundromat ticks so
    that OFFLOAD_STATUS can report the result, then reap the state once
    the countdown expires.
    
    The TTL predicate is inverted: `if (--copy->cp_ttl)` is true while
    ticks remain and false when the counter reaches zero.  This causes
    the copy to be reaped on the very first tick (cp_ttl goes from 10
    to 9, which is non-zero) instead of after all 10 ticks elapse.
    Once reaped, OFFLOAD_STATUS returns NFS4ERR_BAD_STATEID because
    the copy state has already been freed.
    
    Fix by negating the test so that cleanup runs when the TTL expires.
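    
    The effect of the inverted predicate can be checked with a few lines of
    arithmetic.  toy_reap_tick() and TOY_COPY_INITIAL_TTL are hypothetical
    names for illustration; the initial value of 10 is the one given above.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_COPY_INITIAL_TTL 10   /* value quoted in the changelog */

/*
 * Returns the laundromat tick (1-based) on which a completed copy is reaped.
 * inverted selects the old predicate, which reaps while ticks remain.
 */
int toy_reap_tick(bool inverted)
{
    int ttl = TOY_COPY_INITIAL_TTL;

    for (int tick = 1; ; tick++) {
        int remaining = --ttl;
        bool reap = inverted ? (remaining != 0) : (remaining == 0);

        if (reap)
            return tick;
    }
}
```

    With the old predicate the model reaps on tick 1, and with the negated
    test it reaps on tick 10.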
    
    Fixes: aa0ebd21df9c ("NFSD: Add nfsd4_copy time-to-live")
    Cc: stable@vger.kernel.org
    Reported-by: Chris Mason <clm@meta.com>
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: fix posix_acl leak and ignored error in nfsd4_create_file [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu May 21 12:37:33 2026 -0400

    nfsd: fix posix_acl leak and ignored error in nfsd4_create_file
    
    commit 24c975bbdd564d7d0ad90294bfa69729830345de upstream.
    
    nfsd4_create_file() has two bugs in its ACL handling:
    
    The return value of nfsd4_acl_to_attr() is silently discarded.  When
    the NFSv4-to-POSIX ACL conversion fails (e.g., -EINVAL for
    unsupported ACE types), the file is created without any ACL and the
    client receives NFS4_OK.  This violates RFC 7530/8881 which require
    the server to reject unsupported attributes on CREATE.
    
    When start_creating() fails after ACL attributes have been populated
    in attrs (either via nfsd4_acl_to_attr or via ownership transfer from
    open->op_dpacl/op_pacl), the function jumps to out_write which skips
    nfsd_attrs_free().  The posix_acl allocations are leaked.  A client
    can trigger this repeatedly with OPEN(CREATE), ACL attributes, and an
    invalid filename (e.g., longer than NAME_MAX).
    
    Fix both by capturing the nfsd4_acl_to_attr() return value and by
    changing the early error paths to jump to out instead of out_write.
    Initialize child to ERR_PTR(-EINVAL) so that end_creating() is safe
    to call even if start_creating() was never reached.
    
    Reported-by: Chris Mason <clm@meta.com>
    Fixes: 7ab96df840e6 ("VFS/nfsd/cachefiles/ovl: add start_creating() and end_creating()")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: fix posix_acl leak on SETACL decode failure [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Thu May 21 13:51:43 2026 -0400

    nfsd: fix posix_acl leak on SETACL decode failure
    
    commit 0853ac544c590880d797b04daa33fcb72b6be0e1 upstream.
    
    nfsaclsvc_decode_setaclargs() and nfs3svc_decode_setaclargs() each
    call nfs_stream_decode_acl() twice, first for NFS_ACL and then for
    NFS_DFACL.  Each successful call transfers ownership of a freshly
    allocated posix_acl into argp->acl_access or argp->acl_default.  If
    the first call succeeds but the second fails, the decoder returns
    false and argp->acl_access is left dangling.
    
    ACLPROC2_SETACL.pc_release was wired to nfssvc_release_attrstat and
    ACLPROC3_SETACL.pc_release was wired to nfs3svc_release_fhandle.
    Both only call fh_put() and have no knowledge of the ACL fields on
    argp.  The posix_acl_release() pairs sat at the out: labels inside
    nfsacld_proc_setacl() and nfsd3_proc_setacl(), but svc_process()
    skips pc_func when pc_decode returns false, so that cleanup is
    unreachable on decode failure:
    
        svc_process_common()
          pc_decode()                  /* decode_setaclargs: false */
          /* pc_func skipped */
          pc_release()                 /* fh_put only -- ACLs leaked */
    
    The orphaned posix_acl is leaked for the lifetime of the server.
    
    Fix by adding nfsaclsvc_release_setacl() and nfs3svc_release_setacl(),
    which release both argp->acl_access and argp->acl_default in addition
    to fh_put(), and wiring them as pc_release for their respective SETACL
    procedures.  pc_release runs on every path svc_process() takes after
    decode, including decode failure, so the posix_acl_release() pairs are
    removed from the proc functions' out: labels to keep ownership in one
    place.  This matches the existing release_getacl() pattern used by
    the sibling GETACL procedures.
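    
    The ownership problem can be modelled with a counter of live ACL
    objects.  Everything here is a toy: the toy_*() functions and the
    boolean fields are hypothetical and only mirror the decode, handler and
    release steps described above.

```c
#include <assert.h>
#include <stdbool.h>

static int live_acls;   /* toy ACL objects currently allocated */

struct toy_setacl_args {
    bool have_access;   /* stands in for argp->acl_access */
    bool have_default;  /* stands in for argp->acl_default */
};

/* Decode two ACLs; the second decode can be forced to fail. */
static bool toy_decode(struct toy_setacl_args *argp, bool second_fails)
{
    argp->have_access = true;
    live_acls++;
    if (second_fails)
        return false;           /* first ACL is still owned by argp */
    argp->have_default = true;
    live_acls++;
    return true;
}

/* Release callback that knows about the ACL fields. */
static void toy_release_setacl(struct toy_setacl_args *argp)
{
    if (argp->have_access)  { argp->have_access = false;  live_acls--; }
    if (argp->have_default) { argp->have_default = false; live_acls--; }
}

/* Dispatch model: the handler is skipped on decode failure, release is not. */
int toy_process(bool second_fails, bool release_frees_acls)
{
    struct toy_setacl_args args = { false, false };

    live_acls = 0;
    if (toy_decode(&args, second_fails)) {
        /* handler runs; before the fix it released the ACLs itself */
        if (!release_frees_acls)
            toy_release_setacl(&args);
    }
    if (release_frees_acls)
        toy_release_setacl(&args);
    return live_acls;           /* ACL objects still allocated afterwards */
}
```

    When cleanup lives only in the handler, a failed second decode leaves
    one object behind; when the release callback owns the cleanup, every
    path ends with zero live objects.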
    
    Fixes: a257cdd0e217 ("[PATCH] NFSD: Add server support for NFSv3 ACLs.")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-7
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Fix SECINFO_NO_NAME decode error cleanup [+ + +]

Author: Guannan Wang <wgnbuaa@gmail.com>
Date:   Thu May 21 16:03:32 2026 +0800

    NFSD: Fix SECINFO_NO_NAME decode error cleanup
    
    commit 9e18e83b8846a5c3fe13fc8a464b4865d33996c6 upstream.
    
    nfsd4_decode_secinfo_no_name() currently initializes sin_exp after
    decoding sin_style. If the XDR stream is truncated, the decoder returns
    nfserr_bad_xdr before sin_exp is initialized.
    
    Since commit 3fdc54646234 ("NFSD: Reduce amount of struct
    nfsd4_compoundargs that needs clearing"), the inline iops array is not
    cleared between RPC calls. A failed SECINFO_NO_NAME decode can therefore
    leave sin_exp holding stale union contents from a previous operation.
    
    The error response path still invokes nfsd4_secinfo_no_name_release(),
    which calls exp_put() on a non-NULL sin_exp.
    
    Initialize sin_exp before the first failable decode step, matching
    nfsd4_decode_secinfo().
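
    The ordering problem can be sketched in user space as follows.  The
    types and function names are hypothetical stand-ins (they are not the
    nfsd structures); the only point is that a pointer consumed by an
    always-run release hook has to be cleared before the first decode
    step that can fail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real nfsd types. */
struct export {
	int refcount;
};

struct secinfo_no_name_args {
	unsigned int style;
	struct export *exp;	/* plays the role of sin_exp */
};

/* Release hook: runs even when decode failed. */
void args_release(struct secinfo_no_name_args *args)
{
	if (args->exp)
		args->exp->refcount--;	/* plays the role of exp_put() */
}

/* Pointer is cleared only after the step that can fail. */
bool decode_late_init(struct secinfo_no_name_args *args,
		      const unsigned int *stream, size_t words)
{
	assert(args != NULL);
	if (words < 1)
		return false;		/* truncated stream */
	args->style = stream[0];
	args->exp = NULL;
	return true;
}

/* Pointer is cleared before the first step that can fail. */
bool decode_early_init(struct secinfo_no_name_args *args,
		       const unsigned int *stream, size_t words)
{
	assert(args != NULL);
	args->exp = NULL;
	if (words < 1)
		return false;		/* truncated stream */
	args->style = stream[0];
	return true;
}
```

    With stale contents left in the argument slot, a failed
    decode_late_init() followed by args_release() drops a reference the
    operation never took; decode_early_init() does not.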
    
    Fixes: 3fdc54646234 ("NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing")
    Cc: stable@vger.kernel.org
    Signed-off-by: Guannan Wang <wgnbuaa@gmail.com>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: release layout stid on setlease failure [+ + +]

Author: Chris Mason <clm@meta.com>
Date:   Mon May 18 13:16:36 2026 -0700

    nfsd: release layout stid on setlease failure
    
    commit 30d55c8aabb261bc3f427d6b9aae7ef6206063f9 upstream.
    
    nfs4_alloc_stid() publishes the new stid into cl->cl_stateids via
    idr_alloc_cyclic() under cl_lock before returning to
    nfsd4_alloc_layout_stateid(). When nfsd4_layout_setlease() then
    fails, the error path frees the layout stateid directly with
    kmem_cache_free() without ever calling idr_remove(), leaving the
    IDR slot pointing at freed slab memory. Any subsequent IDR walker
    (states_show, client teardown) dereferences the dangling pointer.
    
    The correct teardown for an IDR-published stid is nfs4_put_stid(),
    which removes the IDR slot under cl_lock, dispatches sc_free
    (nfsd4_free_layout_stateid) to release ls->ls_file via
    nfsd4_close_layout(), and drops the nfs4_file reference in its
    tail.
    
    A second issue blocks that switch: nfsd4_free_layout_stateid()
    unconditionally inspects ls->ls_fence_work via
    delayed_work_pending() under ls_lock, but
    INIT_DELAYED_WORK(&ls->ls_fence_work, ...) currently runs only
    after the setlease call. On the setlease-failure path the
    destructor would touch an uninitialized delayed_work.
    
        nfsd4_alloc_layout_stateid()
          nfs4_alloc_stid()           /* idr_alloc_cyclic under cl_lock */
          nfsd4_layout_setlease()     /* fails */
            nfs4_put_stid()
              nfsd4_free_layout_stateid()
                delayed_work_pending(&ls->ls_fence_work)  /* needs INIT */
                nfsd4_close_layout()  /* nfsd_file_put(ls->ls_file) */
              put_nfs4_file()
    
    Fix by hoisting the ls_fenced / ls_fence_delay / INIT_DELAYED_WORK
    initialization above the nfsd4_layout_setlease() call, and replace
    the manual nfsd_file_put + put_nfs4_file + kmem_cache_free cleanup
    with a single nfs4_put_stid(stp).
    
    Fixes: c5c707f96fc9 ("nfsd: implement pNFS layout recalls")
    Cc: stable@vger.kernel.org
    Assisted-by: kres (claude-opus-4-7)
    Signed-off-by: Chris Mason <clm@meta.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd: reset write verifier on deferred writeback errors [+ + +]

Author: Jeff Layton <jlayton@kernel.org>
Date:   Fri May 22 12:44:19 2026 -0400

    nfsd: reset write verifier on deferred writeback errors
    
    commit 2090b05803faab8a9fa62fbff871007862cac1b7 upstream.
    
    nfsd_vfs_write() and nfsd_commit() both call filemap_check_wb_err() to
    detect deferred writeback errors, but neither rotates the server's write
    verifier (nn->writeverf) when this check fails. Every other
    durable-storage-failure path in these functions calls
    commit_reset_write_verifier() before returning an error.
    
    The missing rotation means clients holding UNSTABLE write data under the
    current verifier will COMMIT, receive the unchanged verifier back, and
    conclude their data is durable — silently dropping data that failed
    writeback. This violates the UNSTABLE+COMMIT durability contract
    (RFC 1813 §3.3.7, RFC 8881 §18.32).
    
    Add commit_reset_write_verifier() calls at both filemap_check_wb_err()
    error sites, matching the pattern used by adjacent error paths in the
    same functions. The helper already filters -EAGAIN and -ESTALE
    internally, so the calls are unconditionally safe.
    
    Reported-by: Chris Mason <clm@meta.com>
    Fixes: 555dbf1a9aac ("nfsd: Replace use of rwsem with errseq_t")
    Cc: stable@vger.kernel.org
    Assisted-by: kres:claude-opus-4-6
    Signed-off-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4/flexfiles: reject zero filehandle version count [+ + +]

Author: Michael Bommarito <michael.bommarito@gmail.com>
Date:   Wed May 13 12:26:56 2026 -0400

    NFSv4/flexfiles: reject zero filehandle version count
    
    commit 2c6bb3c40bc24f6aa8dfbe6fe98c3ad6389203f2 upstream.
    
    ff_layout_alloc_lseg() decodes the filehandle-version array count
    from the flexfiles layout body. The value is used as the count for
    kzalloc_objs(), and the current code only rejects NULL.
    
    A zero count yields ZERO_SIZE_PTR, which can be stored in
    dss_info->fh_versions even though later flexfiles paths assume that at
    least one filehandle version exists.
    
    Reject fh_count == 0 before the allocation, matching the existing zero
    version_count validation in the flexfiles GETDEVICEINFO parser.
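
    A minimal user-space sketch of the validation order, assuming a
    hypothetical helper alloc_fh_versions() and a hypothetical struct fh;
    the error codes are illustrative and not taken from the patch.  The
    count is checked before the allocation, so a successful return always
    means at least one element exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical filehandle slot. */
struct fh {
	unsigned char data[16];
};

/* Rejects a zero count before allocating the version array. */
int alloc_fh_versions(unsigned int fh_count, struct fh **out)
{
	assert(out != NULL);
	*out = NULL;
	if (fh_count == 0)
		return -EINVAL;		/* illustrative error code */
	*out = calloc(fh_count, sizeof(**out));
	return *out ? 0 : -ENOMEM;
}
```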
    
    A QEMU/KASAN run with a malformed flexfiles layout hit:
    
      KASAN: null-ptr-deref in range [0x0000000000000010-0x0000000000000017]
      RIP: 0010:ff_layout_encode_ff_layoutupdate.isra.0+0x15f/0x750
      ff_layout_encode_layoutreturn+0x683/0x970
      nfs4_xdr_enc_layoutreturn+0x278/0x3a0
      Kernel panic - not syncing: Fatal exception
    
    The patched kernel rejects the malformed layout without KASAN/oops/panic,
    and a valid fh_count=1 regression still opens, reads, and unmounts cleanly.
    
    Cc: stable@vger.kernel.org
    Fixes: d67ae825a59d ("pnfs/flexfiles: Add the FlexFile Layout Driver")
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
    Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr [+ + +]

Author: Michael Bommarito <michael.bommarito@gmail.com>
Date:   Wed May 27 12:30:35 2026 -0400

    NFSv4/pNFS: reject zero-length r_addr in nfs4_decode_mp_ds_addr
    
    commit 41fe0f7b84f0cb822ae10ab08592996a592b2a25 upstream.
    
    nfs4_decode_mp_ds_addr() decodes the r_netid and r_addr opaques of a
    netaddr4 from a GETDEVICEINFO multipath-DS body, then immediately
    calls strrchr(buf, '.') to locate the port separator. Both decodes
    use xdr_stream_decode_string_dup(), and the current code checks only
    "nlen < 0" / "rlen < 0" before dereferencing the returned string.
    
    When the on-wire opaque has length zero, xdr_stream_decode_opaque_inline()
    returns 0 and xdr_stream_decode_string_dup() falls through to its
    "*str = NULL; return ret" tail, leaving buf NULL with a return value
    of 0. The "< 0" check does not catch this, and the next line is
    strrchr(NULL, '.'), a kernel NULL pointer dereference reachable from
    any pNFS-flexfile client mounted against a malicious or compromised
    metadata server.
    
    Reject the zero-length cases explicitly so the decoder fails with
    -EBADMSG (treated as a malformed GETDEVICEINFO body) instead of
    panicking the client.
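
    The failure mode can be modelled in user space.  In this sketch
    decode_string_dup() is a hypothetical stand-in that mimics the
    behaviour described above (length 0 yields a NULL string and a return
    value of 0); it is not the sunrpc helper, and find_port_separator()
    is likewise invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical decoder: returns the string length and a heap copy in
 * *str, or 0 with *str == NULL when the opaque is empty.
 */
long decode_string_dup(const char *wire, size_t len, char **str)
{
	char *copy;

	if (len == 0) {
		*str = NULL;
		return 0;
	}
	copy = malloc(len + 1);
	if (!copy) {
		*str = NULL;
		return -ENOMEM;
	}
	memcpy(copy, wire, len);
	copy[len] = '\0';
	*str = copy;
	return (long)len;
}

/* Returns the offset of the last '.' or a negative errno value. */
long find_port_separator(const char *wire, size_t len)
{
	char *buf, *dot;
	long rlen, off;

	assert(wire != NULL);
	rlen = decode_string_dup(wire, len, &buf);
	if (rlen < 0)
		return rlen;
	if (rlen == 0)			/* a "< 0" check alone misses this */
		return -EBADMSG;
	dot = strrchr(buf, '.');
	off = dot ? (long)(dot - buf) : -EINVAL;
	free(buf);
	return off;
}
```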
    
    Cc: stable@vger.kernel.org
    Fixes: 6b7f3cf96364 ("nfs41: pull decode_ds_addr from file layout to generic pnfs")
    Assisted-by: Claude:claude-opus-4-7
    Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
    Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4: clear exception state on successful mkdir retry [+ + +]

Author: Igor Raits <igor.raits@gmail.com>
Date:   Wed Apr 29 12:49:38 2026 +0200

    NFSv4: clear exception state on successful mkdir retry
    
    commit 238e9b51aa29f48b6243212a3b75c8e48d6b96fd upstream.
    
    After a server returns NFS4ERR_DELAY for an NFSv4 CREATE issued by
    mkdir(2), the client correctly waits and retries.  When the retry
    succeeds, however, mkdir(2) can still surface -EEXIST to userspace
    even though the directory was just created on the server.
    
    Reproducer (random 16-hex names so collisions are not the cause)
    against an in-kernel Linux nfsd; reproduces under both NFSv4.0 and
    NFSv4.2:
    
      N=2000000; base=/var/gdc/export
      for ((i=1; i<=N; i++)); do
          d=$base/$(openssl rand -hex 8)
          mkdir "$d" 2>/dev/null || echo "$(date +%T) failed loop=$i $d"
          rmdir "$d" 2>/dev/null
      done
    
    Failures cluster at the cadence at which the server-side auth/export
    cache refresh path causes nfsd to return NFS4ERR_DELAY for CREATE.
    
    A wire trace of one failure (the three CREATE RPCs all come from a
    single mkdir(2), generated by the do-while in nfs4_proc_mkdir()):
    
      client -> server  CREATE name=...  -> NFS4ERR_DELAY
      ~100 ms later
      client -> server  CREATE name=...  -> NFS4_OK         (dir created)
      ~80 us later
      client -> server  CREATE name=...  -> NFS4ERR_EXIST   (correct)
    
    Since commit dd862da61e91 ("nfs: fix incorrect handling of large-number
    NFS errors in nfs4_do_mkdir()"), nfs4_handle_exception() is called only
    when _nfs4_proc_mkdir() returned an error.  That gate breaks retry-state
    hygiene: nfs4_do_handle_exception() resets exception.{delay,recovering,
    retry} to 0 on entry, so calling it on success is what previously
    cleared the retry flag set by the preceding NFS4ERR_DELAY iteration.
    With the gate in place, exception.retry stays at 1 after the successful
    retry, the loop runs once more, and the resulting CREATE for an
    already-created name yields NFS4ERR_EXIST -> -EEXIST to userspace.
    
    Drop the conditional and call nfs4_handle_exception() unconditionally,
    matching every other do-while in fs/nfs/nfs4proc.c (nfs4_proc_symlink(),
    nfs4_proc_link(), etc.).  The dentry/status separation introduced by
    that commit is preserved.
    
    Fixes: dd862da61e91 ("nfs: fix incorrect handling of large-number NFS errors in nfs4_do_mkdir()")
    Reported-and-tested-by: Jan Čípa <jan.cipa@gooddata.com>
    Closes: https://lore.kernel.org/linux-nfs/CA+9S74hSp_tJu2Ffe2BPNC2T25gfkhgjjDkdgSsF5c2rnJq_wA@mail.gmail.com/
    Reviewed-by: NeilBrown <neil@brown.name>
    Cc: stable@vger.kernel.org
    Signed-off-by: Igor Raits <igor.raits@gmail.com>
    Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NTB: epf: Avoid pci_iounmap() with offset when PEER_SPAD and CONFIG share BAR [+ + +]

Author: Koichiro Den <den@valinux.co.jp>
Date:   Wed Mar 4 11:05:27 2026 +0900

    NTB: epf: Avoid pci_iounmap() with offset when PEER_SPAD and CONFIG share BAR
    
    commit d876153680e3d721d385e554def919bce3d18c74 upstream.
    
    When BAR_PEER_SPAD and BAR_CONFIG share one PCI BAR, the module teardown
    path ends up calling pci_iounmap() on the same iomem with some offset,
    which is unnecessary and triggers a kernel warning like the following:
    
      Trying to vunmap() nonexistent vm area (0000000069a5ffe8)
      WARNING: mm/vmalloc.c:3470 at vunmap+0x58/0x68, CPU#5: modprobe/2937
      [...]
      Call trace:
       vunmap+0x58/0x68 (P)
       iounmap+0x34/0x48
       pci_iounmap+0x2c/0x40
       ntb_epf_pci_remove+0x44/0x80 [ntb_hw_epf]
       pci_device_remove+0x48/0xf8
       device_remove+0x50/0x88
       device_release_driver_internal+0x1c8/0x228
       driver_detach+0x50/0xb0
       bus_remove_driver+0x74/0x100
       driver_unregister+0x34/0x68
       pci_unregister_driver+0x34/0xa0
       ntb_epf_pci_driver_exit+0x14/0xfe0 [ntb_hw_epf]
      [...]
    
    Fix it by unmapping only when PEER_SPAD and CONFIG use different BARs.
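
    A user-space sketch of the teardown guard, using hypothetical names
    (struct dev_maps, unmap_bar(), teardown()) rather than the driver's
    own.  A mapping shared by two roles is unmapped once, through the
    role that owns it.

```c
#include <assert.h>

enum { NUM_BARS = 6 };

/* Hypothetical bookkeeping; not the ntb_hw_epf structures. */
struct dev_maps {
	int cfg_bar;		/* BAR index used for CONFIG */
	int peer_spad_bar;	/* BAR index used for PEER_SPAD */
	int unmap_calls[NUM_BARS];
};

void unmap_bar(struct dev_maps *d, int bar)
{
	assert(bar >= 0 && bar < NUM_BARS);
	d->unmap_calls[bar]++;
}

/* Unmap PEER_SPAD separately only when it has a BAR of its own. */
void teardown(struct dev_maps *d)
{
	unmap_bar(d, d->cfg_bar);
	if (d->peer_spad_bar != d->cfg_bar)
		unmap_bar(d, d->peer_spad_bar);
}
```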
    
    Cc: stable@vger.kernel.org
    Fixes: e75d5ae8ab88 ("NTB: epf: Allow more flexibility in the memory BAR map method")
    Reviewed-by: Frank Li <Frank.Li@nxp.com>
    Signed-off-by: Koichiro Den <den@valinux.co.jp>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Jon Mason <jdmason@kudzu.us>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ntfs3: reject direct userspace writes to reserved $LX* xattrs [+ + +]

Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Wed Jun 10 12:31:01 2026 +0200

    ntfs3: reject direct userspace writes to reserved $LX* xattrs
    
    commit 5b08dccecf825cbf905f348bc6ccb497507e28e2 upstream.
    
    NTFS3 uses $LXUID, $LXGID, $LXMOD and $LXDEV as internal WSL
    permission metadata and reloads them into i_uid, i_gid and i_mode
    from ntfs_get_wsl_perm().
    
    Because the empty-prefix xattr handler also lets file owners call
    setxattr() on these names directly, an unprivileged writer on a
    writable ntfs3 mount can plant root ownership and S_ISUID on their own
    file and gain euid 0 after inode reload.
    
    Reject direct userspace writes to the reserved $LX* names. Internal
    ntfs3 metadata updates are unchanged because ntfs_save_wsl_perm()
    writes them via ntfs_set_ea() directly.
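
    As an illustration only, a name filter of this kind could look like
    the sketch below.  The helper names and the -EPERM return value are
    assumptions made for the example, and the sketch ignores the
    privilege check mentioned in the maintainer note.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* True for the reserved WSL metadata names listed above. */
bool is_reserved_lx_name(const char *name)
{
	static const char *const reserved[] = {
		"$LXUID", "$LXGID", "$LXMOD", "$LXDEV",
	};
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
		if (strcmp(name, reserved[i]) == 0)
			return true;
	return false;
}

/* Hypothetical check applied on the userspace setxattr path only. */
int user_setxattr_check(const char *name)
{
	return is_reserved_lx_name(name) ? -EPERM : 0;
}
```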
    
    Signed-off-by: Zhen Yan <sdjasjbuaa@gmail.com>
    [almaz.alexandrovich@paragon-software.com: added an additional check for non privileged users]
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ntfs: serialize volume label accesses [+ + +]

Author: Hyunchul Lee <hyc.lee@gmail.com>
Date:   Tue Jun 2 13:53:24 2026 +0900

    ntfs: serialize volume label accesses
    
    commit e9e50ce4f13dc721014af622613409455c734942 upstream.
    
    Protect vol->volume_label with a mutex and snapshot the label before
    copy_to_user(). This prevents a use-after-free when FS_IOC_SETFSLABEL
    replaces vol->volume_label while FS_IOC_GETFSLABEL reads it
    concurrently.
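
    The locking pattern can be sketched in user space with a pthread
    mutex.  struct volume and both helpers are hypothetical and only
    mirror the idea: the reader copies the label into its own buffer
    while holding the lock, so the writer can free the old string right
    after swapping it out.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct volume {
	pthread_mutex_t label_lock;
	char *label;	/* heap string, swapped by volume_set_label() */
};

int volume_set_label(struct volume *vol, const char *new_label)
{
	size_t len = strlen(new_label) + 1;
	char *copy = malloc(len);
	char *old;

	if (!copy)
		return -1;
	memcpy(copy, new_label, len);

	pthread_mutex_lock(&vol->label_lock);
	old = vol->label;
	vol->label = copy;
	pthread_mutex_unlock(&vol->label_lock);

	free(old);	/* readers only touch the label under the lock */
	return 0;
}

/* Snapshot the label into the caller's buffer under the lock. */
void volume_get_label(struct volume *vol, char *out, size_t out_len)
{
	assert(out != NULL && out_len > 0);
	pthread_mutex_lock(&vol->label_lock);
	snprintf(out, out_len, "%s", vol->label ? vol->label : "");
	pthread_mutex_unlock(&vol->label_lock);
}
```

    The caller then works on its private copy after the lock is dropped,
    which is the role the snapshot plays ahead of the copy to userspace.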
    
    Cc: stable@vger.kernel.org # v7.1
    Signed-off-by: Hyunchul Lee <hyc.lee@gmail.com>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: reject oversized group bitmap descriptors [+ + +]

Author: Zhang Cen <rollkingzzc@gmail.com>
Date:   Sun May 24 19:12:48 2026 +0800

    ocfs2: reject oversized group bitmap descriptors
    
    commit 9bd541e09dffff27e5bec0f9f45b0228173a5375 upstream.
    
    ocfs2_validate_gd_parent() only bounds bg_bits against the parent
    allocator's chain geometry.  A malicious descriptor can still claim a
    bg_size/bg_bits pair that exceeds the bitmap bytes that physically fit in
    the group descriptor block, so later bitmap scans and bit updates can run
    past bg_bitmap.
    
    Add a physical-cap check based on ocfs2_group_bitmap_size() for the parent
    allocator type and reject descriptors whose bg_size or bg_bits exceed that
    capacity.  Keep the existing chain geometry check so both the on-disk
    bitmap layout and the allocator metadata must agree before the descriptor
    is used.
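
    A simplified sketch of a physical-capacity check follows.  It assumes
    a hypothetical layout in which the bitmap occupies whatever remains
    of the block after a fixed header; the real capacity comes from
    ocfs2_group_bitmap_size() and depends on the allocator type, which
    this model does not reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: fixed header followed by the bitmap bytes. */
bool gd_bitmap_fits(unsigned int block_size, unsigned int header_bytes,
		    unsigned int bg_size, unsigned int bg_bits)
{
	unsigned int max_bytes;

	assert(block_size > 0);
	if (header_bytes >= block_size)
		return false;
	max_bytes = block_size - header_bytes;
	if (bg_size > max_bytes)
		return false;
	/* Widen before multiplying so the bit count cannot wrap. */
	return (unsigned long long)bg_bits <= 8ULL * max_bytes;
}
```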
    
    Validation reproduced this kernel report:
    KASAN use-after-free in _find_next_bit+0x7f/0xc0
    Read of size 8
    Call trace:
      dump_stack_lvl+0x66/0xa0 (?:?)
      print_report+0xd0/0x630 (?:?)
      _find_next_bit+0x7f/0xc0 (?:?)
      srso_alias_return_thunk+0x5/0xfbef5 (?:?)
      __virt_addr_valid+0x188/0x2f0 (?:?)
      kasan_report+0xe4/0x120 (?:?)
      ocfs2_find_max_contig_free_bits+0x35/0x70 (fs/ocfs2/suballoc.c:1375)
      ocfs2_block_group_set_bits+0x472/0x4b0 (fs/ocfs2/suballoc.c:1457)
      ocfs2_cluster_group_search+0x16b/0x440 (fs/ocfs2/suballoc.c:86)
      ocfs2_bg_discontig_fix_result+0x1ef/0x230 (fs/ocfs2/suballoc.c:1786)
      ocfs2_search_chain+0x8f8/0x10a0 (fs/ocfs2/suballoc.c:1886)
      get_page_from_freelist+0x70e/0x2370 (?:?)
      lock_release+0xc6/0x290 (?:?)
      do_raw_spin_unlock+0x9a/0x100 (?:?)
      kasan_unpoison+0x27/0x60 (?:?)
      __bfs+0x147/0x240 (?:?)
      get_page_from_freelist+0x83d/0x2370 (?:?)
      ocfs2_claim_suballoc_bits+0x38c/0xe70 (fs/ocfs2/suballoc.c:96)
      sched_domains_numa_masks_clear+0x70/0xd0 (?:?)
      check_irq_usage+0xe8/0xb70 (?:?)
      __ocfs2_claim_clusters+0x18d/0x4c0 (fs/ocfs2/suballoc.c:2497)
      check_path+0x24/0x50 (?:?)
      rcu_is_watching+0x20/0x50 (?:?)
      check_prev_add+0xfd/0xd00 (?:?)
      ocfs2_add_clusters_in_btree+0x17d/0x810 (fs/ocfs2/suballoc.c:?)
      __folio_batch_add_and_move+0x1f5/0x3d0 (?:?)
      ocfs2_add_inode_data+0xd9/0x120 (fs/ocfs2/suballoc.c:?)
      filemap_add_folio+0x105/0x1f0 (?:?)
      ocfs2_write_begin_nolock+0x29f7/0x2f80 (fs/ocfs2/suballoc.c:3043)
      ocfs2_read_inode_block+0xb5/0x110 (fs/ocfs2/suballoc.c:?)
      down_write+0xf5/0x180 (?:?)
      ocfs2_write_begin+0x180/0x240 (fs/ocfs2/suballoc.c:?)
      __mark_inode_dirty+0x758/0x9a0 (?:?)
      inode_to_bdi+0x41/0x90 (?:?)
      balance_dirty_pages_ratelimited_flags+0xf8/0x1d0 (?:?)
      generic_perform_write+0x252/0x440 (?:?)
      mnt_put_write_access_file+0x16/0x70 (?:?)
      file_update_time_flags+0xe4/0x200 (?:?)
      ocfs2_file_write_iter+0x80a/0x1320 (fs/ocfs2/suballoc.c:?)
      lock_acquire+0x184/0x2f0 (?:?)
      ksys_write+0xd2/0x170 (?:?)
      apparmor_file_permission+0xf5/0x310 (?:?)
      read_zero+0x8d/0x140 (?:?)
      lock_is_held_type+0x8f/0x100 (?:?)
    
    Link: https://lore.kernel.org/20260524111248.1429884-1-rollkingzzc@gmail.com
    Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
    Assisted-by: Codex:gpt-5.5
    Signed-off-by: Zhang Cen <rollkingzzc@gmail.com>
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: Heming Zhao <heming.zhao@suse.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

PCI/P2PDMA: Add Intel QAT, DSA, IAA devices to whitelist [+ + +]

Author: Lukas Wunner <lukas@wunner.de>
Date:   Thu Jun 4 17:12:08 2026 +0200

    PCI/P2PDMA: Add Intel QAT, DSA, IAA devices to whitelist
    
    commit 0ba76b19fd4c7256787eab0283c759b18eb76876 upstream.
    
    The first device on a PCI root bus determines whether the host bridge is
    whitelisted for P2PDMA.  All Intel Xeon chips since Ice Lake (ICX, 2021)
    expose a device with ID 0x09a2 as first device.  It is loosely associated
    with the IOMMU.  All these Xeon chips support P2PDMA, so since the addition
    of the device with commit feaea1fe8b36 ("PCI/P2PDMA: Add Intel 3rd Gen
    Intel Xeon Scalable Processors to whitelist"), P2PDMA has been allowed on
    all new Xeons without the need to amend the whitelist:
    
    Xeons with Performance Cores:
      Sapphire Rapids (SPR, 2023)
      Emerald Rapids (EMR, 2023)
      Granite Rapids (GNR, 2024)
      Diamond Rapids (DMR, 2026)
    
    Xeons with Efficiency Cores:
      Sierra Forest (SRF, 2024)
      Clearwater Forest (CWF, 2026)
    
    However, these Xeons also expose accelerators as the first device on a
    root bus of their own:
    
      QuickAssist Technology (QAT, crypto & compression accelerator)
      Data Streaming Accelerator (DSA, dma engine)
      In-Memory Analytics Accelerator (IAA, compression accelerator)
    
    Whitelist them for P2PDMA as well.  Move their Device ID macros from the
    accelerator drivers to <linux/pci_ids.h> for reuse by P2PDMA code.
    
    Unfortunately the Device IDs vary across Xeon generations as additional
    features were added to the accelerators.  This currently necessitates an
    amendment for each new Xeon chip.
    
    For future chips, this need shall be avoided by an ongoing effort to extend
    ACPI HMAT with PCIe P2PDMA characteristics (latency, bandwidth, ordering
    constraints).  The PCI core will be able to look up in this BIOS-provided ACPI
    table whether P2PDMA is supported, instead of relying on a whitelist that
    needs to be amended continuously.
    
    Signed-off-by: Lukas Wunner <lukas@wunner.de>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
    Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # QAT
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/6aac4922b5fe7070b11874427a9285e42ddd05a4.1780585518.git.lukas@wunner.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pNFS: Fix use-after-free in pnfs_update_layout() [+ + +]

Author: Wentao Liang <vulab@iscas.ac.cn>
Date:   Mon May 18 13:10:36 2026 +0000

    pNFS: Fix use-after-free in pnfs_update_layout()
    
    commit 13e198a90ca4050f4bee8a3f23680389a6563ccc upstream.
    
    When hitting the NFS_LAYOUT_RETURN branch in pnfs_update_layout(),
    the code calls pnfs_prepare_to_retry_layoutget(lo). If it succeeds,
    pnfs_put_layout_hdr(lo) is called before trace_pnfs_update_layout(),
    which still references 'lo'. This results in a use-after-free when the
    tracepoint accesses lo's fields.
    
    Fix this by moving the tracepoint call before pnfs_put_layout_hdr(lo).
    
    Fixes: 2c8d5fc37fe2 ("pNFS: Stricter ordering of layoutget and layoutreturn")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
    Signed-off-by: Anna Schumaker <anna.schumaker@hammerspace.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

power: reset: linkstation-poweroff: fix use-after-free in the linkstation_poweroff_init() [+ + +]

Author: Wentao Liang <vulab@iscas.ac.cn>
Date:   Tue Apr 7 07:30:25 2026 +0000

    power: reset: linkstation-poweroff: fix use-after-free in the linkstation_poweroff_init()
    
    commit 8eec545cde69e46e9a1d2b7d915ce4f5df85b3bd upstream.
    
    Move of_node_put(dn) after the of_match_node() call, which still needs
    the node pointer. The node reference is correctly released after use.
    
    Fixes: e2f471efe1d6 ("power: reset: linkstation-poweroff: prepare for new devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
    Link: https://patch.msgid.link/20260407073025.271865-1-vulab@iscas.ac.cn
    Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

pwrseq: core: fix use-after-free in pwrseq_debugfs_seq_next() [+ + +]

Author: Wentao Liang <vulab@iscas.ac.cn>
Date:   Tue Jun 16 15:10:49 2026 +0000

    pwrseq: core: fix use-after-free in pwrseq_debugfs_seq_next()
    
    commit 257595adf9dac15ae1edd9d07753fbc576a7583d upstream.
    
    pwrseq_debugfs_seq_next() declares 'next' with __free(put_device),
    which causes put_device() to be called on the returned pointer when
    the variable goes out of scope.  This results in a use-after-free
    since the seq_file framework receives a pointer whose reference has
    already been dropped.
    
    Simply removing __free(put_device) would fix the UAF but would leak
    the reference acquired by bus_find_next_device(), as stop() only
    calls up_read(&pwrseq_sem) and never releases the device reference.
    
    Fix this by making the reference counting consistent across all
    seq_file callbacks, matching the standard pattern used by PCI and
    SCSI:
    
    - start(): use get_device() so it returns a referenced pointer.
    - next(): explicitly put_device(curr) to release the previous
      device's reference (no NULL check needed - the seq_file framework
      only calls next() while the previous return was non-NULL).
    - stop(): put_device(data) to release the last iterated device's
      reference, with a NULL guard since stop() may be called with NULL
      when start() returned NULL or next() reached end-of-sequence.
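
    The three rules above can be modelled in user space.  In this sketch
    dev_get() and dev_put() stand in for get_device() and put_device(),
    struct bus and bus_find_next() are hypothetical, and start() always
    begins at the first device; it only demonstrates that the reference
    counts balance whether iteration ends early or runs to completion.

```c
#include <assert.h>
#include <stddef.h>

struct device {
	int refs;
};

struct bus {
	struct device *devs;
	size_t count;
};

struct device *dev_get(struct device *d)
{
	if (d)
		d->refs++;
	return d;
}

void dev_put(struct device *d)
{
	if (d)
		d->refs--;
}

/* Returns a referenced device after 'prev', or NULL at the end. */
struct device *bus_find_next(struct bus *bus, struct device *prev)
{
	size_t next = prev ? (size_t)(prev - bus->devs) + 1 : 0;

	return next < bus->count ? dev_get(&bus->devs[next]) : NULL;
}

/* start(): hand out a referenced pointer. */
struct device *seq_start(struct bus *bus)
{
	return bus_find_next(bus, NULL);
}

/* next(): take the next reference, then drop the previous one. */
struct device *seq_next(struct bus *bus, struct device *curr)
{
	struct device *next;

	assert(curr != NULL);
	next = bus_find_next(bus, curr);
	dev_put(curr);
	return next;
}

/* stop(): drop the last reference, if iteration ended on a device. */
void seq_stop(struct device *last)
{
	if (last)
		dev_put(last);
}
```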
    
    Cc: stable@vger.kernel.org
    Fixes: 249ebf3f65f8 ("power: sequencing: implement the pwrseq core")
    Signed-off-by: Wentao Liang <vulab@iscas.ac.cn>
    Link: https://patch.msgid.link/20260616151049.1705503-1-vulab@iscas.ac.cn
    Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block" [+ + +]

Author: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
Date:   Mon Jun 8 17:09:39 2026 +0800

    Revert "f2fs: remove non-uptodate folio from the page cache in move_data_block"
    
    commit ccaba785821970f422c47770331c7e3271763f17 upstream.
    
    This reverts commit 9609dd704725a40cd63d915f2ab6c44248a44598.
    
    Kernel panics keep being reported, especially when the f2fs partition
    is almost full. Investigation shows that an f2fs page was freed to the
    buddy allocator without being deleted from the LRU. The root cause is
    the race shown in [2], which was introduced by the reverted commit.
    
    Three contexts race in this scenario; their main activities are
    described below.
    
    The changed code in move_data_block() lets the GC path evict the tail-end
    folio from the page cache through folio_end_dropbehind().  Once
    folio_unmap_invalidate() removes the folio from mapping->i_pages, the
    page-cache references for all pages in the folio are dropped.  The folio
    is then kept alive only by temporary external references, which allows a
    later split to operate on a folio whose subpages are no longer protected
    by page-cache references.
    
    After the page-cache references are gone, split_folio_to_order() can
    split the big folio into individual pages and put the resulting subpages
    back on the LRU.  For tail pages beyond EOF, split removes them from the
    page cache and drops their page-cache references.  A tail page can then
    remain on the LRU with PG_lru set while holding only the split caller's
    temporary reference.  When free_folio_and_swap_cache() drops that final
    reference, the page enters the final folio_put() release path.
    
    In parallel, folio_isolate_lru() can observe the same tail page with a
    non-zero refcount and PG_lru set.  It clears PG_lru before taking its own
    reference.  If this races with the final folio_put() from the split path,
    __folio_put() sees PG_lru already cleared and skips lruvec_del_folio().
    The page is then freed back to the allocator while its lru links are
    still present in the LRU list.  A later LRU operation on a neighboring
    page detects the stale link and reports list corruption.
    
    [1]
    [   22.486082] list_del corruption. next->prev should be fffffffec10e0ac8, but was dead000000000122. (next=fffffffec10e0a88)
    [   22.486130] ------------[ cut here ]------------
    [   22.486134] kernel BUG at lib/list_debug.c:67!
    [   22.486141] Internal error: Oops - BUG: 00000000f2000800 [#1]  SMP
    [   22.488502] Tainted: [W]=WARN, [O]=OOT_MODULE
    [   22.488506] Hardware name: Spreadtrum UMS9230 1H10 SoC (DT)
    [   22.488511] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [   22.488517] pc : __list_del_entry_valid_or_report+0x14c/0x154
    [   22.488531] lr : __list_del_entry_valid_or_report+0x14c/0x154
    [   22.488539] sp : ffffffc08006b830
    [   22.488542] x29: ffffffc08006b868 x28: 0000000000003020 x27: 0000000000000000
    [   22.488553] x26: 0000000000000000 x25: 0000000000000004 x24: fffffffec10e0ac0
    [   22.488564] x23: 00000000000000e8 x22: 0000000000000024 x21: dead000000000122
    [   22.488574] x20: fffffffec10e0a88 x19: fffffffec10e0ac8 x18: ffffffc080061060
    [   22.488585] x17: 20747562202c3863 x16: 6130653031636566 x15: 0000000000000058
    [   22.488595] x14: 0000000000000004 x13: ffffff80f91e0000 x12: 0000000000000003
    [   22.488605] x11: 0000000000000003 x10: 0000000000000001 x9 : ffe85721f0e25f00
    [   22.488615] x8 : ffe85721f0e25f00 x7 : 0000000000000000 x6 : 6c65645f7473696c
    [   22.488625] x5 : ffffffed39b23026 x4 : 0000000000000000 x3 : 0000000000000010
    [   22.488636] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 000000000000006d
    [   22.488647] Call trace:
    [   22.488651]  __list_del_entry_valid_or_report+0x14c/0x154 (P)
    [   22.488661]  __folio_put+0x2bc/0x434
    [   22.488670]  folio_put+0x28/0x58
    [   22.488678]  do_garbage_collect+0x1a34/0x2584
    [   22.488689]  f2fs_gc+0x230/0x9b4
    [   22.488697]  f2fs_fallocate+0xb90/0xdf4
    [   22.488706]  vfs_fallocate+0x1b4/0x2bc
    [   22.488716]  __arm64_sys_fallocate+0x44/0x78
    [   22.488725]  invoke_syscall+0x58/0xe4
    [   22.488732]  do_el0_svc+0x48/0xdc
    [   22.488739]  el0_svc+0x3c/0x98
    [   22.488747]  el0t_64_sync_handler+0x20/0x130
    [   22.488754]  el0t_64_sync+0x1c4/0x1c8
    
    [2]
    CPU0 (f2fs GC)              CPU1 (split_folio_to_order)          CPU2 (folio_isolate_lru)
    
    F: pagecache refs = n
    F: extra refs = GC + split
    F: PG_lru set
    move_data_block()
    folio = f2fs_grab_cache_folio(F)
    ...
    __folio_set_dropbehind(F)
    folio_unlock(F)
    folio_end_dropbehind(F)
      folio_unmap_invalidate(F)
        __filemap_remove_folio(F)
        folio_put_refs(F, n)
    folio_put(F)
                                split_folio_to_order(F)
                                  folio_ref_freeze(F, 1)
                                  ...
                                  lru_add_split_folio(T)
                                    list_add_tail(&T->lru, &F->lru)
                                    folio_set_lru(T)
                                  __filemap_remove_folio(T)
                                  folio_put_refs(T, 1)
                                  /* T refcount == 1, PageLRU set */
                                                                      folio_isolate_lru(T)
                                                                        folio_test_clear_lru(T)
                                free_folio_and_swap_cache(T)
                                  folio_put(T)
                                    /* refcount: 1 -> 0 */
                                    __folio_put(T)
                                      __page_cache_release(T)
                                        folio_test_lru(T) == false
                                        /* skip lruvec_del_folio(T) */
                                      free_frozen_pages(T)
                                                                      folio_get(T)
                                                                      lruvec_del_folio(T)
    later:
      list_del(adjacent->lru)
        next == &T->lru
        next->prev == LIST_POISON / PCP freelist
        BUG
    
    Cc: stable@vger.kernel.org
    Fixes: 9609dd704725 ("f2fs: remove non-uptodate folio from the page cache in move_data_block")
    Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
    Reviewed-by: Chao Yu <chao@kernel.org>
    Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: kfence: Call mark_new_valid_map() for kfence_unprotect() [+ + +]

Author: Vivian Wang <wangruikang@iscas.ac.cn>
Date:   Tue Mar 3 13:29:46 2026 +0800

    riscv: kfence: Call mark_new_valid_map() for kfence_unprotect()
    
    commit 8d6c8c40e733b3fcaf92fed0a078bba2f6941a3b upstream.
    
    In kfence_protect_page(), which kfence_unprotect() calls, we cannot send
    IPIs to other CPUs to ask them to flush TLB. This may lead to those CPUs
    spuriously faulting on a recently allocated kfence object despite it
    being valid, leading to false positive use-after-free reports.
    
    Fix this by calling mark_new_valid_map() so that the page fault handling
    code path notices the spurious fault, flushes the TLB, and then retries
    the access.
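    
    A simplified userspace model of the mark-and-check idea, assuming one
    bit per CPU in a single 64-bit word. The names below are hypothetical
    and the real fault path differs in detail; this is an illustration,
    not the kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one "mapping became valid" bit per CPU (max 64). */
static uint64_t model_new_valid_map_cpus;

/* Called after a mapping becomes valid without a remote TLB flush. */
static void model_mark_new_valid_map(void)
{
	model_new_valid_map_cpus = UINT64_MAX;
}

/* Fault path: true means "flush the local TLB and retry the access". */
static bool model_fault_is_spurious(unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << cpu;

	if (!(model_new_valid_map_cpus & bit))
		return false;			/* no pending mark: real fault */
	model_new_valid_map_cpus &= ~bit;	/* consume this CPU's mark */
	return true;
}
```

    Each CPU consumes its own mark once, so a second fault on the same
    address with no new mark is treated as a real fault.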
    
    Update the comment in handle_exception to indicate that
    new_valid_map_cpus_check also handles kfence_unprotect() spurious
    faults.
    
    Note that kfence_protect() has the same stale TLB entries problem, but
    that leads to false negatives, which is fine with kfence.
    
    Cc: stable@vger.kernel.org
    Reported-by: Yanko Kaneti <yaneti@declera.com>
    Fixes: b3431a8bb336 ("riscv: Fix IPIs usage in kfence_protect_page()")
    Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
    Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-2-f80d8354d79d@iscas.ac.cn
    Signed-off-by: Paul Walmsley <pjw@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: mm: Extract helper mark_new_valid_map() [+ + +]

Author: Vivian Wang <wangruikang@iscas.ac.cn>
Date:   Tue Mar 3 13:29:45 2026 +0800

    riscv: mm: Extract helper mark_new_valid_map()
    
    commit 9ee25d0a70ff4494b4e1d266b962d0a574ef318a upstream.
    
    In preparation for a future patch using the same mechanism for
    non-vmalloc addresses, extract the mark_new_valid_map() helper from
    flush_cache_vmap().
    
    No functional change intended.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
    Link: https://patch.msgid.link/20260303-handle-kfence-protect-spurious-fault-v2-1-f80d8354d79d@iscas.ac.cn
    Signed-off-by: Paul Walmsley <pjw@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rpmsg: char: Fix use-after-free on probe error path [+ + +]

Author: Yuho Choi <dbgh9129@gmail.com>
Date:   Mon Jun 1 14:32:47 2026 -0400

    rpmsg: char: Fix use-after-free on probe error path
    
    commit 1ff3f528e67d20e2b1483dcaba899dc7832b2e6b upstream.
    
    rpmsg_chrdev_probe() stores the newly allocated eptdev in the default
    endpoint's priv pointer before calling rpmsg_chrdev_eptdev_add(). If
    rpmsg_chrdev_eptdev_add() then fails, its error path frees eptdev while
    the default endpoint may still dispatch callbacks with the stale priv
    pointer.
    
    Avoid publishing eptdev through the default endpoint until
    rpmsg_chrdev_eptdev_add() succeeds. Messages received before the priv
    pointer is published should be ignored by rpmsg_ept_cb(). Flow-control
    updates can hit rpmsg_ept_flow_cb() in the same window, so make both
    callbacks return success when priv is NULL.
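    
    A minimal userspace sketch of the publish-last pattern described
    above. The model_* types and functions are hypothetical stand-ins,
    and add_ret simulates the result of rpmsg_chrdev_eptdev_add(); this
    is not the actual patch:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the eptdev and the default endpoint. */
struct model_eptdev { int rx_count; };
struct model_ept { void *priv; };

/* Receive callback: ignore messages until priv has been published. */
static int model_ept_cb(struct model_ept *ept)
{
	struct model_eptdev *eptdev = ept->priv;

	if (!eptdev)
		return 0;	/* not published yet: drop, report success */
	eptdev->rx_count++;
	return 0;
}

/* Probe: publish eptdev only after the simulated add step succeeds. */
static int model_probe(struct model_ept *ept, int add_ret)
{
	struct model_eptdev *eptdev = calloc(1, sizeof(*eptdev));

	if (!eptdev)
		return -1;
	if (add_ret) {
		free(eptdev);	/* priv was never set, nothing goes stale */
		return add_ret;
	}
	ept->priv = eptdev;	/* publish last */
	return 0;
}
```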
    
    Fixes: bc69d1066569 ("rpmsg: char: Introduce the "rpmsg-raw" channel")
    Signed-off-by: Yuho Choi <dbgh9129@gmail.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20260601183247.1962010-1-dbgh9129@gmail.com
    Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path [+ + +]

Author: Rik van Riel <riel@surriel.com>
Date:   Tue Jun 16 16:38:17 2026 -0400

    sched/mmcid: Fix OOB clear_bit when CID is MM_CID_UNSET in fixup path
    
    commit de3ab9bd3133899efb92e4cd05ba4203e58fc0a3 upstream.
    
    In mm_cid_fixup_cpus_to_tasks(), when rq->curr has the target mm and
    mm_cid.active is set, the CID is checked with cid_in_transit() before
    setting the transition bit.  In per-CPU mode a newly forked or exec'd
    task can be running with mm_cid.cid == MM_CID_UNSET because CIDs are
    assigned lazily on schedule-in.  With cid_in_transit() the guard passes
    for MM_CID_UNSET (no transit bit), converts it to MM_CID_UNSET |
    MM_CID_TRANSIT and stores it back; later mm_cid_schedout() feeds this
    to clear_bit() with MM_CID_UNSET as the bit number, triggering an
    out-of-bounds write.
    
    Symptoms: this is genuine memory corruption, but a bounded out-of-bounds
    write, not an arbitrary one.  MM_CID_UNSET is the fixed sentinel BIT(31),
    so once the bad value reaches mm_cid_schedout() the cid_from_transit_cid()
    strip leaves MM_CID_UNSET, which fails the "cid < max_cids" convergence
    test and falls into mm_drop_cid() -> clear_bit(MM_CID_UNSET,
    mm_cidmask(mm)).  The cid bitmap is embedded in the mm_struct slab object
    (after cpu_bitmap and mm_cpus_allowed) and is only num_possible_cpus()
    bits wide, so clearing bit 2^31 is a deterministic OOB bit-clear at a
    fixed offset of 2^31 / 8 == 256 MiB past the bitmap base.  The address is
    not attacker-influenced (fixed sentinel -> fixed offset) and the op only
    clears a single bit; what sits 256 MiB further along the direct map is
    whatever kernel object happens to live there, so this corrupts one bit of
    unpredictable kernel memory -- it is not an arbitrary-address or
    arbitrary-value write.
    
    It triggers only in per-CPU CID mode, when a CPU is running an active
    task of the target mm whose cid is still MM_CID_UNSET -- the
    fork()/execve() window before that task's next schedule-in assigns it a
    real CID -- and a per-CPU -> per-task fixup walks over it (the mode
    fallback driven by a thread exit, sched_mm_cid_exit(), or by the deferred
    max_cids recompute in mm_cid_work_fn()).
    
    In practice syzkaller surfaced it as a KASAN use-after-free reported in
    __schedule -> mm_cid_switch_to, where the offending clear_bit() is inlined
    via mm_cid_schedout() -> mm_drop_cid().
    
    Guard the transition-bit assignment against MM_CID_UNSET, in addition to
    the existing cid_in_transit() check, so the bit is only set on a genuine
    task-owned CID.  A CPU-owned (MM_CID_ONCPU) CID of a running active task
    is handled by the cid_on_cpu(pcp->cid) branch above and never reaches
    this path, so excluding MM_CID_UNSET (and the already-transitioning case)
    is sufficient.
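    
    The offset arithmetic and the extra guard can be sketched in plain C.
    MODEL_CID_UNSET mirrors the BIT(31) sentinel named above; the bit
    position chosen for MODEL_CID_TRANSIT and all model_* helpers are
    assumptions made for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_CID_UNSET   (UINT32_C(1) << 31)	/* sentinel per the text */
#define MODEL_CID_TRANSIT (UINT32_C(1) << 29)	/* assumed bit position */

/* Byte offset touched when clearing bit number nr in a bitmap. */
static uint64_t model_bit_byte_offset(uint32_t nr)
{
	return (uint64_t)nr / 8;
}

static bool model_cid_in_transit(uint32_t cid)
{
	return (cid & MODEL_CID_TRANSIT) != 0;
}

/* Old guard: transit check only, so the unset sentinel slips through. */
static uint32_t model_fixup_old(uint32_t cid)
{
	if (!model_cid_in_transit(cid))
		cid |= MODEL_CID_TRANSIT;
	return cid;
}

/* New guard: additionally leave the unset sentinel untouched. */
static uint32_t model_fixup_new(uint32_t cid)
{
	if (cid != MODEL_CID_UNSET && !model_cid_in_transit(cid))
		cid |= MODEL_CID_TRANSIT;
	return cid;
}
```

    With these definitions, model_bit_byte_offset(MODEL_CID_UNSET) is
    2^31 / 8 bytes, i.e. the 256 MiB figure quoted above.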
    
    Fixes: fbd0e71dc370 ("sched/mmcid: Provide CID ownership mode fixup functions")
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Signed-off-by: Thomas Gleixner <tglx@kernel.org>
    Assisted-by: Claude:claude-opus-4-8 syzkaller
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Cc: stable@vger.kernel.org
    Link: https://patch.msgid.link/20260616203818.1516263-1-riel@surriel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done [+ + +]

Author: Doruk Tan Ozturk <doruk@0sec.ai>
Date:   Wed Jun 17 09:58:18 2026 +0200

    tipc: fix slab-use-after-free Read in tipc_aead_decrypt_done
    
    commit bda3348872a2ef0d19f2df6aa8cb5025adce2f20 upstream.
    
    tipc_aead_decrypt() goes straight from tipc_bearer_hold(b) to
    crypto_aead_decrypt(req) without taking a reference on the netns, unlike
    the encrypt path. When crypto_aead_decrypt() is offloaded asynchronously
    (e.g. the SIMD aead wrapper queuing to cryptd), the cryptd worker runs
    tipc_aead_decrypt_done() later. If the bearer's netns is torn down in the
    meantime, cleanup_net() -> tipc_exit_net() -> tipc_crypto_stop() frees the
    per-netns tipc_crypto, and the completion then reads it:
    tipc_aead_decrypt_done() dereferences aead->crypto->stats and
    aead->crypto->net, and tipc_crypto_rcv_complete() dereferences
    aead->crypto->aead[] and the node table -- reading freed memory.
    
    Decoded KASAN splat (v7.1-rc7, CONFIG_KASAN_INLINE + TIPC + TIPC_CRYPTO):
    
      BUG: KASAN: slab-use-after-free in tipc_aead_decrypt_done (net/tipc/crypto.c:999)
      Read of size 8 at addr ffff8881056258a8 by task kworker/u16:2/51
      Workqueue: events_unbound
      Call Trace:
       tipc_aead_decrypt_done (net/tipc/crypto.c:999)
       process_one_work (kernel/workqueue.c:3314)
       worker_thread (kernel/workqueue.c:3397 kernel/workqueue.c:3478)
       kthread (kernel/kthread.c:436)
       ret_from_fork (arch/x86/kernel/process.c:158)
       ret_from_fork_asm (arch/x86/entry/entry_64.S:245)
    
      Allocated by task 169:
       __kasan_kmalloc (mm/kasan/common.c:398 mm/kasan/common.c:415)
       tipc_crypto_start (net/tipc/crypto.c:1502)
       tipc_init_net (net/tipc/core.c:72)
       ops_init (net/core/net_namespace.c:137)
       setup_net (net/core/net_namespace.c:446)
       copy_net_ns (net/core/net_namespace.c:579)
       create_new_namespaces (kernel/nsproxy.c:132)
       __x64_sys_unshare (kernel/fork.c:3316)
       do_syscall_64 (arch/x86/entry/syscall_64.c:63)
       entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
    
      Freed by task 8:
       kfree (mm/slub.c:6566)
       tipc_exit_net (net/tipc/core.c:119)
       cleanup_net (net/core/net_namespace.c:704)
       process_one_work (kernel/workqueue.c:3314)
       kthread (kernel/kthread.c:436)
    
    This is the same class of bug that commit e279024617134 ("net/tipc: fix
    slab-use-after-free Read in tipc_aead_encrypt_done") fixed for the encrypt
    side. The encrypt path takes maybe_get_net(aead->crypto->net) before
    crypto_aead_encrypt() and drops it with put_net() on the synchronous
    return paths and in tipc_aead_encrypt_done(); the -EINPROGRESS/-EBUSY
    return keeps the reference for the async callback to release. The decrypt
    path was left without the equivalent guard.
    
    Mirror the encrypt-side fix on the decrypt path: take a net reference
    before crypto_aead_decrypt() (failing with -ENODEV and the matching
    bearer put if it cannot be acquired), keep it across the
    -EINPROGRESS/-EBUSY async return, and drop it with put_net() on the
    synchronous success/error return and at the end of
    tipc_aead_decrypt_done().
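    
    A userspace sketch of the reference handoff described above, assuming
    a plain integer refcount. The model_* names are hypothetical and
    crypto_ret simulates the return value of crypto_aead_decrypt(); the
    bearer put on the -ENODEV path is not modelled:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the netns reference count. */
struct model_net { int refs; };

static bool model_maybe_get_net(struct model_net *net)
{
	if (net->refs == 0)
		return false;	/* already being torn down */
	net->refs++;
	return true;
}

static void model_put_net(struct model_net *net)
{
	net->refs--;
}

/* Completion callback: releases the reference kept for the async case. */
static void model_decrypt_done(struct model_net *net)
{
	model_put_net(net);
}

static int model_decrypt(struct model_net *net, int crypto_ret)
{
	if (!model_maybe_get_net(net))
		return -ENODEV;
	if (crypto_ret == -EINPROGRESS || crypto_ret == -EBUSY)
		return crypto_ret;	/* the callback owns the reference */
	model_put_net(net);		/* synchronous success or error */
	return crypto_ret;
}
```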
    
    Reproduced under KASAN on v7.1-rc7: a UDP bearer with a cluster key is
    flooded with crafted encrypted frames from an unknown peer (driving the
    cluster-key decrypt path) while the bearer's netns is repeatedly torn
    down. The completion must run asynchronously to outlive
    tipc_crypto_stop(); on x86 the stock aesni gcm(aes) now decrypts
    synchronously, so the async path was exercised via cryptd offload. The
    unguarded aead->crypto dereference in tipc_aead_decrypt_done() is the
    unpatched upstream path; tipc_aead_decrypt() still lacks
    maybe_get_net(aead->crypto->net), so the completion can outlive the free
    on any config where crypto_aead_decrypt() goes async.
    
    Found by 0sec automated security-research tooling (https://0sec.ai).
    
    Fixes: fc1b6d6de220 ("tipc: introduce TIPC encryption & authentication")
    Cc: stable@vger.kernel.org
    Signed-off-by: Doruk Tan Ozturk <doruk@0sec.ai>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Reviewed-by: Tung Nguyen <tung.quang.nguyen@est.tech>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Link: https://patch.msgid.link/20260617075818.37431-1-doruk@0sec.ai
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks [+ + +]

Author: Kiryl Shutsemau (Meta) <kas@kernel.org>
Date:   Fri May 29 18:23:30 2026 +0100

    userfaultfd: build __VMA_UFFD_FLAGS from config-gated masks
    
    commit cc7a9f6e57c4f71e8e1fee3274b1ae8770f2a743 upstream.
    
    The VMA flags bitmap is a single word today: NUM_VMA_FLAG_BITS is
    BITS_PER_LONG, so on 32-bit vma_flags_t holds only 32 bits.  (The bitmap
    type exists so this can grow past BITS_PER_LONG later; until it does,
    anything declared above the first word is out of range on 32-bit.) The bit
    enum nevertheless declares some bits unconditionally above BITS_PER_LONG
    -- VMA_UFFD_MINOR_BIT is 41, with VM_UFFD_MINOR == VM_NONE on 32-bit so no
    VMA actually carries the bit.
    
    __VMA_UFFD_FLAGS feeds VMA_UFFD_MINOR_BIT to mk_vma_flags()
    unconditionally.  On 32-bit that becomes __set_bit(41, &one_long), a write
    one word past the end of the single-word bitmap.  The compiler folds the
    out-of-bounds store with wraparound (1UL << (41 % 32) == bit 9) into the
    first word; bit 9 is already in __VMA_UFFD_FLAGS so the mask happens to
    come out right today, but it is an out-of-bounds write all the same, and
    any high-numbered bit whose mod-BITS_PER_LONG position is otherwise unused
    would silently OR an extra bit into the mask.
    
    Rather than feed bit numbers that may not exist on the current build to
    mk_vma_flags(), build the mask from whole per-mode masks that collapse to
    EMPTY_VMA_FLAGS when their feature is unavailable.  Add
    mk_vma_flags_from_masks() for that, and define VMA_UFFD_MISSING / _WP /
    _MINOR alongside the VM_UFFD_* flags, gating VMA_UFFD_MINOR on the same
    config as VM_UFFD_MINOR (which implies 64BIT, where bit 41 fits).  An
    out-of-range bit is then never materialised, on any arch, and the in-range
    fast path stays a compile-time constant.
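    
    The wraparound arithmetic and the mask-based construction can be
    illustrated with a 32-bit word model. The MODEL_* masks and helper
    names are hypothetical, the bit positions are illustrative, and the
    real mk_vma_flags_from_masks() may look different:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_WORD_BITS      32u	/* models a 32-bit build */
#define MODEL_UFFD_MINOR_BIT 41u	/* bit number from the text above */

/* Where a wrapped single-word shift would land an out-of-range bit. */
static uint32_t model_wrapped_bit(unsigned int nr)
{
	return UINT32_C(1) << (nr % MODEL_WORD_BITS);
}

/* Hypothetical per-mode masks; bit positions are illustrative only. */
#define MODEL_MASK_MISSING (UINT32_C(1) << 9)
#define MODEL_MASK_WP      (UINT32_C(1) << 12)
#define MODEL_MASK_MINOR   UINT32_C(0)	/* feature unavailable: empty */

/* OR whole masks together; an unavailable feature contributes nothing. */
static uint32_t model_flags_from_masks(const uint32_t *masks, int n)
{
	uint32_t out = 0;

	for (int i = 0; i < n; i++)
		out |= masks[i];
	return out;
}
```

    Because the unavailable mode contributes an empty mask, bit 41 is
    never turned into a shift at all in this model.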
    
    Link: https://lore.kernel.org/20260529172331.356655-7-kas@kernel.org
    Fixes: 9ea35a25d51b ("mm: introduce VMA flags bitmap type")
    Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
    Reported-by: Sashiko AI review <sashiko-bot@kernel.org>
    Suggested-by: Lorenzo Stoakes <ljs@kernel.org>
    Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>
    Assisted-by: Claude:claude-opus-4-8
    Cc: David Hildenbrand <david@kernel.org>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Suren Baghdasaryan <surenb@google.com>
    Cc: Vlastimil Babka <vbabka@kernel.org>
    Cc: Balbir Singh <balbirs@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

userfaultfd: ensure mremap_userfaultfd_fail() releases mmap_changing [+ + +]

Author: Mike Rapoport (Microsoft) <rppt@kernel.org>
Date:   Wed May 13 11:14:16 2026 +0300

    userfaultfd: ensure mremap_userfaultfd_fail() releases mmap_changing
    
    commit 0496a59745b0723ea74274db16fd5c8b1379b9a9 upstream.
    
    Sashiko says:
    
      mremap_userfaultfd_prep() increments ctx->mmap_changing to stall
      concurrent operations, but mremap_userfaultfd_fail() does not
      decrement it before dropping the context reference.
    
    If an mremap operation fails, ctx->mmap_changing remains elevated. This
    causes subsequent userfaultfd operations such as UFFDIO_COPY to fail
    with -EAGAIN.
    
    Decrement ctx->mmap_changing in mremap_userfaultfd_fail().
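    
    A small userspace model of the counter imbalance, assuming a plain
    integer in place of the real counter. The model_* names are
    hypothetical and only illustrate why the failure path must undo the
    increment:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the userfaultfd context counter. */
struct model_ctx { int mmap_changing; };

static void model_mremap_prep(struct model_ctx *ctx)
{
	ctx->mmap_changing++;	/* stall concurrent operations */
}

/* Failure path: undo the increment taken in prep. */
static void model_mremap_fail(struct model_ctx *ctx)
{
	ctx->mmap_changing--;
}

/* Models an operation that backs off while a change is pending. */
static int model_copy(const struct model_ctx *ctx)
{
	return ctx->mmap_changing ? -EAGAIN : 0;
}
```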
    
    Link: https://sashiko.dev/#/patchset/20260430113512.115938-1-rppt@kernel.org
    Link: https://lore.kernel.org/20260513081416.495963-1-rppt@kernel.org
    Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
    Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
    Reviewed-by: David Hildenbrand (Arm) <david@kernel.org>
    Cc: Al Viro <viro@zeniv.linux.org.uk>
    Cc: Christian Brauner <brauner@kernel.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: ath11k: fix warning when unbinding [+ + +]

Author: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
Date:   Mon Apr 20 13:01:29 2026 +0200

    wifi: ath11k: fix warning when unbinding
    
    commit 8b7a26b6681922a38cd5a7829ace61f8e54df9b7 upstream.
    
    If firmware-related initialization fails, the dp->tx_ring[i].tx_status
    buffers are released. However, they are released again when the device
    is unbound (ath11k_pci), and we get:
    WARNING: CPU: 0 PID: 6231 at mm/slub.c:4368 free_large_kmalloc+0x57/0x90
    Call Trace:
    free_large_kmalloc
    ath11k_dp_free
    ath11k_core_deinit
    ath11k_pci_remove
    ...
    
    The issue is always reproducible from a VM because the MSI addressing
    initialization is failing.
    
    To fix the issue, set the buffers to NULL after releasing them so that
    the second release does not free them again.
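    
    The idiom in userspace terms, using free() in place of kfree() (both
    accept NULL as a no-op). The ring count and model_* names are
    hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_NUM_RINGS 3	/* arbitrary count for illustration */

struct model_ring { void *tx_status; };

/* Safe to call more than once: freeing NULL is a no-op. */
static void model_free_rings(struct model_ring *rings)
{
	for (int i = 0; i < MODEL_NUM_RINGS; i++) {
		free(rings[i].tx_status);
		rings[i].tx_status = NULL;	/* prevents a second free */
	}
}
```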
    
    Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
    Cc: stable@vger.kernel.org
    Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
    Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com>
    Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com>
    Link: https://patch.msgid.link/20260420110130.509670-1-jtornosm@redhat.com
    Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: iwlwifi: mld: fix race condition in PTP removal [+ + +]

Author: Junjie Cao <junjie.cao@intel.com>
Date:   Thu Feb 12 20:50:35 2026 +0800

    wifi: iwlwifi: mld: fix race condition in PTP removal
    
    commit e1fc08598aa34b28359831e768076f56632720c1 upstream.
    
    iwl_mld_ptp_remove() calls cancel_delayed_work_sync() only after
    ptp_clock_unregister() and clearing ptp_data state (ptp_clock,
    last_gp2, wrap_counter).
    
    This creates a race where the delayed work iwl_mld_ptp_work() can
    execute between ptp_clock_unregister() and cancel_delayed_work_sync(),
    observing partially cleared PTP state.
    
    Move cancel_delayed_work_sync() before ptp_clock_unregister() to
    ensure the delayed work is fully stopped before any PTP cleanup
    begins.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: Junjie Cao <junjie.cao@intel.com>
    Link: https://patch.msgid.link/20260212125035.1345718-2-junjie.cao@intel.com
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: iwlwifi: mld: validate sta_mask before ffs() in BA session handlers [+ + +]

Author: Junrui Luo <moonafterrain@outlook.com>
Date:   Thu Apr 2 14:48:07 2026 +0800

    wifi: iwlwifi: mld: validate sta_mask before ffs() in BA session handlers
    
    commit f056fc2b927448d37eca6b6cacc3d1b0f67b20d2 upstream.
    
    Three BA session handlers use ffs(ba_data->sta_mask) - 1 to derive a
    station ID without checking that sta_mask is non-zero. When sta_mask is
    zero, ffs() returns 0 and the subtraction wraps to 0xFFFFFFFF, causing
    an out-of-bounds access on fw_id_to_link_sta[].
    
    Add WARN_ON_ONCE(!ba_data->sta_mask) guards before each ffs() call,
    consistent with the existing check in iwl_mld_ampdu_rx_start().
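    
    A standalone sketch of why the zero check matters. model_ffs() is a
    portable stand-in with the same contract as ffs() (0 for no bits set,
    otherwise a 1-based index); the table size and helper names are
    hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_STA_COUNT 16	/* hypothetical table size */

/* Same contract as ffs(): 0 if no bit is set, else 1-based position. */
static int model_ffs(uint32_t word)
{
	int pos = 1;

	if (!word)
		return 0;
	while (!(word & 1u)) {
		word >>= 1;
		pos++;
	}
	return pos;
}

/* Returns a valid station index, or -1 if the mask cannot yield one. */
static int model_sta_id_from_mask(uint32_t sta_mask)
{
	int sta_id;

	if (!sta_mask)
		return -1;	/* model_ffs(0) - 1 would be -1 */
	sta_id = model_ffs(sta_mask) - 1;
	if (sta_id >= MODEL_STA_COUNT)
		return -1;
	return sta_id;
}
```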
    
    Reported-by: Yuhao Jiang <danisjiang@gmail.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Junrui Luo <moonafterrain@outlook.com>
    Link: https://patch.msgid.link/SYBPR01MB788115C6CE873271A9A15A25AF51A@SYBPR01MB7881.ausprd01.prod.outlook.com
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: iwlwifi: mvm: fix race condition in PTP removal [+ + +]

Author: Junjie Cao <junjie.cao@intel.com>
Date:   Thu Feb 12 20:50:34 2026 +0800

    wifi: iwlwifi: mvm: fix race condition in PTP removal
    
    commit 65150c9cc3e06ab54bc4e8134a47f6f5d095a4e3 upstream.
    
    iwl_mvm_ptp_remove() calls cancel_delayed_work_sync() only after
    ptp_clock_unregister() and clearing ptp_data state (ptp_clock,
    ptp_clock_info, last_gp2).
    
    This creates a race where the delayed work iwl_mvm_ptp_work() can
    execute between ptp_clock_unregister() and cancel_delayed_work_sync(),
    observing partially cleared PTP state.
    
    Move cancel_delayed_work_sync() before ptp_clock_unregister() to
    ensure the delayed work is fully stopped before any PTP cleanup
    begins.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Simon Horman <horms@kernel.org>
    Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: Junjie Cao <junjie.cao@intel.com>
    Link: https://patch.msgid.link/20260212125035.1345718-1-junjie.cao@intel.com
    Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: mt76: add wcid publish check in mt76_sta_add [+ + +]

Author: Jiajia Liu <liujiajia@kylinos.cn>
Date:   Thu May 28 11:38:14 2026 +0800

    wifi: mt76: add wcid publish check in mt76_sta_add
    
    commit 20b126920a259df4d7dcae19fcfe2c57a74d6b2e upstream.
    
    Since mt7925_mac_sta_add() already publishes the wcid, add a publish
    check in mt76_sta_add() so that wcid->poll_list is not reinitialized.
    
    dev->sta_poll_list corruption was observed with mt7925 on 7.1-rc4.
    According to the corruption report, prev->next pointed back to prev.
    
    wlan0: disconnect from AP 90:fb:5d:94:8b:e3 for new auth to 90:fb:5d:94:8b:e2
    wlan0: authenticate with 90:fb:5d:94:8b:e2 (local address=84:9e:56:9c:7e:6b)
    wlan0: send auth to 90:fb:5d:94:8b:e2 (try 1/3)
     slab kmalloc-8k start ffff8c80958a6000 pointer offset 4160 size 8192
    list_add corruption. prev->next should be next (ffff8c808a7488f8), but was ffff8c80958a7040. (prev=ffff8c80958a7040).
    
     mt76_wcid_add_poll+0x95/0xd0 [mt76]
     mt7925_mac_add_txs.part.0+0xa5/0xe0 [mt7925_common]
     mt7925_rx_check+0xa7/0xc0 [mt7925_common]
     mt76_dma_rx_poll+0x50d/0x790 [mt76]
     mt792x_poll_rx+0x52/0xe0 [mt792x_lib]
    
    Signed-off-by: Jiajia Liu <liujiajia@kylinos.cn>
    Link: https://patch.msgid.link/20260528033814.46418-1-liujiajia@kylinos.cn
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: mt76: mt76x2u: Add support for ELECOM WDC-867SU3S [+ + +]

Author: Zenm Chen <zenmchen@gmail.com>
Date:   Tue Apr 7 23:44:30 2026 +0800

    wifi: mt76: mt76x2u: Add support for ELECOM WDC-867SU3S
    
    commit f4ce0664e9f0387873b181777891741c33e19465 upstream.
    
    Add the ID 056e:400a to the table to support an additional MT7612U
    adapter: ELECOM WDC-867SU3S.
    
    Compile tested only.
    
    Cc: stable@vger.kernel.org # 5.10.x
    Signed-off-by: Zenm Chen <zenmchen@gmail.com>
    Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
    Link: https://patch.msgid.link/20260407154430.9184-1-zenmchen@gmail.com
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: mt76: mt7925: don't disable AP BSS when removing TDLS peer [+ + +]

Author: ElXreno <elxreno@gmail.com>
Date:   Wed May 6 04:39:16 2026 +0300

    wifi: mt76: mt7925: don't disable AP BSS when removing TDLS peer
    
    commit 37d65384aa6f9cbe45f4052b13b378af1aab3e95 upstream.
    
    On a STATION vif, removing a TDLS peer takes the mt7925_mac_sta_remove
    -> mt7925_mac_sta_remove_links path. The first loop in that function
    calls mt7925_mcu_add_bss_info(..., enable=false) for every link of the
    station being removed. For a non-MLO STATION vif there is exactly one
    link, link 0, whose bss_conf is the AP's. TDLS peers do not have their
    own bss_conf - they share the AP's BSS.
    
    The result is that every TDLS peer teardown sends a BSS_INFO_UPDATE
    with enable=0 for the AP's BSS to the firmware, which wipes the AP-side
    rate-control context. The connection stays associated and TX from the
    host still works at the negotiated rate, but the AP's downlink to us
    collapses to the lowest mandatory OFDM rate (HE-MCS 0 / 6 Mbit/s OFDM)
    and only slowly recovers as rate adaptation re-learns under sustained
    traffic. With brief or bursty traffic the link can stay at 6-72 Mbit/s
    indefinitely, requiring a manual reconnect.
    
    mt7925_mac_link_sta_remove() already guards its own
    mt7925_mcu_add_bss_info(..., false) call with
    "vif->type == NL80211_IFTYPE_STATION && !link_sta->sta->tdls".
    Add the equivalent guard at the top of the cleanup loop in
    mt7925_mac_sta_remove_links(), above the link_sta / link_conf /
    mlink / mconf lookups, so TDLS peer teardown skips the loop body
    entirely without doing the per-link work that would just be thrown
    away.
    
    Verified on mt7925e by triggering Samsung-S938B auto-TDLS via iperf3
    and watching iw rx bitrate after teardown:
    
      Before: rx bitrate collapses to 6.0-72.0 Mbit/s, oscillates 17/72/
              137/288/432 Mbit/s for 30+ seconds, no full recovery without
              a manual reassoc.
      After:  rx bitrate stays at 1200.9 Mbit/s HE-MCS 11 NSS 2 80 MHz
              across the entire TDLS lifecycle.
    
    bpftrace confirms a single mt7925_mcu_add_bss_info(enable=0) call per
    teardown before the fix; zero such calls after.
    
    Fixes: 3878b4333602 ("wifi: mt76: mt7925: update mt7925_mac_link_sta_[add, assoc, remove] for MLO")
    Cc: stable@vger.kernel.org
    Signed-off-by: ElXreno <elxreno@gmail.com>
    Assisted-by: Claude:claude-opus-4-7 bpftrace
    Link: https://patch.msgid.link/20260506-mt7925-tdls-fixes-v2-2-46aa826ba8bb@gmail.com
    Signed-off-by: Felix Fietkau <nbd@nbd.name>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtl8xxxu: Detect the maximum supported channel width [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Wed Apr 29 15:02:48 2026 +0300

    wifi: rtl8xxxu: Detect the maximum supported channel width
    
    commit ef771eabc79d5f21b63689cca0e0fa5493fa0a8a upstream.
    
    Some devices malfunction when connected to a network with 40 MHz channel
    width, because they do not support it.
    
    RTL8188FU, RTL8192FU, and RTL8710BU (RTL8188GU) have a way to signal
    this (and some other capabilities) to the driver. Get this information
    from the hardware and advertise 40 MHz support only when the hardware
    can handle it. We assume the other chips can always handle it.
    
    RTL8710BU needs a different way to retrieve this information, which will
    be implemented some other time.
    
    Fixes: dbf9b7bb0edf ("wifi: rtl8xxxu: Enable 40 MHz width by default")
    Cc: stable@vger.kernel.org
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221394
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Reviewed-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/c57de68e-5d57-4c26-898f-8a284bb25381@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtlwifi: rtl8821ae: Fix C2H bit location in RX descriptor [+ + +]

Author: Bitterblue Smith <rtl8821cerfe2@gmail.com>
Date:   Sat Apr 25 22:32:58 2026 +0300

    wifi: rtlwifi: rtl8821ae: Fix C2H bit location in RX descriptor
    
    commit 83d38df6929118c3f996b9e3351c2d5014073d87 upstream.
    
    Bit 28 of double word 2 in the RX descriptor indicates whether the
    packet is a normal 802.11 frame or a message from the wifi firmware to
    the driver (Card 2 Host).
    
    Commit f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation
    macros") mistakenly made the driver look for this bit in double word 1,
    causing packet loss and Bluetooth coexistence problems.
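    
    A minimal sketch of the two lookups, assuming the descriptor double
    words have already been converted to CPU byte order. The model_* names
    are hypothetical and do not reflect the driver's actual accessors:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_C2H_BIT 28u	/* bit position from the text above */

/* desc: RX descriptor double words, already in CPU byte order. */
static bool model_rx_is_c2h(const uint32_t *desc)
{
	return ((desc[2] >> MODEL_C2H_BIT) & 1u) != 0;	/* double word 2 */
}

/* The mistaken lookup described above, kept for comparison. */
static bool model_rx_is_c2h_wrong(const uint32_t *desc)
{
	return ((desc[1] >> MODEL_C2H_BIT) & 1u) != 0;	/* double word 1 */
}
```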
    
    Fixes: f5678bfe1cdc ("rtlwifi: rtl8821ae: Replace local bit manipulation macros")
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Bitterblue Smith <rtl8821cerfe2@gmail.com>
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/04da7398-cedb-425a-a810-5772ab10139d@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtw88: increase TX report timeout to fix race condition [+ + +]

Author: Luka Gejak <luka.gejak@linux.dev>
Date:   Mon May 18 16:23:10 2026 +0200

    wifi: rtw88: increase TX report timeout to fix race condition
    
    commit c80788f7c5aed8d420366b821f867a8a353d83a5 upstream.
    
    The driver expects the firmware to report TX status within 500ms.
    However, a timeout can be triggered when the hardware performs
    background scans while under TX load. During these scans, the firmware
    stays off-channel for periods exceeding 500ms, delaying the delivery of
    TX reports back to the driver.
    
    When this occurs, the purge timer fires prematurely and drops the
    tracking skbs from the queue. This results in the host stack
    interpreting the missing status as packet loss, leading to TCP window
    collapse. In testing with iperf3, this causes throughput to drop from
    ~90 Mbps to near-zero for approximately 2 seconds until the connection
    recovers.
    
    Increase RTW_TX_PROBE_TIMEOUT to 2500ms for RTL8723DU. This duration is
    sufficient to accommodate off-channel dwell time during full background
    scans, ensuring the purge timer only trips during genuine firmware
    lockups and preventing unnecessary TCP retransmission cycles.
    
    Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support")
    Cc: stable@vger.kernel.org
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Tested-by: Luka Gejak <luka.gejak@linux.dev>
    Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/20260518142311.10328-1-luka.gejak@linux.dev
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

wifi: rtw88: usb: fix memory leaks on USB write failures [+ + +]

Author: Luka Gejak <luka.gejak@linux.dev>
Date:   Mon May 18 16:23:11 2026 +0200

    wifi: rtw88: usb: fix memory leaks on USB write failures
    
    commit 6b964941bbfe6e0f18b1a5e008486dbb62df440a upstream.
    
    When rtw_usb_write_port() fails to submit a USB Request Block (URB)
    (e.g., due to device disconnect or ENOMEM), the completion callback is
    never executed.
    
    Currently, the driver ignores the return value of rtw_usb_write_port()
    in rtw_usb_write_data() and rtw_usb_tx_agg_skb(). Because these
    functions rely on the completion callback to free the socket buffers
    (skbs) and the transaction control block (txcb), a submission failure
    results in:
    1. A memory leak of the allocated skb in rtw_usb_write_data().
    2. A memory leak of the txcb structure and all aggregated skbs in
       rtw_usb_tx_agg_skb().
    
    Fix this by checking the return value of rtw_usb_write_port(). If it
    fails, explicitly free the skb in rtw_usb_write_data(), and properly
    purge the tx_ack_queue and free the txcb in rtw_usb_tx_agg_skb().
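    
    A userspace sketch of the single-buffer case, assuming malloc()/free()
    in place of skb allocation. submit_ret simulates the result of the URB
    submission and all model_* names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int model_live_bufs;	/* buffers allocated but not yet freed */

static void *model_alloc_buf(void)
{
	void *buf = malloc(64);

	if (buf)
		model_live_bufs++;
	return buf;
}

static void model_free_buf(void *buf)
{
	if (buf)
		model_live_bufs--;
	free(buf);
}

/* Simulated submit: on success the completion callback frees buf. */
static int model_write_port(void *buf, int submit_ret)
{
	if (submit_ret)
		return submit_ret;	/* the callback will never run */
	model_free_buf(buf);		/* stands in for the callback */
	return 0;
}

static int model_write_data(int submit_ret)
{
	void *buf = model_alloc_buf();
	int ret;

	if (!buf)
		return -ENOMEM;
	ret = model_write_port(buf, submit_ret);
	if (ret)
		model_free_buf(buf);	/* submission failed: free it here */
	return ret;
}
```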
    
    The issue was discovered in practice during device disconnect/reconnect
    scenarios and memory pressure conditions. Tested by verifying normal TX
    operation continues after the fix without regressions.
    
    Fixes: a82dfd33d123 ("wifi: rtw88: Add common USB chip support")
    Cc: stable@vger.kernel.org
    Acked-by: Ping-Ke Shih <pkshih@realtek.com>
    Tested-by: Luka Gejak <luka.gejak@linux.dev>
    Signed-off-by: Luka Gejak <luka.gejak@linux.dev>
    Signed-off-by: Ping-Ke Shih <pkshih@realtek.com>
    Link: https://patch.msgid.link/20260518142311.10328-2-luka.gejak@linux.dev
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>