Changelog in Linux kernel 6.1.119

ALSA: hda/realtek - Fixed Clevo platform headset Mic issue

Author: Kailang Yang <kailang@realtek.com>
Date:   Fri Oct 25 16:37:57 2024 +0800

    ALSA: hda/realtek - Fixed Clevo platform headset Mic issue
    
    commit 42ee87df8530150d637aa48363b72b22a9bbd78f upstream.
    
    On Clevo platforms with the ALC255 codec, the headset mic was disabled
    by default. Assigning a verb table for the Mic pin enables it.
    
    Signed-off-by: Kailang Yang <kailang@realtek.com>
    Cc: <stable@vger.kernel.org>
    Link: https://lore.kernel.org/b2dcac3e09ef4f82b36d6712194e1ea4@realtek.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10

Author: Maksym Glubokiy <maxgl.kernel@gmail.com>
Date:   Tue Nov 12 17:48:15 2024 +0200

    ALSA: hda/realtek: fix mute/micmute LEDs for a HP EliteBook 645 G10
    
    commit 96409eeab8cdd394e03ec494ea9547edc27f7ab4 upstream.
    
    The HP EliteBook 645 G10 uses the ALC236 codec and needs the
    ALC236_FIXUP_HP_MUTE_LED_MICMUTE_VREF quirk to make the mute LED and
    micmute LED work.
    
    Signed-off-by: Maksym Glubokiy <maxgl.kernel@gmail.com>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20241112154815.10888-1-maxgl.kernel@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ARM: 9419/1: mm: Fix kernel memory mapping for xip kernels

Author: Harith G <harith.g@alifsemi.com>
Date:   Wed Sep 18 06:57:11 2024 +0100

    ARM: 9419/1: mm: Fix kernel memory mapping for xip kernels
    
    [ Upstream commit ed6cbe6e5563452f305e89c15846820f2874e431 ]
    
    The patchset that introduced the kernel_sec_start/end variables to
    separate the kernel and lowmem memory mappings broke the mapping of
    kernel memory for XIP kernels.
    
    For XIP kernels, the kernel_sec_start/end variables are in a read-only
    area before the MMU is switched on, so they cannot be set early in boot
    in head.S. Fix this by setting them after the MMU is switched on.
    XIP kernels need two different mappings: one for kernel text (starting
    at CONFIG_XIP_PHYS_ADDR) and one for data (starting at
    CONFIG_PHYS_OFFSET).
    Also, move the kernel code mapping from devicemaps_init() to map_kernel().
    
    Fixes: a91da5457085 ("ARM: 9089/1: Define kernel physical section start and end")
    Signed-off-by: Harith George <harith.g@alifsemi.com>
    Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
    Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_core: Fix calling mgmt_device_connected

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Nov 8 11:19:54 2024 -0500

    Bluetooth: hci_core: Fix calling mgmt_device_connected
    
    [ Upstream commit 7967dc8f797f454d4f4acec15c7df0cdf4801617 ]
    
    Since 61a939c68ee0 ("Bluetooth: Queue incoming ACL data until
    BT_CONNECTED state is reached") there is no longer any need to call
    mgmt_device_connected, as ACL data is queued until the BT_CONNECTED
    state is reached.
    
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=219458
    Link: https://github.com/bluez/bluez/issues/1014
    Fixes: 333b4fd11e89 ("Bluetooth: L2CAP: Fix uaf in l2cap_connect")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: hci_event: Remove code to removed CONFIG_BT_HS

Author: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Date:   Wed Feb 7 14:42:11 2024 +0100

    Bluetooth: hci_event: Remove code to removed CONFIG_BT_HS
    
    [ Upstream commit f4b0c2b4cd78b75acde56c2ee5aa732b6fb2a6a9 ]
    
    Commit cec9f3c5561d ("Bluetooth: Remove BT_HS") removes config BT_HS, but
    misses two "ifdef BT_HS" blocks in hci_event.c.
    
    Remove this dead code, which belongs to the removed config option.
    
    Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Stable-dep-of: 7967dc8f797f ("Bluetooth: hci_core: Fix calling mgmt_device_connected")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: ISO: Fix not validating setsockopt user input

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Apr 5 15:56:50 2024 -0400

    Bluetooth: ISO: Fix not validating setsockopt user input
    
    commit 9e8742cdfc4b0e65266bb4a901a19462bda9285e upstream.
    
    Check user input length before copying data.
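    A minimal userspace sketch of this kind of check, assuming a
    hypothetical helper copy_opt_checked() (not the kernel's actual API):
    the option value is copied only when the caller-supplied length covers
    the destination, instead of trusting optlen blindly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: copy a fixed-size option from a caller buffer,
 * rejecting buffers shorter than the destination object.
 */
static int copy_opt_checked(void *dst, size_t dst_size,
                            const void *optval, size_t optlen)
{
    if (optlen < dst_size)
        return -EINVAL;            /* copying would read past optval */
    memcpy(dst, optval, dst_size); /* copy only what dst can hold */
    return 0;
}
```

    Extra trailing bytes are ignored rather than rejected here; that policy
    choice is part of the assumption.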
    
    Fixes: ccf74f2390d6 ("Bluetooth: Add BTPROTO_ISO socket type")
    Fixes: 0731c5ab4d51 ("Bluetooth: ISO: Add support for BT_PKT_STATUS")
    Fixes: f764a6c2c1e4 ("Bluetooth: ISO: Add broadcast support")
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    [Xiangyu: Backported to fix CVE-2024-35964; resolved minor conflicts]
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

bonding: add ns target multicast address to slave device

Author: Hangbin Liu <liuhangbin@gmail.com>
Date:   Mon Nov 11 10:16:49 2024 +0000

    bonding: add ns target multicast address to slave device
    
    [ Upstream commit 8eb36164d1a6769a20ed43033510067ff3dab9ee ]
    
    Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves")
    tried to resolve the issue where backup slaves couldn't be brought up when
    receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only
    worked for drivers that receive all multicast messages, such as the veth
    interface.
    
    For standard drivers, the NS multicast message is silently dropped because
    the slave device is not a member of the NS target multicast group.
    
    To address this, we need to make the slave device join the NS target
    multicast group, ensuring it can receive these IPv6 NS messages to validate
    the slave’s status properly.
    
    Three policies govern joining the multicast group:
    1. All settings must be under active-backup mode (alb and tlb do not support
       arp_validate), with backup slaves and slaves supporting multicast.
    2. We can add or remove multicast groups when arp_validate changes.
    3. Other operations, such as enslaving, releasing, or setting NS targets,
       need to be guarded by arp_validate.
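    To illustrate which group is meant: a multicast NS for a target address
    is sent to that target's IPv6 solicited-node multicast address
    (ff02::1:ffXX:XXXX, per RFC 4291). The userspace sketch below computes
    it; solicited_node_addr() is a hypothetical name, and the bonding driver
    uses its own in-kernel helpers.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Solicited-node multicast address: the fixed prefix ff02::1:ff00:0/104
 * followed by the low 24 bits of the target address.
 */
static void solicited_node_addr(const struct in6_addr *target,
                                struct in6_addr *maddr)
{
    memset(maddr, 0, sizeof(*maddr));
    maddr->s6_addr[0] = 0xff;   /* ff02: link-local scope multicast */
    maddr->s6_addr[1] = 0x02;
    maddr->s6_addr[11] = 0x01;  /* ...:0001: */
    maddr->s6_addr[12] = 0xff;  /* ff prefix of the last 32 bits */
    memcpy(&maddr->s6_addr[13], &target->s6_addr[13], 3);
}
```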
    
    Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets")
    Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
    Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

char: xillybus: Fix trivial bug with mutex

Author: Eli Billauer <eli.billauer@gmail.com>
Date:   Thu Nov 17 09:18:25 2022 +0200

    char: xillybus: Fix trivial bug with mutex
    
    commit c002f04c0bc79ec00d4beb75fb631d5bf37419bd upstream.
    
    @unit_mutex protects @unit from being freed, so obviously it should be
    released after @unit is used, and not before.
    
    This is a follow-up to commit 282a4b71816b ("char: xillybus: Prevent
    use-after-free due to race condition") which ensures, among other
    things, the protection of @private_data after @unit_mutex has been
    released.
    
    Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
    Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
    Link: https://lore.kernel.org/r/20221117071825.3942-1-eli.billauer@gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

char: xillybus: Prevent use-after-free due to race condition

Author: Eli Billauer <eli.billauer@gmail.com>
Date:   Sun Oct 30 11:42:09 2022 +0200

    char: xillybus: Prevent use-after-free due to race condition
    
    commit 282a4b71816b6076029017a7bab3a9dcee12a920 upstream.
    
    The driver for XillyUSB devices maintains a kref reference count on each
    xillyusb_dev structure, which represents a physical device. This reference
    count reaches zero when the device has been disconnected and there are no
    open file descriptors that are related to the device. When this occurs,
    kref_put() calls cleanup_dev(), which clears up the device's data,
    including the structure itself.
    
    However, when xillyusb_open() is called, this reference count becomes
    tricky: This function needs to obtain the xillyusb_dev structure that
    relates to the inode's major and minor (as there can be several such).
    xillybus_find_inode() (which is defined in xillybus_class.c) is called
    for this purpose. xillybus_find_inode() holds a mutex that is global in
    xillybus_class.c to protect the list of devices, and releases this
    mutex before returning. As a result, nothing protects the xillyusb_dev's
    reference counter from being decremented to zero before xillyusb_open()
    increments it on its own behalf. Hence the structure can be freed
    due to a rare race condition.
    
    To solve this, a mutex is added. It is locked by xillyusb_open() before
    the call to xillybus_find_inode() and is released only after the kref
    counter has been incremented on behalf of the newly opened inode. This
    protects the kref reference counters of all xillyusb_dev structs from
    being decremented by xillyusb_disconnect() during this time segment, as
    the call to kref_put() in this function is done with the same lock held.
    
    There is no need to hold the lock on other calls to kref_put(), because
    if xillybus_find_inode() finds a struct, xillyusb_disconnect() has not
    made the call to remove it, and hence not made its call to kref_put(),
    which takes place afterwards. Hence preventing xillyusb_disconnect's
    call to kref_put() is enough to ensure that the reference doesn't reach
    zero before it's incremented by xillyusb_open().
    
    It would have been more natural to increment the reference count in
    xillybus_find_inode(), of course; however, this function is also called
    by Xillybus' driver for PCIe / OF, which registers a completely different
    structure. Therefore, xillybus_find_inode() treats these structures as
    void pointers, and accordingly can't make any changes.
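    A simplified userspace model of the locking scheme, assuming a
    one-entry device list (registered_dev) and C11 atomics standing in for
    kref; all names are hypothetical. Lookup and reference increment happen
    under one mutex, and disconnect drops its reference under the same
    mutex, so open can never pick up a device whose count has already
    reached zero. In the real driver, removal from the device list happens
    under the separate mutex in xillybus_class.c; the sketch folds both
    into one lock for brevity.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct dev {
    atomic_int refcount;
};

static pthread_mutex_t kref_mutex = PTHREAD_MUTEX_INITIALIZER;
static struct dev *registered_dev;   /* hypothetical one-entry device list */

static void dev_put(struct dev *d)
{
    if (atomic_fetch_sub(&d->refcount, 1) == 1)
        free(d);                      /* last reference dropped */
}

static struct dev *dev_register(void)
{
    struct dev *d = malloc(sizeof(*d));

    if (!d)
        return NULL;
    atomic_init(&d->refcount, 1);     /* the "device present" reference */
    pthread_mutex_lock(&kref_mutex);
    registered_dev = d;
    pthread_mutex_unlock(&kref_mutex);
    return d;
}

/* open(): lookup and increment are atomic with respect to disconnect. */
static struct dev *dev_open(void)
{
    struct dev *d;

    pthread_mutex_lock(&kref_mutex);
    d = registered_dev;               /* stands in for xillybus_find_inode() */
    if (d)
        atomic_fetch_add(&d->refcount, 1);
    pthread_mutex_unlock(&kref_mutex);
    return d;
}

/* disconnect(): unpublish the device, then drop its own reference. */
static void dev_disconnect(void)
{
    pthread_mutex_lock(&kref_mutex);
    struct dev *d = registered_dev;
    registered_dev = NULL;            /* no new lookups can find it */
    if (d)
        dev_put(d);
    pthread_mutex_unlock(&kref_mutex);
}

/* close(): other puts need no mutex, as argued in the commit message. */
static void dev_release(struct dev *d)
{
    dev_put(d);
}
```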
    
    Reported-by: Hyunwoo Kim <imv4bel@gmail.com>
    Suggested-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: Eli Billauer <eli.billauer@gmail.com>
    Link: https://lore.kernel.org/r/20221030094209.65916-1-eli.billauer@gmail.com
    Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cxl/pci: fix error code in __cxl_hdm_decode_init()

Author: Dan Carpenter <dan.carpenter@linaro.org>
Date:   Fri Nov 15 17:11:38 2024 +0300

    cxl/pci: fix error code in __cxl_hdm_decode_init()
    
    When commit 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL
    Range does not match a CFMWS window") was backported, this chunk moved
    from the cxl_hdm_decode_init() function which returns negative error
    codes to the __cxl_hdm_decode_init() function which returns false on
    error.  So the error code needs to be modified from -ENXIO to false.
    
    This issue only exists in the 6.1.y kernels.  In later kernels negative
    error codes are correct, and the driver didn't exist in earlier kernels.
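    The pitfall in plain C terms, with hypothetical stand-in functions: in
    a function returning bool, any nonzero value, including a negative
    errno, converts to true, so "return -ENXIO;" silently reports success.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an int-returning init function. */
static int decode_init_int(void)
{
    return -ENXIO;          /* negative errno: correct for an int API */
}

/* Hypothetical stand-in for a bool-returning init function (the bug). */
static bool decode_init_bool_buggy(void)
{
    return -ENXIO;          /* nonzero converts to true, i.e. "success" */
}

/* The same function after the fix. */
static bool decode_init_bool_fixed(void)
{
    return false;           /* failure expressed in the bool convention */
}
```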
    
    Fixes: 031217128990 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not match a CFMWS window")
    Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Dave Jiang <dave.jiang@intel.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd: check num of link levels when update pcie param

Author: Lin.Cao <lincao12@amd.com>
Date:   Wed Oct 25 11:32:41 2023 +0800

    drm/amd: check num of link levels when update pcie param
    
    commit 406e8845356d18bdf3d3a23b347faf67706472ec upstream.
    
    In an SR-IOV environment, pcie_table->num_of_link_levels is 0, so
    num_of_link_levels - 1 causes an array index out of bounds.
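    A sketch of the guard, assuming an unsigned 32-bit level count and a
    hypothetical table struct (the real amdgpu structures differ): with an
    unsigned count of 0, "count - 1" wraps to UINT32_MAX rather than
    becoming -1, so the empty case must be rejected before indexing.

```c
#include <stdint.h>

#define MAX_LINK_LEVELS 3   /* hypothetical table capacity */

/* Hypothetical, reduced PCIe table. */
struct pcie_table_sketch {
    uint8_t pcie_gen[MAX_LINK_LEVELS];
    uint32_t num_of_link_levels;   /* 0 under SR-IOV, per the commit */
};

/* Return the gen of the highest link level, or -1 for an empty table. */
static int top_link_gen(const struct pcie_table_sketch *t)
{
    if (t->num_of_link_levels == 0)
        return -1;   /* 0 - 1 would wrap to UINT32_MAX as an index */
    return t->pcie_gen[t->num_of_link_levels - 1];
}
```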
    
    Signed-off-by: Lin.Cao <lincao12@amd.com>
    Acked-by: Jingwen Chen <Jingwen.Chen2@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    [ Resolve minor conflicts to fix CVE-2023-52812 ]
    Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd: Fix initialization mistake for NBIO 7.7.0

Author: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
Date:   Tue Nov 12 10:11:42 2024 -0600

    drm/amd: Fix initialization mistake for NBIO 7.7.0
    
    commit 7013a8268d311fded6c7a6528fc1de82668e75f6 upstream.
    
    There is a strapping issue on NBIO 7.7.0 that can lead to spurious PME
    events while in the D0 state.
    
    Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
    Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda@amd.com>
    Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
    Acked-by: Alex Deucher <alexander.deucher@amd.com>
    Link: https://lore.kernel.org/r/20241112161142.28974-1-mario.limonciello@amd.com
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 447a54a0f79c9a409ceaa17804bdd2e0206397b9)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/bridge: tc358768: Fix DSI command tx

Author: Francesco Dolcini <francesco.dolcini@toradex.com>
Date:   Thu Sep 26 16:12:46 2024 +0200

    drm/bridge: tc358768: Fix DSI command tx
    
    commit 32c4514455b2b8fde506f8c0962f15c7e4c26f1d upstream.
    
    Wait for the command transmission to complete in the DSI transfer
    function by polling the dc_start bit until it goes back to the idle
    state after the transmission is started.
    
    This is documented in the datasheet, and failing to do so leads to
    command corruption.
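    A bounded polling loop of the kind described, sketched with a
    hypothetical register accessor and a hypothetical DC_START_BIT mask
    (not taken from the tc358768 datasheet). In-kernel code would more
    likely use a helper such as regmap_read_poll_timeout() and sleep
    between reads; fake_read() is only a test double.

```c
#include <errno.h>
#include <stdint.h>

#define DC_START_BIT 0x1u   /* hypothetical mask, not from the datasheet */

/* Hypothetical accessor: returns the current status register value. */
typedef uint32_t (*reg_read_fn)(void *ctx);

/*
 * Poll until dc_start returns to idle (0), giving up after max_polls
 * reads. A real driver would also delay between reads.
 */
static int wait_dsi_cmd_done(reg_read_fn read_reg, void *ctx,
                             unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (!(read_reg(ctx) & DC_START_BIT))
            return 0;        /* transmission finished */
    }
    return -ETIMEDOUT;       /* bit never cleared */
}

/* Test double: reports "busy" for busy_reads reads, then idle. */
struct fake_reg {
    unsigned busy_reads;
};

static uint32_t fake_read(void *ctx)
{
    struct fake_reg *r = ctx;

    if (r->busy_reads) {
        r->busy_reads--;
        return DC_START_BIT;
    }
    return 0;
}
```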
    
    Fixes: ff1ca6397b1d ("drm/bridge: Add tc358768 driver")
    Cc: stable@vger.kernel.org
    Signed-off-by: Francesco Dolcini <francesco.dolcini@toradex.com>
    Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
    Link: https://lore.kernel.org/r/20240926141246.48282-1-francesco@dolcini.it
    Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
    Link: https://patchwork.freedesktop.org/patch/msgid/20240926141246.48282-1-francesco@dolcini.it
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/rockchip: vop: Fix a dereferenced before check warning

Author: Andy Yan <andy.yan@rock-chips.com>
Date:   Mon Oct 21 15:28:06 2024 +0800

    drm/rockchip: vop: Fix a dereferenced before check warning
    
    [ Upstream commit ab1c793f457f740ab7108cc0b1340a402dbf484d ]
    
    'state' can't be NULL; crtc_state is what should be checked.
    
    Fix warning:
    drivers/gpu/drm/rockchip/rockchip_drm_vop.c:1096
    vop_plane_atomic_async_check() warn: variable dereferenced before check
    'state' (see line 1077)
    
    Fixes: 5ddb0bd4ddc3 ("drm/atomic: Pass the full state to planes async atomic check and update")
    Signed-off-by: Andy Yan <andy.yan@rock-chips.com>
    Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241021072818.61621-1-andyshrk@163.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

fs/9p: fix uninitialized values during inode evict

Author: Eric Van Hensbergen <ericvh@kernel.org>
Date:   Tue Nov 19 11:43:16 2024 +0800

    fs/9p: fix uninitialized values during inode evict
    
    [ Upstream commit 6630036b7c228f57c7893ee0403e92c2db2cd21d ]
    
    If an iget fails due to not being able to retrieve information
    from the server then the inode structure is only partially
    initialized.  When the inode gets evicted, references to
    uninitialized structures (like fscache cookies) were being
    made.
    
    This patch checks for a bad_inode before doing anything other
    than clearing the inode from the cache.  Since the inode is
    bad, it shouldn't have any state associated with it that needs
    to be written back (and there really isn't a way to complete
    those anyway).
    
    Reported-by: syzbot+eb83fe1cce5833cd66a0@syzkaller.appspotmail.com
    Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
    (cherry picked from commit 1b4cb6e91f19b81217ad98142ee53a1ab25893fd)
    [Xiangyu: CVE-2024-36923 Minor conflict resolution due to missing 4eb31178 ]
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

fs/ntfs3: Additional check in ntfs_file_release

Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Mon Nov 18 10:26:50 2024 +0800

    fs/ntfs3: Additional check in ntfs_file_release
    
    [ Upstream commit 031d6f608290c847ba6378322d0986d08d1a645a ]
    
    Reported-by: syzbot+8c652f14a0fde76ff11d@syzkaller.appspotmail.com
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
    Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ima: fix buffer overrun in ima_eventdigest_init_common

Author: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
Date:   Wed Aug 7 10:27:13 2024 -0700

    ima: fix buffer overrun in ima_eventdigest_init_common
    
    commit 923168a0631bc42fffd55087b337b1b6c54dcff5 upstream.
    
    ima_eventdigest_init() calls ima_eventdigest_init_common() with
    HASH_ALGO__LAST, which is then used as an index into the
    hash_digest_size[] array, overrunning it. Add a conditional
    statement to handle this case.
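    A sketch of the guard, using a hypothetical three-entry algorithm enum
    and a hypothetical fallback size in place of the kernel's hash_algo
    enum and hash_digest_size[] table: the sentinel value equals the array
    length, so it must be checked before it is used as an index.

```c
#include <stddef.h>

/* Hypothetical miniature of a hash algorithm enum with a sentinel. */
enum hash_algo_sketch { ALGO_MD5, ALGO_SHA1, ALGO_SHA256, ALGO__LAST };

static const size_t digest_size[ALGO__LAST] = {
    [ALGO_MD5] = 16, [ALGO_SHA1] = 20, [ALGO_SHA256] = 32,
};

#define FALLBACK_DIGEST_SIZE 20   /* hypothetical default size */

/* Bounds-check the algorithm before indexing the size table. */
static size_t digest_len(enum hash_algo_sketch algo)
{
    if (algo < ALGO__LAST)
        return digest_size[algo];
    return FALLBACK_DIGEST_SIZE;  /* sentinel is not a valid index */
}
```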
    
    Fixes: 9fab303a2cb3 ("ima: fix violation measurement list record")
    Signed-off-by: Samasth Norway Ananda <samasth.norway.ananda@oracle.com>
    Tested-by: Enrico Bravi (PhD at polito.it) <enrico.bravi@huawei.com>
    Cc: stable@vger.kernel.org # 5.19+
    Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ipvs: properly dereference pe in ip_vs_add_service

Author: Chen Hanxiao <chenhx.fnst@fujitsu.com>
Date:   Tue Nov 19 18:20:10 2024 +0800

    ipvs: properly dereference pe in ip_vs_add_service
    
    [ Upstream commit cbd070a4ae62f119058973f6d2c984e325bce6e7 ]
    
    Use pe directly to resolve sparse warning:
    
      net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression
    
    Fixes: 39b972231536 ("ipvs: handle connections started by real-servers")
    Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Acked-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    [ Resolve minor conflicts to fix CVE-2024-42322 ]
    Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: fix potencial out-of-bounds when buffer offset is invalid

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Tue Mar 19 08:40:48 2024 +0900

    ksmbd: fix potencial out-of-bounds when buffer offset is invalid
    
    commit c6cd2e8d2d9aa7ee35b1fa6a668e32a22a9753da upstream.
    
    I found a potential out-of-bounds access when the buffer offset fields
    of a few requests are invalid. This patch sets the minimum value of the
    buffer offset field to the offset of ->Buffer to validate the buffer
    length.
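    A sketch of the validation idea, assuming a hypothetical request layout
    whose offsets are measured from the start of the request struct (real
    SMB2 offsets are measured from the SMB2 header): a payload offset is
    accepted only if it does not point into the fixed header and the
    payload ends within the received message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical request: fixed fields followed by a variable Buffer. */
struct req_sketch {
    uint16_t structure_size;
    uint16_t buffer_offset;
    uint32_t buffer_length;
    uint8_t Buffer[];          /* payload may start no earlier than here */
};

/* Validate an (offset, length) pair against the header and message size. */
static bool payload_in_bounds(uint32_t off, uint32_t len, size_t msg_len)
{
    if (off < offsetof(struct req_sketch, Buffer))
        return false;          /* would overlap the fixed fields */
    if ((uint64_t)off + len > msg_len)
        return false;          /* would read past the message */
    return true;
}
```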
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula@broadcom.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()

Author: Namjae Jeon <linkinjeon@kernel.org>
Date:   Sat Mar 16 23:36:36 2024 +0900

    ksmbd: fix slab-out-of-bounds in smb_strndup_from_utf16()
    
    commit a80a486d72e20bd12c335bcd38b6e6f19356b0aa upstream.
    
    If ->NameOffset of smb2_create_req is smaller than the Buffer offset of
    smb2_create_req, a slab-out-of-bounds read can happen in smb2_open().
    This patch sets the minimum value of the name offset to the buffer
    offset to validate the name length of smb2_create_req.
    
    Cc: stable@vger.kernel.org
    Reported-by: Xuanzhe Yu <yuxuanzhe@outlook.com>
    Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
    Signed-off-by: Steve French <stfrench@microsoft.com>
    Stable-dep-of: c6cd2e8d2d9a ("ksmbd: fix potencial out-of-bounds when buffer offset is invalid")
    Signed-off-by: Vamsi Krishna Brahmajosyula <vamsi-krishna.brahmajosyula@broadcom.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled

Author: Sean Christopherson <seanjc@google.com>
Date:   Thu Oct 31 13:20:11 2024 -0700

    KVM: nVMX: Treat vpid01 as current if L2 is active, but with VPID disabled
    
    commit 2657b82a78f18528bef56dc1b017158490970873 upstream.
    
    When getting the current VPID, e.g. to emulate a guest TLB flush, return
    vpid01 if L2 is running but with VPID disabled, i.e. if VPID is disabled
    in vmcs12.  Architecturally, if VPID is disabled, then the guest and host
    effectively share VPID=0.  KVM emulates this behavior by using vpid01 when
    running an L2 with VPID disabled (see prepare_vmcs02_early_rare()), and so
    KVM must also treat vpid01 as the current VPID while L2 is active.
    
    Unconditionally treating vpid02 as the current VPID when L2 is active
    causes KVM to flush TLB entries for vpid02 instead of vpid01, which
    results in TLB entries from L1 being incorrectly preserved across nested
    VM-Enter to L2 (L2=>L1 isn't problematic, because the TLB flush after
    nested VM-Exit flushes vpid01).
    
    The bug manifests as failures in the vmx_apicv_test KVM-Unit-Test, as KVM
    incorrectly retains TLB entries for the APIC-access page across a nested
    VM-Enter.
    
    Opportunistically add comments at various touchpoints to explain the
    architectural requirements, and also why KVM uses vpid01 instead of vpid02.
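    The selection rule described above, as a small sketch over a
    hypothetical, flattened vCPU struct (KVM's real code consults
    is_guest_mode() and vmcs12 rather than plain booleans):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, flattened view of the state KVM consults. */
struct vcpu_sketch {
    bool l2_active;            /* L2 is currently running */
    bool vmcs12_vpid_enabled;  /* L1 enabled VPID for L2 in vmcs12 */
    uint16_t vpid01;           /* L1's VPID, also used for L2 without VPID */
    uint16_t vpid02;           /* L2's VPID when VPID is enabled */
};

/* The VPID whose TLB entries an emulated guest TLB flush must target. */
static uint16_t current_vpid(const struct vcpu_sketch *v)
{
    if (v->l2_active && v->vmcs12_vpid_enabled)
        return v->vpid02;
    /* L1, or L2 with VPID disabled: both effectively share vpid01. */
    return v->vpid01;
}
```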
    
    All credit goes to Chao, who root-caused the issue and identified the fix.
    
    Link: https://lore.kernel.org/all/ZwzczkIlYGX+QXJz@intel.com
    Fixes: 2b4a5a5d5688 ("KVM: nVMX: Flush current VPID (L1 vs. L2) for KVM_REQ_TLB_FLUSH_GUEST")
    Cc: stable@vger.kernel.org
    Cc: Like Xu <like.xu.linux@gmail.com>
    Debugged-by: Chao Gao <chao.gao@intel.com>
    Reviewed-by: Chao Gao <chao.gao@intel.com>
    Tested-by: Chao Gao <chao.gao@intel.com>
    Link: https://lore.kernel.org/r/20241031202011.1580522-1-seanjc@google.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN

Author: Sean Christopherson <seanjc@google.com>
Date:   Fri Nov 1 11:50:30 2024 -0700

    KVM: VMX: Bury Intel PT virtualization (guest/host mode) behind CONFIG_BROKEN
    
    commit aa0d42cacf093a6fcca872edc954f6f812926a17 upstream.
    
    Hide KVM's pt_mode module param behind CONFIG_BROKEN, i.e. disable support
    for virtualizing Intel PT via guest/host mode unless BROKEN=y.  There are
    myriad bugs in the implementation, some of which are fatal to the guest,
    and others which put the stability and health of the host at risk.
    
    For guest fatalities, the most glaring issue is that KVM fails to ensure
    tracing is disabled, and *stays* disabled prior to VM-Enter, which is
    necessary as hardware disallows loading (the guest's) RTIT_CTL if tracing
    is enabled (enforced via a VMX consistency check).  Per the SDM:
    
      If the logical processor is operating with Intel PT enabled (if
      IA32_RTIT_CTL.TraceEn = 1) at the time of VM entry, the "load
      IA32_RTIT_CTL" VM-entry control must be 0.
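    The quoted SDM rule expressed as a predicate. The constants are
    assumptions: TraceEn as bit 0 of IA32_RTIT_CTL and "load IA32_RTIT_CTL"
    as bit 18 of the VM-entry controls, believed to match Linux's
    definitions; check the SDM before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

#define RTIT_CTL_TRACEEN            (1ull << 0)  /* assumed: IA32_RTIT_CTL.TraceEn */
#define VM_ENTRY_LOAD_IA32_RTIT_CTL (1u << 18)   /* assumed: entry control bit */

/* Returns false if VM entry would fail the RTIT_CTL consistency check. */
static bool vmentry_rtit_consistent(uint64_t cur_rtit_ctl,
                                    uint32_t vmentry_ctls)
{
    if ((cur_rtit_ctl & RTIT_CTL_TRACEEN) &&
        (vmentry_ctls & VM_ENTRY_LOAD_IA32_RTIT_CTL))
        return false;   /* tracing enabled while loading RTIT_CTL */
    return true;
}
```

    This is why KVM must ensure tracing is disabled, and stays disabled,
    before VM-Enter whenever it loads the guest's RTIT_CTL.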
    
    On the host side, KVM doesn't validate the guest CPUID configuration
    provided by userspace, and even worse, uses the guest configuration to
    decide what MSRs to save/load at VM-Enter and VM-Exit.  E.g. configuring
    guest CPUID to enumerate more address ranges than are supported in hardware
    will result in KVM trying to pass through, save, and load non-existent MSRs,
    which generates a variety of WARNs, ToPA ERRORs in the host, a potential
    deadlock, etc.
    
    Fixes: f99e3daf94ff ("KVM: x86: Add Intel PT virtualization work mode")
    Cc: stable@vger.kernel.org
    Cc: Adrian Hunter <adrian.hunter@intel.com>
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
    Tested-by: Adrian Hunter <adrian.hunter@intel.com>
    Message-ID: <20241101185031.1799556-2-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: x86: Unconditionally set irr_pending when updating APICv state

Author: Sean Christopherson <seanjc@google.com>
Date:   Tue Nov 5 17:51:35 2024 -0800

    KVM: x86: Unconditionally set irr_pending when updating APICv state
    
    commit d3ddef46f22e8c3124e0df1f325bc6a18dadff39 upstream.
    
    Always set irr_pending (to true) when updating APICv status to fix a bug
    where KVM fails to set irr_pending when userspace sets APIC state and
    APICv is disabled, which ultimately results in KVM failing to inject the
    pending interrupt(s) that userspace stuffed into the vIRR, until another
    interrupt happens to be emulated by KVM.
    
    Only the APICv-disabled case is flawed, as KVM forces apic->irr_pending to
    be true if APICv is enabled, because not all vIRR updates will be visible
    to KVM.
    
    Hit the bug with a big hammer, even though strictly speaking KVM can scan
    the vIRR and set/clear irr_pending as appropriate for this specific case.
    The bug was introduced by commit 755c2bf87860 ("KVM: x86: lapic: don't
    touch irr_pending in kvm_apic_update_apicv when inhibiting it"), which as
    the shortlog suggests, deleted code that updated irr_pending.
    
    Before that commit, kvm_apic_update_apicv() did indeed scan the vIRR, with
    the crucial difference that kvm_apic_update_apicv() did the scan even
    when APICv was being *disabled*, e.g. due to an AVIC inhibition.
    
            struct kvm_lapic *apic = vcpu->arch.apic;
    
            if (vcpu->arch.apicv_active) {
                    /* irr_pending is always true when apicv is activated. */
                    apic->irr_pending = true;
                    apic->isr_count = 1;
            } else {
                    apic->irr_pending = (apic_search_irr(apic) != -1);
                    apic->isr_count = count_vectors(apic->regs + APIC_ISR);
            }
    
    And _that_ bug (clearing irr_pending) was introduced by commit b26a695a1d78
    ("kvm: lapic: Introduce APICv update helper function"), prior to which KVM
    unconditionally set irr_pending to true in kvm_apic_set_state(), i.e.
    assumed that the new virtual APIC state could have a pending IRQ.
    
    Furthermore, in addition to introducing this issue, commit 755c2bf87860
    also papered over the underlying bug: KVM doesn't ensure CPUs and devices
    see APICv as disabled prior to searching the IRR.  Waiting until KVM
    emulates an EOI to update irr_pending "works", but only because KVM won't
    emulate EOI until after refresh_apicv_exec_ctrl(), and there are plenty of
    memory barriers in between.  I.e. leaving irr_pending set is basically
    hacking around bad ordering.
    
    So, effectively revert to the pre-b26a695a1d78 behavior for state restore,
    even though it's sub-optimal if no IRQs are pending, in order to provide a
    minimal fix, but leave behind a FIXME to document the ugliness.  With luck,
    the ordering issue will be fixed and the mess will be cleaned up in the
    not-too-distant future.
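    A sketch of the resulting behavior, loosely modeled on the
    kvm_apic_update_apicv() snippet quoted above with the irr_pending
    assignment hoisted out of the branch. struct lapic_sketch and
    isr_vectors_set are hypothetical simplifications of struct kvm_lapic
    and count_vectors().

```c
#include <stdbool.h>

/* Hypothetical, simplified local APIC state. */
struct lapic_sketch {
    bool apicv_active;
    bool irr_pending;
    int isr_count;
    int isr_vectors_set;   /* stand-in for count_vectors(apic->regs + APIC_ISR) */
};

static void update_apicv_sketch(struct lapic_sketch *apic)
{
    /*
     * Big hammer: always assume the vIRR may hold a pending IRQ, so one
     * stuffed in by userspace is found even when APICv is disabled.
     */
    apic->irr_pending = true;

    if (apic->apicv_active)
        apic->isr_count = 1;
    else
        apic->isr_count = apic->isr_vectors_set;
}
```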
    
    Fixes: 755c2bf87860 ("KVM: x86: lapic: don't touch irr_pending in kvm_apic_update_apicv when inhibiting it")
    Cc: stable@vger.kernel.org
    Cc: Maxim Levitsky <mlevitsk@redhat.com>
    Reported-by: Yong He <zhuangel570@gmail.com>
    Closes: https://lkml.kernel.org/r/20241023124527.1092810-1-alexyonghe%40tencent.com
    Signed-off-by: Sean Christopherson <seanjc@google.com>
    Message-ID: <20241106015135.2462147-1-seanjc@google.com>
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

lib/buildid: Fix build ID parsing logic

Author: Jiri Olsa <jolsa@kernel.org>
Date:   Mon Nov 4 18:52:54 2024 +0100

    lib/buildid: Fix build ID parsing logic
    
    parse_build_id_buf() does not account for the Elf32_Nhdr header size
    when computing the build ID data pointer, and therefore returns the
    wrong build ID data.
    
    This is a problem only for stable trees that merged the 84887f4c1c3a
    fix; the upstream build ID code was refactored and returns the proper
    build ID.
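    A userspace sketch of correct note parsing with <elf.h>: the descriptor
    (the build ID bytes) starts after the Elf32_Nhdr header and the name
    padded to 4 bytes. find_build_id() is a hypothetical helper, not the
    kernel's parse_build_id_buf().

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Round a note field size up to the 4-byte alignment of ELF notes. */
#define NOTE_ALIGN4(x) (((size_t)(x) + 3) & ~(size_t)3)

/*
 * Walk a buffer of ELF notes; return a pointer to the GNU build ID bytes,
 * or NULL if none is found or a note is truncated.
 */
static const unsigned char *find_build_id(const unsigned char *buf,
                                          size_t len, size_t *id_len)
{
    size_t off = 0;

    while (off + sizeof(Elf32_Nhdr) <= len) {
        Elf32_Nhdr nhdr;
        size_t name_off, desc_off, next;

        memcpy(&nhdr, buf + off, sizeof(nhdr));  /* avoid unaligned reads */
        name_off = off + sizeof(Elf32_Nhdr);     /* skip the note header */
        desc_off = name_off + NOTE_ALIGN4(nhdr.n_namesz);
        next = desc_off + NOTE_ALIGN4(nhdr.n_descsz);
        if (next > len)
            return NULL;                         /* truncated note */

        if (nhdr.n_type == NT_GNU_BUILD_ID && nhdr.n_namesz == 4 &&
            memcmp(buf + name_off, "GNU", 4) == 0) {
            *id_len = nhdr.n_descsz;
            return buf + desc_off;
        }
        off = next;
    }
    return NULL;
}
```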
    
    Acked-by: Andrii Nakryiko <andrii@kernel.org>
    Fixes: 84887f4c1c3a ("lib/buildid: harden build ID parsing logic")
    Signed-off-by: Jiri Olsa <jolsa@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Linux: Linux 6.1.119

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Nov 22 15:37:35 2024 +0100

    Linux 6.1.119
    
    Link: https://lore.kernel.org/r/20241120125809.623237564@linuxfoundation.org
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Tested-by: Salvatore Bonaccorso <carnil@debian.org>
    Tested-by: Hardik Garg <hargar@linux.microsoft.com>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Tested-by: kernelci.org bot <bot@kernelci.org>
    Tested-by: Yann Sionneau <ysionneau@kalrayinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set

Author: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Date:   Wed Nov 6 21:50:55 2024 +0100

    media: dvbdev: fix the logic when DVB_DYNAMIC_MINORS is not set
    
    commit a4aebaf6e6efff548b01a3dc49b4b9074751c15b upstream.
    
    When CONFIG_DVB_DYNAMIC_MINORS is not set, ret is not initialized, and
    a semaphore is left in the wrong state in case of errors.
    
    Make the code simpler and avoid mistakes by using a single error-check
    path whether or not DVB_DYNAMIC_MINORS is set.
    
    Reported-by: kernel test robot <lkp@intel.com>
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Closes: https://lore.kernel.org/r/202410201717.ULWWdJv8-lkp@intel.com/
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Link: https://lore.kernel.org/r/9e067488d8935b8cf00959764a1fa5de85d65725.1730926254.git.mchehab+huawei@kernel.org
    Cc: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: avoid unsafe VMA hook invocation when error arises on mmap hook

Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date:   Mon Nov 18 16:17:25 2024 +0000

    mm: avoid unsafe VMA hook invocation when error arises on mmap hook
    
    [ Upstream commit 3dd6ed34ce1f2356a77fb88edafb5ec96784e3cf ]
    
    Patch series "fix error handling in mmap_region() and refactor
    (hotfixes)", v4.
    
    mmap_region() is somewhat terrifying, with spaghetti-like control flow and
    numerous means by which issues can arise and incomplete state, memory
    leaks and other unpleasantness can occur.
    
    A large amount of the complexity arises from trying to handle errors late
    in the process of mapping a VMA, which forms the basis of recently
    observed issues with resource leaks and observable inconsistent state.
    
    This series goes to great lengths to simplify how mmap_region() works and
    to avoid unwinding errors late on in the process of setting up the VMA for
    the new mapping, and equally avoids such operations occurring while the
    VMA is in an inconsistent state.
    
    The patches in this series comprise the minimal changes required to
    resolve existing issues in mmap_region() error handling, in order that
    they can be hotfixed and backported.  There is additionally a follow up
    series which goes further, separated out from the v1 series and sent and
    updated separately.
    
    This patch (of 5):
    
    After an attempted mmap() fails, we are no longer in a situation where we
    can safely interact with VMA hooks.  This is currently not enforced,
    meaning that we need complicated handling to ensure we do not incorrectly
    call these hooks.
    
    We can avoid the whole issue by treating the VMA as suspect the moment
    that the file->f_op->mmap() function reports an error by replacing
    whatever VMA operations were installed with a dummy empty set of VMA
    operations.
    
    We do so through a new helper function internal to mm - mmap_file() -
    which is both more logically named than the existing call_mmap() function
    and correctly isolates handling of the vm_op reassignment to mm.
    
    All the existing invocations of call_mmap() outside of mm are ultimately
    nested within the call_mmap() from mm, which we now replace.
    
    It is therefore safe to leave call_mmap() in place as a convenience
    function (and to avoid churn).  The invokers are:
    
         ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap()
        coda_file_operations -> mmap -> coda_file_mmap()
         shm_file_operations -> shm_mmap()
    shm_file_operations_huge -> shm_mmap()
                dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops
                                -> i915_gem_dmabuf_mmap()
    
    None of these callers interact with vm_ops or mappings in a problematic
    way on error, quickly exiting out.
    
    Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com
    Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Reported-by: Jann Horn <jannh@google.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Jann Horn <jannh@google.com>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Helge Deller <deller@gmx.de>
    Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
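
A minimal userspace sketch of the idea behind mmap_file(), assuming heavily simplified stand-ins for `struct file`, `struct vm_area_struct` and `struct vm_operations_struct` (only the fields needed here). `mmap_file_sketch()` and the failing driver hook are hypothetical. Once the `->mmap()` hook reports an error, whatever `vm_ops` it installed is discarded in favour of an empty table, so no later code can invoke a driver hook on the suspect VMA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct vm_area_struct;

struct vm_operations_struct {
	void (*close)(struct vm_area_struct *vma);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

struct file {
	int (*mmap)(struct file *file, struct vm_area_struct *vma);
};

/* An operations table with every hook left NULL. */
static const struct vm_operations_struct dummy_vm_ops;

/* Sketch of mmap_file(): if the ->mmap() hook fails, no further VMA hooks
 * may be trusted, so swap in the empty table before returning. */
static int mmap_file_sketch(struct file *file, struct vm_area_struct *vma)
{
	int err = file->mmap(file, vma);

	if (err)
		vma->vm_ops = &dummy_vm_ops;
	return err;
}

/* A hypothetical driver that installs its own vm_ops and then fails. */
static void driver_close(struct vm_area_struct *vma) { (void)vma; }
static const struct vm_operations_struct driver_vm_ops = { .close = driver_close };

static int failing_driver_mmap(struct file *file, struct vm_area_struct *vma)
{
	(void)file;
	vma->vm_ops = &driver_vm_ops;
	return -ENOMEM;
}
```

After a failed call, `vma.vm_ops->close` is NULL, so any cleanup path that later tries to close the VMA does nothing instead of calling into the driver.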

mm: fix NULL pointer dereference in alloc_pages_bulk_noprof [+ + +]

Author: Jinjiang Tu <tujinjiang@huawei.com>
Date:   Wed Nov 13 16:32:35 2024 +0800

    mm: fix NULL pointer dereference in alloc_pages_bulk_noprof
    
    commit 8ce41b0f9d77cca074df25afd39b86e2ee3aa68e upstream.
    
    We triggered a NULL pointer dereference for ac.preferred_zoneref->zone in
    alloc_pages_bulk_noprof() when the task is migrated between cpusets.
    
    When cpuset is enabled, in prepare_alloc_pages(), ac->nodemask may be
    &current->mems_allowed.  When first_zones_zonelist() is called to find
    preferred_zoneref, ac->nodemask may be modified concurrently if the
    task is migrated between different cpusets.  Assuming we have 2 NUMA nodes,
    when traversing Node1 in ac->zonelist, the nodemask is 2, and when
    traversing Node2 in ac->zonelist, the nodemask is 1.  As a result,
    ac->preferred_zoneref points to a NULL zone.
    
    In alloc_pages_bulk_noprof(), for_each_zone_zonelist_nodemask() finds an
    allowable zone and calls zonelist_node_idx(ac.preferred_zoneref), leading
    to a NULL pointer dereference.
    
    __alloc_pages_noprof() fixed this issue by checking for a NULL pointer in
    commit ea57485af8f4 ("mm, page_alloc: fix check for NULL preferred_zone")
    and commit df76cee6bbeb ("mm, page_alloc: remove redundant checks from
    alloc fastpath").
    
    To fix it here as well, check preferred_zoneref->zone for NULL.
    
    Link: https://lkml.kernel.org/r/20241113083235.166798-1-tujinjiang@huawei.com
    Fixes: 387ba26fb1cb ("mm/page_alloc: add a bulk page allocator")
    Signed-off-by: Jinjiang Tu <tujinjiang@huawei.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Alexander Lobakin <alobakin@pm.me>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
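
The guard added by this fix can be sketched in userspace with simplified, hypothetical stand-ins for `struct zone` and `struct zoneref`. A zonelist ends in a zoneref whose `zone` is NULL, and a racing nodemask change can leave the preferred zoneref on that terminator. The sketch checks for this before dereferencing and returns -1 to signal "take the fallback path"; that convention is an assumption of the sketch, not the kernel's interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for struct zone and struct zoneref. */
struct zone_sketch {
	int node;
};

struct zoneref_sketch {
	struct zone_sketch *zone;   /* NULL marks the end of a zonelist */
};

/* Like zonelist_node_idx(): dereferences ->zone without checking it. */
static int zoneref_node_sketch(const struct zoneref_sketch *zref)
{
	return zref->zone->node;
}

/* Returns the node of the preferred zone, or -1 to tell the caller to
 * take its fallback path. The NULL check is the essence of the fix. */
static int preferred_node_or_fallback_sketch(const struct zoneref_sketch *preferred)
{
	if (preferred->zone == NULL)
		return -1;
	return zoneref_node_sketch(preferred);
}
```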

mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling [+ + +]

Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date:   Mon Nov 18 16:17:27 2024 +0000

    mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
    
    [ Upstream commit 5baf8b037debf4ec60108ccfeccb8636d1dbad81 ]
    
    Currently MTE is permitted in two circumstances (desiring to use MTE
    having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is
    specified, as checked by arch_calc_vm_flag_bits() and actualised by
    setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is
    shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap
    hook is activated in mmap_region().
    
    The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also
    set is the arm64 implementation of arch_validate_flags().
    
    Unfortunately, we intend to refactor mmap_region() to perform this check
    earlier, meaning that in the case of a shmem backing we will not have
    invoked shmem_mmap() yet, causing the mapping to fail spuriously.
    
    It is inappropriate to set this architecture-specific flag in general mm
    code anyway, so a sensible resolution of this issue is to instead move the
    check somewhere else.
    
    We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via
    the arch_calc_vm_flag_bits() call.
    
    This is an appropriate place to do this as we already check for the
    MAP_ANONYMOUS case here, and the shmem file case is simply a variant of
    the same idea - we permit RAM-backed memory.
    
    This requires a modification to the arch_calc_vm_flag_bits() signature to
    pass in a pointer to the struct file associated with the mapping, however
    this is not too egregious as this is only used by two architectures anyway
    - arm64 and parisc.
    
    So this patch performs this adjustment and removes the unnecessary
    assignment of VM_MTE_ALLOWED in shmem_mmap().
    
    [akpm@linux-foundation.org: fix whitespace, per Catalin]
    Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Reported-by: Jann Horn <jannh@google.com>
    Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Helge Deller <deller@gmx.de>
    Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Liam R. Howlett <Liam.Howlett@oracle.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
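
A userspace sketch of the new two-argument form, assuming illustrative flag values (they are not the kernel's) and a hypothetical `struct file_sketch` that only records whether it is shmem-backed. `supports_mte_sketch()` and `shmem_file_sketch()` stand in for `system_supports_mte()` and `shmem_file()`. Because the file is passed in, eligibility for both anonymous and shmem mappings can be decided in do_mmap(), before any driver `->mmap()` hook runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag values only; they are not the kernel's real values. */
#define SKETCH_MAP_ANONYMOUS   0x20UL
#define SKETCH_VM_MTE_ALLOWED  (1UL << 10)

/* Hypothetical stand-in for struct file. */
struct file_sketch {
	bool is_shmem;
};

/* Stand-ins for system_supports_mte() and shmem_file(). */
static bool supports_mte_sketch(void) { return true; }

static bool shmem_file_sketch(const struct file_sketch *file)
{
	return file != NULL && file->is_shmem;
}

/* Anonymous and shmem (RAM-backed) mappings may use MTE; the decision is
 * made up front instead of inside shmem_mmap(). */
static unsigned long
arch_calc_vm_flag_bits_sketch(const struct file_sketch *file, unsigned long flags)
{
	if (supports_mte_sketch() &&
	    ((flags & SKETCH_MAP_ANONYMOUS) || shmem_file_sketch(file)))
		return SKETCH_VM_MTE_ALLOWED;
	return 0;
}
```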

mm: resolve faulty mmap_region() error path behaviour [+ + +]

Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date:   Mon Nov 18 16:17:28 2024 +0000

    mm: resolve faulty mmap_region() error path behaviour
    
    [ Upstream commit 5de195060b2e251a835f622759550e6202167641 ]
    
    The mmap_region() function is somewhat terrifying, with spaghetti-like
    control flow and numerous means by which issues can arise and incomplete
    state, memory leaks and other unpleasantness can occur.
    
    A large amount of the complexity arises from trying to handle errors late
    in the process of mapping a VMA, which forms the basis of recently
    observed issues with resource leaks and observable inconsistent state.
    
    Taking advantage of previous patches in this series, we move a number of
    checks earlier in the code, simplifying things by moving the core of the
    logic into a static internal function __mmap_region().
    
    Doing this allows us to perform a number of checks up front before we do
    any real work, and allows us to unwind the writable unmap check
    unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE
    validation unconditionally also.
    
    We move a number of things here:
    
    1. We preallocate memory for the iterator before we call the file-backed
       memory hook, allowing us to exit early and avoid having to perform
       complicated and error-prone close/free logic. We carefully free
       iterator state on both success and error paths.
    
    2. The enclosing mmap_region() function handles the mapping_map_writable()
       logic early. Previously the logic had the mapping_map_writable() at the
       point of mapping a newly allocated file-backed VMA, and a matching
       mapping_unmap_writable() on success and error paths.
    
       We now do this unconditionally if this is a file-backed, shared writable
       mapping. If a driver changes the flags to eliminate VM_MAYWRITE, doing
       so does not invalidate the seal check we just performed, and in any
       case we always decrement the counter in the wrapper.
    
       We perform a debug assert to ensure a driver does not attempt to do the
       opposite.
    
    3. We also move arch_validate_flags() up into the mmap_region()
       function. This is only relevant on arm64 and sparc64, and the check is
       only meaningful for SPARC with ADI enabled. We explicitly add a warning
       for this arch if a driver invalidates this check, though the code ought
       eventually to be fixed to eliminate the need for this.
    
    With all of these measures in place, we no longer need to explicitly close
    the VMA on error paths, as we place all checks which might fail prior to a
    call to any driver mmap hook.
    
    This eliminates an entire class of errors, makes the code easier to reason
    about and more robust.
    
    Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Reported-by: Jann Horn <jannh@google.com>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Tested-by: Mark Brown <broonie@kernel.org>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Helge Deller <deller@gmx.de>
    Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
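
The reordering described above can be modelled with a simplified userspace sketch. All names are hypothetical, and the writable counter stands in for the address_space's writable-mapping count. The sketch assumes the newly mapped VMA takes its own writable reference, while the wrapper's temporary reference is dropped on success and error alike. Every fallible check runs before the driver hook, so no error path ever has to close a half-built VMA.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the address_space writable-mapping counter. */
struct mapping_sketch {
	int writable;
};

/* Inner step, modelled on __mmap_region(): checks first, driver hook last. */
static int inner_mmap_region_sketch(struct mapping_sketch *m, bool shared_writable,
                                    bool prealloc_ok, int (*driver_mmap)(void),
                                    bool *hook_called)
{
	if (!prealloc_ok)               /* e.g. iterator preallocation failed */
		return -ENOMEM;

	*hook_called = true;
	int err = driver_mmap();        /* the last step that may fail */
	if (err)
		return err;

	if (shared_writable)
		m->writable++;          /* the new VMA holds its own reference */
	return 0;
}

/* Outer wrapper, modelled on mmap_region(): arch check up front, then a
 * temporary writable reference that is dropped whatever the outcome. */
static int mmap_region_sketch(struct mapping_sketch *m, bool shared_writable,
                              bool arch_flags_ok, bool prealloc_ok,
                              int (*driver_mmap)(void), bool *hook_called)
{
	if (!arch_flags_ok)             /* arch_validate_flags() moved up */
		return -EINVAL;

	if (shared_writable)
		m->writable++;          /* like mapping_map_writable() */

	int ret = inner_mmap_region_sketch(m, shared_writable, prealloc_ok,
	                                   driver_mmap, hook_called);

	if (shared_writable)
		m->writable--;          /* like mapping_unmap_writable(), always */
	return ret;
}

/* Hypothetical driver hooks for the usage example. */
static int ok_hook(void) { return 0; }
static int failing_hook(void) { return -EIO; }
```

A failed arch check or preallocation never reaches the hook, and the counter ends up the same after a failure as before the call.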

mm: revert "mm: shmem: fix data-race in shmem_getattr()" [+ + +]

Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Fri Nov 15 16:57:24 2024 -0800

    mm: revert "mm: shmem: fix data-race in shmem_getattr()"
    
    commit d1aa0c04294e29883d65eac6c2f72fe95cc7c049 upstream.
    
    Revert d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()") as
    suggested by Chuck [1].  It is causing deadlocks when accessing tmpfs over
    NFS.
    
    As Hugh commented, "added just to silence a syzbot sanitizer splat: added
    where there has never been any practical problem".
    
    Link: https://lkml.kernel.org/r/ZzdxKF39VEmXSSyN@tissot.1015granger.net [1]
    Fixes: d949d1d14fa2 ("mm: shmem: fix data-race in shmem_getattr()")
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Chuck Lever <chuck.lever@oracle.com>
    Cc: Jeongjun Park <aha310510@gmail.com>
    Cc: Yu Zhao <yuzhao@google.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mm: unconditionally close VMAs on error [+ + +]

Author: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Date:   Mon Nov 18 16:17:26 2024 +0000

    mm: unconditionally close VMAs on error
    
    [ Upstream commit 4080ef1579b2413435413988d14ac8c68e4d42c8 ]
    
    Incorrect invocation of VMA callbacks when the VMA is no longer in a
    consistent state is bug prone and risky to perform.
    
    With regard to the important vm_ops->close() callback, we have gone to
    great lengths to try to track whether or not we ought to close VMAs.
    
    Rather than doing so and risking making a mistake somewhere, instead
    unconditionally close and reset vma->vm_ops to an empty dummy operations
    set with a NULL .close operator.
    
    We introduce a new function to do so - vma_close() - and simplify existing
    vms logic which tracked whether we needed to close or not.
    
    This simplifies the logic, avoids incorrect double-calling of the .close()
    callback and allows us to update error paths to simply call vma_close()
    unconditionally - making VMA closure idempotent.
    
    Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.1730224667.git.lorenzo.stoakes@oracle.com
    Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails")
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Reported-by: Jann Horn <jannh@google.com>
    Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
    Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
    Reviewed-by: Jann Horn <jannh@google.com>
    Cc: Andreas Larsson <andreas@gaisler.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Helge Deller <deller@gmx.de>
    Cc: James E.J. Bottomley <James.Bottomley@HansenPartnership.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mark Brown <broonie@kernel.org>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
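
A userspace sketch of an idempotent close, assuming simplified stand-ins for the kernel types; `vma_close_sketch()` and the counting hook are hypothetical. The `->close()` hook runs at most once, after which `vm_ops` is reset to an empty table, so error paths can call the helper unconditionally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct vm_area_struct;

struct vm_operations_struct {
	void (*close)(struct vm_area_struct *vma);
};

struct vm_area_struct {
	const struct vm_operations_struct *vm_ops;
};

/* Empty operations table: .close is NULL. */
static const struct vm_operations_struct vma_dummy_vm_ops;

/* Invoke ->close() at most once, then reset vm_ops so that any later call
 * is a harmless no-op. */
static void vma_close_sketch(struct vm_area_struct *vma)
{
	if (vma->vm_ops && vma->vm_ops->close) {
		vma->vm_ops->close(vma);
		vma->vm_ops = &vma_dummy_vm_ops;
	}
}

/* A hypothetical driver hook that counts how often it is called. */
static int close_calls;
static void counting_close(struct vm_area_struct *vma) { (void)vma; close_calls++; }
static const struct vm_operations_struct counting_ops = { .close = counting_close };
```

Calling `vma_close_sketch()` twice on the same VMA invokes the driver's hook only once.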

mmc: sunxi-mmc: Fix A100 compatible description [+ + +]

Author: Andre Przywara <andre.przywara@arm.com>
Date:   Thu Nov 7 01:42:40 2024 +0000

    mmc: sunxi-mmc: Fix A100 compatible description
    
    commit 85b580afc2c215394e08974bf033de9face94955 upstream.
    
    It turns out that the Allwinner A100/A133 SoC only supports 8K DMA
    blocks (13 bits wide), for both the SD/SDIO and eMMC instances.
    And while this alone would make a trivial fix, the H616 falls back to
    the A100 compatible string, so we have to now match the H616 compatible
    string explicitly against the description advertising 64K DMA blocks.
    
    As the A100 is now compatible with the D1 description, let the A100
    compatible string point to that block instead, and introduce an explicit
    match against the H616 string, pointing to the old description.
    Also remove the redundant setting of clk_delays to NULL on the way.
    
    Fixes: 3536b82e5853 ("mmc: sunxi: add support for A100 mmc controller")
    Cc: stable@vger.kernel.org
    Signed-off-by: Andre Przywara <andre.przywara@arm.com>
    Tested-by: Parthiban Nallathambi <parthiban@linumiz.com>
    Reviewed-by: Chen-Yu Tsai <wens@csie.org>
    Message-ID: <20241107014240.24669-1-andre.przywara@arm.com>
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
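
A simplified sketch of why the explicit H616 entry is needed. The config struct, table and matcher are hypothetical. The compatible strings are illustrative, and the matcher (try the device's compatibles most-specific first, return the first table hit) is a simplification of the kernel's device-tree matching. A 13-bit DMA size field gives 2^13 = 8192 bytes per descriptor; a 16-bit field gives 65536.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down description: only the DMA size field width. */
struct mmc_cfg_sketch {
	unsigned int dma_size_bits;
};

static const struct mmc_cfg_sketch d1_like_cfg   = { .dma_size_bits = 13 };
static const struct mmc_cfg_sketch h616_like_cfg = { .dma_size_bits = 16 };

struct match_entry_sketch {
	const char *compatible;
	const struct mmc_cfg_sketch *cfg;
};

/* A100 now points at the 8K description; H616 gets an explicit entry
 * pointing at the old 64K description. */
static const struct match_entry_sketch match_table[] = {
	{ "allwinner,sun20i-d1-mmc",   &d1_like_cfg },
	{ "allwinner,sun50i-a100-mmc", &d1_like_cfg },
	{ "allwinner,sun50i-h616-mmc", &h616_like_cfg },
	{ NULL, NULL },
};

/* Simplified matching: device compatibles in order, first table hit wins. */
static const struct mmc_cfg_sketch *
match_cfg_sketch(const char *const *dev_compat, size_t n)
{
	for (size_t i = 0; i < n; i++)
		for (const struct match_entry_sketch *e = match_table; e->compatible; e++)
			if (strcmp(dev_compat[i], e->compatible) == 0)
				return e->cfg;
	return NULL;
}

/* Largest transfer one descriptor can describe: 2^bits bytes. */
static unsigned long max_dma_block_sketch(const struct mmc_cfg_sketch *cfg)
{
	return 1UL << cfg->dma_size_bits;
}
```

Without the H616 entry, an H616 node listing the A100 string as its fallback would pick up the new 8K limit.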

mptcp: add userspace_pm_lookup_addr_by_id helper [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 19 09:35:51 2024 +0100

    mptcp: add userspace_pm_lookup_addr_by_id helper
    
    commit 06afe09091ee69dc7ab058b4be9917ae59cc81e5 upstream.
    
    Corresponding to the __lookup_addr_by_id() helper in the in-kernel netlink
    PM, this patch adds a new helper, mptcp_userspace_pm_lookup_addr_by_id(),
    to look up the address entry with the given ID in the userspace PM local
    address list.
    
    Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Reviewed-by: Mat Martineau <martineau@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: f642c5c4d528 ("mptcp: hold pm lock when deleting entry")
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
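
The helper's job, sketched in userspace with a hypothetical singly linked list in place of the kernel's `list_head`-based `userspace_pm_local_addr_list`: walk the entries and return the first whose ID matches, or NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified local address list entry. */
struct addr_entry_sketch {
	unsigned int id;
	struct addr_entry_sketch *next;
};

/* Return the first entry whose ID matches, or NULL if there is none. */
static struct addr_entry_sketch *
lookup_addr_by_id_sketch(struct addr_entry_sketch *head, unsigned int id)
{
	for (struct addr_entry_sketch *e = head; e != NULL; e = e->next)
		if (e->id == id)
			return e;
	return NULL;
}
```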

mptcp: cope racing subflow creation in mptcp_rcv_space_adjust [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Tue Nov 19 09:35:49 2024 +0100

    mptcp: cope racing subflow creation in mptcp_rcv_space_adjust
    
    commit ce7356ae35943cc6494cc692e62d51a734062b7d upstream.
    
    Additional active subflows, i.e. those created by the in-kernel path
    manager, are added to the subflow list before the three-way handshake
    (3WHS) starts.
    
    A racing recvmsg() spooling data received on an already established
    subflow would unconditionally call tcp_cleanup_rbuf() on all the
    current subflows, potentially hitting a divide by zero error on
    the newly created ones.
    
    Explicitly check that the subflow is in a suitable state before
    invoking tcp_cleanup_rbuf().
    
    Fixes: c76c6956566f ("mptcp: call tcp_cleanup_rbuf on subflows")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/02374660836e1b52afc91966b7535c8c5f7bafb0.1731060874.git.pabeni@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    [ Conflicts in protocol.c, because commit f410cbea9f3d ("tcp: annotate
      data-races around tp->window_clamp") has not been backported to this
      version. The conflict is easy to resolve, because only the context is
      different, but not the line to modify. ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
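
A simplified model of the race. The states, fields and helpers are hypothetical, and the window computation is a stand-in for the division by `rcv_mss` reached through `tcp_cleanup_rbuf()`. A subflow still in its handshake has `rcv_mss == 0`, so the loop skips any subflow that is not in a suitable state instead of processing every entry in the list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified subflow states and fields. */
enum subflow_state_sketch { SF_SYN_SENT, SF_ESTABLISHED, SF_CLOSED };

struct subflow_sketch {
	enum subflow_state_sketch state;
	unsigned int rcv_mss;     /* 0 until the handshake completes */
	unsigned int free_space;
};

/* Stand-in for "is this subflow in a state where cleanup makes sense". */
static bool subflow_ready_sketch(const struct subflow_sketch *sf)
{
	return sf->state == SF_ESTABLISHED;
}

/* Walk all subflows, skipping unusable ones; returns the sum of the
 * computed windows in MSS units. */
static unsigned int cleanup_rbuf_all_sketch(const struct subflow_sketch *sfs, size_t n)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++) {
		if (!subflow_ready_sketch(&sfs[i]))
			continue;   /* without this, rcv_mss == 0 divides by zero */
		total += sfs[i].free_space / sfs[i].rcv_mss;
	}
	return total;
}
```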

mptcp: define more local variables sk [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 19 09:35:50 2024 +0100

    mptcp: define more local variables sk
    
    commit 14cb0e0bf39bd10429ba14e9e2f905f1144226fc upstream.
    
    '(struct sock *)msk' is used several times in mptcp_nl_cmd_announce(),
    mptcp_nl_cmd_remove() and mptcp_userspace_pm_set_flags() in pm_userspace.c,
    so it's worth adding a local variable sk pointing to it.
    
    Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
    Signed-off-by: Geliang Tang <geliang.tang@suse.com>
    Signed-off-by: Mat Martineau <martineau@kernel.org>
    Link: https://lore.kernel.org/r/20231025-send-net-next-20231025-v1-8-db8f25f798eb@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 06afe09091ee ("mptcp: add userspace_pm_lookup_addr_by_id helper")
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: drop lookup_by_id in lookup_addr [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 19 09:35:54 2024 +0100

    mptcp: drop lookup_by_id in lookup_addr
    
    commit af250c27ea1c404e210fc3a308b20f772df584d6 upstream.
    
    When the lookup_by_id parameter of __lookup_addr() is true, the function
    behaves the same as __lookup_addr_by_id(), so it can be replaced by
    __lookup_addr_by_id() directly. So drop this parameter and let
    __lookup_addr() only look up addresses in the local address list by
    comparing the addresses themselves, not the address IDs.
    
    Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://lore.kernel.org/r/20240305-upstream-net-next-20240304-mptcp-misc-cleanup-v1-4-c436ba5e569b@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: db3eab8110bc ("mptcp: pm: use _rcu variant under rcu_read_lock")
    [ Conflicts in pm_netlink.c, because commit 6a42477fe449 ("mptcp: update
      set_flags interfaces") is not in this version, and causes too many
      conflicts when backporting it. The conflict is easy to resolve: addr
      is a pointer here in mptcp_pm_nl_set_flags(); the rest of the
      code is the same. ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: error out earlier on disconnect [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Fri Nov 8 11:58:16 2024 +0100

    mptcp: error out earlier on disconnect
    
    [ Upstream commit 581302298524e9d77c4c44ff5156a6cd112227ae ]
    
    Eric reported a division by zero splat in the MPTCP protocol:
    
    Oops: divide error: 0000 [#1] PREEMPT SMP KASAN PTI
    CPU: 1 UID: 0 PID: 6094 Comm: syz-executor317 Not tainted
    6.12.0-rc5-syzkaller-00291-g05b92660cdfe #0
    Hardware name: Google Google Compute Engine/Google Compute Engine,
    BIOS Google 09/13/2024
    RIP: 0010:__tcp_select_window+0x5b4/0x1310 net/ipv4/tcp_output.c:3163
    Code: f6 44 01 e3 89 df e8 9b 75 09 f8 44 39 f3 0f 8d 11 ff ff ff e8
    0d 74 09 f8 45 89 f4 e9 04 ff ff ff e8 00 74 09 f8 44 89 f0 99 <f7> 7c
    24 14 41 29 d6 45 89 f4 e9 ec fe ff ff e8 e8 73 09 f8 48 89
    RSP: 0018:ffffc900041f7930 EFLAGS: 00010293
    RAX: 0000000000017e67 RBX: 0000000000017e67 RCX: ffffffff8983314b
    RDX: 0000000000000000 RSI: ffffffff898331b0 RDI: 0000000000000004
    RBP: 00000000005d6000 R08: 0000000000000004 R09: 0000000000017e67
    R10: 0000000000003e80 R11: 0000000000000000 R12: 0000000000003e80
    R13: ffff888031d9b440 R14: 0000000000017e67 R15: 00000000002eb000
    FS: 00007feb5d7f16c0(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007feb5d8adbb8 CR3: 0000000074e4c000 CR4: 00000000003526f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
    <TASK>
    __tcp_cleanup_rbuf+0x3e7/0x4b0 net/ipv4/tcp.c:1493
    mptcp_rcv_space_adjust net/mptcp/protocol.c:2085 [inline]
    mptcp_recvmsg+0x2156/0x2600 net/mptcp/protocol.c:2289
    inet_recvmsg+0x469/0x6a0 net/ipv4/af_inet.c:885
    sock_recvmsg_nosec net/socket.c:1051 [inline]
    sock_recvmsg+0x1b2/0x250 net/socket.c:1073
    __sys_recvfrom+0x1a5/0x2e0 net/socket.c:2265
    __do_sys_recvfrom net/socket.c:2283 [inline]
    __se_sys_recvfrom net/socket.c:2279 [inline]
    __x64_sys_recvfrom+0xe0/0x1c0 net/socket.c:2279
    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
    do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
    entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7feb5d857559
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 51 18 00 00 90 48 89 f8 48
    89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
    01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007feb5d7f1208 EFLAGS: 00000246 ORIG_RAX: 000000000000002d
    RAX: ffffffffffffffda RBX: 00007feb5d8e1318 RCX: 00007feb5d857559
    RDX: 000000800000000e RSI: 0000000000000000 RDI: 0000000000000003
    RBP: 00007feb5d8e1310 R08: 0000000000000000 R09: ffffffff81000000
    R10: 0000000000000100 R11: 0000000000000246 R12: 00007feb5d8e131c
    R13: 00007feb5d8ae074 R14: 000000800000000e R15: 00000000fffffdef
    
    and provided a nice reproducer.
    
    The root cause is the current bad handling of a racing disconnect.
    After the blamed commit below, sk_wait_data() can return (with
    error) with the underlying socket disconnected and a zero rcv_mss.
    
    Catch the error and return without performing any additional
    operations on the current socket.
    
    Reported-by: Eric Dumazet <edumazet@google.com>
    Fixes: 419ce133ab92 ("tcp: allow again tcp_disconnect() when threads are waiting")
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/8c82ecf71662ecbc47bf390f9905de70884c9f2d.1731060874.git.pabeni@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
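
A simplified sketch of "error out earlier". The socket struct and helpers are hypothetical, and `-EPIPE` is used as an illustrative errno for the wait failure. If the wait for data fails because of a racing disconnect, the receive path returns immediately (reporting already-copied bytes if there are any) instead of running the space adjustment, which would divide by the now-zero `rcv_mss`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical socket state reduced to what the sketch needs. */
struct msk_sketch {
	bool disconnected;
	unsigned int rcv_mss;     /* reset to 0 by a disconnect */
	unsigned int window;
	int adjust_calls;         /* how often the space adjustment ran */
};

/* Stand-in for sk_wait_data(): fails if a racing disconnect happened. */
static int wait_data_sketch(const struct msk_sketch *msk)
{
	return msk->disconnected ? -EPIPE : 0;
}

/* Stand-in for mptcp_rcv_space_adjust(): divides by rcv_mss. */
static void rcv_space_adjust_sketch(struct msk_sketch *msk)
{
	msk->adjust_calls++;
	msk->window = 65535u / msk->rcv_mss;
}

/* On a wait error, return at once without touching the socket further. */
static int recvmsg_sketch(struct msk_sketch *msk, int copied)
{
	int err = wait_data_sketch(msk);

	if (err < 0)
		return copied ? copied : err;

	rcv_space_adjust_sketch(msk);
	return copied;
}
```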

mptcp: hold pm lock when deleting entry [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 19 09:35:53 2024 +0100

    mptcp: hold pm lock when deleting entry
    
    commit f642c5c4d528d11bd78b6c6f84f541cd3c0bea86 upstream.
    
    When traversing userspace_pm_local_addr_list and deleting an entry from
    it in mptcp_pm_nl_remove_doit(), msk->pm.lock should be held.
    
    This patch holds this lock before mptcp_userspace_pm_lookup_addr_by_id()
    and releases it after list_move() in mptcp_pm_nl_remove_doit().
    
    Fixes: d9a4594edabf ("mptcp: netlink: Add MPTCP_PM_CMD_REMOVE")
    Cc: stable@vger.kernel.org
    Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-2-b835580cefa8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
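
The locking pattern, sketched in userspace with a minimal C11 `atomic_flag` spinlock standing in for `msk->pm.lock`. The list and helper names are hypothetical. The upstream change moves the entry to another list with `list_move()`; this sketch simply unlinks it and returns it. What matters is that the lock is taken before the lookup and released only after the entry is unlinked, so no other path can see or change the list in between.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Minimal spinlock on C11 atomic_flag, standing in for msk->pm.lock. */
typedef struct { atomic_flag flag; } spinlock_sketch;

static void spin_lock_sketch(spinlock_sketch *l)
{
	while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
		; /* spin */
}

static void spin_unlock_sketch(spinlock_sketch *l)
{
	atomic_flag_clear_explicit(&l->flag, memory_order_release);
}

/* Hypothetical, simplified address entry and PM state. */
struct entry_sketch {
	unsigned int id;
	struct entry_sketch *next;
};

struct pm_sketch {
	spinlock_sketch lock;
	struct entry_sketch *local_addrs;   /* protected by lock */
};

/* Look up and unlink the entry with the given ID under one critical section. */
static struct entry_sketch *remove_by_id_sketch(struct pm_sketch *pm, unsigned int id)
{
	struct entry_sketch **pp, *found = NULL;

	spin_lock_sketch(&pm->lock);
	for (pp = &pm->local_addrs; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			found = *pp;
			*pp = found->next;      /* unlink while still holding the lock */
			found->next = NULL;
			break;
		}
	}
	spin_unlock_sketch(&pm->lock);
	return found;
}
```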

mptcp: pm: use _rcu variant under rcu_read_lock [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Tue Nov 19 09:35:55 2024 +0100

    mptcp: pm: use _rcu variant under rcu_read_lock
    
    commit db3eab8110bc0520416101b6a5b52f44a43fb4cf upstream.
    
    In mptcp_pm_create_subflow_or_signal_addr(), rcu_read_(un)lock() are
    used as expected to iterate over the list of local addresses, but
    list_for_each_entry() was used instead of list_for_each_entry_rcu() in
    __lookup_addr(). It is important to use this variant which adds the
    required READ_ONCE() (and diagnostic checks if enabled).
    
    Because __lookup_addr() is also used in mptcp_pm_nl_set_flags() where it
    is called under the pernet->lock and not rcu_read_lock(), an extra
    condition is then passed to help the diagnostic checks making sure
    either the associated spin lock or the RCU lock is held.
    
    Fixes: 86e39e04482b ("mptcp: keep track of local endpoint still available for each msk")
    Cc: stable@vger.kernel.org
    Reviewed-by: Geliang Tang <geliang@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-3-b835580cefa8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
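
A userspace analogy, not the kernel's RCU implementation, of why the `_rcu` list iterator matters. Lockless readers must load each link with a marked load. Here C11 acquire loads stand in for the `READ_ONCE()`/`rcu_dereference()` that `list_for_each_entry_rcu()` adds; the kernel relies on the weaker dependency ordering, so acquire is a stronger stand-in. The writer publishes a fully initialised node with a release store, roughly what `list_add_rcu()` achieves. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node whose link is read by lockless readers. */
struct addr_entry_rcu_sketch {
	unsigned int id;
	_Atomic(struct addr_entry_rcu_sketch *) next;
};

struct addr_list_rcu_sketch {
	_Atomic(struct addr_entry_rcu_sketch *) head;
};

/* Writer (serialised by a lock in real code): initialise, then publish. */
static void publish_front_sketch(struct addr_list_rcu_sketch *l,
                                 struct addr_entry_rcu_sketch *e)
{
	struct addr_entry_rcu_sketch *old =
		atomic_load_explicit(&l->head, memory_order_relaxed);

	atomic_store_explicit(&e->next, old, memory_order_relaxed);
	atomic_store_explicit(&l->head, e, memory_order_release);
}

/* Reader: every link is a marked load, unlike a plain list walk. */
static struct addr_entry_rcu_sketch *
lookup_rcu_sketch(struct addr_list_rcu_sketch *l, unsigned int id)
{
	for (struct addr_entry_rcu_sketch *e =
		     atomic_load_explicit(&l->head, memory_order_acquire);
	     e != NULL;
	     e = atomic_load_explicit(&e->next, memory_order_acquire))
		if (e->id == id)
			return e;
	return NULL;
}
```

The kernel variant additionally accepts a lockdep condition so the same walk can be used under either the RCU read lock or `pernet->lock`; that diagnostic part has no counterpart in this sketch.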

mptcp: update local address flags when setting it [+ + +]

Author: Geliang Tang <geliang@kernel.org>
Date:   Tue Nov 19 09:35:52 2024 +0100

    mptcp: update local address flags when setting it
    
    commit e0266319413d5d687ba7b6df7ca99e4b9724a4f2 upstream.
    
    Just like the in-kernel PM, when the userspace PM does set_flags, it needs
    to send out an MP_PRIO signal and also modify the flags of the
    corresponding address entry in the local address list. This patch
    implements the missing logic.
    
    Traverse all address entries on userspace_pm_local_addr_list to find the
    local address entry. If bkup is true, set FLAG_BACKUP in the flags of this
    entry; otherwise, clear FLAG_BACKUP.
    
    Fixes: 892f396c8e68 ("mptcp: netlink: issue MP_PRIO signals from userspace PMs")
    Cc: stable@vger.kernel.org
    Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20241112-net-mptcp-misc-6-12-pm-v1-1-b835580cefa8@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    [ Conflicts in pm_userspace.c, because commit 6a42477fe449 ("mptcp:
      update set_flags interfaces") is not in this version, and causes too
      many conflicts when backporting it. The same code can still be added
      at the same place, before sending the ACK. ]
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
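
The flag update described above, sketched in userspace with a hypothetical IPv4-only entry array and an illustrative backup bit (not necessarily the real `MPTCP_PM_ADDR_FLAG_BACKUP` value): find the matching local address, then set or clear the backup flag according to `bkup`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag bit for the sketch. */
#define SKETCH_FLAG_BACKUP (1U << 2)

/* Hypothetical, simplified local address entry (IPv4 only). */
struct local_addr_sketch {
	uint32_t addr;
	unsigned int flags;
};

/* Set or clear the backup flag on the entry for addr; false if not found. */
static bool set_backup_flag_sketch(struct local_addr_sketch *list, size_t n,
                                   uint32_t addr, bool bkup)
{
	for (size_t i = 0; i < n; i++) {
		if (list[i].addr != addr)
			continue;
		if (bkup)
			list[i].flags |= SKETCH_FLAG_BACKUP;
		else
			list[i].flags &= ~SKETCH_FLAG_BACKUP;
		return true;
	}
	return false;
}
```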

net/mlx5: fs, lock FTE when checking if active [+ + +]

Author: Mark Bloch <mbloch@nvidia.com>
Date:   Thu Nov 7 20:35:23 2024 +0200

    net/mlx5: fs, lock FTE when checking if active
    
    [ Upstream commit 9ca314419930f9135727e39d77e66262d5f7bef6 ]
    
    The referenced commits introduced a two-step process for deleting FTEs:
    
    - Lock the FTE, delete it from hardware, set the hardware deletion function
      to NULL and unlock the FTE.
    - Lock the parent flow group, delete the software copy of the FTE, and
      remove it from the xarray.
    
    However, this approach encounters a race condition if a rule with the same
    match value is added simultaneously. In this scenario, fs_core may set the
    hardware deletion function to NULL prematurely, causing a panic during
    subsequent rule deletions.
    
    To prevent this, ensure the active flag of the FTE is checked under a lock,
    which will prevent the fs_core layer from attaching a new steering rule to
    an FTE that is in the process of deletion.
    
    [  438.967589] MOSHE: 2496 mlx5_del_flow_rules del_hw_func
    [  438.968205] ------------[ cut here ]------------
    [  438.968654] refcount_t: decrement hit 0; leaking memory.
    [  438.969249] WARNING: CPU: 0 PID: 8957 at lib/refcount.c:31 refcount_warn_saturate+0xfb/0x110
    [  438.970054] Modules linked in: act_mirred cls_flower act_gact sch_ingress openvswitch nsh mlx5_vdpa vringh vhost_iotlb vdpa mlx5_ib mlx5_core xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm ib_uverbs ib_core zram zsmalloc fuse [last unloaded: cls_flower]
    [  438.973288] CPU: 0 UID: 0 PID: 8957 Comm: tc Not tainted 6.12.0-rc1+ #8
    [  438.973888] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
    [  438.974874] RIP: 0010:refcount_warn_saturate+0xfb/0x110
    [  438.975363] Code: 40 66 3b 82 c6 05 16 e9 4d 01 01 e8 1f 7c a0 ff 0f 0b c3 cc cc cc cc 48 c7 c7 10 66 3b 82 c6 05 fd e8 4d 01 01 e8 05 7c a0 ff <0f> 0b c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 90
    [  438.976947] RSP: 0018:ffff888124a53610 EFLAGS: 00010286
    [  438.977446] RAX: 0000000000000000 RBX: ffff888119d56de0 RCX: 0000000000000000
    [  438.978090] RDX: ffff88852c828700 RSI: ffff88852c81b3c0 RDI: ffff88852c81b3c0
    [  438.978721] RBP: ffff888120fa0e88 R08: 0000000000000000 R09: ffff888124a534b0
    [  438.979353] R10: 0000000000000001 R11: 0000000000000001 R12: ffff888119d56de0
    [  438.979979] R13: ffff888120fa0ec0 R14: ffff888120fa0ee8 R15: ffff888119d56de0
    [  438.980607] FS:  00007fe6dcc0f800(0000) GS:ffff88852c800000(0000) knlGS:0000000000000000
    [  438.983984] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  438.984544] CR2: 00000000004275e0 CR3: 0000000186982001 CR4: 0000000000372eb0
    [  438.985205] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [  438.985842] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [  438.986507] Call Trace:
    [  438.986799]  <TASK>
    [  438.987070]  ? __warn+0x7d/0x110
    [  438.987426]  ? refcount_warn_saturate+0xfb/0x110
    [  438.987877]  ? report_bug+0x17d/0x190
    [  438.988261]  ? prb_read_valid+0x17/0x20
    [  438.988659]  ? handle_bug+0x53/0x90
    [  438.989054]  ? exc_invalid_op+0x14/0x70
    [  438.989458]  ? asm_exc_invalid_op+0x16/0x20
    [  438.989883]  ? refcount_warn_saturate+0xfb/0x110
    [  438.990348]  mlx5_del_flow_rules+0x2f7/0x340 [mlx5_core]
    [  438.990932]  __mlx5_eswitch_del_rule+0x49/0x170 [mlx5_core]
    [  438.991519]  ? mlx5_lag_is_sriov+0x3c/0x50 [mlx5_core]
    [  438.992054]  ? xas_load+0x9/0xb0
    [  438.992407]  mlx5e_tc_rule_unoffload+0x45/0xe0 [mlx5_core]
    [  438.993037]  mlx5e_tc_del_fdb_flow+0x2a6/0x2e0 [mlx5_core]
    [  438.993623]  mlx5e_flow_put+0x29/0x60 [mlx5_core]
    [  438.994161]  mlx5e_delete_flower+0x261/0x390 [mlx5_core]
    [  438.994728]  tc_setup_cb_destroy+0xb9/0x190
    [  438.995150]  fl_hw_destroy_filter+0x94/0xc0 [cls_flower]
    [  438.995650]  fl_change+0x11a4/0x13c0 [cls_flower]
    [  438.996105]  tc_new_tfilter+0x347/0xbc0
    [  438.996503]  ? ___slab_alloc+0x70/0x8c0
    [  438.996929]  rtnetlink_rcv_msg+0xf9/0x3e0
    [  438.997339]  ? __netlink_sendskb+0x4c/0x70
    [  438.997751]  ? netlink_unicast+0x286/0x2d0
    [  438.998171]  ? __pfx_rtnetlink_rcv_msg+0x10/0x10
    [  438.998625]  netlink_rcv_skb+0x54/0x100
    [  438.999020]  netlink_unicast+0x203/0x2d0
    [  438.999421]  netlink_sendmsg+0x1e4/0x420
    [  438.999820]  __sock_sendmsg+0xa1/0xb0
    [  439.000203]  ____sys_sendmsg+0x207/0x2a0
    [  439.000600]  ? copy_msghdr_from_user+0x6d/0xa0
    [  439.001072]  ___sys_sendmsg+0x80/0xc0
    [  439.001459]  ? ___sys_recvmsg+0x8b/0xc0
    [  439.001848]  ? generic_update_time+0x4d/0x60
    [  439.002282]  __sys_sendmsg+0x51/0x90
    [  439.002658]  do_syscall_64+0x50/0x110
    [  439.003040]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
    
    Fixes: 718ce4d601db ("net/mlx5: Consolidate update FTE for all removal changes")
    Fixes: cefc23554fc2 ("net/mlx5: Fix FTE cleanup")
    Signed-off-by: Mark Bloch <mbloch@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20241107183527.676877-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: CT: Fix null-ptr-deref in add rule err flow [+ + +]

Author: Moshe Shemesh <moshe@nvidia.com>
Date:   Thu Nov 7 20:35:26 2024 +0200

    net/mlx5e: CT: Fix null-ptr-deref in add rule err flow
    
    [ Upstream commit e99c6873229fe0482e7ceb7d5600e32d623ed9d9 ]
    
    In the error flow of mlx5_tc_ct_entry_add_rule(), if the ct_rule_add()
    callback returns an error, zone_rule->attr is used uninitialized. Fix it
    to use attr, which holds the needed pointer value.
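    
    A minimal sketch of the corrected error path, under the assumption that
    the only relevant state is whether zone_rule->attr has been assigned
    yet. struct flow_attr, struct zone_rule, ct_rule_add() and
    entry_add_rule() here are hypothetical simplifications, not the mlx5
    definitions.
    
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the mlx5 CT structures. */
struct flow_attr { int id; };
struct zone_rule { struct flow_attr *attr; };  /* set only on success */

/* Hypothetical callback that may fail before zone_rule->attr is set. */
static int ct_rule_add(struct flow_attr *attr, int inject_error)
{
	(void)attr;
	return inject_error ? -1 : 0;
}

/* Returns 0 on success. On failure, reports through *released_id which
 * attr the error path cleaned up. */
static int entry_add_rule(struct zone_rule *zr, struct flow_attr *attr,
			  int inject_error, int *released_id)
{
	int err;

	assert(attr != NULL);
	err = ct_rule_add(attr, inject_error);
	if (err) {
		/* zr->attr was never assigned, so clean up through the
		 * local 'attr' pointer instead of dereferencing zr->attr. */
		*released_id = attr->id;
		return err;
	}
	zr->attr = attr;
	return 0;
}
```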
    
    Kernel log:
     BUG: kernel NULL pointer dereference, address: 0000000000000110
     RIP: 0010:mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
    …
     Call Trace:
      <TASK>
      ? __die+0x20/0x70
      ? page_fault_oops+0x150/0x3e0
      ? exc_page_fault+0x74/0x140
      ? asm_exc_page_fault+0x22/0x30
      ? mlx5_tc_ct_entry_add_rule+0x2b1/0x2f0 [mlx5_core]
      ? mlx5_tc_ct_entry_add_rule+0x1d5/0x2f0 [mlx5_core]
      mlx5_tc_ct_block_flow_offload+0xc6a/0xf90 [mlx5_core]
      ? nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
      nf_flow_offload_tuple+0xd8/0x190 [nf_flow_table]
      flow_offload_work_handler+0x142/0x320 [nf_flow_table]
      ? finish_task_switch.isra.0+0x15b/0x2b0
      process_one_work+0x16c/0x320
      worker_thread+0x28c/0x3a0
      ? __pfx_worker_thread+0x10/0x10
      kthread+0xb8/0xf0
      ? __pfx_kthread+0x10/0x10
      ret_from_fork+0x2d/0x50
      ? __pfx_kthread+0x10/0x10
      ret_from_fork_asm+0x1a/0x30
      </TASK>
    
    Fixes: 7fac5c2eced3 ("net/mlx5: CT: Avoid reusing modify header context for natted entries")
    Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
    Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
    Reviewed-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20241107183527.676877-7-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5e: kTLS, Fix incorrect page refcounting [+ + +]

Author: Dragos Tatulea <dtatulea@nvidia.com>
Date:   Thu Nov 7 20:35:24 2024 +0200

    net/mlx5e: kTLS, Fix incorrect page refcounting
    
    [ Upstream commit dd6e972cc5890d91d6749bb48e3912721c4e4b25 ]
    
    The kTLS tx handling code is using a mix of get_page() and
    page_ref_inc() APIs to increment the page reference. But on the release
    path (mlx5e_ktls_tx_handle_resync_dump_comp()), only put_page() is used.
    
    This is an issue when using pages from large folios: the get_page()
    references are taken on the folio (head) page, while the page_ref_inc()
    references are taken directly on the given page. On release, put_page()
    drops every reference from the folio, so the folio's reference count is
    decremented too many times.
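    
    The mismatch can be modelled in a few lines. This is a hypothetical,
    heavily simplified sketch (struct page_model and the model_* helpers
    are not kernel APIs); real folio accounting is more involved, but the
    asymmetry is the same: the two increments land on different counters,
    while every decrement goes to the head.
    
```c
#include <assert.h>

/* Hypothetical, simplified model: a large folio has a head page and
 * tail pages, each with its own counter. Only the head's counter
 * decides when the folio is freed. */
struct page_model {
	int refcount;
	struct page_model *head;  /* folio head; the head points to itself */
};

/* Like get_page(): the reference is taken on the folio (head page). */
static void model_get_page(struct page_model *p)
{
	p->head->refcount++;
}

/* Like page_ref_inc(): bumps the counter of this exact page only. */
static void model_page_ref_inc(struct page_model *p)
{
	p->refcount++;
}

/* Like put_page(): the reference is dropped from the folio. */
static void model_put_page(struct page_model *p)
{
	assert(p->head->refcount > 0);
	p->head->refcount--;  /* reaching 0 would free the whole folio */
}
```
    
    Mixing one model_get_page() with one model_page_ref_inc() on a tail
    page and then calling model_put_page() twice drives the head counter
    from 1 to 0 while a stray count remains on the tail; using
    model_get_page() for both increments leaves the head at 1.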
    
    This was found while doing kTLS testing with sendfile() + ZC when the
    served file was read from NFS on a kernel with NFS large folios support
    (commit 49b29a573da8 ("nfs: add support for large folios")).
    
    Fixes: 84d1bb2b139e ("net/mlx5e: kTLS, Limit DUMP wqe size")
    Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Link: https://patch.msgid.link/20241107183527.676877-5-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/sched: cls_u32: replace int refcounts with proper refcounts [+ + +]

Author: Pedro Tammela <pctammela@mojatatu.com>
Date:   Tue Nov 14 11:18:55 2023 -0300

    net/sched: cls_u32: replace int refcounts with proper refcounts
    
    [ Upstream commit 6b78debe1c07e6aa3c91ca0b1384bf3cb8217c50 ]
    
    Proper refcounts always emit a warning splat when something goes wrong,
    be it underflow, saturation or object resurrection. As these are always
    a source of bugs, use them in cls_u32 as a safeguard to prevent/catch issues.
    Another benefit is that the refcount API self-documents the code, making
    clear when transitions to dead are expected.
    
    Such an update required minor adaptations in u32 to fit the refcount
    API. First, the refcount is explicitly set to 1 when objects are created;
    the objects then stay alive until a 1 -> 0 transition happens, at which
    point they are released appropriately.
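    
    The lifecycle described above can be sketched with a userspace model of
    the checked transitions. struct ref_model and the ref_* helpers are
    hypothetical; the kernel's refcount_t (refcount_set(), refcount_inc(),
    refcount_dec_and_test()) additionally saturates and prints a WARN,
    which this model only counts.
    
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of refcount_t's checked transitions.
 * Unlike the kernel, it does not saturate; it only counts warnings. */
struct ref_model {
	int refs;
	int warnings;  /* how many "splats" the kernel would have printed */
};

/* Objects start life with an explicit count of 1. */
static void ref_init(struct ref_model *r)
{
	r->refs = 1;
	r->warnings = 0;
}

static void ref_inc(struct ref_model *r)
{
	if (r->refs == 0) {  /* resurrecting a dead object */
		r->warnings++;
		return;
	}
	r->refs++;
}

/* Returns true on the 1 -> 0 transition, when the caller must free. */
static bool ref_dec_and_test(struct ref_model *r)
{
	if (r->refs == 0) {  /* underflow */
		r->warnings++;
		return false;
	}
	return --r->refs == 0;
}
```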
    
    The above made clear some redundant operations in the u32 code around
    the root_ht handling, which were removed. The root_ht is created with a
    refcnt set to 1. Then, when it is associated with tcf_proto, the refcnt
    is incremented to 2. Throughout the entire code, the root_ht is an
    exceptional case and can never be referenced, therefore its refcnt is
    never incremented or decremented. Its lifetime is always bound to
    tcf_proto, meaning that if you delete tcf_proto, the root_ht is deleted
    as well. The code made up for the fact that the root_ht refcnt is 2 by
    doing a double decrement to free it, which does not fit the refcount API.
    
    Even though refcount_t is implemented using atomics, we should observe
    a negligible control plane impact.
    
    Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Link: https://lore.kernel.org/r/20231114141856.974326-2-pctammela@mojatatu.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 73af53d82076 ("net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/sched: taprio: extend minimum interval restriction to entire cycle too [+ + +]

Author: Vladimir Oltean <vladimir.oltean@nxp.com>
Date:   Tue Nov 19 16:06:18 2024 +0800

    net/sched: taprio: extend minimum interval restriction to entire cycle too
    
    [ Upstream commit fb66df20a7201e60f2b13d7f95d031b31a8831d3 ]
    
    It is possible for syzbot to side-step the restriction imposed by the
    blamed commit in the Fixes: tag, because the taprio UAPI permits a
    cycle-time different from (and potentially shorter than) the sum of
    entry intervals.
    
    We need one more restriction, which is that the cycle time itself must
    not be shorter than the time needed to transmit N * ETH_ZLEN bytes,
    where N is the number of schedule entries. This restriction needs to
    apply regardless of whether the cycle time came from the user or was
    the implicit, auto-calculated value, so we move the existing
    "cycle == 0" check outside the "if (!new->cycle_time)" branch. This
    way, both the user-supplied and the auto-calculated cycle times are
    covered.
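    
    A sketch of the combined check, assuming the transmission time of a
    frame is derived from a fixed link speed in Mbit/s. bytes_to_ns() and
    cycle_time_ok() are hypothetical helpers; taprio itself computes
    durations from its own per-byte timing, so treat the arithmetic as
    illustrative.
    
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_ZLEN 60  /* minimum Ethernet frame length without FCS, bytes */

/* Hypothetical helper: time in ns to transmit 'len' bytes at 'mbps'. */
static uint64_t bytes_to_ns(uint64_t len, uint64_t mbps)
{
	assert(mbps > 0);
	return len * 8 * 1000 / mbps;  /* bits * 1000 / (Mbit/s) = ns */
}

/* user_cycle_ns == 0 means "derive the cycle from the entry intervals". */
static bool cycle_time_ok(uint64_t user_cycle_ns, const uint64_t *intervals,
			  size_t n, uint64_t mbps)
{
	uint64_t cycle = user_cycle_ns;

	if (!cycle)
		for (size_t i = 0; i < n; i++)
			cycle += intervals[i];

	/* Both checks apply to user-supplied and derived cycle times. */
	if (cycle == 0)
		return false;
	if (cycle < n * bytes_to_ns(ETH_ZLEN, mbps))
		return false;
	return true;
}
```
    
    At 1000 Mbit/s one bit takes 1 ns, so two entries require a cycle of
    at least 2 * 60 * 8 = 960 ns; a user-supplied cycle of 500 ns is
    rejected even if the entry intervals themselves are long.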
    
    Add a selftest which illustrates the issue triggered by syzbot.
    
    Fixes: b5b73b26b3ca ("taprio: Fix allowing too small intervals")
    Reported-by: syzbot+a7d2b1d5d1af83035567@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/0000000000007d66bc06196e7c66@google.com/
    Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://lore.kernel.org/r/20240527153955.553333-2-vladimir.oltean@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: add copy_safe_from_sockptr() helper [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Nov 19 10:05:36 2024 +0800

    net: add copy_safe_from_sockptr() helper
    
    [ Upstream commit 6309863b31dd80317cd7d6824820b44e254e2a9c ]
    
    The copy_from_sockptr() helper is unsafe unless callers have first
    checked the user-provided optlen.
    
    Too many callers get this wrong, so let's add a helper to
    fix them and avoid future copy/paste bugs.
    
    Instead of :
    
       if (optlen < sizeof(opt)) {
           err = -EINVAL;
           break;
       }
       if (copy_from_sockptr(&opt, optval, sizeof(opt))) {
           err = -EFAULT;
           break;
       }
    
    Use :
    
       err = copy_safe_from_sockptr(&opt, sizeof(opt),
                                    optval, optlen);
       if (err)
           break;
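    
    The shape of the helper can be sketched in userspace. This assumes the
    option value is an ordinary buffer rather than a sockptr_t, and
    copy_safe_from_buf() is a hypothetical name; the essential part is that
    the length check and the copy live in one function, so callers cannot
    forget the check.
    
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: the option value is a plain buffer of optlen
 * bytes (the kernel's sockptr_t can also wrap user memory). */
static int copy_safe_from_buf(void *dst, size_t ksize,
			      const void *optval, unsigned int optlen)
{
	assert(dst != NULL && optval != NULL);

	/* Reject short input before copying ksize bytes, so we never read
	 * past the end of what the caller supplied. */
	if (optlen < ksize)
		return -EINVAL;
	memcpy(dst, optval, ksize);
	return 0;
}
```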
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Link: https://lore.kernel.org/r/20240408082845.3957374-2-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 7a87441c9651 ("nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: fec: remove .ndo_poll_controller to avoid deadlocks [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Tue Nov 19 15:41:35 2024 +0800

    net: fec: remove .ndo_poll_controller to avoid deadlocks
    
    [ Upstream commit c2e0c58b25a0a0c37ec643255558c5af4450c9f5 ]
    
    A deadlock issue was found in the sungem driver; see commit
    ac0a230f719b ("eth: sungem: remove .ndo_poll_controller to avoid
    deadlocks"). The root cause is that netpoll runs in atomic context,
    while the sungem driver's .ndo_poll_controller implementation calls
    disable_irq(), which might sleep. Analysis of fec_poll_controller()
    shows that the fec driver likely has the same issue. Because the fec
    driver uses NAPI for TX completions, it does not need to implement
    .ndo_poll_controller, so fec_poll_controller() can be safely removed.
    
    Fixes: 7f5c6addcdc0 ("net/fec: add poll controller function for fec nic")
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Link: https://lore.kernel.org/r/20240511062009.652918-1-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: Make copy_safe_from_sockptr() match documentation [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Mon Nov 11 00:17:34 2024 +0100

    net: Make copy_safe_from_sockptr() match documentation
    
    commit eb94b7bb10109a14a5431a67e5d8e31cfa06b395 upstream.
    
    copy_safe_from_sockptr()
      return copy_from_sockptr()
        return copy_from_sockptr_offset()
          return copy_from_user()
    
    copy_from_user() does not return an error on fault. Instead, it returns
    the number of bytes that were not copied. Handle that case.
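    
    A userspace sketch of the return-value handling, assuming
    model_copy_from_user() mimics copy_from_user() by returning the number
    of bytes it could not copy. The avail parameter (how many bytes are
    readable before a fault) and both function names are hypothetical.
    
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of copy_from_user(): copies at most 'avail'
 * readable bytes and returns how many of the n bytes were NOT copied. */
static size_t model_copy_from_user(void *dst, const void *src, size_t n,
				   size_t avail)
{
	size_t ok = n < avail ? n : avail;

	memcpy(dst, src, ok);
	return n - ok;
}

/* Map the "bytes left" result to 0 or -EFAULT instead of returning it. */
static int model_copy_safe(void *dst, size_t ksize, const void *src,
			   unsigned int optlen, size_t avail)
{
	if (optlen < ksize)
		return -EINVAL;
	if (model_copy_from_user(dst, src, ksize, avail))
		return -EFAULT;
	return 0;
}
```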
    
    Patch has a side effect: it un-breaks garbage input handling of
    nfc_llcp_setsockopt() and mISDN's data_sock_setsockopt().
    
    Fixes: 6309863b31dd ("net: add copy_safe_from_sockptr() helper")
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Link: https://patch.msgid.link/20241111-sockptr-copy-ret-fix-v1-1-a520083a93fb@rbox.co
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. [+ + +]

Author: Alexandre Ferrieux <alexandre.ferrieux@gmail.com>
Date:   Sun Nov 10 18:28:36 2024 +0100

    net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes.
    
    [ Upstream commit 73af53d82076bbe184d9ece9e14b0dc8599e6055 ]
    
    To generate hnode handles (in gen_new_htid()), u32 uses IDR and
    encodes the returned small integer into a structured 32-bit
    word. Unfortunately, at disposal time, the needed decoding
    is not done. As a result, idr_remove() fails, and the IDR
    fills up. Since its size is 2048, the following script ends up
    with "Filter already exists":
    
      tc filter add dev myve $FILTER1
      tc filter add dev myve $FILTER2
      for i in {1..2048}
      do
        echo $i
        tc filter del dev myve $FILTER2
        tc filter add dev myve $FILTER2
      done
    
    This patch adds the missing decoding logic for handles that
    need it.
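    
    A sketch of the encode/decode pair, assuming the layout used by
    gen_new_htid(): the IDR id (assumed to fit in 11 bits) is ORed with
    0x800 and shifted into the top 12 bits of the handle. The names
    id2handle() and handle2id() are illustrative; the point is that
    idr_remove() must be given the decoded id, not the raw handle.
    
```c
#include <assert.h>
#include <stdint.h>

/* Encode a small IDR id into a structured 32-bit hnode handle. */
static uint32_t id2handle(uint32_t id)
{
	assert(id >= 1 && id <= 0x7FF);  /* assumed to fit in 11 bits */
	return (id | 0x800U) << 20;      /* id goes into the 12-bit htid field */
}

/* Decode hnode handles (top bit set, from 0x800 << 20) back to the id;
 * other values are passed through unchanged. */
static uint32_t handle2id(uint32_t h)
{
	return (h & 0x80000000U) ? ((h >> 20) & 0x7FF) : h;
}
```
    
    With id 1 the handle is 0x80100000; handing that raw value to the IDR
    instead of 1 is what left the entries behind.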
    
    Fixes: e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles")
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
    Signed-off-by: Alexandre Ferrieux <alexandre.ferrieux@orange.com>
    Tested-by: Victor Nogueira <victor@mojatatu.com>
    Link: https://patch.msgid.link/20241110172836.331319-1-alexandre.ferrieux@orange.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: vertexcom: mse102x: Fix tx_bytes calculation [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Fri Nov 8 12:43:43 2024 +0100

    net: vertexcom: mse102x: Fix tx_bytes calculation
    
    [ Upstream commit e68da664d379f352d41d7955712c44e0a738e4ab ]
    
    The tx_bytes should consider the actual size of the Ethernet frames
    without the SPI encapsulation. But we still need to take care of
    Ethernet padding.
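    
    The accounting rule can be sketched as follows, assuming the length
    passed in is that of the Ethernet frame alone, without the SPI header
    or footer. tx_bytes_for_frame() is a hypothetical helper, not the
    driver's code.
    
```c
#include <assert.h>
#include <stddef.h>

#define ETH_ZLEN 60  /* minimum Ethernet frame length (no FCS), bytes */

/* Count the frame as sent on the wire: padded up to ETH_ZLEN, but
 * without any SPI encapsulation bytes added for the transfer. */
static size_t tx_bytes_for_frame(size_t eth_len)
{
	return eth_len < ETH_ZLEN ? ETH_ZLEN : eth_len;
}
```
    
    A 42-byte ARP frame is therefore counted as 60 bytes, while a
    1514-byte frame is counted as is.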
    
    Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support")
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://patch.msgid.link/20241108114343.6174-3-wahrenst@gmx.net
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

netlink: terminate outstanding dump on socket close [+ + +]

Author: Jakub Kicinski <kuba@kernel.org>
Date:   Tue Nov 5 17:52:34 2024 -0800

    netlink: terminate outstanding dump on socket close
    
    [ Upstream commit 1904fb9ebf911441f90a68e96b22aa73e4410505 ]
    
    Netlink supports iterative dumping of data. It provides the families
    the following ops:
     - start - (optional) kicks off the dumping process
     - dump  - actual dump helper, keeps getting called until it returns 0
     - done  - (optional) pairs with .start, can be used for cleanup
    The whole process is asynchronous and the repeated calls to .dump
    don't actually happen in a tight loop, but rather are triggered
    in response to recvmsg() on the socket.
    
    This gives the user full control over the dump, but also means that
    the user can close the socket without getting to the end of the dump.
    To make sure .start is always paired with .done we check if there
    is an ongoing dump before freeing the socket, and if so call .done.
    
    The complication is that sockets can get freed from BH and .done
    is allowed to sleep. So we use a workqueue to defer the call, when
    needed.
    
    Unfortunately this does not work correctly. What we defer is not
    the cleanup but rather releasing a reference on the socket.
    We have no guarantee that we own the last reference, if someone
    else holds the socket they may release it in BH and we're back
    to square one.
    
    The whole dance, however, appears to be unnecessary. Only the user
    can interact with dumps, so we can clean up when the socket is closed.
    And close always happens in process context. Some async code may
    still access the socket after close, queue notification skbs to it etc.
    but no dumps can start, end or otherwise make progress.
    
    Delete the workqueue and flush the dump state directly from the release
    handler. Note that further cleanup is possible in -next, for instance
    we now always call .done before releasing the main module reference,
    so dump doesn't have to take a reference of its own.
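    
    The resulting pairing rule can be sketched as a small model;
    struct dump_model and the functions below are hypothetical and ignore
    locking. Whether the dump finishes on its own or the user closes the
    socket early, .done runs exactly once, and in the close case it runs
    directly from the release path.
    
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a netlink socket with an in-progress dump. */
struct dump_model {
	bool running;    /* .start succeeded, .done not yet called */
	int done_calls;
};

static void dump_start(struct dump_model *d)
{
	d->running = true;
}

/* .done pairs with .start; it may sleep, so call it only from process
 * context. */
static void dump_done(struct dump_model *d)
{
	assert(d->running);
	d->running = false;
	d->done_calls++;
}

/* The dump reached its end normally. */
static void dump_finish(struct dump_model *d)
{
	if (d->running)
		dump_done(d);
}

/* Release handler (process context): flush any outstanding dump here
 * instead of deferring work from the socket destructor. */
static void sock_release_model(struct dump_model *d)
{
	if (d->running)
		dump_done(d);
}
```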
    
    Reported-by: syzkaller <syzkaller@googlegroups.com>
    Fixes: ed5d7788a934 ("netlink: Do not schedule work from sk_destruct")
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20241106015235.2458807-1-kuba@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Tue Nov 19 10:05:37 2024 +0800

    nfc: llcp: fix nfc_llcp_setsockopt() unsafe copies
    
    [ Upstream commit 7a87441c9651ba37842f4809224aca13a554a26f ]
    
    syzbot reported unsafe calls to copy_from_sockptr() [1]
    
    Use copy_safe_from_sockptr() instead.
    
    [1]
    
    BUG: KASAN: slab-out-of-bounds in copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
     BUG: KASAN: slab-out-of-bounds in copy_from_sockptr include/linux/sockptr.h:55 [inline]
     BUG: KASAN: slab-out-of-bounds in nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
    Read of size 4 at addr ffff88801caa1ec3 by task syz-executor459/5078
    
    CPU: 0 PID: 5078 Comm: syz-executor459 Not tainted 6.8.0-syzkaller-08951-gfe46a7dd189e #0
    Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
    Call Trace:
     <TASK>
      __dump_stack lib/dump_stack.c:88 [inline]
      dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
      print_address_description mm/kasan/report.c:377 [inline]
      print_report+0x169/0x550 mm/kasan/report.c:488
      kasan_report+0x143/0x180 mm/kasan/report.c:601
      copy_from_sockptr_offset include/linux/sockptr.h:49 [inline]
      copy_from_sockptr include/linux/sockptr.h:55 [inline]
      nfc_llcp_setsockopt+0x6c2/0x850 net/nfc/llcp_sock.c:255
      do_sock_setsockopt+0x3b1/0x720 net/socket.c:2311
      __sys_setsockopt+0x1ae/0x250 net/socket.c:2334
      __do_sys_setsockopt net/socket.c:2343 [inline]
      __se_sys_setsockopt net/socket.c:2340 [inline]
      __x64_sys_setsockopt+0xb5/0xd0 net/socket.c:2340
     do_syscall_64+0xfd/0x240
     entry_SYSCALL_64_after_hwframe+0x6d/0x75
    RIP: 0033:0x7f7fac07fd89
    Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 91 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
    RSP: 002b:00007fff660eb788 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
    RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f7fac07fd89
    RDX: 0000000000000000 RSI: 0000000000000118 RDI: 0000000000000004
    RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000020000a80 R11: 0000000000000246 R12: 0000000000000000
    R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Reported-by: syzbot <syzkaller@googlegroups.com>
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Link: https://lore.kernel.org/r/20240408082845.3957374-4-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Async COPY result needs to return a write verifier [+ + +]

Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Nov 18 16:18:57 2024 -0500

    NFSD: Async COPY result needs to return a write verifier
    
    [ Upstream commit 9ed666eba4e0a2bb8ffaa3739d830b64d4f2aaad ]
    
    Currently, when NFSD handles an asynchronous COPY, it returns a
    zero write verifier, relying on the subsequent CB_OFFLOAD callback
    to pass the write verifier and a stable_how4 value to the client.
    
    However, if the CB_OFFLOAD never arrives at the client (for example,
    if a network partition occurs just as the server sends the
    CB_OFFLOAD operation), the client will never receive this verifier.
    Thus, if the client sends a follow-up COMMIT, there is no way for
    the client to assess the COMMIT result.
    
    The usual recovery for a missing CB_OFFLOAD is for the client to
    send an OFFLOAD_STATUS operation, but that operation does not carry
    a write verifier in its result. Neither does it carry a stable_how4
    value, so the client /must/ send a COMMIT in this case -- which will
    always fail because currently there's still no write verifier in the
    COPY result.
    
    Thus the server needs to return a normal write verifier in its COPY
    result even if the COPY operation is to be performed asynchronously.
    
    If the server recognizes the callback stateid in subsequent
    OFFLOAD_STATUS operations, then obviously it has not restarted, and
    the write verifier the client received in the COPY result is still
    valid and can be used to assess a COMMIT of the copied data, if one
    is needed.
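    
    The client-side decision this enables can be sketched as below,
    assuming the client stores the verifier from the COPY reply.
    struct copy_state and commit_confirms_copy() are hypothetical;
    NFS4_VERIFIER_SIZE (8 bytes) matches the NFSv4 verifier size. If the
    verifiers differ, the server restarted and the client must copy the
    data again.
    
```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NFS4_VERIFIER_SIZE 8

/* Hypothetical client-side bookkeeping for an async COPY. */
struct copy_state {
	unsigned char copy_verf[NFS4_VERIFIER_SIZE];  /* from the COPY reply */
};

/* After COMMIT: the copied data is known to be stable only if the
 * verifier is unchanged, i.e. the server did not restart in between.
 * A zero verifier in the COPY reply (the old behaviour) would not match
 * the server's real verifier. */
static bool commit_confirms_copy(const struct copy_state *cs,
				 const unsigned char *commit_verf)
{
	assert(cs != NULL && commit_verf != NULL);
	return memcmp(cs->copy_verf, commit_verf, NFS4_VERIFIER_SIZE) == 0;
}
```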
    
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    [ cel: adjusted to apply to origin/linux-6.1.y ]
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: initialize copy->cp_clp early in nfsd4_copy for use by trace point [+ + +]

Author: Dai Ngo <dai.ngo@oracle.com>
Date:   Mon Nov 18 16:18:56 2024 -0500

    NFSD: initialize copy->cp_clp early in nfsd4_copy for use by trace point
    
    [ Upstream commit 15d1975b7279693d6f09398e0e2e31aca2310275 ]
    
    Prepare for adding server copy trace points.
    
    Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
    Tested-by: Chen Hanxiao <chenhx.fnst@fujitsu.com>
    Stable-dep-of: 9ed666eba4e0 ("NFSD: Async COPY result needs to return a write verifier")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Initialize struct nfsd4_copy earlier [+ + +]

Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Nov 18 16:18:59 2024 -0500

    NFSD: Initialize struct nfsd4_copy earlier
    
    [ Upstream commit 63fab04cbd0f96191b6e5beedc3b643b01c15889 ]
    
    Ensure the refcount and async_copies fields are initialized early.
    cleanup_async_copy() will reference these fields if an error occurs
    in nfsd4_copy(). If they are not correctly initialized, at the very
    least, a refcount underflow occurs.
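    
    The ordering can be illustrated with a small model, assuming the error
    path always runs the cleanup function. struct copy_model, start_copy()
    and cleanup_copy() are hypothetical stand-ins for nfsd4_copy() and
    cleanup_async_copy(); the assert fires if cleanup ever sees a refcount
    of 0.
    
```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the fields the cleanup path touches. */
struct copy_model {
	atomic_int refcount;
	int on_async_list;  /* stands in for the async_copies list linkage */
};

static void cleanup_copy(struct copy_model *c)
{
	int old = atomic_fetch_sub(&c->refcount, 1);

	assert(old > 0);  /* would underflow if never initialized to 1 */
	c->on_async_list = 0;
}

/* Initialize everything the error path relies on before any step that
 * can fail, so every error exit sees consistent state. */
static int start_copy(struct copy_model *c, int fail_setup)
{
	atomic_init(&c->refcount, 1);
	c->on_async_list = 0;

	if (fail_setup) {
		cleanup_copy(c);
		return -1;
	}
	c->on_async_list = 1;
	return 0;
}
```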
    
    Reported-by: Olga Kornievskaia <okorniev@redhat.com>
    Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Tested-by: Olga Kornievskaia <okorniev@redhat.com>
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Limit the number of concurrent async COPY operations [+ + +]

Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Nov 18 16:18:58 2024 -0500

    NFSD: Limit the number of concurrent async COPY operations
    
    [ Upstream commit aadc3bbea163b6caaaebfdd2b6c4667fbc726752 ]
    
    Nothing appears to limit the number of concurrent async COPY
    operations that clients can start. In addition, AFAICT each async
    COPY can copy an unlimited number of 4MB chunks, so it can run for a
    long time. Thus IMO async COPY can become a DoS vector.
    
    Add a restriction mechanism that bounds the number of concurrent
    background COPY operations. Start simple and try to be fair -- this
    patch implements a per-namespace limit.
    
    An async COPY request that occurs while this limit is exceeded gets
    NFS4ERR_DELAY. The requesting client can choose to send the request
    again after a delay or fall back to a traditional read/write style
    copy.
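    
    A sketch of such a per-namespace limit using C11 atomics, assuming the
    limit is a plain integer (how NFSD chooses its value is not covered
    here). struct nfsd_net_model, async_copy_reserve() and
    async_copy_release() are hypothetical; NFS4ERR_DELAY is the standard
    NFSv4 status code 10008.
    
```c
#include <assert.h>
#include <stdatomic.h>

#define NFS4_OK       0
#define NFS4ERR_DELAY 10008  /* NFSv4 status code */

/* Hypothetical per-namespace state. */
struct nfsd_net_model {
	atomic_int pending_async_copies;
	int max_async_copies;  /* the limit; its value is an assumption */
};

/* Reserve a slot for a new async COPY, or ask the client to retry.
 * On failure the slot is given back here, so the caller must not also
 * call async_copy_release(). */
static int async_copy_reserve(struct nfsd_net_model *nn)
{
	if (atomic_fetch_add(&nn->pending_async_copies, 1) + 1 >
	    nn->max_async_copies) {
		atomic_fetch_sub(&nn->pending_async_copies, 1);
		return NFS4ERR_DELAY;
	}
	return NFS4_OK;
}

/* Called exactly once when a reserved copy finishes or is torn down. */
static void async_copy_release(struct nfsd_net_model *nn)
{
	int old = atomic_fetch_sub(&nn->pending_async_copies, 1);

	assert(old > 0);
}
```
    
    Keeping a single decrement site per reservation avoids releasing the
    same slot twice on an error path.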
    
    If there is a need to make the mechanism more sophisticated, we can
    visit that in future patches.
    
    Cc: stable@vger.kernel.org
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Link: https://nvd.nist.gov/vuln/detail/CVE-2024-49974
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSD: Never decrement pending_async_copies on error [+ + +]

Author: Chuck Lever <chuck.lever@oracle.com>
Date:   Mon Nov 18 16:19:00 2024 -0500

    NFSD: Never decrement pending_async_copies on error
    
    [ Upstream commit 8286f8b622990194207df9ab852e0f87c60d35e9 ]
    
    The error flow in nfsd4_copy() calls cleanup_async_copy(), which
    already decrements nn->pending_async_copies.
    
    Reported-by: Olga Kornievskaia <okorniev@redhat.com>
    Fixes: aadc3bbea163 ("NFSD: Limit the number of concurrent async COPY operations")
    Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Thu Nov 7 01:07:33 2024 +0900

    nilfs2: fix null-ptr-deref in block_dirty_buffer tracepoint
    
    commit 2026559a6c4ce34db117d2db8f710fe2a9420d5a upstream.
    
    When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty()
    may cause a NULL pointer dereference, or a general protection fault when
    KASAN is enabled.
    
    This happens because, since the tracepoint was added in
    mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev
    regardless of whether the buffer head has a pointer to a block_device
    structure.
    
    In the current implementation, nilfs_grab_buffer(), which grabs a buffer
    to read (or create) a block of metadata, including b-tree node blocks,
    does not set the block device itself; instead, each of its caller block
    reading functions sets it only if the buffer is not in the "uptodate"
    state. However, if the uptodate flag is set on a folio/page, and the
    buffer heads are detached from it by try_to_free_buffers(), and new buffer
    heads are then attached by create_empty_buffers(), the uptodate flag may
    be restored to each buffer without the block device being set to
    bh->b_bdev, and mark_buffer_dirty() may be called later in that state,
    resulting in the bug mentioned above.
    
    Fix this issue by making nilfs_grab_buffer() always set the block device
    of the super block structure to the buffer head, regardless of the state
    of the buffer's uptodate flag.
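    
    A minimal model of the fix, with hypothetical structures standing in
    for struct buffer_head and struct block_device: the tracepoint-like
    function dereferences b_bdev unconditionally, so grab_buffer() (a
    hypothetical stand-in for nilfs_grab_buffer()) sets it every time,
    whatever the uptodate flag says.
    
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal structures; not the kernel's definitions. */
struct block_device_model { unsigned int bd_dev; };
struct buffer_head_model {
	struct block_device_model *b_bdev;
	bool uptodate;
};

/* What the tracepoint effectively does: dereference bh->b_bdev. */
static unsigned int trace_dirty_buffer(const struct buffer_head_model *bh)
{
	assert(bh->b_bdev != NULL);  /* NULL here is the reported crash */
	return bh->b_bdev->bd_dev;
}

/* Fixed behaviour: always attach the super block's device, regardless
 * of the uptodate flag. */
static void grab_buffer(struct buffer_head_model *bh,
			struct block_device_model *sb_bdev)
{
	bh->b_bdev = sb_bdev;
}
```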
    
    Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com
    Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Ubisectech Sirius <bugreport@valiantsec.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint [+ + +]

Author: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Date:   Thu Nov 7 01:07:32 2024 +0900

    nilfs2: fix null-ptr-deref in block_touch_buffer tracepoint
    
    commit cd45e963e44b0f10d90b9e6c0e8b4f47f3c92471 upstream.
    
    Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints".
    
    This series fixes null pointer dereference bugs that occur when using
    nilfs2 and two block-related tracepoints.
    
    
    This patch (of 2):
    
    It has been reported that when using "block:block_touch_buffer"
    tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a
    NULL pointer dereference, or a general protection fault when KASAN is
    enabled.
    
    This happens because, since the tracepoint was added in touch_buffer(), it
    references the dev_t member bh->b_bdev->bd_dev regardless of whether the
    buffer head has a pointer to a block_device structure.  In the current
    implementation, the block_device structure is set after the function
    returns to the caller.
    
    Here, touch_buffer() is used to mark the folio/page that owns the buffer
    head as accessed, but the common search helper for folio/page used by the
    caller function was optimized to mark the folio/page as accessed when it
    was reimplemented a long time ago, eliminating the need to call
    touch_buffer() here in the first place.
    
    So this solves the issue by eliminating the touch_buffer() call itself.
    
    Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com
    Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com
    Fixes: 5305cb830834 ("block: add block_{touch|dirty}_buffer tracepoint")
    Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
    Reported-by: Ubisectech Sirius <bugreport@valiantsec.com>
    Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@valiantsec.com
    Reported-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2
    Tested-by: syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com
    Cc: Tejun Heo <tj@kernel.org>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues' [+ + +]

Author: Yu Kuai <yukuai3@huawei.com>
Date:   Thu May 23 23:39:34 2024 +0800

    null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'
    
    commit a2db328b0839312c169eb42746ec46fc1ab53ed2 upstream.
    
    Writing 'power' and 'submit_queues' concurrently triggers a kernel
    panic:
    
    Test script:
    
    modprobe null_blk nr_devices=0
    mkdir -p /sys/kernel/config/nullb/nullb0
    cd /sys/kernel/config/nullb/nullb0
    while true; do echo 1 > submit_queues; echo 4 > submit_queues; done &
    while true; do echo 1 > power; echo 0 > power; done
    
    Test result:
    
    BUG: kernel NULL pointer dereference, address: 0000000000000148
    Oops: 0000 [#1] PREEMPT SMP
    RIP: 0010:__lock_acquire+0x41d/0x28f0
    Call Trace:
     <TASK>
     lock_acquire+0x121/0x450
     down_write+0x5f/0x1d0
     simple_recursive_removal+0x12f/0x5c0
     blk_mq_debugfs_unregister_hctxs+0x7c/0x100
     blk_mq_update_nr_hw_queues+0x4a3/0x720
     nullb_update_nr_hw_queues+0x71/0xf0 [null_blk]
     nullb_device_submit_queues_store+0x79/0xf0 [null_blk]
     configfs_write_iter+0x119/0x1e0
     vfs_write+0x326/0x730
     ksys_write+0x74/0x150
    
    This is because del_gendisk() can run concurrently with
    blk_mq_update_nr_hw_queues():
    
    nullb_device_power_store        nullb_apply_submit_queues
     null_del_dev
     del_gendisk
                                     nullb_update_nr_hw_queues
                                      if (!dev->nullb)
                                      // still set while gendisk is deleted
                                       return 0
                                      blk_mq_update_nr_hw_queues
     dev->nullb = NULL
    
    Fix this problem by reusing the global mutex to serialize
    nullb_device_power_store() and nullb_update_nr_hw_queues() when they are
    called from configfs.
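    The locking pattern can be modeled in userspace as a minimal sketch. It
    assumes a pthread mutex as a stand-in for the driver's global mutex, and
    every name below (toy_dev, toy_power_off(), toy_update_nr_hw_queues(),
    ...) is hypothetical, not null_blk code. The point is that the "is the
    device still there?" check and the queue update run under the same lock
    that power-off holds while tearing the device down.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nullb and struct nullb_device. */
struct toy_nullb {
    int nr_hw_queues;
};

struct toy_dev {
    struct toy_nullb *nullb;   /* NULL while the device is powered off */
    struct toy_nullb storage;  /* backing object for this sketch */
};

/* Stand-in for the global mutex that the fix reuses. */
static pthread_mutex_t toy_lock = PTHREAD_MUTEX_INITIALIZER;

static void toy_power_on(struct toy_dev *dev)
{
    pthread_mutex_lock(&toy_lock);
    dev->nullb = &dev->storage;
    pthread_mutex_unlock(&toy_lock);
}

static void toy_power_off(struct toy_dev *dev)
{
    pthread_mutex_lock(&toy_lock);
    /* Teardown (del_gendisk() in the driver) and clearing the pointer
     * appear as a single step to the update path below. */
    dev->nullb = NULL;
    pthread_mutex_unlock(&toy_lock);
}

/* Returns 1 if the queue count was updated, 0 if the device is off. */
static int toy_update_nr_hw_queues(struct toy_dev *dev, int nr)
{
    int updated = 0;

    pthread_mutex_lock(&toy_lock);
    if (dev->nullb) {          /* check and update under the same lock */
        dev->nullb->nr_hw_queues = nr;
        updated = 1;
    }
    pthread_mutex_unlock(&toy_lock);
    return updated;
}
```

    Without the lock, the update path could pass the NULL check just before
    power-off tears the device down, which is the interleaving in the
    diagram above.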
    
    Fixes: 45919fbfe1c4 ("null_blk: Enable modifying 'submit_queues' after an instance has been configured")
    Reported-and-tested-by: Yi Zhang <yi.zhang@redhat.com>
    Closes: https://lore.kernel.org/all/CAHj4cs9LgsHLnjg8z06LQ3Pr5cax-+Ps+xT7AP7TPnEjStuwZA@mail.gmail.com/
    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
    Link: https://lore.kernel.org/r/20240523153934.1937851-1-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

null_blk: Fix return value of nullb_device_power_store() [+ + +]

Author: Damien Le Moal <dlemoal@kernel.org>
Date:   Mon May 27 13:34:45 2024 +0900

    null_blk: Fix return value of nullb_device_power_store()
    
    commit d9ff882b54f99f96787fa3df7cd938966843c418 upstream.
    
    When powering on a null_blk device that is not already on, the variable
    ret, which is initialized to count, is reused to hold the return value
    of null_add_dev().  As a result, nullb_device_power_store() returns
    null_add_dev()'s return value (0 on success) instead of count.
    So make sure to set ret back to count when there are no errors.
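    The store-handler convention at stake (return the number of bytes
    consumed on success, or a negative errno on failure) can be sketched as
    follows. fake_add_dev() and power_store_sketch() are hypothetical
    stand-ins for null_add_dev() and nullb_device_power_store(); only the
    handling of ret is meant to mirror the fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for null_add_dev(): 0 on success, -errno on error. */
static int fake_add_dev(int should_fail)
{
    return should_fail ? -ENOMEM : 0;
}

/* Store handlers report bytes consumed on success, -errno on failure. */
static ssize_t power_store_sketch(size_t count, int should_fail)
{
    ssize_t ret = (ssize_t)count;

    ret = fake_add_dev(should_fail);  /* ret is reused for the callee... */
    if (ret)
        return ret;                   /* ...which is fine for errors, */

    ret = (ssize_t)count;             /* but must be reset on success. */
    return ret;
}
```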
    
    Fixes: a2db328b0839 ("null_blk: fix null-ptr-dereference while configuring 'power' and 'submit_queues'")
    Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
    Reviewed-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
    Link: https://lore.kernel.org/r/20240527043445.235267-1-dlemoal@kernel.org
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

null_blk: Remove usage of the deprecated ida_simple_xx() API [+ + +]

Author: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Date:   Sun Jan 14 10:00:59 2024 +0100

    null_blk: Remove usage of the deprecated ida_simple_xx() API
    
    commit 95931a245b44ee04f3359ec432e73614d44d8b38 upstream.
    
    ida_alloc() and ida_free() should be preferred over the deprecated
    ida_simple_get() and ida_simple_remove().
    
    The resulting code is also less verbose.
    
    Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
    Link: https://lore.kernel.org/r/bf257b1078475a415cdc3344c6a750842946e367.1705222845.git.christophe.jaillet@wanadoo.fr
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Xiangyu Chen <xiangyu.chen@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: fix UBSAN warning in ocfs2_verify_volume() [+ + +]

Author: Dmitry Antipov <dmantipov@yandex.ru>
Date:   Wed Nov 6 12:21:00 2024 +0300

    ocfs2: fix UBSAN warning in ocfs2_verify_volume()
    
    commit 23aab037106d46e6168ce1214a958ce9bf317f2e upstream.
    
    Syzbot has reported the following splat triggered by UBSAN:
    
    UBSAN: shift-out-of-bounds in fs/ocfs2/super.c:2336:10
    shift exponent 32768 is too large for 32-bit type 'int'
    CPU: 2 UID: 0 PID: 5255 Comm: repro Not tainted 6.12.0-rc4-syzkaller-00047-gc2ee9f594da8 #0
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
    Call Trace:
     <TASK>
     dump_stack_lvl+0x241/0x360
     ? __pfx_dump_stack_lvl+0x10/0x10
     ? __pfx__printk+0x10/0x10
     ? __asan_memset+0x23/0x50
     ? lockdep_init_map_type+0xa1/0x910
     __ubsan_handle_shift_out_of_bounds+0x3c8/0x420
     ocfs2_fill_super+0xf9c/0x5750
     ? __pfx_ocfs2_fill_super+0x10/0x10
     ? __pfx_validate_chain+0x10/0x10
     ? __pfx_validate_chain+0x10/0x10
     ? validate_chain+0x11e/0x5920
     ? __lock_acquire+0x1384/0x2050
     ? __pfx_validate_chain+0x10/0x10
     ? string+0x26a/0x2b0
     ? widen_string+0x3a/0x310
     ? string+0x26a/0x2b0
     ? bdev_name+0x2b1/0x3c0
     ? pointer+0x703/0x1210
     ? __pfx_pointer+0x10/0x10
     ? __pfx_format_decode+0x10/0x10
     ? __lock_acquire+0x1384/0x2050
     ? vsnprintf+0x1ccd/0x1da0
     ? snprintf+0xda/0x120
     ? __pfx_lock_release+0x10/0x10
     ? do_raw_spin_lock+0x14f/0x370
     ? __pfx_snprintf+0x10/0x10
     ? set_blocksize+0x1f9/0x360
     ? sb_set_blocksize+0x98/0xf0
     ? setup_bdev_super+0x4e6/0x5d0
     mount_bdev+0x20c/0x2d0
     ? __pfx_ocfs2_fill_super+0x10/0x10
     ? __pfx_mount_bdev+0x10/0x10
     ? vfs_parse_fs_string+0x190/0x230
     ? __pfx_vfs_parse_fs_string+0x10/0x10
     legacy_get_tree+0xf0/0x190
     ? __pfx_ocfs2_mount+0x10/0x10
     vfs_get_tree+0x92/0x2b0
     do_new_mount+0x2be/0xb40
     ? __pfx_do_new_mount+0x10/0x10
     __se_sys_mount+0x2d6/0x3c0
     ? __pfx___se_sys_mount+0x10/0x10
     ? do_syscall_64+0x100/0x230
     ? __x64_sys_mount+0x20/0xc0
     do_syscall_64+0xf3/0x230
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    RIP: 0033:0x7f37cae96fda
    Code: 48 8b 0d 51 ce 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1e ce 0c 00 f7 d8 64 89 01 48
    RSP: 002b:00007fff6c1aa228 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
    RAX: ffffffffffffffda RBX: 00007fff6c1aa240 RCX: 00007f37cae96fda
    RDX: 00000000200002c0 RSI: 0000000020000040 RDI: 00007fff6c1aa240
    RBP: 0000000000000004 R08: 00007fff6c1aa280 R09: 0000000000000000
    R10: 00000000000008c0 R11: 0000000000000206 R12: 00000000000008c0
    R13: 00007fff6c1aa280 R14: 0000000000000003 R15: 0000000001000000
     </TASK>
    
    For a badly damaged superblock, the value of 'i_super.s_blocksize_bits'
    may exceed the maximum valid shift for the underlying 'int'.  So add an
    extra check that this field represents a valid block size: 512 bytes,
    1K, 2K, or 4K.
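    A minimal sketch of such a check, validating the exponent before it is
    ever used in a shift. The macro and function names are hypothetical;
    the 9..12 range simply encodes the 512-byte to 4K sizes listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits: 2^9 = 512 bytes up to 2^12 = 4K. */
#define SKETCH_MIN_BLOCKSIZE_BITS 9
#define SKETCH_MAX_BLOCKSIZE_BITS 12

static bool blocksize_bits_valid(uint32_t bits)
{
    return bits >= SKETCH_MIN_BLOCKSIZE_BITS &&
           bits <= SKETCH_MAX_BLOCKSIZE_BITS;
}

/* Returns the block size in bytes, or 0 if the on-disk value is rejected.
 * The range check runs first, so 1 << bits is never evaluated with an
 * out-of-range exponent such as the 32768 seen in the report. */
static int blocksize_from_bits(uint32_t bits)
{
    if (!blocksize_bits_valid(bits))
        return 0;
    return 1 << bits;
}
```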
    
    Link: https://lkml.kernel.org/r/20241106092100.2661330-1-dmantipov@yandex.ru
    Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Reported-by: syzbot+56f7cd1abe4b8e475180@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=56f7cd1abe4b8e475180
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ocfs2: uncache inode which has failed entering the group [+ + +]

Author: Dmitry Antipov <dmantipov@yandex.ru>
Date:   Thu Nov 14 07:38:44 2024 +0300

    ocfs2: uncache inode which has failed entering the group
    
    commit 737f34137844d6572ab7d473c998c7f977ff30eb upstream.
    
    Syzbot has reported the following BUG:
    
    kernel BUG at fs/ocfs2/uptodate.c:509!
    ...
    Call Trace:
     <TASK>
     ? __die_body+0x5f/0xb0
     ? die+0x9e/0xc0
     ? do_trap+0x15a/0x3a0
     ? ocfs2_set_new_buffer_uptodate+0x145/0x160
     ? do_error_trap+0x1dc/0x2c0
     ? ocfs2_set_new_buffer_uptodate+0x145/0x160
     ? __pfx_do_error_trap+0x10/0x10
     ? handle_invalid_op+0x34/0x40
     ? ocfs2_set_new_buffer_uptodate+0x145/0x160
     ? exc_invalid_op+0x38/0x50
     ? asm_exc_invalid_op+0x1a/0x20
     ? ocfs2_set_new_buffer_uptodate+0x2e/0x160
     ? ocfs2_set_new_buffer_uptodate+0x144/0x160
     ? ocfs2_set_new_buffer_uptodate+0x145/0x160
     ocfs2_group_add+0x39f/0x15a0
     ? __pfx_ocfs2_group_add+0x10/0x10
     ? __pfx_lock_acquire+0x10/0x10
     ? mnt_get_write_access+0x68/0x2b0
     ? __pfx_lock_release+0x10/0x10
     ? rcu_read_lock_any_held+0xb7/0x160
     ? __pfx_rcu_read_lock_any_held+0x10/0x10
     ? smack_log+0x123/0x540
     ? mnt_get_write_access+0x68/0x2b0
     ? mnt_get_write_access+0x68/0x2b0
     ? mnt_get_write_access+0x226/0x2b0
     ocfs2_ioctl+0x65e/0x7d0
     ? __pfx_ocfs2_ioctl+0x10/0x10
     ? smack_file_ioctl+0x29e/0x3a0
     ? __pfx_smack_file_ioctl+0x10/0x10
     ? lockdep_hardirqs_on_prepare+0x43d/0x780
     ? __pfx_lockdep_hardirqs_on_prepare+0x10/0x10
     ? __pfx_ocfs2_ioctl+0x10/0x10
     __se_sys_ioctl+0xfb/0x170
     do_syscall_64+0xf3/0x230
     entry_SYSCALL_64_after_hwframe+0x77/0x7f
    ...
     </TASK>
    
    When 'ioctl(OCFS2_IOC_GROUP_ADD, ...)' fails for a particular inode
    in 'ocfs2_verify_group_and_input()', the corresponding buffer head
    remains cached, and a subsequent call to the same 'ioctl()' for the
    same inode hits the BUG() in 'ocfs2_set_new_buffer_uptodate()' (which
    tries to cache the same buffer head of that inode again). Fix this by
    uncaching the buffer head with 'ocfs2_remove_from_cache()' on the error
    path in 'ocfs2_group_add()'.
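    The failure mode can be modeled with a toy per-inode cache that asserts
    a "new" buffer is not already cached, the way the BUG() does. This is a
    hypothetical sketch, not ocfs2 code: toy_cache, cache_insert_new() and
    group_add_sketch() are illustrative names. Removing the entry on the
    error path is what lets a retry succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8

/* Toy per-inode metadata cache: a small set of cached block numbers. */
struct toy_cache {
    unsigned long blocks[CACHE_SLOTS];
    size_t n;
};

static bool cache_contains(const struct toy_cache *c, unsigned long blk)
{
    for (size_t i = 0; i < c->n; i++)
        if (c->blocks[i] == blk)
            return true;
    return false;
}

/* Like the BUG() in the report: a new buffer must not already be cached. */
static void cache_insert_new(struct toy_cache *c, unsigned long blk)
{
    assert(!cache_contains(c, blk));
    assert(c->n < CACHE_SLOTS);
    c->blocks[c->n++] = blk;
}

static void cache_remove(struct toy_cache *c, unsigned long blk)
{
    for (size_t i = 0; i < c->n; i++) {
        if (c->blocks[i] == blk) {
            c->blocks[i] = c->blocks[--c->n];  /* swap in the last entry */
            return;
        }
    }
}

/* Cache the buffer, verify the input, and uncache it again on failure. */
static int group_add_sketch(struct toy_cache *c, unsigned long blk,
                            bool input_ok)
{
    cache_insert_new(c, blk);
    if (!input_ok) {
        cache_remove(c, blk);  /* the error-path step the fix adds */
        return -EINVAL;
    }
    return 0;
}
```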
    
    Link: https://lkml.kernel.org/r/20241114043844.111847-1-dmantipov@yandex.ru
    Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize")
    Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
    Reported-by: syzbot+453873f1588c2d75b447@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=453873f1588c2d75b447
    Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    Cc: Dmitry Antipov <dmantipov@yandex.ru>
    Cc: Joel Becker <jlbec@evilplan.org>
    Cc: Mark Fasheh <mark@fasheh.com>
    Cc: Junxiao Bi <junxiao.bi@oracle.com>
    Cc: Changwei Ge <gechangwei@live.cn>
    Cc: Jun Piao <piaojun@huawei.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

parisc: fix a possible DMA corruption [+ + +]

Author: Mikulas Patocka <mpatocka@redhat.com>
Date:   Sat Jul 27 20:22:52 2024 +0200

    parisc: fix a possible DMA corruption
    
    commit 7ae04ba36b381bffe2471eff3a93edced843240f upstream.
    
    ARCH_DMA_MINALIGN was defined as 16, which is too small: two unrelated
    16-byte allocations may share a cache line. If one of these allocations
    is written using DMA and the other is written using a cached write, the
    value that was written with DMA may be corrupted, because the cache
    operates on whole lines.
    
    This commit changes ARCH_DMA_MINALIGN to 128 on PA20 and 32 on PA1.1,
    the largest possible cache line size on each.
    
    As different parisc microarchitectures have different cache line sizes,
    we define arch_slab_minalign(), cache_line_size() and
    dma_get_cache_alignment() so that the kernel may tune slab cache
    parameters dynamically, based on the detected cache line size.
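    A quick way to see why 16-byte alignment is not enough: with a 128-byte
    cache line, two adjacent 16-byte allocations land on the same line,
    while rounding each allocation up to the line size keeps them apart.
    The sketch assumes a power-of-two line size; share_cache_line() and
    align_up() are illustrative helpers, not parisc kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [a, a + len_a) and [b, b + len_b) touch a common cache line. */
static bool share_cache_line(uintptr_t a, size_t len_a,
                             uintptr_t b, size_t len_b, size_t line)
{
    uintptr_t a_first = a / line, a_last = (a + len_a - 1) / line;
    uintptr_t b_first = b / line, b_last = (b + len_b - 1) / line;

    return a_first <= b_last && b_first <= a_last;
}

/* Round n up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}
```

    For example, objects at 0x1000 and 0x1010 share a 128-byte line, but if
    the first is padded to align_up(16, 128) bytes, the second starts at
    0x1080 on a line of its own.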
    
    Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Helge Deller <deller@gmx.de>
    Signed-off-by: Bin Lan <bin.lan.cn@windriver.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K" [+ + +]

Author: Aurelien Jarno <aurelien@aurel32.net>
Date:   Sun Nov 10 12:46:36 2024 +0100

    Revert "mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K"
    
    commit 1635e407a4a64d08a8517ac59ca14ad4fc785e75 upstream.
    
    The commit 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages
    bigger than 4K") increased the max_req_size, even for 4K pages, causing
    various issues:
    - Panic booting the kernel/rootfs from an SD card on Rockchip RK3566
    - Panic booting the kernel/rootfs from an SD card on StarFive JH7100
    - "swiotlb buffer is full" and data corruption on StarFive JH7110
    
    At this stage no fix has been found, so it is probably better to just
    revert the change.
    
    This reverts commit 8396c793ffdf28bb8aee7cfe0891080f8cab7890.
    
    Cc: stable@vger.kernel.org
    Cc: Sam Protsenko <semen.protsenko@linaro.org>
    Fixes: 8396c793ffdf ("mmc: dw_mmc: Fix IDMAC operation with pages bigger than 4K")
    Closes: https://lore.kernel.org/linux-mmc/614692b4-1dbe-31b8-a34d-cb6db1909bb7@w6rz.net/
    Closes: https://lore.kernel.org/linux-mmc/CAC8uq=Ppnmv98mpa1CrWLawWoPnu5abtU69v-=G-P7ysATQ2Pw@mail.gmail.com/
    Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
    Message-ID: <20241110114700.622372-1-aurelien@aurel32.net>
    Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

samples: pktgen: correct dev to DEV [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Tue Nov 12 11:03:47 2024 +0800

    samples: pktgen: correct dev to DEV
    
    [ Upstream commit 3342dc8b4623d835e7dd76a15cec2e5a94fe2f93 ]
    
    In the pktgen_sample01_simple.sh script, the device variable is the
    uppercase 'DEV', but one line referenced the lowercase 'dev' instead.
    Because of this typo, the script could not enable UDP tx checksum.
    
    Fixes: 460a9aa23de6 ("samples: pktgen: add UDP tx checksum support")
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
    Link: https://patch.msgid.link/20241112030347.1849335-1-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: vchiq_arm: Get the rid off struct vchiq_2835_state [+ + +]

Author: Stefan Wahren <wahrenst@gmx.net>
Date:   Fri Jun 21 15:19:53 2024 +0200

    staging: vchiq_arm: Get the rid off struct vchiq_2835_state
    
    [ Upstream commit 4e2766102da632f26341d5539519b0abf73df887 ]
    
    The benefit of this encapsulating struct is questionable: it only
    stores a flag signaling the initialization state of vchiq_arm_state.
    Besides the fact that this flag is set too early, access to uninitialized
    members should be avoided. So initialize vchiq_arm_state properly before
    assigning it directly to vchiq_state.
    
    Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
    Link: https://lore.kernel.org/r/20240621131958.98208-6-wahrenst@gmx.net
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Stable-dep-of: 404b739e8955 ("staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation [+ + +]

Author: Umang Jain <umang.jain@ideasonboard.com>
Date:   Wed Oct 16 18:32:24 2024 +0530

    staging: vchiq_arm: Use devm_kzalloc() for vchiq_arm_state allocation
    
    [ Upstream commit 404b739e895522838f1abdc340c554654d671dde ]
    
    The struct vchiq_arm_state 'platform_state' is currently allocated
    dynamically using kzalloc(). Unfortunately, it is never freed and is
    leaked on the error handling paths of the probe() function.
    
    To address the issue, use the device resource management helper
    devm_kzalloc(), so that the allocation is released automatically when
    probing fails or the device is unbound.
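    The idea behind device-managed allocations can be sketched with a toy
    model: every allocation is recorded on the device and released in one
    place, so early returns in probe() need no explicit free. All names
    here (toy_devm_kzalloc(), toy_devres_release_all(), toy_probe()) are
    hypothetical; the real devres machinery is considerably richer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header prepended to every managed allocation; the payload follows it. */
struct toy_devres {
    struct toy_devres *next;
    max_align_t payload[];     /* keeps the payload suitably aligned */
};

struct toy_device {
    struct toy_devres *res;    /* list of live managed allocations */
    size_t live;               /* number of entries on the list */
};

/* Zeroed allocation whose lifetime is tied to the device. */
static void *toy_devm_kzalloc(struct toy_device *dev, size_t size)
{
    struct toy_devres *dr = calloc(1, sizeof(*dr) + size);

    if (!dr)
        return NULL;
    dr->next = dev->res;
    dev->res = dr;
    dev->live++;
    return dr->payload;
}

/* What the (toy) driver core does on probe failure or unbind. */
static void toy_devres_release_all(struct toy_device *dev)
{
    while (dev->res) {
        struct toy_devres *dr = dev->res;

        dev->res = dr->next;
        free(dr);
        dev->live--;
    }
}

/* Hypothetical probe(): a later step fails after the allocation, and the
 * early return needs no explicit free. */
static int toy_probe(struct toy_device *dev, int later_step_fails)
{
    int *platform_state = toy_devm_kzalloc(dev, sizeof(*platform_state));

    if (!platform_state)
        return -1;
    if (later_step_fails)
        return -1;
    return 0;
}
```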
    
    Fixes: 71bad7f08641 ("staging: add bcm2708 vchiq driver")
    Cc: stable@vger.kernel.org
    Signed-off-by: Umang Jain <umang.jain@ideasonboard.com>
    Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
    Link: https://lore.kernel.org/r/20241016130225.61024-2-umang.jain@ideasonboard.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vdpa/mlx5: Fix PA offset with unaligned starting iotlb map [+ + +]

Author: Si-Wei Liu <si-wei.liu@oracle.com>
Date:   Mon Oct 21 16:40:39 2024 +0300

    vdpa/mlx5: Fix PA offset with unaligned starting iotlb map
    
    commit 29ce8b8a4fa74e841342c8b8f8941848a3c6f29f upstream.
    
    When calculating the physical address range based on the iotlb and mr
    [start,end) ranges, the offset of mr->start relative to map->start
    is not taken into account. This leads to some incorrect and duplicate
    mappings.
    
    For the case where mr->start < map->start, the code is already correct:
    the range [mr->start, map->start) was handled by a different
    iteration.
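    The offset calculation can be sketched as below, assuming a map entry
    shaped like an iotlb map (IOVA range [start, last] backed by physical
    address addr). struct toy_map and overlap_start_pa() are hypothetical
    names; the sketch only shows where mr->start's offset into the map has
    to be added.

```c
#include <assert.h>
#include <stdint.h>

/* IOVA range [start, last] whose first byte is backed by physical addr. */
struct toy_map {
    uint64_t start;
    uint64_t last;
    uint64_t addr;
};

/* Physical address backing the first IOVA of the overlap between a memory
 * region that starts at mr_start and the given map entry. */
static uint64_t overlap_start_pa(uint64_t mr_start, const struct toy_map *map)
{
    uint64_t start = mr_start > map->start ? mr_start : map->start;

    /* Using map->addr alone is only correct when start == map->start. */
    return map->addr + (start - map->start);
}
```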
    
    Fixes: 94abbccdf291 ("vdpa/mlx5: Add shared memory registration code")
    Cc: stable@vger.kernel.org
    Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
    Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com>
    Message-Id: <20241021134040.975221-2-dtatulea@nvidia.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

virtio/vsock: Fix accept_queue memory leak [+ + +]

Author: Michal Luczaj <mhal@rbox.co>
Date:   Thu Nov 7 21:46:12 2024 +0100

    virtio/vsock: Fix accept_queue memory leak
    
    [ Upstream commit d7b0ff5a866724c3ad21f2628c22a63336deec3f ]
    
    As the final stages of socket destruction may be delayed, it is possible
    that virtio_transport_recv_listen() will be called after the accept_queue
    has been flushed, but before the SOCK_DONE flag has been set. As a result,
    sockets enqueued after the flush would never be removed, leading to a
    memory leak.
    
    vsock_release
      __vsock_release
        lock
        virtio_transport_release
          virtio_transport_close
            schedule_delayed_work(close_work)
        sk_shutdown = SHUTDOWN_MASK
    (!) flush accept_queue
        release
                                            virtio_transport_recv_pkt
                                              vsock_find_bound_socket
                                              lock
                                              if flag(SOCK_DONE) return
                                              virtio_transport_recv_listen
                                                child = vsock_create_connected
                                          (!)   vsock_enqueue_accept(child)
                                              release
    close_work
      lock
      virtio_transport_do_close
        set_flag(SOCK_DONE)
        virtio_transport_remove_sock
          vsock_remove_sock
            vsock_remove_bound
      release
    
    Introduce a sk_shutdown check to disallow vsock_enqueue_accept() during
    socket destruction.
    
    unreferenced object 0xffff888109e3f800 (size 2040):
      comm "kworker/5:2", pid 371, jiffies 4294940105
      hex dump (first 32 bytes):
        00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        28 00 0b 40 00 00 00 00 00 00 00 00 00 00 00 00  (..@............
      backtrace (crc 9e5f4e84):
        [<ffffffff81418ff1>] kmem_cache_alloc_noprof+0x2c1/0x360
        [<ffffffff81d27aa0>] sk_prot_alloc+0x30/0x120
        [<ffffffff81d2b54c>] sk_alloc+0x2c/0x4b0
        [<ffffffff81fe049a>] __vsock_create.constprop.0+0x2a/0x310
        [<ffffffff81fe6d6c>] virtio_transport_recv_pkt+0x4dc/0x9a0
        [<ffffffff81fe745d>] vsock_loopback_work+0xfd/0x140
        [<ffffffff810fc6ac>] process_one_work+0x20c/0x570
        [<ffffffff810fce3f>] worker_thread+0x1bf/0x3a0
        [<ffffffff811070dd>] kthread+0xdd/0x110
        [<ffffffff81044fdd>] ret_from_fork+0x2d/0x50
        [<ffffffff8100785a>] ret_from_fork_asm+0x1a/0x30
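    The sk_shutdown check can be sketched as follows. It is a hypothetical
    userspace model (toy_listener and toy_recv_listen() are illustrative
    names); the shutdown bit values mirror the kernel's RCV_SHUTDOWN and
    SEND_SHUTDOWN. Because vsock_release() sets SHUTDOWN_MASK before
    flushing the accept_queue, rejecting new children once the mask is set
    keeps them off a queue that will not be flushed again.

```c
#include <assert.h>
#include <errno.h>

/* Values mirror the kernel's RCV_SHUTDOWN / SEND_SHUTDOWN bits. */
#define TOY_RCV_SHUTDOWN  1
#define TOY_SEND_SHUTDOWN 2
#define TOY_SHUTDOWN_MASK (TOY_RCV_SHUTDOWN | TOY_SEND_SHUTDOWN)

struct toy_listener {
    unsigned char sk_shutdown;
    int accept_backlog;        /* children waiting in the accept queue */
};

/* In the real receive path the caller holds the socket lock; this toy
 * model has no locking of its own. */
static int toy_recv_listen(struct toy_listener *sk)
{
    if (sk->sk_shutdown == TOY_SHUTDOWN_MASK)
        return -ESHUTDOWN;     /* listener is being destroyed */

    sk->accept_backlog++;      /* stands in for vsock_enqueue_accept() */
    return 0;
}
```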
    
    Fixes: 3fe356d58efa ("vsock/virtio: discard packets only when socket is really closed")
    Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
    Signed-off-by: Michal Luczaj <mhal@rbox.co>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

vp_vdpa: fix id_table array not null terminated error [+ + +]

Author: Xiaoguang Wang <lege.wang@jaguarmicro.com>
Date:   Tue Nov 5 21:35:18 2024 +0800

    vp_vdpa: fix id_table array not null terminated error
    
    commit 4e39ecadf1d2a08187139619f1f314b64ba7d947 upstream.
    
    Allocate one extra virtio_device_id as a null terminator; otherwise
    vdpa_mgmtdev_get_classes() may iterate past the last valid entry and
    read undefined memory.
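    The terminator convention can be sketched like this, assuming (as is
    usual for kernel ID tables) that the consumer stops at the first entry
    whose device field is 0. The struct mirrors the {device, vendor} layout
    of struct virtio_device_id; alloc_id_table() and count_ids() are
    hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same shape as struct virtio_device_id: { device, vendor }. */
struct toy_virtio_device_id {
    uint32_t device;
    uint32_t vendor;
};

/* n real entries plus one terminator; calloc() zero-fills, so the extra
 * entry has device == 0. */
static struct toy_virtio_device_id *alloc_id_table(size_t n)
{
    return calloc(n + 1, sizeof(struct toy_virtio_device_id));
}

/* Walks the table until the zero terminator, as a table consumer would.
 * Without the extra entry this loop would run off the end of the array. */
static size_t count_ids(const struct toy_virtio_device_id *ids)
{
    size_t n = 0;

    while (ids[n].device)
        n++;
    return n;
}
```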
    
    Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa")
    Cc: stable@vger.kernel.org
    Suggested-by: Parav Pandit <parav@nvidia.com>
    Signed-off-by: Angus Chen <angus.chen@jaguarmicro.com>
    Signed-off-by: Xiaoguang Wang <lege.wang@jaguarmicro.com>
    Message-Id: <20241105133518.1494-1-lege.wang@jaguarmicro.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Reviewed-by: Parav Pandit <parav@nvidia.com>
    Acked-by: Jason Wang <jasowang@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y [+ + +]

Author: Baoquan He <bhe@redhat.com>
Date:   Wed Sep 11 16:16:15 2024 +0800

    x86/mm: Fix a kdump kernel failure on SME system when CONFIG_IMA_KEXEC=y
    
    commit 8d9ffb2fe65a6c4ef114e8d4f947958a12751bbe upstream.
    
    The kdump kernel is broken on SME systems with CONFIG_IMA_KEXEC=y enabled.
    Debugging traced the issue back to
    
      b69a2afd5afc ("x86/kexec: Carry forward IMA measurement log on kexec").
    
    Testing had not previously been done on SME systems with CONFIG_IMA_KEXEC
    enabled, which led to the oversight. The resulting failure looks like this:
    
    ...
      ima: No TPM chip found, activating TPM-bypass!
      Loading compiled-in module X.509 certificates
      Loaded X.509 cert 'Build time autogenerated kernel key: 18ae0bc7e79b64700122bb1d6a904b070fef2656'
      ima: Allocated hash algorithm: sha256
      Oops: general protection fault, probably for non-canonical address 0xcfacfdfe6660003e: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.11.0-rc2+ #14
      Hardware name: Dell Inc. PowerEdge R7425/02MJ3T, BIOS 1.20.0 05/03/2023
      RIP: 0010:ima_restore_measurement_list
      Call Trace:
       <TASK>
       ? show_trace_log_lvl
       ? show_trace_log_lvl
       ? ima_load_kexec_buffer
       ? __die_body.cold
       ? die_addr
       ? exc_general_protection
       ? asm_exc_general_protection
       ? ima_restore_measurement_list
       ? vprintk_emit
       ? ima_load_kexec_buffer
       ima_load_kexec_buffer
       ima_init
       ? __pfx_init_ima
       init_ima
       ? __pfx_init_ima
       do_one_initcall
       do_initcalls
       ? __pfx_kernel_init
       kernel_init_freeable
       kernel_init
       ret_from_fork
       ? __pfx_kernel_init
       ret_from_fork_asm
       </TASK>
      Modules linked in:
      ---[ end trace 0000000000000000 ]---
      ...
      Kernel panic - not syncing: Fatal exception
      Kernel Offset: disabled
      Rebooting in 10 seconds..
    
    Adding debug printks showed that the stored address and size of the
    ima_kexec buffer were not decrypted correctly:
    
      ima: ima_load_kexec_buffer, buffer:0xcfacfdfe6660003e, size:0xe48066052d5df359
    
    Three types of setup_data info:
    
      - SETUP_EFI,
      - SETUP_IMA, and
      - SETUP_RNG_SEED
    
    are passed to the kexec/kdump kernel, but only the ima_kexec buffer was
    decrypted incorrectly. Debugging identified a bug in
    early_memremap_is_setup_data(): the range calculation treated the len
    field of struct setup_data as the size of the whole entry, but len only
    covers the data field and excludes the size of the struct itself, so
    the computed range was wrong.
    
    Address a similar issue in memremap_is_setup_data() while at it.
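    A simplified sketch of the corrected range check follows. It assumes
    the boot-protocol header layout (next, type, len, then data[]) and
    omits mapping each entry and walking the next chain; the function names
    are hypothetical, not the x86 helpers themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same layout as the boot protocol's struct setup_data header. */
struct toy_setup_data {
    uint64_t next;
    uint32_t type;
    uint32_t len;      /* length of data[] only, not of the header */
    uint8_t data[];
};

/* Buggy check: treats len as the size of the whole entry. */
static bool in_setup_data_buggy(uint64_t phys_addr, uint64_t paddr,
                                uint32_t len)
{
    return phys_addr >= paddr && phys_addr < paddr + len;
}

/* Fixed check: the entry spans the header plus len bytes of data. */
static bool in_setup_data(uint64_t phys_addr, uint64_t paddr, uint32_t len)
{
    uint64_t size = sizeof(struct toy_setup_data) + len;

    return phys_addr >= paddr && phys_addr < paddr + size;
}
```

    With a 16-byte header and len = 0x20 at 0x1000, the entry ends at
    0x1030, so an address such as 0x1028 belongs to it even though the
    buggy check says it does not.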
    
      [ bp: Heavily massage. ]
    
    Fixes: b3c72fc9a78e ("x86/boot: Introduce setup_indirect")
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
    Cc: <stable@kernel.org>
    Link: https://lore.kernel.org/r/20240911081615.262202-3-bhe@redhat.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>