Author: Yicong Yang <yang.yicong@picoheart.com>
Date: Wed Jun 24 14:38:16 2026 +0800
ACPI: scan: Use async schedule function in acpi_scan_clear_dep_fn()
[ Upstream commit 7cf28b3797a81b616bb7eb3e90cf131afc452919 ]
The device object rescan in acpi_scan_clear_dep_fn() is scheduled on a
system workqueue which is not guaranteed to be finished before entering
userspace. This may cause some key devices to be missing when userspace
init task tries to find them. Two issues were observed on RISC-V platforms:
- Kernel panic because userspace init cannot open a console.
Scanning of the console device is queued by acpi_scan_clear_dep_queue()
and has not finished when the userspace init process starts, so no
console is present when init runs.
- Entering a rescue shell due to missing root devices (PCIe NVMe in
our case).
For the same reason as above, scanning of the PCIe host bridge is queued
on a system workqueue and finishes after the init process runs.
This happens because both devices (console, PCIe host bridge) depend on
the riscv-aplic irqchip to serve their interrupts (the console's wired
interrupt and PCI's INTx interrupts). To preserve this dependency, these
devices are scanned and created after riscv-aplic is initialized.
riscv-aplic is initialized in device_initcall() and a device scan work
item is queued via acpi_scan_clear_dep_queue(), which happens close to
the time the userspace init process runs. Since acpi_scan_clear_dep_queue()
uses system_dfl_wq without any synchronization, the issues occur whenever
userspace init runs before these devices are ready.
The solution is to wait for the queued work to complete before entering
userspace init. One possible way would be to use a dedicated workqueue
instead of system_dfl_wq, and explicitly flush it somewhere in the
initcall stage before entering userspace. Another way is to use
async_schedule_dev_nocall() for scanning these devices. It is designed
for asynchronous initialization and works the same way as before, since
it runs the work on a dedicated unbound workqueue, but the kernel init
code calls async_synchronize_full() right before entering userspace
init, which waits for the work to complete.
Compared to a dedicated workqueue, the second approach is simpler
because the async schedule framework takes care of all of the details.
The ACPI code only needs to focus on its job. A dedicated workqueue for
this could also be redundant because some platforms don't need
acpi_scan_clear_dep_queue() for their device scanning.
Signed-off-by: Yicong Yang <yang.yicong@picoheart.com>
[ rjw: Subject adjustment, changelog edits ]
Link: https://patch.msgid.link/20260128132848.93638-1-yang.yicong@picoheart.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
[ Vivian: Adjust system_dfl_wq -> system_unbound_wq in removed lines ]
Signed-off-by: Vivian Wang <wangruikang@iscas.ac.cn>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Date: Mon May 4 15:48:23 2026 +0800
agp/amd64: Fix broken error propagation in agp_amd64_probe()
commit b08472db93b1ccff84a7adec5779d47f0e9d3a30 upstream.
A NULL pointer dereference was observed in the AMD64 AGP driver when
running in a virtualized environment (e.g. qemu/kvm) without a physical
AMD northbridge. The crash occurs in amd64_fetch_size() when attempting
to dereference the pointer returned by node_to_amd_nb(0).
The root cause of this crash is broken error propagation in
agp_amd64_probe(): When no AMD northbridges are found, cache_nbs()
correctly returns -ENODEV. However, the probe function erroneously
checks the return value against exactly -1, rather than < 0.
As a result, the hardware absence error is masked, allowing the driver
to improperly proceed with initialization. It eventually calls
agp_add_bridge(), which invokes amd64_fetch_size(). Since the hardware
does not exist, node_to_amd_nb(0) returns NULL, leading to a General
Protection Fault (GPF) when accessing its ->misc member.
Fix the issue by correcting the error check in agp_amd64_probe() to
abort properly when cache_nbs() returns any negative error code. This
prevents the driver from erroneously proceeding without hardware, thereby
avoiding the subsequent NULL pointer dereference at its source.
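A minimal userspace sketch of why the old test let the error through. cache_nbs_stub() is a hypothetical stand-in that returns -ENODEV, as the commit says cache_nbs() does when no northbridge exists; the two predicates model the old "== -1" check and the corrected "< 0" check.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for cache_nbs() on a machine without an AMD
 * northbridge: like the real function, it reports -ENODEV. */
static int cache_nbs_stub(void)
{
	return -ENODEV;
}

/* Old check: only an exact -1 aborted the probe. -ENODEV is not -1
 * (ENODEV is 19 on Linux), so probing carried on. */
static bool old_probe_aborts(int ret)
{
	return ret == -1;
}

/* Fixed check: any negative errno aborts the probe. */
static bool new_probe_aborts(int ret)
{
	return ret < 0;
}
```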
Fixes: a32073bffc65 ("[PATCH] x86_64: Clean and enhance up K8 northbridge access code")
Signed-off-by: Mingyu Wang <25181214217@stu.xidian.edu.cn>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
Cc: stable@vger.kernel.org # v2.6.18+
Link: https://patch.msgid.link/20260504074823.99377-1-w15303746062@163.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Date: Mon May 11 11:04:08 2026 +0100
crypto: qat - remove unused character device and IOCTLs
commit d237230728c567297f2f98b425d63156ab2ed17f upstream.
The QAT driver exposes a character device (qat_adf_ctl) with IOCTLs
for device configuration, start, stop, status query and enumeration.
These IOCTLs are not part of any public uAPI header and have no known
in-tree or out-of-tree users. Device lifecycle is already managed via
sysfs.
The ioctl interface also increases the attack surface and is the
subject of a number of bug reports.
Remove the character device, the IOCTL definitions, and the related
data structures (adf_dev_status_info, adf_user_cfg_key_val,
adf_user_cfg_section, adf_user_cfg_ctl_data). Drop the now-unused
adf_cfg_user.h header and strip adf_ctl_drv.c down to the minimal
module_init/module_exit hooks for workqueue, AER, and crypto/compression
algorithm registration.
Clean up leftover dead code that was only reachable from the removed
IOCTL paths: adf_cfg_del_all(), adf_devmgr_verify_id(),
adf_devmgr_get_num_dev(), adf_devmgr_get_dev_by_id(),
adf_get_vf_real_id() and the unused ADF_CFG macros.
Additionally, drop the entry associated with the QAT IOCTLs in
ioctl-number.rst.
Cc: stable@vger.kernel.org
Fixes: d8cba25d2c68 ("crypto: qat - Intel(R) QAT driver framework")
Reported-by: Zhi Wang <wangzhi@stu.xidian.edu.cn>
Reported-by: Bin Yu <byu@xidian.edu.cn>
Reported-by: MingYu Wang <w15303746062@163.com>
Closes: https://lore.kernel.org/all/61d6d499.ab89.19b9b7f3186.Coremail.wangzhi_xd@stu.xidian.edu.cn/
Link: https://lore.kernel.org/all/20260508034841.256794-1-w15303746062@163.com/
Link: https://lore.kernel.org/all/20260508023542.256299-1-w15303746062@163.com/
Link: https://lore.kernel.org/all/20260504025120.98242-1-w15303746062@163.com/
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Mon Jun 22 11:54:38 2026 +0200
debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING
commit 06e0ae988f6e3499785c407429953ade19c1096b upstream.
The pool of free objects is refilled on several occasions, such as object
initialisation. On PREEMPT_RT, refilling is limited to preemptible
sections due to the sleeping locks used by the memory allocator. The
system boots with interrupts disabled, so the pool cannot be refilled.
If too many objects are initialized and the pool becomes empty, then
debugobjects disables itself.
Refilling can also happen early in boot with interrupts disabled, as
long as the scheduler is not operational: if the scheduler cannot
preempt a task, then a sleeping lock cannot be contended.
Additionally allow refilling the pool while the scheduler is not yet
operational.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-2-bigeasy@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Helen Koike <koike@igalia.com>
Date: Mon Jun 22 11:54:46 2026 +0200
debugobjects: Do not fill_pool() if pi_blocked_on
commit 5f41161059fd0f1bbf18c90f3180e38cc45a14eb upstream.
On RT enabled kernels, fill_pool() ends up calling rtlock_lock(), which
asserts if current::pi_blocked_on is set, because a task can obviously only
block on one lock as otherwise the priority inheritance chain gets
corrupted.
Prevent this by expanding the conditional to take current::pi_blocked_on
into account.
Fixes: 4bedcc28469a ("debugobjects: Make them PREEMPT_RT aware")
Reported-by: syzbot+b8ca586b9fc235f0c0df@syzkaller.appspotmail.com
Signed-off-by: Helen Koike <koike@igalia.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260511215359.3351259-1-koike@igalia.com
Closes: https://syzkaller.appspot.com/bug?extid=b8ca586b9fc235f0c0df
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Waiman Long <longman@redhat.com>
Date: Mon Jun 22 11:54:51 2026 +0200
debugobjects: Dont call fill_pool() in early boot hardirq context
commit 0d046ae106255cba5eb83b23f78ee93f3620247d upstream.
When booting a debug PREEMPT_RT kernel on an ARM64 system, an "inconsistent
{HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage" lockdep warning message was
reported to the console.
During early boot, interrupts are enabled before the scheduler is
enabled. In this window (before SYSTEM_SCHEDULING is set) interrupts can
fire, and the hard interrupt handler may then attempt to fill the pool.
This can lead to a deadlock if the interrupt hits a region that holds a
lock which must also be taken in the allocation path.
Add a new can_fill_pool() helper that reorders the exception rules and
forbids this scenario by excluding allocations from hard interrupt
context.
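Read together with the two earlier debugobjects changes above, the refill policy can be modelled as a single predicate. This is a hedged userspace model, not the kernel's can_fill_pool(): struct fill_ctx and its fields are hypothetical stand-ins for IS_ENABLED(CONFIG_PREEMPT_RT), system_state < SYSTEM_SCHEDULING, in_hardirq(), preemptible() and current->pi_blocked_on, and the order of the checks is an assumption based on the commit texts.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the context in which a refill is attempted. */
struct fill_ctx {
	bool preempt_rt;    /* models IS_ENABLED(CONFIG_PREEMPT_RT) */
	bool before_sched;  /* models system_state < SYSTEM_SCHEDULING */
	bool in_hardirq;    /* models in_hardirq() */
	bool preemptible;   /* models preemptible() */
	bool pi_blocked_on; /* models current->pi_blocked_on != NULL */
};

static bool model_can_fill_pool(const struct fill_ctx *c)
{
	/* Without PREEMPT_RT the allocator's locks do not sleep. */
	if (!c->preempt_rt)
		return true;
	/* Early boot: sleeping locks cannot be contended, but an interrupt
	 * may hit a section holding an allocator lock, so hard interrupt
	 * context is excluded. */
	if (c->before_sched)
		return !c->in_hardirq;
	/* Normal RT operation: only preemptible tasks that are not already
	 * blocked on a lock may take the allocator's sleeping locks. */
	return c->preemptible && !c->pi_blocked_on;
}
```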
Fixes: 06e0ae988f6e ("debugobjects: Allow to refill the pool before SYSTEM_SCHEDULING")
Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260605173038.495075-1-longman@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: Mon Jun 22 11:54:42 2026 +0200
debugobjects: Use LD_WAIT_CONFIG instead of LD_WAIT_SLEEP
commit 37de2dbc318ee10577c1c2704de5a803e75e55a2 upstream.
fill_pool_map is used to suppress nesting violations caused by acquiring
a spinlock_t (from within the memory allocator) while holding a
raw_spinlock_t. The annotation used is wrong.
LD_WAIT_SLEEP is for always-sleeping lock types such as struct mutex.
LD_WAIT_CONFIG is for lock types that spin on !PREEMPT_RT but sleep on
PREEMPT_RT, such as spinlock_t.
Use LD_WAIT_CONFIG as override.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://patch.msgid.link/20251127153652.291697-3-bigeasy@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Georgi Djakov <georgi.djakov@oss.qualcomm.com>
Date: Thu May 14 02:26:57 2026 -0700
drivers/base/memory: set mem->altmap after successful device registration
commit a2b8d7827f48ee54a686cb80e4a1d0ff954ec42a upstream.
If __add_memory_block() fails at xa_store() (under memory pressure for
example), device_unregister() is called, which eventually triggers
memory_block_release() with mem->altmap still set, causing a
WARN_ON(mem->altmap). This was triggered with a modified virtio-mem driver.
Fix this by delaying the assignment of mem->altmap until after
__add_memory_block() has succeeded.
Link: https://lore.kernel.org/20260514092657.3057141-1-georgi.djakov@oss.qualcomm.com
Fixes: 1a8c64e11043 ("mm/memory_hotplug: embed vmem_altmap details in memory block")
Signed-off-by: Georgi Djakov <georgi.djakov@oss.qualcomm.com>
Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Richard Cheng <icheng@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Georgi Djakov <djakov@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Dexuan Cui <decui@microsoft.com>
Date: Tue Jun 16 12:56:09 2026 -0400
Drivers: hv: vmbus: Improve the logic of reserving fb_mmio on Gen2 VMs
[ Upstream commit 016a25e4b0df4d77e7c258edee4aaf982e4ee809 ]
If vmbus_reserve_fb() in the kdump/kexec kernel fails to properly reserve
the framebuffer MMIO range (which is below 4GB) due to a Gen2 VM's
screen.lfb_base being zero [1], there is an MMIO conflict between the
drivers hyperv-drm and pci-hyperv: when the driver pci-hyperv's
hv_allocate_config_window() calls vmbus_allocate_mmio() to get an
MMIO range, typically it gets a 32-bit MMIO range that overlaps with the
framebuffer MMIO range, and later hv_pci_enter_d0() fails with an
error message "PCI Pass-through VSP failed D0 Entry with status" since
the host thinks that PCI devices must not use MMIO space that the
host has assigned to the framebuffer.
This is especially an issue if pci-hyperv is built-in and hyperv-drm is
built as a module. Consequently, the kdump/kexec kernel fails to detect
PCI devices via pci-hyperv, and may fail to mount the root file system,
which may reside on an NVMe disk. The issue described here has existed
for SR-IOV VF NICs since day one of the pci-hyperv driver, and has been
worked around on x64 when possible. With the recent introduction of
ARM64 VMs that boot from NVMe, there is no workaround, so we need a
formal fix.
On Gen2 VMs, if the screen.lfb_base is 0 in the kdump/kexec kernel [1],
fall back to the low MMIO base, which should be equal to the framebuffer
MMIO base [2] (the statement is true according to my testing on x64
Windows Server 2016, and on x64 and ARM64 Windows Server 2025 and on
Azure. I checked with the Hyper-V team and they said the statement should
continue to be true for Gen2 VMs). In the first kernel, screen.lfb_base
is not 0; if the user specifies a very high resolution, it's not enough
to only reserve 8MB: let's always reserve half of the space below 4GB,
but cap the reservation to 128MB, which is the required framebuffer size
of the highest resolution 7680*4320 supported by Hyper-V.
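A worked check of the 128MB figure, assuming a linear framebuffer at 32 bits (4 bytes) per pixel, which the text does not state: 7680 * 4320 * 4 = 132,710,400 bytes, roughly 126.6 MiB, just under 128 MiB (134,217,728 bytes). fb_bytes() and FB_CAP are names made up for this sketch.

```c
#include <stdint.h>

/* 128MB reservation cap from the commit text, taken as 128 * 2^20 bytes. */
#define FB_CAP (128ULL << 20)

/* Bytes needed by a linear framebuffer (hypothetical helper). The
 * bytes-per-pixel value is an assumption supplied by the caller. */
static uint64_t fb_bytes(uint64_t width, uint64_t height, uint64_t bytes_per_pixel)
{
	return width * height * bytes_per_pixel;
}
```

Under the same 4-bytes-per-pixel assumption, the result also exceeds 64MB, which is consistent with the note below that the 64MB Gen1 window cannot hold the highest resolutions.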
While at it, fix the comparison "end > VTPM_BASE_ADDRESS" by changing
the > to >=. Here the 'end' is an inclusive end (typically, it's
0xFFFF_FFFF for the low MMIO range).
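A small illustration of why an inclusive end calls for >=. The helper names are hypothetical and no real address constants from the driver are used.

```c
#include <stdbool.h>
#include <stdint.h>

/* With an inclusive end, the range [start, end] contains end itself, so
 * "end >= addr" is the test for "the range reaches addr". The old
 * "end > addr" form misses the boundary case end == addr. */
static bool inclusive_end_reaches(uint64_t end, uint64_t addr)
{
	return end >= addr;
}

/* Size of an inclusive range: the +1 accounts for the last byte. */
static uint64_t inclusive_size(uint64_t start, uint64_t end)
{
	return end - start + 1;
}
```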
Note: vmbus_reserve_fb() now also reserves an MMIO range at the beginning
of the low MMIO range on CVMs, which have no framebuffers (the
'screen.lfb_base' in vmbus_reserve_fb() is 0 for CVMs), just in case the
host might treat the beginning of the low MMIO range specially [3]. BTW,
the OpenHCL kernel is not affected by the change, because that kernel
boots with DeviceTree rather than ACPI (so vmbus_reserve_fb() won't run
there), and there is no framebuffer device for that kernel.
Note: normally Gen1 VMs don't have the MMIO conflict issue because the
framebuffer MMIO range (which is hardcoded to base=4GB-128MB and
size=64MB for Gen1 VMs by the host) is always reported via the legacy PCI
graphics device's BAR, so the kdump/kexec kernel can reserve the 64MB
MMIO range; however, if the VM is configured to use a very high resolution
and the required framebuffer size exceeds 64MB (AFAIK, in practice, this
isn't a typical configuration by users), the hyperv-drm driver may need to
allocate an MMIO range above 4GB and change the framebuffer MMIO location
to the allocated MMIO range -- in this case, there can still be issues [4]
which can't be easily fixed: any possible affected Gen1 users would have
to use a resolution whose framebuffer size is <= 64MB, or switch to Gen2
VMs.
[1] https://lore.kernel.org/all/SA1PR21MB692176C1BC53BFC9EAE5CF8EBF51A@SA1PR21MB6921.namprd21.prod.outlook.com/
[2] https://lore.kernel.org/all/SA1PR21MB69218F955B62DFF62E3E88D2BF222@SA1PR21MB6921.namprd21.prod.outlook.com/
[3] https://lore.kernel.org/all/SN6PR02MB415726B17D5A6027CD1717E8D4342@SN6PR02MB4157.namprd02.prod.outlook.com/
[4] https://lore.kernel.org/all/SA1PR21MB69213486F821CA5A2C793C81BF342@SA1PR21MB6921.namprd21.prod.outlook.com/
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
CC: stable@vger.kernel.org
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Tested-by: Krister Johansen <kjlx@templeofstupid.com>
Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Tudor Ambarus <tudor.ambarus@linaro.org>
Date: Tue Jun 16 21:47:19 2026 -0400
firmware: samsung: acpm: Fix cross-thread RX length corruption
[ Upstream commit f133bd4b5daf71bccdde0ad1a4f47fac76a6bfb1 ]
Sashiko identified a cross-thread RX length corruption bug when
reviewing the thermal addition to ACPM [1].
When multiple threads concurrently send IPC requests, the ACPM polling
mechanism can encounter responses belonging to other threads. To drain
the queue, the driver saves these concurrent responses into an internal
cache (`rx_data->cmd`) to be retrieved later by the owning thread.
Previously, the driver incorrectly used `xfer->rxcnt` (the expected
receive length of the *current* polling thread) when copying data for
*other* threads into this cache. If the threads expected responses of
different lengths, this resulted in buffer underflows (leading to reads
of uninitialized memory) or potential buffer overflows.
Fix this by replacing the boolean `response` flag in
`struct acpm_rx_data` with `rxcnt`, caching the exact expected receive
length for each specific transaction during transfer preparation. Use
this cached length when saving concurrent responses.
Consequently, ensure that `xfer->rxcnt` is explicitly zeroed in driver
helpers (e.g., `acpm_dvfs_set_xfer`) for fire-and-forget messages to
prevent uninitialized stack garbage from being interpreted as a massive
expected receive length.
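A simplified model of the fix. The slot layout, SEQ_SLOTS, MAX_WORDS, prepare_xfer() and save_response() are hypothetical; only the idea of recording rxcnt per transaction at preparation time, and copying that length (not the poller's own) when caching another thread's response, comes from the commit.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SEQ_SLOTS 8 /* hypothetical number of in-flight sequence numbers */
#define MAX_WORDS 8 /* hypothetical maximum response length in 32-bit words */

/* Per-transaction receive state; rxcnt replaces the old boolean flag. */
struct rx_slot {
	size_t rxcnt;            /* expected response length in bytes, 0 = none */
	uint32_t cmd[MAX_WORDS]; /* cached response for the owning thread */
};

static struct rx_slot rx_slots[SEQ_SLOTS];

/* Called when a request is prepared: remember what *this* transaction
 * expects. Fire-and-forget messages pass rxcnt == 0. */
static void prepare_xfer(unsigned int seq, size_t rxcnt)
{
	rx_slots[seq % SEQ_SLOTS].rxcnt = rxcnt;
}

/* Called by whichever thread is polling when it sees a response for seq:
 * copy the owner's cached length, never the poller's own length. */
static size_t save_response(unsigned int seq, const uint32_t *resp)
{
	struct rx_slot *slot = &rx_slots[seq % SEQ_SLOTS];
	size_t len = slot->rxcnt;

	if (len > sizeof(slot->cmd))
		len = sizeof(slot->cmd); /* defensive clamp in this sketch */
	memcpy(slot->cmd, resp, len);
	return len;
}
```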
Cc: stable@vger.kernel.org
Fixes: a88927b534ba ("firmware: add Exynos ACPM protocol driver")
Closes: https://sashiko.dev/#/patchset/20260420-acpm-tmu-v3-0-3dc8e93f0b26%40linaro.org [1]
Reported-by: Titouan Ameline de Cadeville <titouan.ameline@gmail.com>
Closes: https://lore.kernel.org/r/20260426210255.73674-1-titouan.ameline@gmail.com/
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
Link: https://patch.msgid.link/20260505-acpm-fixes-sashiko-reports-v5-1-43b5ee7f1674@linaro.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Joanne Koong <joannelkoong@gmail.com>
Date: Mon May 18 22:28:06 2026 -0700
fuse: re-lock request before replacing page cache folio
commit a078484921052d0badd827fcc2770b5cfc1d4120 upstream.
fuse_try_move_folio() unlocks the request on entry but does not
re-lock it on the success path. This means fuse_chan_abort() can end the
request and free the fuse_io_args (e.g. in fuse_readpages_end()) while the
subsequent copy chain logic after fuse_try_move_folio() accesses the
fuse_io_args, leading to use-after-free issues.
Fix this by calling lock_request() before replace_page_cache_folio().
This ensures the request is locked on the success path which will
prevent the fuse_io_args from being freed while the later copying logic
runs, and also ensures that ap->folios[i]->mapping is never NULL
since ap->folios[i] will always point to the newfolio after
replace_page_cache_folio().
Fixes: ce534fb05292 ("fuse: allow splice to move pages")
Cc: stable@vger.kernel.org
Reported-by: Lei Lu <llfamsec@gmail.com>
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Thorsten Blum <thorsten.blum@linux.dev>
Date: Tue Jun 16 12:55:57 2026 -0400
hv: utils: handle and propagate errors in kvp_register
[ Upstream commit 3fcf923302a8f5c0dc3af3d2ca2657cb5fae4297 ]
Make kvp_register() return an error code instead of silently ignoring
failures, and propagate the error from kvp_handle_handshake() instead of
returning success.
This propagates both kzalloc_obj() and hvutil_transport_send() failures
to kvp_handle_handshake() and thus to kvp_on_msg().
Fixes: 245ba56a52a3 ("Staging: hv: Implement key/value pair (KVP)")
Cc: stable@vger.kernel.org
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Weiming Shi <bestswngs@gmail.com>
Date: Wed Apr 15 01:23:39 2026 +0800
i2c: stub: Reject I2C block transfers with invalid length
commit 6036b5067a8199ba7a2dc7b377d4b9dd276d5f9e upstream.
The I2C_SMBUS_I2C_BLOCK_DATA case in stub_xfer() uses data->block[0]
as the transfer length. The existing check only clamps it to avoid
overrunning the chip->words[256] register array, but does not validate
it against I2C_SMBUS_BLOCK_MAX (32), which is the limit of the union
i2c_smbus_data.block buffer (34 bytes total). The driver is a
development/test tool (CONFIG_I2C_STUB=m, not built by default)
that must be loaded with a chip_addr= parameter.
A local user with access to /dev/i2c-* can issue an I2C_SMBUS ioctl
with I2C_SMBUS_I2C_BLOCK_DATA and data->block[0] > 32, causing
stub_xfer() to read or write past the end of the union
i2c_smbus_data.block buffer:
BUG: KASAN: stack-out-of-bounds in stub_xfer (drivers/i2c/i2c-stub.c:223)
Read of size 1 at addr ffff88800abcfd92 by task exploit/81
Call Trace:
<TASK>
stub_xfer (drivers/i2c/i2c-stub.c:223)
__i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:593)
i2c_smbus_xfer (drivers/i2c/i2c-core-smbus.c:536)
i2cdev_ioctl_smbus (drivers/i2c/i2c-dev.c:391)
i2cdev_ioctl (drivers/i2c/i2c-dev.c:478)
__x64_sys_ioctl (fs/ioctl.c:583)
do_syscall_64 (arch/x86/entry/syscall_64.c:94)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
</TASK>
The bug exists because i2c-stub implements .smbus_xfer directly,
bypassing the I2C_SMBUS_BLOCK_MAX validation in
i2c_smbus_xfer_emulated(). The I2C_SMBUS_BLOCK_DATA case in the same
function correctly validates against I2C_SMBUS_BLOCK_MAX, but the
I2C_SMBUS_I2C_BLOCK_DATA case does not.
Fix by rejecting transfers with data->block[0] == 0 or
data->block[0] > I2C_SMBUS_BLOCK_MAX with -EINVAL, consistent with
both the I2C_SMBUS_BLOCK_DATA case in the same function and the
I2C_SMBUS_I2C_BLOCK_DATA validation in i2c_smbus_xfer_emulated().
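A sketch of the check described above, written as a standalone helper. check_i2c_block_len() is hypothetical; I2C_SMBUS_BLOCK_MAX is defined locally with the value 32 stated in the text so the snippet compiles outside the kernel.

```c
#include <errno.h>
#include <stdint.h>

#define I2C_SMBUS_BLOCK_MAX 32 /* limit stated in the commit text */

/* block[0] carries the transfer length; it must be 1..I2C_SMBUS_BLOCK_MAX
 * so that block[1..len] stays inside the 34-byte union buffer. */
static int check_i2c_block_len(const uint8_t *block)
{
	if (block[0] == 0 || block[0] > I2C_SMBUS_BLOCK_MAX)
		return -EINVAL;
	return 0;
}
```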
Fixes: 4710317891e4 ("i2c-stub: Implement I2C block support")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Reviewed-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sam Daly <sam@samdaly.ie>
Date: Thu May 14 18:23:20 2026 +0200
iio: adc: ti-ads1298: add bounds check to pga_settings index
commit 95e8a48d7a85d4226934020e57815a3316d3a14b upstream.
ads1298_pga_settings has 7 elements but ADS1298_MASK_CH_PGA can yield
values 0-7. If it yields a value >= 7, this causes an out-of-bounds
array access. Add a bounds check and return -EINVAL if the index
is out of range.
Note that the remaining value b111 is reserved and so should not be
seen in a correctly functioning system.
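A sketch of the bounds check applied before indexing a 7-entry table with a 3-bit register field. pga_lookup() is hypothetical and the gain values are illustrative placeholders, not copied from the driver.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Illustrative 7-entry table standing in for ads1298_pga_settings. */
static const int pga_settings[] = { 6, 1, 2, 3, 4, 8, 12 };

/* field comes from a 3-bit mask (0..7), but only 0..6 have an entry. */
static int pga_lookup(unsigned int field, int *gain)
{
	if (field >= ARRAY_SIZE(pga_settings))
		return -EINVAL; /* b111 is reserved */
	*gain = pga_settings[field];
	return 0;
}
```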
Assisted-by: gkh_clanker_2000
Cc: stable <stable@kernel.org>
Cc: Jonathan Cameron <jic23@kernel.org>
Cc: David Lechner <dlechner@baylibre.com>
Cc: "Nuno Sá" <nuno.sa@analog.com>
Cc: Andy Shevchenko <andy@kernel.org>
Signed-off-by: Sam Daly <sam@samdaly.ie>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Sam Daly <sam@samdaly.ie>
Date: Thu May 14 18:23:21 2026 +0200
iio: light: veml6075: add bounds check to veml6075_it_ms index
commit 307dc4240bd41852d9e0912921e298160db1c109 upstream.
veml6075_it_ms has 5 elements but VEML6075_CONF_IT can yield values 0-7.
If it yields a value >= 5, this causes an out-of-bounds array access.
Add a bounds check and return -EINVAL if the index is out of range.
The problem values are reserved, so they should never be read from the
register. Hence this is hardening against a faulty device, misprogramming
or bus corruption.
Assisted-by: gkh_clanker_2000
Cc: stable <stable@kernel.org>
Signed-off-by: Sam Daly <sam@samdaly.ie>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Javier Carrasco <javier.carrasco.cruz@gmail.com>
Signed-off-by: Jonathan Cameron <jic23@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Gabriel Krisman Bertazi <krisman@suse.de>
Date: Wed Jun 17 15:27:22 2026 -0400
io_uring/net: Avoid msghdr on op_connect/op_bind async data
[ Upstream commit 3979840cd858f30f43ea9f4e7f7f1f56de82d698 ]
This fixes a memory leak due to the lack of the cleanup hook for the
iovec. The stable backport differs from upstream by dropping the
io_connect_bpf_populate hunk, which didn't exist at the time and by
fixing the merge conflict due to the introduction of
io_bind_file_create.
Both IORING_OP_CONNECT and IORING_OP_BIND reuse the msghdr object just
to store the sockaddr. Beyond allocating a much larger object than
needed, msghdr can also wrap an iovec, which will be recycled
unnecessarily. This uses the sockaddr directly.
Cc: stable@vger.kernel.org
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Link: https://patch.msgid.link/20260602215327.1885109-2-krisman@suse.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Gil Portnoy <dddhkts1@gmail.com>
Date: Thu Jun 11 22:59:19 2026 +0900
ksmbd: reject non-VALID session in compound request branch
commit 609ca17d869d04ba249e32cdcbf13c0b1c66f43c upstream.
smb2_check_user_session() takes a shortcut for any operation that is not
the first in a COMPOUND request: it reuses work->sess (the session bound by
the first operation) and validates only the SessionId, then returns
"valid". It never re-checks work->sess->state == SMB2_SESSION_VALID, and a
SessionId of 0xFFFFFFFFFFFFFFFF (ULLONG_MAX, the MS-SMB2 related-operation
value) skips even the id comparison. The standalone path
(ksmbd_session_lookup_all() plus the SESSION_SETUP state machine) does
enforce the VALID state; the compound branch bypasses all of it.
A SESSION_SETUP carrying only an NTLM Type-1 (NtLmNegotiate) blob publishes
a fresh SMB2_SESSION_IN_PROGRESS session whose sess->user is still NULL
(->user is assigned later, by ntlm_authenticate()). Used as operation 1 of
a COMPOUND with operation 2 = TREE_CONNECT (related, SessionId=ULLONG_MAX,
\\host\IPC$), the tree-connect then runs on that IN_PROGRESS session and
reaches ksmbd_ipc_tree_connect_request(), which dereferences
user_name(sess->user) with sess->user == NULL (transport_ipc.c:687/701/704)
-> remote NULL-pointer dereference and a kernel Oops that wedges the ksmbd
worker for all clients.
Reject any non-first compound operation that lands on a session which is
not SMB2_SESSION_VALID, mirroring the validity the standalone lookup path
enforces. SESSION_SETUP itself legitimately runs on an IN_PROGRESS session,
but it is never carried as a non-first compound operation, so multi-leg
authentication is unaffected by this check.
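A simplified model of the added rule for non-first compound operations. The enum, struct sess_model, RELATED_SESSION_ID and the error codes are hypothetical; only the logic (match the SessionId unless it is 0xFFFFFFFFFFFFFFFF, then require a VALID session) follows the commit text.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified session model. */
enum sess_state { SESSION_IN_PROGRESS, SESSION_VALID, SESSION_EXPIRED };

struct sess_model {
	uint64_t id;
	enum sess_state state;
};

#define RELATED_SESSION_ID UINT64_MAX /* 0xFFFFFFFFFFFFFFFF */

/* Check for a non-first operation in a compound request. */
static int check_compound_session(const struct sess_model *sess, uint64_t sess_id)
{
	if (!sess)
		return -ENOENT;
	/* Related operations may use the all-ones id; otherwise it must match. */
	if (sess_id != RELATED_SESSION_ID && sess_id != sess->id)
		return -EINVAL;
	/* The added check: the bound session must have finished authentication. */
	if (sess->state != SESSION_VALID)
		return -EINVAL;
	return 0;
}
```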
Fixes: 5005bcb42191 ("ksmbd: validate session id and tree id in the compound request")
Cc: stable@vger.kernel.org
Signed-off-by: Gil Portnoy <dddhkts1@gmail.com>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Sat Jun 27 11:06:50 2026 +0100
Linux 6.18.37
Link: https://lore.kernel.org/r/20260625125645.554579168@linuxfoundation.org
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Brett A C Sheffield <bacs@librecast.net>
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Tested-by: Shuah Khan <skhan@linuxfoundation.org>
Tested-by: Ron Economos <re@w6rz.net>
Tested-by: Miguel Ojeda <ojeda@kernel.org>
Tested-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Ruslan Valiyev <linuxoid@gmail.com>
Date: Tue Mar 17 17:05:44 2026 +0000
media: vidtv: fix NULL pointer dereference in vidtv_mux_push_si
commit 7d8bf3d8f91073f4db347ed3aa6302b56107499c upstream.
syzbot reported a general protection fault in
vidtv_psi_ts_psi_write_into [1].
vidtv_mux_get_pid_ctx() can return NULL, but vidtv_mux_push_si() does
not check for this before dereferencing the returned pointer to access
the continuity counter. This leads to a general protection fault when
accessing a near-NULL address.
The root cause is that vidtv_mux_pid_ctx_init() does not check the
return value of vidtv_mux_create_pid_ctx_once() for PMT section PIDs.
If the allocation fails, the PID context is never created, but init
returns success. The subsequent vidtv_mux_push_si() call then gets
NULL from vidtv_mux_get_pid_ctx() and crashes.
Fix the root cause (add an error check in vidtv_mux_pid_ctx_init() for
PMT PIDs) and also add defensive NULL checks in vidtv_mux_push_si() for
all vidtv_mux_get_pid_ctx() calls.
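The defensive pattern can be sketched as follows. struct pid_ctx, get_pid_ctx() and push_section() are hypothetical, a flat array stands in for the mux's PID-context storage, and the error code is illustrative; only the "check the lookup before touching the continuity counter" step mirrors the fix.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PID context: cc is the 4-bit MPEG-TS continuity counter. */
struct pid_ctx {
	uint16_t pid;
	uint8_t cc;
};

/* Returns NULL when no context exists for pid, e.g. because its
 * allocation failed during init. */
static struct pid_ctx *get_pid_ctx(struct pid_ctx *tbl, size_t n, uint16_t pid)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].pid == pid)
			return &tbl[i];
	return NULL;
}

/* Check the lookup result before dereferencing it. */
static int push_section(struct pid_ctx *tbl, size_t n, uint16_t pid)
{
	struct pid_ctx *ctx = get_pid_ctx(tbl, n, pid);

	if (!ctx)
		return -EINVAL;
	ctx->cc = (ctx->cc + 1) & 0x0f; /* counter wraps after 15 */
	return 0;
}
```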
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
Workqueue: events vidtv_mux_tick
RIP: 0010:vidtv_psi_ts_psi_write_into+0x54a/0xbc0 drivers/media/test-drivers/vidtv/vidtv_psi.c:197
Call Trace:
<TASK>
vidtv_psi_table_header_write_into drivers/media/test-drivers/vidtv/vidtv_psi.c:799 [inline]
vidtv_psi_pmt_write_into+0x3b2/0xa70 drivers/media/test-drivers/vidtv/vidtv_psi.c:1231
vidtv_mux_push_si+0x932/0xe80 drivers/media/test-drivers/vidtv/vidtv_mux.c:196
vidtv_mux_tick+0xe9b/0x1480 drivers/media/test-drivers/vidtv/vidtv_mux.c:408
Fixes: f90cf6079bf67 ("media: vidtv: add a bridge driver")
Cc: stable@vger.kernel.org
Reported-by: syzbot+814c351d094f4f1a1b86@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=814c351d094f4f1a1b86
Signed-off-by: Ruslan Valiyev <linuxoid@gmail.com>
Signed-off-by: Hans Verkuil <hverkuil+cisco@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:12 2026 +0300
mm: add atomic VMA flags and set VM_MAYBE_GUARD as such
commit 568822502383acd57d7cc1c72ee43932c45a9524 upstream.
This patch adds the ability to atomically set VMA flags with only the mmap
read/VMA read lock held.
As this could be hugely problematic for VMA flags in general given that
all other accesses are non-atomic and serialised by the mmap/VMA locks, we
implement this with a strict allow-list - that is, only designated flags
are allowed to do this.
We make VM_MAYBE_GUARD one of these flags.
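A userspace sketch of an allow-listed atomic flag setter using C11 atomics. The flag values, struct vma_model and vma_set_flag_atomic() are hypothetical and do not reproduce the kernel's helpers; the sketch only shows the allow-list idea: refuse any flag outside a fixed mask, and set permitted ones with an atomic read-modify-write so that two holders of only a read lock cannot lose each other's updates.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag bits for the sketch; the kernel's values differ. */
#define SK_VM_READ        (1UL << 0)
#define SK_VM_MAYBE_GUARD (1UL << 1)

/* Only flags in this mask may be set without the write lock. */
#define SK_ATOMIC_ALLOWED (SK_VM_MAYBE_GUARD)

struct vma_model {
	_Atomic unsigned long flags;
};

/* Set an allow-listed flag atomically; refuse anything else. */
static bool vma_set_flag_atomic(struct vma_model *vma, unsigned long flag)
{
	if (flag & ~SK_ATOMIC_ALLOWED)
		return false;
	/* Atomic OR: concurrent setters cannot overwrite each other's bits.
	 * Relaxed ordering, since only the bit update itself matters here. */
	atomic_fetch_or_explicit(&vma->flags, flag, memory_order_relaxed);
	return true;
}
```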
Link: https://lkml.kernel.org/r/97e57abed09f2663077ed7a36fb8206e243171a9.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Wed Jan 14 11:00:06 2026 +0000
mm: do not copy page tables unnecessarily for VM_UFFD_WP
commit 35e247032606f06c2f19d90a6562bc315206b7a7 upstream.
Commit ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make
VM_MAYBE_GUARD one") aggregates flags checks in vma_needs_copy(),
including VM_UFFD_WP.
However in doing so, it incorrectly performed this check against src_vma.
This check was done on the assumption that all relevant flags are copied
upon fork.
However the userfaultfd logic is very innovative in that it implements
custom logic on fork in dup_userfaultfd(), including a rather well hidden
case where lacking UFFD_FEATURE_EVENT_FORK causes VM_UFFD_WP to not be
propagated to the destination VMA.
And indeed, vma_needs_copy(), prior to this patch, did check this property
on dst_vma, not src_vma.
Since all the other relevant flags are copied on fork, we can simply fix
this by checking against dst_vma.
While we're here, we fix a comment against VM_COPY_ON_FORK (noting that it
did indeed already reference dst_vma) to make it abundantly clear that we
must check against the destination VMA.
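The difference between checking src_vma and dst_vma can be modelled in a simplified userspace sketch. It assumes that the copy decision depends only on flags. The real vma_needs_copy() has other conditions, which are omitted here. All SK_* values and sk_* helpers are hypothetical, and sk_fork_vma() stands in for the dup_userfaultfd() behaviour described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define SK_VM_PFNMAP      0x1UL
#define SK_VM_MIXEDMAP    0x2UL
#define SK_VM_UFFD_WP     0x4UL
#define SK_VM_MAYBE_GUARD 0x8UL

#define SK_VM_COPY_ON_FORK \
	(SK_VM_PFNMAP | SK_VM_MIXEDMAP | SK_VM_UFFD_WP | SK_VM_MAYBE_GUARD)

struct sk_vma { unsigned long flags; };

/* Buggy variant: consults the parent's flags. */
static bool sk_needs_copy_src(const struct sk_vma *dst, const struct sk_vma *src)
{
	(void)dst;
	return (src->flags & SK_VM_COPY_ON_FORK) != 0;
}

/* Fixed variant: consults the child's flags, which reflect fork-time filtering. */
static bool sk_needs_copy_dst(const struct sk_vma *dst, const struct sk_vma *src)
{
	(void)src;
	return (dst->flags & SK_VM_COPY_ON_FORK) != 0;
}

/* Model of VM_UFFD_WP being dropped when UFFD_FEATURE_EVENT_FORK is absent. */
static struct sk_vma sk_fork_vma(const struct sk_vma *src, bool event_fork)
{
	struct sk_vma dst = *src;

	if (!event_fork)
		dst.flags &= ~SK_VM_UFFD_WP;
	return dst;
}
```

With a parent that has only VM_UFFD_WP set and no EVENT_FORK feature, the src-based check asks for a page-table copy the child no longer needs, while the dst-based check correctly declines.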
Link: https://lkml.kernel.org/r/20260114110006.1047071-1-lorenzo.stoakes@oracle.com
Fixes: ab04b530e7e8 ("mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reported-by: Chris Mason <clm@meta.com>
Closes: https://lore.kernel.org/all/20260113231257.3002271-1-clm@meta.com/
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Acked-by: Pedro Falcato <pfalcato@suse.de>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:14 2026 +0300
mm: implement sticky VMA flags
commit 64212ba02e66e705cabce188453ba4e61e9d7325 upstream.
It is useful to be able to designate that certain flags are 'sticky', that
is, if two VMAs are merged one with a flag of this nature and one without,
the merged VMA sets this flag.
As a result we ignore these flags for the purposes of determining VMA flag
differences between VMAs being considered for merge.
This patch therefore updates the VMA merge logic to perform this action,
with flags possessing this property being described in the VM_STICKY
bitmap.
Those flags which ought to be ignored for the purposes of VMA merge are
described in the VM_IGNORE_MERGE bitmap, which the VMA merge logic is also
updated to use.
As part of this change we place VM_SOFTDIRTY in VM_IGNORE_MERGE as it
already had this behaviour, alongside VM_STICKY as sticky flags by
implication must not disallow merge.
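The merge rule can be sketched as two bitmask operations. The sketch assumes that flag compatibility is a plain equality test after masking out VM_IGNORE_MERGE, and that the merged VMA takes the target's flags plus any sticky bit from the other side. The SK_* values and sk_* helpers are hypothetical and do not reproduce the kernel's merge code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define SK_VM_WRITE       0x01UL
#define SK_VM_SOFTDIRTY   0x02UL
#define SK_VM_MAYBE_GUARD 0x04UL

#define SK_VM_STICKY       SK_VM_MAYBE_GUARD
#define SK_VM_IGNORE_MERGE (SK_VM_SOFTDIRTY | SK_VM_STICKY)

/* Two VMAs are flag-compatible if they differ only in ignored bits. */
static bool sk_flags_mergeable(unsigned long a, unsigned long b)
{
	return ((a ^ b) & ~SK_VM_IGNORE_MERGE) == 0;
}

/* The merged VMA keeps the target's flags plus any sticky flag from the other VMA. */
static unsigned long sk_merged_flags(unsigned long target, unsigned long other)
{
	return target | (other & SK_VM_STICKY);
}
```

In this model a VMA with VM_MAYBE_GUARD and an otherwise identical VMA without it are mergeable, and the result carries VM_MAYBE_GUARD regardless of which one is the merge target.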
Ultimately it seems that we should make VM_SOFTDIRTY a sticky flag in its
own right, but this change is out of scope for this series.
The only sticky flag designated as such is VM_MAYBE_GUARD, so as a result
of this change, once the VMA flag is set upon guard region installation,
VMAs with guard ranges will now not have their merge behaviour impacted as
a result and can be freely merged with other VMAs without VM_MAYBE_GUARD
set.
Also update the comments for vma_modify_flags() to directly reference
sticky flags now we have established the concept.
We also update the VMA userland tests to account for the changes.
Link: https://lkml.kernel.org/r/22ad5269f7669d62afb42ce0c79bad70b994c58d.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:15 2026 +0300
mm: introduce copy-on-fork VMAs and make VM_MAYBE_GUARD one
commit ab04b530e7e8bd5cf9fb0c1ad20e0deee8f569ec upstream.
Gather all the VMA flags whose presence implies that page tables must be
copied on fork into a single bitmap - VM_COPY_ON_FORK - and use this
rather than specifying individual flags in vma_needs_copy().
We also add VM_MAYBE_GUARD to this list, as it being set on a VMA implies
that there may be metadata contained in the page tables (that is - guard
markers) which will not and cannot be propagated upon fork.
This was already being done manually previously in vma_needs_copy(), but
this makes it very explicit, alongside VM_PFNMAP, VM_MIXEDMAP and
VM_UFFD_WP all of which imply the same.
Note that VM_STICKY flags ought generally to be marked VM_COPY_ON_FORK too,
because a flag being VM_STICKY equally indicates that the VMA contains
metadata that is not propagated by being faulted in, i.e. the VMA
metadata does not fully describe the VMA alone, and thus we must propagate
whatever metadata there is on a fork.
However, for maximum flexibility, we do not make this necessarily the case
here.
Link: https://lkml.kernel.org/r/5d41b24e7bc622cda0af92b6d558d7f4c0d1bc8c.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:11 2026 +0300
mm: introduce VM_MAYBE_GUARD and make visible in /proc/$pid/smaps
commit 5dba5cc2e0ffa76f2f6c8922a04469dc9602c396 upstream.
Patch series "introduce VM_MAYBE_GUARD and make it sticky", v4.
Currently, guard regions are not visible to users except through
/proc/$pid/pagemap, with no explicit visibility at the VMA level.
This makes the feature less useful, as it isn't entirely apparent which
VMAs may have these entries present, especially when performing actions
which walk through memory regions such as those performed by CRIU.
This series addresses this issue by introducing the VM_MAYBE_GUARD flag
which fulfils this role, updating the smaps logic to display an entry for
these.
The semantics of this flag are that a guard region MAY be present if set
(we cannot be sure, as we can't efficiently track whether an
MADV_GUARD_REMOVE finally removes all the guard regions in a VMA) - but if
not set the VMA definitely does NOT have any guard regions present.
It's problematic to establish this flag without further action, because
that means that VMAs with guard regions in them become non-mergeable with
adjacent VMAs for no especially good reason.
To work around this, this series also introduces the concept of 'sticky'
VMA flags - that is flags which:
a. if set in one VMA and not in another still permit those VMAs to be
merged (if otherwise compatible).
b. When they are merged, the resultant VMA must have the flag set.
The VMA logic is updated to propagate these flags correctly.
Additionally, VM_MAYBE_GUARD being an explicit VMA flag allows us to solve
an issue with file-backed guard regions - previously these established an
anon_vma object for file-backed mappings solely to have vma_needs_copy()
correctly propagate guard region mappings to child processes.
We introduce a new flag alias VM_COPY_ON_FORK (which currently only
specifies VM_MAYBE_GUARD) and update vma_needs_copy() to check explicitly
for this flag and to copy page tables if it is present, which resolves
this issue.
Additionally, we add the ability for allow-listed VMA flags to be
atomically writable with only mmap/VMA read locks held.
The only flag we allow so far is VM_MAYBE_GUARD, which we carefully ensure
does not cause any races by being allowed to do so.
This allows us to maintain guard region installation as a read-locked
operation and not endure the overhead of obtaining a write lock here.
Finally we introduce extensive VMA userland tests to assert that the
sticky VMA logic behaves correctly as well as guard region self tests to
assert that smaps visibility is correctly implemented.
This patch (of 9):
Currently, if a user needs to determine if guard regions are present in a
range, they have to scan all VMAs (or have knowledge of which ones might
have guard regions).
Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to
pagemap") and the related commit a516403787e0 ("fs/proc: extend the
PAGEMAP_SCAN ioctl to report guard regions"), users can use either
/proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
operation at a virtual address level.
This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
that guard regions exist in ranges.
This patch remedies the situation by establishing a new VMA flag,
VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
uncertain because we cannot reasonably determine whether a
MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
additionally VMAs may change across merge/split).
We utilise 0x800 for this flag, which makes it available to 32-bit
architectures as well. This bit was previously used by VM_DENYWRITE, which
was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and hasn't
been reused since.
We also update the smaps logic and documentation to identify these VMAs.
Another major use of this functionality is that we can use it to identify
that we ought to copy page tables on fork.
We do not actually implement usage of this flag in mm/madvise.c yet as we
need to allow some VMA flags to be applied atomically under mmap/VMA read
lock in order to avoid the need to acquire a write lock for this purpose.
Link: https://lkml.kernel.org/r/cover.1763460113.git.ljs@kernel.org
Link: https://lkml.kernel.org/r/cf8ef821eba29b6c5b5e138fffe95d6dcabdedb9.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:18 2026 +0300
mm: propagate VM_SOFTDIRTY on merge
commit 6707915e030a3258868355f989b80140c1a45bbe upstream.
Patch series "make VM_SOFTDIRTY a sticky VMA flag", v2.
Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().
However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.
Now we have the concept of making VMA flags 'sticky', that is that they
both don't prevent merge and, importantly, are propagated to merged VMAs,
this seems a sensible alternative to the existing special-casing of
VM_SOFTDIRTY.
We additionally add a self-test that demonstrates that this logic behaves
as expected.
This patch (of 2):
Currently we set VM_SOFTDIRTY when a new mapping is set up (whether by
establishing a new VMA, or via merge) as implemented in __mmap_complete()
and do_brk_flags().
However, when performing a merge of existing mappings such as when
performing mprotect(), we may lose the VM_SOFTDIRTY flag.
This is because currently we simply ignore VM_SOFTDIRTY for the purposes
of merge, so one VMA may possess the flag and another not, and whichever
happens to be the target VMA will be the one upon which the merge is
performed which may or may not have VM_SOFTDIRTY set.
Now we have the concept of 'sticky' VMA flags, let's make VM_SOFTDIRTY one
which solves this issue.
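The loss of VM_SOFTDIRTY, and how making it sticky fixes it, can be shown with a small before/after sketch. It assumes the pre-patch merge result is simply the target VMA's flags, as the paragraph above describes. The SK_* values and sk_merge_* helpers are hypothetical.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SK_VM_WRITE     0x01UL
#define SK_VM_SOFTDIRTY 0x02UL

/* Before: VM_SOFTDIRTY is merely ignored, so the result is whatever the target had. */
static unsigned long sk_merge_before(unsigned long target, unsigned long other)
{
	(void)other;
	return target;
}

/* After: VM_SOFTDIRTY is sticky, so it survives if either side had it. */
#define SK_VM_STICKY SK_VM_SOFTDIRTY
static unsigned long sk_merge_after(unsigned long target, unsigned long other)
{
	return target | (other & SK_VM_STICKY);
}
```

Before the change the outcome depends on which VMA happens to be the merge target. After it, the result is the same either way.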
Additionally update VMA userland tests to propagate changes.
[akpm@linux-foundation.org: update comments, per Lorenzo]
Link: https://lkml.kernel.org/r/0019e0b8-ee1e-4359-b5ee-94225cbe5588@lucifer.local
Link: https://lkml.kernel.org/r/cover.1763399675.git.ljs@kernel.org
Link: https://lkml.kernel.org/r/955478b5170715c895d1ef3b7f68e0cd77f76868.1763399675.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand (Red Hat) <david@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Acked-by: Andrey Vagin <avagin@gmail.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Fixes: 34228d473efe ("mm: ignore VM_SOFTDIRTY on VMA merging")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:16 2026 +0300
mm: set the VM_MAYBE_GUARD flag on guard region install
commit 49e14dabed7a294427588d4b315f57fbfcab9990 upstream.
Now we have established the VM_MAYBE_GUARD flag and added the capacity to
set it atomically, do so upon MADV_GUARD_INSTALL.
The places where this flag is used currently and matter are:
* VMA merge - performed under mmap/VMA write lock, therefore excluding
racing writes.
* /proc/$pid/smaps - can race the write, however this isn't meaningful
as the flag write is performed at the point of the guard region being
established, and thus an smaps reader can't reasonably expect to avoid
races. Due to atomicity, a reader will observe either the flag being
set or not. Therefore consistency will be maintained.
In all other cases the flag being set is irrelevant and atomicity
guarantees other flags will be read correctly.
Note that non-atomic updates of unrelated flags do not cause an issue with
this flag being set atomically, as writes of other flags are performed
under mmap/VMA write lock, and these atomic writes are performed under
mmap/VMA read lock, which excludes the write, avoiding RMW races.
Note that we do not encounter issues with KCSAN by adjusting this flag
atomically, as we are only updating a single bit in the flag bitmap and
therefore we do not need to annotate these changes.
We intentionally set this flag in advance of actually updating the page
tables, to ensure that any racing atomic read of this flag will only
return false prior to page tables being updated, to allow for
serialisation via page table locks.
Note that we set vma->anon_vma for anonymous mappings. This is because
the expectation for anonymous mappings is that an anon_vma is established
should they possess any page table mappings. This is also consistent with
what we were doing prior to this patch (unconditionally setting anon_vma
on guard region installation).
We also need to update retract_page_tables() to ensure that madvise(...,
MADV_COLLAPSE) doesn't incorrectly collapse file-backed ranges containing
guard regions.
This was previously guarded by anon_vma being set to catch MAP_PRIVATE
cases, but the introduction of VM_MAYBE_GUARD necessitates that we check
this flag instead.
We utilise vma_flag_test_atomic() to do so - we first perform an
optimistic check, then after the PTE page table lock is held, we can check
again safely, as upon guard marker install the flag is set atomically
prior to the page table lock being taken to actually apply it.
So if the initial check fails either:
* Page table retraction acquires page table lock prior to VM_MAYBE_GUARD
being set - guard marker installation will be blocked until page table
retraction is complete.
OR:
* Guard marker installation acquires page table lock after setting
VM_MAYBE_GUARD, which raced and didn't pick this up in the initial
optimistic check, blocking page table retraction until the guard regions
are installed - the second VM_MAYBE_GUARD check will prevent page table
retraction.
Either way we're safe.
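The optimistic-check-then-recheck-under-lock pattern can be sketched in userspace. A pthread mutex stands in for the PTE page table lock and an atomic_bool stands in for VM_MAYBE_GUARD. The struct, the field names and the sk_* functions are hypothetical, and the model leaves out everything except the ordering argument: the installer sets the flag before taking the lock, and the retractor rechecks the flag after taking it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: one VMA flag bit and one page table lock. */
struct sk_vma {
	atomic_bool maybe_guard;   /* models VM_MAYBE_GUARD */
	pthread_mutex_t ptl;       /* models the PTE page table lock */
	bool guard_installed;      /* models guard markers in the page table */
	bool retracted;            /* models the page table having been retracted */
};

/* Installer: set the flag first, then take the lock to write the markers. */
static void sk_guard_install(struct sk_vma *vma)
{
	atomic_store(&vma->maybe_guard, true);
	pthread_mutex_lock(&vma->ptl);
	vma->guard_installed = true;
	pthread_mutex_unlock(&vma->ptl);
}

/* Retractor: cheap optimistic check, then a second check once the lock is held. */
static bool sk_try_retract(struct sk_vma *vma)
{
	bool done = false;

	if (atomic_load(&vma->maybe_guard))
		return false;

	pthread_mutex_lock(&vma->ptl);
	/* An installer that set the flag before we took the lock is seen here. */
	if (!atomic_load(&vma->maybe_guard)) {
		vma->retracted = true;
		done = true;
	}
	pthread_mutex_unlock(&vma->ptl);
	return done;
}
```

The example only exercises the two sequential orderings. It is not a proof of the concurrent case, which the text above argues directly.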
We refactor the retraction checks into a single
file_backed_vma_is_retractable(); there doesn't seem to be any reason for
the checks to have been separated as they were before.
Note that VM_MAYBE_GUARD being set atomically remains correct as
vma_needs_copy() is invoked with the mmap and VMA write locks held,
excluding any race with madvise_guard_install().
Link: https://lkml.kernel.org/r/e9e9ce95b6ac17497de7f60fc110c7dd9e489e8d.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:13 2026 +0300
mm: update vma_modify_flags() to handle residual flags, document
commit 9119d6c2095bb20292cb9812dd70d37f17e3bd37 upstream.
The vma_modify_*() family of functions each either perform splits, a merge
or no changes at all in preparation for the requested modification to
occur.
When doing so for a VMA flags change, we currently don't account for any
flags which may remain (for instance, VM_SOFTDIRTY) despite the requested
change in the case that a merge succeeded.
This is made more important by subsequent patches which will introduce the
concept of sticky VMA flags which rely on this behaviour.
This patch fixes this by passing the VMA flags parameter as a pointer,
updating it accordingly on merge, and updating callers to accommodate
this.
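The effect of passing the flags by pointer can be illustrated with a hypothetical userspace analogue. It assumes that on a successful merge the residual flags carried by the merged VMA are ORed back into the caller's requested flags before the caller applies them. sk_modify_flags(), its residual_mask parameter and the SK_* values are invented for illustration and are not the kernel's vma_modify_flags() signature.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define SK_VM_READ      0x01UL
#define SK_VM_WRITE     0x02UL
#define SK_VM_SOFTDIRTY 0x04UL

struct sk_vma { unsigned long flags; };

/*
 * If a merge happened (merge_target != NULL), fold any residual flags the
 * merged VMA already has into the caller's requested flags, then return the
 * VMA the caller should modify.
 */
static struct sk_vma *sk_modify_flags(struct sk_vma *vma, struct sk_vma *merge_target,
				      unsigned long *flags_ptr, unsigned long residual_mask)
{
	if (merge_target) {
		*flags_ptr |= merge_target->flags & residual_mask;
		return merge_target;
	}
	return vma;
}
```

Without the pointer, a caller that blindly writes its requested flags into the merged VMA would drop a residual flag such as VM_SOFTDIRTY.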
Additionally, while we are here, we add kdocs for each of the
vma_modify_*() functions, as the fact that the requested modification is
not performed is confusing so it is useful to make this abundantly clear.
We also update the VMA userland tests to account for this change.
Link: https://lkml.kernel.org/r/23b5b549b0eaefb2922625626e58c2a352f3e93c.1763460113.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Reviewed-by: Pedro Falcato <pfalcato@suse.de>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nico Pache <npache@redhat.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Date: Sat Mar 7 05:58:43 2026 -0500
net: export netif_open for self_test usage
commit 3fdd33697c2be9184668c89ba4f24a5ecbc8ec51 upstream.
dev_open() is already exported, but drivers which use the netdev
instance lock need to use netif_open() instead. netif_close() is
also already exported [1] so this completes the pairing.
This export is required for the following fbnic self tests to
avoid calling ndo_stop() and ndo_open() in favor of the
more appropriate netif_open() and netif_close() that notifies
any listeners that the interface went down to test and is now
coming back up.
Link: https://patch.msgid.link/20250309215851.2003708-1-sdf@fomichev.me [1]
Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com>
Link: https://patch.msgid.link/20260307105847.1438-2-mike.marciniszyn@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Faicker Mo <faicker.mo@gmail.com>
Date: Mon May 11 22:05:51 2026 +0800
net: net_failover: Fix the deadlock in slave register
commit b84c5632c7b31f8910167075a8128cfb9e50fcfe upstream.
There is netdev_lock_ops() before the NETDEV_REGISTER notifier
in register_netdevice(), so use the non-locking functions
in net_failover_slave_register().
Also add lock and unlock ops around the call to failover_slave_register()
in failover_existing_slave_register().
Call Trace:
<TASK>
__schedule+0x30d/0x7a0
schedule+0x27/0x90
schedule_preempt_disabled+0x15/0x30
__mutex_lock.constprop.0+0x538/0x9e0
__mutex_lock_slowpath+0x13/0x20
mutex_lock+0x3b/0x50
dev_set_mtu+0x40/0xe0
net_failover_slave_register+0x24/0x280
failover_slave_register+0x103/0x1b0
failover_event+0x15e/0x210
? dropmon_net_event+0xac/0xe0
notifier_call_chain+0x5e/0xe0
raw_notifier_call_chain+0x16/0x30
call_netdevice_notifiers_info+0x52/0xa0
register_netdevice+0x5f4/0x7c0
register_netdev+0x1e/0x40
_mlx5e_probe+0xe2/0x370 [mlx5_core]
mlx5e_probe+0x59/0x70 [mlx5_core]
? __pfx_mlx5e_probe+0x10/0x10 [mlx5_core]
Fixes: 4c975fd70002 ("net: hold instance lock during NETDEV_REGISTER/UP")
Signed-off-by: Faicker Mo <faicker.mo@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Weiming Shi <bestswngs@gmail.com>
Date: Thu May 14 05:25:12 2026 -0700
net: qualcomm: rmnet: fix endpoint use-after-free in rmnet_dellink()
commit d00c953a8f69921f484b629801766da68f27f658 upstream.
rmnet_dellink() removes the endpoint from the hash table with
hlist_del_init_rcu() and then immediately frees it with kfree(). However,
RCU readers on the receive path (rmnet_rx_handler ->
__rmnet_map_ingress_handler) may still hold a reference to the endpoint and
dereference ep->egress_dev after the memory has been freed. The endpoint is
a kmalloc-32 object, and the stale read at offset 8 corresponds to the
egress_dev pointer.
BUG: unable to handle page fault for address: ffffffffde942eef
Oops: 0002 [#1] SMP NOPTI
CPU: 1 UID: 0 PID: 137 Comm: poc_write Not tainted 7.0.0+ #4 PREEMPTLAZY
RIP: 0010:rmnet_vnd_rx_fixup (rmnet_vnd.c:27)
Call Trace:
<TASK>
__rmnet_map_ingress_handler (rmnet_handlers.c:48 rmnet_handlers.c:101)
rmnet_rx_handler (rmnet_handlers.c:129 rmnet_handlers.c:235)
__netif_receive_skb_core.constprop.0 (net/core/dev.c:6096)
__netif_receive_skb_one_core (net/core/dev.c:6208)
netif_receive_skb (net/core/dev.c:6467)
tun_get_user (drivers/net/tun.c:1955)
tun_chr_write_iter (drivers/net/tun.c:2003)
vfs_write (fs/read_write.c:688)
ksys_write (fs/read_write.c:740)
</TASK>
Add an rcu_head field to struct rmnet_endpoint and replace kfree() with
kfree_rcu() so the endpoint memory remains valid through the RCU grace
period. Also remove the rmnet_vnd_dellink() call and inline only the
nr_rmnet_devs decrement, since rmnet_vnd_dellink() would set
ep->egress_dev to NULL during the grace period, creating a data race
with lockless readers.
Fixes: ceed73a2cf4a ("drivers: net: ethernet: qualcomm: rmnet: Initial implementation")
Reported-by: Xiang Mei <xmei5@asu.edu>
Signed-off-by: Weiming Shi <bestswngs@gmail.com>
Link: https://patch.msgid.link/20260514122511.3083479-2-bestswngs@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Date: Fri Jan 30 20:04:57 2026 +0000
net: stmmac: fix stm32 (and potentially others) resume regression
[ Upstream commit dbbec8c5a79f4c7aa8d07da8c0b5a34d76c50699 ]
Marek reported that suspending stm32 causes the following errors when
the interface is administratively down:
$ echo devices > /sys/power/pm_test
$ echo mem > /sys/power/state
...
ck_ker_eth2stp already disabled
...
ck_ker_eth2stp already unprepared
...
On suspend, stm32 starts the eth2stp clock in its suspend method, and
stops it in the resume method. This is because, when the interface is
down, the blamed commit omits the call to the platform glue ->suspend()
method but still makes the call to the platform glue ->resume() method.
This problem affects all other converted drivers as well - e.g. looking
at the PCIe drivers, pci_save_state() will not be called, but
pci_restore_state() will be. Similar issues affect all other drivers.
Fix this by always calling the ->suspend() method, even when the network
interface is down. This fixes all the conversions to the platform glue
->suspend() and ->resume() methods.
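The imbalance can be modelled with a clock enable counter. The sketch assumes, as the description says for stm32, that glue suspend enables a clock and glue resume disables it, and treats a negative count as the "already disabled" condition. All sk_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical clock model: a negative count means "already disabled". */
struct sk_clk { int enable_count; };

struct sk_priv {
	struct sk_clk clk;
	bool if_running;
};

/* Glue suspend enables the clock; glue resume disables it again. */
static void sk_glue_suspend(struct sk_priv *p) { p->clk.enable_count++; }
static void sk_glue_resume(struct sk_priv *p) { p->clk.enable_count--; }

/* Buggy: glue suspend is skipped when the interface is down. */
static void sk_suspend_buggy(struct sk_priv *p)
{
	if (!p->if_running)
		return;
	sk_glue_suspend(p);
}

/* Fixed: glue suspend is always called, pairing with the unconditional resume. */
static void sk_suspend_fixed(struct sk_priv *p)
{
	sk_glue_suspend(p);
}

/* Resume always calls the glue resume method. */
static void sk_resume(struct sk_priv *p)
{
	sk_glue_resume(p);
}
```

With the interface down, the buggy pairing leaves the counter at -1, while the fixed pairing returns it to 0.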
Link: https://lore.kernel.org/r/20260114081809.12758-1-marex@nabladev.com
Fixes: 07bbbfe7addf ("net: stmmac: add suspend()/resume() platform ops")
Reported-by: Marek Vasut <marex@nabladev.com>
Tested-by: Marek Vasut <marex@nabladev.com>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Link: https://patch.msgid.link/E1vlujh-00000007Hkw-2p6r@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Author: Lord Ulf Henrik Holmberg <henrik.holmberg@defensify.se>
Date: Sat May 9 10:40:11 2026 +0200
RDMA/bnxt_re: zero shared page before exposing to userspace
commit f6b079629becfa977f9c51fe53ad2e6dcc55ef44 upstream.
bnxt_re_alloc_ucontext() allocates uctx->shpg via
__get_free_page(GFP_KERNEL). The buddy allocator does not zero pages
without __GFP_ZERO, so the page contains stale kernel data from
whatever object most recently freed it.
The page is then mapped into userspace via vm_insert_page() under
BNXT_RE_MMAP_SH_PAGE in bnxt_re_mmap(). The driver only ever writes
4 bytes (a u32 AVID) at offset BNXT_RE_AVID_OFFT (0x10) inside
bnxt_re_create_ah(); the remaining 4092 bytes of the page are exposed
to userspace unsanitised, leaking kernel memory contents.
Any user with access to /dev/infiniband/uverbsX on a host with a
bnxt_re device (typically rdma group membership) can read this data
via a single mmap() at pgoff 0 after IB_USER_VERBS_CMD_GET_CONTEXT.
Other shared pages in the same file already use get_zeroed_page()
correctly:
drivers/infiniband/hw/bnxt_re/ib_verbs.c
srq->uctx_srq_page = (void *)get_zeroed_page(GFP_KERNEL);
cq->uctx_cq_page = (void *)get_zeroed_page(GFP_KERNEL);
uctx->shpg is the only outlier. Bring it in line with the existing
convention by switching to get_zeroed_page().
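A userspace analogue shows why zeroing matters: the only byte range the consumer should see as meaningful is the 4-byte AVID slot, and every other byte must be zero. The sketch assumes aligned_alloc() plus memset() as a stand-in for get_zeroed_page(), a 4096-byte page, and the BNXT_RE_AVID_OFFT value of 0x10 given above. The sk_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SK_PAGE_SIZE 4096
#define SK_AVID_OFFT 0x10   /* mirrors BNXT_RE_AVID_OFFT as stated in the text */

/* Userspace stand-in for get_zeroed_page(): one aligned page, cleared. */
static void *sk_get_zeroed_page(void)
{
	void *page = aligned_alloc(SK_PAGE_SIZE, SK_PAGE_SIZE);

	if (page)
		memset(page, 0, SK_PAGE_SIZE);
	return page;
}

/* The only field the driver writes into the shared page, per the text. */
static void sk_write_avid(void *shpg, uint32_t avid)
{
	memcpy((uint8_t *)shpg + SK_AVID_OFFT, &avid, sizeof(avid));
}

/* Count non-zero bytes outside the AVID slot, i.e. potentially leaked data. */
static size_t sk_count_leaked_bytes(const void *shpg)
{
	const uint8_t *p = shpg;
	size_t n = 0;

	for (size_t i = 0; i < SK_PAGE_SIZE; i++) {
		if (i >= SK_AVID_OFFT && i < SK_AVID_OFFT + sizeof(uint32_t))
			continue;
		if (p[i] != 0)
			n++;
	}
	return n;
}
```

Had the page come from an allocator that does not clear memory, sk_count_leaked_bytes() could report stale bytes from a previous user. Zeroing makes the count zero by construction.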
Fixes: 1ac5a4047975 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Signed-off-by: Lord Ulf Henrik Holmberg <henrik.holmberg@defensify.se>
Link: https://patch.msgid.link/20260509084011.11971-1-pomzm67@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: André Draszik <andre.draszik@linaro.org>
Date: Fri Jan 9 08:38:38 2026 +0000
regulator: core: fix locking in regulator_resolve_supply() error path
commit 497330b203d2c59c5ff3fa4c34d14494d7203bc3 upstream.
If late enabling of a supply regulator fails in
regulator_resolve_supply(), the code currently triggers a lockdep
warning:
WARNING: drivers/regulator/core.c:2649 at _regulator_put+0x80/0xa0, CPU#6: kworker/u32:4/596
...
Call trace:
_regulator_put+0x80/0xa0 (P)
regulator_resolve_supply+0x7cc/0xbe0
regulator_register_resolve_supply+0x28/0xb8
as the regulator_list_mutex must be held when calling _regulator_put().
To solve this, simply switch to using regulator_put().
While at it, we should also make sure that no concurrent access happens
to our rdev while we clear out the supply pointer. Add appropriate
locking to ensure that.
While the code in question will be removed altogether in a follow-up
commit, I believe it is still beneficial to have this corrected before
removal for future reference.
Fixes: 36a1f1b6ddc6 ("regulator: core: Fix memory leak in regulator_resolve_supply()")
Fixes: 8e5356a73604 ("regulator: core: Clear the supply pointer if enabling fails")
Signed-off-by: André Draszik <andre.draszik@linaro.org>
Link: https://patch.msgid.link/20260109-regulators-defer-v2-2-1a25dc968e60@linaro.org
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Nazar Kalashnikov <nazarkalashnikov0@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yang Erkun <yangerkun@huawei.com>
Date: Wed May 13 10:42:52 2026 +0800
Revert "NFSD: Defer sub-object cleanup in export put callbacks"
commit 516403d4d85607fdef3ca41d4a56b54e5566fa9a upstream.
This reverts commit 48db892356d6cb80f6942885545de4a6dd8d2a29.
Commit 48db892356d6 ("NFSD: Defer sub-object cleanup in export
put callbacks") moved path_put() and auth_domain_put() out of
svc_export_put() and expkey_put() and behind queue_rcu_work() to
close a claimed use-after-free in e_show() and c_show() against
ex_path and ex_client->name. Discussion in [1] shows neither
the diagnosis nor the remedy survives review.
The downstream teardown of both sub-objects is already RCU-deferred.
auth_domain_put() reaches svcauth_unix_domain_release(), which frees
the unix_domain and its ->name through call_rcu(). path_put()
reaches dentry_free(), which frees the dentry through call_rcu(),
and prepend_path() is already structured to tolerate concurrent
dentry teardown. A reader in cache_seq_start_rcu() therefore
observes both sub-objects through the next grace period regardless
of whether svc_export_put() runs synchronously, so the synchronous
form was never unsafe.
The crash signature in the report cited by commit 48db892356d6
("NFSD: Defer sub-object cleanup in export put callbacks") has a
different root cause: a /proc/net/rpc cache file held open across
network-namespace exit lets cache_destroy_net() free cd->hash_table
while a reader is still walking it. The correct fix pins cd->net for
the open fd's lifetime and does not require any deferral inside
svc_export_put().
Meanwhile, deferring path_put() out of svc_export_put() reintroduces
the regression that commit 69d803c40ede ("nfsd: Revert "nfsd:
release svc_expkey/svc_export with rcu_work"") repaired: after
"exportfs -r" drops the last cache reference, the mount reference
held through ex_path lingers in the workqueue, so a subsequent
umount fails with EBUSY.
Restore the synchronous path_put() and auth_domain_put() in
svc_export_put() and expkey_put() and the call_rcu()/kfree_rcu()
free of the containing structures. The unrelated fix for
ex_uuid/ex_stats from commit 2530766492ec ("nfsd: fix UAF when
access ex_uuid or ex_stats") is preserved.
Link: https://lore.kernel.org/all/10019b42-4589-4f9f-8d5b-d8197db1ce3c@huawei.com/ [1]
Fixes: 48db892356d6 ("NFSD: Defer sub-object cleanup in export put callbacks")
Cc: stable@vger.kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Alexandr Alexandrov <alexandr.alexandrov@oracle.com>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sun May 31 15:41:45 2026 +0200
rose: cancel neighbour timers in rose_neigh_put() before freeing
commit 9b222cb1d23ff210975e9df5ebab7b011acb6fad upstream.
rose_neigh_put() kfree()s the neighbour but never cancels its ftimer and
t0timer. Until now every caller that dropped the final reference first
called rose_remove_neigh(), which deletes those timers. The socket
heartbeat reaping path drops the last reference directly, so a neighbour
could be freed with t0timer still armed -- it re-arms itself in
rose_t0timer_expiry() -- leading to a use-after-free write in
enqueue_timer().
Cancel both timers with timer_delete_sync() (the synchronous variant, to
wait out a concurrently running, self-rearming handler) in the
refcount-zero branch of rose_neigh_put().
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sat May 16 12:10:38 2026 +0200
rose: clear neighbour pointer after rose_neigh_put() in state machines
commit e8eb0c6faa8849ba7769516c1a8c84d9f612acf6 upstream.
After calling rose_neigh_put() in rose_state1_machine() through
rose_state5_machine(), rose->neighbour was left pointing at the
potentially freed neighbour structure. A subsequent timer expiry or
concurrent teardown path could dereference the stale pointer, causing
a use-after-free.
Set rose->neighbour to NULL immediately after each rose_neigh_put()
call in the state machine functions.
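The put-and-clear invariant can be illustrated with a minimal userspace sketch. It is not kernel code: `struct neigh_model`, `neigh_put_model()`, `struct rose_sock_model` and `sock_drop_neigh()` are hypothetical stand-ins for the ROSE neighbour, rose_neigh_put() and the socket's `neighbour` field. The idea is that dropping the reference and clearing the pointer happen together, so any later path sees NULL rather than a stale pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace model of a refcounted neighbour; not kernel code. */
struct neigh_model {
	int refcnt;
};

/* Counts frees so a caller can observe when the last reference went away. */
static int neigh_frees;

static void neigh_put_model(struct neigh_model *n)
{
	assert(n->refcnt > 0);          /* a failure here means a double put */
	if (--n->refcnt == 0) {
		free(n);
		neigh_frees++;
	}
}

/* Hypothetical per-socket state holding one neighbour reference. */
struct rose_sock_model {
	struct neigh_model *neighbour;
};

/* Drop the socket's reference and clear the pointer together, so a later
 * timer expiry or teardown path sees NULL instead of a stale pointer. */
static void sock_drop_neigh(struct rose_sock_model *rs)
{
	if (rs->neighbour) {
		neigh_put_model(rs->neighbour);
		rs->neighbour = NULL;
	}
}
```

Calling sock_drop_neigh() a second time is a no-op, which is the property the later double-put and NULL-guard fixes in this series depend on.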
Fixes: d860d1faa6b2 ("net: rose: convert 'use' field to refcount_t")
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sun May 31 15:41:45 2026 +0200
rose: clear neighbour pointer in rose_kill_by_device()
commit 606e42d195b467480d4d405f8814c48d1651a76a upstream.
rose_kill_by_device() drops the neighbour reference but leaves
rose->neighbour pointing at it, unlike every other rose_neigh_put() site
(see "rose: clear neighbour pointer after rose_neigh_put() in state
machines"). The heartbeat STATE_0 reaping path then puts the same
neighbour a second time, causing a rose_neigh refcount underflow and a
use-after-free.
Set rose->neighbour = NULL after the put, restoring the invariant.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Thu May 28 17:38:18 2026 +0200
rose: disconnect orphaned STATE_2 sockets when device is gone
commit d4f4cf9f09a3f5fafa8f09110a7c1b5d10f2f261 upstream.
When ax25stop brings down ROSE interfaces, sockets in ROSE_STATE_2
(awaiting CLEAR CONFIRM) whose device pointer is already NULL are not
reached by rose_kill_by_device() and wait for T3 (up to 180s) before
self-cleaning via rose_timer_expiry(). This keeps the rose module
usecount at 1, blocking rmmod for the full T3 duration.
In rose_heartbeat_expiry(), detect ROSE_STATE_2 sockets with no device,
cancel T3, release the neighbour reference, and call rose_disconnect()
+ sock_set_flag(SOCK_DESTROY). The next heartbeat tick (<=5s) then
destroys the socket via the existing ROSE_STATE_0/SOCK_DESTROY path,
allowing clean module unload within 10s instead of up to 180s.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sun May 31 15:41:45 2026 +0200
rose: don't free fd-owned sockets when reaping in the heartbeat
commit 56576518920edd7b6c3479477d8d490fe2ebdaaa upstream.
The heartbeat reaps orphaned ROSE sockets after their bound device goes
down. A socket still attached to a struct socket (sk->sk_socket != NULL --
e.g. an incoming connection an fpad client has accepted and kept open) is
owned by that userspace fd: rose_release() frees it on close(). Freeing it
from the heartbeat left the fd dangling, so the eventual close() touched
freed memory -- slab-use-after-free in rose_release().
Reap only sockets with sk->sk_socket == NULL (unaccepted incoming
connections and post-close orphans). For an fd-owned socket whose device
went down, disconnect it and fall through to the switch so close() does
the teardown. Also release the neighbour reference held by orphaned
incoming sockets before tearing them down.
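The reaping rule reduces to a single predicate on whether a userspace fd still owns the socket. A minimal sketch, assuming a hypothetical `struct sock_model` whose `sk_socket` field mirrors the kernel's `sk->sk_socket`; the enum and function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of one heartbeat tick for a socket whose device is gone. */
enum reap_action {
	REAP_DESTROY,      /* the heartbeat frees the socket itself */
	REAP_DISCONNECT,   /* disconnect only; close() on the fd does the teardown */
};

/* Hypothetical stand-in carrying only the field the decision depends on. */
struct sock_model {
	void *sk_socket;   /* non-NULL while a userspace fd owns the socket */
};

static enum reap_action heartbeat_reap_action(const struct sock_model *sk)
{
	/* Only sockets that no fd refers to (unaccepted incoming connections
	 * and post-close orphans) may be freed here. Freeing an fd-owned
	 * socket would leave the later close() touching freed memory. */
	return sk->sk_socket == NULL ? REAP_DESTROY : REAP_DISCONNECT;
}
```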
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Thu May 28 20:20:55 2026 +0200
rose: drop CALL_REQUEST in loopback timer when device is not running
commit cf5567a2652e44866eae8987dff4c1ea507680df upstream.
When ax25stop brings down rose0 while the loopback timer has pending
CALL_REQUEST frames, rose_loopback_timer() calls rose_dev_get() and
finds the device still registered (unregister_netdevice waits for
refs to drop), then calls rose_rx_call_request() which takes a
netdev_hold() for the new socket.
But NETDEV_DOWN fires only once: rose_kill_by_device() already ran
before this timer tick, so the new socket is never cleaned up. The
stuck reference prevents unregister_netdevice from completing, and the
orphan socket's timers eventually fire on freed memory (KASAN
slab-use-after-free in __run_timers).
The kernel clears IFF_UP via dev_close() before sending NETDEV_DOWN,
so checking netif_running() after rose_dev_get() is sufficient: if the
device is no longer running, the CALL_REQUEST is silently dropped and
no socket is created. This closes the race without touching the
module-exit path (which already stops the timer via loopback_stopping).
Tested: unregister_netdevice completes immediately after ax25stop with
active loopback connections; no ref_tracker warnings, no KASAN.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sat May 16 12:09:33 2026 +0200
rose: fix dev_put() leak in rose_loopback_timer()
commit ff91adc54db2b62c7cdf063ff761eceb5adf2215 upstream.
rose_rx_call_request() always consumes or returns the skb but never
releases the device reference obtained from rose_dev_get(). When
rose_rx_call_request() succeeds (returns non-zero), dev_put() was never
called, leaking one reference per loopback CALL_REQUEST.
Move dev_put() outside the conditional so it is called unconditionally
after rose_rx_call_request() in all cases.
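The shape of the fix is that the lookup reference is dropped on every path, independent of the call's return value. A minimal userspace sketch; `struct dev_model`, `dev_hold_m()`, `dev_put_m()`, `rx_call_request_m()` and `loopback_deliver()` are hypothetical stand-ins for the device, rose_dev_get()/dev_put(), rose_rx_call_request() and one loopback delivery.

```c
#include <assert.h>

/* Hypothetical device refcount model; not the kernel's struct net_device. */
struct dev_model {
	int refcnt;
};

static void dev_hold_m(struct dev_model *d) { d->refcnt++; }
static void dev_put_m(struct dev_model *d) { assert(d->refcnt > 0); d->refcnt--; }

/* Hypothetical stand-in for rose_rx_call_request(): non-zero on success. */
static int rx_call_request_m(struct dev_model *d, int accept)
{
	(void)d;
	return accept;
}

/* One loopback CALL_REQUEST delivery after the fix. */
static void loopback_deliver(struct dev_model *d, int accept)
{
	dev_hold_m(d);                              /* like rose_dev_get() */
	int ok = rx_call_request_m(d, accept);
	(void)ok;  /* the result no longer decides whether the reference is dropped */
	dev_put_m(d);                               /* unconditional release */
}
```

After any number of accepted or rejected deliveries, the device count returns to its starting value.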
Also remove the dead check (!rose_loopback_neigh->dev &&
!rose_loopback_neigh->loopback) that immediately precedes it: the
loopback neighbour always has loopback=1 so this condition can never
be true.
Fixes: 0453c6824595 ("net/rose: fix unbound loop in rose_loopback_timer()")
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Thu May 28 19:11:55 2026 +0200
rose: fix netdev double-hold in rose_make_new()
commit b9fb21ceb4f0d043767a1eba60786ec84809033b upstream.
rose_make_new() copies orose->device from the listener socket and calls
netdev_hold(), storing the tracker in rose->dev_tracker. The only
caller, rose_rx_call_request(), then overwrites both make_rose->device
and make_rose->dev_tracker with a fresh netdev_hold() for the actual
incoming-call device.
This orphans the tracker allocated by rose_make_new(): it remains in
the device's refcount_tracker list but no pointer exists to free it
via netdev_put(). The result is one spurious outstanding reference per
accepted CALL_REQUEST, visible at rmmod time as:
ref_tracker: netdev@X has 2/2 users at
rose_rx_call_request+0xba3/0x1d50 [rose]
rose_loopback_timer+0x3eb/0x670 [rose]
The second entry is the orphaned tracker from rose_make_new(); the
first is the correctly-managed socket reference from rose_rx_call_request().
Fix: initialise rose->device to NULL in rose_make_new() and let
rose_rx_call_request() -- the sole caller -- assign the correct device
and take the sole netdev_hold() as it already does.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Tue May 26 15:57:04 2026 +0200
rose: fix netdev double-hold in rose_rx_call_request()
commit c675277c3ba0d2310e0825577d58308c39931e14 upstream.
rose_rx_call_request() used netdev_tracker_alloc() after assigning
make_rose->device, intending to take ownership of the reference passed
by the caller. But every caller -- rose_route_frame() and
rose_loopback_timer() -- already calls dev_put() for its own hold after
the function returns, so the socket ended up with a tracker entry
pointing at a reference that had already been released.
The result was spurious refcount_t warnings ("saturated", "decrement
hit 0") on every incoming CALL_REQUEST, leading to refcount corruption
and eventual silent freeze.
Replace netdev_tracker_alloc() with netdev_hold() so that
rose_rx_call_request() acquires its own independent reference. Each
caller retains its own hold from rose_dev_get() and releases it via
dev_put() as before; socket cleanup releases the socket's separate hold
via netdev_put().
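The resulting ownership model, where the caller and the socket each hold an independent reference, can be sketched in userspace. All names here (`struct dev_model`, `hold()`, `put()`, `make_new()`, `rx_call_request()`, `destroy_socket()`) are hypothetical stand-ins for netdev_hold()/netdev_put()/dev_put(), rose_make_new(), rose_rx_call_request() and socket teardown. The NULL-guarded release in destroy_socket() mirrors the guard described for rose_destroy_socket() elsewhere in this series.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device model; hold()/put() stand in for the kernel's
 * netdev_hold()/netdev_put() and dev_put(). */
struct dev_model { int refcnt; };

static void hold(struct dev_model *d) { d->refcnt++; }
static void put(struct dev_model *d) { assert(d->refcnt > 0); d->refcnt--; }

/* Hypothetical socket owning at most one device reference. */
struct sock_model { struct dev_model *device; };

/* Like rose_make_new() after the fix: the new socket starts with no device. */
static void make_new(struct sock_model *s) { s->device = NULL; }

/* Like rose_rx_call_request() after the fix: take an independent reference
 * for the socket instead of adopting the caller's. */
static void rx_call_request(struct sock_model *s, struct dev_model *d)
{
	s->device = d;
	hold(d);
}

/* Socket teardown releases the socket's own reference, if it has one. */
static void destroy_socket(struct sock_model *s)
{
	if (s->device) {
		put(s->device);
		s->device = NULL;
	}
}
```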
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Tue May 26 15:57:47 2026 +0200
rose: fix notifier unregistered too early in rose_exit()
commit f71a8a1edc14dba746edde38adddd654ba202b4d upstream.
rose_exit() called unregister_netdevice_notifier() before the loop that
calls unregister_netdev() on each ROSE virtual device. As a result,
the NETDEV_DOWN event fired by unregister_netdev() was never delivered
to rose_device_event(), so rose_kill_by_device() never ran.
Every socket whose rose->device pointed at a ROSE device therefore kept
its netdev_tracker entry live until free_netdev() destroyed the
ref_tracker_dir, at which point the kernel reported all of them as
leaked references (165 entries in a typical FPAC setup). Worse, those
sockets retained stale device pointers and live timers that could fire
into freed module text after module unload, causing a silent system
freeze with no kernel panic logged.
Fix by moving unregister_netdevice_notifier() to after the device-
unregistration loop. unregister_netdev() then delivers NETDEV_DOWN
while the notifier is still registered, rose_kill_by_device() runs for
each device, releases all netdev references held by open sockets, and
calls rose_disconnect() which stops the per-socket timers.
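The ordering dependency can be shown with a toy model in which NETDEV_DOWN reaches the handler only while the notifier is registered. This is a simplified sketch, not kernel code: `on_netdev_down()` collapses rose_kill_by_device() into "release every socket's device reference", and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one notifier and a count of sockets still
 * referencing a ROSE device. */
static bool notifier_registered;
static int sockets_holding_dev;

/* Simplified stand-in for rose_device_event() -> rose_kill_by_device(). */
static void on_netdev_down(void)
{
	sockets_holding_dev = 0;
}

/* Stand-in for unregister_netdev(): NETDEV_DOWN reaches only a
 * registered notifier. */
static void unregister_dev(void)
{
	if (notifier_registered)
		on_netdev_down();
}

static void module_exit_buggy(int ndevs)
{
	notifier_registered = false;   /* too early: DOWN events are lost */
	for (int i = 0; i < ndevs; i++)
		unregister_dev();
}

static void module_exit_fixed(int ndevs)
{
	for (int i = 0; i < ndevs; i++)
		unregister_dev();
	notifier_registered = false;   /* unregister the notifier last */
}
```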
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sat May 16 12:10:20 2026 +0200
rose: fix race between loopback timer and module removal
commit 47dd6ec1a77d77895afb00aa2e68373a48289108 upstream.
rose_loopback_clear() called timer_delete() which returns immediately
without waiting for any running callback to complete. If the timer
fired concurrently with module removal, rose_loopback_timer() could
re-arm the timer after timer_delete() returned and then access
rose_loopback_neigh after it was freed.
Two complementary changes close the race:
1. Add a loopback_stopping atomic flag. rose_loopback_timer() checks
it at entry (before acquiring a reference) and again inside the
loop; when set it drains the queue and exits without re-arming the
timer.
2. Switch rose_loopback_clear() to timer_delete_sync() so it blocks
until any in-flight callback has returned before freeing resources.
The smp_mb() between setting the flag and calling timer_delete_sync()
ensures the flag is visible to any callback that is about to run.
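The stop-flag protocol can be modelled single-threaded with C11 atomics. This is a sketch under stated assumptions: `armed` stands in for a pending kernel timer, `queued` for frames on the loopback queue, and `BUDGET` is a hypothetical per-tick limit. The sequentially consistent atomic_store() plays the role the commit assigns to smp_mb(), and the comment in loopback_clear() marks where timer_delete_sync() would wait.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BUDGET 2  /* hypothetical per-tick frame limit */

static atomic_bool loopback_stopping;
static bool armed;      /* models a pending timer */
static int queued;      /* models frames waiting on the loopback queue */
static int delivered;

/* One timer callback: check the flag on entry and inside the loop; once
 * stopping, drain the queue and return without re-arming. */
static void loopback_timer_cb(void)
{
	armed = false;                          /* the timer has fired */
	if (atomic_load(&loopback_stopping)) {
		queued = 0;
		return;
	}
	for (int i = 0; i < BUDGET && queued > 0; i++) {
		if (atomic_load(&loopback_stopping)) {
			queued = 0;
			return;
		}
		queued--;
		delivered++;
	}
	if (queued > 0)
		armed = true;                   /* more work: re-arm */
}

/* Publish the flag first (seq_cst store, analogous to smp_mb()), then
 * cancel; timer_delete_sync() would also wait for a running callback. */
static void loopback_clear(void)
{
	atomic_store(&loopback_stopping, true);
	armed = false;
}
```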
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sat May 16 12:10:55 2026 +0200
rose: guard rose_neigh_put() against NULL in timer expiry
commit 2b67342c6ff899a0b83359517146a5b7b243af97 upstream.
In rose_timer_expiry(), the ROSE_STATE_2 branch calls
rose_neigh_put(rose->neighbour) without first checking whether the
pointer is NULL. After commit 5de7665e0a07 ("net: rose: fix timer
races against user threads") the timer is re-armed when the socket is
owned by a user thread; between the re-arm and the next firing, a
device-down event or concurrent teardown via rose_kill_by_device() can
set rose->neighbour to NULL, leading to a NULL-pointer dereference
inside rose_neigh_put().
Add a NULL check before the put and clear the pointer afterwards.
Fixes: 5de7665e0a07 ("net: rose: fix timer races against user threads")
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Sat May 16 12:10:03 2026 +0200
rose: hold loopback neighbour reference across timer callback
commit d270a7a5793af84555c40dd1eb80f1d497fdf53c upstream.
rose_loopback_timer() dereferences rose_loopback_neigh throughout its
body but holds no reference on it. A concurrent rose_loopback_clear()
followed by rose_add_loopback_neigh() could free and reallocate the
neighbour while the timer body is running, causing a use-after-free.
Take a reference with rose_neigh_hold() at the start of the callback
(bailing out if the pointer is already NULL) and release it with
rose_neigh_put() at the single exit point. The neigh cannot be freed
while the callback holds a reference.
Fixes: d860d1faa6b2 ("net: rose: convert 'use' field to refcount_t")
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Thu May 28 19:38:31 2026 +0200
rose: release netdev ref and destroy orphaned incoming sockets
commit df12be096302d2c947388acc25764456c7f18cc1 upstream.
Two related cleanup gaps left the module unremovable after a loopback
session:
1. rose_destroy_socket() did not release the device reference. When
an unaccepted incoming socket (created by rose_rx_call_request()) is
destroyed via rose_heartbeat_expiry(), it is removed from rose_list
before rose_kill_by_device() can find it, so the netdev_hold() taken
in rose_rx_call_request() was never matched by netdev_put(). Add the
release at the top of rose_destroy_socket() guarded by a NULL check
so that rose_release() and rose_kill_by_device(), which already call
netdev_put() and set device = NULL, are not affected.
2. rose_heartbeat_expiry() STATE_0 cleanup required TCP_LISTEN in
addition to SOCK_DEAD. Unaccepted incoming sockets are
TCP_ESTABLISHED, so the condition was never true and those sockets
lingered forever, holding the module use count above zero and
blocking rmmod. Drop the TCP_LISTEN restriction: any STATE_0 +
SOCK_DEAD socket is orphaned and should be destroyed.
Together with the earlier rose_make_new() double-hold fix, these three
patches allow clean rmmod after loopback sessions.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Bernard Pidoux <bernard.f6bvp@gmail.com>
Date: Wed May 27 14:11:21 2026 +0200
rose: set SOCK_DESTROY in rose_kill_by_device() for prompt cleanup
commit 741a4863ad570889c75f7a8e404567d8f3e46335 upstream.
When rose_kill_by_device() is called (via NETDEV_DOWN on module exit
or interface removal), it calls rose_disconnect() which transitions
sockets to ROSE_STATE_0 and sets SOCK_DEAD. However,
rose_heartbeat_expiry() only calls rose_destroy_socket() at
ROSE_STATE_0 if SOCK_DESTROY is set -- the SOCK_DEAD path is reserved
for TCP_LISTEN sockets. Without SOCK_DESTROY, orphaned sockets in
ROSE_STATE_2 (clearing) loop indefinitely in the heartbeat without
ever being freed, keeping the module use-count elevated and blocking
modprobe -r rose until the T1 timer (up to 200 s) expires.
Set SOCK_DESTROY immediately after rose_disconnect() so the heartbeat
destroys the socket at its next tick (within 5 s), allowing clean
module unload.
Signed-off-by: Bernard Pidoux <bernard.f6bvp@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Xin Long <lucien.xin@gmail.com>
Date: Thu Jun 25 10:43:46 2026 +0300
sctp: disable BH before calling udp_tunnel_xmit_skb()
commit 2cd7e6971fc2787408ceef17906ea152791448cf upstream.
udp_tunnel_xmit_skb() / udp_tunnel6_xmit_skb() are expected to run with
BH disabled. After commit 6f1a9140ecda ("net: add xmit recursion limit to
tunnel xmit functions"), on the path:
udp(6)_tunnel_xmit_skb() -> ip(6)tunnel_xmit()
dev_xmit_recursion_inc()/dec() must stay balanced on the same CPU.
Without local_bh_disable(), the context may move between CPUs, which can
break the inc/dec pairing. This may lead to incorrect recursion level
detection and cause packets to be dropped in ip(6)_tunnel_xmit() or
__dev_queue_xmit().
Fix it by disabling BH around both IPv4 and IPv6 SCTP UDP xmit paths.
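Why a CPU migration breaks the pairing can be seen with a toy model of per-CPU counters. This is a userspace sketch, not kernel code: `xmit_recursion[]` and `tunnel_xmit_model()` are hypothetical stand-ins for the per-CPU recursion state touched by dev_xmit_recursion_inc()/dec(). As the commit message describes, disabling BH keeps the path on one CPU, which corresponds to `cpu_after == cpu_before` here.

```c
#include <assert.h>

#define NR_CPUS_MODEL 2

/* Hypothetical per-CPU recursion counters. */
static int xmit_recursion[NR_CPUS_MODEL];

/* Increment on the CPU where the transmit starts and decrement on the
 * CPU where it finishes. */
static void tunnel_xmit_model(int cpu_before, int cpu_after)
{
	xmit_recursion[cpu_before]++;
	/* ... transmit ... */
	xmit_recursion[cpu_after]--;
}
```

With no migration the counter returns to zero; after a migration, one CPU is left with a stale positive level (which can trip the recursion limit) and the other with a negative one.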
In my testing, after enabling SCTP over UDP:
# ip net exec ha sysctl -w net.sctp.udp_port=9899
# ip net exec ha sysctl -w net.sctp.encap_port=9899
# ip net exec hb sysctl -w net.sctp.udp_port=9899
# ip net exec hb sysctl -w net.sctp.encap_port=9899
# ip net exec ha iperf3 -s
- without this patch:
# ip net exec hb iperf3 -c 192.168.0.1 --sctp
[ 5] 0.00-10.00 sec 37.2 MBytes 31.2 Mbits/sec sender
[ 5] 0.00-10.00 sec 37.1 MBytes 31.1 Mbits/sec receiver
- with this patch:
# ip net exec hb iperf3 -c 192.168.0.1 --sctp
[ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec sender
[ 5] 0.00-10.00 sec 3.14 GBytes 2.69 Gbits/sec receiver
Fixes: 6f1a9140ecda ("net: add xmit recursion limit to tunnel xmit functions")
Fixes: 046c052b475e ("sctp: enable udp tunneling socks")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Link: https://patch.msgid.link/c874a8548221dcd56ff03c65ba75a74e6cf99119.1776017727.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Alexander Martyniuk <alexevgmart@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Date: Thu May 28 22:48:07 2026 +0530
serial: qcom_geni: Fix RX DMA stall when SE_DMA_RX_LEN_IN is zero
commit b93062b6d8a1b2d9bad235cac25558a909819026 upstream.
In qcom_geni_serial_handle_rx_dma(), geni_se_rx_dma_unprep() clears
port->rx_dma_addr before SE_DMA_RX_LEN_IN is read. If the register is zero,
for example when the RX stale counter fires on an idle line, the handler
returns without calling geni_se_rx_dma_prep().
The next RX DMA interrupt then hits the !port->rx_dma_addr guard and
returns immediately, so the RX DMA buffer is never rearmed and later input
is lost.
Keep the handler on the rearm path when rx_in is zero. Warn about the
unexpected zero-length DMA completion, skip received-data handling, and
always call geni_se_rx_dma_prep().
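The corrected control flow can be sketched as a userspace model. `struct port_model`, `dma_unprep()`, `dma_prep()` and `handle_rx_dma()` are hypothetical stand-ins for the port state, geni_se_rx_dma_unprep(), geni_se_rx_dma_prep() and qcom_geni_serial_handle_rx_dma(). The address 0x1000 is arbitrary. The key point is that a zero-length completion warns and skips data handling but still rearms.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of the RX DMA state; rx_dma_addr == 0 means "not armed". */
struct port_model {
	unsigned long rx_dma_addr;
	unsigned int rx_len_in;      /* models SE_DMA_RX_LEN_IN */
	unsigned int bytes_handled;
};

static void dma_unprep(struct port_model *p) { p->rx_dma_addr = 0; }
static void dma_prep(struct port_model *p) { p->rx_dma_addr = 0x1000; }

static void handle_rx_dma(struct port_model *p)
{
	if (!p->rx_dma_addr)
		return;                      /* the guard that caused the stall */
	dma_unprep(p);
	if (p->rx_len_in == 0)
		fprintf(stderr, "unexpected zero-length RX DMA completion\n");
	else
		p->bytes_handled += p->rx_len_in;
	dma_prep(p);                         /* rearm on every path */
}
```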
Fixes: 2aaa43c70778 ("tty: serial: qcom-geni-serial: add support for serial engine DMA")
Cc: stable@vger.kernel.org
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@oss.qualcomm.com>
Signed-off-by: Viken Dadhaniya <viken.dadhaniya@oss.qualcomm.com>
Link: https://patch.msgid.link/20260528-serial-rx-0-byte-fix-v2-1-b4195cfe342f@oss.qualcomm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Lorenzo Stoakes <ljs@kernel.org>
Date: Fri May 15 15:42:19 2026 +0300
testing/selftests/mm: add soft-dirty merge self-test
commit c7ba92bcfea34f6b4afc744c3b65c8f7420fefe0 upstream.
Assert that we correctly merge VMAs containing VM_SOFTDIRTY flags now that
we correctly handle these as sticky.
In order to do so, we have to account for the fact that the pagemap
interface checks soft-dirty PTEs, and additionally that newly merged VMAs
are marked VM_SOFTDIRTY.
We do this by using unfaulted anon VMAs: we establish one and clear
references on it, then establish another, merge the two, and check that
soft-dirty is propagated as expected.
We check that this works correctly with mremap() and mprotect() as sample
cases, because when adjacent newly mapped VMAs merge, existing logic
automatically marks the result soft-dirty. We are therefore exercising
other means of merging VMAs.
Link: https://lkml.kernel.org/r/d5a0f735783fb4f30a604f570ede02ccc5e29be9.1763399675.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes <ljs@kernel.org>
Cc: Andrey Vagin <avagin@gmail.com>
Cc: David Hildenbrand (Red Hat) <david@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ahmed Elaidy <elaidya225@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Yi Yang <yiyang13@huawei.com>
Date: Thu Jun 4 06:07:34 2026 +0000
vc_screen: fix null-ptr-deref in vcs_notifier() during concurrent vcs_write
commit a287620312dc6dcb9a093417a0e589bf30fcf38a upstream.
A KASAN null-ptr-deref was observed in vcs_notifier():
BUG: KASAN: null-ptr-deref in vcs_notifier+0x98/0x130
Read of size 2 at addr ...
The issue is a race condition in vcs_write(). When the console_lock is
temporarily dropped (to copy data from userspace), the vc_data pointer
obtained from vcs_vc() may become stale. After re-acquiring the lock,
vcs_vc() is called again to re-validate the pointer. If the vc has been
deallocated in the meantime, vcs_vc() returns NULL, and the while loop
breaks (with written > 0). However, after the loop, vcs_scr_updated(vc)
is still called with the now-NULL vc pointer, leading to a null pointer
dereference in the notifier chain (vcs_notifier dereferences param->vc).
Fix this by adding a NULL check for vc before calling vcs_scr_updated().
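The fix amounts to guarding the notifier call with the re-validated pointer. A minimal userspace sketch; `struct vc_model`, `lookup_vc()`, `scr_updated()` and `write_tail()` are hypothetical stand-ins for struct vc_data, vcs_vc(), vcs_scr_updated() and the tail of vcs_write(). The `deallocated` flag simulates the console being freed while console_lock was dropped.

```c
#include <assert.h>
#include <stddef.h>

struct vc_model { int num; };

static int notified;

/* Stand-in for vcs_scr_updated(): the notifier dereferences vc. */
static void scr_updated(struct vc_model *vc) { notified = vc->num; }

/* Stand-in for vcs_vc(): NULL once the console has been deallocated. */
static struct vc_model *lookup_vc(struct vc_model *vc, int deallocated)
{
	return deallocated ? NULL : vc;
}

static void write_tail(struct vc_model *orig, int deallocated, int written)
{
	/* Re-validated after re-taking the lock; may now be NULL. */
	struct vc_model *vc = lookup_vc(orig, deallocated);

	if (written && vc)               /* the added NULL check */
		scr_updated(vc);
}
```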
Fixes: 8fb9ea65c9d1 ("vc_screen: reload load of struct vc_data pointer in vcs_write() to avoid UAF")
Cc: stable@vger.kernel.org
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://patch.msgid.link/20260604060734.2914976-1-yiyang13@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Author: Miklos Szeredi <mszeredi@redhat.com>
Date: Thu May 28 10:58:24 2026 +0200
virtiofs: fix UAF on submount umount
commit 06b41351779e9289e8785694ade9042ae85e41ea upstream.
iput() called from fuse_release_end() can Oops if the super block has
already been destroyed. Normally this is prevented by waiting for
num_waiting to go down to zero before commencing with super block shutdown.
This only works, however, for the last submount instance, as the wait
counter is per connection, not per superblock.
Revert to using synchronous release requests for the auto_submounts case,
which is virtiofs only at this time.
Reported-by: Aurélien Bombo <abombo@microsoft.com>
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: Greg Kurz <gkurz@redhat.com>
Closes: https://github.com/kata-containers/kata-containers/issues/12589
Fixes: 26e5c67deb2e ("fuse: fix livelock in synchronous file put from fuseblk workers")
Cc: stable@vger.kernel.org
Reviewed-by: Greg Kurz <gkurz@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>