]> Gentwo Git Trees - linux/.git/log
linux/.git
2 weeks agoext4: make data=journal support large block size
Baokun Li [Fri, 21 Nov 2025 09:06:51 +0000 (17:06 +0800)]
ext4: make data=journal support large block size

Currently, ext4_set_inode_mapping_order() does not set max folio order
for files with the data journalling flag. For files that already have
large folios enabled, ext4_inode_journal_mode() ignores the data
journalling flag once max folio order is set.

This is not because data journalling cannot work with large folios, but
because credit estimates will go through the roof if there are too many
blocks per folio.

Since the real constraint is blocks-per-folio, to support data=journal
under LBS, we now set max folio order to be equal to min folio order for
files with the journalling flag. When LBS is disabled, the max folio order
remains unset as before.

Therefore, before ext4_change_inode_journal_flag() switches the journalling
mode, we call truncate_pagecache() to drop all page cache for that inode,
and filemap_write_and_wait() is called unconditionally.

After that, once the journalling mode has been switched, we can safely
reset the inode mapping order, and the mapping_large_folio_support() check
in ext4_inode_journal_mode() can be removed.

Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-22-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in __ext4_block_zero_page_range()
Zhihao Cheng [Fri, 21 Nov 2025 09:06:50 +0000 (17:06 +0800)]
ext4: support large block size in __ext4_block_zero_page_range()

Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid
negative left shifts after supporting blocksize greater than PAGE_SIZE.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-21-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in mpage_prepare_extent_to_map()
Baokun Li [Fri, 21 Nov 2025 09:06:49 +0000 (17:06 +0800)]
ext4: support large block size in mpage_prepare_extent_to_map()

Use the EXT4_PG_TO_LBLK/EXT4_LBLK_TO_PG macros to complete the conversion
between folio indexes and blocks to avoid negative left/right shifts after
supporting blocksize greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-20-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in mpage_map_and_submit_buffers()
Baokun Li [Fri, 21 Nov 2025 09:06:48 +0000 (17:06 +0800)]
ext4: support large block size in mpage_map_and_submit_buffers()

Use the EXT4_PG_TO_LBLK/EXT4_LBLK_TO_PG macros to complete the conversion
between folio indexes and blocks to avoid negative left/right shifts after
supporting blocksize greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-19-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_block_write_begin()
Baokun Li [Fri, 21 Nov 2025 09:06:47 +0000 (17:06 +0800)]
ext4: support large block size in ext4_block_write_begin()

Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid
negative left shifts after supporting blocksize greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-18-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_mpage_readpages()
Baokun Li [Fri, 21 Nov 2025 09:06:46 +0000 (17:06 +0800)]
ext4: support large block size in ext4_mpage_readpages()

Use the EXT4_PG_TO_LBLK() macro to convert folio indexes to blocks to avoid
negative left shifts after supporting blocksize greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-17-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: rename 'page' references to 'folio' in multi-block allocator
Zhihao Cheng [Fri, 21 Nov 2025 09:06:45 +0000 (17:06 +0800)]
ext4: rename 'page' references to 'folio' in multi-block allocator

The ext4 multi-block allocator now fully supports folio objects. Update
all variable names, function names, and comments to replace legacy 'page'
terminology with 'folio', improving clarity and consistency.

No functional changes.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-16-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: prepare buddy cache inode for BS > PS with large folios
Baokun Li [Fri, 21 Nov 2025 09:06:44 +0000 (17:06 +0800)]
ext4: prepare buddy cache inode for BS > PS with large folios

We use EXT4_BAD_INO for the buddy cache inode number. This inode is not
accessed via __ext4_new_inode() or __ext4_iget(), meaning
ext4_set_inode_mapping_order() is not called to set its folio order range.

However, future block size greater than page size support requires this
inode to support large folios, and the buddy cache code already handles
BS > PS. Therefore, ext4_set_inode_mapping_order() is now explicitly
called for this specific inode to set its folio order range.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-15-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_mb_init_cache()
Baokun Li [Fri, 21 Nov 2025 09:06:43 +0000 (17:06 +0800)]
ext4: support large block size in ext4_mb_init_cache()

Currently, ext4_mb_init_cache() uses blocks_per_page to calculate the
folio index and offset. However, when blocksize is larger than PAGE_SIZE,
blocks_per_page becomes zero, leading to a potential division-by-zero bug.

Since we now have the folio, we know its exact size. This allows us to
convert {blocks, groups}_per_page to {blocks, groups}_per_folio, thus
supporting block sizes greater than page size.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-14-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_mb_get_buddy_page_lock()
Baokun Li [Fri, 21 Nov 2025 09:06:42 +0000 (17:06 +0800)]
ext4: support large block size in ext4_mb_get_buddy_page_lock()

Currently, ext4_mb_get_buddy_page_lock() uses blocks_per_page to calculate
folio index and offset. However, when blocksize is larger than PAGE_SIZE,
blocks_per_page becomes zero, leading to a potential division-by-zero bug.

To support BS > PS, use bytes to compute folio index and offset within
folio to get rid of blocks_per_page.

Also, since ext4_mb_get_buddy_page_lock() already fully supports folio,
rename it to ext4_mb_get_buddy_folio_lock().

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-13-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_mb_load_buddy_gfp()
Baokun Li [Fri, 21 Nov 2025 09:06:41 +0000 (17:06 +0800)]
ext4: support large block size in ext4_mb_load_buddy_gfp()

Currently, ext4_mb_load_buddy_gfp() uses blocks_per_page to calculate the
folio index and offset. However, when blocksize is larger than PAGE_SIZE,
blocks_per_page becomes zero, leading to a potential division-by-zero bug.

To support BS > PS, use bytes to compute folio index and offset within
folio to get rid of blocks_per_page.

Also, if buddy and bitmap land in the same folio, we get that folio’s ref
instead of looking it up again before updating the buddy.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-12-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion
Baokun Li [Fri, 21 Nov 2025 09:06:40 +0000 (17:06 +0800)]
ext4: add EXT4_LBLK_TO_PG and EXT4_PG_TO_LBLK for block/page conversion

As BS > PS support is coming, all block number to page index (and
vice-versa) conversions must now go via bytes. Added EXT4_LBLK_TO_PG()
and EXT4_PG_TO_LBLK() macros to simplify these conversions and handle
both BS <= PS and BS > PS scenarios cleanly.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-11-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion
Baokun Li [Fri, 21 Nov 2025 09:06:39 +0000 (17:06 +0800)]
ext4: add EXT4_LBLK_TO_B macro for logical block to bytes conversion

No functional changes.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-10-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_readdir()
Baokun Li [Fri, 21 Nov 2025 09:06:38 +0000 (17:06 +0800)]
ext4: support large block size in ext4_readdir()

In ext4_readdir(), page_cache_sync_readahead() is used to readahead mapped
physical blocks. With LBS support, this can lead to a negative right shift.

To fix this, the page index is now calculated by first converting the
physical block number (pblk) to a file position (pos) before converting
it to a page index. Also, the correct number of pages to readahead is now
passed.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-9-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: support large block size in ext4_calculate_overhead()
Baokun Li [Fri, 21 Nov 2025 09:06:37 +0000 (17:06 +0800)]
ext4: support large block size in ext4_calculate_overhead()

ext4_calculate_overhead() used a single page for its bitmap buffer, which
worked fine when PAGE_SIZE >= block size. However, with block size greater
than page size (BS > PS) support, the bitmap can exceed a single page.

To address this, we now use kvmalloc() to allocate memory of the filesystem
block size, to properly support BS > PS.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-8-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: introduce s_min_folio_order for future BS > PS support
Baokun Li [Fri, 21 Nov 2025 09:06:36 +0000 (17:06 +0800)]
ext4: introduce s_min_folio_order for future BS > PS support

This commit introduces the s_min_folio_order field to the ext4_sb_info
structure. This field will store the minimum folio order required by the
current filesystem, laying groundwork for future support of block sizes
greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-7-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: enable DIOREAD_NOLOCK by default for BS > PS as well
Baokun Li [Fri, 21 Nov 2025 09:06:35 +0000 (17:06 +0800)]
ext4: enable DIOREAD_NOLOCK by default for BS > PS as well

The dioread_nolock related processes already support large folio, so
dioread_nolock is enabled by default regardless of whether the blocksize
is less than, equal to, or greater than PAGE_SIZE.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-6-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: make ext4_punch_hole() support large block size
Baokun Li [Fri, 21 Nov 2025 09:06:34 +0000 (17:06 +0800)]
ext4: make ext4_punch_hole() support large block size

When preparing for bs > ps support, clean up unnecessary PAGE_SIZE
references in ext4_punch_hole().

Previously, when a hole extended beyond i_size, we aligned the hole end
upwards to PAGE_SIZE to handle partial folio invalidation. Now that
truncate_inode_pages_range() already handles partial folio invalidation
correctly, this alignment is no longer required.

However, to save pointless tail block zeroing, we still keep rounding up
to the block size here.

In addition, as Honza pointed out, when the hole end equals i_size, it
should also be rounded up to the block size. This patch fixes that as well.

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-5-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: remove PAGE_SIZE checks for rec_len conversion
Baokun Li [Fri, 21 Nov 2025 09:06:33 +0000 (17:06 +0800)]
ext4: remove PAGE_SIZE checks for rec_len conversion

Previously, ext4_rec_len_(to|from)_disk only performed complex rec_len
conversions when PAGE_SIZE >= 65536 to reduce complexity.

However, we are soon to support file system block sizes greater than
page size, which makes these conditional checks unnecessary. Thus, these
checks are now removed.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-4-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: remove page offset calculation in ext4_block_truncate_page()
Baokun Li [Fri, 21 Nov 2025 09:06:32 +0000 (17:06 +0800)]
ext4: remove page offset calculation in ext4_block_truncate_page()

For bs <= ps scenarios, calculating the offset within the block is
sufficient. For bs > ps, an initial page offset calculation can lead to
incorrect behavior. Thus this redundant calculation has been removed.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-3-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: remove page offset calculation in ext4_block_zero_page_range()
Zhihao Cheng [Fri, 21 Nov 2025 09:06:31 +0000 (17:06 +0800)]
ext4: remove page offset calculation in ext4_block_zero_page_range()

For bs <= ps scenarios, calculating the offset within the block is
sufficient. For bs > ps, an initial page offset calculation can lead to
incorrect behavior. Thus this redundant calculation has been removed.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Message-ID: <20251121090654.631996-2-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: align max orphan file size with e2fsprogs limit
Baokun Li [Thu, 20 Nov 2025 13:42:33 +0000 (21:42 +0800)]
ext4: align max orphan file size with e2fsprogs limit

Kernel commit 0a6ce20c1564 ("ext4: verify orphan file size is not too big")
limits the maximum supported orphan file size to 8 << 20.

However, in e2fsprogs, the orphan file size is set to 32–512 filesystem
blocks when creating a filesystem.

With 64k block size, formatting an ext4 fs >32G gives an orphan file bigger
than the kernel allows, so mount prints an error and fails:

    EXT4-fs (vdb): orphan file too big: 8650752
    EXT4-fs (vdb): mount failed

To prevent this issue and allow previously created 64KB filesystems to
mount, we updates the maximum allowed orphan file size in the kernel to
512 filesystem blocks.

Fixes: 0a6ce20c1564 ("ext4: verify orphan file size is not too big")
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251120134233.2994147-1-libaokun@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2 weeks agoDocumentation: ext4: Document casefold and encrypt flags
Daniel Tang [Wed, 19 Nov 2025 14:32:13 +0000 (09:32 -0500)]
Documentation: ext4: Document casefold and encrypt flags

Based on ext4(5) and fs/ext4/ext4.h.

For INCOMPAT_ENCRYPT, it's possible to create a new filesystem with that
flag without creating any encrypted inodes. ext4(5) says it adds
"support" but doesn't say whether anything's actually present like
COMPAT_RESIZE_INODE does.

Signed-off-by: Daniel Tang <danielzgtg.opensource@gmail.com>
Message-ID: <4506189.9SDvczpPoe@daniel-desktop3>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agofs/ext4: fix typo in comment
Haodong Tian [Wed, 12 Nov 2025 15:59:16 +0000 (23:59 +0800)]
fs/ext4: fix typo in comment

Correct 'metdata' -> 'metadata' in comment.

Signed-off-by: Haodong Tian <tianhd25@mails.tsinghua.edu.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Message-ID: <20251112155916.3007639-1-tianhd25@mails.tsinghua.edu.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: correct the comments place for EXT4_EXT_MAY_ZEROOUT
Yang Erkun [Wed, 12 Nov 2025 08:45:38 +0000 (16:45 +0800)]
ext4: correct the comments place for EXT4_EXT_MAY_ZEROOUT

Move the comments just before we set EXT4_EXT_MAY_ZEROOUT in
ext4_split_convert_extents.

Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Message-ID: <20251112084538.1658232-4-yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: cleanup for ext4_map_blocks
Yang Erkun [Wed, 12 Nov 2025 08:45:37 +0000 (16:45 +0800)]
ext4: cleanup for ext4_map_blocks

Retval from ext4_map_create_blocks means we really create some blocks,
cannot happened with m_flags without EXT4_MAP_UNWRITTEN and
EXT4_MAP_MAPPED.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Message-ID: <20251112084538.1658232-3-yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: rename EXT4_GET_BLOCKS_PRE_IO
Yang Erkun [Wed, 12 Nov 2025 08:45:36 +0000 (16:45 +0800)]
ext4: rename EXT4_GET_BLOCKS_PRE_IO

This flag has been generalized to split an unwritten extent when we do
dio or dioread_nolock writeback, or to avoid merge new extents which was
created by extents split. Update some related comments too.

Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Yang Erkun <yangerkun@huawei.com>
Message-ID: <20251112084538.1658232-2-yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: improve integrity checking in __mb_check_buddy by enhancing order-0 validation
Yongjian Sun [Thu, 6 Nov 2025 06:06:14 +0000 (14:06 +0800)]
ext4: improve integrity checking in __mb_check_buddy by enhancing order-0 validation

When the MB_CHECK_ASSERT macro is enabled, we found that the
current validation logic in __mb_check_buddy has a gap in
detecting certain invalid buddy states, particularly related
to order-0 (bitmap) bits.

The original logic consists of three steps:
1. Validates higher-order buddies: if a higher-order bit is
set, at most one of the two corresponding lower-order bits
may be free; if a higher-order bit is clear, both lower-order
bits must be allocated (and their bitmap bits must be 0).
2. For any set bit in order-0, ensures all corresponding
higher-order bits are not free.
3. Verifies that all preallocated blocks (pa) in the group
have pa_pstart within bounds and their bitmap bits marked as
allocated.

However, this approach fails to properly validate cases where
order-0 bits are incorrectly cleared (0), allowing some invalid
configurations to pass:

               corrupt            integral

order 3           1                  1
order 2       1       1          1       1
order 1     1   1   1   1      1   1   1   1
order 0    0 0 1 1 1 1 1 1    1 1 1 1 1 1 1 1

Here we get two adjacent free blocks at order-0 with inconsistent
higher-order state, and the right one shows the correct scenario.

The root cause is insufficient validation of order-0 zero bits.
To fix this and improve completeness without significant performance
cost, we refine the logic:

1. Maintain the top-down higher-order validation, but we no longer
check the cases where the higher-order bit is 0, as this case will
be covered in step 2.
2. Enhance order-0 checking by examining pairs of bits:
   - If either bit in a pair is set (1), all corresponding
     higher-order bits must not be free.
   - If both bits are clear (0), then exactly one of the
     corresponding higher-order bits must be free
3. Keep the preallocation (pa) validation unchanged.

This change closes the validation gap, ensuring illegal buddy states
involving order-0 are correctly detected, while removing redundant
checks and maintaining efficiency.

Fixes: c9de560ded61f ("ext4: Add multi block allocator for ext4")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251106060614.631382-3-sunyongjian@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: fix incorrect group number assertion in mb_check_buddy
Yongjian Sun [Thu, 6 Nov 2025 06:06:13 +0000 (14:06 +0800)]
ext4: fix incorrect group number assertion in mb_check_buddy

When the MB_CHECK_ASSERT macro is enabled, an assertion failure can
occur in __mb_check_buddy when checking preallocated blocks (pa) in
a block group:

Assertion failure in mb_free_blocks() : "groupnr == e4b->bd_group"

This happens when a pa at the very end of a block group (e.g.,
pa_pstart=32765, pa_len=3 in a group of 32768 blocks) becomes
exhausted - its pa_pstart is advanced by pa_len to 32768, which
lies in the next block group. If this exhausted pa (with pa_len == 0)
is still in the bb_prealloc_list during the buddy check, the assertion
incorrectly flags it as belonging to the wrong group. A possible
sequence is as follows:

ext4_mb_new_blocks
  ext4_mb_release_context
    pa->pa_pstart += EXT4_C2B(sbi, ac->ac_b_ex.fe_len)
    pa->pa_len -= ac->ac_b_ex.fe_len

                 __mb_check_buddy
                           for each pa in group
                             ext4_get_group_no_and_offset
                             MB_CHECK_ASSERT(groupnr == e4b->bd_group)

To fix this, we modify the check to skip block group validation for
exhausted preallocations (where pa_len == 0). Such entries are in a
transitional state and will be removed from the list soon, so they
should not trigger an assertion. This change prevents the false
positive while maintaining the integrity of the checks for active
allocations.

Fixes: c9de560ded61f ("ext4: Add multi block allocator for ext4")
Signed-off-by: Yongjian Sun <sunyongjian1@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251106060614.631382-2-sunyongjian@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2 weeks agoext4: add i_data_sem protection in ext4_destroy_inline_data_nolock()
Alexey Nepomnyashih [Tue, 4 Nov 2025 09:33:25 +0000 (09:33 +0000)]
ext4: add i_data_sem protection in ext4_destroy_inline_data_nolock()

Fix a race between inline data destruction and block mapping.

The function ext4_destroy_inline_data_nolock() changes the inode data
layout by clearing EXT4_INODE_INLINE_DATA and setting EXT4_INODE_EXTENTS.
At the same time, another thread may execute ext4_map_blocks(), which
tests EXT4_INODE_EXTENTS to decide whether to call ext4_ext_map_blocks()
or ext4_ind_map_blocks().

Without i_data_sem protection, ext4_ind_map_blocks() may receive inode
with EXT4_INODE_EXTENTS flag and triggering assert.

kernel BUG at fs/ext4/indirect.c:546!
EXT4-fs (loop2): unmounting filesystem.
invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
RIP: 0010:ext4_ind_map_blocks.cold+0x2b/0x5a fs/ext4/indirect.c:546

Call Trace:
 <TASK>
 ext4_map_blocks+0xb9b/0x16f0 fs/ext4/inode.c:681
 _ext4_get_block+0x242/0x590 fs/ext4/inode.c:822
 ext4_block_write_begin+0x48b/0x12c0 fs/ext4/inode.c:1124
 ext4_write_begin+0x598/0xef0 fs/ext4/inode.c:1255
 ext4_da_write_begin+0x21e/0x9c0 fs/ext4/inode.c:3000
 generic_perform_write+0x259/0x5d0 mm/filemap.c:3846
 ext4_buffered_write_iter+0x15b/0x470 fs/ext4/file.c:285
 ext4_file_write_iter+0x8e0/0x17f0 fs/ext4/file.c:679
 call_write_iter include/linux/fs.h:2271 [inline]
 do_iter_readv_writev+0x212/0x3c0 fs/read_write.c:735
 do_iter_write+0x186/0x710 fs/read_write.c:861
 vfs_iter_write+0x70/0xa0 fs/read_write.c:902
 iter_file_splice_write+0x73b/0xc90 fs/splice.c:685
 do_splice_from fs/splice.c:763 [inline]
 direct_splice_actor+0x10f/0x170 fs/splice.c:950
 splice_direct_to_actor+0x33a/0xa10 fs/splice.c:896
 do_splice_direct+0x1a9/0x280 fs/splice.c:1002
 do_sendfile+0xb13/0x12c0 fs/read_write.c:1255
 __do_sys_sendfile64 fs/read_write.c:1323 [inline]
 __se_sys_sendfile64 fs/read_write.c:1309 [inline]
 __x64_sys_sendfile64+0x1cf/0x210 fs/read_write.c:1309
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x35/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8

Fixes: c755e251357a ("ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Alexey Nepomnyashih <sdl@nppct.ru>
Message-ID: <20251104093326.697381-1-sdl@nppct.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agoext4: clear i_state_flags when alloc inode
Haibo Chen [Tue, 4 Nov 2025 08:12:24 +0000 (16:12 +0800)]
ext4: clear i_state_flags when alloc inode

i_state_flags used on 32-bit archs, need to clear this flag when
alloc inode.
Find this issue when umount ext4, sometimes track the inode as orphan
accidently, cause ext4 mesg dump.

Fixes: acf943e9768e ("ext4: fix checks for orphan inodes")
Signed-off-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251104-ext4-v1-1-73691a0800f9@nxp.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2 weeks agojbd2: fix the inconsistency between checksum and data in memory for journal sb
Ye Bin [Mon, 3 Nov 2025 01:01:23 +0000 (09:01 +0800)]
jbd2: fix the inconsistency between checksum and data in memory for journal sb

Copying the file system while it is mounted as read-only results in
a mount failure:
[~]# mkfs.ext4 -F /dev/sdc
[~]# mount /dev/sdc -o ro /mnt/test
[~]# dd if=/dev/sdc of=/dev/sda bs=1M
[~]# mount /dev/sda /mnt/test1
[ 1094.849826] JBD2: journal checksum error
[ 1094.850927] EXT4-fs (sda): Could not load journal inode
mount: mount /dev/sda on /mnt/test1 failed: Bad message

The process described above is just an abstracted way I came up with to
reproduce the issue. In the actual scenario, the file system was mounted
read-only and then copied while it was still mounted. It was found that
the mount operation failed. The user intended to verify the data or use
it as a backup, and this action was performed during a version upgrade.
Above issue may happen as follows:
ext4_fill_super
 set_journal_csum_feature_set(sb)
  if (ext4_has_metadata_csum(sb))
   incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3;
  if (test_opt(sb, JOURNAL_CHECKSUM)
   jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat);
    lock_buffer(journal->j_sb_buffer);
    sb->s_feature_incompat  |= cpu_to_be32(incompat);
    //The data in the journal sb was modified, but the checksum was not
      updated, so the data remaining in memory has a mismatch between the
      data and the checksum.
    unlock_buffer(journal->j_sb_buffer);

In this case, the journal sb copied over is in a state where the checksum
and data are inconsistent, so mounting fails.
To solve the above issue, update the checksum in memory after modifying
the journal sb.

Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2 weeks agoext4: check if mount_opts is NUL-terminated in ext4_ioctl_set_tune_sb()
Fedor Pchelkin [Sat, 1 Nov 2025 16:04:29 +0000 (19:04 +0300)]
ext4: check if mount_opts is NUL-terminated in ext4_ioctl_set_tune_sb()

params.mount_opts may come as potentially non-NUL-term string.  Userspace
is expected to pass a NUL-term string.  Add an extra check to ensure this
holds true.  Note that further code utilizes strscpy_pad() so this is just
for proper informing the user of incorrect data being provided.

Found by Linux Verification Center (linuxtesting.org).

Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251101160430.222297-2-pchelkin@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
2 weeks agoext4: fix string copying in parse_apply_sb_mount_options()
Fedor Pchelkin [Sat, 1 Nov 2025 16:04:28 +0000 (19:04 +0300)]
ext4: fix string copying in parse_apply_sb_mount_options()

strscpy_pad() can't be used to copy a non-NUL-term string into a NUL-term
string of possibly bigger size.  Commit 0efc5990bca5 ("string.h: Introduce
memtostr() and memtostr_pad()") provides additional information in that
regard.  So if this happens, the following warning is observed:

strnlen: detected buffer overflow: 65 byte read of buffer size 64
WARNING: CPU: 0 PID: 28655 at lib/string_helpers.c:1032 __fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Modules linked in:
CPU: 0 UID: 0 PID: 28655 Comm: syz-executor.3 Not tainted 6.12.54-syzkaller-00144-g5f0270f1ba00 #0
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:__fortify_report+0x96/0xc0 lib/string_helpers.c:1032
Call Trace:
 <TASK>
 __fortify_panic+0x1f/0x30 lib/string_helpers.c:1039
 strnlen include/linux/fortify-string.h:235 [inline]
 sized_strscpy include/linux/fortify-string.h:309 [inline]
 parse_apply_sb_mount_options fs/ext4/super.c:2504 [inline]
 __ext4_fill_super fs/ext4/super.c:5261 [inline]
 ext4_fill_super+0x3c35/0xad00 fs/ext4/super.c:5706
 get_tree_bdev_flags+0x387/0x620 fs/super.c:1636
 vfs_get_tree+0x93/0x380 fs/super.c:1814
 do_new_mount fs/namespace.c:3553 [inline]
 path_mount+0x6ae/0x1f70 fs/namespace.c:3880
 do_mount fs/namespace.c:3893 [inline]
 __do_sys_mount fs/namespace.c:4103 [inline]
 __se_sys_mount fs/namespace.c:4080 [inline]
 __x64_sys_mount+0x280/0x300 fs/namespace.c:4080
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x64/0x140 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

Since userspace is expected to provide s_mount_opts field to be at most 63
characters long with the ending byte being NUL-term, use a 64-byte buffer
which matches the size of s_mount_opts, so that strscpy_pad() does its job
properly.  Return with error if the user still managed to provide a
non-NUL-term string here.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Fixes: 8ecb790ea8c3 ("ext4: avoid potential buffer over-read in parse_apply_sb_mount_options()")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251101160430.222297-1-pchelkin@ispras.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agojbd2: store more accurate errno in superblock when possible
Wengang Wang [Fri, 31 Oct 2025 21:05:01 +0000 (14:05 -0700)]
jbd2: store more accurate errno in superblock when possible

When jbd2_journal_abort() is called, the provided error code is stored
in the journal superblock. Some existing calls hard-code -EIO even when
the actual failure is not I/O related.

This patch updates those calls to pass more accurate error codes,
allowing the superblock to record the true cause of failure. This helps
improve diagnostics and debugging clarity when analyzing journal aborts.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Message-ID: <20251031210501.7337-1-wen.gang.wang@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2 weeks agojbd2: avoid bug_on in jbd2_journal_get_create_access() when file system corrupted
Ye Bin [Sat, 25 Oct 2025 07:26:57 +0000 (15:26 +0800)]
jbd2: avoid bug_on in jbd2_journal_get_create_access() when file system corrupted

There's issue when file system corrupted:
------------[ cut here ]------------
kernel BUG at fs/jbd2/transaction.c:1289!
Oops: invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next
RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0
RSP: 0018:ffff888117aafa30 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534
RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010
RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028
R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000
R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0
Call Trace:
 <TASK>
 __ext4_journal_get_create_access+0x42/0x170
 ext4_getblk+0x319/0x6f0
 ext4_bread+0x11/0x100
 ext4_append+0x1e6/0x4a0
 ext4_init_new_dir+0x145/0x1d0
 ext4_mkdir+0x326/0x920
 vfs_mkdir+0x45c/0x740
 do_mkdirat+0x234/0x2f0
 __x64_sys_mkdir+0xd6/0x120
 do_syscall_64+0x5f/0xfa0
 entry_SYSCALL_64_after_hwframe+0x76/0x7e

The above issue occurs with us in errors=continue mode when accompanied by
storage failures. There have been many inconsistencies in the file system
data.
In the case of file system data inconsistency, for example, if the block
bitmap of a referenced block is not set, it can lead to the situation where
a block being committed is allocated and used again. As a result, the
following condition will not be satisfied then trigger BUG_ON. Of course,
it is entirely possible to construct a problematic image that can trigger
this BUG_ON through specific operations. In fact, I have constructed such
an image and easily reproduced this issue.
Therefore, J_ASSERT() holds true only under ideal conditions, but it may
not necessarily be satisfied in exceptional scenarios. Using J_ASSERT()
directly in abnormal situations would cause the system to crash, which is
clearly not what we want. So here we directly trigger a JBD abort instead
of immediately invoking BUG_ON.

Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
4 weeks agojbd2: use a weaker annotation in journal handling
Byungchul Park [Fri, 24 Oct 2025 07:39:40 +0000 (16:39 +0900)]
jbd2: use a weaker annotation in journal handling

jbd2 journal handling code doesn't want jbd2_might_wait_for_commit()
to be placed between start_this_handle() and stop_this_handle().  So it
marks the region with rwsem_acquire_read() and rwsem_release().

However, the annotation is too strong for that purpose.  We don't have
to use more than try lock annotation for that.

rwsem_acquire_read() implies:

   1. might be a waiter on contention of the lock.
   2. enter to the critical section of the lock.

All we need in here is to act 2, not 1.  So trylock version of
annotation is sufficient for that purpose.  Now that dept partially
relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a
potential wait and might report a deadlock by the wait.

Replace it with trylock version of annotation.

Signed-off-by: Byungchul Park <byungchul@sk.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Message-ID: <20251024073940.1063-1-byungchul@sk.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 weeks agojbd2: use a per-journal lock_class_key for jbd2_trans_commit_key
Tetsuo Handa [Wed, 22 Oct 2025 11:11:37 +0000 (20:11 +0900)]
jbd2: use a per-journal lock_class_key for jbd2_trans_commit_key

syzbot is reporting possibility of deadlock due to sharing lock_class_key
for jbd2_handle across ext4 and ocfs2. But this is a false positive, for
one disk partition can't have two filesystems at the same time.

Reported-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=6e493c165d26d6fcbf72
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Tested-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <987110fc-5470-457a-a218-d286a09dd82f@I-love.SAKURA.ne.jp>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
4 weeks agoext4: xattr: fix null pointer deref in ext4_raw_inode()
Karina Yankevich [Wed, 22 Oct 2025 09:32:53 +0000 (12:32 +0300)]
ext4: xattr: fix null pointer deref in ext4_raw_inode()

If ext4_get_inode_loc() fails (e.g. if it returns -EFSCORRUPTED),
iloc.bh will remain set to NULL. Since ext4_xattr_inode_dec_ref_all()
lacks error checking, this will lead to a null pointer dereference
in ext4_raw_inode(), called right after ext4_get_inode_loc().

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Fixes: c8e008b60492 ("ext4: ignore xattrs past end")
Cc: stable@kernel.org
Signed-off-by: Karina Yankevich <k.yankevich@omp.ru>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Baokun Li <libaokun1@huawei.com>
Message-ID: <20251022093253.3546296-1-k.yankevich@omp.ru>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
4 weeks agoext4: refresh inline data size before write operations
Deepanshu Kartikey [Mon, 20 Oct 2025 06:09:36 +0000 (11:39 +0530)]
ext4: refresh inline data size before write operations

The cached ei->i_inline_size can become stale between the initial size
check and when ext4_update_inline_data()/ext4_create_inline_data() use
it. Although ext4_get_max_inline_size() reads the correct value at the
time of the check, concurrent xattr operations can modify i_inline_size
before ext4_write_lock_xattr() is acquired.

This causes ext4_update_inline_data() and ext4_create_inline_data() to
work with stale capacity values, leading to a BUG_ON() crash in
ext4_write_inline_data():

  kernel BUG at fs/ext4/inline.c:1331!
  BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);

The race window:
1. ext4_get_max_inline_size() reads i_inline_size = 60 (correct)
2. Size check passes for 50-byte write
3. [Another thread adds xattr, i_inline_size changes to 40]
4. ext4_write_lock_xattr() acquires lock
5. ext4_update_inline_data() uses stale i_inline_size = 60
6. Attempts to write 50 bytes but only 40 bytes actually available
7. BUG_ON() triggers

Fix this by recalculating i_inline_size via ext4_find_inline_data_nolock()
immediately after acquiring xattr_sem. This ensures ext4_update_inline_data()
and ext4_create_inline_data() work with current values that are protected
from concurrent modifications.

This is similar to commit a54c4613dac1 ("ext4: fix race writing to an
inline_data file while its xattrs are changing") which fixed i_inline_off
staleness. This patch addresses the related i_inline_size staleness issue.

Reported-by: syzbot+f3185be57d7e8dda32b8@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?extid=f3185be57d7e8dda32b8
Cc: stable@kernel.org
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Message-ID: <20251020060936.474314-1-kartikey406@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: add two trace points for moving extents
Zhang Yi [Mon, 13 Oct 2025 01:51:28 +0000 (09:51 +0800)]
ext4: add two trace points for moving extents

To facilitate tracking the length, type, and outcome of the move extent,
add a trace point at both the entry and exit of mext_move_extent().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-13-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: add large folios support for moving extents
Zhang Yi [Mon, 13 Oct 2025 01:51:27 +0000 (09:51 +0800)]
ext4: add large folios support for moving extents

Pass the moving extent length into mext_folio_double_lock() so that it
can acquire a higher-order folio if the length exceeds PAGE_SIZE. This
can speed up extent moving when the extent is larger than one page.
Additionally, remove the unnecessary comments from
mext_folio_double_lock().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-12-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: switch to using the new extent movement method
Zhang Yi [Mon, 13 Oct 2025 01:51:26 +0000 (09:51 +0800)]
ext4: switch to using the new extent movement method

Now that we have mext_move_extent(), we can switch to this new interface
and deprecate move_extent_per_page(). First, after acquiring the
i_rwsem, we can directly use ext4_map_blocks() to obtain a contiguous
extent from the original inode as the extent to be moved. It can and
it's safe to get mapping information from the extent status tree without
needing to access the ondisk extent tree, because ext4_move_extent()
will check the sequence cookie under the folio lock. Then, after
populating the mext_data structure, we call ext4_move_extent() to move
the extent. Finally, the length of the extent will be adjusted in
mext.orig_map.m_len and the actual length moved is returned through
m_len.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-11-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: introduce mext_move_extent()
Zhang Yi [Mon, 13 Oct 2025 01:51:25 +0000 (09:51 +0800)]
ext4: introduce mext_move_extent()

When moving extents, the current move_extent_per_page() process can only
move extents of length PAGE_SIZE at a time, which is highly inefficient,
especially when the fragmentation of the file is not particularly
severe, this will result in a large number of unnecessary extent split
and merge operations. Moreover, since the ext4 file system now supports
large folios, using PAGE_SIZE as the processing unit is no longer
practical.

Therefore, introduce a new move extents method, mext_move_extent(). It
moves one extent of the origin inode at a time, but not exceeding the
size of a folio. The parameters for the move are passed through the new
mext_data data structure, which includes the origin inode, donor inode,
the mapping extent of the origin inode to be moved, and the starting
offset of the donor inode.

The move process is similar to move_extent_per_page() and can be
categorized into three types: MEXT_SKIP_EXTENT, MEXT_MOVE_EXTENT, and
MEXT_COPY_DATA. MEXT_SKIP_EXTENT indicates that the corresponding area
of the donor file is a hole, meaning no actual space is allocated, so
the move is skipped. MEXT_MOVE_EXTENT indicates that the corresponding
areas of both the origin and donor files are unwritten, so no data needs
to be copied; only the extents are swapped. MEXT_COPY_DATA indicates
that the corresponding areas of both the origin and donor files contain
data, so data must be copied. The data copying is performed in three
steps: first, the data from the original location is read into the page
cache; then, the extents are swapped, and the page cache is rebuilt to
reflect the index of the physical blocks; finally, the dirty page cache
is marked and written back to ensure that the data is written to disk
before the metadata is persisted.

One important point to note is that the folio lock and i_data_sem are
held only during the moving process. Therefore, before moving an extent,
it is necessary to check whether the sequence cookie of the area to be
moved has changed while holding the folio lock. If a change is detected,
it indicates that concurrent write-back operations may have occurred
during this period, and the type of the extent to be moved can no longer
be considered reliable. For example, it may have changed from unwritten
to written. In such cases, return -ESTALE, and the calling function
should reacquire the move extent of the original file and retry the
movement.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-10-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate()
Zhang Yi [Mon, 13 Oct 2025 01:51:24 +0000 (09:51 +0800)]
ext4: rename mext_page_mkuptodate() to mext_folio_mkuptodate()

mext_page_mkuptodate() no longer works on a single page, so rename it to
mext_folio_mkuptodate().

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-9-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: refactor mext_check_arguments()
Zhang Yi [Mon, 13 Oct 2025 01:51:23 +0000 (09:51 +0800)]
ext4: refactor mext_check_arguments()

When moving extents, mext_check_validity() performs some basic file
system and file checks. However, some essential checks need to be
performed after acquiring the i_rwsem are still scattered in
mext_check_arguments(). Move those checks into mext_check_validity() and
make it executes entirely under the i_rwsem to make the checks clearer.
Furthermore, rename mext_check_arguments() to mext_check_adjust_range(),
as it only performs checks and length adjustments on the move extent
range. Finally, also change the print message for the non-existent file
check to be consistent with other unsupported checks.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-8-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: add mext_check_validity() to do basic check
Zhang Yi [Mon, 13 Oct 2025 01:51:22 +0000 (09:51 +0800)]
ext4: add mext_check_validity() to do basic check

Currently, the basic validation checks during the move extent operation
are scattered across __ext4_ioctl() and ext4_move_extents(), which makes
the code somewhat disorganized. Introduce a new helper,
mext_check_validity(), to handle these checks. This change involves only
code relocation without any logical modifications.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-7-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: use EXT4_B_TO_LBLK() in mext_check_arguments()
Zhang Yi [Mon, 13 Oct 2025 01:51:21 +0000 (09:51 +0800)]
ext4: use EXT4_B_TO_LBLK() in mext_check_arguments()

Switch to using EXT4_B_TO_LBLK() to calculate the EOF position of the
origin and donor inodes, instead of using open-coded calculations.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-6-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: pass out extent seq counter when mapping blocks
Zhang Yi [Mon, 13 Oct 2025 01:51:20 +0000 (09:51 +0800)]
ext4: pass out extent seq counter when mapping blocks

When creating or querying mapping blocks using the ext4_map_blocks() and
ext4_map_{query|create}_blocks() helpers, also pass out the extent
sequence number of the block mapping info through the ext4_map_blocks
structure. This sequence number can later serve as a valid cookie within
iomap infrastructure and the move extents procedure.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-5-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: make ext4_es_lookup_extent() pass out the extent seq counter
Zhang Yi [Mon, 13 Oct 2025 01:51:19 +0000 (09:51 +0800)]
ext4: make ext4_es_lookup_extent() pass out the extent seq counter

When querying extents in the extent status tree, we should hold the
data_sem if we want to obtain the sequence number as a valid cookie
simultaneously. However, currently, ext4_map_blocks() calls
ext4_es_lookup_extent() without holding data_sem. Therefore, we should
acquire i_es_lock instead, which also ensures that the sequence cookie
and the extent remain consistent. Consequently, make
ext4_es_lookup_extent() to pass out the sequence number when necessary.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-4-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: introduce seq counter for the extent status entry
Zhang Yi [Mon, 13 Oct 2025 01:51:18 +0000 (09:51 +0800)]
ext4: introduce seq counter for the extent status entry

In the iomap_write_iter(), the iomap buffered write frame does not hold
any locks between querying the inode extent mapping info and performing
page cache writes. As a result, the extent mapping can be changed due to
concurrent I/O in flight. Similarly, in the iomap_writepage_map(), the
write-back process faces a similar problem: concurrent changes can
invalidate the extent mapping before the I/O is submitted.

Therefore, both of these processes must recheck the mapping info after
acquiring the folio lock. To address this, similar to XFS, we propose
introducing an extent sequence number to serve as a validity cookie for
the extent. After commit 24b7a2331fcd ("ext4: clairfy the rules for
modifying extents"), we can ensure the extent information should always
be processed through the extent status tree, and the extent status tree
is always uptodate under i_rwsem or invalidate_lock or folio lock, so
it's safe to introduce this sequence number. The sequence number will be
increased whenever the extent status tree changes, preparing for the
buffered write iomap conversion.

Besides, this mechanism is also applicable for the moving extents case.
In move_extent_per_page(), it also needs to reacquire data_sem and check
the mapping info again under the folio lock.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-3-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: correct the checking of quota files before moving extents
Zhang Yi [Mon, 13 Oct 2025 01:51:17 +0000 (09:51 +0800)]
ext4: correct the checking of quota files before moving extents

The move extent operation should return -EOPNOTSUPP if any of the inodes
is a quota inode, rather than requiring both to be quota inodes.

Fixes: 02749a4c2082 ("ext4: add ext4_is_quota_file()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-ID: <20251013015128.499308-2-yi.zhang@huaweicloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agofs: ext4: fix uninitialized symbols
Ranganath V N [Sat, 11 Oct 2025 06:38:29 +0000 (12:08 +0530)]
fs: ext4: fix uninitialized symbols

Fix the issue detected by the smatch tool.

fs/ext4/inode.c:3583 ext4_map_blocks_atomic_write_slow() error: uninitialized symbol 'next_pblk'.
fs/ext4/namei.c:1776 ext4_lookup() error: uninitialized symbol 'de'.
fs/ext4/namei.c:1829 ext4_get_parent() error: uninitialized symbol 'de'.
fs/ext4/namei.c:3162 ext4_rmdir() error: uninitialized symbol 'de'.
fs/ext4/namei.c:3242 __ext4_unlink() error: uninitialized symbol 'de'.
fs/ext4/namei.c:3697 ext4_find_delete_entry() error: uninitialized symbol 'de'.

These changes enhance code clarity, address static analysis tool errors.

Signed-off-by: Ranganath V N <vnranganath.20@gmail.com>
Message-ID: <20251011063830.47485-1-vnranganath.20@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
5 weeks agoext4: make error code in __ext4fs_dirhash() consistent.
Julian Sun [Fri, 10 Oct 2025 09:52:57 +0000 (17:52 +0800)]
ext4: make error code in __ext4fs_dirhash() consistent.

Currently __ext4fs_dirhash() returns -1 (-EPERM) if fscrypt doesn't
have encryption key, which may confuse users. Make the error code here
consistent with existing error code.

Signed-off-by: Julian Sun <sunjunchao@bytedance.com>
Message-ID: <20251010095257.3008275-1-sunjunchao@bytedance.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
6 weeks agoLinux 6.18-rc4 v6.18-rc4
Linus Torvalds [Sun, 2 Nov 2025 19:28:02 +0000 (11:28 -0800)]
Linux 6.18-rc4

6 weeks agoMerge tag 'spi-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/brooni...
Linus Torvalds [Sat, 1 Nov 2025 17:50:43 +0000 (10:50 -0700)]
Merge tag 'spi-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fix from Mark Brown:
 "One new device ID for an Intel SoC"

* tag 'spi-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: intel: Add support for Oak Stream SPI serial flash

6 weeks agoMerge tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Nov 2025 17:49:12 +0000 (10:49 -0700)]
Merge tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
 "A simple fix for a missed part of an API conversion in the bd718x7
  driver"

* tag 'regulator-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: bd718x7: Fix voltages scaled by resistor divider

6 weeks agoMerge tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Nov 2025 17:45:39 +0000 (10:45 -0700)]
Merge tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

Pull regmap fixes from Mark Brown:
 "One documentation fix and a fix for a problem with the slimbus regmap
  which was uncovered by some changes in one of the drivers"

* tag 'regmap-fix-v6.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: irq: Correct documentation of wake_invert flag
  regmap: slimbus: fix bus_context pointer in regmap init calls

6 weeks agoMerge tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Nov 2025 17:20:07 +0000 (10:20 -0700)]
Merge tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes from Ingo Molnar:

 - Limit AMD microcode Entrysign sha256 signature checking to
   known CPU generations

 - Disable AMD RDSEED32 on certain Zen5 CPUs that have a
   microcode version before when the microcode-based fix was
   issued for the AMD-SB-7055 erratum

 - Fix FPU AMD XFD state synchronization on signal delivery

 - Fix (work around) a SSE4a-disassembly related build failure
   on X86_NATIVE_CPU=y builds

 - Extend the AMD Zen6 model space with a new range of models

 - Fix <asm/intel-family.h> CPU model comments

 - Fix the CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y build, which
   was unhappy due to missing kCFI type annotations of clear_page()
   variants

* tag 'x86-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
  x86/cpu: Add/fix core comments for {Panther,Nova} Lake
  x86/CPU/AMD: Extend Zen6 model range
  x86/build: Disable SSE4a
  x86/fpu: Ensure XFD state on signal delivery
  x86/CPU/AMD: Add RDSEED fix for Zen5
  x86/microcode/AMD: Limit Entrysign signature checking to known generations

6 weeks agoMerge tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Nov 2025 17:17:40 +0000 (10:17 -0700)]
Merge tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf event fixes from Ingo Molnar:
 "Miscellaneous fixes and CPU model updates:

   - Fix an out-of-bounds access on non-hybrid platforms in the Intel
     PMU DS code, reported by KASAN

   - Add WildcatLake PMU and uncore support: it's identical to the
     PantherLake version"

* tag 'perf-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel/uncore: Add uncore PMU support for Wildcat Lake
  perf/x86/intel: Add PMU support for WildcatLake
  perf/x86/intel: Fix KASAN global-out-of-bounds warning

6 weeks agoMerge tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sat, 1 Nov 2025 17:07:35 +0000 (10:07 -0700)]
Merge tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool fix from Ingo Molnar:
 "Fix objtool warning when faced with raw STAC/CLAC instructions"

* tag 'objtool-urgent-2025-11-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Fix skip_alt_group() for non-alternative STAC/CLAC

6 weeks agoMerge tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Linus Torvalds [Sat, 1 Nov 2025 17:04:35 +0000 (10:04 -0700)]
Merge tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Carlos Maiolino:
 "Just a single bug fix (and documentation for the issue)"

* tag 'xfs-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: document another racy GC case in xfs_zoned_map_extent
  xfs: prevent gc from picking the same zone twice

6 weeks agoMerge tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 1 Nov 2025 17:00:53 +0000 (10:00 -0700)]
Merge tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux

Pull Kbuild fixes from Nathan Chancellor:

 - Formally adopt Kconfig in MAINTAINERS

 - Fix install-extmod-build for more O= paths

 - Align end of .modinfo to fix Authenticode calculation in EDK2

 - Restore dynamic check for '-fsanitize=kernel-memory' in
   CONFIG_HAVE_KMSAN_COMPILER to ensure backend target has support
   for it

 - Initialize locale in menuconfig and nconfig to fix UTF-8 terminals
   that may not support VT100 ACS by default like PuTTY

* tag 'kbuild-fixes-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
  kconfig/nconf: Initialize the default locale at startup
  kconfig/mconf: Initialize the default locale at startup
  KMSAN: Restore dynamic check for '-fsanitize=kernel-memory'
  kbuild: align modinfo section for Secureboot Authenticode EDK2 compat
  kbuild: install-extmod-build: Fix when given dir outside the build dir
  MAINTAINERS: Update Kconfig section

6 weeks agoobjtool: Fix skip_alt_group() for non-alternative STAC/CLAC
Josh Poimboeuf [Wed, 29 Oct 2025 19:54:08 +0000 (12:54 -0700)]
objtool: Fix skip_alt_group() for non-alternative STAC/CLAC

If an insn->alt points to a STAC/CLAC instruction, skip_alt_group()
assumes it's part of an alternative ("alt group") as opposed to some
other kind of "alt" such as an exception fixup.

While that assumption may hold true in the current code base, Linus has
an out-of-tree patch which breaks that assumption by replacing the
STAC/CLAC alternatives with raw STAC/CLAC instructions.

Make skip_alt_group() more robust by making sure it's actually an alt
group before continuing.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: 2d12c6fb7875 ("objtool: Remove ANNOTATE_IGNORE_ALTERNATIVE from CLAC/STAC")
Closes: https://lore.kernel.org/CAHk-=wi6goUT36sR8GE47_P-aVrd5g38=VTRHpktWARbyE-0ow@mail.gmail.com
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/3d22415f7b8e06a64e0873b21f48389290eeaa49.1761767616.git.jpoimboe@kernel.org
6 weeks agokconfig/nconf: Initialize the default locale at startup
Jakub Horký [Tue, 14 Oct 2025 14:44:06 +0000 (16:44 +0200)]
kconfig/nconf: Initialize the default locale at startup

Fix bug where make nconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.

Signed-off-by: Jakub Horký <jakub.git@horky.net>
Link: https://patch.msgid.link/20251014144405.3975275-2-jakub.git@horky.net
[nathan: Alphabetize locale.h include]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
6 weeks agokconfig/mconf: Initialize the default locale at startup
Jakub Horký [Tue, 14 Oct 2025 15:49:32 +0000 (17:49 +0200)]
kconfig/mconf: Initialize the default locale at startup

Fix bug where make menuconfig doesn't initialize the default locale, which
causes ncurses menu borders to be displayed incorrectly (lqqqqk) in
UTF-8 terminals that don't support VT100 ACS by default, such as PuTTY.

Signed-off-by: Jakub Horký <jakub.git@horky.net>
Link: https://patch.msgid.link/20251014154933.3990990-1-jakub.git@horky.net
[nathan: Alphabetize locale.h include]
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
6 weeks agoMerge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Linus Torvalds [Sat, 1 Nov 2025 01:22:26 +0000 (18:22 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

 - Mark migrate_disable/enable() as always_inline to avoid issues with
   partial inlining (Yonghong Song)

 - Fix powerpc stack register definition in libbpf bpf_tracing.h (Andrii
   Nakryiko)

 - Reject negative head_room in __bpf_skb_change_head (Daniel Borkmann)

 - Conditionally include dynptr copy kfuncs (Malin Jonsson)

 - Sync pending IRQ work before freeing BPF ring buffer (Noorain Eqbal)

 - Do not audit capability check in x86 do_jit() (Ondrej Mosnacek)

 - Fix arm64 JIT of BPF_ST insn when it writes into arena memory
   (Puranjay Mohan)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf/arm64: Fix BPF_ST into arena memory
  bpf: Make migrate_disable always inline to avoid partial inlining
  bpf: Reject negative head_room in __bpf_skb_change_head
  bpf: Conditionally include dynptr copy kfuncs
  libbpf: Fix powerpc's stack register definition in bpf_tracing.h
  bpf: Do not audit capability check in do_jit()
  bpf: Sync pending IRQ work before freeing ring buffer

6 weeks agox86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols
Nathan Chancellor [Mon, 13 Oct 2025 21:27:36 +0000 (14:27 -0700)]
x86/mm: Ensure clear_page() variants always have __kcfi_typeid_ symbols

When building with CONFIG_CFI=y and CONFIG_LTO_CLANG_FULL=y, there is a series
of errors from the various versions of clear_page() not having __kcfi_typeid_
symbols.

  $ cat kernel/configs/repro.config
  CONFIG_CFI=y
  # CONFIG_LTO_NONE is not set
  CONFIG_LTO_CLANG_FULL=y

  $ make -skj"$(nproc)" ARCH=x86_64 LLVM=1 clean defconfig repro.config bzImage
  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_rep
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_rep)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_orig
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_orig)

  ld.lld: error: undefined symbol: __kcfi_typeid_clear_page_erms
  >>> referenced by ld-temp.o
  >>>               vmlinux.o:(__cfi_clear_page_erms)

With full LTO, it is possible for LLVM to realize that these functions never
have their address taken (as they are only used within an alternative, which
will make them a direct call) across the whole kernel and either drop or skip
generating their kCFI type identification symbols.

clear_page_{rep,orig,erms}() are defined in clear_page_64.S with
SYM_TYPED_FUNC_START as a result of

  2981557cb040 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI"),

as exported functions are free to be called indirectly thus need kCFI type
identifiers.

Use KCFI_REFERENCE with these clear_page() functions to force LLVM to see
these functions as address-taken and generate then keep the kCFI type
identifiers.

Fixes: 2981557cb040 ("x86,kcfi: Fix EXPORT_SYMBOL vs kCFI")
Closes: https://github.com/ClangBuiltLinux/linux/issues/2128
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://patch.msgid.link/20251013-x86-fix-clear_page-cfi-full-lto-errors-v1-1-d69534c0be61@kernel.org
6 weeks agoMerge tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel
Linus Torvalds [Fri, 31 Oct 2025 21:47:02 +0000 (14:47 -0700)]
Merge tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Simona Vetter:
 "Looks like stochastics conspired to make this one a bit bigger, but
  nothing scary at all. Also first examples of the new Link: tags, yay!

  Next week Dave should be back.

  Drivers:
   - mediatek: uaf in unbind, fixes -rc2 boot regression
   - radeon: devm conversion fixes
   - amdgpu: VPE idle handler, re-enable DM idle optimization, DCN3,
     SMU, vblank, HDP eDP, powerplay fixes for fiji/iceland
   - msm: bunch of gem error path fixes, gmu fw parsing fix, dpu fixes
   - intel: fix dmc/dc6 asserts on ADL-S
   - xe: fix xe_validation_guard(), wake device handling around gt reset
   - ast: fix display output on AST2300
   - etnaviv: fix gpu flush
   - imx: fix parallel bridge handling
   - nouveau: scheduler locking fix
   - panel: fixes for kingdisplay-kd097d04 and sitronix-st7789v

  Core Changes:
   - CI: disable broken sanity job
   - sysfb: fix NULL pointer access
   - sched: fix SIGKILL handling, locking for race condition
   - dma_fence: better timeline name for signalled fences"

* tag 'drm-fixes-2025-10-31' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
  drm/ast: Clear preserved bits from register output value
  drm/imx: parallel-display: add the bridge before attaching it
  drm/imx: parallel-display: convert to devm_drm_bridge_alloc() API
  drm/panel: kingdisplay-kd097d04: Disable EoTp
  drm/panel: sitronix-st7789v: fix sync flags for t28cp45tn89
  drm/xe: Do not wake device during a GT reset
  drm/xe: Fix uninitialized return value from xe_validation_guard()
  drm/msm/dpu: Fix adjusted mode clock check for 3d merge
  drm/msm/dpu: Disable broken YUV on QSEED2 hardware
  drm/msm/dpu: Require linear modifier for writeback framebuffers
  drm/msm/dpu: Fix pixel extension sub-sampling
  drm/msm/dpu: Disable scaling for unsupported scaler types
  drm/msm/dpu: Propagate error from dpu_assign_plane_resources
  drm/msm/dpu: Fix allocation of RGB SSPPs without scaling
  drm/msm: dsi: fix PLL init in bonded mode
  drm/i915/dmc: Clear HRR EVT_CTL/HTP to zero on ADL-S
  drm/amd/display: Fix incorrect return of vblank enable on unconfigured crtc
  drm/amd/display: Add HDR workaround for a specific eDP
  drm/amdgpu: fix SPDX header on cyan_skillfish_reg_init.c
  drm/amdgpu: fix SPDX header on irqsrcs_vcn_5_0.h
  ...

6 weeks agoMerge tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Linus Torvalds [Fri, 31 Oct 2025 21:24:32 +0000 (14:24 -0700)]
Merge tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

 - Restore custom qcom ASPM enablement code so L1 PM Substates are
   enabled as they were in v6.17 even though the PCI core now enables
   just L0s and L1 by default (Bjorn Helgaas)

 - Size prefetchable bridge windows only when they actually exist, to
   avoid a WARN_ON() regression (Ilpo Järvinen)

* tag 'pci-v6.18-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: Do not size non-existing prefetchable window
  Revert "PCI: qcom: Remove custom ASPM enablement code"

6 weeks agoMerge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio
Linus Torvalds [Fri, 31 Oct 2025 21:20:09 +0000 (14:20 -0700)]
Merge tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio

Pull VFIO fixes from Alex Williamson:

 - Fix overflows in vfio type1 backend for mappings at the end of the
   64-bit address space, resulting in leaked pinned memory.

   New selftest support included to avoid such issues in the future
   (Alex Mastro)

* tag 'vfio-v6.18-rc4' of https://github.com/awilliam/linux-vfio:
  vfio: selftests: add end of address space DMA map/unmap tests
  vfio: selftests: update DMA map/unmap helpers to support more test kinds
  vfio/type1: handle DMA map/unmap up to the addressable limit
  vfio/type1: move iova increment to unmap_unpin_*() caller
  vfio/type1: sanitize for overflow using check_*_overflow()

6 weeks agoPCI: Do not size non-existing prefetchable window
Ilpo Järvinen [Mon, 27 Oct 2025 13:24:23 +0000 (15:24 +0200)]
PCI: Do not size non-existing prefetchable window

pbus_size_mem() should only be called for bridge windows that exist but
__pci_bus_size_bridges() may point 'pref' to a resource that does not exist
(has zero flags) in case of non-root buses.

When prefetchable bridge window does not exist, the same non-prefetchable
bridge window is sized more than once which may result in duplicating
entries into the realloc_head list. Duplicated entries are shown in this
log and trigger a WARN_ON() because realloc_head had residual entries after
the resource assignment algorithm:

  pci 0000:00:03.0: [11ab:6820] type 01 class 0x060400 PCIe Root Port
  pci 0000:00:03.0: PCI bridge to [bus 00]
  pci 0000:00:03.0:   bridge window [io  0x0000-0x0fff]
  pci 0000:00:03.0:   bridge window [mem 0x00000000-0x000fffff]
  pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
  pci 0000:00:03.0: bridge window [mem 0x00200000-0x003fffff] to [bus 02] add_size 200000 add_align 200000
  pci 0000:00:03.0: bridge window [mem 0xe0000000-0xe03fffff]: assigned
  pci 0000:00:03.0: PCI bridge to [bus 02]
  pci 0000:00:03.0:   bridge window [mem 0xe0000000-0xe03fffff]
  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 1 at drivers/pci/setup-bus.c:2373 pci_assign_unassigned_root_bus_resources+0x1bc/0x234

Check resource flags of 'pref' and only size the prefetchable window if the
resource has the IORESOURCE_PREFETCH flag.

Fixes: ae88d0b9c57f ("PCI: Use pbus_select_window_for_type() during mem window sizing")
Reported-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Closes: https://lore.kernel.org/r/51e8cf1c62b8318882257d6b5a9de7fdaaecc343.camel@gmail.com/
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Klaus Kudielka <klaus.kudielka@gmail.com>
Link: https://patch.msgid.link/20251027132423.8841-1-ilpo.jarvinen@linux.intel.com
6 weeks agoRevert "PCI: qcom: Remove custom ASPM enablement code"
Bjorn Helgaas [Fri, 24 Oct 2025 19:41:40 +0000 (14:41 -0500)]
Revert "PCI: qcom: Remove custom ASPM enablement code"

This reverts commit a729c16646198872e345bf6c48dbe540ad8a9753.

Prior to a729c1664619 ("PCI: qcom: Remove custom ASPM enablement code"),
the qcom controller driver enabled ASPM, including L0s, L1, and L1 PM
Substates, for all devices powered on at the time the controller driver
enumerates them.

ASPM was *not* enabled for devices powered on later by pwrctrl (unless the
kernel was built with PCIEASPM_POWERSAVE or PCIEASPM_POWER_SUPERSAVE, or
the user enabled ASPM via module parameter or sysfs).

After f3ac2ff14834 ("PCI/ASPM: Enable all ClockPM and ASPM states for
devicetree platforms"), the PCI core enabled all ASPM states for all
devices whether powered on initially or by pwrctrl, so a729c1664619 was
unnecessary and reverted.

But f3ac2ff14834 was too aggressive and broke platforms that didn't support
CLKREQ# or required device-specific configuration for L1 Substates, so
df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
enabled only L0s and L1.

On Qualcomm platforms, this left L1 Substates disabled, which was a
regression.  Revert a729c1664619 so L1 Substates will be enabled on devices
that are initially powered on.  Devices powered on by pwrctrl will be
addressed later.

Fixes: df5192d9bb0e ("PCI/ASPM: Enable only L0s and L1 for devicetree platforms")
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/lkml/aPuXZlaawFmmsLmX@hovoldconsulting.com/
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Johan Hovold <johan@kernel.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251024210514.1365996-1-helgaas@kernel.org
6 weeks agoMerge tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Fri, 31 Oct 2025 19:57:19 +0000 (12:57 -0700)]
Merge tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

 - Fix blk-crypto reporting EIO when EINVAL is the correct error code

 - Two bug fixes for the block zone support

 - NVME pull request via Keith:
      - Target side authentication fixup
      - Peer-to-peer metadata fixup

 - null_blk DMA alignment fix

* tag 'block-6.18-20251031' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  null_blk: set dma alignment to logical block size
  blk-crypto: use BLK_STS_INVAL for alignment errors
  block: make REQ_OP_ZONE_OPEN a write operation
  block: fix op_is_zone_mgmt() to handle REQ_OP_ZONE_RESET_ALL
  nvme-pci: use blk_map_iter for p2p metadata
  nvmet-auth: update sc_c in host response

6 weeks agoMerge tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Linus Torvalds [Fri, 31 Oct 2025 19:50:35 +0000 (12:50 -0700)]
Merge tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Heiko Carstens:

 - Use correct locking in zPCI event code to avoid deadlock

 - Get rid of irqs_registered flag in zpci_dev structure and restore IRQ
   unconditionally for zPCI devices. This fixes sit uations where the
   flag was not correctly updated

 - Fix potential memory leak kernel page table dumper code

 - Disable (revert) ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP for s390 again.

   The optimized hugetlb vmemmap code modifies kernel page tables in a
   way which does not work on s390 and leads to reproducible kernel
   crashes due to stale TLB entries. This needs to be addressed with
   some larger changes. For now simply disable the feature

 - Update defconfigs

* tag 's390-6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
  s390/mm: Fix memory leak in add_marker() when kvrealloc() fails
  s390/pci: Restore IRQ unconditionally for the zPCI device
  s390: Update defconfigs
  s390/pci: Avoid deadlock between PCI error recovery and mlx5 crdump

6 weeks agobpf/arm64: Fix BPF_ST into arena memory
Puranjay Mohan [Thu, 30 Oct 2025 12:17:14 +0000 (12:17 +0000)]
bpf/arm64: Fix BPF_ST into arena memory

The arm64 JIT supports BPF_ST with BPF_PROBE_MEM32 (arena) by using the
tmp2 register to hold the dst + arena_vm_base value and using tmp2 as the
new dst register. But this is broken because in case is_lsi_offset()
returns false the tmp2 will be clobbered by emit_a64_mov_i(1, tmp2, off,
ctx); and hence the emitted store instruction will be of the form:
strb    w10, [x11, x11]
Fix this by using the third temporary register to hold the dst +
arena_vm_base.

Fixes: 339af577ec05 ("bpf: Add arm64 JIT support for PROBE_MEM32 pseudo instructions.")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251030121715.55214-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 weeks agobpf: Make migrate_disable always inline to avoid partial inlining
Yonghong Song [Wed, 29 Oct 2025 18:36:46 +0000 (11:36 -0700)]
bpf: Make migrate_disable always inline to avoid partial inlining

The build fails with llvm 21/22:

  $ make LLVM=1 -j
    ...
    LD      vmlinux.o
    GEN     .vmlinux.objs
    ...
    BTF     .tmp_vmlinux1.btf.o
    ...
    AS      .tmp_vmlinux2.kallsyms.o
    LD      vmlinux.unstripped
    BTFIDS  vmlinux.unstripped
  WARN: resolve_btfids: unresolved symbol migrate_enable
  WARN: resolve_btfids: unresolved symbol migrate_disable
  make[2]: *** [vmlinux.unstripped] Error 255
  make[2]: *** Deleting file 'vmlinux.unstripped'
  make[1]: *** [Makefile:1242: vmlinux] Error 2
  make: *** [Makefile:248: __sub-make] Error 2

Two functions with identical names but different addresses are
considered ambiguous and removed by "pahole" from vmlinux BTF.
Later resolve_btfids warns since it cannot find them.

Commit 378b7708194f ("sched: Make migrate_{en,dis}able() inline") made
them inlineable in most places, but in vmlinux built with llvm 21 and 22
there are four symbols for migrate_{enable,disable}:
three static functions and one global function.

Fix the issue by marking migrate_{enable,disable} as always inline.
The alternative is to mark them as notrace/nokprobe which is more
drastic. Only bpf programs are prevented from attaching to these
functions. The rest of the tracing shouldn't be affected.

[note: Peter ok-ed the patch, Alexei rewrote commit log]

Fixes: 378b7708194f ("sched: Make migrate_{en,dis}able() inline")
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Menglong Dong <menglong.dong@linux.dev>
Link: https://lore.kernel.org/r/20251029183646.3811774-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
6 weeks agoMerge tag 'drm-xe-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/xe/kernel...
Simona Vetter [Fri, 31 Oct 2025 18:11:16 +0000 (19:11 +0100)]
Merge tag 'drm-xe-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
 - Fix xe_validation_guard() not guarding (Thomas Hellström)
 - Do not wake device during a GT reset (Matthew Brost)

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/o2b3lucyitafbbcd5bewpfqnslavtnnpc6ck4qatnou2wwukix@rz6seyfw75uy
6 weeks agoMerge tag 'drm-misc-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/misc...
Simona Vetter [Fri, 31 Oct 2025 18:10:04 +0000 (19:10 +0100)]
Merge tag 'drm-misc-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

ast:
- Preserve correct bits on register I/O

dma-fence:
- Use correct timeline name

etnaviv:
- Use correct GPU adress space for flush

imx:
- parallel-display: Fix bridge handling

nouveau:
- Fix locking in scheduler

panel:
- kingdisplay-kd097d04: Disable EOT packet
- sitronix-st7789v: Use correct SYNC flags

sched:
- Fix locking to avoid race condition
- Fix SIGKILL handling

sysfb:
- Avoid NULL-pointer access

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20251030195644.GA188441@localhost.localdomain
6 weeks agoMerge tag 'drm-intel-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/i915...
Simona Vetter [Fri, 31 Oct 2025 18:08:36 +0000 (19:08 +0100)]
Merge tag 'drm-intel-fixes-2025-10-30' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Fix DMC/DC6 asserts on ADL-S (Ville)

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/aQNtTV75vPaDhnXh@intel.com
6 weeks agoMerge tag 'drm-msm-fixes-2025-10-29' of https://gitlab.freedesktop.org/drm/msm into...
Simona Vetter [Fri, 31 Oct 2025 18:07:39 +0000 (19:07 +0100)]
Merge tag 'drm-msm-fixes-2025-10-29' of https://gitlab.freedesktop.org/drm/msm into drm-fixes

Fixes for v6.18-rc4

CI
- Disable broken sanity job

GEM
- Fix vm_bind prealloc error path
- Fix dma-buf import free
- Fix last-fence update
- Reject MAP_NULL if PRR is unsupported
- Ensure vm is created in VM_BIND ioctl

GPU
- GMU fw parsing fix

DPU:
- Fixed mode_valid callback
- Fixed planes on DPU 1.x devices.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Rob Clark <rob.clark@oss.qualcomm.com>
Link: https://patch.msgid.link/CACSVV03kUm1ms7FBg0m9U4ZcyickSWbnayAWqYqs0XH4UjWf+A@mail.gmail.com
6 weeks agoMerge tag 'amd-drm-fixes-6.18-2025-10-29' of https://gitlab.freedesktop.org/agd5f...
Simona Vetter [Fri, 31 Oct 2025 18:00:01 +0000 (19:00 +0100)]
Merge tag 'amd-drm-fixes-6.18-2025-10-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-10-29:

amdgpu:
- VPE idle handler fix
- Re-enable DM idle optimizations
- DCN3.0 fix
- SMU fix
- Powerplay fixes for fiji/iceland
- License fixes
- HDP eDP panel fix
- Vblank fix

radeon:
- devm migration fixes

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251029201342.8813-1-alexander.deucher@amd.com
6 weeks agoMerge tag 'mediatek-drm-fixes-20251028' of https://git.kernel.org/pub/scm/linux/kerne...
Simona Vetter [Fri, 31 Oct 2025 17:54:21 +0000 (18:54 +0100)]
Merge tag 'mediatek-drm-fixes-20251028' of https://git.kernel.org/pub/scm/linux/kernel/git/chunkuang.hu/linux into drm-fixes

Mediatek DRM Fixes - 20251028

1. Fix device use-after-free on unbind

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Chun-Kuang Hu <chunkuang.hu@kernel.org>
Link: https://patch.msgid.link/20251028151548.3944-1-chunkuang.hu@kernel.org
6 weeks agoMerge tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Linus Torvalds [Fri, 31 Oct 2025 16:34:21 +0000 (09:34 -0700)]
Merge tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

 - fix potential UAF in statfs

 - DFS fix for expired referrals

 - fix minor modinfo typo

 - small improvement to reconnect for smbdirect

* tag '6.18-rc3-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: call smbd_destroy() in the same splace as kernel_sock_shutdown()/sock_release()
  smb: client: handle lack of IPC in dfs_cache_refresh()
  smb: client: fix potential cfid UAF in smb2_query_info_compound
  cifs: fix typo in enable_gcm_256 module parameter

6 weeks agonull_blk: set dma alignment to logical block size
Hans Holmberg [Fri, 31 Oct 2025 09:48:26 +0000 (10:48 +0100)]
null_blk: set dma alignment to logical block size

This driver assumes that bio vectors are memory aligned to the logical
block size, so set the queue limit to reflect that.

Unless we set up the limit based on the logical block size, we will go
out of page bounds in copy_to_nullb / copy_from_nullb.

Apparently this wasn't noticed so far because none of the tests generate
such buffers, but since commit 851c4c96db00 ("xfs: implement
XFS_IOC_DIOINFO in terms of vfs_getattr") xfstests generates unaligned
I/O, which now lead to memory corruption when using null_blk devices
with 4k block size.

Fixes: bf8d08532bc1 ("iomap: add support for dma aligned direct-io")
Fixes: b1a000d3b8ec ("block: relax direct io memory alignment")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
6 weeks agoMerge tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Fri, 31 Oct 2025 14:29:09 +0000 (07:29 -0700)]
Merge tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes. It became slightly bigger than usual due
  to timing issues (holidays, etc), but all changes are rather
  device-specific fixes, so not really worrisome.

   - ASoC Cirrus codec fixes for AMD

   - Various fixes for ASoC Intel AVS, Qualcomm, SoundWire, FSL,
     Mediatek, Renesas

   - A few HD-audio quirks, and USB-audio regression fixes for Presonus"

* tag 'sound-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (24 commits)
  ALSA: hda/realtek: Enable mic on Vaio RPL
  ASoC: dt-bindings: pm4125-sdw: correct number of soundwire ports
  ASoC: renesas: rz-ssi: Use proper dma_buffer_pos after resume
  ASoC: soc_sdw_utils: remove cs42l43 component_name
  ASoC: fsl_sai: Fix sync error in consumer mode
  ASoC: Fix build for sdw_utils
  ALSA: hda/realtek: Fix mute led for HP Victus 15-fa1xxx (MB 8C2D)
  ASoC: rt721: fix prepare clock stop failed
  ALSA: usb-audio: don't log messages meant for 1810c when initializing 1824c
  ASoC: mediatek: Fix double pm_runtime_disable in remove functions
  ASoC: fsl_micfil: correct the endian format for DSD
  ASoC: fsl_sai: fix bit order for DSD format
  ASoC: Intel: avs: Use snd_codec format when initializing probe
  ASoC: Intel: avs: Disable periods-elapsed work when closing PCM
  ASoC: Intel: avs: Unprepare a stream when XRUN occurs
  ASoC: sdw_utils: add name_prefix for rt1321 part id
  ASoC: qdsp6: q6asm: do not sleep while atomic
  ASoC: Intel: soc-acpi-intel-ptl-match: Remove cs42l43 match from sdw link3
  ASOC: max98090/91: fix for filter configuration: AHPF removed DMIC2_HPF added
  ASoC: amd: acp: Add ACP7.0 match entries for cs35l56 and cs42l43
  ...

6 weeks agoMerge tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Fri, 31 Oct 2025 14:25:10 +0000 (07:25 -0700)]
Merge tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:

 - Fix double free in aspeed

 - Fix req->nbytes clobbering in s390/phmac

* tag 'v6.18-p4' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: aspeed - fix double free caused by devm
  crypto: s390/phmac - Do not modify the req->nbytes value

6 weeks agoMerge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Linus Torvalds [Fri, 31 Oct 2025 14:08:47 +0000 (07:08 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "ufs driver plus two core fixes.

  One core fix makes the unit attention counters atomic (just in case
  multiple commands detect them) and the other is fixing a merge window
  regression caused by changes in the block tree"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: core: Fix the unit attention counter implementation
  scsi: ufs: core: Declare tx_lanes witout initialization
  scsi: ufs: core: Initialize value of an attribute returned by uic cmd
  scsi: ufs: core: Fix error handler host_sem issue
  scsi: core: Fix a regression triggered by scsi_host_busy()

6 weeks agoxfs: document another racy GC case in xfs_zoned_map_extent
Christoph Hellwig [Thu, 23 Oct 2025 15:17:03 +0000 (17:17 +0200)]
xfs: document another racy GC case in xfs_zoned_map_extent

Besides blocks being invalidated, there is another case when the original
mapping could have changed between querying the rmap for GC and calling
xfs_zoned_map_extent.  Document it there as it took us quite some time
to figure out what is going on while developing the multiple-GC
protection fix.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
6 weeks agoxfs: prevent gc from picking the same zone twice
Christoph Hellwig [Thu, 23 Oct 2025 15:17:02 +0000 (17:17 +0200)]
xfs: prevent gc from picking the same zone twice

When we are picking a zone for gc it might already be in the pipeline
which can lead to us moving the same data twice resulting in in write
amplification and a very unfortunate case where we keep on garbage
collecting the zone we just filled with migrated data stopping all
forward progress.

Fix this by introducing a count of on-going GC operations on a zone, and
skip any zone with ongoing GC when picking a new victim.

Fixes: 080d01c41 ("xfs: implement zoned garbage collection")
Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com>
Co-developed-by: Hans Holmberg <hans.holmberg@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Carlos Maiolino <cem@kernel.org>
6 weeks agoMerge tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 31 Oct 2025 02:48:13 +0000 (19:48 -0700)]
Merge tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kselftest fixes from Shuah Khan:
 "Fix build warning in cachestat found during clang build and add
  tmpshmcstat to .gitignore"

* tag 'linux_kselftest-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  selftests: cachestat: Fix warning on declaration under label
  selftests/cachestat: add tmpshmcstat file to .gitignore

6 weeks agoMerge tag 'linux_kselftest-kunit-fixes-6.18-rc4' of git://git.kernel.org/pub/scm...
Linus Torvalds [Fri, 31 Oct 2025 02:11:27 +0000 (19:11 -0700)]
Merge tag 'linux_kselftest-kunit-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

Pull kunit fixes from Shuah Khan:
 "Fix log overwrite in param_tests and fixes incorrect cast of priv
  pointer in test_dev_action().

  Update email address for Rae Moar in MAINTAINERS KUnit entry"

* tag 'linux_kselftest-kunit-fixes-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
  MAINTAINERS: Update KUnit email address for Rae Moar
  kunit: prevent log overwrite in param_tests
  kunit: test_dev_action: Correctly cast 'priv' pointer to long*

6 weeks agoMerge tag 'acpi-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Fri, 31 Oct 2025 02:05:46 +0000 (19:05 -0700)]
Merge tag 'acpi-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
 "These fix three ACPI driver issues and add version checks to two ACPI
  table parsers:

   - Call input_free_device() on failing input device registration as
     necessary (and mentioned in the input subsystem documentation) in
     the ACPI button driver (Kaushlendra Kumar)

   - Fix use-after-free in acpi_video_switch_brightness() by canceling a
     delayed work during tear-down (Yuhao Jiang)

   - Use platform device for devres-related actions in the ACPI fan
     driver to allow device-managed resources to be cleaned up properly
     (Armin Wolf)

   - Add version checks to the MRRM and SPCR table parsers (Tony Luck
     and Punit Agrawal)"

* tag 'acpi-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI: SPCR: Check for table version when using precise baudrate
  ACPI: MRRM: Check revision of MRRM table
  ACPI: fan: Use platform device for devres-related actions
  ACPI: fan: Use ACPI handle when retrieving _FST
  ACPI: video: Fix use-after-free in acpi_video_switch_brightness()
  ACPI: button: Call input_free_device() on failing input device registration

6 weeks agoMerge tag 'pm-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Linus Torvalds [Fri, 31 Oct 2025 02:02:16 +0000 (19:02 -0700)]
Merge tag 'pm-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
 "These fix three regressions, two recent ones and one introduced during
  the 6.17 development cycle:

   - Add an exit latency check to the menu cpuidle governor in the case
     when it considers using a real idle state instead of a polling one
     to address a performance regression (Rafael Wysocki)

   - Revert an attempted cleanup of a system suspend code path that
     introduced a regression elsewhere (Samuel Wu)

   - Allow pm_restrict_gfp_mask() to be called multiple times in a row
     and adjust pm_restore_gfp_mask() accordingly to avoid having to
     play nasty games with these calls during hibernation (Rafael
     Wysocki)"

* tag 'pm-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM: sleep: Allow pm_restrict_gfp_mask() stacking
  cpuidle: governors: menu: Select polling state in some more cases
  Revert "PM: sleep: Make pm_wakeup_clear() call more clear"

6 weeks agoMerge tag 'fbdev-for-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/delle...
Linus Torvalds [Fri, 31 Oct 2025 01:58:49 +0000 (18:58 -0700)]
Merge tag 'fbdev-for-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull fbdev fixes from Helge Deller:

 - atyfb: Avoid hard lock up when PLL not initialized (Daniel Palmer)

 - pvr2fb: Fix build error when CONFIG_PVR2_DMA enabled (Florian Fuchs)

 - bitblit: Fix out-of-bounds read in bit_putcs* (Junjie Cao)

 - valkyriefb: Fix reference count leak (Miaoqian Lin)

 - fbcon: Fix slab-use-after-free in fb_mode_is_equal (Quanmin Yan)

 - fb.h: Fix typo in "vertical" (Piyush Choudhary)

* tag 'fbdev-for-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: atyfb: Check if pll_ops->init_pll failed
  fbcon: Set fb_display[i]->mode to NULL when the mode is released
  fbdev: bitblit: bound-check glyph index in bit_putcs*
  fbdev: pvr2fb: Fix leftover reference to ONCHIP_NR_DMA_CHANNELS
  fbdev: valkyriefb: Fix reference count leak in valkyriefb_init
  video: fb: Fix typo in comment in fb.h

6 weeks agoMerge tag 'net-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Linus Torvalds [Fri, 31 Oct 2025 01:35:35 +0000 (18:35 -0700)]
Merge tag 'net-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from wireless, Bluetooth and netfilter.

  Current release - regressions:

    - tcp: fix too slow tcp_rcvbuf_grow() action

    - bluetooth: fix corruption in h4_recv_buf() after cleanup

  Previous releases - regressions:

    - mptcp: restore window probe

    - bluetooth:
       - fix connection cleanup with BIG with 2 or more BIS
       - fix crash in set_mesh_sync and set_mesh_complete

    - batman-adv: release references to inactive interfaces

    - nic:
       - ice: fix usage of logical PF id
       - sfc: fix potential memory leak in efx_mae_process_mport()

  Previous releases - always broken:

    - devmem: refresh devmem TX dst in case of route invalidation

    - netfilter: add seqadj extension for natted connections

    - wifi:
       - iwlwifi: fix potential use after free in iwl_mld_remove_link()
       - brcmfmac: fix crash while sending action frames in standalone AP Mode

    - eth:
       - mlx5e: cancel tls RX async resync request in error flows
       - ixgbe: fix memory leak and use-after-free in ixgbe_recovery_probe()
       - hibmcge: fix rx buf avl irq is not re-enabled in irq_handle issue
       - cxgb4: fix potential use-after-free in ipsec callback
       - nfp: fix memory leak in nfp_net_alloc()"

* tag 'net-6.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (75 commits)
  net: sctp: fix KMSAN uninit-value in sctp_inq_pop
  net: devmem: refresh devmem TX dst in case of route invalidation
  net: stmmac: est: Fix GCL bounds checks
  net: stmmac: Consider Tx VLAN offload tag length for maxSDU
  net: stmmac: vlan: Disable 802.1AD tag insertion offload
  net/mlx5e: kTLS, Cancel RX async resync request in error flows
  net: tls: Cancel RX async resync request on rcd_delta overflow
  net: tls: Change async resync helpers argument
  net: phy: dp83869: fix STRAP_OPMODE bitmask
  selftests: net: use BASH for bareudp testing
  net: mctp: Fix tx queue stall
  net/mlx5: Don't zero user_count when destroying FDB tables
  net: usb: asix_devices: Check return value of usbnet_get_endpoints
  mptcp: zero window probe mib
  mptcp: restore window probe
  mptcp: fix MSG_PEEK stream corruption
  mptcp: drop bogus optimization in __mptcp_check_push()
  netconsole: Fix race condition in between reader and writer of userdata
  Documentation: netconsole: Remove obsolete contact people
  nfp: xsk: fix memory leak in nfp_net_alloc()
  ...

6 weeks agoMerge tag 'nvme-6.18-2025-10-30' of git://git.infradead.org/nvme into block-6.18
Jens Axboe [Fri, 31 Oct 2025 01:26:19 +0000 (19:26 -0600)]
Merge tag 'nvme-6.18-2025-10-30' of git://git.infradead.org/nvme into block-6.18

Pull NVMe fixes from Keith:

"- Target side authentication fixup (Hannes)
 - Peer-to-peer metadata fixup (Keith)"

* tag 'nvme-6.18-2025-10-30' of git://git.infradead.org/nvme:
  nvme-pci: use blk_map_iter for p2p metadata
  nvmet-auth: update sc_c in host response

6 weeks agodrm/ast: Clear preserved bits from register output value
Thomas Zimmermann [Fri, 24 Oct 2025 07:35:53 +0000 (09:35 +0200)]
drm/ast: Clear preserved bits from register output value

Preserve the I/O register bits in __ast_write8_i_masked() as specified
by preserve_mask. Accidentally OR-ing the output value into these will
overwrite the register's previous settings.

Fixes display output on the AST2300, where the screen can go blank at
boot. The driver's original commit 312fec1405dd ("drm: Initial KMS
driver for AST (ASpeed Technologies) 2000 series (v2)") already added
the broken code. Commit 6f719373b943 ("drm/ast: Blank with VGACR17 sync
enable, always clear VGACRB6 sync off") triggered the bug.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reported-by: Peter Schneider <pschneider1968@googlemail.com>
Closes: https://lore.kernel.org/dri-devel/a40caf8e-58ad-4f9c-af7f-54f6f69c29bb@googlemail.com/
Tested-by: Peter Schneider <pschneider1968@googlemail.com>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Fixes: 6f719373b943 ("drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off")
Fixes: 312fec1405dd ("drm: Initial KMS driver for AST (ASpeed Technologies) 2000 series (v2)")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Nick Bowler <nbowler@draconx.ca>
Cc: Douglas Anderson <dianders@chromium.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v3.5+
Link: https://patch.msgid.link/20251024073626.129032-1-tzimmermann@suse.de
6 weeks agoMerge branches 'acpi-button', 'acpi-video' and 'acpi-fan'
Rafael J. Wysocki [Thu, 30 Oct 2025 19:40:49 +0000 (20:40 +0100)]
Merge branches 'acpi-button', 'acpi-video' and 'acpi-fan'

Merge ACPI button, ACPI backlight (video), and ACPI fan driver fixes for
6.18-rc4:

 - Call input_free_device() on failing input device registration as
   necessary (and mentioned in the input subsystem documentation) in the
   ACPI button driver (Kaushlendra Kumar)

 - Fix use-after-free in acpi_video_switch_brightness() by canceling
   a delayed work during tear-down (Yuhao Jiang)

 - Use platform device for devres-related actions in the ACPI fan driver
   to allow device-managed resources to be cleaned up properly (Armin
   Wolf)

* acpi-button:
  ACPI: button: Call input_free_device() on failing input device registration

* acpi-video:
  ACPI: video: Fix use-after-free in acpi_video_switch_brightness()

* acpi-fan:
  ACPI: fan: Use platform device for devres-related actions
  ACPI: fan: Use ACPI handle when retrieving _FST

6 weeks agoMerge branches 'pm-cpuidle' and 'pm-sleep'
Rafael J. Wysocki [Thu, 30 Oct 2025 19:25:18 +0000 (20:25 +0100)]
Merge branches 'pm-cpuidle' and 'pm-sleep'

Merge a cpuidle fix and two fixes related to system sleep for 6.18-rc4:

 - Add an exit latency check to the menu cpuidle governor in the case
   when it considers using a real idle state instead of a polling one to
   address a performance regression (Rafael Wysocki)

 - Revert an attempted cleanup of a system suspend code path that
   introduced a regression elsewhere (Samuel Wu)

 - Allow pm_restrict_gfp_mask() to be called multiple times in a row
   and adjust pm_restore_gfp_mask() accordingly to avoid having to play
   nasty games with these calls during hibernation (Rafael Wysocki)

* pm-cpuidle:
  cpuidle: governors: menu: Select polling state in some more cases

* pm-sleep:
  PM: sleep: Allow pm_restrict_gfp_mask() stacking
  Revert "PM: sleep: Make pm_wakeup_clear() call more clear"