Skip to content

Commit 65e8fbd

Browse files
Mikulas PatockaMike Snitzer
authored andcommitted
dm: call the resume method on internal suspend
There is this reported crash when experimenting with the lvm2 testsuite. The list corruption is caused by the fact that the postsuspend and resume methods were not paired correctly; there were two consecutive calls to the origin_postsuspend function. The second call attempts to remove the "hash_list" entry from a list, while it was already removed by the first call. Fix __dm_internal_resume so that it calls the preresume and resume methods of the table's targets. If a preresume method of some target fails, we are in a tricky situation. We can't return an error because dm_internal_resume isn't supposed to return errors. We can't return success, because then the "resume" and "postsuspend" methods would not be paired correctly. So, we set the DMF_SUSPENDED flag and we fake normal suspend - it may confuse userspace tools, but it won't cause a kernel crash. ------------[ cut here ]------------ kernel BUG at lib/list_debug.c:56! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 1 PID: 8343 Comm: dmsetup Not tainted 6.8.0-rc6 #4 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 RIP: 0010:__list_del_entry_valid_or_report+0x77/0xc0 <snip> RSP: 0018:ffff8881b831bcc0 EFLAGS: 00010282 RAX: 000000000000004e RBX: ffff888143b6eb80 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff819053d0 RDI: 00000000ffffffff RBP: ffff8881b83a3400 R08: 00000000fffeffff R09: 0000000000000058 R10: 0000000000000000 R11: ffffffff81a24080 R12: 0000000000000001 R13: ffff88814538e000 R14: ffff888143bc6dc0 R15: ffffffffa02e4bb0 FS: 00000000f7c0f780(0000) GS:ffff8893f0a40000(0000) knlGS:0000000000000000 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000057fb5000 CR3: 0000000143474000 CR4: 00000000000006b0 Call Trace: <TASK> ? die+0x2d/0x80 ? do_trap+0xeb/0xf0 ? __list_del_entry_valid_or_report+0x77/0xc0 ? do_error_trap+0x60/0x80 ? __list_del_entry_valid_or_report+0x77/0xc0 ? exc_invalid_op+0x49/0x60 ? __list_del_entry_valid_or_report+0x77/0xc0 ? asm_exc_invalid_op+0x16/0x20 ? table_deps+0x1b0/0x1b0 [dm_mod] ? __list_del_entry_valid_or_report+0x77/0xc0 origin_postsuspend+0x1a/0x50 [dm_snapshot] dm_table_postsuspend_targets+0x34/0x50 [dm_mod] dm_suspend+0xd8/0xf0 [dm_mod] dev_suspend+0x1f2/0x2f0 [dm_mod] ? table_deps+0x1b0/0x1b0 [dm_mod] ctl_ioctl+0x300/0x5f0 [dm_mod] dm_compat_ctl_ioctl+0x7/0x10 [dm_mod] __x64_compat_sys_ioctl+0x104/0x170 do_syscall_64+0x184/0x1b0 entry_SYSCALL_64_after_hwframe+0x46/0x4e RIP: 0033:0xf7e6aead <snip> ---[ end trace 0000000000000000 ]--- Fixes: ffcc393 ("dm: enhance internal suspend and resume interface") Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
1 parent b25b8f4 commit 65e8fbd

File tree

1 file changed

+20
-6
lines changed

1 file changed

+20
-6
lines changed

drivers/md/dm.c

Lines changed: 20 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -2945,6 +2945,9 @@ static void __dm_internal_suspend(struct mapped_device *md, unsigned int suspend
29452945

29462946
static void __dm_internal_resume(struct mapped_device *md)
29472947
{
2948+
int r;
2949+
struct dm_table *map;
2950+
29482951
BUG_ON(!md->internal_suspend_count);
29492952

29502953
if (--md->internal_suspend_count)
@@ -2953,12 +2956,23 @@ static void __dm_internal_resume(struct mapped_device *md)
29532956
if (dm_suspended_md(md))
29542957
goto done; /* resume from nested suspend */
29552958

2956-
/*
2957-
* NOTE: existing callers don't need to call dm_table_resume_targets
2958-
* (which may fail -- so best to avoid it for now by passing NULL map)
2959-
*/
2960-
(void) __dm_resume(md, NULL);
2961-
2959+
map = rcu_dereference_protected(md->map, lockdep_is_held(&md->suspend_lock));
2960+
r = __dm_resume(md, map);
2961+
if (r) {
2962+
/*
2963+
* If a preresume method of some target failed, we are in a
2964+
* tricky situation. We can't return an error to the caller. We
2965+
* can't fake success because then the "resume" and
2966+
* "postsuspend" methods would not be paired correctly, and it
2967+
* would break various targets, for example it would cause list
2968+
* corruption in the "origin" target.
2969+
*
2970+
* So, we fake normal suspend here, to make sure that the
2971+
* "resume" and "postsuspend" methods will be paired correctly.
2972+
*/
2973+
DMERR("Preresume method failed: %d", r);
2974+
set_bit(DMF_SUSPENDED, &md->flags);
2975+
}
29622976
done:
29632977
clear_bit(DMF_SUSPENDED_INTERNALLY, &md->flags);
29642978
smp_mb__after_atomic();

0 commit comments

Comments
 (0)