[PATCH] btrfs: fix rw device counting in __btrfs_free_extra_devids

Anand Jain anand.jain at oracle.com
Sun Jul 25 13:49:17 UTC 2021



On 22/07/2021 01:59, David Sterba wrote:
> On Thu, Jul 15, 2021 at 06:34:03PM +0800, Desmond Cheong Zhi Xi wrote:
>> Syzbot reports a warning in close_fs_devices that happens because
>> fs_devices->rw_devices is not 0 after calling btrfs_close_one_device
>> on each device.
>>
>> This happens when a writeable device is removed in
>> __btrfs_free_extra_devids, but the rw device count is not decremented
>> accordingly. So when close_fs_devices is called, the removed device is
>> still counted and we get an off by 1 error.
>>
>> Here is one call trace that was observed:
>>    btrfs_mount_root():
>>      btrfs_scan_one_device():
>>        device_list_add();   <---------------- device added
>>      btrfs_open_devices():
>>        open_fs_devices():
>>          btrfs_open_one_device();   <-------- rw device count ++
>>      btrfs_fill_super():
>>        open_ctree():
>>          btrfs_free_extra_devids():
>> 	  __btrfs_free_extra_devids();  <--- device removed
>> 	  fail_tree_roots:
>> 	    btrfs_close_devices():
>> 	      close_fs_devices();   <------- rw device count off by 1
>>
>> Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
> 
> What this patch did in the last hunk was the rw_devices decrement, but
> conditional:
> 
> @@ -1080,9 +1071,6 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>                  if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>                          list_del_init(&device->dev_alloc_list);
>                          clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);



> -                       if (!test_bit(BTRFS_DEV_STATE_REPLACE_TGT,
> -                                     &device->dev_state))

This condition was wrong.
The 1st roll of this patch which is here [1], has the details of why. As 
shown below -

[1]
https://patchwork.kernel.org/project/linux-btrfs/patch/b3a0a629df98bd044a1fd5c4964f381ff6e7aa05.1600777827.git.anand.jain@oracle.com/#23640775

----
rw_devices is incremented in btrfs_open_one_device() for all write-able
devices except for devid == BTRFS_DEV_REPLACE_DEVID.
But while we clean up the extra devices in __btrfs_free_extra_devids()
we used the BTRFS_DEV_STATE_REPLACE_TGT flag isn't set because
there isn't the replace-item. So rw_devices went below zero.
----


> -                               fs_devices->rw_devices--;
>                  }
>                  list_del_init(&device->dev_list);
>                  fs_devices->num_devices--;
> ---
> 
> 
>> @@ -1078,6 +1078,7 @@ static void __btrfs_free_extra_devids(struct btrfs_fs_devices *fs_devices,
>>   		if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state)) {
>>   			list_del_init(&device->dev_alloc_list);
>>   			clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state);
>> +			fs_devices->rw_devices--;
>>   		}
>>   		list_del_init(&device->dev_list);
>>   		fs_devices->num_devices--;
> 
> So should it be reinstated in the original form?

No. The reason is the same as above.
Only the rw_devices decrement line has to be restored.

> The rest of
> cf89af146b7e handles unexpected device replace item during mount.

> Adding the decrement is correct, but right now I'm not sure about the
> corner case when teh devcie has the BTRFS_DEV_STATE_REPLACE_TGT bit set.

BTRFS_DEV_STATE_REPLACE_TGT is set (on BTRFS_DEV_REPLACE_DEVID) for two 
reasons when we call replace through ioctl or during mount upon finding 
a replace-device item.

> The state machine of the device bits and counters is not trivial so
> fixing it one way or the other could lead to further syzbot reports if
> we don't understand the issue.

I agree. Also, a good idea to convert this sysbot test into an xfstests 
case.

Thanks, Anand


More information about the Linux-kernel-mentees mailing list