btrfs User Guide - 1: Concepts, Creation, Block Device Management, Performance Tuning

1. btrfs concepts
btrfs stores three types of data: data, metadata, and system. They are defined as follows:
       DATA
           store data blocks and nothing else. These are the file data blocks.

       METADATA
           store internal metadata in b-trees, can store file data if they fit into the inline limit.
       This is the internal metadata of btrfs, stored in b-trees: for example inode information, file sizes, and modification times.

       SYSTEM
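           store structures that describe the mapping between the physical devices and the linear logical space representing the filesystem.
       This is the mapping between the block devices and the linear logical address space of the filesystem, similar to an address translation table. It also records the RAID layout (the profile).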

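Block group and chunk: the two terms are interchangeable. They are defined as:
           a logical range of space of a given profile, stores data, metadata or both; sometimes the terms are used interchangeably.
       A block group (or chunk) is a logical range of space that holds one of the types above (data, metadata, or system). It is also the smallest unit of space that btrfs allocates at one time, presumably to keep data reasonably contiguous.

           A typical size of metadata block group is 256MiB (filesystem smaller than 50GiB) and 1GiB (larger than 50GiB), for data it’s 1GiB. The system block group size is a few megabytes.
       For example, a metadata block group is 256MiB when the filesystem is smaller than 50GiB and 1GiB when it is larger.
       A data block group is 1GiB.
       A system block group is only a few megabytes.
       The mkfs.btrfs output and btrfs filesystem df show how much space is allocated to each type; btrfs filesystem show reports the allocation per device.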

       RAID
           a block group profile type that utilizes RAID-like features on multiple devices: striping, mirroring, parity
       RAID describes a family of profiles: striping (raid0, raid10), mirroring (raid1, raid10), and parity (raid5, raid6).

       profile
           when used in connection with block groups refers to the allocation strategy and constraints, see the section PROFILES for more details
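       Combined with a block group, the profile describes the allocation strategy and its constraints. For example:
       single stores one copy of the data, so every block group is unique.
       DUP stores two copies on one block device, so every block group has an identical copy on the same device.
       RAID0 is striping: a single block group may span several block devices.
       RAID10 is mirroring plus striping: a single block group may span several block devices, and each part of it is mirrored on two devices.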

PROFILES
       There are the following block group types available:

       ┌────────┬─────────────────────┬────────────┬─────────────────┐
       │Profile │ Redundancy          │ Striping   │ Min/max devices │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │single  │ 1 copy              │ n/a        │ 1/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │DUP     │ 2 copies / 1 device │ n/a        │ 1/1             │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │RAID0   │ n/a                 │ 1 to N     │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │RAID1   │ 2 copies            │ n/a        │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │RAID10  │ 2 copies            │ 1 to N     │ 4/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │RAID5   │ 1 copy + 1 parity   │ 2 to N - 1 │ 2/any           │
       ├────────┼─────────────────────┼────────────┼─────────────────┤
       │RAID6   │ 1 copy + 2 parity   │ 3 to N - 2 │ 3/any           │
       └────────┴─────────────────────┴────────────┴─────────────────┘
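
To make the redundancy column concrete, here is a rough sketch of how much usable space each profile leaves on equally sized devices. The function name usable_gib is hypothetical, and the arithmetic is a simplification that ignores metadata overhead, chunk granularity, and devices of different sizes.

```bash
#!/usr/bin/env bash
# usable_gib PROFILE NDEVICES SIZE_GIB
# Approximate usable capacity for NDEVICES devices of SIZE_GIB each.
# Hypothetical helper for illustration; real allocation is chunk-based.
usable_gib() {
    local profile=$1 n=$2 size=$3
    case "$profile" in
        single|raid0)     echo $(( n * size )) ;;        # one copy of everything
        dup|raid1|raid10) echo $(( n * size / 2 )) ;;    # two copies of everything
        raid5)            echo $(( (n - 1) * size )) ;;  # one device's worth of parity
        raid6)            echo $(( (n - 2) * size )) ;;  # two devices' worth of parity
        *) echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

# Four 20GiB devices, as in the examples in this guide.
echo "raid10: $(usable_gib raid10 4 20) GiB"
echo "raid5:  $(usable_gib raid5 4 20) GiB"
```

The profile names are written in lower case here, as they are passed to mkfs.btrfs.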
2. Creating a btrfs filesystem
man mkfs.btrfs
       -d|--data <profile>
           Specify the profile for the data block groups. Valid values are raid0, raid1, raid5, raid6, raid10 or single, (case does not matter).
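       Sets the profile for data block groups. Choose it with the block devices in mind: if the underlying devices provide no redundancy of their own, use a redundant profile here; otherwise a single copy (single) is enough.
       With multiple block devices you can also choose whether to stripe; striping spreads the I/O load across the devices.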
       -m|--metadata <profile>
           Specify the profile for the metadata block groups. Valid values are raid0, raid1, raid5, raid6, raid10, single or dup, (case does not matter).

           A single device filesystem will default to DUP, unless a SSD is detected. Then it will default to single. The detection is based on the value of /sys/block/DEV/queue/rotational, where DEV is the short name of the device.
           This is because SSDs can remap the blocks internally to a single copy thus deduplicating them which negates the purpose of increased metadata redundancy and just wastes space.

           Note that the rotational status can be arbitrarily set by the underlying block device driver and may not reflect the true status (network block device, memory-backed SCSI devices etc). Use the options --data/--metadata
           to avoid confusion.
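       Sets the profile for metadata block groups. Choose it with the block devices in mind: if the underlying devices provide no redundancy of their own, use a redundant profile here; otherwise a single copy (single) is enough.
       With multiple block devices you can also choose whether to stripe; striping spreads the I/O load across the devices.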
       -n|--nodesize <size>
           Specify the nodesize, the tree block size in which btrfs stores metadata. The default value is 16KiB (16384) or the page size, whichever is bigger. Must be a multiple of the sectorsize, but not larger than 64KiB (65536).
           Leafsize always equals nodesize and the options are aliases.

           Smaller node size increases fragmentation but leads to taller b-trees, which in turn leads to lower locking contention. Higher node sizes give better packing and less fragmentation at the cost of more expensive memory
           operations while updating the metadata blocks.

               Note
               versions up to 3.11 set the nodesize to 4k.
       For database workloads, the author recommends a 4K nodesize to reduce lock contention.
       -f|--force
           Forcibly overwrite the block devices when an existing filesystem is detected. By default, mkfs.btrfs will utilize libblkid to check for any known filesystem on the devices. Alternatively you can use the wipefs utility to
           clear the devices.
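
The SSD detection described under --metadata reads a single sysfs file, so it is easy to check what mkfs.btrfs will see before relying on the default. The sketch below is illustrative only: the function name rotational_status and the optional sysfs root argument (added so the function can be tried against a test directory) are assumptions, not part of btrfs-progs.

```bash
#!/usr/bin/env bash
# rotational_status DEV [SYSFS_ROOT]
# Prints "rotational", "non-rotational", or "unknown" for a device short
# name such as sdb, based on SYSFS_ROOT/block/DEV/queue/rotational.
rotational_status() {
    local dev=$1 root=${2:-/sys}
    local flag_file="$root/block/$dev/queue/rotational"
    if [ ! -r "$flag_file" ]; then
        echo "unknown"
        return 1
    fi
    if [ "$(cat "$flag_file")" = "0" ]; then
        echo "non-rotational"   # mkfs.btrfs would treat this as an SSD
    else
        echo "rotational"
    fi
}

# Example (device name is a placeholder):
#   rotational_status sdb
```

If the reported status does not match the real hardware, pass -m and -d explicitly, as the man page advises.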

With multiple block devices, you can pass all of them to mkfs.btrfs in one command.
You can also give metadata and data different profiles.
For example:
[root@digoal ~]# mkfs.btrfs -m raid10 -d raid10 -n 4096 -f /dev/sdb /dev/sdc /dev/sdd /dev/sde
btrfs-progs v4.3.1
See https://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               00036b8e-7914-41a9-831a-d35c97202eeb
Node size:          4096
Sector size:        4096
Filesystem size:    80.00GiB
Block group profiles:  (shows how much space has been allocated to block groups of each of the three types)
  Data:             RAID10            2.01GiB
  Metadata:         RAID10            2.01GiB
  System:           RAID10           20.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb
    2    20.00GiB  /dev/sdc
    3    20.00GiB  /dev/sdd
    4    20.00GiB  /dev/sde
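In the next example, metadata uses raid1 (no striping) while data uses raid10 (striped). Note that system follows metadata and also uses raid1.
That said, it is advisable to give metadata and data the same profile.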
[root@digoal ~]# mkfs.btrfs -m raid1 -d raid10 -n 4096 -f /dev/sdb /dev/sdc /dev/sdd /dev/sde
btrfs-progs v4.3.1
See https://btrfs.wiki.kernel.org for more information.

Label:              (null)
UUID:               4eef7b0c-73a3-430c-bb61-028b37d1872b
Node size:          4096
Sector size:        4096
Filesystem size:    80.00GiB
Block group profiles:
  Data:             RAID10            2.01GiB
  Metadata:         RAID1             1.01GiB
  System:           RAID1            12.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  4
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb
    2    20.00GiB  /dev/sdc
    3    20.00GiB  /dev/sdd
    4    20.00GiB  /dev/sde

[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 4eef7b0c-73a3-430c-bb61-028b37d1872b
        Total devices 4 FS bytes used 28.00KiB
        devid    1 size 20.00GiB used 2.00GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.00GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.01GiB path /dev/sde

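3. Mounting a btrfs filesystem
If btrfs manages multiple block devices, there are two ways to mount it: list all the devices explicitly, or scan first and then mount. The scan is needed because on some systems the member devices must be registered again after a reboot or after the btrfs module is reloaded.
For example: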
[root@digoal ~]# btrfs device scan
Scanning for Btrfs filesystems
[root@digoal ~]# mount /dev/sdb /data01
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 1.03MiB
        devid    1 size 20.00GiB used 2.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.01GiB path /dev/sde
Or:
[root@digoal ~]# mount -o device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,device=/dev/sde /dev/sdb /data01
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 1.03MiB
        devid    1 size 20.00GiB used 2.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.01GiB path /dev/sde
Or:
# vi /etc/fstab

UUID=00036b8e-7914-41a9-831a-d35c97202eeb /data01 btrfs ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults 0 0
Or:
UUID=00036b8e-7914-41a9-831a-d35c97202eeb /data01 btrfs device=/dev/sdb,device=/dev/sdc,device=/dev/sdd,device=/dev/sde,ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults 0 0
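
The fstab entries above follow a fixed pattern: one device= option per member device, followed by the remaining mount options. As a small sketch, the hypothetical helper below (fstab_line is not a btrfs tool) assembles such a line so the device list does not have to be typed by hand.

```bash
#!/usr/bin/env bash
# fstab_line UUID MOUNTPOINT OPTIONS [DEVICE...]
# Prints an /etc/fstab line for a btrfs filesystem, prefixing OPTIONS
# with one device=... entry per DEVICE. Hypothetical helper.
fstab_line() {
    local uuid=$1 mnt=$2 opts=$3
    shift 3
    local dev devopts=""
    for dev in "$@"; do
        devopts+="device=$dev,"
    done
    printf 'UUID=%s %s btrfs %s%s 0 0\n' "$uuid" "$mnt" "$devopts" "$opts"
}

fstab_line 00036b8e-7914-41a9-831a-d35c97202eeb /data01 \
    noatime,space_cache /dev/sdb /dev/sdc /dev/sdd /dev/sde
```

Review the printed line before appending it to /etc/fstab.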
4. Recommended mount options
https://btrfs.wiki.kernel.org/index.php/Mount_options
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/btrfs-mount.html
4.1 SSD-related options
discard,ssd,ssd_spread

discard
    Use this option to enable discard/TRIM on freed blocks.
ssd
    Turn on some of the SSD optimized behaviour within btrfs. This is enabled automatically by checking /sys/block/sdX/queue/rotational to be zero. This does not enable discard/TRIM!

ssd_spread
    Mount -o ssd_spread is more strict about finding a large unused region of the disk for new allocations, which tends to fragment the free space more over time. It is often faster on the less expensive SSD devices. The author recommends enabling ssd_spread on inexpensive SSDs.

nossd
    The ssd mount option only enables the ssd option. Use the nossd option to disable it.

4.2 Performance-related options
noatime,nodiratime,space_cache

noatime,nodiratime
    As discussed on the mailing list, the noatime mount option might speed up your file system, especially if you have lots of snapshots. Each read access to a file is supposed to update its Unix access time; with COW this causes even more writes. The default is now relatime, which updates access times less often.
space_cache
    Btrfs stores the free space data on-disk to make the caching of a block group much quicker. It's a persistent change and is safe to boot into old kernels.

4.3 Other recommended options
defaults,compress=no,recovery

compress=no
    Disable transparent compression.
recovery
    Enable autorecovery upon mount; currently it scans list of several previous tree roots and tries to use the first readable. The information about the tree root backups is stored by kernels starting with 3.2, older kernels do not and thus no recovery can be done.
thread_pool=number 
    The number of worker threads to allocate.

4.4 Recommended Linux block device I/O scheduler
    deadline
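
The active scheduler of a block device is shown in brackets in /sys/block/DEV/queue/scheduler, for example "noop [deadline] cfq". As a sketch, the hypothetical helper below extracts the bracketed name from such a line; the available scheduler names depend on the kernel (newer multi-queue kernels offer mq-deadline instead of deadline).

```bash
#!/usr/bin/env bash
# active_scheduler LINE
# Prints the scheduler name enclosed in [ ] in a line read from
# /sys/block/DEV/queue/scheduler. Returns 1 if none is marked.
active_scheduler() {
    local line=$1
    local re='\[([^]]+)\]'
    [[ $line =~ $re ]] || return 1
    echo "${BASH_REMATCH[1]}"
}

active_scheduler "noop [deadline] cfq"

# On a real system (device name is a placeholder):
#   active_scheduler "$(cat /sys/block/sdb/queue/scheduler)"
#   echo deadline > /sys/block/sdb/queue/scheduler   # switch scheduler, as root
```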

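5. Resizing a btrfs filesystem
btrfs integrates block device management. As described earlier, btrfs stores three types of data: data, metadata, and system. Whenever one of these types needs space, btrfs allocates a block group for it from the block devices it manages.
Resizing btrfs therefore means resizing the space it may use on a block device. For a btrfs filesystem on a single block device, resizing the filesystem at its mount point and resizing the usable space of the device are the same thing.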

5.1 Growing
The amount accepts the unit suffixes k, m, and g.
# btrfs filesystem resize amount /mount-point
# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point
# btrfs filesystem resize devid:max /mount-point

For btrfs on a single block device, no device ID is needed:
# btrfs filesystem resize +200M /btrfssingle
Resize '/btrfssingle' of '+200M'

For btrfs on multiple block devices, specify the device ID:
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 2.12GiB
        devid    1 size 19.00GiB used 4.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 4.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 4.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 4.01GiB path /dev/sde
[root@digoal ~]# btrfs filesystem resize '1:+1G' /data01
Resize '/data01' of '1:+1G'
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 00036b8e-7914-41a9-831a-d35c97202eeb
        Total devices 4 FS bytes used 2.12GiB
        devid    1 size 20.00GiB used 4.01GiB path /dev/sdb
        devid    2 size 20.00GiB used 4.01GiB path /dev/sdc
        devid    3 size 20.00GiB used 4.01GiB path /dev/sdd
        devid    4 size 20.00GiB used 4.01GiB path /dev/sde
You can specify max to use the full capacity of the block device.
[root@digoal ~]# btrfs filesystem resize '1:max' /data01
Resize '/data01' of '1:max'

5.2 Shrinking
# btrfs filesystem resize amount /mount-point
# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point
For example:
# btrfs filesystem resize -200M /btrfssingle
Resize '/btrfssingle' of '-200M'

5.3 Setting a fixed size
# btrfs filesystem resize amount /mount-point
# btrfs filesystem resize 700M /btrfssingle
Resize '/btrfssingle' of '700M'

# btrfs filesystem show /mount-point
# btrfs filesystem resize devid:amount /mount-point

max is supported here as well:
[root@digoal ~]# btrfs filesystem resize 'max' /data01
Resize '/data01' of 'max'
[root@digoal ~]# btrfs filesystem resize '2:max' /data01
Resize '/data01' of '2:max'
[root@digoal ~]# btrfs filesystem resize '3:max' /data01
Resize '/data01' of '3:max'
[root@digoal ~]# btrfs filesystem resize '4:max' /data01
Resize '/data01' of '4:max'
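
The resize arguments used in this section all follow one shape: an optional devid and colon, then either max or a size with an optional sign and unit suffix. The sketch below checks an argument against that shape before it is handed to btrfs filesystem resize. The function name valid_resize_arg is hypothetical, and the accepted suffixes are limited to the k, m, and g mentioned above, which is an assumption about what you want to allow rather than the full set btrfs accepts.

```bash
#!/usr/bin/env bash
# valid_resize_arg ARG
# Succeeds if ARG looks like [devid:][+|-]size[k|m|g] or [devid:]max.
valid_resize_arg() {
    local re='^([0-9]+:)?(max|[+-]?[0-9]+[kKmMgG]?)$'
    [[ $1 =~ $re ]]
}

for arg in +200M -200M 700M 1:+1G 2:max bogus; do
    if valid_resize_arg "$arg"; then
        echo "$arg: ok"
    else
        echo "$arg: rejected"
    fi
done
```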

6. btrfs volume management
How btrfs manages multiple block devices:
MULTIPLE DEVICES
       Before mounting a multiple device filesystem, the kernel module must know the association of the block devices that are attached to the filesystem UUID.

       There is typically no action needed from the user. On a system that utilizes a udev-like daemon (detected automatically, no manual scan needed; CentOS 7 behaves this way), any new block device is automatically registered. The rules call btrfs device scan.

       The same command can be used to trigger the device scanning if the btrfs kernel module is reloaded (naturally all previous information about the device registration is lost).

       Another possibility is to use the mount options device to specify the list of devices to scan at the time of mount.

           # mount -o device=/dev/sdb,device=/dev/sdc /dev/sda /mnt

           Note
           that this means only scanning, if the devices do not exist in the system, mount will fail anyway. This can happen on systems without initramfs/initrd and root partition created with RAID1/10/5/6 profiles. The mount
           action can happen before all block devices are discovered. The waiting is usually done on the initramfs/initrd systems.
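Otherwise, after an operating system reboot or a reload of the btrfs module, run a scan before mounting a btrfs filesystem that uses multiple block devices.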

7. Load balancing
The raid0, raid10, raid5, and raid6 profiles stripe data: a block group spans multiple block devices, so I/O is spread across them.
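
As a simplified illustration of why striping spreads the load, the sketch below maps a byte offset inside a RAID0-style block group to the index of the device stripe that holds it. It assumes a 64KiB stripe length and round-robin placement, and it ignores chunk boundaries, mirroring, and parity, so treat it as a model rather than the actual btrfs mapping code. The function name stripe_device is hypothetical.

```bash
#!/usr/bin/env bash
STRIPE_LEN=65536   # 64KiB stripe length (assumption stated above)

# stripe_device OFFSET NDEVICES
# Prints the zero-based index of the device stripe that holds byte
# OFFSET when stripes are laid out round-robin over NDEVICES devices.
stripe_device() {
    local offset=$1 n=$2
    echo $(( (offset / STRIPE_LEN) % n ))
}

# Consecutive 64KiB pieces land on consecutive devices.
for off in 0 65536 131072 196608 262144; do
    echo "offset $off -> device index $(stripe_device "$off" 4)"
done
```

A large sequential read or write therefore touches every device in turn instead of queuing on one.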

8. Converting from a single device to multiple devices
If btrfs started out on a single block device, how do you convert it to raid1?
[root@digoal ~]# mkfs.btrfs -m single -d single -n 4096 -f /dev/sdb
btrfs-progs v4.3.1
See https://btrfs.wiki.kernel.org for more information.
Label:              (null)
UUID:               165f59f6-77b5-4421-b3d8-90884d3c0b40
Node size:          4096
Sector size:        4096
Filesystem size:    20.00GiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       no
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1    20.00GiB  /dev/sdb

[root@digoal ~]# mount -o ssd,ssd_spread,discard,noatime,nodiratime,compress=no,space_cache,recovery,defaults /dev/sdb /data01
Add a block device:
[root@digoal ~]# btrfs device add /dev/sdc /data01 -f
Convert online with balance; -m selects metadata and -d selects data:
[root@digoal ~]# btrfs balance start -dconvert=raid1 -mconvert=raid1 /data01
Done, had to relocate 3 out of 3 chunks
The chunks reported here are block groups.

[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 2 FS bytes used 360.00KiB
        devid    1 size 20.00GiB used 1.28GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.28GiB path /dev/sdc

Check whether the balance job has finished:
[root@digoal ~]# btrfs balance status -v /data01
No balance found on '/data01'

You can convert again; for example, to switch data to raid0:
[root@digoal ~]# btrfs balance start -dconvert=raid0 /data01
Done, had to relocate 1 out of 3 chunks
The chunks reported here are block groups.

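9. Adding block devices and redistributing data
This is much like the conversion above, except that the -d and -m profiles are left unchanged.
[root@digoal ~]# btrfs device add /dev/sdd /data01 -f
[root@digoal ~]# btrfs device add /dev/sde /data01 -f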

[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 616.00KiB
        devid    1 size 20.00GiB used 1.28GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.28GiB path /dev/sdc
        devid    3 size 20.00GiB used 0.00B path /dev/sdd
        devid    4 size 20.00GiB used 0.00B path /dev/sde
Redistribute the data:
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 3 out of 3 chunks
[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.29MiB
        devid    1 size 20.00GiB used 1.03GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.03GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.00GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.00GiB path /dev/sde
Convert metadata to raid10 and redistribute:
[root@digoal ~]# btrfs balance start -mconvert=raid10 /data01
Done, had to relocate 2 out of 3 chunks
[root@digoal ~]# btrfs filesystem show /dev/sdb
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.54MiB
        devid    1 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.53GiB path /dev/sde
Check the usage of the three types after redistribution:
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID0: total=4.00GiB, used=1.25MiB
System, RAID10: total=64.00MiB, used=4.00KiB
Metadata, RAID10: total=1.00GiB, used=36.00KiB
GlobalReserve, single: total=4.00MiB, used=0.00B
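
When a script needs to confirm that a conversion took effect, the profile of each type can be pulled out of the btrfs filesystem df output. The sketch below is a small awk filter for the output format shown above; the function name profiles_from_df is hypothetical, and it reads the text from standard input so it can be tried on saved output.

```bash
#!/usr/bin/env bash
# profiles_from_df
# Reads `btrfs filesystem df` output on stdin and prints one
# "TYPE PROFILE" pair per line, e.g. "Data RAID0".
profiles_from_df() {
    awk -F'[,:]' 'NF >= 3 { gsub(/^ +/, "", $2); print $1, $2 }'
}

# Sample input copied from the transcript above.
profiles_from_df <<'EOF'
Data, RAID0: total=4.00GiB, used=1.25MiB
System, RAID10: total=64.00MiB, used=4.00KiB
Metadata, RAID10: total=1.00GiB, used=36.00KiB
GlobalReserve, single: total=4.00MiB, used=0.00B
EOF
```

On a live system the input would come from a pipe, for example btrfs filesystem df /data01 | profiles_from_df.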

10. Removing block devices (the filesystem must keep at least the minimum number of devices for its profile)
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID10: total=2.00GiB, used=1.00GiB
System, RAID10: total=64.00MiB, used=4.00KiB
Metadata, RAID10: total=1.00GiB, used=1.18MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    2 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.53GiB path /dev/sde
raid10 needs at least 4 block devices, so the removal fails:
[root@digoal ~]# btrfs device delete /dev/sdb /data01
ERROR: error removing device '/dev/sdb': unable to go below four devices on raid10

Convert to raid1 first, then try again:
[root@digoal ~]# btrfs balance start -mconvert=raid1 -dconvert=raid1 /data01
Done, had to relocate 3 out of 3 chunks
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID1: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=1.00GiB, used=1.11MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    1 size 20.00GiB used 1.03GiB path /dev/sdb
        devid    2 size 20.00GiB used 2.00GiB path /dev/sdc
        devid    3 size 20.00GiB used 2.00GiB path /dev/sdd
        devid    4 size 20.00GiB used 1.03GiB path /dev/sde
raid1 needs only 2 block devices, so two can be removed:
[root@digoal ~]# btrfs device delete /dev/sdb /data01
[root@digoal ~]# btrfs device delete /dev/sdc /data01
[root@digoal ~]# btrfs filesystem df /data01
Data, RAID1: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID1: total=256.00MiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 2 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 2.28GiB path /dev/sdd
        devid    4 size 20.00GiB used 2.28GiB path /dev/sde
Removing another one fails:
[root@digoal ~]# btrfs device delete /dev/sdd /data01
ERROR: error removing device '/dev/sdd': unable to go below two devices on raid1
Add them back:
[root@digoal ~]# btrfs device add /dev/sdb /data01
[root@digoal ~]# btrfs device add /dev/sdc /data01
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 4 out of 4 chunks
Convert to raid5:
[root@digoal ~]# btrfs balance start -mconvert=raid5 -dconvert=raid5 /data01
Done, had to relocate 4 out of 4 chunks
One device can be removed, which leaves three devices for raid5:
[root@digoal ~]# btrfs device delete /dev/sde /data01

[root@digoal ~]# btrfs filesystem df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID5: total=64.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B
[root@digoal ~]# btrfs filesystem show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    5 size 20.00GiB used 1.53GiB path /dev/sdb
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
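
The removals in this section succeed or fail depending on the minimum device count of the profile. The sketch below encodes the minimums from the PROFILES table and from the error messages in the transcripts, and checks whether one more device can be removed. Both function names are hypothetical, and the device count passed in is assumed to be the number of writable devices currently in the filesystem.

```bash
#!/usr/bin/env bash
# min_devices PROFILE: minimum device count for a profile (lower case).
min_devices() {
    case "$1" in
        single|dup)        echo 1 ;;
        raid0|raid1|raid5) echo 2 ;;
        raid6)             echo 3 ;;
        raid10)            echo 4 ;;
        *) return 1 ;;
    esac
}

# can_remove PROFILE CURRENT_DEVICES
# Succeeds if removing one device still leaves the profile's minimum.
can_remove() {
    local min
    min=$(min_devices "$1") || return 2
    (( $2 - 1 >= min ))
}

for spec in "raid10 4" "raid1 4" "raid1 2" "raid5 4"; do
    read -r profile count <<< "$spec"
    if can_remove "$profile" "$count"; then
        echo "$profile with $count devices: removal allowed"
    else
        echo "$profile with $count devices: removal refused"
    fi
done
```

If data and metadata use different profiles, check both and let the stricter one decide.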

11. Handling a failed block device
Assume btrfs currently manages 3 block devices, with data profile raid5, metadata profile raid5, and system profile raid1.
Set up that state:
[root@digoal ~]# btrfs balance start -sconvert=raid1 -f /data01
Done, had to relocate 1 out of 3 chunks

[root@digoal ~]# btrfs fi show
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    5 size 20.00GiB used 1.50GiB path /dev/sdb
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

Delete a block device node to simulate a failed device:
[root@digoal ~]# rm -f /dev/sdb

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

btrfs now reports that some devices are missing:
[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        *** Some devices missing
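
A monitoring script can detect this condition by comparing the "Total devices" count with the number of devid lines in the btrfs filesystem show output. The sketch below does that with awk on text read from standard input; the function name check_devices is hypothetical, and the parsing assumes the output format shown in this guide.

```bash
#!/usr/bin/env bash
# check_devices
# Reads `btrfs filesystem show` output on stdin, prints the expected and
# present device counts, and returns 1 if fewer devices are present.
check_devices() {
    awk '
        /Total devices/      { expected = $3 }
        /^[[:space:]]*devid/ { present++ }
        END {
            printf "expected=%d present=%d\n", expected, present
            exit (present < expected ? 1 : 0)
        }'
}

# Sample input copied from the transcript above.
if ! check_devices <<'EOF'
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        *** Some devices missing
EOF
then
    echo "some member devices are missing"
fi
```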

After unmounting, a normal mount no longer works; the filesystem must be mounted in degraded mode.
[root@digoal ~]# umount /data01

[root@digoal ~]# mount /dev/sdc /data01
mount: wrong fs type, bad option, bad superblock on /dev/sdc,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.

[root@digoal ~]# dmesg | tail -n 5
[ 1311.617838] BTRFS: open /dev/sdb failed
[ 1311.618763] BTRFS info (device sdc): disk space caching is enabled
[ 1311.618767] BTRFS: has skinny extents
[ 1311.623540] BTRFS: failed to read chunk tree on sdc
[ 1311.648198] BTRFS: open_ctree failed

You can see that the superblocks on sdc and sdd are intact:
[root@digoal ~]# btrfs rescue super-recover -v /dev/sdc
All Devices:
        Device: id = 3, name = /dev/sdd
        Device: id = 6, name = /dev/sdc

Before Recovering:
        [All good supers]:
                device name = /dev/sdd
                superblock bytenr = 65536

                device name = /dev/sdd
                superblock bytenr = 67108864

                device name = /dev/sdc
                superblock bytenr = 65536

                device name = /dev/sdc
                superblock bytenr = 67108864

        [All bad supers]:

All supers are valid, no need to recover
So the filesystem can be mounted in degraded mode:
[root@digoal ~]# mount -t btrfs -o degraded /dev/sdc /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        *** Some devices missing

[root@digoal ~]# btrfs fi df /data01
Data, RAID5: total=2.00GiB, used=1.00GiB
System, RAID1: total=32.00MiB, used=4.00KiB
Metadata, RAID5: total=1.00GiB, used=1.12MiB
GlobalReserve, single: total=4.00MiB, used=0.00B

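Removing the missing device is also subject to the minimum device count of the profile. With one device missing, only two devices remain, and the error below shows that btrfs will not go below two devices on raid5, so the removal fails.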
[root@digoal ~]# btrfs device delete missing /data01
ERROR: error removing device 'missing': unable to go below two devices on raid5

You can add a new block device first and then remove the missing one:
[root@digoal ~]# btrfs device add /dev/sde /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 4 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    7 size 20.00GiB used 0.00B path /dev/sde
        *** Some devices missing

[root@digoal ~]# btrfs device delete missing /data01

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.53GiB path /dev/sdc
        devid    7 size 20.00GiB used 1.50GiB path /dev/sde

Rebalance:
[root@digoal ~]# btrfs balance start /data01
Done, had to relocate 3 out of 3 chunks

[root@digoal ~]# btrfs fi show /data01
Label: none  uuid: 165f59f6-77b5-4421-b3d8-90884d3c0b40
        Total devices 3 FS bytes used 1.00GiB
        devid    3 size 20.00GiB used 1.53GiB path /dev/sdd
        devid    6 size 20.00GiB used 1.50GiB path /dev/sdc
        devid    7 size 20.00GiB used 1.53GiB path /dev/sde

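[Summary]
1. Recommended mkfs options
With multiple block devices:
-n 4096 -m raid10 -d raid10
or
-n 4096 -m raid10 -d raid5
...
With a single block device (non-SSD):
-n 4096 -m DUP -d single
With a single block device (SSD):
-n 4096 -m single -d single

2. Recommended mount options
discard,ssd,ssd_spread,noatime,nodiratime,space_cache,defaults,compress=no,recovery

3. Recommended I/O scheduler
deadline

4. btrfs architecture
Make sure these concepts are clear:
1. block group, chunk
2. profile
3. the three data types
4. block device

5. After adding a block device, remember to run a balance to redistribute the data.

6. Read man btrfs and the man pages of all its subcommands thoroughly.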

[References]
1. man mkfs.btrfs
2. man btrfs
3. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/ch-btrfs.html
4. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/index.html
5. https://www.suse.com/events/susecon/sessions/presentations/SUSECon-2012-TT1301.pdf
6. https://www.suse.com/documentation/
7. https://wiki.gentoo.org/wiki/Btrfs
8. https://wiki.gentoo.org/wiki/ZFS

Last updated: 2017-04-01 13:37:08
