dataset="$1" keyname="${HOSTNAME}_${dataset//\//_}.key" b=/root/.zfs/vault p=$b/$keyname mkdir -p $b dd if=/dev/urandom of=$p bs=32 count=1 zfs create \ -o encryption=on \ -o keylocation=file://$p \ -o keyformat=raw \ tank/${dataset}
/etc/systemd/system/zfs-load-key@.service
[Unit]
Description=Load ZFS keys
DefaultDependencies=no
Before=zfs-mount.service
After=zfs-import.target
Requires=zfs-import.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zfs load-key %I

[Install]
WantedBy=zfs-mount.service
You need to escape the dataset path for the unit instance name; %I in the unit file unescapes it back to the real path (with slashes).
# systemd-escape --path "dozer/nextcloud/data"
dozer-nextcloud-data
Note that the / gets converted to -.
# systemctl enable zfs-load-key@$(systemd-escape --path "dozer/nextcloud/data")
Created symlink /etc/systemd/system/zfs-mount.service.wants/zfs-load-key@dozer-nextcloud-data.service -> /etc/systemd/system/zfs-load-key@.service.
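To load the key right away instead of waiting for the next boot, the same escaped instance can also be started by hand:

# systemctl start zfs-load-key@$(systemd-escape --path "dozer/nextcloud/data")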
zfs list -r tank/vault -H -o name | while read -r line; do
    zfs set keylocation=file:///etc/zfs/vault/vault.key "$line"
    zfs load-key "$line"
done

or

zfs load-key -r tank/vault
root@luna:~# dd if=/dev/urandom of=/etc/zfs/vault/test.key bs=32 count=1
1+0 records in
1+0 records out
32 bytes copied, 0.000326268 s, 98.1 kB/s
root@luna:~# zfs create -o encryption=on -o keylocation=file:///test.key -o keyformat=raw tank/test
root@luna:~# zfs get encryption,keylocation,keyformat tank/test
NAME       PROPERTY     VALUE             SOURCE
tank/test  encryption   aes-256-gcm       -
tank/test  keylocation  file:///test.key  local
tank/test  keyformat    raw               -
root@luna:~# zfs change-key -l -o keyformat=passphrase -o keylocation=prompt tank/test
Enter new passphrase for 'tank/test':
Re-enter new passphrase for 'tank/test':
root@luna:~# zfs get encryption,keylocation,keyformat tank/test
NAME       PROPERTY     VALUE        SOURCE
tank/test  encryption   aes-256-gcm  -
tank/test  keylocation  prompt       local
tank/test  keyformat    passphrase   -
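Switching back from a passphrase to the raw key file should work the same way in reverse. An untested sketch, assuming the key file created above is still at /etc/zfs/vault/test.key:

# untested: point tank/test back at the raw key file and re-read the key from it
zfs change-key -o keyformat=raw -o keylocation=file:///etc/zfs/vault/test.key tank/test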
It's just two commands, and there are many ways to specify the drive. The general format is:

zpool <command> <pool name> <old drive> [<new drive>]
zpool offline tank ata-ST3300831A_5NF0552X
Now physically replace the drive.
zpool replace tank ata-ST3300831A_5NF0552X /dev/disk/by-id/<new drive>
Check the status of the zpool. You will see output that says the drive is being replaced.
zpool status
1: https://askubuntu.com/questions/305830/replacing-a-dead-disk-in-a-zpool
As an FYI, here is what to do about an error report like this one.
On 01/29/2018 01:12 PM, root wrote:
ZFS has detected an io error:
eid: 295
class: io
host: backup1
time: 2018-01-29 13:12:11-0600
vtype: disk
vpath: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NV9YL-part1
vguid: 0xBC64F09126CA52D8
cksum: 0
read: 0
write: 0
pool: tank
root@backup1:~# zpool status
  pool: tank
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://zfsonlinux.org/msg/ZFS-8000-9P
  scan: scrub repaired 36K in 82h10m with 0 errors on Wed Jan 17 10:34:24 2018
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX11DA40HCD2  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D1404379  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NV9YL  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAAC  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAE9  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A28YZ  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2AZK  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2H5K  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2KTJ  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2NU9  ONLINE       0     0    29

errors: No known data errors
1. The link (http://zfsonlinux.org/msg/ZFS-8000-9P/) it provides has useful info. You should read it.
2. root@backup1:~# smartctl -a /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2NU9 | less
   SMART values 5, 197, and 198 (187 and 188 are unavailable) are all at 0, so we shall proceed under the assumption that the disk is fine.
3. ZFS did recover, so there doesn't seem to be anything to do. Probably it was a bit flip, so we can safely clear the error.
root@backup1:~# zpool clear tank ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2NU9
root@backup1:~# zpool status
  pool: tank
 state: ONLINE
  scan: scrub repaired 36K in 82h10m with 0 errors on Wed Jan 17 10:34:24 2018
config:

        NAME                                          STATE     READ WRITE CKSUM
        tank                                          ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX11DA40HCD2  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D1404379  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NV9YL  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAAC  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAE9  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A28YZ  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2AZK  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2H5K  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2KTJ  ONLINE       0     0     0
            ata-WDC_WD60PURX-64LZMY0_WD-WX31D65A2NU9  ONLINE       0     0     0

errors: No known data errors
4. If errors keep happening to this particular drive, ZFS will mark it as faulted, and when/if that happens we just replace the disk.
More info:
Looking at the previous emails, we have the following IO errors reported by ZED on backup1:
2018-01-29: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NV9YL-part1
2018-03-14: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAAC-part1
2018-04-10: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NV9YL-part1
2018-05-10: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAAC-part1
2018-05-17: /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAE9-part1
Taking this last IO error as an example:
root@backup1:~# smartctl -A /dev/disk/by-id/ata-WDC_WD60PURX-64LZMY0_WD-WX21D65NVAE9 | grep -E '^ 5|^197|^198|^187|^188'
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
Because SMART values differ from drive to drive and manufacturer to manufacturer, interpreting them is hard. Based on the Backblaze data releases, we can use SMART attributes 5, 187, 188, 197, and 198 as a baseline: if any of these RAW values is above 0, the likelihood that the drive will fail increases (non-linearly). Because the RAW values on this particular drive are all 0, this IO error is probably nothing to worry about (ZFS did recover).
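A rough sketch for spot-checking those baseline attributes on every drive in the pool at once (the by-id glob is an assumption based on the device names above):

# sketch: print SMART attributes 5, 187, 188, 197, 198 for each pool member
for dev in /dev/disk/by-id/ata-WDC_WD60PURX-*; do
    case "$dev" in *-part*) continue ;; esac    # skip partition symlinks
    echo "== $dev =="
    smartctl -A "$dev" | grep -E '^ *(5|187|188|197|198) '
done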
We are probably dealing with firmware that isn't optimized for how we are using these drives. They are also cheaper drives, so I would expect these IO errors to be normal for these WD Purples.
There is a scrub still running so we should find out more when it's done.
root@backup1:~# zpool status | head -n5
  pool: tank
 state: ONLINE
  scan: scrub in progress since Sun May 13 00:24:01 2018
        12.9T scanned out of 19.0T at 35.7M/s, 49h46m to go
        0 repaired, 67.97% done
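For reference, a scrub can also be started or cancelled by hand:

zpool scrub tank      # start a scrub of the pool
zpool scrub -s tank   # stop a scrub that is in progress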
root@sendbox:~# zfs allow senduser send,hold pool/ds
root@recvbox:~# zfs allow recvuser receive,create,mount pool
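With those delegations in place, a non-root replication run could look roughly like this (hostnames, user names, snapshot and dataset names are all illustrative assumptions):

# as senduser on sendbox: hold the snapshot being sent, then stream it to recvbox;
# -u on the receiving side skips mounting, since non-root mounts usually fail on Linux
zfs hold migration pool/ds@snap1
zfs send pool/ds@snap1 | ssh recvuser@recvbox zfs receive -u pool/backup-ds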
Holds prevent snapshots from being destroyed. See https://docs.oracle.com/cd/E19253-01/819-5461/gjdfk/index.html
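For example (snapshot name and hold tag are illustrative):

zfs hold keep tank/vault@2018-05-17      # place a hold tagged "keep" on the snapshot
zfs destroy tank/vault@2018-05-17        # fails while the hold is in place
zfs release keep tank/vault@2018-05-17   # release the hold; the snapshot can be destroyed again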
List all holds in all pools.
zfs get -Ht snapshot userrefs | grep -v $'\t'0 | cut -d $'\t' -f 1 | tr '\n' '\0' | xargs -0 zfs holds
List holds in a specific dataset.
zfs list -H -r -d 1 -t snapshot -o name nameoffilesystem | xargs zfs holds
#!/bin/bash
# For each snapshot of the given dataset (oldest first), print how much space a
# dry-run destroy of everything up to and including that snapshot would reclaim.
{
    echo -e "NAME\tUSED"
    for snapshot in $(zfs list -Hpr "$1" -t snapshot -o name -s creation -d 1); do
        echo -ne "${snapshot/@/@%}\t"
        # "@%" turns "pool/ds@snap" into the range "pool/ds@%snap" (oldest through snap);
        # -n makes the destroy a dry run, -v reports what would be reclaimed
        zfs destroy -nv "${snapshot/@/@%}" | sed -nre "s/would reclaim (.+)/\1/p"
    done
} | column -t
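Hypothetical invocation, assuming the script above is saved as snapshot-space.sh:

./snapshot-space.sh tank/vault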