ZFS ‘Failed to start Mark current ZSYS boot as successful’ fix
On Ubuntu 20.04, after installing the NVIDIA driver 510 metapackage, the system stopped booting.
It would either hang with a black screen and a blinking cursor in the top left, or show the following error message:
[FAILED] Failed to start Mark current ZSYS boot as successful.
See 'systemctl status zsys-commit.service' for details.
[ OK ] Stopped User Manager for UID 1000.
Attempting to revert from a snapshot ended with the same error message. This wasn’t the case on a separate system that had received the same upgrade.
The “20.04 zsys-commit.service fails” message is quite telling, and the overall cause seems to be a mismatch between the ZFS user-space and kernel components.
These are the steps I followed to fix it. Many thanks to Lockszmith for his research in identifying the issue and finding a fix. He created two posts raising it, links provided here.
[In GRUB]
*Advanced options for Ubuntu 20.04.3 LTS
[Select the first recovery option in the menu]
*Ubuntu 20.04.3 LTS, with Linux 5.xx.x-xx-generic (recovery mode)
[Wait for the system to load the menu and select:]
root
[Press Enter for Maintenance to get the CLI]
Check the reason for the error.
# systemctl status zsys-commit.service
[...]
Feb 17 11:11:24 ab350 systemd[1] zsysctl[4068]: level=error msg="couldn't commit: couldn't promote dataset "rpool/ROOT/ubuntu_733qyk": couldn't promote "rpool/ROOT/ubuntu_733qyk": not a cloned filesystem"
[...]
Attempting to promote it manually fails:
# zfs promote rpool/ROOT/ubuntu_733qyk
cannot promote 'rpool/ROOT/ubuntu_733qyk': not a cloned filesystem
[boot in recovery mode]
# apt reinstall zfs-initramfs zfs-zed zfsutils-linux
# zfs promote rpool/ROOT/ubuntu_733qyk
[reboot in normal mode]
[Configure the 470 drivers]
Reverting to previous ZFS version
The system should now be back to normal, but you might want to revert to the mainline ZFS version despite its bug. After all, this was a hack to promote the filesystem and get the system working again.
# add-apt-repository --remove ppa:jonathonf/zfs
[Check that it has been removed]
$ apt policy
# apt update
[Pray]
# apt remove zfs-initramfs zfs-zed zfsutils-linux
# apt install zfs-initramfs zfs-zed zfsutils-linux
[Check the right version is installed]
# apt list --installed | grep zfs
# apt autoremove
[Pray harder]
# reboot
With that I managed to bring my system back to a working condition, but updating the drivers a second time made it fail again and I couldn’t fix it. A clean install of 20.04.3 doesn’t seem to exhibit this problem. I’m not sure what the reason behind it is, but there are a few open bugs with Canonical regarding this.
I hope that 22.04 will bring a better ZFS version.
Raspberry Pi : Configuring a Time Capsule/Backintime server
You will need to create a user for each of the services/users that will connect to the server. Keep files and access as isolated as possible: a given user shouldn’t have any visibility, or even notion, of other users’ backups. For Time Machine we are also creating accounts that can’t log into the system, only authenticate.
If required, the default shell can be changed with:
# usermod -s /usr/sbin/nologin timemachine_john
Setting up backup user groups
If more than one system is going to be backed up it is advisable to use different accounts for each.
It is possible to isolate users by assigning them individual datasets, but that might create storage silos.
An alternative is to create individual users that belong to the same backup group. The backup group can access the backintime dataset, but not each other’s data.
Create the group.
# addgroup backupusers
Assign main group and secondary group (the secondary group would be the shared one).
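As a sketch (the user name here is hypothetical), the assignment could look like:

```shell
# Hypothetical names: each user keeps an individual primary group,
# and backupusers is appended as the shared secondary group.
usermod -g timemachine_john -aG backupusers timemachine_john

# Verify the memberships:
id timemachine_john
```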
Edit the settings of the netatalk service so that the share is advertised with the name of your choice and works as a Time Capsule server.
# vim /etc/netatalk/AppleVolumes.default
Enter the following:
/backups/timecapsule "pi-capsule" options:tm
Note that you can give the capsule a name with spaces above.
Restart the service:
# systemctl restart netatalk
Check that netatalk has been installed correctly:
# afpd -V
afpd 3.1.12 - Apple Filing Protocol (AFP) daemon of Netatalk
[...]
afpd has been compiled with support for these features:
AFP versions: 2.2 3.0 3.1 3.2 3.3 3.4
CNID backends: dbd last tdb
Zeroconf support: Avahi
TCP wrappers support: Yes
Quota support: Yes
Admin group support: Yes
Valid shell checks: Yes
cracklib support: No
EA support: ad | sys
ACL support: Yes
LDAP support: Yes
D-Bus support: Yes
Spotlight support: Yes
DTrace probes: Yes
afp.conf: /etc/netatalk/afp.conf
extmap.conf: /etc/netatalk/extmap.conf
state directory: /var/lib/netatalk/
afp_signature.conf: /var/lib/netatalk/afp_signature.conf
afp_voluuid.conf: /var/lib/netatalk/afp_voluuid.conf
UAM search path: /usr/lib/netatalk//
Server messages path: /var/lib/netatalk/msg/
Configure netatalk
# vim /etc/nsswitch.conf
Change this line:
hosts: files mdns4_minimal [NOTFOUND=return] dns
to this:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 mdns
Note that if you are running Netatalk 3.1.11 or later, it is no longer necessary to create the /etc/avahi/services/afpd.service file; using it will actually cause an error. If you are running an older version go ahead and create it, otherwise skip it.
Next, edit the main netatalk configuration file (/etc/netatalk/afp.conf, as reported by afpd -V) and enter:
[Global]
; Global server settings
mimic model = TimeCapsule6,106
[pi-capsule]
path = /backups/timecapsule
time machine = yes
Check configuration and reload if needed:
# systemctl status avahi-daemon
[restart if necessary]
# systemctl restart netatalk
[Make the service automatically start]
# systemctl enable netatalk.service
If you go to your Mac’s Time Machine preferences the new volume will be available and you can start using it.
netatalk troubleshooting
Some notes of things to check from the server side (Time Capsule server):
If you have disabled passwords and are only using keys, you will need to temporarily change the security settings to allow Backintime to exchange keys.
On the remote system/Pi/server:
# vim /etc/ssh/sshd_config
PasswordAuthentication yes
# systemctl restart ssh
Backintime uses SSH, so the user accounts need to be allowed to log in; the default login shell needs to reflect this.
If not created already, assign the user a home directory. Finally, allow the user to read and write the folder containing the backups.
[Local system]
The client machine that is running Backintime and that you want to backup your data from.
[Remote system]
The SSH server that has the storage where your backup is going to be stored.
From the local system, using the account you will run Backintime as (either your user or root, depending on how you run Backintime), SSH into the remote system. In my case, a Raspberry Pi.
Auto-remove
Older than 10 years
If free space is less than 50GiB
If free inodes is less than 2%
Smart remove
Run in background on remote Host
Keep last
14 days (7 days)
21 days (14 days)
8 weeks (6 weeks)
36 months (14 months)
Don't remove named snapshots
Options
Enable notifications
Backup replaced files on restore
Continue on errors (keep incomplete snapshots)
Log level: Changes & Errors
After the first run has completed you can check which is the best performing cipher from the CLI.
# backintime benchmark-cipher --profile-id 2
After a few rounds, aes192-ctr came out as the best performing cipher for me.
Secure SSH
If you changed the SSH configuration at the beginning, after setting everything up, remember to secure SSH again on the server/remote system.
# vim /etc/ssh/sshd_config
PasswordAuthentication no
# systemctl restart ssh
Restoring restrictions to backup users
The login account is required for Backintime to be able to run rsync. It is worth doing a bit more research on how to harden/limit these accounts.
Troubleshooting
Some examples of issues and the troubleshooting steps you can apply.
Time Capsule can’t be reached / firewall settings
Make sure the server is allowing AFP connections from the Mac client.
# ufw allow proto tcp from CLIENT_IP to PI_CAPSULE_IP port 548
Time Capsule – Configuring Time Machine backups via the network on a macOS VM
The destination needs to be configured manually.
Mount the AFP/Time Capsule mount via the Finder.
In the CLI configure the destination:
# tmutil setdestination -a /Volumes/pi-capsule
The backups can then be started from the GUI.
You can get information about the current configured destinations via the CLI.
# tmutil destinationinfo
====================================
Name : pi-capsule
Kind : Network
Mount Point : /Volumes/pi-capsule
ID : 7B648734-9BFC-417F-B5A1-F31B8AD52F4B
Time Capsule – Checking backup status
# tmutil currentphase
# tmutil status
ZFS stalling on a Raspberry Pi
Check the recordsize property. Reduce it to the default 128 KiB.
Reduce ARC size to reduce the amount of memory consumed/reserved for ZFS.
Understanding rsync logs
The logs indicate the type of change rsync is seeing. A reference is available here:
XYcstpoguax path/to/file
|||||||||||
||||||||||╰- x: The extended attribute information changed
|||||||||╰-- a: The ACL information changed
||||||||╰--- u: The u slot is reserved for future use
|||||||╰---- g: Group is different
||||||╰----- o: Owner is different
|||||╰------ p: Permissions are different
||||╰------- t: Modification time is different
|||╰-------- s: Size is different
||╰--------- c: Different checksum (for regular files), or
|| changed value (for symlinks, devices, and special files)
|╰---------- the file type:
| f: for a file,
| d: for a directory,
| L: for a symlink,
| D: for a device,
| S: for a special file (e.g. named sockets and fifos)
╰----------- the type of update being done:
<: file is being transferred to the remote host (sent)
>: file is being transferred to the local host (received)
c: local change/creation for the item, such as:
- the creation of a directory
- the changing of a symlink,
- etc.
h: the item is a hard link to another item (requires
--hard-links).
.: the item is not being updated (though it might have
attributes that are being modified)
*: means that the rest of the itemized-output area contains
a message (e.g. "deleting")
If you are new to ZFS, I would advise doing a little bit of research first to understand the fundamentals. Jim Salter’s articles on storage and ZFS come highly recommended.
The examples below are to create a pool from a single disk, with separate datasets used for network backups.
In some examples, I might use device names for simplicity, but you are advised to use disks IDs or serials.
Installing ZFS
Ubuntu makes it very easy.
# apt install zfsutils-linux
ZFS Cockpit module
If Cockpit is installed, it is possible to install a module for ZFS. This module is sadly no longer in development. If you know of alternatives, please share!
By default, the configuration runs the following snapshots and retention policies:
Period    Retention
Hourly    24 hours
Daily     31 days
Weekly    Eight weeks
Monthly   12 months
I configured the following snapshot retention policy:
Period    Retention
Hourly    48 hours
Daily     14 days
Weekly    Four weeks
Monthly   Three months
Hourly
# vim /etc/cron.hourly/zfs-auto-snapshot
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
exec zfs-auto-snapshot --quiet --syslog --label=hourly --keep=48 //
Daily
# vim /etc/cron.daily/zfs-auto-snapshot
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
exec zfs-auto-snapshot --quiet --syslog --label=daily --keep=14 //
Weekly
# vim /etc/cron.weekly/zfs-auto-snapshot
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
exec zfs-auto-snapshot --quiet --syslog --label=weekly --keep=4 //
Monthly
# vim /etc/cron.monthly/zfs-auto-snapshot
#!/bin/sh
# Only call zfs-auto-snapshot if it's available
which zfs-auto-snapshot > /dev/null || exit 0
exec zfs-auto-snapshot --quiet --syslog --label=monthly --keep=3 //
Setting up the ZFS pool
This post has several use cases and examples, and I recommend it highly if you want further details on different commands and ways to configure your pools.
In my example there is no resilience, as there is only one attached disk. For me, this is acceptable because I have an additional local backup besides this filesystem.
Having a second backup (ideally off-site) is preferable to a single one, regardless of any added resilience you might configure.
I create a single pool with an external drive. Read below for an explanation of the different command flags.
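A sketch of what that command could look like (the pool name is illustrative; adjust the device ID and the values to your own drive):

```shell
# Illustrative: single-disk pool, no redundancy.
# ashift=12 assumes 4 KiB physical sectors and cannot be changed later.
# (relatime=on is a softer alternative to atime=off, if access times are needed.)
zpool create -o ashift=12 \
  -O compression=lz4 \
  -O acltype=posixacl -O xattr=sa -O dnodesize=auto \
  -O atime=off \
  -O normalization=formD \
  -O canmount=off \
  backup_pool /dev/disk/by-id/usb-TOSHIBA_External_USB_3.0_20150612015531-0:0
```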
Of the above values, the most important one by far is ashift.
The ashift property sets the block size of the vdev. It can’t be changed once set, and if it isn’t correct, it will cause massive performance issues with the filesystem.
recordsize is another performance-impacting property, especially on the Raspberry Pi. Smaller sizes can improve performance for random access, while higher values give better throughput and compression when reading sequential data. The problem on the Raspberry Pi was that with a value of 1M the system load increased until filesystem activity eventually stalled and the system had to be restarted.
The default value (128k) has performed without any noticeable issue.
Compression
lz4 compression yields an optimal performance/compression ratio. It will usually make the storage perform faster than with no compression at all.
ZFS 0.8 doesn’t give many choices regarding compression but bear in mind that you can change the algorithm on a live system.
gzip will impact performance but yields a higher compression rate. It might be worth checking the performance with different compression formats on the Pi 4. With older Raspberry Pi models, the limitation will be the USB / network in most cases.
For reference, on the same amount of data these were the compression ratios I obtained:
All in all, the performance impact and memory consumption didn’t make switching from lz4 worthwhile.
Permissions
acltype=posixacl xattr=sa
This enables POSIX ACLs and stores the Linux extended attributes in the inodes rather than in separate hidden files.
Access times
atime is recommended to be disabled (off) to reduce the number of IOPS.
relatime offers a good compromise between the atime and noatime behaviours.
Normalisation
The normalization property indicates whether a file system should perform a Unicode normalisation of file names whenever two file names are compared and which normalisation algorithm should be used.
formD is the default set by Canonical when setting up a pool. It seems to be a good choice if sharing the volume via NFS with macOS systems and avoiding files not being displayed due to names using non-ASCII characters.
Additional properties
The pool is configured with the canmount property off so that it can’t be mounted.
This is because I will be creating separate datasets, one for Time Capsule backups, and another two for Backintime, and I don’t want them to mix.
All datasets will share the same pool, but I don’t want the pool root to be mounted. Only datasets will mount.
dnodesize is set to auto, as per several recommendations when datasets are using the xattr=sa property.
sync is set as standard. There is a performance hit for writes, but disabling it comes at the expense of data consistency if there is a power cut or similar.
A brief test showed a lower system load when sync=standard than with sync=disabled. Also, with standard there were fewer spikes. It is likely that the performance is lower, but it certainly causes the system to suffer less.
Encryption
I am not too keen on encrypting physically secure volumes, because during data recovery the extra layer can hamper and slow things down.
For reference, I am writing down an example of encryption options using an external key for a volume. This might not be appropriate for your particular scenario. Research alternatives if needed.
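A sketch of such options, assuming a hypothetical key file kept on an external USB stick (OpenZFS 0.8+ native encryption):

```shell
# Generate a 32-byte raw key on an external device (hypothetical path):
dd if=/dev/urandom of=/media/usbkey/backup.key bs=32 count=1

# Create an encrypted dataset that reads its key from that file:
zfs create \
  -o encryption=aes-256-gcm \
  -o keyformat=raw \
  -o keylocation=file:///media/usbkey/backup.key \
  backup_pool/secure

# After a reboot, load the key before mounting:
zfs load-key backup_pool/secure
zfs mount backup_pool/secure
```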
Automatic trimming of the pool is essential for SSDs:
# zpool set autotrim=on backup_pool
Disabling automatic mount for the pool. (This applies only to the root of the pool, the datasets can still be set to be mountable regardless of this setting.)
# zfs set canmount=off backup_pool
Setting up the ZFS datasets
I will create three separate datasets with assigned quotas for each.
[Create datasets]
# zfs create backup_pool/backintime_tuxedo
# zfs create backup_pool/backintime_ab350
# zfs create backup_pool/timecapsule
[Set mountpoints]
# zfs set mountpoint=/backups/backintime_tuxedo backup_pool/backintime_tuxedo
# zfs set mountpoint=/backups/backintime_ab350 backup_pool/backintime_ab350
# zfs set mountpoint=/backups/timecapsule backup_pool/timecapsule
[Set quotas]
# zfs set quota=2T backup_pool/backintime_tuxedo
# zfs set quota=2T backup_pool/backintime_ab350
# zfs set quota=2T backup_pool/timecapsule
Changing compression on a dataset
The default lz4 compression is recommended. gzip consumes a lot of CPU and makes data transfers slower, impacting backups restoration.
If you still want to change the compression for a given dataset:
# zfs set compression=gzip-7 backup_pool/timecapsule
A comparison of compression and decompression using different algorithms with OpenZFS:
You can add an additional disk/partition and make the pool redundant by attaching it as a mirror. Unfortunately, an existing vdev can’t be converted to RAID-Z, RAID-Z2 or RAID-Z3.
# zpool attach backup_pool /dev/sda7 /dev/sdb7
Renaming disks in pools
By default, Ubuntu uses device identifiers for the disks. This should not be an issue, but in some cases, adding or connecting drives might change the device name order and degrade one or more pools.
This is why creating a pool with disk IDs or serials is recommended. You can still fix this if you created your pool using device names.
With the pool unmounted, export it, and reimport pointing to the right path:
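Sketching it with the pool name from the earlier examples:

```shell
zpool export backup_pool
# Re-import, scanning the by-id paths so the vdevs are recorded by disk ID:
zpool import -d /dev/disk/by-id backup_pool
# Confirm the devices now show their IDs:
zpool status backup_pool
```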
ZFS should be running on a system with at least 4GiB of RAM. If you plan to use it on a Raspberry Pi (or any other system with limited resources), reduce the ARC size.
In this case, I am limiting it to 3GiB. It is a change that can be done live:
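A sketch of both the live change and making it persist across reboots (the module parameter path is the standard one for ZFS on Linux; the value is 3 GiB expressed in bytes):

```shell
# 3 GiB expressed in bytes:
echo $((3 * 1024 * 1024 * 1024))
# prints 3221225472

# Live change (as root, with the zfs module loaded):
# echo 3221225472 > /sys/module/zfs/parameters/zfs_arc_max

# Persist across reboots:
# echo "options zfs zfs_arc_max=3221225472" >> /etc/modprobe.d/zfs.conf
```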
Linux / Ubuntu / hdparm: Identifying drive features and setting sleep patterns
Preparing the storage
Install hdparm and smartmontools
Install hdparm and the SMART monitoring tools.
# apt install hdparm smartmontools
Identify the right hard drive
Make sure you identify the correct drive, as some of the commands will destroy data. If you don’t understand the commands, then check them first. You have been warned.
Identify the block size
Knowing the block size of the device is important. It helps optimise writes and, in the case of SSDs or flash drives, avoids write amplification and unnecessary wear.
Pay attention to the physical/optimal size. This is the one that matters.
SSDs will hide the true size of the pages and blocks. Even the same drive models might be built with different components, so getting it right is tricky.
Use the drive’s sector physical size to match the ZFS ashift (block size).
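lsblk reports both sector sizes, and the matching ashift is simply the power of two that equals the physical sector size. A sketch (device name illustrative):

```shell
# Report logical and physical sector sizes (needs the drive attached):
# lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sda

# ashift is log2 of the physical sector size: 4096-byte sectors -> ashift=12.
size=4096
ashift=0
while [ $((1 << ashift)) -lt "$size" ]; do ashift=$((ashift + 1)); done
echo "ashift=$ashift"
# prints ashift=12
```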
Retrieve drive IDs
When setting up ZFS pools or using disk tools, it is best to avoid device names, as their order can easily change. Using the drive ID or serial ensures that the correct drive is selected no matter which port it is plugged into or in which order the drives come up.
This matters with any disk accessing utility if you have several drives, or will be inserting external drives regularly.
$ ls -l /dev/disk/by-id/
[...]
lrwxrwxrwx 1 root root 9 Mar 9 13:16 usb-TOSHIBA_External_USB_3.0_20150612015531-0:0 -> ../../sda
[...]
You can also extract model and serial numbers with hdparm.
Even better, depending on the drive’s use and whether you plan to add mirror drives later, is to partition the drive so that there is enough space should a slightly different drive model be added. (Although I believe ZFS already does this, rounding partitions down to whole mebibytes.)
Test for damaged sectors
An additional and optional step is to test the hard drive for damaged sectors. This kind of test tends to be destructive so it is best if it is done before configuring the pools.
badblocks is a useful tool to achieve this.
It should be installed by default, but if not you can install it manually.
# apt install e2fsprogs
A destructive test can be done with:
# badblocks -wsv -b 4096 /dev/sda
If you want to run the test while preserving the disk data you can run it in a non-destructive way. This will take longer.
# badblocks -nsv -b 4096 /dev/sda
ZFS has built-in checks and protection so in most cases you can skip this step.
Setting hard drive sleep patterns
Above I explained that using disk IDs is always a better idea. For simplicity, I will be using device names in several examples below, but I still advise using IDs or serials.
Check if the disk supports sleep
Check if the drive supports standby.
# hdparm -y /dev/sda
If supported the output will be:
/dev/sda:
issuing standby command
Any other output might indicate that the drive doesn’t support sleep, or that a different tool/setting might be required.
Next, check if the drive supports write cache:
# hdparm -I /dev/sda | grep -i 'Write cache'
The expected output is:
* Write cache
The * indicates that the feature is enabled; features listed without it are supported but not currently enabled.
An example of a complete hdparm output from a drive is shown below for reference. Different drives, with different features, will show different output, or even none at all.
# hdparm -I /dev/sda
/dev/sda:
ATA device, with non-removable media
Model Number: TOSHIBA MD04ACA500
Serial Number: 55OBK0SPFPHC
Firmware Revision: FP2A
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Supported: 8 7 6 5
Likely used: 8
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 9767541168
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 4769307 MBytes
device size with M = 1000*1000: 5000981 MBytes (5000 GB)
cache/buffer size = unknown
Form Factor: 3.5 inch
Nominal Media Rotation Rate: 7200
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, no device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 128
DMA: sdma0 sdma1 sdma2 mdma0 mdma1 *mdma2 udma0 udma1 udma2 udma3 udma4 udma5
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
Enabled Supported:
* SMART feature set
Security Mode feature set
* Power Management feature set
* Write cache
* Look-ahead
* Host Protected Area feature set
* WRITE_BUFFER command
* READ_BUFFER command
* NOP cmd
* DOWNLOAD_MICROCODE
* Advanced Power Management feature set
SET_MAX security extension
* 48-bit Address feature set
* Device Configuration Overlay feature set
* Mandatory FLUSH_CACHE
* FLUSH_CACHE_EXT
* SMART error logging
* SMART self-test
* General Purpose Logging feature set
* WRITE_{DMA|MULTIPLE}_FUA_EXT
* 64-bit World wide name
* WRITE_UNCORRECTABLE_EXT command
* {READ,WRITE}_DMA_EXT_GPL commands
* Segmented DOWNLOAD_MICROCODE
unknown 119[7]
* Gen1 signaling speed (1.5Gb/s)
* Gen2 signaling speed (3.0Gb/s)
* Gen3 signaling speed (6.0Gb/s)
* Native Command Queueing (NCQ)
* Host-initiated interface power management
* Phy event counters
* Host automatic Partial to Slumber transitions
* Device automatic Partial to Slumber transitions
* READ_LOG_DMA_EXT equivalent to READ_LOG_EXT
DMA Setup Auto-Activate optimization
Device-initiated interface power management
* Software settings preservation
* SMART Command Transport (SCT) feature set
* SCT Write Same (AC2)
* SCT Error Recovery Control (AC3)
* SCT Features Control (AC4)
* SCT Data Tables (AC5)
* reserved 69[3]
Security:
Master password revision code = 65534
supported
not enabled
not locked
not frozen
not expired: security count
supported: enhanced erase
more than 508min for SECURITY ERASE UNIT. more than 508min for ENHANCED SECURITY ERASE UNIT.
Logical Unit WWN Device Identifier: 500003964bc01970
NAA : 5
IEEE OUI : 000039
Unique ID : 64bc01970
Checksum: correct
An example of a complete smartctl output from a drive is shown below also for reference. As mentioned earlier, different systems will generate different outputs.
# smartctl --all /dev/sda
smartctl 7.1 2019-12-30 r5022 [aarch64-linux-5.4.0-1029-raspi] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Toshiba 3.5" MD04ACA... Enterprise HDD
Device Model: TOSHIBA MD04ACA500
Serial Number: 55OBK0SPFPHC
LU WWN Device Id: 5 000039 64bc01970
Firmware Version: FP2A
User Capacity: 5,000,981,078,016 bytes [5.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Mon Mar 8 15:02:10 2021 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART Status not supported: Incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
was never started.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 120) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
No Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 533) minutes.
SCT capabilities: (0x003d) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 9003
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 9222
5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 6418
10 Spin_Retry_Count 0x0033 253 100 030 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 9212
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 482
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 104
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 9225
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 37 (Min/Max 15/72)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0
220 Disk_Shift 0x0002 100 100 000 Old_age Always - 0
222 Loaded_Hours 0x0032 085 085 000 Old_age Always - 6393
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0
224 Load_Friction 0x0022 100 100 000 Old_age Always - 0
226 Load-in_Time 0x0026 100 100 000 Old_age Always - 214
240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 5617 -
# 2 Short offline Completed without error 00% 4702 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
More information about hdparm and smartctl is available on the following sites.
Depending on the drive manufacturer and model you might need to query the settings with different flags. Check the man page.
[Get/set the Western Digital Green Drive's "idle3" timeout value.]
# hdparm -J /dev/sd[a-e]
/dev/sda:
wdidle3 = 300 secs (or 13.8 secs for older drives)
/dev/sdb:
wdidle3 = 8.0 secs
/dev/sdc:
wdidle3 = 300 secs (or 13.8 secs for older drives)
/dev/sdd:
wdidle3 = 300 secs (or 13.8 secs for older drives)
/dev/sde:
wdidle3 = 300 secs (or 13.8 secs for older drives)
From the man page:
A setting of 30 seconds is recommended for Linux use. Permitted values are from 8 to 12 seconds, and from 30 to 300 seconds in 30-second increments. Specify a value of zero (0) to disable the WD idle3 timer completely (NOT RECOMMENDED!).
There are flags for temperature (-H for Hitachi drives), acoustic management (-M), measuring cache performance (-T), and others. Go on, read that man page. 🙂
The -S flag sets the standby/spindown timeout for the drive. Basically, how long the drive will wait with no disk activity before turning off the motor.
Value        Description
0            Disable the feature.
1 to 240     Multiples of five seconds (a value of 120 means 10 minutes).
241 to 251   Thirty-minute intervals (a value of 242 means 1 hour).
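The encoding can be expressed as a small helper for double-checking a value before applying it (the function is mine, not part of hdparm):

```shell
# Convert an hdparm -S value into a timeout in seconds.
spindown_seconds() {
  v=$1
  if [ "$v" -eq 0 ]; then
    echo 0                          # feature disabled
  elif [ "$v" -le 240 ]; then
    echo $((v * 5))                 # multiples of five seconds
  elif [ "$v" -le 251 ]; then
    echo $(((v - 240) * 30 * 60))   # thirty-minute intervals
  fi
}

spindown_seconds 120   # prints 600 (10 minutes)
spindown_seconds 242   # prints 3600 (1 hour)
```

The value is then applied with, for example, hdparm -S 120 /dev/sda.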
Note that hdparm might wake the drive up when it is queried. smartctl can query the drive without waking it.
# APM setting (-B)
apm = 127
# APM setting while on battery (-B)
apm_battery = 127
# on/off drive's write caching feature (-W)
write_cache = on
# Standby (spindown) timeout for drive (-S)
spindown_time = 120
# Western Digital (WD) Green Drive's "idle3" timeout value. (-J)
wdidle3 = 300
hdparm.conf method
Edit the configuration file:
# vim /etc/hdparm.conf
And insert an entry for each drive. Select only settings/features/values that are supported by that drive, otherwise the rest of the options won’t be applied. Test, test, test!
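A sketch of an entry, reusing the Toshiba drive ID from the earlier examples with illustrative values; keep only the options your drive actually supports:

```
# /etc/hdparm.conf fragment (illustrative values)
/dev/disk/by-id/usb-TOSHIBA_External_USB_3.0_20150612015531-0:0 {
    write_cache = on
    spindown_time = 120
    apm = 127
}
```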
The hdparm.conf method above works in my case. There is also a udev rules method; I couldn’t get it to work on my system (possibly because of the OS), but I am leaving it for reference in case it helps.
# vim /etc/udev/rules.d/69-disk.rules
Create an entry for each drive editing the serial number and hdparm parameters. Make sure that only supported flags are added or it will fail.
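For reference, a rule could look like this, matching on the drive serial from the smartctl output above (flags illustrative):

```
# /etc/udev/rules.d/69-disk.rules (illustrative)
ACTION=="add", SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="55OBK0SPFPHC", \
  RUN+="/usr/sbin/hdparm -S 120 -B 127 /dev/%k"
```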
SNFS/Xsan: Quantum SNFS metadata controller and Xsan client compatibility chart
In a previous life, I designed and built many SANs based on Xsan (I believe I started with Xsan 1.3). I then migrated to looking after SANs based on SNFS, either from 3rd party vendors, or Quantum.
I believe that the age of Fibre Channel is long over (although SNFS also works over InfiniBand, if I recall correctly). The advantages of block-level access have been eclipsed by the much higher bandwidth of Ethernet, at a fraction of the cost.
The information has been collected from Apple support articles (current and obsolete ones), ADIC’s and Quantum’s StorNext documentation, and personal experience.
Every Xsan 2.0 and above client has been included. Maybe one day I will add Xsan 1.x releases for historical purposes.
| SNFS version | Xsan 20.0 (macOS 11.0.1) | Xsan 5.0.1 (10.13 to 10.15) | Xsan 5 (10.12) | Xsan 4.1 (10.11) | Xsan 4 (10.10) | Xsan 3.1 (10.9) | Xsan 3 (10.8) | Xsan 2.3 (10.7) | Xsan 2.2 to 2.2.2 (10.6) | Xsan 2 to 2.1.1 (10.5) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SNFS 7.0.x | ✓ | ✓ | ✓ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SNFS 6.4.0 | ✓ | ✓ | ✓ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SNFS 6.3.x | ❌ | ✓ | ✓ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| SNFS 6.2.x | ❌ | ✓ | ✓ | ❌ | ❌ | ❌ | ? | ? | ? | ❌ |
| SNFS 6.1.x | ❌ | ✓ | ✓ | ❌ | ❌ | ? | ? | ? | ? | ❌ |
| SNFS 6.0.5, 6.0.5.1, 6.0.6 | ❌ | ✓ | ✓ | ❌ | ❌ | ? | ? | ? | ? | ❌ |
| SNFS 6.0, 6.0.1, 6.0.1.1 | ❌ | ✓ | ✓ | ✓ | ✓ | ? | ? | ? | ? | ❌ |
| SNFS 5.4.x | ❌ | ✓ | ✓ | ✓ | ✓ | ? | ? | ? | ? | ❌ |
| SNFS 5.3.2.x | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.3.1 | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.3.0 | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.2.2 | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.2.1 | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.2.0 | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 5.1.x | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ❌ | ❌ | ❌ |
| SNFS 5.0.x | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ❌ | ❌ | ❌ |
| SNFS 4.7.x | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 4.6 | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 4.3 | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 4.2.1 | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ | ✓ | ❌ |
| SNFS 4.2.0 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ❌ |
| SNFS 4.1.1 to 4.1.3 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ | ✓ |
| SNFS 4.0 to 4.1 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ |
| SNFS 3.5.x | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ | ✓ |
| SNFS 3.1.2 to 3.1.5 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ✓ |
SNFS controller and Xsan client compatibility chart.
There are some caveats with some of the supported configurations. Some releases were originally marked by Apple as incompatible and then reverted. In the same way, some configurations that were originally marked as working were then updated as not compatible.
Double-check official documentation before any deployment.
I hope you find this table useful. There are some additional Xsan curiosities I will post in the future.
VirtualBox/KVM: Reduce VM sizes
There are two utilities that can help discard unused blocks so that VMs can be shrunk.
zerofree finds unused blocks with non-zero content in ext2, ext3 and ext4 filesystems and fills them with zeros. The volume can’t be mounted read-write while it runs, which makes the process a bit convoluted.
fstrim will discard unused blocks on a mounted filesystem. It is the preferred option when working with SSD drives and thinly provisioned storage: it works with more filesystems, and it won’t hammer your SSD with unnecessary writes.
It is recommended to use fstrim and only use zerofree if unavoidable.
CentOS 7/8
fstrim
# fstrim -va
zerofree (ext2, ext3, ext4)
# yum install epel-release
# yum install zerofree
[Reboot]
Press e on GRUB menu
Go to line that starts with 'linux'
Add init=/bin/bash
Ctrl-X
[Find which disk to trim]
# df
# zerofree -v /dev/mapper/centos_centos7-root
[Shutdown machine]
dd (xfs and other filesystems)
zerofree only supports ext2, ext3 and ext4, so on xfs fill the unused space with zeros using dd instead.
[Reboot]
Press e on GRUB menu
Go to line that starts with 'linux'
Change ro to rw
Add init=/bin/bash
Ctrl-X
[Find the partition/filesystem to trim]
# df
[Fill the filesystem with zeros. This will work with any filesystem but it will write a lot of data on your drives.]
# dd if=/dev/zero of=/tmp/dd bs=$((1024*1024)); rm /tmp/dd
# sync
# exit
[Shutdown machine]
Debian 9/10
fstrim
[Debian 9]
# fstrim -va
[Debian 10]
# fstrim -vA
zerofree
# apt install zerofree
[Reboot]
Press e on GRUB menu
Go to line that starts with 'linux'
Add init=/bin/bash
Ctrl-X
[Find disk to trim]
# df
# zerofree -v /dev/sda1
[Shutdown machine]
Be aware that if you are using ZFS on Ubuntu (or any other distro), the above commands won’t work; in fact, they will generate a lot of extra writes on the filesystem.
Just ensure that ZFS is using compression, or avoid ZFS in the guest system.
Reducing the image size
VirtualBox
[List all disks]
$ vboxmanage list hdds
[Just the paths]
$ vboxmanage list hdds | grep 'Location.*.vdi' | awk '{$1=""}1'
[Compress one image]
$ vboxmanage modifymedium disk --compact /home/user/Virtualbox/Kali-Linux-2021.1/Kali-Linux-2020.4-vbox-amd64-disk001.vdi
[List all images path]
$ vboxmanage list hdds | grep 'Location.*.vdi' | awk '{$1=""}1' | sed 's/^ /"/;s/$/"/'
I wish I knew the syntax to automatise compressing all the images with one line. I might revisit it in the future with a script.
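One way to sketch it (the helper name is mine, not a VirtualBox command): a small function that turns the `vboxmanage list hdds` output into one compact command per image, which you can review and then pipe to sh.

```shell
# compact_cmds: read `vboxmanage list hdds` output on stdin and print one
# `vboxmanage modifymedium disk --compact` command per .vdi Location line.
compact_cmds() {
  sed -n 's/^Location:[[:space:]]*\(.*\.vdi\)$/vboxmanage modifymedium disk --compact "\1"/p'
}

# Dry run: review the generated commands first.
#   vboxmanage list hdds | compact_cmds
# Then actually compact every image:
#   vboxmanage list hdds | compact_cmds | sh
```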
KVM
#!/bin/sh
# Shrink every qcow2 image in the current directory by rewriting it,
# which drops unused (zeroed/discarded) blocks.
for file_name in *.qcow2
do
    echo
    echo ==================
    echo "Image: $file_name"
    echo "Old $(qemu-img info "$file_name" | grep 'disk size')"
    mv "$file_name" "$file_name.tmp"
    qemu-img convert -O qcow2 "$file_name.tmp" "$file_name"
    rm "$file_name.tmp"
    echo "New $(qemu-img info "$file_name" | grep 'disk size')"
    echo ==================
done
Ubuntu: ZFS bpool is full and not running snapshots during apt updates
When running apt to update my system I kept seeing a message saying that bpool had less than 20% space free and that the automatic snapshotting would not run.
What I didn’t realise is that this also stops snapshots of the rpool, even if it has plenty of free space: they are taken together and have to match. Checking the snapshots, it seems they had stopped running several months earlier. Yikes!
You can list the current snapshots in several ways:
[List existing snapshots with their names and creation date.]
$ zsysctl show
Name: rpool/ROOT/ubuntu_dd5xf4
ZSys: true
Last Used: current
History:
- Name: rpool/ROOT/ubuntu_dd5xf4@autozsys_qfi5pz
Created on: 2021-01-12 23:35:01
- Name: rpool/ROOT/ubuntu_dd5xf4@autozsys_1osqbq
Created on: 2021-01-12 23:33:22
You can also use the zfs commands for the same purpose.
List existing snapshots with the default properties
(name, used, avail, refer, mountpoint)
$ zfs list -t snapshot
You can also list the creation date asking for the creation property.
$ zfs list -t snapshot -o name,creation
It should list them in creation order but, if not, you can use the -s option to sort them.
$ zfs list -t snapshot -o name,creation -s creation
Deciding which snapshots to delete will vary. You might want to get rid of the older ones, or maybe the ones that are consuming the most space.
My snapshots were a few months old so there wasn’t much point in keeping them. I deleted all with the following one-liner:
[-H removes headers]
[-o name displays the name of the filesystem]
[-t snapshot displays only snapshots]
# zfs list -H -o name -t snapshot | grep auto | xargs -n1 zfs destroy
I can’t stress enough how important it is that whatever zfs destroy command you issue, especially if it runs over several automatic iterations, only applies to the snapshots you intend to delete.
The above command can delete filesystems, volumes and snapshots. Deleting snapshots isn’t an issue. Deleting a filesystem is a pretty big one.
Please, ensure that the command lists only snapshots you want to remove before running it. You have been warned.
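A safe habit is a dry run first, putting echo in front of zfs destroy so the commands are printed instead of executed. A sketch with sample snapshot names (in real use, replace the printf with `zfs list -H -o name -t snapshot`):

```shell
# Dry run: print the destroy commands instead of executing them.
# Real pipeline: zfs list -H -o name -t snapshot | grep autozsys | xargs -n1 echo zfs destroy
printf '%s\n' \
  'rpool/ROOT/ubuntu_dd5xf4@autozsys_qfi5pz' \
  'rpool/ROOT/ubuntu_dd5xf4@manual_keep' \
  | grep autozsys | xargs -n1 echo zfs destroy
# Prints: zfs destroy rpool/ROOT/ubuntu_dd5xf4@autozsys_qfi5pz
```

Once the printed list matches exactly what you expect, drop the echo to run the real thing.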
Ubuntu 20.04: Install Ubuntu with ZFS and encryption
Ubuntu 20.04 offers installing ZFS as the default filesystem. This has lots of advantages. My favourite is being able to revert the system and home partitions (simultaneously or individually) to a previous state through the boot menu.
One major drawback for me is the lack of an option to encrypt the filesystem during the installation.
You have the option to use LUKS and ext4 but there isn’t an encryption option in the installer for ZFS.
Some people have used LUKS and ZFS in the past, but that solution didn’t quite work for me. The tutorials I saw were using LUKS1 instead of LUKS2 and it also felt that the approach was cumbersome now that ZFS on Linux supports native encryption.
The more you deviate from a standard installation the more complicated it will be to do any troubleshooting if anything breaks in the future. Keep it simple.
The ZFS on Linux version included with the 20.04 installer is 0.8.3.
The installation of Ubuntu 20.04 on ZFS will create two pools: bpool and rpool.
bpool contains the boot partition and rpool all the other mountpoints in several datasets.
In a very security-minded world both pools should be encrypted, but I prefer not to encrypt the boot partition. Adding that extra layer of security might make a system recovery that much more difficult or impossible.
The default partitioning during the install creates four partitions and two ZFS pools, using all the storage in the installation disk:
| Partition/Pool | Size | Type |
| --- | --- | --- |
| /boot/efi | 512 MiB | EFI System Partition (vfat) |
| SWAP | 2 GiB | Linux swap partition (swap) |
| bpool | 2 GiB | ZFS/Solaris boot partition (zfs) |
| rpool | all remaining space | ZFS/Solaris root partition (zfs) |
To encrypt the rpool we will need to edit the installation script.
Replace PASSWORD with the encryption password you want to use. You will be prompted to type this at boot time.
Save the changes to the file and exit.
Launch the installer:
# ubiquity
Install Ubuntu as you would. In the storage section:
Select “Use entire disk”
Select ZFS (Experimental)
The system will be installed with the encryption options set in the script, and on boot it will prompt you for the password you set up.
Some comments on the options for reference:
-o ashift=12 This is the default setting and means that your disk’s sector size is 4,096 bytes (2^12 = 4,096). Valid values are:
0: autodetect sector size
9: 512 bytes
10: 1,024 bytes
11: 2,048 bytes
12: 4,096 bytes
13: 8,192 bytes
14: 16,384 bytes
15: 32,768 bytes
16: 65,536 bytes
You can output the physical sector size with lsblk -t, although values of 512 might be simulated. You should check the specifications if the drive is SSD.
Alternative ways to retrieve the physical sector size are, for example:
# hdparm -I /dev/sda | grep 'Sector size'
# smartctl -i /dev/sda
# blockdev --getpbsz /dev/sda
A value of 12 will work just fine, even on 512-byte sector drives, which is likely why Canonical set it as the default.
If set too low this can have a huge and negative impact on performance.
-O recordsize=1M Other tutorials suggest creating this entry. According to Oracle’s documentation this parameter is used for databases and I have read that it can also be used for certain types of VMs.
The default value is 128k. You can tune it for your individual use by changing the record size of an existing pool. Any new files created will use the new record size value. You can cp/rm files to force them to be rewritten with the new value.
You can change this value later on with:
# zfs set recordsize=128k rpool
or
# zfs set recordsize=128k rpool/filesystem
-O encryption=aes-256-gcm AES with key lengths of 128, 192 and 256 bits in CCM and GCM operation modes are supported natively. 0.8.4 comes with a fix that improves performance with AES-GCM and should hopefully be included in an update to Ubuntu soon.
-O keylocation=prompt Valid options are prompt or file:///absolute/file/path
Prompt will ask you to type the password, in this case during boot. File points to the location of the decryption key, but on a portable system that would defeat the purpose.
-O keyformat=passphrase Options are raw, hex or passphrase. When using passphrase the password can be between 8 and 512 bytes in length.
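Put together, the options discussed above combine roughly like this (a sketch, not the installer's exact invocation; the pool name and partition are illustrative):

```
# zpool create -o ashift=12 \
    -O encryption=aes-256-gcm \
    -O keylocation=prompt \
    -O keyformat=passphrase \
    rpool /dev/sda4
```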
SNFS/Xsan: Renaming a volume
In the past this used to be a very straightforward process: you would rename the volume configuration file and run cvfsck.
With newer versions if you try to do that you will get an error message.
To make the name change you can use the cvupdatefs command.
If you have more than one volume running, the below instructions will allow you to rename one volume while the rest are still running, minimising downtime.
Stopping the file system
Stop the file system
# cvadmin -e 'stop oldname_volume1'
Check that it hasn't failed over.
# cvadmin -e 'select'
If it has failed over to another server just run the first command again until the volume you want to rename isn't running.
Check the filesystem
# cvfsck -j oldname_volume1
# cvfsck -nvvvvv oldname_volume1
If the above shows errors, you need to fix them. Ideally you want to dump the inode information before any big repair, but that is for another article.
You can fix the errors with:
# cvfsck -vvvvv oldname_volume1
Run the above command until there are no errors shown.
Changing the volume name
You can now change the volume name.
# cvupdatefs -R newname_volume2 oldname_volume1
Update the name of the volume in fsmlist.
SNFS
# vim /usr/cvfs/config/fsmlist
Xsan
# vim /Library/Preferences/Xsan/fsmlist
In Xsan you need to push the changes to the second metadata server.
# xsanctl pushConfigUpdate
In Xsan you might need to check that the name isn’t referenced in any other configuration file (/Library/Preferences/Xsan), but you can run grep and see where you might need to make changes.
Also in Xsan, if needed, copy the configuration file to the second metadata server. Be aware that Xsan Admin very often fails to make a good copy of the configuration to the second server. Run a file checksum on both ends and copy the volume configuration file manually if it doesn’t match.
This issue with Xsan Admin will in the best case not allow a volume to fail over, and in the worst case, cause data loss.
In SNFS/Linux you should check for any references of the old name in /usr/cvfs/config/
Also in SNFS/Linux, make sure that the changed files are also updated on the second metadata server.
Remounting the filesystem
In Xsan you don’t need to issue a new profile for the clients to mount the new volume. Just mount it once from the CLI and it will automount on restart:
# xsanctl mount newname_volume2
On Linux clients update entries in /etc/vstab or /etc/fstab to automount the volume on boot.
On Windows clients you will need to use the SNFS configuration tool to mount the newly named volume.