ZFS Disabling the ZIL

Disabling the ZIL (Don’t)

ZIL stands for ZFS Intent Log. It is used during synchronous write operations. The ZIL is an essential part of ZFS and should never be disabled. Significant performance gains can be achieved by not having the ZIL, but that would be at the expense of data integrity. One can be infinitely fast if correctness is not required.

One reason to disable the ZIL is to check whether a given workload is significantly impacted by it. A little while ago, a workload that was a heavy consumer of ZIL operations was shown not to be impacted by disabling the ZIL, which convinced us to look elsewhere for improvements. If the ZIL is shown to be a factor in the performance of a workload, more investigation is needed to see whether the ZIL itself can be improved.

The Solaris Nevada release now has the option of storing the ZIL on separate devices from the main pool. Using separate, possibly lower-latency, devices for the intent log is a great way to improve ZIL-sensitive loads.
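
For example, assuming a pool named tank and a spare low-latency device c2t0d0 (both names are placeholders, and the pool must be on a release that supports separate log devices), a dedicated log device could be added roughly like this:

zpool add tank log c2t0d0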

Caution: Disabling the ZIL on an NFS server can lead to client-side corruption. The integrity of the ZFS pool itself is not compromised by this tuning.

Current Solaris Releases

If you must, then:

echo zil_disable/W0t1 | mdb -kw

Revert to default:

echo zil_disable/W0t0 | mdb -kw
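
If a persistent setting is preferred, the same tunable can presumably also be set in /etc/system, in the same way as the zfs_mdcomp_disable example later on this page; verify the exact tunable name against your release before relying on it:

set zfs:zil_disable = 1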

RFEs

* zil synchronicity

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6280630

Further Reading

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on
http://blogs.sun.com/erickustarz/entry/zil_disable
http://blogs.sun.com/roch/entry/nfs_and_zfs_a_fine

ZFS Disabling Metadata Compression

Disabling Metadata Compression

Caution: This tuning needs to be researched, as it is now apparent that the tunable applies only to indirect blocks, leaving a lot of metadata compressed anyway.

With ZFS, compression of data blocks is under the control of the file system administrator and can be turned on or off by using the command "zfs set compression …".
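
For example, against a hypothetical dataset tank/home (the dataset name is a placeholder):

zfs set compression=on tank/home
zfs get compression tank/home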

On the other hand, ZFS internal metadata is always compressed on disk by default. For metadata-intensive loads, this default is expected to save some amount of space (a few percent) at the expense of a little extra CPU computation. However, a bigger motivation exists for keeping metadata compression on: for directories that grow to millions of objects and then shrink to just a few, metadata compression saves large amounts of space (>>10X).

In general, metadata compression can be left as is. If your workload is CPU intensive (say > 80% load), kernel profiling shows metadata compression is a significant contributor, and the workload is not expected to create and shrink huge directories, then disabling metadata compression can be attempted with the goal of providing more CPU to handle the workload.

Solaris 10 11/06 and Solaris Nevada (snv_52) Releases

Set dynamically:

echo zfs_mdcomp_disable/W0t1 | mdb -kw

Revert to default:

echo zfs_mdcomp_disable/W0t0 | mdb -kw

For a persistent setting, set the following parameter in the /etc/system file:

set zfs:zfs_mdcomp_disable = 1

Earlier Solaris Releases

Not tunable.

RFEs

* 6391873 metadata compression should be turned back on (Integrated in NEVADA snv_36)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6391873

ZFS and heavily cached disk arrays like StorageTek/Engenio

ZFS really has some interesting quirks. One of them is that it is truly designed to deal with dumb-as-a-rock storage. If you have a box of SATA disks with firmware flakier than Paris Hilton on a coke binge, then ZFS has truly been designed for you.

As a result, ZFS doesn't trust that anything it writes to the ZFS Intent Log (ZIL) made it to your storage until it flushes the storage cache. After every write to the ZIL, ZFS issues a cache-flush command to instruct the storage to flush its write cache to the disk. In fact, ZFS won't return from a synchronous write operation until the ZIL write and the flush have completed. If the devices making up your zpool are individual hard drives, particularly SATA ones, this is great behavior. If the power goes kaput during a write, you don't have the problem that the write made it to the drive cache but never to the disk.

The major problem with this strategy only occurs when you try to layer ZFS over an intelligent storage array with a decent battery-backed cache.

Most of these arrays have sizable 2GB or greater caches with 72-hour batteries. The cache gives a huge performance boost, particularly on writes. Since cache is so much faster than disk, the array can tell the writer really quickly, "I've got it from here, you can go back to what you were doing". Essentially, as fast as the data goes into the cache, the array can release the writer. Unlike the drive-based caches, the array cache has a 72-hour battery attached to it, so if the array loses power and dies, you don't lose the writes in the cache. When the array boots back up, it flushes the writes in the cache to the disk. However, ZFS doesn't know that it's talking to an array, so it assumes that the cache isn't trustworthy and still issues a cache flush after every ZIL write. So every time a ZIL write occurs, the write goes into the array write cache, and then the array is immediately instructed to flush the cache contents to the disk. This means ZFS doesn't get the benefit of a quick return from the array; instead it has to wait the amount of time it takes to flush the write cache to the slow disks. If the array is under heavy load and the disks are thrashing away, your write return time (latency) can be awful with ZFS. Even when the array is idle, your latency with flushing is typically higher than the latency under heavy load with no flushing. With our array honoring ZFS ZIL flushes, we saw idle latencies of 54ms and heavy-load latencies of 224ms.

You have two options to rid yourself of the bane of existence known as write cache flushing:

* Disable the ZIL. The ZIL is the way ZFS maintains consistency until it can get the blocks written to their final place on the disk. That's why the ZIL flushes the cache. If you don't have the ZIL and a power outage occurs, your blocks may go poof in your server's RAM…'cause they never made it to the disk, Kemosabe. See Dracko article #570 on how to disable the ZIL.

* Tell your array to ignore ZFS’ flush commands. This is pretty safe, and massively beneficial.

The former option is really a no-go because it opens you up to losing data. The second option works really well and is darn safe. It ends up being safe because if ZFS is waiting for the write to complete, that means the write made it to the array, and if it's in the array cache you're golden. Whether famine or flood or a loose power cable comes, your array will get that write to the disk eventually. So it's OK to have the array lie to ZFS and release ZFS almost immediately after the ZIL flush command executes.

So how do you get your array to ignore SCSI flush commands from ZFS? That differs depending on the array, but I can tell you how to do it on an Engenio array. If you've got any of the following arrays, it's made by Engenio and this may work for you:

* Sun StorageTek FlexLine 200/300 series
* Sun StorEdge 6130
* Sun StorageTek 6140/6540
* IBM DS4x00
* many SGI InfiniteStorage arrays (you’ll need to check to make sure your array is actually OEM’d from Engenio)

On a StorageTek FLX210 with SANtricity 9.15, the following command script will instruct the array to ignore flush commands issued by Solaris hosts:

//Show Solaris ICS option
show controller[a] HostNVSRAMbyte[0x2, 0x21];
show controller[b] HostNVSRAMbyte[0x2, 0x21];

//Enable ICS
set controller[a] HostNVSRAMbyte[0x2, 0x21]=0x01;
set controller[b] HostNVSRAMbyte[0x2, 0x21]=0x01;

// Make changes effective
// Rebooting controllers
show "Rebooting A controller.";
reset controller[a];

show "Rebooting B controller.";
reset controller[b];

If you read carefully, I said the script will cause the array to ignore flush commands from Solaris hosts. So all Solaris hosts attached to the array will have their flush commands ignored; you can't turn this behavior on and off on a per-host basis. To run this script, cut and paste it into the script editor of the "Enterprise Management Window" of the SANtricity management GUI. That's it! A key note here is that you should definitely have your server shut down, or at minimum your ZFS zpool exported, before you run this. Otherwise, when your array reboots, ZFS will kernel panic the server. In our experience, this will happen even if you only reboot one controller at a time, waiting for one controller to come back online before rebooting the other. For whatever reason, MPxIO, which normally works beautifully to keep a LUN available when losing a controller, fails miserably in this situation. It's probably the array's fault, but whatever the issue, that's the reality. Plan for downtime when you do this.
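
A minimal sketch of the safe sequence, assuming a pool named mypool built on LUNs from the affected array (the pool name is a placeholder):

Before running the script:

# zpool export mypool

After both controllers are back online:

# zpool import mypool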

Attaching and Detaching Devices in a ZFS Storage Pool

Attaching and Detaching Devices in a ZFS Storage Pool

In addition to the zpool add command, you can use the zpool attach command to add a new device to an existing mirrored or non-mirrored device. For example:

# zpool attach zeepool c1t1d0 c2t1d0

If the existing device is part of a two-way mirror, attaching the new device creates a three-way mirror, and so on. In either case, the new device begins to resilver immediately.

In this example, zeepool is an existing two-way mirror that is transformed into a three-way mirror by attaching c2t1d0, the new device, to the existing device, c1t1d0.
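
To confirm the resulting three-way mirror and watch the resilver progress, something like the following should work (reusing the pool name from the example above):

# zpool status zeepool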

You can use the zpool detach command to detach a device from a pool. For example:

# zpool detach zeepool c2t1d0

However, this operation is refused if there are no other valid replicas of the data. For example:

# zpool detach newpool c1t2d0
cannot detach c1t2d0: only applicable to mirror and replacing vdevs

Onlining and Offlining Devices in a ZFS Storage Pool

Onlining and Offlining Devices in a ZFS Storage Pool

ZFS allows individual devices to be taken offline or brought online. When hardware is unreliable or not functioning properly, ZFS continues to read or write data to the device, assuming the condition is only temporary. If the condition is not temporary, it is possible to instruct ZFS to ignore the device by bringing it offline. ZFS does not send any requests to an offlined device.
Note

Devices do not need to be taken offline in order to replace them.

You can use the offline command when you need to temporarily disconnect storage. For example, if you need to physically disconnect an array from one set of Fibre Channel switches and connect the array to a different set, you could take the LUNs offline from the array that was used in ZFS storage pools. After the array was reconnected and operational on the new set of switches, you could then bring the same LUNs online. Data that had been added to the storage pools while the LUNs were offline would resilver to the LUNs after they were brought back online.

This scenario is possible assuming that the systems in question see the storage once it is attached to the new switches, possibly through different controllers than before, and your pools are set up as RAID-Z or mirrored configurations.
Taking a Device Offline

You can take a device offline by using the zpool offline command. The device can be specified by path or by short name, if the device is a disk. For example:

# zpool offline tank c1t0d0
bringing device c1t0d0 offline

You cannot take a pool offline to the point where it becomes faulted. For example, you cannot take offline two devices out of a RAID-Z configuration, nor can you take offline a top-level virtual device.

# zpool offline tank c1t0d0
cannot offline c1t0d0: no valid replicas

Note

Currently, you cannot replace a device that has been taken offline.

Offlined devices show up in the OFFLINE state when you query pool status. For information about querying pool status, see Querying ZFS Storage Pool Status.

By default, the offline state is persistent. The device remains offline when the system is rebooted.

To temporarily take a device offline, use the zpool offline -t option. For example:

# zpool offline -t tank c1t0d0
bringing device ‘c1t0d0’ offline

When the system is rebooted, this device is automatically returned to the ONLINE state.

For more information on device health, see Health Status of ZFS Storage Pools.
Bringing a Device Online

Once a device is taken offline, it can be restored by using the zpool online command:

# zpool online tank c1t0d0
bringing device c1t0d0 online

When a device is brought online, any data that has been written to the pool is resynchronized to the newly available device. Note that you cannot use device onlining to replace a disk. If you offline a device, replace the drive, and try to bring it online, it remains in the faulted state.

If you attempt to online a faulted device, a message similar to the following is displayed from fmd:

# zpool online tank c1t0d0
Bringing device c1t0d0 online
#
SUNW-MSG-ID: ZFS-8000-D3, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Fri Mar 17 14:38:47 MST 2006
PLATFORM: SUNW,Ultra-60, CSN: -, HOSTNAME: neo
SOURCE: zfs-diagnosis, REV: 1.0
EVENT-ID: 043bb0dd-f0a5-4b8f-a52d-8809e2ce2e0a
DESC: A ZFS device failed. Refer to http://sun.com/msg/ZFS-8000-D3 for more information.
AUTO-RESPONSE: No automated response will occur.
IMPACT: Fault tolerance of the pool may be compromised.
REC-ACTION: Run ‘zpool status -x’ and replace the bad device.
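
The usual recovery here is zpool replace rather than zpool online. A sketch, reusing the pool and device names from the example above and assuming the new disk was inserted at the same device path:

# zpool replace tank c1t0d0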

Remote Replication of ZFS Data

Remote Replication of ZFS Data

You can use the zfs send and zfs recv commands to remotely copy a snapshot stream representation from one system to another system. For example:

# zfs send tank/cindy@today | ssh newsys zfs recv sandbox/restfs@today

This command saves the tank/cindy@today snapshot data and restores it into the sandbox/restfs file system and also creates a restfs@today snapshot on the newsys system. In this example, the user has been configured to use ssh on the remote system.
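
For ongoing replication, an incremental stream can be sent with the -i option. A sketch, assuming a hypothetical earlier snapshot tank/cindy@yesterday already exists on both systems and the target file system has not been modified since it was received:

# zfs send -i tank/cindy@yesterday tank/cindy@today | ssh newsys zfs recv sandbox/restfs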

ZFS mount: cd .. or ls .. permission denied

Problem: When moving up from one ZFS mount point to a parent directory (also a ZFS mount point) using the ".." notation with cd or ls, permission is denied; however, cd and ls can access the directories when they are named explicitly.

ZFS mount points have separate . and .. entries for their mounted and unmounted states. When the file system is mounted, the . and .. entries work properly; when it is unmounted, non-root users get an access-denied response, because the . and .. entries of the underlying mount point directory are owned by root by default. Permissions on them need to be set separately, while the file system is unmounted, in order for other users to use those entries.

This particular issue will cause problems in situations such as applying patches to those mount directories for programs such as Oracle.
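
A sketch of one workaround, assuming the affected file system is tank/app mounted at /tank/app and the application runs as user oracle in group dba (all names are placeholders): unmount the file system, fix ownership and permissions on the underlying mount point directory, then remount.

# zfs unmount tank/app
# chown oracle:dba /tank/app
# chmod 755 /tank/app
# zfs mount tank/app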

ZFS Command cheatsheet

What You Do and See, and Why

$ man zpool
$ man zfs


Get familiar with command structure and options



$ su
Password:
# cd /
# mkfile 100m disk1 disk2 disk3 disk5
# mkfile 50m disk4
# ls -l disk*
-rw------T 1 root root 104857600 Sep 11 12:15 disk1
-rw------T 1 root root 104857600 Sep 11 12:15 disk2
-rw------T 1 root root 104857600 Sep 11 12:15 disk3
-rw------T 1 root root 52428800 Sep 11 12:15 disk4
-rw------T 1 root root 104857600 Sep 11 12:15 disk5


Create some “virtual devices” or vdevs as described
in the zpool documentation. These can also be real disk slices if
you have them available.



# zpool create myzfs /disk1 /disk2
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 191M 94K 191M 0% ONLINE –


Create a storage pool and check the size and usage.



# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Get more detailed status of the zfs storage pool.



# zpool destroy myzfs
# zpool list
no pools available


Destroy a zfs storage pool



# zpool create myzfs mirror /disk1 /disk4
invalid vdev specification
use ‘-f’ to override the following errors:
mirror contains devices of different sizes


Attempting to create a zfs pool with different size vdevs fails.
Using the -f option forces it to occur, but only uses the space
allowed by the smallest device.



# zpool create myzfs mirror /disk1 /disk2 /disk3
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 95.5M 112K 95.4M 0% ONLINE –
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
/disk3 ONLINE 0 0 0

errors: No known data errors



Create a mirrored storage pool, in this case a three-way mirrored
storage pool.



# zpool detach myzfs /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Detach a device from a mirrored pool.



# zpool attach myzfs /disk1 /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:31:49 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
/disk3 ONLINE 0 0 0

errors: No known data errors



Attach a device to a pool. This creates a two-way mirror if the
pool is not already a mirror; otherwise it adds another device to
the existing mirror, in this case making it a three-way mirror.



# zpool remove myzfs /disk3
cannot remove /disk3: only inactive hot spares can be removed
# zpool detach myzfs /disk3


Attempt to remove a device from a pool. In this case it’s
a mirror, so we must use “zpool detach”.



# zpool add myzfs spare /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
spares
/disk3 AVAIL

errors: No known data errors



Add a hot spare to a storage pool.



# zpool remove myzfs /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Remove a hot spare from a pool.



# zpool offline myzfs /disk1
# zpool status -v
pool: myzfs
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning
in a degraded state.
action: Online the device using ‘zpool online’ or replace the device
with ‘zpool replace’.
scrub: resilver completed with 0 errors on Tue Sep 11 13:39:25 2007
config:

NAME STATE READ WRITE CKSUM
myzfs DEGRADED 0 0 0
mirror DEGRADED 0 0 0
/disk1 OFFLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Take the specified device offline. No attempt to read or write
to the device will take place until it’s brought back
online. Use the -t option to temporarily offline a device. A
reboot will bring the device back online.



# zpool online myzfs /disk1
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:47:14 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Bring the specified device online.



# zpool replace myzfs /disk1 /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:25:48 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk3 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Replace a disk in a pool with another disk, for example when a
disk fails.



# zpool scrub myzfs


Perform a scrub of the storage pool to verify that it checksums
correctly. On mirror or raidz pools, ZFS will automatically repair
any damage.
WARNING: scrubbing is I/O
intensive.



# zpool export myzfs
# zpool list
no pools available


Export a pool from the system for importing on another system.



# zpool import -d / myzfs
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 95.5M 114K 95.4M 0% ONLINE –


Import a previously exported storage pool. If -d is not
specified, this command searches /dev/dsk. As we’re using
files in this example, we need to specify the directory of the
files used by the storage pool.



# zpool upgrade
This system is currently running ZFS pool version 8.

All pools are formatted using this version.
# zpool upgrade -v
This system is currently running ZFS pool version 8.

The following versions are supported:

VER DESCRIPTION
---  --------------------------------------------------------
1 Initial ZFS version
2 Ditto blocks (replicated metadata)
3 Hot spares and double parity RAID-Z
4 zpool history
5 Compression using the gzip algorithm
6 pool properties
7 Separate intent log devices
8 Delegated administration
For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where ‘N’ is the version number.



Display the pool format version. The -v flag shows the features
supported by the current version. Use the -a flag to upgrade all
pools to the latest on-disk version. Pools that are upgraded will
no longer be accessible to systems running older versions.



# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
----------  -----  -----  -----  -----  -----  -----
myzfs 112K 95.4M 0 4 26 11.4K
myzfs 112K 95.4M 0 0 0 0
myzfs 112K 95.4M 0 0 0 0


Get I/O statistics for the pool



# zfs create myzfs/colin
# df -h
Filesystem kbytes used avail capacity Mounted on

myzfs/colin 64M 18K 63M 1% /myzfs/colin


Create a file system and check it with the standard df -h command.
File systems are automatically mounted by default under the pool's
mount point (here, /myzfs). See the Mountpoints section of the zfs
man page for more details.



# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 139K 63.4M 19K /myzfs
myzfs/colin 18K 63.4M 18K /myzfs/colin


List current zfs file systems.



# zpool add myzfs /disk1
invalid vdev specification
use ‘-f’ to override the following errors:
mismatched replication level: pool uses mirror and new vdev is file


Attempt to add a single vdev to a mirrored set fails.



# zpool add myzfs mirror /disk1 /disk5
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk3 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk5 ONLINE 0 0 0

errors: No known data errors



Add a mirrored set of vdevs



# zfs create myzfs/colin2
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 172K 159M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin2 18K 159M 18K /myzfs/colin2


Create a second file system. Note that both file systems show
159M available because no quotas are set. Each "could"
grow to fill the pool.



# zfs set reservation=20m myzfs/colin
# zfs list -o reservation
RESERV
none
20M
none


Reserve a specified amount of space for a file system ensuring
that other users don’t take up all the space.



# zfs set quota=20m myzfs/colin2
# zfs list -o quota myzfs/colin myzfs/colin2
QUOTA
none
20M


Set and view quotas



# zfs set compression=on myzfs/colin2
# zfs list -o compression
COMPRESS
off
off
on


Turn on and verify compression



# zfs snapshot myzfs/colin@test
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.2M 139M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 – 18K –
myzfs/colin2 18K 20.0M 18K /myzfs/colin2


Create a snapshot called test.



# zfs rollback myzfs/colin@test


Rollback to a snapshot.



# zfs clone myzfs/colin@test myzfs/colin3
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.2M 139M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 – 18K –
myzfs/colin2 18K 20.0M 18K /myzfs/colin2
myzfs/colin3 0 139M 18K /myzfs/colin3


A snapshot is not directly addressable. A clone must be made.
The target dataset can be located anywhere in the ZFS hierarchy,
and will be created as the same type as the original.



# zfs destroy myzfs/colin2
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.1M 139M 22K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 – 18K –
myzfs/colin3 0 139M 18K /myzfs/colin3


Destroy a filesystem



# zfs destroy myzfs/colin
cannot destroy ‘myzfs/colin’: filesystem has children
use ‘-r’ to destroy the following datasets:
myzfs/colin@test


Attempt to destroy a file system that has a child, in this case
the snapshot. We must either remove the snapshot, or
make a clone and promote the clone.



# zfs promote myzfs/colin3
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.1M 139M 21K /myzfs
myzfs/colin 0 159M 18K /myzfs/colin
myzfs/colin3 18K 139M 18K /myzfs/colin3
myzfs/colin3@test 0 – 18K –
# zfs destroy myzfs/colin
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 147K 159M 21K /myzfs
myzfs/colin3 18K 159M 18K /myzfs/colin3
myzfs/colin3@test 0 – 18K –


Promote a cloned file system so that it is no longer dependent on its
"origin" snapshot. This makes the snapshot a child of the
cloned file system, so the original file system can then be
deleted.



# zfs rename myzfs/colin3 myzfs/bob
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 153K 159M 21K /myzfs
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@test 0 – 18K –
# zfs rename myzfs/bob@test myzfs/bob@newtest
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 146K 159M 20K /myzfs
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@newtest 0 – 18K –


Rename a filesystem, and separately rename the snapshot.



# zfs get all
NAME PROPERTY VALUE SOURCE
myzfs type filesystem –
myzfs creation Tue Sep 11 14:21 2007 –
myzfs used 146K –
myzfs available 159M –
myzfs referenced 20K –
[…]


Display properties for the given datasets. This can be refined
further using options.



# zpool destroy myzfs
cannot destroy ‘myzfs’: pool is not empty
use ‘-f’ to force destruction anyway


Can’t destroy a pool with active filesystems.



# zfs unmount myzfs/bob
# df -h
myzfs 159M 20K 159M 1% /myzfs


Unmount a ZFS file system



# zfs mount myzfs/bob
# df -h
myzfs 159M 20K 159M 1% /myzfs
myzfs/bob 159M 18K 159M 1% /myzfs/bob


Mount a ZFS filesystem. This is usually automatically done on
boot.



# zfs send myzfs/bob@newtest | ssh localhost zfs receive myzfs/backup
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 172K 159M 20K /myzfs
myzfs/backup 18K 159M 18K /myzfs/backup
myzfs/backup@newtest 0 – 18K –
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@newtest 0 – 18K –


Create a stream representation of the snapshot and redirect it
to zfs receive. In this example I've redirected it to
localhost for illustration purposes. This can be used to back up to
a remote host, or even to a local file.



# zpool history
History for ‘myzfs’:
2007-09-11.15:35:50 zpool create myzfs mirror /disk1 /disk2 /disk3
2007-09-11.15:36:00 zpool detach myzfs /disk3
2007-09-11.15:36:10 zpool attach myzfs /disk1 /disk3
2007-09-11.15:36:53 zpool detach myzfs /disk3
2007-09-11.15:36:59 zpool add myzfs spare /disk3
2007-09-11.15:37:09 zpool remove myzfs /disk3
2007-09-11.15:37:18 zpool offline myzfs /disk1
2007-09-11.15:37:27 zpool online myzfs /disk1
2007-09-11.15:37:37 zpool replace myzfs /disk1 /disk3
2007-09-11.15:37:47 zpool scrub myzfs
2007-09-11.15:37:57 zpool export myzfs
2007-09-11.15:38:05 zpool import -d / myzfs
2007-09-11.15:38:52 zfs create myzfs/colin
2007-09-11.15:39:27 zpool add myzfs mirror /disk1 /disk5
2007-09-11.15:39:38 zfs create myzfs/colin2
2007-09-11.15:39:50 zfs set reservation=20m myzfs/colin
2007-09-11.15:40:18 zfs set quota=20m myzfs/colin2
2007-09-11.15:40:35 zfs set compression=on myzfs/colin2
2007-09-11.15:40:48 zfs snapshot myzfs/colin@test
2007-09-11.15:40:59 zfs rollback myzfs/colin@test
2007-09-11.15:41:11 zfs clone myzfs/colin@test myzfs/colin3
2007-09-11.15:41:25 zfs destroy myzfs/colin2
2007-09-11.15:42:12 zfs promote myzfs/colin3
2007-09-11.15:42:26 zfs rename myzfs/colin3 myzfs/bob
2007-09-11.15:42:57 zfs destroy myzfs/colin
2007-09-11.15:43:23 zfs rename myzfs/bob@test myzfs/bob@newtest
2007-09-11.15:44:30 zfs receive myzfs/backup


Display the command history of all storage pools. This can be
limited to a single pool by specifying its name on the command
line. The history is only stored for existing pools; once you've
destroyed a pool, you no longer have access to its
history.



# zpool destroy -f myzfs
# zpool status -v
no pools available


Use the -f option to destroy a pool with file systems created.



Thanks to http://www.lildude.co.uk/2006/09/zfs-cheatsheet/

Tuning ZFS Checksums

Tuning ZFS Checksums

End-to-end checksumming is one of the great features of ZFS. It allows ZFS to detect and correct many kinds of errors other products can't detect and correct. Disabling checksums is, of course, a very bad idea. Having file-system-level checksums enabled can alleviate the need for application-level checksums; in that case, using the ZFS checksum becomes a performance enabler.

The checksums are computed asynchronously to most application processing and should normally not be an issue. However, each pool currently has a single thread computing the checksums (see the RFE below), and it is possible for that computation to limit pool throughput. So, if the disk count is very large (>> 10) or a single CPU is weak (< 1 GHz), this tuning might help. If a system is close to CPU saturation, the checksum computations might become noticeable. In those cases, do a run with checksums off to verify whether checksum calculation is the problem.

If you tune this parameter, please reference this URL in the shell script or in an /etc/system comment:

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Checksums

Verify the type of checksum used:

zfs get checksum
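
For example, against a hypothetical dataset tank/home (zfs get reports the NAME, PROPERTY, VALUE, and SOURCE columns for each dataset):

zfs get checksum tank/home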

Tuning is achieved dynamically by using:

zfs set checksum=off dataset

And reverted:

zfs set checksum='on | fletcher2 | fletcher4 | sha256' dataset

The fletcher2 checksum (the default) has been observed to consume roughly 1 GHz worth of CPU when checksumming 500 MBytes per second.

RFEs

* single-threaded checksum & raidz2 parity calculations limit write bandwidth on thumper

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6533726

How do you remove a disk from a ZFS pool

Currently there is no easy way to permanently remove a disk from a ZFS storage pool without destroying the pool. This is something the developers have given top priority for the upcoming release of Solaris 11. What you have to do is destroy the pool and recreate it without the disk you wish to remove. This can be done in three different ways.

1. Back up the data using traditional methods (tar, cpio), then destroy and recreate the pool without the disk
2. Back up the data with zfs snapshot pool@snap and zfs send pool@snap to a backup device (see the sketch after this list)
3. Clone the pool
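
A minimal sketch of option 2 for a single file system, assuming a pool named mypool with a file system mypool/data and a backup location mounted at /backup (all names are placeholders):

# zfs snapshot mypool/data@migrate
# zfs send mypool/data@migrate > /backup/mypool-data.zfs

After destroying and recreating the pool without the unwanted disk, restore with:

# zfs receive mypool/data < /backup/mypool-data.zfs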

To get the removed disk back for UFS use, relabel it with format in expert mode, entering the following responses at the prompts (label, then label type 0 for an SMI label, y to confirm, q to quit):

# format -e

label
0
y
q

or overwrite the disk with dd:

dd if=/dev/zero of=Ctag
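
A sketch of the dd approach, assuming the removed disk is c1t2d0 (the device name is a placeholder; double-check that you have the right disk, since this overwrites it):

# dd if=/dev/zero of=/dev/rdsk/c1t2d0s2 bs=1024k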
