Remote Replication of ZFS Data


You can use the zfs send and zfs recv commands to remotely copy a snapshot stream representation from one system to another system. For example:

# zfs send tank/cindy@today | ssh newsys zfs recv sandbox/restfs@today

This command saves the tank/cindy@today snapshot data and restores it into the sandbox/restfs file system and also creates a restfs@today snapshot on the newsys system. In this example, the user has been configured to use ssh on the remote system.
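Incremental replication follows the same pattern with zfs send -i, which sends only the changes between two snapshots. A rough sketch (the later snapshot name here is an assumption; sandbox/restfs must already hold the @today snapshot, as it does after the command above):

# zfs snapshot tank/cindy@later
# zfs send -i tank/cindy@today tank/cindy@later | ssh newsys zfs recv sandbox/restfs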

ZFS mount: cd .. or ls .. permission denied

Problem: When moving up from one ZFS mount point to a parent directory (also a ZFS mount point) using the ".." notation through cd or ls, permission is denied. However, cd and ls can access those directories when the directory is named explicitly.

ZFS mount points store the . and .. entries separately for their mounted and unmounted states. When the file system is mounted, . and .. work properly; when it is unmounted, non-root users get a permission-denied response because the underlying . and .. entries are owned by root with default permissions. For other users to use those entries, the permissions on the underlying mount-point directory must be set separately while the file system is unmounted.

This issue can cause problems in situations such as applying patches to those mount-point directories for programs such as Oracle.
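A minimal workaround sketch, assuming the affected file system is tank/app mounted at /tank/app (both names are placeholders): unmount the file system, adjust the underlying mount-point directory, and remount.

# zfs unmount tank/app
# chmod 755 /tank/app
# zfs mount tank/app

The chmod applies to the underlying directory while it is uncovered, so that non-root users can traverse . and .. even when the file system is in the unmounted state.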

ZFS Limiting the ARC Cache


The ARC is where ZFS caches data from all active storage pools. The ARC grows and consumes memory on the principle that no need exists to return data to the system while there is still plenty of free memory. When the ARC has grown and outside memory pressure exists, for example, when a new application starts up, then the ARC releases its hold on memory. ZFS is not designed to steal memory from applications. A few bumps appeared along the way, but the established mechanism works reasonably well for many situations and does not commonly warrant tuning.

However, a few situations stand out.

* If a future memory requirement is significantly large and well defined, then it can be advantageous to prevent ZFS from growing the ARC into it. So, if we know that a future application requires 20% of memory, it makes sense to cap the ARC such that it does not consume more than the remaining 80% of memory.

* If the application is a known consumer of large memory pages, then again limiting the ARC prevents ZFS from breaking up the pages and fragmenting the memory. Limiting the ARC preserves the availability of large pages.

* If dynamic reconfiguration of a memory board is needed (supported on certain platforms), then it is a requirement to prevent the ARC (and thus the kernel cage) from growing onto all boards.

For these cases, it can be desirable to limit the ARC. This, of course, also limits the amount of cached data, which can have adverse effects on performance. No easy way exists to foretell whether limiting the ARC degrades performance.

If you tune this parameter, please reference this URL in shell script or in an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ARCSIZE

Solaris 10 8/07 and Solaris Nevada (snv_51) Releases

For example, if an application needs 5 Gbytes of memory on a system with 36 Gbytes of memory, you could set the ARC maximum to 30 Gbytes (0x780000000).

Set the following parameter in the /etc/system file:

set zfs:zfs_arc_max = 0x780000000
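To compute the hexadecimal value for a different cap, bc(1) can do the conversion. For example, for a 30-Gbyte limit:

# echo "obase=16; 30*1024*1024*1024" | bc
780000000

Prefix the result with 0x when placing it in /etc/system.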

Earlier Solaris Releases

You can only change the ARC maximum size by using the mdb command. Because the system is already booted, the ARC init routine has already executed and other ARC size parameters have already been set based on the default c_max size. Therefore, you should tune the arc.c and arc.p values, along with arc.c_max, using the formula:

arc.c = arc.c_max

arc.p = arc.c / 2

For example, to set the ARC parameters to small values, such as arc.c_max to 512 Mbytes, and to comply with the formula above (arc.c to 512 Mbytes and arc.p to 256 Mbytes), use the following syntax:

# mdb -kw
> arc::print -a p c c_max
ffffffffc00b3260 p = 0xb75e46ff
ffffffffc00b3268 c = 0x11f51f570
ffffffffc00b3278 c_max = 0x3bb708000

> ffffffffc00b3260/Z 0x10000000
ffffffffc00b3260: 0xb75e46ff = 0x10000000
> ffffffffc00b3268/Z 0x20000000
ffffffffc00b3268: 0x11f51f570 = 0x20000000
> ffffffffc00b3278/Z 0x20000000
ffffffffc00b3278: 0x3bb708000 = 0x20000000

You should verify that the values have been set correctly by examining them again in mdb (using the same print command as in the example). You can also monitor the actual size of the ARC to ensure it has not exceeded the limit:

# echo "arc::print -d size" | mdb -k

The above command displays the current ARC size in decimal.

Here is a perl script that you can call from an init script to configure your ARC on boot with the above guidelines:

#!/bin/perl

# Tune the ZFS ARC to a given maximum size by patching the live kernel
# through mdb -kw, following the guidelines above (c = c_max, p = c / 2).

use strict;
use IPC::Open2;

my $arc_max = shift @ARGV;        # ARC maximum size in bytes (decimal)
if ( !defined($arc_max) ) {
    print STDERR "usage: arc_tune <arc max in bytes>\n";
    exit -1;
}
$| = 1;
my %syms;
my $mdb = "/usr/bin/mdb";
open2(*READ, *WRITE, "$mdb -kw") || die "cannot execute mdb";

# Ask mdb for the addresses of the ARC structure members.
print WRITE "arc::print -a\n";
while (<READ>) {
    my $line = $_;
    if ( $line =~ /^ +([a-f0-9]+) (.*) =/ ) {
        $syms{$2} = $1;           # remember the address of each member
    } elsif ( $line =~ /^\}/ ) {
        last;                     # end of the arc::print output
    }
}

# Set c and c_max to our maximum; set p to half of the maximum.
printf WRITE "%s/Z 0x%x\n", $syms{p}, ( $arc_max / 2 );
print scalar <READ>;
printf WRITE "%s/Z 0x%x\n", $syms{c}, $arc_max;
print scalar <READ>;
printf WRITE "%s/Z 0x%x\n", $syms{c_max}, $arc_max;
print scalar <READ>;
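For example, to cap the ARC at 512 Mbytes at boot, an init script could run (the script path is a placeholder; the argument is the ARC maximum in decimal bytes):

# /etc/init.d/arc_tune 536870912

With this value, the script sets c and c_max to 0x20000000 and p to 0x10000000, matching the mdb session shown above.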

RFEs

* ZFS should avoid growing the ARC into trouble

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6488341

* The ARC allocates memory inside the kernel cage, preventing DR

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6522017

* ZFS/ARC should cleanup more after itself

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6424665

* Each zpool needs to monitor its throughput and throttle heavy writers

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6429205

Further Reading

http://blogs.sun.com/roch/entry/does_zfs_really_use_more

ZFS File-Level Prefetching


ZFS implements a file-level prefetching mechanism called zfetch. This mechanism looks at the patterns of reads to files and anticipates some reads, reducing application wait times. The current code needs attention (RFE below) and suffers from two drawbacks:

* Sequential read patterns made of small reads very often hit in the cache. In this case, the current code consumes a significant amount of CPU time trying to find the next I/O to issue, whereas performance is governed more by the CPU availability.

* The zfetch code has been observed to limit scalability of some loads.

So, if CPU profiling, by using lockstat(1M) with the -I option or er_kernel as described here:

http://developers.sun.com/prodtech/cc/articles/perftools.html

shows significant time in zfetch_* functions, or if lock profiling with lockstat(1M) shows contention around zfetch locks, then disabling file-level prefetching should be considered.
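As a sketch of what such a check might look like (these are standard lockstat invocations, not recommendations specific to this guide), profile the kernel for 30 seconds and list the top 20 entries, then do the same for lock contention:

# lockstat -kIW -D 20 sleep 30
# lockstat -D 20 sleep 30

If zfetch_* functions or zfetch-related locks dominate the output, consider the tunings below.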

Disabling prefetching can be achieved dynamically or through a setting in the /etc/system file.

If you tune this parameter, please reference this URL in shell script or in an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#ZFETCH

Solaris 10 8/07 and Solaris Nevada (snv_51) Releases

Set dynamically:

echo zfetch_prefetch_disable/W0t1 | mdb -kw

Revert to default:

echo zfetch_prefetch_disable/W0t0 | mdb -kw

Set the following parameter in the /etc/system file:

set zfs:zfetch_prefetch_disable = 1

Earlier Solaris Releases

Set dynamically:

echo zfetch_array_rd_sz/Z0x0 | mdb -kw

Revert to default:

echo zfetch_array_rd_sz/Z0x100000 | mdb -kw

Set the following parameter in the /etc/system file:

set zfs:zfetch_array_rd_sz = 0

RFEs

* 6412053 zfetch needs some love

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6412053

* 6579975 dnode_new_blkid should first check as RW_READER

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6579975

ZFS Device I/O Queue Size (I/O Concurrency)


ZFS controls the I/O queue depth for a given LUN. The default is 35, which allows common SCSI and SATA disks to reach their maximum throughput under ZFS. However, having 35 concurrent I/Os means that the service times can be inflated. For NVRAM-based storage, it is not expected that this 35-deep queue is reached nor plays a significant role. Tuning this parameter for NVRAM-based storage is expected to be ineffective. For JBOD-type storage, tuning this parameter is expected to help response times at the expense of raw streaming throughput.

The Solaris Nevada release now has the option of storing the ZIL on separate devices from the main pool. Using separate intent log devices can alleviate the need to tune this parameter for loads that are synchronously write intensive.
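For reference, a separate intent log device is added with zpool add ... log; a minimal sketch with placeholder pool and device names:

# zpool add mypool log c1t2d0

This requires a pool version that supports separate intent log devices (pool version 7 in the zpool upgrade -v listing shown later in this document).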

If you tune this parameter, please reference this URL in shell script or in an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#MAXPEND

Tuning is not expected to be effective for NVRAM-based storage arrays.

Solaris 10 8/07 and Solaris Nevada (snv_53 to snv_69) Releases

Set dynamically:

echo zfs_vdev_max_pending/W0t10 | mdb -kw

Revert to default:

echo zfs_vdev_max_pending/W0t35 | mdb -kw

Set the following parameter in the /etc/system file:

set zfs:zfs_vdev_max_pending = 10

For earlier Solaris releases, see:

http://blogs.sun.com/roch/entry/tuning_the_knobs

RFEs

* 6471212 need reserved I/O scheduler slots to improve I/O latency of critical ops

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6471212

Further Reading

http://blogs.sun.com/perrin/entry/slog_blog_or_blogging_on

ZFS Device-Level Prefetching


ZFS does device-level prefetching in addition to file-level prefetching. When ZFS reads a block from a disk, it inflates the I/O size, hoping to pull interesting data or metadata from the disk. Prior to the Solaris Nevada (snv_70) release, this code caused problems for systems with lots of disks because the extra prefetched data can cause congestion on the channel between the storage and the host. Tuning down the prefetching has been effective for OLTP-type loads in the past. However, in the Solaris Nevada release, the code now prefetches only metadata, and this is not expected to require any tuning.

No tuning is required for snv_70 and after.

Solaris 10 8/07 and Nevada (snv_53 to snv_69) Releases

Set the following parameter in the /etc/system file:

set zfs:zfs_vdev_cache_bshift = 13

Comments:

* Setting zfs_vdev_cache_bshift with mdb crashes a system.
* zfs_vdev_cache_bshift is the base 2 logarithm of the size used to read disks. The default value of 16 means reads are issued in sizes of 1 << 16 = 64K. A value of 13 means disk reads are padded to 8K.

For earlier Solaris releases, see:

http://blogs.sun.com/roch/entry/tuning_the_knobs

RFEs

* vdev_cache wises up: increase DB performance by 16% (integrated in snv_70)

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6437054

Further Reading

http://blogs.sun.com/erickustarz/entry/vdev_cache_improvements_to_help

ZFS Cache Flushes


ZFS is designed to work with storage devices that manage a disk-level cache. ZFS commonly asks the storage device to ensure that data is safely placed on stable storage by requesting a cache flush. For JBOD storage, this works as designed and without problems. For many NVRAM-based storage arrays, a problem might come up if the array takes the cache flush request and actually does something rather than ignoring it. Some storage will flush their caches despite the fact that the NVRAM protection makes those caches as good as stable storage.

ZFS issues infrequent flushes (every 5 seconds or so) after the uberblock updates. The problem here is fairly inconsequential. No tuning is warranted here.

ZFS also issues a flush every time an application requests a synchronous write (O_DSYNC, fsync, NFS commit, and so on). The completion of this type of flush is waited upon by the application and impacts performance. Greatly so, in fact. From a performance standpoint, this neutralizes the benefits of having an NVRAM-based storage.

The upcoming fix is that the flush request semantic will be qualified to instruct storage devices to ignore the requests if they have the proper protection. This change requires a fix to our disk drivers and for the storage to support the updated semantics.

Since ZFS is not aware of the nature of the storage or whether NVRAM is present, the best way to fix this issue is to tell the storage to ignore the requests. For more information, see:

http://blogs.digitar.com/jjww/?itemid=44.

Please check with your storage vendor for ways to achieve the same thing.

As a last resort, when all LUNs exposed to ZFS come from an NVRAM-protected storage array and procedures ensure that no unprotected LUNs will be added in the future, ZFS can be tuned to not issue the flush requests. If some LUNs exposed to ZFS are not protected by NVRAM, then this tuning can lead to data loss, application-level corruption, or even pool corruption.

NOTE: Cache flushing is commonly done as part of the ZIL operations. While disabling cache flushing can, at times, make sense, disabling the ZIL does not.

Solaris 10 11/06 and Solaris Nevada (snv_52) Releases

Set dynamically:

echo zfs_nocacheflush/W0t1 | mdb -kw

Revert to default:

echo zfs_nocacheflush/W0t0 | mdb -kw

Set the following parameter in the /etc/system file:

set zfs:zfs_nocacheflush = 1

Risk: Some storage arrays might revert to working like JBOD disks when their batteries are low, for instance. Disabling cache flushing can have adverse effects here. Check with your storage vendor.

Earlier Solaris Releases

Set the following parameter in the /etc/system file:

set zfs:zil_noflush = 1

Set dynamically:

echo zil_noflush/W0t1 | mdb -kw

Revert to default:

echo zil_noflush/W0t0 | mdb -kw

Risk: Some storage arrays might revert to working like JBOD disks when their batteries are low, for instance. Disabling cache flushing can have adverse effects here. Check with your storage vendor.

RFEs

* sd driver should set SYNC_NV bit when issuing SYNCHRONIZE CACHE to SBC-2 devices

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6462690

* zil shouldn’t send write-cache-flush command …

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6460889

ZFS Command cheatsheet

Each entry below shows what you do and see (the commands and their output), followed by why (a short explanation).

$ man zpool
$ man zfs


Get familiar with command structure and options



$ su
Password:
# cd /
# mkfile 100m disk1 disk2 disk3 disk5
# mkfile 50m disk4
# ls -l disk*
-rw------T 1 root root 104857600 Sep 11 12:15 disk1
-rw------T 1 root root 104857600 Sep 11 12:15 disk2
-rw------T 1 root root 104857600 Sep 11 12:15 disk3
-rw------T 1 root root 52428800 Sep 11 12:15 disk4
-rw------T 1 root root 104857600 Sep 11 12:15 disk5


Create some “virtual devices” or vdevs as described
in the zpool documentation. These can also be real disk slices if
you have them available.



# zpool create myzfs /disk1 /disk2
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 191M 94K 191M 0% ONLINE -


Create a storage pool and check the size and usage.



# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Get more detailed status of the zfs storage pool.



# zpool destroy myzfs
# zpool list
no pools available


Destroy a zfs storage pool



# zpool create myzfs mirror /disk1 /disk4
invalid vdev specification
use '-f' to override the following errors:
mirror contains devices of different sizes


Attempt to create a zfs pool with different size vdevs fails.
Using -f options forces it to occur but only uses space allowed by
smallest device.



# zpool create myzfs mirror /disk1 /disk2 /disk3
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 95.5M 112K 95.4M 0% ONLINE -
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
/disk3 ONLINE 0 0 0

errors: No known data errors



Create a mirrored storage pool. In this case, a 3 way mirrored
storage pool.



# zpool detach myzfs /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Detach a device from a mirrored pool.



# zpool attach myzfs /disk1 /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:31:49 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
/disk3 ONLINE 0 0 0

errors: No known data errors



Attach a device to the pool. This creates a two-way mirror if the
pool is not already a mirror; otherwise it adds another device to
the mirror, in this case making it a 3-way mirror.



# zpool remove myzfs /disk3
cannot remove /disk3: only inactive hot spares can be removed
# zpool detach myzfs /disk3


Attempt to remove a device from a pool. In this case it’s
a mirror, so we must use “zpool detach”.



# zpool add myzfs spare /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
spares
/disk3 AVAIL

errors: No known data errors



Add a hot spare to a storage pool.



# zpool remove myzfs /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Remove a hot spare from a pool.



# zpool offline myzfs /disk1
# zpool status -v
pool: myzfs
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning
in a degraded state.
action: Online the device using 'zpool online' or replace the device
with 'zpool replace'.
scrub: resilver completed with 0 errors on Tue Sep 11 13:39:25 2007
config:

NAME STATE READ WRITE CKSUM
myzfs DEGRADED 0 0 0
mirror DEGRADED 0 0 0
/disk1 OFFLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Take the specified device offline. No attempt to read or write
to the device will take place until it’s brought back
online. Use the -t option to temporarily offline a device. A
reboot will bring the device back online.



# zpool online myzfs /disk1
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:47:14 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Bring the specified device online.



# zpool replace myzfs /disk1 /disk3
# zpool status -v
pool: myzfs
state: ONLINE
scrub: resilver completed with 0 errors on Tue Sep 11 13:25:48 2007
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk3 ONLINE 0 0 0
/disk2 ONLINE 0 0 0

errors: No known data errors



Replace a disk in a pool with another disk, for example when a
disk fails



# zpool scrub myzfs


Perform a scrub of the storage pool to verify that it checksums
correctly. On mirror or raidz pools, ZFS will automatically repair
any damage.
WARNING: scrubbing is I/O
intensive.



# zpool export myzfs
# zpool list
no pools available


Export a pool from the system for importing on another system.



# zpool import -d / myzfs
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
myzfs 95.5M 114K 95.4M 0% ONLINE -


Import a previously exported storage pool. If -d is not
specified, this command searches /dev/dsk. As we’re using
files in this example, we need to specify the directory of the
files used by the storage pool.



# zpool upgrade
This system is currently running ZFS pool version 8.

All pools are formatted using this version.
# zpool upgrade -v
This system is currently running ZFS pool version 8.

The following versions are supported:

VER DESCRIPTION
--- --------------------------------------------------------
1 Initial ZFS version
2 Ditto blocks (replicated metadata)
3 Hot spares and double parity RAID-Z
4 zpool history
5 Compression using the gzip algorithm
6 pool properties
7 Separate intent log devices
8 Delegated administration
For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.



Display pools format version. The -v flag shows the features
supported by the current version. Use the -a flag to upgrade all
pools to the latest on-disk version. Pools that are upgraded will
no longer be accessible to any systems running older versions.



# zpool iostat 5
capacity operations bandwidth
pool used avail read write read write
---------- ----- ----- ----- ----- ----- -----
myzfs 112K 95.4M 0 4 26 11.4K
myzfs 112K 95.4M 0 0 0 0
myzfs 112K 95.4M 0 0 0 0


Get I/O statistics for the pool



# zfs create myzfs/colin
# df -h
Filesystem kbytes used avail capacity Mounted on

myzfs/colin 64M 18K 63M 1% /myzfs/colin


Create a file system and check it with the standard df -h command.
File systems are automatically mounted by default under a path based
on the pool name (/myzfs in this example). See the Mountpoints section
of the zfs man page for more details.



# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 139K 63.4M 19K /myzfs
myzfs/colin 18K 63.4M 18K /myzfs/colin


List current zfs file systems.



# zpool add myzfs /disk1
invalid vdev specification
use '-f' to override the following errors:
mismatched replication level: pool uses mirror and new vdev is file


Attempt to add a single vdev to a mirrored set fails



# zpool add myzfs mirror /disk1 /disk5
# zpool status -v
pool: myzfs
state: ONLINE
scrub: none requested
config:

NAME STATE READ WRITE CKSUM
myzfs ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk3 ONLINE 0 0 0
/disk2 ONLINE 0 0 0
mirror ONLINE 0 0 0
/disk1 ONLINE 0 0 0
/disk5 ONLINE 0 0 0

errors: No known data errors



Add a mirrored set of vdevs



# zfs create myzfs/colin2
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 172K 159M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin2 18K 159M 18K /myzfs/colin2


Create a second file system. Note that both file systems show
159M available because no quotas are set. Each "could"
grow to fill the pool.



# zfs set reservation=20m myzfs/colin
# zfs list -o reservation
RESERV
none
20M
none


Reserve a specified amount of space for a file system ensuring
that other users don’t take up all the space.



# zfs set quota=20m myzfs/colin2
# zfs list -o quota myzfs/colin myzfs/colin2
QUOTA
none
20M


Set and view quotas



# zfs set compression=on myzfs/colin2
# zfs list -o compression
COMPRESS
off
off
on


Turn on and verify compression



# zfs snapshot myzfs/colin@test
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.2M 139M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 - 18K -
myzfs/colin2 18K 20.0M 18K /myzfs/colin2


Create a snapshot called test.



# zfs rollback myzfs/colin@test


Rollback to a snapshot.



# zfs clone myzfs/colin@test myzfs/colin3
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.2M 139M 21K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 - 18K -
myzfs/colin2 18K 20.0M 18K /myzfs/colin2
myzfs/colin3 0 139M 18K /myzfs/colin3


A snapshot is not directly addressable. A clone must be made.
The target dataset can be located anywhere in the ZFS hierarchy,
and will be created as the same type as the original.



# zfs destroy myzfs/colin2
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.1M 139M 22K /myzfs
myzfs/colin 18K 159M 18K /myzfs/colin
myzfs/colin@test 0 - 18K -
myzfs/colin3 0 139M 18K /myzfs/colin3


Destroy a filesystem



# zfs destroy myzfs/colin
cannot destroy 'myzfs/colin': filesystem has children
use '-r' to destroy the following datasets:
myzfs/colin@test


Attempt to destroy a filesystem that has a child, in this case
the snapshot. We must either remove the snapshot, or
make a clone and promote the clone.



# zfs promote myzfs/colin3
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 20.1M 139M 21K /myzfs
myzfs/colin 0 159M 18K /myzfs/colin
myzfs/colin3 18K 139M 18K /myzfs/colin3
myzfs/colin3@test 0 - 18K -
# zfs destroy myzfs/colin
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 147K 159M 21K /myzfs
myzfs/colin3 18K 159M 18K /myzfs/colin3
myzfs/colin3@test 0 - 18K -


Promote a clone filesystem so that it is no longer dependent on its
"origin" snapshot. This makes the snapshot a child of the cloned
filesystem. We can then delete the original filesystem.



# zfs rename myzfs/colin3 myzfs/bob
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 153K 159M 21K /myzfs
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@test 0 - 18K -
# zfs rename myzfs/bob@test myzfs/bob@newtest
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 146K 159M 20K /myzfs
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@newtest 0 - 18K -


Rename a filesystem, and separately rename the snapshot.



# zfs get all
NAME PROPERTY VALUE SOURCE
myzfs type filesystem -
myzfs creation Tue Sep 11 14:21 2007 -
myzfs used 146K -
myzfs available 159M -
myzfs referenced 20K -
[…]


Display properties for the given datasets. This can be refined
further using options.



# zpool destroy myzfs
cannot destroy 'myzfs': pool is not empty
use '-f' to force destruction anyway


Can’t destroy a pool with active filesystems.



# zfs unmount myzfs/bob
# df -h
myzfs 159M 20K 159M 1% /myzfs


Unmount a ZFS file system



# zfs mount myzfs/bob
# df -h
myzfs 159M 20K 159M 1% /myzfs
myzfs/bob 159M 18K 159M 1% /myzfs/bob


Mount a ZFS filesystem. This is usually automatically done on
boot.



# zfs send myzfs/bob@newtest | ssh localhost zfs receive myzfs/backup
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
myzfs 172K 159M 20K /myzfs
myzfs/backup 18K 159M 18K /myzfs/backup
myzfs/backup@newtest 0 - 18K -
myzfs/bob 18K 159M 18K /myzfs/bob
myzfs/bob@newtest 0 - 18K -


Create a stream representation of the snapshot and redirect it
to zfs receive. In this example I’ve redirected to the
localhost for illustration purposes. This can be used to backup to
a remote host, or even to a local file.
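A minimal sketch of the local-file variant (the file path and target file system name are placeholders):

# zfs send myzfs/bob@newtest > /backup/bob_newtest.zfs
# zfs receive myzfs/restored < /backup/bob_newtest.zfs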



# zpool history
History for 'myzfs':
2007-09-11.15:35:50 zpool create myzfs mirror /disk1 /disk2 /disk3
2007-09-11.15:36:00 zpool detach myzfs /disk3
2007-09-11.15:36:10 zpool attach myzfs /disk1 /disk3
2007-09-11.15:36:53 zpool detach myzfs /disk3
2007-09-11.15:36:59 zpool add myzfs spare /disk3
2007-09-11.15:37:09 zpool remove myzfs /disk3
2007-09-11.15:37:18 zpool offline myzfs /disk1
2007-09-11.15:37:27 zpool online myzfs /disk1
2007-09-11.15:37:37 zpool replace myzfs /disk1 /disk3
2007-09-11.15:37:47 zpool scrub myzfs
2007-09-11.15:37:57 zpool export myzfs
2007-09-11.15:38:05 zpool import -d / myzfs
2007-09-11.15:38:52 zfs create myzfs/colin
2007-09-11.15:39:27 zpool add myzfs mirror /disk1 /disk5
2007-09-11.15:39:38 zfs create myzfs/colin2
2007-09-11.15:39:50 zfs set reservation=20m myzfs/colin
2007-09-11.15:40:18 zfs set quota=20m myzfs/colin2
2007-09-11.15:40:35 zfs set compression=on myzfs/colin2
2007-09-11.15:40:48 zfs snapshot myzfs/colin@test
2007-09-11.15:40:59 zfs rollback myzfs/colin@test
2007-09-11.15:41:11 zfs clone myzfs/colin@test myzfs/colin3
2007-09-11.15:41:25 zfs destroy myzfs/colin2
2007-09-11.15:42:12 zfs promote myzfs/colin3
2007-09-11.15:42:26 zfs rename myzfs/colin3 myzfs/bob
2007-09-11.15:42:57 zfs destroy myzfs/colin
2007-09-11.15:43:23 zfs rename myzfs/bob@test myzfs/bob@newtest
2007-09-11.15:44:30 zfs receive myzfs/backup


Display the command history of all storage pools. This can be
limited to a single pool by specifying its name on the command
line. The history is only stored for existing pools. Once you’ve
destroyed the pool, you'll no longer have access to its
history.



# zpool destroy -f myzfs
# zpool status -v
no pools available


Use the -f option to destroy a pool with file systems created.



Thanks to http://www.lildude.co.uk/2006/09/zfs-cheatsheet/

Tuning ZFS Checksums


End-to-end checksumming is one of the great features of ZFS. It allows ZFS to detect and correct many kinds of errors that other products can't detect and correct. Disabling checksums is, of course, a very bad idea. Having file-system-level checksums enabled can alleviate the need to have application-level checksums enabled. In this case, using the ZFS checksum becomes a performance enabler.

The checksums are computed asynchronously to most application processing and should normally not be an issue. However, each pool currently has a single thread computing the checksums (RFE below) and it is possible for that computation to limit pool throughput. So, if the disk count is very large (>> 10) or a single CPU is weak (less than about 1 GHz), then this tuning might help. If a system is close to CPU saturated, the checksum computations might become noticeable. In those cases, do a run with checksums off to verify whether checksum calculation is the problem.

If you tune this parameter, please reference this URL in shell script or in an /etc/system comment.

http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Checksums

Verify the type of checksum used:

zfs get checksum

Tuning is achieved dynamically by using:

zfs set checksum=off dataset

And reverted:

zfs set checksum='on | fletcher2 | fletcher4 | sha256' dataset

The fletcher2 checksum (the default) has been observed to consume roughly 1 GHz worth of CPU when checksumming 500 MBytes per second.

RFEs

* single-threaded checksum & raidz2 parity calculations limit write bandwidth on thumper

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6533726

How do you remove a disk from a ZFS pool

Currently, there is no easy way to permanently remove a disk from a ZFS storage pool without destroying the pool. This is something the developers have given top priority for the upcoming release of Solaris 11. What you have to do is destroy the pool and recreate it without the disk you wish to remove. This can be done in three different ways.

1. Back up the data using traditional methods (tar, cpio), then destroy and recreate the pool without the disk
2. Back up the data with zfs snapshot and zfs send to a backup device (see the sketch below), then destroy and recreate the pool and restore with zfs receive
3. Clone the pool
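A rough sketch of option 2, with all pool, file system, snapshot, and device names as placeholders:

# zfs snapshot mypool/data@moveit
# zfs send mypool/data@moveit > /backup/data.zfs
# zpool destroy mypool
# zpool create mypool disk1 disk2
# zfs receive mypool/data < /backup/data.zfs

The zpool create step rebuilds the pool from the remaining devices, leaving out the disk you want to remove.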

To get the removed disk back to UFS, use format -e and relabel the disk (the responses below select the label command, an SMI label, and confirm):
# format -e

label
0
y
q

or overwrite the disk with dd:
dd if=/dev/zero of=<raw device of the removed disk>
