(This post is for Jerry. He will know when he reads it)
I have been a great supporter of many flavours of virtualisation, and my earliest experience with Xen goes back to Oracle VM 2, which was based on RHEL 4 and an early version of Xen. Why am I saying this? Because Xen is (was?) simple and elegant, especially for building RAC systems: a paravirtualised Linux guest and a dual-core machine were all you needed, since Xen is very lightweight. That said, recent advances in processor architecture (nested page tables, single root I/O virtualisation and others) make it more attractive to use hardware virtualisation with paravirtualised drivers. This is what this post is about!
Shared storage in Xen
As you know, RAC needs shared block devices for the voting disks, OCR, data files, redo logs, the lot. In Xen that's as straightforward as it gets.
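A shared image file is simply attached to more than one domU with the "w!" access mode in the disk definition. As a minimal sketch, assuming hypothetical image paths, the relevant entries in a domU configuration file could look like this:

disk=[ 'file:/var/lib/xen/images/shared/ocr1,xvdc,w!',
       'file:/var/lib/xen/images/shared/data1,xvdd,w!' ]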
I am proud to be able to speak at the first instalment of the Availability, Infrastructure and Management SIG on March 14th in the London City office.
The event is announced on the UKOUG website here:
http://www.ukoug.org/events/ukoug-availability-infrastructure-and-management-sig-meeting/
Unfortunately I will be between you and lunch! I hope that works out, and I don’t overrun.
While delivering a three day seminar in Switzerland recently, I came across a new option in cluvfy.
Normally you'd run cluvfy in preparation for the installation of Grid Infrastructure or a set of RAC binaries, to ensure everything is ready for the next step in the RAC install process. Beginning with 11.2.0.3, there is another option that's been sneaked in without too much advertisement: the healthcheck.
Part of the “comp” checks, it takes the following options:
cluvfy comp healthcheck [-collect {cluster|database}] [-db db_unique_name] [-bestpractice|-mandatory] [-deviations] [-html] [-save [-savedir directory_path]]
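For example, a hypothetical invocation that collects cluster-wide best practice findings and saves the report as HTML could look like this (the savedir path is made up):

$ cluvfy comp healthcheck -collect cluster -bestpractice -html -save -savedir /tmp/healthcheck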
The most extensive report is run without any options, as shown in the appendix (the output is too long to display at this stage of the post). You have the following options:
After a long time and lots of problems I decided to abandon openSuSE 11.4 and its xen implementation in favour of the PVOPS kernel and a different distribution.
It's been difficult to choose the right one for me; for now I'm working with Ubuntu 11.10. One reason is that it's said to be user friendly and highly customisable. It comes with all the right ingredients for running different hypervisors, including my favourite: xen.
Important update! See “Security” below.
Background on Xen
For those who don’t know the story behind xen, here’s a short summary (errors and omissions are not intentional!)
I have made a little mistake creating a RAC database for the OEM 12c repository; I now need a more lightweight solution, especially since I'm going to do some fancy failover testing with this cluster soon! An 11.2.0.3 single instance database without ASM, that's what I'll have!
Now how to move the repository database? I have to admit I haven’t done this before, so the plan I came up with is:
Sounds simple enough, and it actually was! To add a little fun to it I decided to use an NFS volume to back up to. My new database host is called oem12db, and it's running Oracle 11.2.0.3 64bit on Oracle Linux 6.1 with UEK. I created the NFS export using the following entry in /etc/exports:
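As a sketch, and with a hypothetical export directory, the entry could look something like this:

/u01/nfsbackup   oem12db(rw,sync,no_root_squash)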
As part of a server move from one data centre to another I enjoyed working in the depths of Clusterware. This one was a rather simple case though: the public IP addresses were the only part of the package to change. One caveat was the recreation of the OCR disk group I am using for the OCR and 3 copies of the voting file. I decided to rely on the backups I took before the server move.
Once the kit had been rewired in the new data centre, it was time to get active. The /etc/multipath.conf file had to be touched to add the new LUNs for my +OCR disk group. I have described the process in a number of articles, for example here:
http://martincarstenbach.wordpress.com/2011/01/14/adding-storage-dynamic...
A few facts before we start:
I have been closely involved in the upgrade discussion of my current customer's Enterprise Manager setup from an engineering point of view. The client uses OEM extensively for monitoring; alerts generated by it are automatically forwarded to an IBM product called Netcool.
Now some of the management servers are still on 10.2.0.5 in certain regions, and for a private cloud project I was involved in, an 11.1 system was needed. The big question was: wait for 12.1 or upgrade to 11.1?
As I hinted at in my last post about installing Oracle 11.2.0.3 on Oracle Linux 6.1 with Kernel UEK, I have planned another article about adding a node to a cluster.
I deliberately started the installation of my RAC system with only one node to allow my moderately spec'd hardware to deal with a second cluster node. In previous versions of Oracle there was a problem with node additions: the $GRID_HOME/oui/bin/addNode.sh script did pre-requisite checks that used to fail when you had used ASMLib. Unfortunately, due to my setup I couldn't test whether that has been solved (I didn't use ASMLib).
Cluvfy
Installing Grid Infrastructure 11.2.0.3 on Oracle Linux 6.1
Yesterday was the big day, the day Oracle released 11.2.0.3 for Linux x86 and x86-64. Time to download and experiment! The following assumes you have already configured RAC 11g Release 2 before; it's not a step-by-step guide on how to do this. I expect those to shoot out of the grass like mushrooms in the next few days, especially since the weekend allows people to do the same as I did!
The Operating System
I have prepared a xen domU for 11.2.0.3, using the latest Oracle Linux 6.1 build I could find. In summary, I am using the following settings:
Configuring Oracle Linux 6.1
Installation of the operating environment is beyond the scope of this article, and it hasn't really changed much since 5.x. All I did was install the database server package group. I wrote this article for fans of xen-based para-virtualisation; although initially written for 6.0, it applies equally to 6.1. Here's the xen native domU description (you can easily convert that to xenstore format using libvirt):
# cat node1.cfg
name="rac11_2_0_3_ol61_node1"
memory=4096
maxmem=8192
vcpus=4
on_poweroff="destroy"
on_reboot="restart"
on_crash="destroy"
localtime=0
builder="linux"
bootargs=""
extra=" "
disk=[ 'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/disk0,xvda,w',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_node1/oracle,xvdb,w',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr1,xvdc,w!',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr2,xvdd,w!',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/ocr3,xvde,w!',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/data1,xvdf,w!',
       'file:/var/lib/xen/images/rac11_2_0_3_ol61_shared/fra1,xvdg,w!'
     ]
vif=[ 'mac=00:16:1e:2b:1d:ef,bridge=br1',
      'mac=00:16:1e:2b:1a:e1,bridge=br2',
      'mac=00:16:1e:2a:1d:1f,bridge=br3',
    ]
bootloader = "pygrub"
Use the “xm create node1.cfg” command to start the domU. After the OS was ready I installed the following additional software to satisfy the installation requirements:
This is most easily done via yum and the public YUM server Oracle provides, which also has instructions on how to set up your repository.
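As a sketch of the repository setup, assuming the URL that was current at the time of writing (check Oracle's public YUM page for the exact instructions):

# cd /etc/yum.repos.d
# wget http://public-yum.oracle.com/public-yum-ol6.repo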
# yum install compat-libcap1 compat-libstdc++-33 libstdc++-devel gcc-c++ ksh libaio-devel
On the first node only I wanted a VNC-like interface for a graphical installation. The older vnc-server package I loved from 5.x isn't available anymore; the package you need is now called tigervnc-server. It also requires a new viewer, to be downloaded from SourceForge. On the first node you might want to install these, unless you are brave enough to use a silent installation:
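A minimal sketch of what to install for a VNC-based session; the extra X packages are my assumption, adjust to taste:

# yum install tigervnc-server xterm xorg-x11-xauth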
Ensure that SELinux and iptables are turned off. SELinux is still configured in /etc/sysconfig/selinux, where the setting has to be at least permissive. You can use "chkconfig iptables off" to disable the firewall service at boot, and check that there are no filter rules using "iptables -L".
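A quick sketch of those steps, run as root (the sed command is just one way of editing the SELinux configuration):

# sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/sysconfig/selinux
# chkconfig iptables off
# service iptables stop
# iptables -L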
I created the oracle account using the usual steps; this hasn't changed since 11.2.0.2.
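For completeness, a sketch of those usual steps. The numeric IDs are arbitrary assumptions, and the group layout assumes a single oracle owner for Grid Infrastructure and the database (asmdba matches the group used in the udev rules below):

# groupadd -g 501 oinstall
# groupadd -g 502 dba
# groupadd -g 503 asmdba
# useradd -u 501 -g oinstall -G dba,asmdba oracle
# passwd oracle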
A few changes to /etc/sysctl.conf were needed; you can copy and paste the example below and append it to your existing settings. Be sure to raise the limits where you have more resources!
kernel.shmall = 4294967296
kernel.shmmni = 4096
kernel.sem = 250 32000 100 128
fs.file-max = 6815744
net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
fs.aio-max-nr = 1048576
net.ipv4.ip_local_port_range = 9000 65500
net.ipv4.conf.eth1.rp_filter = 0
net.ipv4.conf.eth2.rp_filter = 0
Also ensure that you set rp_filter for your private interconnect interfaces to 0 (or 2); my devices are eth1 and eth2. This reverse path filtering requirement is new in 11.2.0.3.
ASM “disks” must be owned by the GRID owner. The easiest way to change the permissions of the ASM disks is to create a new set of udev rules, such as the following:
# cat 61-asm.rules
KERNEL=="xvd[cdefg]1", OWNER="oracle", GROUP="asmdba", MODE="0660"
After a quick “start_udev” as root these were applied.
Note that, as per my domU config file, I know the device names are persistent, so it was easy to come up with this solution. In real life you would use the dm-multipath package, which now allows setting the owner, group and permissions for every ASM LUN in /etc/multipath.conf.
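A hedged sketch of such a multipaths entry, with a made-up WWID and alias. Older device-mapper-multipath releases accept uid, gid and mode attributes (later versions deprecate them in favour of udev rules), and depending on the version you may have to supply numeric IDs instead of names:

multipaths {
    multipath {
        wwid    3600144f0deadbeef00000000000010a1
        alias   asm_data01
        uid     oracle
        gid     asmdba
        mode    660
    }
}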
There was an interesting problem initially in that kfod seemed to trigger a change of permissions back to root:disk whenever it ran. Changing the ownership back to oracle only lasted until the next execution of kfod. The only fix I could come up with involved the udev rules.
Good news for those who suffered from the multicast problem introduced in 11.2.0.2: cluvfy now knows about it and checks for it during the post hwos stage (I had already installed cvuqdisk):
[oracle@rac11203node1 grid]$ ./runcluvfy.sh stage -post hwos -n rac11203node1

Performing post-checks for hardware and operating system setup

Checking node reachability...
Node reachability check passed from node "rac11203node1"

Checking user equivalence...
User equivalence check passed for user "oracle"

Checking node connectivity...

Checking hosts config file...
Verification of the hosts config file successful

Node connectivity passed for subnet "192.168.99.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.99.0"

Node connectivity passed for subnet "192.168.100.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.100.0"

Node connectivity passed for subnet "192.168.101.0" with node(s) rac11203node1
TCP connectivity check passed for subnet "192.168.101.0"

Interfaces found on subnet "192.168.99.0" that are likely candidates for VIP are:
rac11203node1 eth0:192.168.99.129

Interfaces found on subnet "192.168.100.0" that are likely candidates for a private interconnect are:
rac11203node1 eth1:192.168.100.129

Interfaces found on subnet "192.168.101.0" that are likely candidates for a private interconnect are:
rac11203node1 eth2:192.168.101.129

Node connectivity check passed

Checking multicast communication...

Checking subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.99.0" for multicast communication with multicast group "230.0.1.0" passed.

Checking subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.100.0" for multicast communication with multicast group "230.0.1.0" passed.

Checking subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0"...
Check of subnet "192.168.101.0" for multicast communication with multicast group "230.0.1.0" passed.

Check of multicast communication passed.
Check for multiple users with UID value 0 passed
Time zone consistency check passed

Checking shared storage accessibility...

  Disk                                  Sharing Nodes (1 in count)
  ------------------------------------  ------------------------
  /dev/xvda                             rac11203node1
  /dev/xvdb                             rac11203node1
  /dev/xvdc                             rac11203node1
  /dev/xvdd                             rac11203node1
  /dev/xvde                             rac11203node1
  /dev/xvdf                             rac11203node1
  /dev/xvdg                             rac11203node1

Shared storage check was successful on nodes "rac11203node1"

Post-check for hardware and operating system setup was successful.
As always, I tried to fix as many problems as possible before invoking runInstaller. The "-fixup" option to runcluvfy is again very useful, and I strongly recommend running the fixup script prior to executing the OUI binary.
The old trick of removing /etc/ntp.conf causes the NTP check to complete ok, in which case you get the Cluster Time Synchronisation Service (ctssd) for time synchronisation. You should not do this in production: consistent times across the cluster are paramount!
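If you do go down that route, you can check which mode the Cluster Time Synchronisation Service is running in (active, or observer mode when NTP is detected), for example with:

$ crsctl check ctss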
During my first attempts I encountered an issue with the check for free space later in the installation. OUI wants 7.5G for the GRID_HOME, even though the installation "only" took around 3G in the end. I exported TMP and TEMP to point to my 10G mount point to avoid this warning:
$ export TEMP=/u01/crs/temp
$ export TMP=/u01/crs/temp
$ ./runInstaller
The installation procedure for Grid Infrastructure 11.2.0.3 is almost exactly the same as for 11.2.0.2, except for the option to change the AU size for the initial disk group you create:
Once you have completed the wizard, it’s time to hit the “install” button. The magic again happens in the root.sh file, or rootupgrade.sh if you are upgrading. I included the root.sh output so you have something to compare against:
Performing root user operation for Oracle 11g

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME=  /u01/crs/11.2.0.3

Enter the full pathname of the local bin directory: [/usr/local/bin]:
Creating y directory...
Copying dbhome to y ...
Copying oraenv to y ...
Copying coraenv to y ...

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/crs/11.2.0.3/crs/install/crsconfig_params
Creating trace directory
User ignored Prerequisites during installation
OLR initialization - successful
root wallet
root wallet cert
root cert export
peer wallet
profile reader wallet
pa wallet
peer wallet keys
pa wallet keys
peer cert request
pa cert request
peer cert
pa cert
peer root cert TP
profile reader root cert TP
pa root cert TP
peer pa cert TP
pa peer cert TP
profile reader pa cert TP
profile reader peer cert TP
peer user cert
pa user cert
Adding Clusterware entries to upstart
CRS-2672: Attempting to start 'ora.mdnsd' on 'rac11203node1'
CRS-2676: Start of 'ora.mdnsd' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'rac11203node1'
CRS-2676: Start of 'ora.gpnpd' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac11203node1'
CRS-2672: Attempting to start 'ora.gipcd' on 'rac11203node1'
CRS-2676: Start of 'ora.gipcd' on 'rac11203node1' succeeded
CRS-2676: Start of 'ora.cssdmonitor' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'rac11203node1'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac11203node1'
CRS-2676: Start of 'ora.diskmon' on 'rac11203node1' succeeded
CRS-2676: Start of 'ora.cssd' on 'rac11203node1' succeeded

ASM created and started successfully.

Disk Group OCR created successfully.

clscfg: -install mode specified
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
CRS-4256: Updating the profile
Successful addition of voting disk 1621f2201ab94f32bf613b17f62982b0.
Successful addition of voting disk 337a3f0b8a2d4f7ebff85594e4a8d3cd.
Successful addition of voting disk 3ae328cce2b94f3bbfe37b0948362993.
Successfully replaced voting disk group with +OCR.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
##  STATE    File Universal Id                File Name    Disk group
--  -----    -----------------                ---------    ---------
 1. ONLINE   1621f2201ab94f32bf613b17f62982b0 (/dev/xvdc1) [OCR]
 2. ONLINE   337a3f0b8a2d4f7ebff85594e4a8d3cd (/dev/xvdd1) [OCR]
 3. ONLINE   3ae328cce2b94f3bbfe37b0948362993 (/dev/xvde1) [OCR]
Located 3 voting disk(s).
CRS-2672: Attempting to start 'ora.asm' on 'rac11203node1'
CRS-2676: Start of 'ora.asm' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.OCR.dg' on 'rac11203node1'
CRS-2676: Start of 'ora.OCR.dg' on 'rac11203node1' succeeded
CRS-2672: Attempting to start 'ora.registry.acfs' on 'rac11203node1'
CRS-2676: Start of 'ora.registry.acfs' on 'rac11203node1' succeeded
Configure Oracle Grid Infrastructure for a Cluster ... succeeded
That's it! After returning to the OUI screen you run the remaining assistants and are finally rewarded with the success message:
Better still, I could now log in to SQL*Plus and was rewarded with the new version:
$ sqlplus / as sysasm

SQL*Plus: Release 11.2.0.3.0 Production on Sat Sep 24 22:29:45 2011

Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 - Production

SQL>
Summary
You might remark that only one node has ever been referenced in the output. That is correct: my lab box has limited resources, and I'd like to test the addNode.sh script for each new release, so please be patient! I'm planning an article about upgrading to 11.2.0.3 soon, as well as one about adding a node. One thing I noticed was the abnormally high CPU usage of the CSSD processes (ocssd.bin, cssdagent and cssdmonitor), something I find alarming at the moment.
top - 22:53:19 up  1:57,  5 users,  load average: 5.41, 4.03, 3.77
Tasks: 192 total,   1 running, 191 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.3%us,  0.2%sy,  0.0%ni, 99.5%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   4102536k total,  3500784k used,   601752k free,    59792k buffers
Swap:  1048568k total,     4336k used,  1044232k free,  2273908k cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
27646 oracle    RT   0 1607m 119m  53m S 152.0  3.0  48:57.35 /u01/crs/11.2.0.3/bin/ocssd.bin
27634 root      RT   0  954m  93m  55m S 146.0  2.3  31:45.50 /u01/crs/11.2.0.3/bin/cssdagent
27613 root      RT   0  888m  91m  55m S  96.6  2.3 5124095h /u01/crs/11.2.0.3/bin/cssdmonitor
28110 oracle    -2   0  485m  14m  12m S   1.3  0.4   0:34.65 asm_vktm_+ASM1
28126 oracle    -2   0  499m  28m  15m S   0.3  0.7   0:04.52 asm_lms0_+ASM1
28411 root      RT   0  500m 144m  59m S   0.3  3.6 5124095h /u01/crs/11.2.0.3/bin/ologgerd -M -d /u01/crs/11.2.0.3/crf/db/rac11203node1
32394 oracle    20   0 15020 1300  932 R   0.3  0.0 5124095h top
    1 root      20   0 19336 1476 1212 S   0.0  0.0   0:00.41 /sbin/init
...
11.2.0.2 certainly didn’t use that much CPU across 4 cores…
Update: I have just repeated the same installation on VirtualBox 4.1.2 with less potent hardware, and funnily enough the CPU problem has disappeared. How is that possible? I need to understand more, and maybe update the Xen host to something more recent.
Some of you may have seen on Twitter that I was working on understanding collectl. So why did I start with this? First of all, I was after a tool that records a lot of information on a Linux box. It can also play information back, but that is outside the scope of this introduction.
In the past I have used nmon to do similar things, and I still love it for what it does. Especially in conjunction with the nmon-analyzer, an Excel plug-in, it can create very impressive reports. How does collectl compare?
Getting collectl
Getting collectl is quite easy; download it from SourceForge: http://sourceforge.net/projects/collectl/
The project website including very good documentation is available from sourceforge as well, but uses a slightly different URL: http://collectl.sourceforge.net/
I suggest you get the architecture-independent RPM and install it on your system. This is all you need to get started! The impatient could type "collectl" at the command prompt now to get some information. Let's have a look at the output:
$ collectl
waiting for 1 second sample...
#<--------CPU--------><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
   1   0  1163  10496    113     14     18      4      8     55      5      19
   0   0  1046  10544      0      0      2      3    164    195     30      60
   0   0  1279  10603    144      9    746    148     20     67     11      19
   3   0  1168  10615    144      9    414     69     14     69      5      20
   1   0  1121  10416    362     28    225     19     11     71      8      35
Ouch!
The "ouch" was caused by my CTRL-C stopping the execution.
Collectl is organised around subsystems; the default is to print aggregated information for the CPU, disk and network subsystems.
If you don't know what information you are after, you could use the --all flag to display aggregated information across all subsystems. Be warned that you need a large screen for all that output! For even more output, add the --verbose flag to the --all option, and you need a 22" screen at least. The verbose flag prints more detail, as the name suggests. For the disk subsystem you can view the difference:
$ collectl -sd -i 5 --verbose
waiting for 5 second sample...

# DISK SUMMARY (/sec)
#KBRead RMerged  Reads SizeKB  KBWrite WMerged Writes SizeKB
    162     136     10     15      187      30     19      9
    109      24      9     11      566     118     23     24
Ouch!

$ collectl -sd -i 5
waiting for 5 second sample...
#<----------Disks----------->
#KBRead  Reads KBWrit Writes
   9865     73    190     23
Ouch!
Each subsystem can be queried individually; the default monitoring interval is 1 second. The man page for collectl lists the following subsystems:
SUMMARY SUBSYSTEMS

b - buddy info (memory fragmentation)
c - CPU
d - Disk
f - NFS V3 Data
i - Inode and File System
j - Interrupts
l - Lustre
m - Memory
n - Networks
s - Sockets
t - TCP
x - Interconnect
y - Slabs (system object caches)
As the name suggests, these sub systems provide summary information. Summaries are ok for a first overview, but don’t forget that information is aggregated and detail is lost.
From an Oracle point of view I’d probably be most interested in the CPU, disk and memory usage. If you are using RAC, network usage can also be interesting.
Detailed subsystem information is available for these (again taken from the excellent manual page):
C - CPU
D - Disk
E - Environmental data (fan, power, temp), via ipmitool
F - NFS Data
J - Interrupts
L - Lustre OST detail OR client Filesystem detail
N - Networks
T - 65 TCP counters only available in plot format
X - Interconnect
Y - Slabs (system object caches)
Z - Processes
You can combine subsystems, and you can combine detail and summary information. Bear in mind though that this becomes a lot of information for a PuTTY session or gnome-terminal!
In interactive mode, you might want to consider the --home flag, which does a top-like refresh and prints real time information without scrolling: very neat!
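For instance, something like the following should give a continuously refreshing view of the detailed disk statistics (my guess at a sensible combination):

$ collectl -sD -i 5 --home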
But even with the --home option, digesting all that information visually can be a bit daunting, which leads me to my next section.
Generating graphical output
While all the textual information is nice and good, it is difficult to visualise. Collectl can help you with that as well. All you need to do is generate a file in tab format, which is as simple as adding the -P and -f options. Since the information gathered in a file (unlike on standard out) can't overwhelm you, you could use the detail switches. If you have the luxury, create the file in a directory exported via Samba and analyse it with Excel or other utilities. It's possible to use gnuplot as well, but I found that a bit lacking for interactive use. The collectl-utils provide a CGI script to analyse collectl files on the host, which can be convenient. Here is an example for measuring disk, memory and network with a monitoring interval of 15 seconds. The file will be in "Plot" format (-P) and goes to /export/collectl/plotfiles:
$ collectl -sdmn -i 15 -P -f /export/collectl/plotfiles
Note that you can't use the verbose flag here, and you also shouldn't use a file name with the -f switch!
The resulting file is called hostname-yyyymmdd.tab. After renaming it to hostname-yyyymmdd.txt it can quite easily be imported into your favourite spreadsheet application. Imagine all the graphs you could produce with it! The header also contains interesting information:
################################################################################
# Collectl:   V3.5.1-1  HiRes: 1  Options: -sdmn -i 15 -P -f /export/collectl/plotfiles
# Host:       node1  DaemonOpts:
# Distro:     Red Hat Enterprise Linux Server release 5.5 (Tikanga)  Platform:
# Date:       20110805-142647  Secs: 1312550807 TZ: +0100
# SubSys:     dmn Options: z Interval: 1 NumCPUs: 16 NumBud: 0 Flags: i
# Filters:    NfsFilt:  EnvFilt:
# HZ:         100  Arch: x86_64-linux-thread-multi PageSize: 4096
# Cpu:        AuthenticAMD Speed(MHz): 2210.190 Cores: 4  Siblings: 4
# Kernel:     2.6.18-194.el5  Memory: 65990460 kB  Swap: 16809976 kB
# NumDisks:   173 DiskNames: c0d0 sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp sdq sdr sds sdt sdu sdv sdw sdx sdy sdz sdaa sdab sdac sdad sdae sdaf sdag sdah sdai sdaj sdak sdal sdam sdan sdao sdap sdaq sdar sdas sdat sdau sdav sdaw sdax sday sdaz sdba sdbb sdbc sdbd sdbe sdbf sdbg sdbh sdbi sdbj sdbk sdbl sdbm sdbn sdbo sdbp sdbq sdbr sdbs sdbt sdbu sdbv sdbw sdbx sdby sdbz sdca sdcb sdcc sdcd sdce sdcf sdcg sdch sdci sdcj sdck sdcl sdcm sdcn sdco sdcp sdcq sdcr sdcs sdct sdcu sdcv sdcw sdcx sdcy sdcz sdda sddb sddc sddd sdde sddf sddg dm-0 dm-1 dm-2 dm-3 dm-4 dm-5 dm-6 dm-7 dm-8 dm-9 dm-10 dm-11 dm-12 dm-13 dm-14 dm-15 dm-16 dm-17 dm-18 dm-19 dm-20 dm-21 dm-22 dm-23 dm-24 dm-25 dm-26 dm-27 dm-28 dm-29 dm-30 dm-31 dm-32 dm-33 dm-34 dm-35 dm-36 dm-37 dm-38 dm-39 dm-40 dm-41 dm-42 dm-43 dm-44 dm-45 dm-46 dm-47 dm-48 dm-49 dm-50 dm-51 dm-52 dm-53 dm-54 dm-55 dm-56 dm-57 dm-58 dm-59 dm-60
# NumNets:    8 NetNames: lo: eth0: eth1: eth2: eth3: sit0: bond0: bond1:
# SCSI:       DA:0:00:00: ... DA:2:00:00:00
################################################################################
This should be enough to remind you of where you were running this test.
Run duration and interval
Use the -i flag to change the monitoring interval, just as you would with sar or iostat/vmstat and the like. You can then either use the -c option to count n samples, or alternatively use -R to run for n weeks, days, hours, minutes or seconds, each of which is abbreviated with its first letter. For example, to run for 15 minutes with samples taken every 15 seconds, you'd say collectl -i 15 -R 15m.
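Given that 15 minutes at 15 second intervals works out at 60 samples, the following two invocations should be equivalent:

$ collectl -i 15 -R 15m
$ collectl -i 15 -c 60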
Quick and dirty
If you need an interactive, top-style overview of what's going on, you can use the --top flag. This will print output very similar to the top command, but this time you have a lot more options to sort on. Use collectl --showtopopts. This is so cool that I couldn't help listing the options here:
$ collectl --showtopopts
The following is a list of --top's sort types which apply to either
process or slab data.  In some cases you may be allowed to sort
by a field that is not part of the display if you so desire

TOP PROCESS SORT FIELDS

Memory
  vsz    virtual memory
  rss    resident (physical) memory

Time
  syst   system time
  usrt   user time
  time   total time

I/O
  rkb    KB read
  wkb    KB written
  iokb   total I/O KB

  rkbc   KB read from pagecache
  wkbc   KB written to pagecache
  iokbc  total pagecacge I/O
  ioall  total I/O KB (iokb+iokbc)

  rsys   read system calls
  wsys   write system calls
  iosys  total system calls
  iocncl Cancelled write bytes

Page Faults
  majf   major page faults
  minf   minor page faults
  flt    total page faults

Miscellaneous (best when used with --procfilt)
  cpu    cpu number
  pid    process pid
  thread total process threads (not counting main)

TOP SLAB SORT FIELDS

  numobj  total number of slab objects
  actobj  active slab objects
  objsize sizes of slab objects
  numslab number of slabs
  objslab number of objects in a slab
  totsize total memory sizes taken by slabs
  totchg  change in memory sizes
  totpct  percent change in memory sizes
  name    slab names
Filtering information
Let's say you are running multiple ASM disk groups in your system, but you are only interested in the performance of disk group DATA. The -sD flag will print the information for all disks (LUNs) of the system. Collectl reports disks as both the native devices and the dm- devices; for multipathed devices you obviously want to look at the dm- device. You can use the multipath -ll command to map dm- devices to WWIDs and ultimately to your disks. Let's say you found out that the disks you need to look at are /dev/dm-{1,3,5,8}; you could then use the --dskfilt flag, which takes a perl regex. In my example, I could use the following command to check on those disks:
collectl -sD -c 1 --dskfilt "dm-(1\b|3\b|5\b|8\b)"
waiting for 1 second sample...

# DISK STATISTICS (/sec)
#          <---------reads---------><---------writes---------><--------averages--------> Pct
#Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
dm-1             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-3             0      0    0    0       0      0    0    0       0     0     0      0    0
dm-5             0      0    0    0       0      0    1    1       0     0     0      0    0
dm-8             0      0    0    0       0      0    0    0       0     0     0      0    0
$
Note the "\b" word boundary, which is my uneducated way of saying that the expression should match dm-1 but not dm-10, or anything else that extends beyond the number one.
Additional filters you can apply can be found in the output of collectl --showsubopts, as well as in the subsystem options section of the manpage.
Summary
Used correctly, collectl is the Swiss army knife for system monitoring; the level of detail that can be gathered is breathtaking. Thanks, Mark Seger! And apologies for all the good stuff I've been missing!