ACFS 12.2 Quirks with defragmentation

I have a 12.2 RAC cluster in my lab, on two virtual machines. Checking out what acfsutil can do, I figured out that it can supposedly do defragmentation:

[grid@rac1 ~]$ acfsutil defrag -help

Usage: acfsutil [-h] command ... # (Version 12.2.0.1.0)

-h - help

Command Subcmd Arguments
--------------- --------- ------------------------------------------------------
 defrag dir [-r] <path>
 defrag dir Defrag all files in a specific directory 
 defrag dir [-r] recursively traverse directories
 defrag dir <path> path to a specific dir
 defrag file [-i] <path> [<path> ...]
 defrag file Defrag specific files 
 defrag file [-i] Estimate reduction in number of extents possible
 defrag file <path> path to a specific file

 

So, let’s try it out:

[grid@rac1 ~]$ sudo su - oracle
Last login: Mon Aug 28 18:52:05 EDT 2017 from 192.168.1.100 on pts/0
[oracle@rac1 ~]$ acfsutil defrag -r /oradata
acfsutil INVALID: Version 12.2.0.1.0

I switched to “oracle” because the /oradata mount point is owned by that user.

[oracle@rac1 ~]$ ls -ld /oradata
drwxrwxr-x 7 oracle grid 32768 Aug 28 01:47 /oradata

I tried as root as well, but no cigar: the error message was exactly the same. So it was time to take a closer look at the message itself, which claims that version 12.2.0.1.0 is invalid. OK, time to look at ASM:

 

[grid@rac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.2.0.1.0 Production on Thu Aug 31 21:40:28 2017

Copyright (c) 1982, 2016, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> column database_compatibility format a20
SQL> column compatibility format a20
SQL> column name format a15
SQL> select name,compatibility,database_compatibility from v$asm_diskgroup;

NAME            COMPATIBILITY        DATABASE_COMPATIBILI
--------------- -------------------- --------------------
ACFS            12.2.0.1.0           10.1.0.0.0
MGMT            12.2.0.1.0           10.1.0.0.0
OCR             12.2.0.1.0           10.1.0.0.0

The compatibility of all disk groups is correctly set to the software release, 12.2.0.1.0, as is clearly visible from the SQL*Plus header. The database compatibility doesn’t matter here; it simply marks the lowest database release that can use the disk group. Time to check out the ACFS file system itself:

[grid@rac1 ~]$ acfsutil compat get /oradata
/oradata
 compatible.advm: 12.2.0.0.0
 ACFS compatibility: 12.2.0.0.0
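
The same attributes can also be cross-checked from the ASM side. A minimal sketch I would run (V$ASM_ATTRIBUTE lists the compatible.asm, compatible.rdbms and compatible.advm attributes per disk group):

SQL> select g.name, a.name, a.value
     from v$asm_diskgroup g, v$asm_attribute a
     where g.group_number = a.group_number
     and a.name like 'compatible%';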

So, ACFS compatibility is set to 12.2.0.0.0, which is an invalid version, as the error message correctly notes. However, the only way to change that compatibility is to run “mkfs” with the correct options or, in common vernacular, to “reformat the file system”. I will not do that, because it would destroy my database. I consider this a serious bug: Oracle should not set the default file system compatibility to an invalid version. Here are the arguments for mkfs.acfs:

[grid@rac1 ~]$ mkfs.acfs
mkfs.acfs: Create an ACFS file system.
mkfs.acfs: Usage: [-h] [-v] [-f] [-i <metasize>] [-n <name>] [-a <volume>] [-c <release>] <volume> [<size>]
mkfs.acfs: [-h] - Prints out this usage message
mkfs.acfs: [-v] - Verbose mode
mkfs.acfs: [-f] - Force file system format, use to overwrite an existing file system
mkfs.acfs: [-i <metasize>] - Metadata block size
mkfs.acfs: [-n <name>] - Name of file system to be created
mkfs.acfs: [-a <volume>] - ACFS accelerator volume pathname
mkfs.acfs: [-c <release>] - Create ACFS file system compatible with specified release (For example, 12.2.0.0.0)
mkfs.acfs: <volume> - ACFS primary volume pathname
mkfs.acfs: [<size>] - Desired size of the file system in bytes or in units of K|M|G|T|P
mkfs.acfs: Version 12.2.0.1.0

The culprit is the “-c” argument. The funniest thing is that the allegedly invalid version is even listed on the help screen as an example. It looks like the defrag utility will stay out of reach for now.
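
For the record, if I were willing to sacrifice the data, re-creating the file system with an explicit compatibility setting would presumably look something like the sketch below. The volume name is made up, -f would overwrite the existing file system, and whether -c accepts the full 12.2.0.1.0 string (rather than only the 12.2.0.0.0 shown in the help) is exactly what I cannot verify without destroying my database:

[grid@rac1 ~]$ mkfs.acfs -f -c 12.2.0.1.0 /dev/asm/oradata-123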


RAC 12.2, part 2

Well, today was an interesting day. My ASM, configured with so much effort, stopped working:

[root@rac1 grid]# . ~grid/.bashrc
[root@rac1 grid]# crsctl start ohasd
CRS-4640: Oracle High Availability Services is already active
CRS-4000: Command Start failed, or completed with errors.
[root@rac1 grid]# crsctl start cluster
CRS-2672: Attempting to start 'ora.crf' on 'rac1'
CRS-2672: Attempting to start 'ora.storage' on 'rac1'
CRS-2676: Start of 'ora.crf' on 'rac1' succeeded
ORA-27140: attach to post/wait facility failed
ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
ORA-27301: OS failure message: OperatCRS-2674: Start of 'ora.storage' on 'rac1' failed
CRS-2679: Attempting to clean 'ora.storage' on 'rac1'
CRS-2681: Clean of 'ora.storage' on 'rac1' succeeded
CRS-2673: Attempting to stop 'ora.crf' on 'rac1'
CRS-2677: Stop of 'ora.crf' on 'rac1' succeeded
CRS-4000: Command Start failed, or completed with errors.
[root@rac1 grid]#

After digging through the log files, I found the following message:

ORA-27302: failure occurred at: sskgpreset1
 2017-08-27 18:26:16.056 : OCRRAW:1181941824: -- trace dump on error exit --
 2017-08-27 18:26:16.056 : OCRRAW:1181941824: Error [kgfoAl06] in [kgfokge] at kgfo.c:3083
 2017-08-27 18:26:16.056 : OCRRAW:1181941824: ORA-27140: attach to post/wait facility failed
 ORA-27300: OS system dependent operation:invalid_egid failed with status: 1
 ORA-27301: OS failure message: Operat
 2017-08-27 18:26:16.056 : OCRRAW:1181941824: Category: 7
 2017-08-27 18:26:16.056 : OCRRAW:1181941824: DepInfo: 27140
 2017-08-27 18:26:16.057 : OCRRAW:1181941824: -- trace dump end --
 2017-08-27 18:26:16.057 : OCRASM:1181941824: proprasmo: kgfoCheckMount returned [7]
 2017-08-27 18:26:16.057 : OCRASM:1181941824: proprasmo: The ASM instance is down

There is a problem connecting to the ASM instance, which is actually up and running:

[grid@rac1 ~]$ ORACLE_SID="+ASM1"
[grid@rac1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.2.0.1.0 Production on Sun Aug 27 18:29:48 2017

Copyright (c) 1982, 2016, Oracle. All rights reserved.




Connected to:
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> select name,state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
ACFS                           MOUNTED
MGMT                           MOUNTED
OCR                            MOUNTED

Now, I was at a loss as to why CRS could not connect to the ASM instance. The user “grid” has all the necessary groups:

[grid@rac1 ~]$ id
uid=1000(grid) gid=1000(grid) groups=1000(grid),10(wheel),984(vboxsf),54321(oinstall),54331(asmadmin),54332(asmdba)

The content of the $GRID_HOME/rdbms/lib/config.c looks like this:

/*
 * Assembler will not parse a file past the .end directive
 */
#endif

#define SS_DBA_GRP "asmdba"
#define SS_OPER_GRP ""
#define SS_ASM_GRP "asmadmin"
#define SS_BKP_GRP "asmadmin"
#define SS_DGD_GRP "asmadmin"
#define SS_KMT_GRP "asmadmin"
#define SS_RAC_GRP "asmadmin"

User “grid” is a member of all the groups listed above. So, let’s see which group the ASM instance is actually running as:

[root@rac1 lib]# ps -G asmdba
 PID TTY TIME CMD
[root@rac1 lib]# ps -G asmadmin
 PID TTY TIME CMD
[root@rac1 lib]# ps -G grid
 PID TTY TIME CMD
 2401 ? 00:00:00 sshd
 2402 pts/0 00:00:00 bash
 2576 ? 00:00:04 oraagent.bin
 2595 ? 00:00:01 mdnsd.bin
 2597 ? 00:00:05 evmd.bin
 2646 ? 00:00:02 gpnpd.bin
 2682 ? 00:00:01 evmlogger.bin
 2695 ? 00:00:04 gipcd.bin
 2940 ? 00:00:06 ocssd.bin
 3121 ? 00:00:00 asm_pmon_+asm1
 3127 ? 00:00:00 asm_clmn_+asm1
 3130 ? 00:00:00 asm_psp0_+asm1
 3133 ? 00:00:00 asm_vktm_+asm1
 3137 ? 00:00:01 asm_gen0_+asm1
 3141 ? 00:00:00 asm_mman_+asm1
 3143 ? 00:00:00 asm_scmn_+asm1
 3147 ? 00:00:01 asm_diag_+asm1
 3149 ? 00:00:00 asm_ping_+asm1
 3151 ? 00:00:00 asm_pman_+asm1
 3153 ? 00:00:02 asm_dia0_+asm1
 3155 ? 00:00:01 asm_lmon_+asm1
 3157 ? 00:00:01 asm_lmd0_+asm1
 3159 ? 00:00:03 asm_lms0_+asm1
 3163 ? 00:00:01 asm_lmhb_+asm1
 3165 ? 00:00:00 asm_lck1_+asm1
 3167 ? 00:00:00 asm_dbw0_+asm1
 3169 ? 00:00:00 asm_lgwr_+asm1
 3171 ? 00:00:00 asm_ckpt_+asm1
 3173 ? 00:00:00 asm_smon_+asm1
 3175 ? 00:00:00 asm_lreg_+asm1
 3177 ? 00:00:00 asm_pxmn_+asm1
 3179 ? 00:00:00 asm_rbal_+asm1
 3181 ? 00:00:00 asm_gmon_+asm1
 3183 ? 00:00:00 asm_mmon_+asm1
 3185 ? 00:00:00 asm_mmnl_+asm1
 3187 ? 00:00:02 asm_imr0_+asm1
 3189 ? 00:00:00 asm_lck0_+asm1
 3195 ? 00:00:02 asm_gcr0_+asm1
 6972 ? 00:00:00 sshd
 6973 pts/1 00:00:00 bash
 9317 pts/1 00:00:00 adrci
 9907 ? 00:00:00 sshd
 9908 pts/2 00:00:00 bash
10610 pts/2 00:00:00 perl
10646 ? 00:00:00 oracle_10646_+a
[root@rac1 lib]#
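
A more direct way to check which group a particular background process is running under is to ask ps for it explicitly. A small sketch (the pgrep pattern is my assumption):

[root@rac1 lib]# ps -o pid,user,group,comm -p $(pgrep -f asm_pmon)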

So, the ASM instance was running under the group “grid”, while the SYSASM group is “asmadmin”. I did make a configuration mistake by leaving the default group “grid” as the primary group for the user “grid”, so the ASM instance was started as “grid:grid”. The first possible solution was to change the primary group of the user “grid” and hope for the best; if that failed, I would have had to re-install. The trick that actually got everything back up was to allow group access to the SGA:

SQL> show parameter allow_group

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
allow_group_access_to_sga            boolean     TRUE
SQL>
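
The change itself boils down to one statement, roughly like the sketch below. The parameter is static, so it has to go into the SPFILE and the instance has to be bounced; the SID clause is my assumption, matching the +ASM1 instance shown above:

SQL> alter system set allow_group_access_to_sga=TRUE scope=spfile sid='+ASM1';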

After setting that in the SPFILE for a single ASM instance, all was well:

[grid@rac1 ~]$ srvctl status filesystem
ACFS file system /oradata is mounted on nodes rac1,rac2
[grid@rac1 ~]$

Installing the Oracle RDBMS after that was a breeze. This parameter is new in Oracle 12.2 and is set to FALSE by default. It increases security so much that the clusterware doesn’t start, which certainly prevents the SCOTT tables from falling into enemy hands. Of course, I should have heeded the warning duly shown by cluvfy, but I was lazy. Still, I learned a lot.


RAC 12.2

I am in the process of installing RAC 12.2. The installation procedure has undergone a significant change, for the better: it is now much easier. However, memory and disk requirements have skyrocketed. I needed to allocate 10 GB of memory for each of my nodes, 12 GB of disk for OCR and 64 GB for the management database. My computer is barely large enough to run both Commserve and those two nodes together:

[Screenshot: Screenshot_2017-08-26_17-34-25]

I am running Oracle Linux 7.4, the latest and greatest. The installer is now named gridSetup.sh. The procedure is fairly resource intensive and takes more than 30 minutes. Here is the monitor from my host machine:

[Screenshot: Screenshot_2017-08-26_17-41-18 (host machine system monitor)]

As you can see from the monitor, it’s a 32 GB machine with a quad-core CPU, booting from a flash disk. So, if anyone wants to install RAC 12.2 in a home lab, be aware that this is a huge monstrosity which requires a lot of resources. I will have to buy a new machine with 64 GB of RAM to comfortably run both Commvault and 12.2 RAC. The installation process itself is convoluted and complex, using Java and Perl (expected) as well as gdb, the GNU debugger, which is totally unexpected. The process also runs depmod, which means that new kernel modules are installed under the hood; this is probably for ACFS, Oracle’s cluster file system. Each of the nodes has 2 virtual CPUs, the maximum I can afford. And this is just the grid infrastructure; I still need the database and the July patch set. Still, the installation is easier than before, and I got the most important thing done:

[grid@rac1 12.2.0]$ srvctl -version
srvctl version: 12.2.0.1.0
[grid@rac1 12.2.0]$ srvctl status asm
ASM is running on rac1,rac2

I will add the database tomorrow.


Rename Pluggable Database

I stumbled upon a convenient way to rename a pluggable database:

SQL> alter system enable restricted session;

System altered.

SQL> alter database rename global_name to test12.home.com;

Database altered.

SQL> alter session set container=cdb$root;

Session altered.

SQL> select name from v$pdbs;

NAME
------------------------------
PDB$SEED
TEST12

In other words, to rename a pluggable database, one only needs to rename its global name. Cute!
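
For completeness, the whole sequence on a fresh attempt would look roughly like the sketch below. This is not my original transcript: I am assuming the PDB was originally named PDB1 and that the rename requires the PDB to be open in restricted mode:

SQL> alter session set container=cdb$root;
SQL> alter pluggable database pdb1 close immediate;
SQL> alter pluggable database pdb1 open restricted;
SQL> alter session set container=pdb1;
SQL> alter database rename global_name to test12.home.com;
SQL> alter session set container=cdb$root;
SQL> alter pluggable database test12 close immediate;
SQL> alter pluggable database test12 open;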


12c: OPTIMIZER_DYNAMIC_SAMPLING=11

Oracle 12c has a new feature related to optimizer dynamic sampling. The new level 11 is supposed to put dynamic sampling on autopilot and make the resulting statistics available to other queries. However, as soon as this was set, I started noticing processes spending significant time waiting for the row cache latch. This was strange, especially bearing in mind that ASMM was configured. I checked V$ROWCACHE using the following, fairly standard, query:

select parameter,gets,getmisses from v$rowcache order by getmisses;

Here is what I got:

PARAMETER                          GETMISSES        GETS
-------------------------------- ----------- -----------
dc_free_extents                            0           0
dc_used_extents                            0           0
dc_sql_prs_errors                          0           0
kqlsubheap_object                          0           0
dc_outlines                                0           0
realm cache                                0           0
Command rule cache                         0           0
Realm Object cache                         0           0
Rule Set Cache                             0           0
DV Auth Cache                              0           0
extensible security principal in           0           0
Unused                                     0           0
Unused                                     0           0
Unused                                     0           0
XS security class privilege                0           0
extensible security midtier cach           0           0
AV row cache 1                             0           0
AV row cache 2                             0           0
AV row cache 3                             0           0
triton security name to ID                 0           0
rule_or_piece                              0           0
dc_qmc_ldap_cache_entries                  0           0
qmc_app_cache_entries                      0           0
qmc_app_cache_entries                      0           0
qmtmrcin_cache_entries                     0           0
qmtmrctn_cache_entries                     0           0
qmtmrcip_cache_entries                     0           0
qmtmrctp_cache_entries                     0           0
qmtmrciq_cache_entries                     0           0
qmtmrctq_cache_entries                     0           0
qmrc_cache_entries                         0           0
qmemod_cache_entries                       0           0
SMO rowcache                               0           0
dc_cdbfiles                                0           0
dc_cdbservices                             0           0
dc_pdbdba                                  0           0
dc_pdbstates                               0           0
dc_users                                   0           0
dc_partition_scns                          0           0
dc_users                                   0           0
realm auth                                 0           0
realm auth                                 0           0
Realm Subordinate Cache                    0           0
dc_users                                   0           0
dc_users                                   0           0
rule_fast_operators                        0           0
dc_users                                   0           0
dc_awr_control                             2       45531
dc_profiles                                6      381825
rule_info                                  8           8
dc_tablespace_quotas                      18          44
dc_rollback_segments                      60     1000560
dc_tablespaces                            77   485307021
outstanding_alerts                       146        2551
dc_props                                 180     6559399
dc_users                                 630   255650332
dc_constraints                           989        2855
dc_files                                1176     1782669
dc_sequences                            2152     1999677
dc_global_oids                          2314      996894
dc_table_scns                           2795        2795
dc_object_grants                       15522    28781047
dc_segments                           145653    99182525
sch_lj_objs                           195178      195278
dc_users                              203715   855595322
sch_lj_oids                           213295      435827
dc_histogram_defs                     459886    74035168
dc_objects                            613787   202217522
dc_histogram_data                    1084138    76433495
dc_histogram_data                    3409318  2919298230

So, the row cache misses are dominated by the histogram entries. The only conceivable mechanism that could be the culprit is the new OPTIMIZER_DYNAMIC_SAMPLING=11 setting. As soon as that was reset to 2 (the 11g default), the problem was gone. What are the experiences of other DBAs with ODS in version 12c? I asked the almighty and omniscient Google and came across a very good article by Pythian:

https://www.pythian.com/blog/performance-problems-with-dynamic-statistics-in-oracle-12c/

Apparently, I wasn’t the first to encounter the problem. Unlike Pythian, I didn’t notice inefficient plans, probably because the database I was monitoring does not use the In-Memory option. However, a 3rd-party application which does a fairly large number of parse calls started running between 10% and 15% slower than before.
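
For reference, the workaround itself is a one-liner along these lines (the scope clause is just my habit; on RAC you would also add sid='*'):

SQL> alter system set optimizer_dynamic_sampling = 2 scope=both;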


Oracle 12c, big table caching

In Oracle 11g, there was an annoying change which frequently had a seriously detrimental effect on application performance: all tables larger than roughly 2% of the buffer cache were considered big tables and were read with direct path reads into the PGA, instead of through the SGA. That had two undesired effects:

  • Increased paging, as the PGA had to be expanded in order to receive all those rows.
  • No sharing, since the PGA is not visible to other processes (or threads).

The good news was that this prevented a full table scan of a big table (see the definition above) from flooding the buffer cache and throwing potentially useful blocks out of it. The mechanism is described here:

http://afatkulin.blogspot.co.uk/2012/07/serial-direct-path-reads-in-11gr2-and.html

However, this story has a sequel in Oracle 12c. Direct reads were detrimental to performance, so Oracle decided to allow the creation of an area within the buffer cache dedicated to large tables. That is the so-called “big table cache” which, when activated, houses blocks from big tables. Those blocks can no longer flood the rest of the cache and throw useful blocks out of it, as they are now stored in a separate pool. The detrimental direct reads are also greatly reduced when the big table cache is in use. I wrote about the mechanism before, here:

https://dbwhisperer.wordpress.com/2014/10/06/poor-mans-in-memory-caching/

Now I finally got to test it, and the results were very beneficial, across the board. The bad news is that Oracle 12c requires more memory than Oracle 11g, but that should have been expected: each new version ends up using significantly more memory than the previous one. However, memory prices are falling rapidly enough to more than offset the increased appetite of new Oracle RDBMS versions, so that shouldn’t be a problem.
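
For anyone who wants to try it, enabling the big table cache is essentially one parameter; the 40% below is an arbitrary example, and the V$ views are simply the easiest way to watch it work:

SQL> alter system set db_big_table_cache_percent_target = '40';
SQL> select * from v$bt_scan_cache;
SQL> select * from v$bt_scan_obj_temps;

As far as I recall, in a RAC environment the big table cache only kicks in for parallel queries with PARALLEL_DEGREE_POLICY set to AUTO or ADAPTIVE, so check the documentation for your exact configuration.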

PS:

Also worth reading is the following article by Tanel Poder:

Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

 


Smarter DG Broker in Oracle 12c

In Oracle 11g, DG Broker needed to be told whether the standby database was physical or logical. The command to add a database in 11g was something like this:

ADD DATABASE <DB Unique Name> AS CONNECT IDENTIFIER IS <TNS Descriptor>
  MAINTAINED AS [PHYSICAL|LOGICAL];

Oracle 12c is definitely smarter and can figure it out on its own. The “ADD DATABASE” syntax is simpler:

DGMGRL for 64-bit Windows: Version 12.1.0.2.0 - 64bit Production

Copyright (c) 2000, 2013, Oracle. All rights reserved.

Welcome to DGMGRL, type "help" for information.
Connected as SYSDG.
DGMGRL> help add database

Adds a member to the broker configuration

Syntax:

ADD { BACKUP_APPLIANCE | DATABASE | FAR_SYNC } <object name>
[AS CONNECT IDENTIFIER IS <connect identifier>];

DGMGRL>
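
So adding a physical standby now needs nothing more than something like this (the database name and connect identifier below are made up):

DGMGRL> ADD DATABASE stby AS CONNECT IDENTIFIER IS stby_tns;
DGMGRL> ENABLE CONFIGURATION;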

DG Broker will connect to the database and figure out what kind of standby it is. The new version of DG Broker can also add Oracle’s Zero Data Loss Recovery Appliance (the BACKUP_APPLIANCE object above), an engineered system, or a far sync instance for geographically distant standby databases. FAR SYNC looks like a very useful concept to me, but I haven’t yet had an opportunity to set one up. Here is a good article about it:

http://karandba.blogspot.com/2014/10/far-sync-in-oracle-12c-database.html

 
