Multi-threaded Oracle 12c architecture on Linux

Introduction

Oracle has long had so called multi-threaded architecture on Windows. On the Unix platforms, Oracle was using multi-process architecture until version 12c. The difference is that Oracle server processes in the threaded architecture become threads of a single process. The reason for such an architecture is lower overhead caused by process switching. Switching threads is done on the user level and doesn’t need a system trap to proceed. System traps are expensive operations, implemented in the CPU hardware itself, which cause the CPU to save one set of registers, load another set of registers , update process statistics and return to the user mode execution. The same applies to handling semaphores. Semaphores are actually implemented as device drivers on Unix systems. Every semaphore manipulation causes an interrupt and a context switch. Threads model, supported by the PTHREAD library (/usr/lib/libpthread.so on Linux systems) avoids all those context switches. Threads have become an option since the advent of 64 bit systems which support large address spaces. On 32 bit systems, both the code and SGA would have to fit within a single process, with the address space of only 4GB. That was a severe restriction on the address space and multi-process model was used to avoid that. However, these days Oracle is hardly ever run on 32 bit platforms so this is a useful optimization that makes sense.

Threaded execution

As it may be expected, the threaded model is used when the THREADED_EXECUTION parameter is set to TRUE. This is a static parameter, cannot be changed dynamically by ALTER SYSTEM command, and its value must be the same across all instances of a RAC database. It is possible to use the same dedicated server model as in the process model, but the additional parameter must be set for the listener: DEDICATED_THROUGH_BROKER_LISTENER=ON

With this parameter set, the traditional “/ as sysdba” way of connecting to database no longer works:

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 4 13:22:33 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

ERROR:

ORA-01017: invalid username/password; logon denied

Enter user-name:

The only way to connect to the database is to connect as SYS using SYS/<passwd>@TNS as SYSDBA:

[oracle@ora12c ~]$ sqlplus /nolog

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 4 13:26:01 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

SQL> connect sys@test1 as sysdba

Enter password:

Connected.

SQL>

The unintended consequence of this is that dbstart and dbshut scripts no longer work:

[oracle@ora12c ~]$ dbshut $ORACLE_HOME

Processing Database instance “orcl”: log file /home/oracle/app/oracle/product/12.1.0/dbhome_1/shutdown.log

[oracle@ora12c ~]$ sqlplus /nolog

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 4 13:28:07 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

SQL> connect sys@test1 as sysdba

Enter password:

ERROR:

ORA-12541: TNS:no listener

SQL> connect sys as sysdba

Enter password:

Connected.

SQL> select count(*) from v$session;

COUNT(*)

———-

40

As you can se, the dbshut script has brought down the listener but not the instance. Of course, it is not possible to connect to pluggable databases without listener, but one will definitely need to replace the connection strings in dbshut and dbstart scripts. The same applies to the dbstart script, which starts the listener and starts the instance in the NOMOUNT mode, but doesn’t open the instance or any of the PDB components:

[oracle@ora12c ~]$ dbstart $ORACLE_HOME

Processing Database instance “orcl”: log file /home/oracle/app/oracle/product/12.1.0/dbhome_1/startup.log

[oracle@ora12c ~]$ sqlplus /nolog

SQL*Plus: Release 12.1.0.1.0 Production on Fri Oct 4 13:33:09 2013

Copyright (c) 1982, 2013, Oracle. All rights reserved.

SQL> connect sys as sysdba

Enter password:

Connected.

SQL> select status from V$INSTANCE;

STATUS

————

STARTED

SQL> alter database mount;

Database altered.

SQL> alter database open;

Database altered.

SQL> alter pluggable database test1 open;

Pluggable database altered.

SQL>

Details

Threaded execution is not very well documented. There is a short section in the “Concepts” manual, the parameter description in the “Reference” manual and that’s it. Search on My Oracle Support for the parameter THREADED_EXECUTION comes up empty. So does the search for the listener parameter DEDICATED_THROUGH_BROKER. That means that the situation must be investigated using the native Linux tools. First, let’s see what processes are running when the database is in the threaded mode:

[oracle@ora12c ~]$ ps -fp `pgrep ora_`

UID PID PPID C STIME TTY STAT TIME CMD

oracle 4488 1 0 13:31 ? Ss 0:01 ora_pmon_orcl

oracle 4490 1 0 13:31 ? Ss 0:03 ora_psp0_orcl

oracle 4492 1 8 13:31 ? Ss 3:13 ora_vktm_orcl

oracle 4496 1 0 13:31 ? Ssl 0:09 ora_u004_orcl

oracle 4502 1 3 13:31 ? Ssl 1:10 ora_u005_orcl

oracle 4508 1 0 13:31 ? Ss 0:01 ora_dbw0_orcl

So, we have PMON, PSP0 (aka “process spawner”), VKTM (“virtual time keeper”), database writer and two unknown and undocumented processes, u004 and u005. Not only are these two processes completely undocumented in the documentation and on My Oracle Support, nothing about them can be found in the X$MESSAGES table:

SQL> select description,dest from x$messages

2 where dest like ‘U0%';

no rows selected

The V$BGPROCESS table also comes up empty. So, what threads are running within those two processes? Fortunately, there is pidstat utility:

[oracle@ora12c ~]$ pidstat -t -p 4496 >/tmp/u004.txt

[oracle@ora12c ~]$ pidstat -t -p 4508 >/tmp/u005.txt

Here are threads that those two processes are running::

U004

Linux 2.6.32-358.18.1.el6.x86_64 (ora12c.home.com) 10/04/2013 _x86_64_ (2 CPU)

02:26:39 PM TGID TID %usr %system %guest %CPU CPU Command

02:26:39 PM 4496 – 0.06 0.16 0.00 0.22 1 ora_scmn_orcl

02:26:39 PM – 4496 0.00 0.00 0.00 0.00 1 |__ora_scmn_orcl

02:26:39 PM – 4497 0.00 0.00 0.00 0.00 0 |__oracle

02:26:39 PM – 4498 0.00 0.01 0.00 0.02 1 |__ora_gen0_orcl

02:26:39 PM – 4499 0.01 0.01 0.00 0.01 1 |__ora_mman_orcl

02:26:39 PM – 4505 0.01 0.01 0.00 0.02 1 |__ora_dbrm_orcl

02:26:39 PM – 4509 0.00 0.01 0.00 0.02 1 |__ora_lgwr_orcl

02:26:39 PM – 4510 0.03 0.06 0.00 0.09 1 |__ora_ckpt_orcl

02:26:39 PM – 4511 0.00 0.01 0.00 0.01 1 |__ora_lg00_orcl

02:26:39 PM – 4512 0.00 0.00 0.00 0.00 1 |__ora_lg01_orcl

02:26:39 PM – 4513 0.00 0.00 0.00 0.01 1 |__ora_smon_orcl

02:26:39 PM – 4515 0.00 0.01 0.00 0.01 1 |__ora_lreg_orcl

02:26:39 PM – 4595 0.00 0.01 0.00 0.01 1 |__ora_rvwr_orcl

U005

Linux 2.6.32-358.18.1.el6.x86_64 (ora12c.home.com) 10/04/2013 _x86_64_ (2 CPU)

02:26:34 PM TGID TID %usr %system %guest %CPU CPU Command

02:26:34 PM 4508 – 0.01 0.02 0.00 0.03 1 ora_dbw0_orcl

02:26:34 PM – 4508 0.01 0.02 0.00 0.03 1 |__ora_dbw0_orcl

The thread named ORA_SCMN_ORCL in the U005 process is actually executing the SQL commands, thus the name. This thread is launched in the process U005 upon the execution of the SQL command and has sub-threads, which can easily be seen by using the following comand to monitor a SQL execution: “top -H -u oracle”. The “-H” switch turns on thread display. This command will monitor threads being executed by user “oracle”. The output looks like this:

top – 14:40:55 up 1:52, 3 users, load average: 1.15, 1.07, 0.83

Tasks: 323 total, 2 running, 321 sleeping, 0 stopped, 0 zombie

top – 14:43:34 up 1:54, 3 users, load average: 1.07, 1.07, 0.86

Tasks: 323 total, 2 running, 321 sleeping, 0 stopped, 0 zombie

Cpu(s): 49.2%us, 2.8%sy, 0.0%ni, 48.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 5991356k total, 3484640k used, 2506716k free, 93948k buffers

Swap: 6127608k total, 0k used, 6127608k free, 2864648k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

5285 oracle 20 0 4501m 507m 377m R 97.1 8.7 26:14.85 oracle_5285_orc

4492 oracle -2 0 2065m 17m 15m S 2.7 0.3 5:44.27 ora_vktm_orcl

2222 oracle 20 0 195m 1624 1036 S 0.7 0.0 0:39.02 X11-NOTIFY

4490 oracle 20 0 2065m 17m 15m S 0.7 0.3 0:05.38 ora_psp0_orcl

4488 oracle 20 0 2065m 21m 19m S 0.3 0.4 0:02.43 ora_pmon_orcl

4510 oracle 20 0 3176m 1.8g 1.7g S 0.3 31.0 0:06.95 ora_ckpt_orcl

4506 oracle 20 0 4501m 507m 377m S 0.3 8.7 0:13.33 ora_dia0_orcl

4608 oracle 20 0 4501m 507m 377m S 0.3 8.7 0:09.52 ora_p001_orcl

5767 oracle 20 0 15172 1388 924 R 0.3 0.0 0:00.72 top

2165 oracle 20 0 146m 5512 2828 S 0.0 0.1 0:00.01 gnome-keyring-d

2166 oracle 20 0 146m 5512 2828 S 0.0 0.1 0:00.00 gnome-keyring-d

2249 oracle 20 0 146m 5512 2828 S 0.0 0.1 0:00.00 gnome-keyring-d

2174 oracle 20 0 246m 6396 5028 S 0.0 0.1 0:00.27 gnome-session

2243 oracle 20 0 246m 6396 5028 S 0.0 0.1 0:00.00 gnome-session

2182 oracle 20 0 20036 624 388 S 0.0 0.0 0:00.00 dbus-launch

2183 oracle 20 0 21804 1272 612 S 0.0 0.0 0:00.13 dbus-daemon

2199 oracle 20 0 196m 1948 1460 S 0.0 0.0 0:00.00 VboxClient

Our thread name is now oracle_5285_orcl, which conforms to the standard server process name. Now we can use V$PROCESS table which in the version 12c contains STID column, in addition to the well known SPID column. The STID column contains the thread id, the SPID column is “server process id”. We can trace our server process thread back to U005 process:

Connected to:

Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 – 64bit Production

With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select spid from v$process where stid=5285;

SPID

————————

4502

Server process 4502 is our process U005, from the listing at the beginning of this section

Now, one more important question remains: where are threads allocating the process memory from? There are two possibilities: threads will allocate the process memory from the shared pool, as is the case with the shared server, or, the memory will be allocated to the process, as non-sharable memory, private to the process. It is important to note that the other threads will still have access to this specific thread PGA as all threads share the address space. So, let’s find out.

Memory Allocation

The details of the memory allocation can be discerned by an experiment. The idea is to measure the size of the shared pool free memory before the experiment, than create a large PL/SQL table which will be contained in the PGA and measure the size of the shared pool free memory again. If the shared pool free memory goes down by the size of this table, the memory is allocated from SGA. If, on the other hand, the virtual size of the process 4502 grows by that amount and the number of shared pages remains the same, the memory is allocated from the OS, using calloc or malloc. The first step toward creating a large PL/SQL table is to create a nad populate a large table:

CREATE TABLE SCOTT.TST

(ID NUMBER(15,0) NOT NULL,

VAL VARCHAR2(10),

CONSTRAINT TST_PK PRIMARY KEY (ID)

)

SEGMENT CREATION IMMEDIATE

7 NOLOGGING TABLESPACE USERS;

Table created.

Elapsed: 00:00:00.78

insert /*+ APPEND */into tst(id,val)

select level,dbms_random.string(‘A’,10)

from dual

connect by level <= 1000000;

commit;

1000000 rows created.

Elapsed: 00:01:26.47

SQL>

Commit complete.

Elapsed: 00:00:00.13

SQL>

The table contains two columns, ID and VAL, simulating key value store. The ID is a NUMBER column, while VAL is VARCHAR2(10) column, filled completely randomly, using DBMS_RANDOM. This is our baseline:

SQL> select pool,name,bytes from v$sgastat

2 where pool=’shared pool’ and name =’free memory';

POOL NAME BYTES

———— ————————– ———-

shared pool free memory 81919928

Now the process size:

[oracle@ora12c Desktop]$ ps -lp 4502

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

0 S 500 4502 1 45 80 0 – 1203794 semtim ? 00:54:39 ora_scmn_orcl

Now, let’s execute our experiment:

SQL> set echo on

SQL> set termout on

SQL> @test

SQL> set serveroutput on

SQL> declare

2 type testtbl is table of TST%ROWTYPE;

3 biggie testtbl;

4 begin

5 select * bulk collect into biggie from tst;

6 dbms_output.put_line(‘Number of lines’||biggie.count);

7 end;

8 /

Number of lines1000000

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.94

The PL/SQL table biggie is allocated from the process memory, it should be placed entirely into the private memory of the process. Without that table, the size of the shared pool should not change significantly. Let’s see what do the measurements say:

SQL> /

POOL NAME BYTES

———— ————————– ———-

shared pool free memory 75519608

Shared pool free memory is almost 7M smaller than the original size,. Let’s see the address size of the process:

[oracle@ora12c Desktop]$ ps -lp 4502

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD

0 S 500 4502 1 45 80 0 – 1186025 semtim ? 00:54:43 ora_scmn_orcl

[oracle@ora12c Desktop]$

The address size is actually smaller than it was before the test was executed. The inescapable conclusion is that in threaded mode, Oracle dedicated servers allocate PGA from the shared pool, just like in the case of multi-threaded server (MTS). This is an important factor in determining how much memory to allocate to the instance. The conclusion is that Oracle Corp. should really do a better job with documentation and technical white papers. I am aware that this is only the first version and that likelihood of someone actually using this version in production, even without the threaded architecture, is very small, but this is a ground breaking architecture change and should be documented.

About these ads

About mgogala

I am a long time Oracle DBA, who has worked on very large databases. I have worked with OPS and RAC since its inception.I am also a published book writer, having published two books about PHP. This blog is about the challenges and adventures in my professional life. Sorry, no family pictures here.
This entry was posted in Uncategorized and tagged . Bookmark the permalink.

7 Responses to Multi-threaded Oracle 12c architecture on Linux

  1. Naresh says:

    Hello Mladen,

    Thanks for the Very Useful article. Definitely something to watch out for.
    Were you able to figure out what are the lg00 and lg01 processes – TID 4511, 4512 in U004?
    Thanks,
    Naresh

    • mgogala says:

      Naresh, these threads look to me as if they are here to monitor resource consumption by the PDB’s. I haven’t proven that yet, but promise to post something by this weekend. Thanks for the question.

      • Naresh says:

        Thanks Mladen.

        I was wondering if they were part of a “multi threaded lgwr” – one thread would write out redo to log files, another would post sessions waiting on “log file sync” as and when their writes were done. That would mitigate the LGWR bottleneck somewhat.

  2. mgogala says:

    Naresh, you are right. These two process are parts of “multi threaded lgwr” . They aren’t documented anywhere, not in V$BGPROCESS or X$MESSAGES. When I check stats, things look like this:
    SQL> select s.sid,s.serial#,s.program,s.module,s.con_id
    2 from v$session s,v$process p
    3 where s.paddr=p.addr and
    4 p.spid in (13730,13734);

    SID SERIAL# PROGRAM
    ———- ———- ————————————————
    MODULE CON_ID
    —————————————————————- ———-
    240 1 oracle@medo.home.com (LG00)
    0

    4 1 oracle@medo.home.com (LG01)
    0

    Elapsed: 00:00:00.02
    1 select n.name,s.value
    2 from v$sesstat s,v$statname n
    3 where s.STATISTIC#=n.STATISTIC# and
    4 s.value>0 and
    5 s.sid=240 and
    6* s.con_id=0
    SQL> /

    NAME VALUE
    —————————————————————- ———-
    logons cumulative 1
    logons current 1
    user calls 2
    non-idle wait time 165
    non-idle wait count 3457
    in call idle wait time 137418
    session uga memory 213160
    session uga memory max 213160
    messages sent 4
    session pga memory 741096
    session pga memory max 741096
    enqueue requests 1
    enqueue releases 1
    physical write total IO requests 3459
    physical write total multi block requests 42
    physical write total bytes 34234880
    cell physical IO interconnect bytes 34234880
    calls to kcmgcs 1
    redo blocks written 66865
    redo blocks written (group 0) 66865
    redo write size count ( 4KB) 3052
    redo write size count ( 8KB) 95
    redo write size count ( 16KB) 65
    redo write size count ( 32KB) 102
    redo write size count ( 128KB) 95
    redo write size count ( 256KB) 14
    redo write size count ( 512KB) 16
    redo write size count (1024KB) 6
    redo write size count (inf) 5
    redo write time 308
    redo blocks checksummed by FG (exclusive) 9430

    31 rows selected.

    Elapsed: 00:00:00.02
    SQL>
    1 select n.name,s.value
    2 from v$sesstat s,v$statname n
    3 where s.STATISTIC#=n.STATISTIC# and
    4 s.value>0 and
    5 s.sid=4 and
    6* s.con_id=0
    SQL> /

    NAME VALUE
    —————————————————————- ———-
    logons cumulative 1
    logons current 1
    user calls 2
    user I/O wait time 13
    non-idle wait time 24
    non-idle wait count 6
    in call idle wait time 98162
    session uga memory 213160
    session uga memory max 213160
    session pga memory 741096
    session pga memory max 741096
    enqueue requests 1
    enqueue releases 1
    physical write total IO requests 3
    physical write total bytes 10240
    cell physical IO interconnect bytes 10240
    calls to kcmgcs 1
    redo blocks written 20
    redo blocks written (group 1) 20
    redo write size count ( 4KB) 2
    redo write size count ( 8KB) 1
    redo write time 282

    22 rows selected.

    Elapsed: 00:00:00.02
    SQL>
    The only statistics for these processes that are larger than zero are redo log statistics. When you check the file descriptors for the processes, the only files that are open are redo logs:
    root@medo mgogala]# ls -l /proc/13730/fd
    total 0
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 0 -> /dev/null
    l-wx—— 1 oracle oinstall 64 Nov 16 14:28 1 -> /dev/null
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 11 -> /dev/urandom
    lrwx—— 1 oracle oinstall 64 Nov 16 14:28 15 -> anon_inode:[eventpoll]
    l-wx—— 1 oracle oinstall 64 Nov 16 14:28 2 -> /dev/null
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 21 -> /dev/urandom
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 22 -> /dev/urandom
    lrwx—— 1 oracle oinstall 64 Nov 16 14:28 256 -> /data/oradata/ora12c/ORA12/onlinelog/log1.rdo
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 27 -> /dev/urandom
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 28 -> /dev/urandom
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 3 -> /dev/null
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 4 -> /dev/null
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 5 -> /dev/null
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 6 -> /oracle/product/12.1.0.1/dbhome1/rdbms/mesg/oraus.msb
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 7 -> /proc/13730/fd
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 8 -> /dev/zero
    lr-x—— 1 oracle oinstall 64 Nov 16 14:28 9 -> /dev/urandom
    [root@medo mgogala]#

    Strace doesn’t give me much, it’s mostly the following line:
    semtimedop(393221, {{18, -1, 0}}, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)

    However,. when I did “alter system archive log current”, a successful “semctl” operation took
    place:
    semctl(393221, 16, SETVAL, 0x100000001) = 0

    When I created a dummy table and populated it by inserting random strings, committing after every 10 inserts, the stats for one of the processes changed dramatically:

    select n.name,s.value
    from v$sesstat s,v$statname n
    where s.STATISTIC#=n.STATISTIC# and
    s.value>0 and
    s.sid=&sid and
    6 s.con_id=0;
    Enter value for sid: 4
    old 5: s.sid=&sid and
    new 5: s.sid=4 and

    NAME VALUE
    —————————————————————- ———-
    logons cumulative 1
    logons current 1
    user calls 2
    user I/O wait time 13
    non-idle wait time 24
    non-idle wait count 7
    in call idle wait time 319360
    session uga memory 213160
    session uga memory max 213160
    session pga memory 741096
    session pga memory max 741096
    enqueue requests 1
    enqueue releases 1
    physical write total IO requests 3
    physical write total bytes 10240
    cell physical IO interconnect bytes 10240
    calls to kcmgcs 1
    redo blocks written 20
    redo blocks written (group 1) 20
    redo write size count ( 4KB) 2
    redo write size count ( 8KB) 1
    redo write time 282

    22 rows selected.

    Elapsed: 00:00:00.03
    SQL> /
    Enter value for sid: 240
    old 5: s.sid=&sid and
    new 5: s.sid=240 and

    NAME VALUE
    —————————————————————- ———-
    logons cumulative 1
    logons current 1
    user calls 2
    non-idle wait time 165
    non-idle wait count 3458
    in call idle wait time 356539
    session uga memory 213160
    session uga memory max 213160
    messages sent 4
    session pga memory 741096
    session pga memory max 741096
    enqueue requests 1
    enqueue releases 1
    physical write total IO requests 3459
    physical write total multi block requests 42
    physical write total bytes 34234880
    cell physical IO interconnect bytes 34234880
    calls to kcmgcs 1
    redo blocks written 66865
    redo blocks written (group 0) 66865
    redo write size count ( 4KB) 3052
    redo write size count ( 8KB) 95
    redo write size count ( 16KB) 65
    redo write size count ( 32KB) 102
    redo write size count ( 128KB) 95
    redo write size count ( 256KB) 14
    redo write size count ( 512KB) 16
    redo write size count (1024KB) 6
    redo write size count (inf) 5
    redo write time 308
    redo blocks checksummed by FG (exclusive) 9430

    31 rows selected.

    Elapsed: 00:00:00.01

    My conclusion is that those two processes are parts of multi-threaded log writer. That, of course, doesn’t give me any details about entrails of the new and improved log writing mechanism. For that, we will have to wait on either Oracle Corp. or Jonathan Lewis.

    • Naresh says:

      Thanks a lot for the great update Mladen – yeah, lets wait for JL :)

    • Markus says:

      The lg00 processes are “Log Writer Worker Processes”.
      They are documented in Oracle 12c Reference, Appendix F “Background Processes”:

      “On multiprocessor systems, LGWR creates worker processes to improve the performance of writing to the redo log. LGWR workers are not used when there is a SYNC standby destination. Possible processes include LG00-LG99.”

      Markus

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s