Faster – new book by Cary Millsap

A new book by Cary Millsap, named “Faster”, is about to hit the shelves. For those few who don’t know, Cary is the author of “Optimizing Oracle Performance”, the most revolutionary Oracle tuning book ever. Cary laid the foundation for the event-based tuning method, or “Method R”, as he calls it in his book. The previous method was “Method C”, based on BCHR witchcraft. Largely forgotten today, BCHR stood for “Buffer Cache Hit Ratio”, and the idea behind the witchcraft was that the goal of tuning was to maximize the BCHR. The whole thing went downhill when Connor McDonald, a renowned Oracle wizard and one of the people behind the “Ask Tom” web page, created a little procedure which could propel the BCHR as high as desired, with a largely negative impact on performance. Connor’s script took a small table and accessed it over and over again until the BCHR was higher than the desired value. The impact on performance was negative because the loop was burning lots of CPU without contributing anything useful to the user.

Cary, on the other hand, had a different idea: let’s start from the application instead of the database and figure out what the application is waiting for. Once the wait point was known, tuning was basically done by trying to either eliminate or reduce the wait. Cary also developed a Perl script which could aggregate a SQL Trace file and summarize the waits per SQL statement. For those who are younger than 40, Perl is a scripting language, a superior alternative to Python, obviously too complex for the modern generations of programmers. The simple idea above consists of two even simpler ideas:

  • We should start from the application. The customer is not interested in the database indicators, the customer is interested in the application performance. That was a big heresy back in the day of the BCHR witchcraft.
  • Let’s see what the application is actually waiting for instead of assuming it is I/O. Sometimes, it’s not.
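Cary’s Perl profiler aggregated the WAIT lines of a SQL Trace file per statement. A toy version of that aggregation in Python looks roughly like this; the trace lines below are simplified stand-ins, not real 10046 output, and this is not Cary’s actual script:

```python
import re
from collections import defaultdict

# Simplified stand-ins for SQL Trace WAIT lines (assumed format, for illustration):
# WAIT #<cursor>: nam='<event>' ela= <microseconds>
trace_lines = [
    "WAIT #3: nam='db file sequential read' ela= 1200",
    "WAIT #3: nam='db file sequential read' ela= 900",
    "WAIT #5: nam='log file sync' ela= 4000",
]

wait_re = re.compile(r"WAIT #(\d+): nam='([^']+)' ela= (\d+)")
waits = defaultdict(int)   # (cursor number, wait event) -> total microseconds
for line in trace_lines:
    match = wait_re.match(line)
    if match:
        cursor, event, ela = match.groups()
        waits[(int(cursor), event)] += int(ela)

for (cursor, event), total in sorted(waits.items()):
    print(f"cursor #{cursor}: {total} us waited on '{event}'")
```

The real script did much more, of course, but the principle is the same: sum the waits per statement and tune the biggest contributor.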

Cary’s “Optimizing Oracle Performance” remains relevant to this very day, despite being written back in the days of Oracle 8i. It was the best of times; it was the worst of times. The book lays out the logical foundations for tuning an application running against Oracle, and indeed for tuning almost anything. It is philosophically very similar to the famous book by Kevin Dowd, “High Performance Computing”, in which Dowd advocates actually using the profiler to measure performance and see in which part of the program most of the time is spent.
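Dowd’s measure-first approach is available out of the box in most languages. Here is a minimal sketch with Python’s built-in profiler; the checksum function is made up for the example:

```python
import cProfile
import io
import pstats

def checksum(n):
    # Deliberately wasteful: the str()/int() round trip dominates the runtime.
    return sum(int(str(i)) for i in range(n))

profiler = cProfile.Profile()
result = profiler.runcall(checksum, 50000)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(3)
print(stream.getvalue())      # the top lines show where most of the time went
print("checksum =", result)   # checksum = 1249975000
```

Once the profile shows where the time goes, tuning anything else is a waste of effort, which is exactly the point of both books.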

The new book is called “Faster” and, according to Amazon, it will arrive in my mailbox on September 22nd, 2022. I cannot wait for this new book. I have never met Cary in person, but I have communicated with him through email. Knowing him, I think the book will be well worth the wait.

Posted in Uncategorized | Leave a comment

Oracle autoupgrade utility

My favourite database has recently added a new upgrade tool to its already impressive arsenal. The tool is described here:

I have recently used it to upgrade a 12.2 database on Oracle Linux 7 to Oracle 19c on the same machine; Oracle Linux remained at version 7. This is a rather large development database with several tenants. The upgrade was uneventful but a bit slow: I was unable to find where to set the degree of parallelism, so it ran in single-threaded mode. Nevertheless, a 5 TB database was upgraded in less than an hour. There were no unusable or invalid objects and no internal errors. I wholeheartedly recommend the tool to anyone planning to upgrade a database from 12c to 19c.


Oracle 21c on premises

Oracle 21c has finally arrived on Linux. You will need Oracle Linux 8, but it installs from an RPM package. The systemd service file still works:

[oracle@ora21c ~]$ systemctl status oracle
 oracle.service - oracle
   Loaded: loaded (/usr/lib/systemd/system/oracle.service; enabled; vendor pres>
   Active: active (exited) since Sun 2021-08-29 14:56:54 EDT; 2min 48s ago
  Process: 939 ExecStart=/usr/local/bin/dbstart $ORACLE_HOME (code=exited, stat>
 Main PID: 939 (code=exited, status=0/SUCCESS)
    Tasks: 91 (limit: 100275)
   Memory: 7.1G
   CGroup: /system.slice/oracle.service
           ├─1056 /opt/oracle/product/21c/dbhome_1/bin/tnslsnr LISTENER -inherit
           ├─2544 ora_pmon_orcl
           ├─2548 ora_clmn_orcl
           ├─2552 ora_psp0_orcl
           ├─2556 ora_vktm_orcl
           ├─2562 ora_gen0_orcl
           ├─2566 ora_mman_orcl
           ├─2572 ora_gen1_orcl
           ├─2574 ora_gen2_orcl
           ├─2578 ora_vosd_orcl
           ├─2580 ora_diag_orcl
           ├─2582 ora_ofsd_orcl
           ├─2584 ora_dbrm_orcl
           ├─2586 ora_vkrm_orcl

There are some immediate differences from the earlier versions. The DBS directory is no longer at $ORACLE_HOME/dbs, it’s in $ORACLE_BASE/dbs:

[oracle@ora21c ~]$ ls $ORACLE_BASE/dbs
hc_orcl.dat  initorcl.ora  lkORCL  orapworcl  spfileorcl.ora

The connection from the Oracle Instant Client 19.12 works without a problem. There is a whole slew of new parameters that I have to check. My agenda is the following:

  • DBMS_STATS changes
  • PL/SQL compilation
  • Any changes to the local temporary tables?
  • Any nifty new features.

I will have my work cut out for me in the coming months. Of course, this post wouldn’t be complete without a connection:

[mgogala@umajor tns]$ sql system/qwerty@ora21c

SQLcl: Release 21.2 Production on Sun Aug 29 13:58:02 2021

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Last Successful login time: Sun Aug 29 2021 13:58:02 -04:00

Connected to:
Oracle Database 21c Enterprise Edition Release - Production

Elapsed: 00:00:00.001

So far, I’ve installed the instance, put the SGA in the huge pages and slightly adjusted my systemd service.


PostgreSQL advances in the recent years

A long, long time ago I wrote a little article comparing PostgreSQL and Oracle. The original article is here:

It is badly out of date and should be revised. I promise to do that when I have time and energy. The conclusion from the article was that PostgreSQL needs the following features to be considered for replacing Oracle RDBMS in applications:

1. Optimizer hints
2. Wait event interface
3. Much improved partitioning
4. Some form of parallelism, at least for partitioned tables

It is my pleasure to say that most of the features above are now available:

  • Optimizer hints are provided by the pg_hint_plan extension. The hints are not as detailed or fine-grained as Oracle hints, but they allow overriding optimizer decisions. Hints that force an index scan instead of a sequential scan, hints that force join order and join method, as well as hints that set GUC parameters for the statement are all available.
  • Partitioning has improved considerably. The critical missing feature is global indexes. According to a FOSDEM 2021 presentation by Anastasia Lubennikova, global indexes are on the to-do list. Hopefully, they will appear in PostgreSQL 15 or 16.
  • Parallelism is well developed and available. It can be used for parallel query, parallel vacuum and parallel aggregates. For most intents and purposes, PostgreSQL parallelism is adequate and can do the job as well as Oracle parallelism.

The wait event interface is not there, and I doubt it ever will be, but log file parsers like pgBadger and extensions like pg_stat_statements provide an acceptable substitute. Furthermore, extended statistics (CREATE STATISTICS) make it possible for the Postgres optimizer to deal with correlated columns and with tables that are not in the third normal form. There is also the bloom extension, which implements Bloom filters in Postgres. I tested it and it works very, very well.
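For readers who haven’t met Bloom filters, here is a toy implementation in Python that illustrates the idea behind the extension; the sizes and the hash construction are arbitrary choices for the example, not what the Postgres bloom extension actually uses:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash positions per item over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests (an arbitrary choice).
        for salt in range(self.k):
            digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means definitely absent; True means present or a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
for word in ("oracle", "postgres"):
    bf.add(word)
print(bf.might_contain("postgres"))   # True
print(bf.might_contain("mysql"))      # almost certainly False
```

The structure never returns a false negative, only an occasional false positive, which is exactly what makes it useful for cheaply skipping rows or blocks that cannot possibly match.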

Finally, I am engaged in a pilot project of porting a major financial application to PostgreSQL. For now, partitioning will not be used, but it will be added when global indexes become available. Kudos to the Postgres designers.


Size of the processes structure in SGA

I recently recommended that a client increase the PROCESSES parameter from 2000 to 8000. The question was how much memory that would use. The answer is: surprisingly little. The first snippet shows the initial state:

SQL> show parameter processes

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes 		     integer	 1
db_writer_processes		     integer	 1
gcs_server_processes		     integer	 0
global_txn_processes		     integer	 1
job_queue_processes		     integer	 80
log_archive_max_processes	     integer	 4
processes			     integer	 320

SQL> select name,value from v$sga
  2  /

NAME                      VALUE
-------------------- ----------
Fixed Size		9144984
Variable Size	     2214592512
Database Buffers     2734686208
Redo Buffers		7630848

After that, I increased PROCESSES by 1000 and restarted the instance:

SQL> alter system set processes=1320 scope=spfile;

System altered.

SQL> startup force
ORACLE instance started.

Total System Global Area 4966052456 bytes
Fixed Size		    9146984 bytes
Variable Size		 2214592512 bytes
Database Buffers	 2734686208 bytes
Redo Buffers		    7626752 bytes
Database mounted.
Database opened.

With 320 processes, the fixed size was 9144984 bytes. With 1320 processes, it was 9146984 bytes. The difference is 2000 bytes, which comes to only 2 bytes per process. That looks a bit too small. Alternatively, we could sum the column sizes from V$PROCESS. Unfortunately, that too gives an unreliable result:

SQL> select sum(vsize(column_name)) from user_tab_columns where table_name='V_$PROCESS';

Elapsed: 00:00:00.003

Just the trace file name can take up to 513 bytes, so 200 bytes is not a credible result. Unfortunately, V_$PROCESS is a view, not a table, so it is not possible to extract AVG_ROW_LEN from DBA_TABLES. To calculate the best possible estimate, we need a little script:

#!/usr/bin/env python3
"""This script prints the row size of the V$PROCESS table
Author: Mladen Gogala
Date  : 04/07/2021
Version: 1.0
import argparse
import re
import cx_Oracle

parser = argparse.ArgumentParser(
    description='V_PROCESS: describe and print the row size of V$PROCESS table')
parser.add_argument("-u", "--username", dest="username", required=True,
                    help='Oracle connection using <usr>/<pwd>@<tns> syntax')
args = parser.parse_args()
row_size = 0
    con = cx_Oracle.connect(args.username)
    query = con.cursor()
    query.execute("select * from v$process")
    # cursor.description yields (name, type, display_size, ...) for each column
    for entry in query.description:
        ftype ="DB_TYPE_([A-Z]+)", str(entry[1])).group(1)
        print(f"Column name: {entry[0]} Type: {ftype} Size: {entry[2]}")
        row_size += int(entry[2])
    print(f"Row size={row_size}")
except cx_Oracle.DatabaseError as ora_err:
    oerr = ora_err.args[0]
    print(f"Oracle error: {oerr}")

This gives a lavish upper estimate of the row size of the V$PROCESS table. The table is allocated in the SGA when the instance is started, based on the PROCESSES parameter, and cannot be resized until the instance is restarted. So, here is the output:

[mgogala@umajor python]$ ./ -u system/qwerty@ora19c
Column name: ADDR Type: RAW Size: 8
Column name: PID Type: NUMBER Size: 127
Column name: SOSID Type: VARCHAR Size: 24
Column name: SPID Type: VARCHAR Size: 24
Column name: STID Type: VARCHAR Size: 24
Column name: EXECUTION_TYPE Type: VARCHAR Size: 20
Column name: PNAME Type: VARCHAR Size: 5
Column name: USERNAME Type: VARCHAR Size: 15
Column name: SERIAL# Type: NUMBER Size: 127
Column name: TERMINAL Type: VARCHAR Size: 30
Column name: PROGRAM Type: VARCHAR Size: 48
Column name: TRACEID Type: VARCHAR Size: 255
Column name: TRACEFILE Type: VARCHAR Size: 513
Column name: BACKGROUND Type: VARCHAR Size: 1
Column name: LATCHWAIT Type: VARCHAR Size: 16
Column name: LATCHSPIN Type: VARCHAR Size: 16
Column name: PGA_USED_MEM Type: NUMBER Size: 127
Column name: PGA_ALLOC_MEM Type: NUMBER Size: 127
Column name: PGA_FREEABLE_MEM Type: NUMBER Size: 127
Column name: PGA_MAX_MEM Type: NUMBER Size: 127
Column name: NUMA_DEFAULT Type: NUMBER Size: 127
Column name: NUMA_CURR Type: NUMBER Size: 127
Column name: CPU_USED Type: NUMBER Size: 127
Column name: CON_ID Type: NUMBER Size: 127
Row size=2269

This was done on Oracle 19.10 on Oracle Linux 7.9 x86_64. So, increasing the PROCESSES parameter by 1000 increases memory consumption by at most about 2.2 MB. Even that is an overestimate, since V$ tables map SGA structures, and SGA structures don’t use the database representation of numbers; they use 64-bit integers, which means that every “NUMBER Size: 127” in the output above should really be “8”. However, I will take the number at face value, since even at 127 bytes per number it really isn’t all that much today, when memory is counted in gigabytes. It is hard to understand the reluctance of some database administrators to increase the PROCESSES parameter.

For those who would like to know what the number looks like when “127” is replaced by “8”, it is 1079. The script above can be trivially modified to make the substitution. However, that may not be the whole story, since there may be some unmapped fields in the SGA structure. That is why I choose to operate with 2 KB per row in the process table.
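The two estimates can be cross-checked with a few lines of Python; the column sizes are copied from the script output above:

```python
# Column sizes as printed by the script above; NUMBER columns report 127 bytes.
varchar_and_raw = [8, 24, 24, 24, 20, 5, 15, 30, 48, 255, 513, 1, 16, 16]
number_columns = 10   # PID, SERIAL#, the four PGA_* columns, the two NUMA_*
                      # columns, CPU_USED and CON_ID

naive = sum(varchar_and_raw) + number_columns * 127
realistic = sum(varchar_and_raw) + number_columns * 8   # 64-bit integers in the SGA

print(naive)       # 2269, matching the script output
print(realistic)   # 1079
```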


Unplug problem

I was asked to refresh one of the development PDB tenants from the corresponding production PDB. The platform is Linux x86_64 (OL 7.7) with Oracle 19.10 (Jan 2021 RU). Unfortunately, due to security measures, the only open port between the production server and the development server is 22, which eliminates a remote clone. The tenant size in production is around 1 TB.

I opted for unplugging the pluggable database into ‘/tmp/tenant.pdb’. That took almost 2 hours with very fast disk storage (a Pure array) and a powerful CPU, because the operation was single threaded. Is there any way to use multiple threads? The .pdb archive is compressed, and my bottleneck was the compression running on a single CPU. Is there any trick to do that with RMAN? The production server has 144 cores and I could easily afford running 12 threads at a time. Unfortunately, I can’t restore a pluggable database into another CDB using RMAN.

This was like pulling teeth. I couldn’t get the SQL*Net port (it isn’t 1521) open, so I had to unplug. Fortunately, I knew what I was in for, so I first unplugged to XML, dropped the pluggable database with “keep datafiles” and plugged it back in using the XML. The whole thing is extremely awkward. The funny thing was that “CREATE PLUGGABLE DATABASE TESTDB3 USING ‘/tmp/tenant.pdb'” used 2 cores and was 4 times as fast as creating the PDB archive. Oracle’s multi-tenant leaves a lot to be improved. I hope that Oracle 21c might address some of these issues.


PITR and BIGFILE tablespaces

I was lazy, so I created a BIGFILE tablespace to avoid the pain of constant space monitoring and of adding 32767M files to the tablespace. However, a wrong script run against the schema caused a logical corruption, so I was told to restore the pluggable database to the point in time before the script was run. And that is where the trouble started:

alter database recover datafile list clear
Completed: alter database recover datafile list clear
2021-03-03T22:13:09.935629-05:00
RMAN In-place recover pluggable database id 3 to scn 0
Pluggable Database PSH_8141 (3) incomplete recovery start set at change 45008648710, at 03/03/2021 10:30:55
Pluggable Database Media Recovery Start
2021-03-03T22:13:09.939521-05:00
Serial Media Recovery started
Pluggable Database Media Recovery Log /data/archive/1_65868_1031066218.dbf
2021-03-03T22:13:12.996469-05:00
Incomplete Recovery applied until change 44962012457 time 03/03/2021 09:00:15
Errors in file /app/oracle/diag/rdbms/orcl/orcl/trace/orcl_ora_86417_86514.trc (incident=13548103) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kcpSetPDBIncompleteRecoverySCN-1], [44962012457], [45008648710], [], [], [], [], [], [], [], [], []
Incident details in: /app/oracle/diag/rdbms/orcl/orcl/incident/incdir_13548103/orcl_ora_86417_86514_i13548103.trc
2021-03-03T22:13:14.351287-05:00
Some DDE async actions failed or were cancelled
2021-03-03T22:13:14.407330-05:00
Use ADRCI or Support Workbench to package the incident.

Long story short, the tenant was lost. It was impossible to do PITR on a tablespace with an 8.7 TB file. I even tried recovering the whole database. I contacted Oracle Support, but they weren’t able to help. The platform is Linux x86_64. Be aware that bigfile tablespaces can result in the inability to do point-in-time recovery. I haven’t tried complete recovery, because that would also bring back the results of the “masking” script. Apparently, Oracle isn’t testing backup and recovery with bigfile tablespaces, so surprises are possible. I thought the feature would be stable by 12.2, but alas, no luck.


Delayed cleanout

Delayed block cleanout is a long-standing feature of Oracle RDBMS; I remember it from version 7. Essentially, Oracle doesn’t have a classic lock manager with queueing, lock conversion and lock escalation, like some other databases. Oracle creates a transaction entry in the in-memory copy of the block and then adds a pointer to that transaction entry to the row header. That’s it. The next part of the story happens when there is not enough room in the SGA and DBWR has to make room by writing a batch of blocks out to disk. DBWR writes the entire block, including the transaction entries and row headers, and thereby the row locks, to the database file.

The next time the block is brought back into the SGA, it will be automatically checked. The transaction entries not pointing to valid transactions will be cleaned up, and so will any accompanying pointers in the row headers. That process briefly locks the rows being cleaned up and is known to sometimes cause deadlocks. The block is then flagged as dirty (modified) and handed to DBWR to be written to disk when the time comes. The usual advice was to do “select * from <table_name>” immediately after a large transaction, to clean out the invalid lock entries.

Now, why did I emphasize “brought back into the SGA”? Enter Oracle 11g and later versions, which will sometimes read the table into the PGA, not the SGA. DBWR doesn’t have access to the PGA, so the delayed block cleanout is not performed when such a full table scan is done. The ITL entries (aka “transaction entries”) survive the full table scan, and the delayed cleanout is done at the least opportune moment, probably when an application needs to update the row. In Oracle 12c and later, there are two mechanisms for forcing the block cleanout:

  • ALTER TABLE ... MOVE
  • Big table caching

ALTER TABLE MOVE is the classic, if expensive, method to force block cleanout. Big table caching is new in Oracle 12c. Arup Nanda, my former colleague and tech lead from Oxford Health Plans, has written an article about big table caching in his inimitable style:

The information in the article is complete and useful. However, the point of this post is that the old advice about performing a full table scan is no longer valid, because of the new way the full table scan is done.


Starting oracle with huge pages

Last year I blogged about the systemd service to start and stop oracle:

Last week, I noticed that my Oracle 19.9 virtual machines had started swapping. That was strange: there are only test instances on those boxes, memory had been carefully balanced, and swapping shouldn’t have occurred. Then I noticed that huge pages were not being used at all when I started Oracle from the systemd service. They were used normally when I started the instance manually, from sqlplus. So I set “use_large_pages=only”, restarted the system, and lo and behold, here is what I got:

Available system pagesizes:
   4K, 2048K
  Supported system pagesize(s):
      2048K             2536            2370               0   ORA-27125
  Reason for not supporting certain system pagesizes:
   4K - Large pagesizes only
 Increase per process memlock (soft) limit to at least 4740MB
 to lock 100% of SHARED GLOBAL AREA (SGA) pages into physical memory 
 SGA: Realm creation failed

So, systemd cannot start Oracle with huge pages but sqlplus from the command line can. I have set up the limits in /etc/security/limits.conf, that wasn’t the problem. For some reason, processes spawned by systemd weren’t using the limits. I searched Oracle support and came up with the following page:

How to set ulimit values to take effect for systemd service (Doc ID 2681853.1)

Unfortunately, the prescriptions from that page haven’t worked. I modified the service the way that Oracle Support recommended:


That didn’t work. Then, I did the same thing with /etc/systemd/system.conf and rebooted the system. That didn’t work either. Unsurprisingly, systemd is buggy and doesn’t do what it’s supposed to do. The solution was to add the startup to /etc/rc.local:

-bash-4.2$ cat /etc/rc.local
# It is highly advisable to create own systemd services or udev rules
# to run scripts during boot instead of using this file.
# In contrast to previous versions due to parallel execution during boot
# this script will NOT be run after all other services.
# Please note that you must run 'chmod +x /etc/rc.d/rc.local' to ensure
# that this script will be executed during boot.
touch /var/lock/subsys/local
export ORACLE_HOME=/opt/oracle/product/19c/dbhome_1
su - oracle -c "dbstart $ORACLE_HOME"

That gave me the best of both worlds: systemd would still shut the database down cleanly at system shutdown, and the instance would start using large pages:

-bash-4.2$ grep -i huge /proc/meminfo
 AnonHugePages:         0 kB
 ShmemHugePages:        0 kB
 FileHugePages:         0 kB
 HugePages_Total:    2536
 HugePages_Free:      168
 HugePages_Rsvd:        2
 HugePages_Surp:        0
 Hugepagesize:       2048 kB
 Hugetlb:         5193728 kB

Since my machines are small virtual machines in VirtualBox, with 16 GB RAM each, I’d rather have Oracle not start at all than start without huge pages configured. Until systemd is fixed, this is the solution I will adopt: it starts Oracle with huge pages and shuts it down cleanly when the machine is shut down. The Linux I am using is the latest Oracle Linux 7:

-bash-4.2$ cat /etc/oracle-release 
 Oracle Linux Server release 7.9
 -bash-4.2$ uname -r

Things may be better on OL 8.3 but I am too lazy to upgrade all my machines to the new Linux. It’s a lot of work and there is no reason to do that yet. For now, the /etc/rc.local trick will have to do.


PyODBC fast_executemany and Oracle RDBMS

Oracle has had an array interface for well over two decades. The purpose of the array interface is to reduce the number of network round trips needed for batch operations, like fetching or inserting multiple rows at once. The ODBC interface, however, is another story. ODBC is meant to be portable and database agnostic; in other words, it operates with the smallest common denominator of the features available in the various programmatic interfaces for relational databases.

Being an Oracle DBA, I use ODBC primarily to load data from an Oracle database into another type of database, like MySQL, PostgreSQL or SQL Server. However, some scripts just perform general functions, like loading a CSV file into a database, so it’s nice to have one script to rule them all. These days, I mostly script in Python, due to the market demand, so the pyodbc library will be used for the demonstration.

This article is about the “fast_executemany” property of a pyodbc cursor. The script I used is called and it looks like this:

#!/usr/bin/env python3
"""This script loads the content of a CSV file into a table.
This script uses an ODBC connection and can be used to load a table in
any database that supports ODBC. The connection syntax is Oracle-like,
with an ODBC DSN instead of an Oracle TNS descriptor.
Author: Mladen Gogala
Version: 1.0
Date  : 11/08/2020
import argparse
import csv
import re
import sys

import pyodbc

parser = argparse.ArgumentParser(description='LOAD_CSV load a CSV into table')
parser.add_argument("-u", "--username", dest="username", required=True,
                    help='ODBC connection using <usr>/<pwd>@<DSN> syntax')
parser.add_argument("-t", "--table", dest="table", required=True,
                    help='Table to be loaded')
parser.add_argument("-d", "--delimiter", dest="delimiter", default=',',
                    help="CSV delimiter character. The default is ','")
parser.add_argument("--noquote", dest="noquote", action="store_true",
                    help="Fields in CSV file are not quoted")
parser.add_argument("-F", "--fast-execmany", dest="fast_exec", action="store_true",
                    help="Use fast_executemany flag")
parser.add_argument("-i", dest="file", required=True,
                    help="File to load into the database")
parser.add_argument("-n", "--arraysize", dest="nrows", default=1024, type=int,
                    help="Batch size for bulk insert")
args = parser.parse_args()
(uid, pwd, dsn) = (None, None, None)
# Rows are read from the CSV file and collected into the "batch_ins" list.
# The resulting list is then inserted into the database using the "executemany"
# pyodbc function. However, pyodbc cheats and internally runs a loop which sends
# the rows to the database one by one, instead of sending the rows in batches,
# unless the driver supports the "fast_executemany" flag.
batch_ins = []
# ODBC has some strange philosophy when it comes to counting rows; it's better
# to do it programmatically.
rows = 0
# The insert statement needs to be constructed for the table. The table name is
# a script argument. The number of bind variables is determined by counting
# the fields in the first row of the CSV file.
insert_str = f"insert into {args.table} values(?"
str_complete = False
parse_usr =r"(\w+)/(.*)@(.*)", args.username)
if parse_usr:
    uid =
    pwd =
    dsn =
if (uid is None) or (pwd is None) or (dsn is None):
    print("Incorrect connect string syntax. <username>/<password>@DSN expected")
    print("DSN is the name of the ODBC connection")
    conn = pyodbc.connect(f"DSN={dsn};UID={uid};PWD={pwd}")
    conn.autocommit = False
    cursor = conn.cursor()
    # If the driver supports the "fast_executemany" flag, use it. So far, only
    # the original Microsoft ODBC SQL Server drivers, both on Windows and
    # Linux, support that flag.
    if args.fast_exec:
        cursor.fast_executemany = True
    with open(args.file, "r") as f:
        if args.noquote:
            reader = csv.reader(f, delimiter=args.delimiter,
            reader = csv.reader(f, delimiter=args.delimiter,
        for row in reader:
            if not str_complete:
                for i in range(1, len(row)):
                    insert_str += ",?"
                insert_str += ")"
                str_complete = True
            rows += 1
            if len(batch_ins) == args.nrows:
                cursor.executemany(insert_str, batch_ins)
                batch_ins = []
    if len(batch_ins) > 0:
        cursor.executemany(insert_str, batch_ins)
except pyodbc.Error as odbc_exc:
    sqlstate = odbc_exc.args[0]
    print("Error: SQLSTATE =", sqlstate)
except OSError as oserr:
    print("Cannot open file:", args.file)
print(rows, "rows loaded")

The “fast_executemany” flag activates the array interface, which saves a lot of time and network round trips. Without it, cursor.executemany() internally performs a loop and sends the rows one by one. If the flag is specified and the driver supports it, the database performs the operation (insert or update) on the entire array in a single round trip. Without the flag, the array size specified by the “-n” argument doesn’t have much impact on performance:

mgogala@umajor:~/work/python$ time ./ -u scott/tiger@Ora-19 -t test_tab -i /tmp/test_tab.csv 
 5000000 rows loaded
 real    18m30.113s
 user    1m7.887s
 sys    0m49.030s
 mgogala@umajor:~/work/python$ time ./ -u scott/tiger@Ora-19 -t test_tab -i /tmp/test_tab.csv -n 16384
 5000000 rows loaded
 real    16m22.102s
 user    1m7.751s
 sys    0m45.439s

Loading 5,000,000 rows into the table takes 18.5 minutes with the default 1024-row batch and 16 minutes 22 seconds with a 16,384-row batch. Nothing to write home about. My ODBC DSN is Ora-19 and the driver is the Instant Client 19.9 ODBC driver. Interestingly, this wasn’t working with the Instant Client 19.8 driver. My /etc/odbcinst.ini looks like this:

 Description=Oracle ODBC driver for Oracle 19

Now, let’s truncate the table and repeat the test with “fast_executemany”, controlled by the “-F” argument:

mgogala@umajor:~/work/python$ time ./ -u scott/tiger@Ora-19 -t test_tab -i /tmp/test_tab.csv  -F
 5000000 rows loaded
 real    1m1.061s
 user    0m10.475s
 sys    0m0.523s

mgogala@umajor:~/work/python$ time ./ -u scott/tiger@Ora-19 -t test_tab -i /tmp/test_tab.csv  -F -n 16384
 5000000 rows loaded
 real    0m49.057s
 user    0m11.165s
 sys    0m6.441s

Now, that is much better. Please note that increasing the array size from 1024 rows to 16,384 rows improved performance by about 15%. Performance would have been better still had I not kept GoldenGate on, but I wanted to see how fast GG would replicate the load. GoldenGate did it practically in real time:

2020-11-21T15:29:04.201-0500  INFO    OGG-25222  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Large transaction completed. Total records: 16,384.
 2020-11-21T15:29:04.404-0500  INFO    OGG-25221  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Processing transaction (XID: 6.33.6307, Seqno: 54, RBA: 447591827) larger than eager size (15,100).
 2020-11-21T15:29:04.405-0500  INFO    OGG-25222  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Large transaction completed. Total records: 16,384.
 2020-11-21T15:29:04.577-0500  INFO    OGG-25221  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Processing transaction (XID: 4.2.6255, Seqno: 54, RBA: 449607101) larger than eager size (15,100).
 2020-11-21T15:29:04.577-0500  INFO    OGG-25222  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Large transaction completed. Total records: 16,384.
 2020-11-21T15:29:11.240-0500  INFO    OGG-25221  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Processing transaction (XID: 7.32.6253, Seqno: 54, RBA: 451622374) larger than eager size (15,100).
 2020-11-21T15:29:12.157-0500  INFO    OGG-25222  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Large transaction completed. Total records: 16,384.
 2020-11-21T15:29:12.256-0500  INFO    OGG-25221  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Processing transaction (XID: 10.29.6255, Seqno: 54, RBA: 453637648) larger than eager size (15,100).
 2020-11-21T15:29:12.256-0500  INFO    OGG-25222  Oracle GoldenGate Delivery for Oracle, rtgt1.prm:  Large transaction completed. Total records: 16,384.

The GG log clearly shows 16K-row transactions being processed. However, GG is not the subject of this story; the “fast_executemany” flag is. Unfortunately, “fast_executemany” is not compatible with all drivers. The wiki page for pyodbc fast_executemany driver support is here:

Some drivers, like the FreeTDS driver for SQL Server, will even crash the Python interpreter. For the drivers that do work, the benefits are immense.
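The batching logic of the loader script can be pictured without a database at all. The sketch below replaces cursor.executemany() with a round-trip counter, my own stand-in for illustration, to show how the “-n” batch size translates into executemany() calls:

```python
def load_in_batches(rows, batch_size):
    """Mimic the loader: collect rows and flush one batch at a time.
    Returns (rows_loaded, round_trips) so the effect of batch size is visible."""
    batch, round_trips, loaded = [], 0, 0
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            round_trips += 1      # one executemany() call per full batch
            loaded += len(batch)
            batch = []
    if batch:                     # flush the final partial batch
        round_trips += 1
        loaded += len(batch)
    return loaded, round_trips

# With fast_executemany, each flush is one round trip carrying the whole batch;
# without it, the driver quietly degrades each batch into per-row statements.
print(load_in_batches(range(5000), 1024))    # (5000, 5)
print(load_in_batches(range(5000), 16384))   # (5000, 1)
```

That is why the batch size barely matters without the flag: the per-row round trips dominate no matter how the rows are grouped.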
