improving performance with stored procedures — a pgbench example.

In a previous post I mentioned that I do not use pgbench to benchmark the platform. But when it comes to measuring a client/server application, pgbench makes complete sense.

I initialize the pgbench schema with small data:

pgbench --initialize --init-steps=dtgvpf -h localhost -p 5432 -U postgres franck
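Each letter of --init-steps selects one initialization step (this is documented pgbench behavior); here is the same command annotated, with the default scale factor made explicit:

# d = drop any existing pgbench tables
# t = create the tables
# g = generate the data client-side
# v = vacuum
# p = create primary keys
# f = create foreign keys
# --scale=1 is the default: 100000 rows in pgbench_accounts
pgbench --initialize --init-steps=dtgvpf --scale=1 -h localhost -p 5432 -U postgres franck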

And I run the pgbench builtin workload, which does something like a TPC-B transaction.

tpcb-like builtin

pgbench --builtin tpcb-like --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck
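For reference, the same command annotated with what each option does (documented pgbench behavior):

# --builtin tpcb-like     run the builtin TPC-B-like script
# --transactions 30000    each client runs 30000 transactions
# --client=10             10 concurrent database sessions
# --jobs=10               10 worker threads driving the clients
# --protocol=prepared     prepared statements: parse and plan once per session
pgbench --builtin tpcb-like --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck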

Each of the 10 clients runs 30000 transactions there, so 300000 in total. It runs for more than 4 minutes:

The rate is 1097 transactions per second with an average of 9 milliseconds per transaction.

That’s my baseline. What the builtin transaction runs is easy to get from the source: it is defined in pgbench.c of the postgres/postgres repository.
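You don't even have to open the source: pgbench itself can print a builtin script with the --show-script option (it writes the script to stderr and exits):

pgbench --show-script tpcb-like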

As pgbench can also run custom workloads, I’ll run exactly the same workload by copying those statements into a file.

tpcb-like as a custom file

Here is the file containing the same statements as the builtin script found in pgbench.c:

cat > /tmp/tpcb-like <<'CAT'
-- tpcb-like
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
CAT

Now I run the workload from this file, still 30000 transactions per client from 10 clients:

pgbench --file /tmp/tpcb-like --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck

The result is very similar:

The rate is 1095 transactions per second with an average of 9 milliseconds per transaction. Same statements and same throughput.
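As a side note, pgbench can also report the average latency of each statement with -r (--report-latencies). Under the round-trip reasoning of this post, it should show all these short statements with a similar latency, dominated by the client/server call rather than by the work each one does:

pgbench --file /tmp/tpcb-like --report-latencies --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck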

tpcb-like as a procedure + select

I create a stored procedure with all INSERT/UPDATE statements:

create procedure P_TPCB_LIKE
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) AS $$
BEGIN
UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
UPDATE pgbench_tellers SET tbalance = tbalance + p_delta WHERE tid = p_tid;
UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
END;
$$ language plpgsql;
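Before benchmarking, a quick smoke test from psql (a one-off check, not part of the workload): call the procedure once and look at the history row it just inserted:

psql -h localhost -p 5432 -U postgres franck <<'SQL'
call P_TPCB_LIKE(1, 1, 1, 100);
select * from pgbench_history order by mtime desc limit 1;
SQL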

Now, the custom file will only call that procedure for the modifications, and run the SELECT separately:

cat > /tmp/tpcb-like-p <<'CAT'
-- tpcb-like
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
call P_TPCB_LIKE(:aid, :bid, :tid, :delta);
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
END;
CAT

This is functionally equivalent, and I run again the same number of transactions from the same number of clients and threads:

pgbench --file /tmp/tpcb-like-p --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck

Now I have a huge gain here, as the throughput is 3 times higher. The main reason is the number of client/server round trips: the builtin transaction sends seven statements per transaction (BEGIN, five DML/SELECT statements, COMMIT), while this one sends only four (BEGIN, CALL, SELECT, COMMIT):

The rate is 3024 transactions per second with an average of 3 milliseconds per transaction. Same statements but stored on the server.

tpcb-like as a function

Ideally, each client/server call should be only one statement, so I must include the SELECT part in my stored procedure. That is possible with a function that returns the query result:

create function F_TPCB_LIKE
(p_aid integer, p_bid integer, p_tid integer, p_delta integer) returns table(abalance integer) AS $$
BEGIN
UPDATE pgbench_accounts SET abalance = pgbench_accounts.abalance + p_delta WHERE aid = p_aid;
UPDATE pgbench_tellers SET tbalance = tbalance + p_delta WHERE tid = p_tid;
UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
return query SELECT pgbench_accounts.abalance FROM pgbench_accounts WHERE aid = p_aid;
END;
$$ language plpgsql;

Note that I kept abalance as the name of the returned column. Within the function body this name is ambiguous with the table column, which is why I qualified abalance with the table name in the UPDATE and in the final SELECT.
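Again, a quick one-off check from psql: with a delta of 0 the function returns the current balance unchanged (it still inserts a history row, as the real transaction does):

psql -h localhost -p 5432 -U postgres franck -c 'SELECT abalance FROM F_TPCB_LIKE(1, 1, 1, 0);'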

Here is my simple call for the transaction, which does all the DML and returns the balance:

cat > /tmp/tpcb-like-f <<'CAT'
-- tpcb-like
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
SELECT abalance FROM F_TPCB_LIKE(:aid, :bid, :tid, :delta);
END;
CAT

Running it with the same configuration:

pgbench --file /tmp/tpcb-like-f --transactions 30000 --protocol=prepared --jobs=10 --client=10 -h localhost -p 5432 -U postgres franck

the performance is even better:

The rate is 4167 transactions per second with an average of 2 milliseconds per transaction. The more you do in the database per call, the better you scale. But there's even more: all the benefits of encapsulation. I'll mention three main ones:

  1. Here the API between the application and the database is independent of the SQL syntax. And the SQL statements that are tightly coupled with the data model (and the database system/version) are all encapsulated in the procedure, within this database.
  2. It is also a strong security advantage: no SQL injection is possible, as inputs go through an API that accepts only a procedure name and typed parameters. And you can audit the procedure calls (see the privilege sketch after this list).
  3. When the junior architect tells you that your application is an ugly old monolith, add the following comment in front of each call to the stored procedures and you are trendy again:
-- calling the 'command' CQRS microservice
CALL my_app_procedure(...);
-- calling the 'query' CQRS microservice
SELECT my_app_function(...);
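To make the security point of item 2 concrete, here is a minimal sketch of the privilege model this enables. The app_user role name is hypothetical, and the routines must run as SECURITY DEFINER (by default they run with the caller's rights) so that the application role needs no direct table privileges:

-- hypothetical application role: can execute the API, cannot touch the tables
create role app_user login;
alter procedure P_TPCB_LIKE(integer, integer, integer, integer) security definer;
alter function F_TPCB_LIKE(integer, integer, integer, integer) security definer;
revoke all on pgbench_accounts, pgbench_tellers, pgbench_branches, pgbench_history from app_user, public;
grant execute on procedure P_TPCB_LIKE(integer, integer, integer, integer) to app_user;
grant execute on function F_TPCB_LIKE(integer, integer, integer, integer) to app_user;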