Oakies Blog Aggregator

The Layman’s Term Guide to AWR for Microsoft- Part I

As often as I use Automatic Workload Repository (AWR) data in my day-to-day job at Microsoft, I think most can imagine how often I'm explaining its value, along with tracing, to my Microsoft peeps. It's time to write a post dedicated to the Microsoft crowd who want to understand a bit more about AWR, so hopefully it's helpful!

What is the Automatic Workload Repository?

The Automatic Workload Repository (AWR) has been around since Oracle 10g and requires the diagnostic and tuning management pack licensing to use all of its features in Oracle's Enterprise Edition database. Versions before 10.2.0.4 had limited collections compared to the modern reporting schema, and every subsequent release of Oracle has added to its content, which explains the growth in the size and number of objects stored in the SYSAUX tablespace.

By default, and since version 11.2.0.4, the AWR retention is 8 days, with an automatic snapshot taken once per hour. It's common for DBAs to up this retention to at least 31 days to capture a month of workload information, and these snapshot identifiers can then be used to identify workload intervals for querying and reporting. Oracle can also be configured to shorten the interval between snapshots to change the granularity of AWR reports, or, my preference, the DBA or a privileged user can take manual snapshots to mark the beginning or end of an important period.
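
As a point of reference, here is a minimal sketch of what that configuration and a manual snapshot look like with DBMS_WORKLOAD_REPOSITORY (the 31-day value is just the example from above; run as a suitably privileged user):


BEGIN
  DBMS_WORKLOAD_REPOSITORY.MODIFY_SNAPSHOT_SETTINGS(
    retention => 31 * 24 * 60,   -- retention is specified in minutes (31 days here)
    interval  => 60);            -- keep the default hourly snapshot interval
END;
/

-- Mark the beginning or end of an important period with a manual snapshot
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;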

The AWR stores its cached data in its own buffer, which can be queried via the V$ views, and then persists the values once the buffer is full or the snapshot interval flush occurs, writing essential performance data and aggregated information to the DBA_HIST objects in the SYSAUX tablespace. Also retained in this repository are samples from Active Session History (ASH); AWR keeps one out of every ten ASH samples. AWR data is also used to produce Automatic Database Diagnostic Monitor (ADDM) reports, SQL Monitor reports and baseline reports.
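
If you want to see the snapshots that bound a reporting window, a simple sketch against the repository (the one-day filter is just an example):


SELECT snap_id, begin_interval_time, end_interval_time
FROM   dba_hist_snapshot
WHERE  begin_interval_time > SYSDATE - 1
ORDER BY snap_id;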

When attempting a comparison to SQL Server or Azure SQL, the AWR is closest to the Query Store and Dynamic Management Views placed into a repository with extensive packaged reports and utilities built on top of it. The data collected includes metrics, latency information, performance data and advisor recommendations, along with historical aggregates. The data is produced as part of Oracle, but then housed in the repository to provide incredible value to the DBA. If you want to get the low-down on the AWR, including architecture, this slide deck can provide some great history and insight into the repository and products around it.

So how are these reports used and how would I explain them in layman’s terms?

AWR- the 360 Panorama View

Although a one-hour report can provide incredible value, this report is used by my team to collect the overall workload of an Oracle database environment, commonly 7+ days. The amount of information packed into this report offers everything from the basic facts about the database to:

  • Resource allocation, (Server memory, CPU, Database memory, IO, network, etc.)
  • Top SQL by elapsed time, CPU, IO, etc.
  • Wait event statistics
  • Parameter settings, (think sp_configure)
  • Advisor recommendations

There are specific report versions for Real Application Clusters (RAC) databases, combining the data from each of the nodes into a single report, including Global Cache (GC) information about the interconnect and performance between RAC nodes. There is a similar version for Exadata engineered systems that provides all of this plus information on the cell nodes, including IO and flash storage savings.
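
If you've never generated one, the reports come from scripts shipped with the database and are run from SQL*Plus; each prompts for the report format, the number of days of snapshots to list, and the begin/end snapshot IDs:


-- Single-instance AWR report
@?/rdbms/admin/awrrpt.sql

-- RAC-aware "global" AWR report, combining the data from all nodes
@?/rdbms/admin/awrgrpt.sql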

One of the biggest challenges for many is that the format of the report, and how data is presented, has changed over releases. As of Oracle 12c, some of the output started being limited in the text version of the report, so it's now always best to request the HTML version, which kills my viewing it from the Linux server command line. The biggest challenge is that if you're comfortable with an 11g single-instance report, an Exadata 12c report can be a bit of an information overload, even for Oracle DBAs.

I have a few blog posts about digging into AWR reports, as they really are as much a science as an art; certain data will result in you navigating the report a bit differently. I've tried to link some of those posts here and, although they are from my previous work in Oracle, they're still valuable for those new to the AWR.

So how should you think of an AWR report if there really isn’t anything like it in Azure SQL/SQL Server? Think of it like a panoramic picture.

[Image: 360-degree panorama photograph]

You get a wide view of everything going on in the database. It's great for looking at the overall health of the database and the resource usage, and for noting any red flags. I use them to size Oracle databases, including RAC and Exadata, for Azure IaaS VMs. They give me the information that I need to perform this task correctly and to accurately size the workload vs. just shifting the current hardware over. I'm also able to locate any problems that should be addressed beforehand and know if any resources are currently inadequate for the migration.

Using the panorama example, if you needed to zoom in on a specific area of the picture, would a panorama be the best way to photograph it? Probably not, and that's where the other parts of the repository and reporting come into play in the AWR. If you want to understand more about AWR reports, check out this earlier slide deck that goes into the history of AWR/ASH, and know that since it was written, there are new features that have been added, which I'll update soon!

ASH Report: iOS Live Pic or even a time lapse of Oracle

Active Session History (ASH), unlike other parts of the AWR, has its own background process collecting information. The session information is stored as samples, and as samples can overlap in time, they should never be used to count executions, etc. What it does offer you is great insight into what was happening at that time that was impactful to the database, at a granular level.

As ASH isn't dependent upon beginning and ending snapshots, instead using those samples, reports can be created for small intervals of time, allowing for diagnosis of performance issues during the time they occurred. The reports work best when used this way; an ASH report covering hours or days will rarely, if ever, provide value.

An ASH report will include the top sessions, SQL statements, modules and procedural calls, but also whether there are blocking sessions, what parallel processing occurred, and a breakdown of the top wait events by the samples involved in the report.
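
A minimal sketch of the ASH side, either through the packaged report or by looking directly at the in-memory sample buffer for a short window (the 15-minute filter is just an example):


-- Packaged ASH report; prompts for the begin and end time of the window
@?/rdbms/admin/ashrpt.sql

-- Or a quick look at the last 15 minutes of samples
SELECT session_id, sql_id, event, COUNT(*) AS samples
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - INTERVAL '15' MINUTE
GROUP BY session_id, sql_id, event
ORDER BY samples DESC;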

If you consider an ASH report as an iOS Live Pic or time lapse vs. an AWR report as a panoramic view, the picture covers less area, but shows you exactly what is happening.

As with an iOS Live Pic, you can extend the window, but many times just that small slice of time can provide a moving window of the main sessions that occurred during the incident you need investigated.

If you want to know more about ASH, check out this post.

Oracle Trace: The GoPro of Oracle

There are a lot of different session trace levels that can be used, but the glory of a trace is that you get to experience the process from the session level and trace it through from beginning to end. It's more like a GoPro movie, where you get to experience how long the journey takes, what time is spent on each step of the trip and what happened along the way, from the process's point of view. You get to view specifics that may be missed or transparent to the user.

This includes any triggers, logic built into procedures, etc.; they are all captured, along with the time taken to perform each action, and written to the trace file. The file can then be turned into a report to track each step that was performed as part of the database process, from the login through the close of the session.
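
A minimal sketch of tracing a single session with DBMS_MONITOR and then formatting the raw trace file with tkprof (the SID and serial# values are illustrative):


-- Enable an extended SQL trace (waits and binds) for one session
EXEC DBMS_MONITOR.SESSION_TRACE_ENABLE(session_id => 123, serial_num => 4567, waits => TRUE, binds => TRUE);

-- ... reproduce the workload, then turn tracing off
EXEC DBMS_MONITOR.SESSION_TRACE_DISABLE(session_id => 123, serial_num => 4567);

-- From the OS prompt, turn the trace file into a readable report:
-- tkprof <tracefile>.trc trace_report.txt sys=no sort=exeela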

Traces have been my go-to when digging deep into code logic, when it wasn't clear from the database-level reports exactly what was being executed by the application, or when there were other processes consuming time.

Need to know more about tracing? Check out this great post from Tim Hall, since I've only written one focused on EM13c: previous post.

AWR SQL ID Specific Report: Close up?  Nah…

I have one other favorite report for when a specific SQL statement shows up as a red flag in either an AWR or ASH report and I'd like a little more information on just what happened with that SQL during the window of time. The AWR SQL ID specific report (awrsqrpt.sql) can be run from the command line or gathered inside Cloud Control (aka OEM). This report is like a mini AWR report, but just for the one SQL statement. It requires the beginning and ending snapshot IDs, just like an AWR report, but it also wants the unique identifier for the SQL statement, also called a SQL_ID.

It provides the execution plan and secondary performance information for the different executions (by differing SQL_ID/plan_hash_values; think transaction IDs in SQL Server) that were performed during the window of the report. I can't decide what kind of picture it would be, as it isn't a panoramic view, but it's not an ASH either, as it's snap_id dependent, and it's not like a trace, because it's outside the session but focused on one SQL statement and how it is performing. Maybe more like this:
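
Running it looks much like the full AWR report; the script prompts for the begin/end snapshot IDs and the SQL_ID of interest:


@?/rdbms/admin/awrsqrpt.sql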

Or kind of like this: multiple executions, one SQL ID, but two different hash values:

No matter how you view it, it’s a great report and I have detailed information on it in this post here if you’re interested.

Hopefully this was helpful or at least amusing!




Announcing a New Scholarship for LGBTQ+ WordPress Community Members

The Queeromattic Employee Resource Group, Automattic’s LGBTQ+ internal organization, is proud to announce a scholarship for LGBTQ+ WordPress Community members who need financial support to attend a WordCamp flagship event for the first time. 

For those unfamiliar with WordCamps, they are informal, community-organized events that are put together by WordPress users like you. Everyone from casual users to core developers participate, share ideas, and get to know each other. There are currently four flagship events each year: WordCamp Europe, WordCamp Asia, WordCamp US, and WordCamp Latin America. We’re going to sponsor one member of the LGBTQ+ community to attend each of these events!

Our hope in sponsoring folks to attend an initial WordCamp flagship event is that it will provide a career-enhancing opportunity for folks to connect more deeply with members of the WordPress community and level up their own WordPress skills to take back into their everyday life. Many of us at Automattic found our way here through the wider WordPress community and we’re really excited to share that chance with folks from the LGBTQ+ community who might not have the opportunity otherwise. 

Right now, we’re accepting applications to WordCamp US 2020. If you’re a member of the LGBTQ+ community and a WordPress user, we encourage you to apply: https://automattic.com/scholarships/queeromattic/  To be considered, please apply no later than Sunday, May 31, 2020 at 12 a.m. Pacific Time.

If you know someone who would be perfect for an opportunity like this, please share it with them! We want folks from all over the world to have the chance to benefit from this new scholarship.

Important note: Due to COVID-19, we will be working closely with the WordCamp event staff to ensure the health and safety of all attendees. This means that scholarships are not guaranteed for 2020 events. Please keep this in mind when applying and sharing this opportunity with others.

Data Guard and RAC on Docker : Perhaps I was wrong?

I’ve talked a lot about Docker and containers over the last few years. With respect to the Oracle database on Docker, I’ve given my opinions in this article.

Over the weekend Sean Scott tweeted the following.

“A while back @oraclebase said Data Guard didn’t make sense on Docker.

For those of us disinterested in the sensible I present #Oracle #DataGuard on #Docker. 19c only for now. Please let me know what’s broken. Enjoy!”

https://github.com/oraclesean/DataGuard-docker

This was in reference to a statement in my article that said the following.

“Oracle database high availability (HA) products are complicated, often involving the coordination of multiple machines/containers and multiple networks. Real Application Clusters (RAC) and Data Guard don’t make sense in the Docker world. In my opinion Oracle database HA is better done without Docker, but remember not every database has the same requirements.”

For the most part I stick by my statement, for the reasons described in my article. Although both Data Guard and RAC will work in Docker, I generally don’t think they make sense.

But…

A few years ago I had a conversation with Seth Miller, who was doing RAC in Docker. In his case it made sense for testing because of his use cases. I discussed this in this post.

For that use case, Seth was right and I was wrong.

What about Data Guard?

For a two node data guard playground I don’t see any major advantages to using two containers in one VM, compared to two VMs. The overhead of the extra VM and OS is not significant for this use case. Remember, most of the resources are going to the Oracle instances, not the VM and OS. Also, the VM approach will give you something similar to what you will see in production. It feels like a more natural testing scenario to me.

But Sean’s scenario was not this simple. When I questioned him over the value of this, considering the two VM approach had so little extra overhead, he came back with the following.

“There I’ll disagree. I have a Docker/sharding build I’m working on. 7 databases. Starts in moments. On my laptop. I can’t do 7 VM. No way!”

Now this scenario changes the game significantly. All of a sudden we go from the overhead of one extra VM to an overhead of six extra VMs. That’s pretty significant on a laptop. All of a sudden the Docker method probably makes a lot more sense than the VM approach for testing that scenario on a laptop.

Once again, I’m wrong and Sean is correct for this use case.

Conclusion

If you are building a two node RAC or Data Guard playground, I still think the VM approach makes a lot more sense. It’s going to be a lot more like what you use at work, and you don’t have to deal with some of the issues that containers bring with them.

Having said that, if you are looking to build something more extreme, or you are just trying stuff for fun, then Docker may be the right solution for you.

I still don’t see a realistic future for an RDBMS monolith on containers. I don’t care if it’s a single container or a giant Kubernetes cluster. This is not a criticism of the RDBMS or of containers. They are just things from different worlds for different purposes and continuing to treat them differently seems totally fine to me. Having said all that, it doesn’t mean combining the two can’t be useful for some use cases.

Remember, this is just my opinion!

Turning a Page with Page Layouts

Need to add a new page to your site but don’t know where to start? Making a brand new site on WordPress.com and want to design a homepage quickly? There’s a new addition to the WordPress experience that’ll help with exactly that.

Let’s take a look at Page Layouts! They’re pre-designed pages you can drop content into, without needing to decide what to put where.

To add a Page Layout to your site, head to My Sites > Site > Pages and click the “Add New Page” button — it’s the pink one:

Next, we’ll show you a selection of layouts you can choose from — there are layouts available for

  • About pages
  • Contact pages
  • Services pages
  • Portfolio pages
  • Restaurant Menu, Team, and Blog pages
  • and even starting points for Home pages

Here’s one of the available Portfolio Page Layouts, for example.

These layouts are all made using blocks in our block editor, which means you can edit the images, content, and layout all in one place. Start by replacing the default images and text, and you’ll be on your way!

You can use Page Layouts to make great-looking pages with only a few clicks. For inspiration, here is a selection of layouts using a variety of WordPress.com themes.

[Image: sample page layouts]

And if you need help getting started you’ll find more detailed information on all of the above in our support documentation for Page Layouts.

What other types of pages and designs would be useful for your site? Let us know what you’d like to see — we’d love to hear from you!

Video : Returning REF CURSORs from PL/SQL : Functions, Procedures and Implicit Statement Results

Today’s video is a demonstration of returning REF CURSORs from PL/SQL using functions, procedures and implicit statement results.

I was motivated to do this after a conversation with my boss. He’s from a .NET and SQL Server background, and was a bit miffed about not being able to use a SELECT to pass out variable values from a procedure, like you can in T-SQL. So I piped up and said you can using Implicit Statement Results and another myth was busted. I guess most PL/SQL developers don’t use this, and I don’t either, but you should know it exists so you can be a smart arse when situations like this come up.
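
For anyone coming from the T-SQL side, a minimal sketch of what Implicit Statement Results look like in 12c and later (the table name is illustrative):


CREATE OR REPLACE PROCEDURE get_employees AS
  l_cursor SYS_REFCURSOR;
BEGIN
  OPEN l_cursor FOR SELECT * FROM employees;
  -- The result set is handed back to the client without an OUT parameter,
  -- much like a trailing SELECT in a T-SQL stored procedure.
  DBMS_SQL.RETURN_RESULT(l_cursor);
END;
/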

ysql_bench: the YugaByteDB version of pgbench

By Franck Pachot

.
This follows the previous post on testing YugaByteDB 2.1 performance with pgbench: https://blog.dbi-services.com/yugabytedb-2-1/
A distributed database needs to reduce inter-node synchronization latency, and so it replaces two-phase pessimistic locking with optimistic concurrency control in many places. This means more serialization errors, where a transaction may have to be re-tried. But the PostgreSQL pgbench does not have this possibility, and this makes benchmarking a distributed database very hard. For example, when CERN tested CockroachDB the conclusion was: “comparative benchmarking of CockroachDB was not possible with the current tools used”.

pgBench retry: the PL/pgSQL solution

In a previous blog post my workaround for this was to implement the retry in a PL/pgSQL procedure.

Here is the PL/pgSQL procedure (the transaction body shown here is a sketch mirroring the simple-update script below, wrapped in a retry loop):


ysqlsh franck <<'PSQL'
 -- sketch: the simple-update transaction wrapped in a retry loop
 create or replace procedure SIMPLE_UPDATE_RETRY(p_aid int, p_bid int, p_tid int, p_delta int) as $$
 declare
  retries int:=0;
 begin
  loop
   begin
    update pgbench_accounts set abalance = abalance + p_delta where aid = p_aid;
    insert into pgbench_history (tid, bid, aid, delta, mtime)
     values (p_tid, p_bid, p_aid, p_delta, current_timestamp);
    exit;  -- success: leave the retry loop
    exception when serialization_failure then
       if retries >10 then
        raise notice 'Give Up after % retries. tid=%',retries,p_tid;
        raise;
       end if;
       -- continue the retry loop
   end;
   retries=retries+1;
  end loop;
  commit;
  if retries > 2 then
   raise notice 'Required % retries (tid=%)',retries,p_tid;
  end if;
 END;
 $$ language plpgsql;
PSQL

Here is the code to call it (same functionality as the “simple update” builtin):


cat > /tmp/simple-update.sql <<'CAT'
 -- simple-update 
 \set aid random(1, 100000 * :scale)
 \set bid random(1, 1 * :scale)
 \set tid random(1, 10 * :scale)
 \set delta random(-5000, 5000)
  call SIMPLE_UPDATE_RETRY(:aid, :bid, :tid, :delta);
  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
CAT

And how I run it for 30 seconds:


pgbench --no-vacuum --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

[Screenshot: pgbench output]

As I mentioned in my blog post, some serialization errors can still happen because of the limitations of PostgreSQL transaction control in procedures: I cannot retry the errors encountered at commit. This has been raised on the PostgreSQL hackers' list: https://www.postgresql.org/message-id/flat/54C2FEA5-86FF-4A3C-A84B-4470A6998ACA%40thebuild.com#5a5a575ed5451603c906ba3167d043a1

pgBench retry: the “max-tries” patch

Around the same time I came up with the PL/pgSQL workaround, YugabyteDB implemented the mentioned patch in their fork of the postgres code: https://github.com/yugabyte/yugabyte-db/commit/af25ba570e11c59e41a1243126ff9f9edae9d422, which is a much better solution. Adding a patch to the community PostgreSQL is hard because this database engine is widely used and they must be conservative to ensure stability. That's different for a startup company building a new database engine. And what's awesome with YugaByteDB is that their fork is open source and their work can easily be given back to the community. What YugaByteDB is improving in PostgreSQL is public, documented and open-source. And, good news, this postgres fork is shipped, with all tools, in the YugaByteDB distribution. Did you see in the previous post that I've set my PATH with ~/yugabyte-2.1.0.0/postgresql/bin in addition to ~/yugabyte-2.1.0.0/bin? This is where you find them, with the command line tools renamed. ysql_bench is the YugaByteDB version of pgBench. Here is a comparison of the community PostgreSQL and the one compiled with YugaByteDB:

[Screenshot]

The YugaByteDB version of pgbench has the following differences in version 2.1:

  • The default init steps do not include vacuum (as it makes no sense here): -I, --init-steps=[dtgvpf]+ (default "dtgvp")
  • The primary keys are declared in the CREATE TABLE rather than an ALTER TABLE
  • The initialization can do intermediate commits, controlled with --batch-size
  • The number of retries (before ending the client thread) is controlled by --max-tries, which defaults to zero (no retries)
  • The -r option reports not only the transactions that exceed --latency-limit but also the errors trapped for retry, and is renamed from --report-latencies to --report-per-command
  • Finally, they also changed the table names: ysql_bench_% instead of pgbench_%

To work on the same table names, I continue with a script file:


cat > /tmp/simple-update.sql <<'CAT'
 -- simple-update 
 \set aid random(1, 100000 * :scale)
 \set bid random(1, 1 * :scale)
 \set tid random(1, 10 * :scale)
 \set delta random(-5000, 5000)
 BEGIN;
   UPDATE pgbench_accounts SET abalance = abalance + :delta
    WHERE aid = :aid;
   SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
    INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
     VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
 END;
CAT

and I run with a “max-tries” settings and “-r” to report the number of retries:


ysql_bench -r --max-tries=10 --no-vacuum --protocol=prepared --file=/tmp/simple-update.sql --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

[Screenshot: ysql_bench output]

That is 493 transactions per second. Do not compare with the PL/pgSQL version because here we have more client-server roundtrips.

In order to validate that I have the same result with the builtin script, I run it after renaming the tables because the YugaByte builtin scripts expect ysql_bench_% tables:
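
The renames assumed here are just the obvious ones, along these lines:


ALTER TABLE pgbench_accounts RENAME TO ysql_bench_accounts;
ALTER TABLE pgbench_branches RENAME TO ysql_bench_branches;
ALTER TABLE pgbench_tellers  RENAME TO ysql_bench_tellers;
ALTER TABLE pgbench_history  RENAME TO ysql_bench_history;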

[Screenshot]


ysql_bench -r --max-tries=10 --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

[Screenshot: ysql_bench output]

We are still at about 493 transactions per second here.

Compared with “monolith” PostgreSQL

Of course, those numbers are far from what is possible with a monolithic database. The distributed architecture, the cross-shard foreign key validation, the replication to other nodes, have a huge overhead when implemented as remote procedure calls. When you scale-up within one server without the need to scale out, the throughput is higher with the community PostgreSQL:


sudo su - postgres
pg_ctl start
time pgbench --initialize --init-steps=dtgvpf --host localhost -p 5432 -U postgres
pgbench -r --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5432 -U postgres

[Screenshot: pgbench output]

The advantage of a distributed database comes with the possibility to scale out to multiple servers, data centers and even regions. Note that I had no replication with the PostgreSQL test above and running YugaByteDB with no replication doubles the throughput that I had with replication factor 3:

[Screenshot]

So, there’s no fair comparison possible. Just use what you need: monolith to speed-up, or distributed to scale-out.

The big advantage of YugaByteDB is that the YSQL API is more than just compatibility with PostgreSQL like what CockroachDB does. YugaByte re-uses the PostgreSQL upper layer. Then an application built for a PostgreSQL database with the best one-node performance can scale-out without changing the application when moving to YugaByteDB. And vice-versa. This looks similar to what Amazon is doing with AWS Aurora except that Aurora runs only on AWS but YugaByteDB is open-source and can run anywhere.

There are many comparative benchmarks published but I think that being able to use pgbench is very important to compare a specific workload between PostgreSQL and YugabyteDB in order to make the right deployment decisions. My goal was also to emphasize the need to have a good exception handling strategy in your applications, with retry possibility.

The article "ysql_bench: the YugaByteDB version of pgbench" first appeared on Blog dbi services.

YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement

By Franck Pachot

.
9 months ago I was looking at YugaByteDB which was still in beta version for its ‘YSQL’ API. I published my first test on Medium: https://medium.com/@FranckPachot/running-pgbench-on-yugabytedb-1-3-3a15450dfa42. I have been very enthusiastic about the idea, the architecture, the way they open-sourced it and how all was documented in their blog. I’ve even met them in Sunnyvale when I traveled to California for Oracle Open World. Great people with a great vision on the future of databases. From this first test, I was not impressed by the performance but it was an early beta and a multi-master/multi-index database has many challenges to solve before tuning the details of implementation. This tuning task has been done for the General Availability version 2.1 released in February. I was eager to test it, but this first month back to dbi-services consulting was very busy. So finally here it is.

This post takes the same test I did last July and the result is impressive: the pgbench initialization time is fully fixed and the pgbench run shows 9x higher throughput.

I’m on the same VM which, as funny as it might sound, is an Oracle Linux 7 running on the Oracle Cloud.

Install and start:

I install this version 2.1 in the same way I installed the 1.3 in the past. All is documented: https://docs.yugabyte.com/latest/quick-start/install/linux/


wget -O yugabyte-2.1.0.0-linux.tar.gz  https://downloads.yugabyte.com/yugabyte-2.1.0.0-linux.tar.gz
tar -xzvf yugabyte-2.1.0.0-linux.tar.gz
yugabyte-2.1.0.0/bin/post_install.sh
export PATH=~/yugabyte-2.1.0.0/bin:~/yugabyte-2.1.0.0/postgresql/bin:$PATH

I create a 3-node cluster with replication factor 3, with all nodes on the same host for this test:


yb-ctl --rf 3 create
yb-ctl status

[Screenshot: yb-ctl status output]

I create a database for this:


ysqlsh
 \timing on
 drop database if exists franck;
 create database franck;
 \q

[Screenshot]

pgbench initialization

I use pgbench that I have from a “normal” PostgreSQL installation on the same server.


[opc@db192 ~]$ type pgbench
pgbench is hashed (/usr/pgsql-11/bin/pgbench)
time pgbench --initialize --host localhost -p 5433 -U postgres franck

[Screenshot: pgbench initialization output]

If you compare with the previous post on version 1.3 you will see many differences.

  • No “ERROR: DROP multiple objects not supported yet” here. This issue has been fixed.
  • No “ERROR: VACUUM not supported yet” but just a warning, because there’s no vacuum in the YugaByteDB storage layer
  • And the best: 8 seconds instead of the 2 minutes we had before

However, the “ALTER TABLE” to add the constraints afterward is still not supported, so I run the same manually with the FOREIGN KEY declarations in the CREATE TABLE:


ysqlsh franck
 drop table if exists pgbench_history;
 drop table if exists pgbench_tellers;
 drop table if exists pgbench_accounts;
 drop table if exists pgbench_branches;
CREATE TABLE pgbench_branches (
    bid integer NOT NULL
   ,bbalance integer
   ,filler character(88)
   ,CONSTRAINT pgbench_branches_pkey PRIMARY KEY (bid)
 );
CREATE TABLE pgbench_accounts (
    aid integer NOT NULL
   ,bid integer references pgbench_branches
   ,abalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid)
 );
CREATE TABLE pgbench_tellers (
    tid integer NOT NULL
   ,bid integer references pgbench_branches
   ,tbalance integer
   ,filler character(84)
   ,CONSTRAINT pgbench_tellers_pkey PRIMARY KEY (tid)
 );
CREATE TABLE pgbench_history (
    tid integer references pgbench_tellers
   ,bid integer references pgbench_branches
   ,aid integer references pgbench_accounts
   ,delta integer
   ,mtime timestamp without time zone
   ,filler character(22)
 );
\q

Now it remains to insert the rows. This was very long (about an hour) in 1.3:


time pgbench --initialize --init-steps=g -h localhost -p 5433 -U postgres franck

[Screenshot: pgbench initialization output]

No foreign key error and 6 seconds only!

Do you remember that I had to switch to SERIALIZABLE isolation level? I don’t have to here:

[Screenshot]

This has been fixed with the support of SELECT locks, so no need to go to optimistic locking with SERIALIZABLE (which requires that the application implements a ‘retry’ logic).

Simple Update in 1 thread

Then, as I did before, I run a Simple Update workload from one session for 30 seconds:


pgbench --no-vacuum --builtin=simple-update --protocol=prepared --time 30 -h localhost -p 5433 -U postgres franck

[Screenshot: pgbench output]

When compared with the previous run on 1.3, I've updated 9x more transactions. Yes, that's exactly what has been announced for this version: huge performance improvements.

Simple Update in 10 threads

Then, as I did before, I run with 10 concurrent threads:


pgbench --no-vacuum --protocol=prepared --builtin=simple-update --time 30 --jobs=10 --client=10 -h localhost -p 5433 -U postgres franck

[Screenshot: pgbench output]

Again, that's 7x better than my test on 1.3, and still in read committed isolation level.

Note that I was lucky here, but it can happen that we get a serialization error even in read committed, like in this second run of the same:
[Screenshot: run with a serialization error]

The community implementation of pgbench has no retry logic. Once an error is encountered, the thread finishes, and that's unusable for a benchmark. The patch proposed many times was, unfortunately, always rejected: https://commitfest.postgresql.org/18/1645/.

But YugaByteDB has a solution for that, which I’ll show in the next post: https://blog.dbi-services.com/ysql_bench/

The article "YugaByteDB 2.1: the Open Source multi-region distributed database with PostgreSQL API is in GA with huge performance improvement" first appeared on Blog dbi services.

RAC on Azure- Link to Microsoft Tech Blog

Some folks stated they were having trouble finding this, so I'm posting a link here to the blog post I wrote on the Microsoft Tech Community Blog with my opinion about building out RAC on Azure.

 




Unreal Load Testing in the Cloud

Load testing as part of a cloud migration project is expected and  should be built into the requirements.  The goal is to set yourself up for success.

Log Latency

Recently I had a Cloud Solution Architect, (CSA) escalate an Oracle performance problem after migrating from on-prem to the cloud.

The information provided by the customer stated they were experiencing log sync latency in Oracle and that they hadn't experienced this previously on-prem. They said there weren't really any changes, just a simple lift and shift to Azure, so they were at a loss as to why the latency was occurring. I was provided an AWR report of the current environment, the user having taken manual snapshots for a 5-minute interval report. I was initially told they didn't have an AWR from the previous environment for comparison, which was challenging, but sometimes that's how it is.

Big Jump

Upon review, I quickly noted that the database was Oracle 19c.  This is the newest version of Oracle and when asked, they quickly admitted the database on-prem was an 11g RAC database and they had upgraded the database to 19c single instance for Azure.  They didn’t realize this would make a huge difference and hadn’t realized this was no longer a simple lift and shift.

Lift and Shift, Strike 1: Upgrading from a very old release to a very new release and moving from RAC to single instance. Yeah, that's a bit of a change.

This is something I would highly recommend you DON’T do.  When doing a lift and shift to the cloud, you want the database version to stay as close to the same as possible to eliminate added challenges to the migration.

If a major upgrade is required, do it on-prem first and then lift and shift to the cloud, addressing any upgrade problems beforehand.

From 11g to 19c there were significant changes just to archive logging, including LGWR slaves and other features that were assuredly going to make a difference, on top of hundreds of other changes.

Code Blues

The next challenge that struck me about the workload from the AWR run in the 19c database was the amount of waits from commits.

[Screenshot: AWR report excerpt]

[Screenshot: AWR report excerpt]

I had nothing to compare it to, but the code logic built into the DML I was observing, with the sheer quantity of single inserts and updates that had to be committed, was going to create log sync issues. But… the customer said the workload was the same as in their on-prem environment and they hadn't had the problem there.

[Screenshot: AWR report excerpt]

Now, I've highlighted that second count on executions for a reason. It will all make sense in a moment; remember, this is a 5-minute workload capture in the database, so the number of executions is what transpired in 5 minutes.

AWR reports are pretty awesome, but they don't have everything, and commonly you would take the report, find the red flags and then investigate deeper into the areas that cause you concern. After a review of the waits, and the customer assuring me that the log syncs were not an issue on-prem, I directed my attention to the redo logs and the disk they resided on, asking for the following (a sketch of the corresponding checks follows the list):

  • The sizes of the logs and whether they were located on a separate disk from the data.
  • Information about the disk the logs were residing on (type, IOPS, size, etc.)
  • Whether read caching was turned on for the redo log disk
  • Whether they had also added the write accelerator on the redo log disk
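
A minimal sketch of the kind of queries behind those checks, using the standard dynamic performance views (the views and columns are standard; the thresholds you judge them against are up to you):


SELECT l.group#, l.bytes/1024/1024 AS size_mb, l.members, lf.member
FROM   v$log l
JOIN   v$logfile lf ON lf.group# = l.group#;

SELECT event, total_waits, ROUND(time_waited_micro/1000) AS time_waited_ms
FROM   v$system_event
WHERE  event IN ('log file sync', 'log file parallel write');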

Lift and Shift, Strike 2: Didn’t follow guidelines for Oracle VMs on Azure around redo logs.

All of the recommendations either came back as yes or were able to be addressed, improving the overall performance of the 19c database, but the claim that they'd never experienced this type of log sync latency in the past, considering the code logic, really troubled me.

Why Comparison Data is Important

Even though they were satisfied with the results of the changes, someone finally located an older AWR report. Unlike the 5-minute report, the older report was for 45 minutes.

The first thing that it demonstrated is that log sync latency and commits were a problem before, just not as prevalent as they were in the 19c database:

[Screenshot: AWR report excerpts]

I knew that whatever I was looking at for DML in this older 11g report should have execution counts 9 times higher (45 minutes vs. 5), then multiplied by 4 for each of the RAC nodes it was scaled out to, compared to what was displayed in my 19c report.

Except… it didn't:

[Screenshot: AWR report excerpt]

I had the other RAC node reports and the executions were similar to the first node (just in case the workload was skewed across the nodes, I wanted to verify).

So… if I took the executions from each node and scaled them down to a 5-minute execution window for this main statement:

(22136 + 22714 + 23528 + 23363) / 9 =10193 executions in 5 min

versus 174,198 in the 19c report, which works out to 17 times the workload thrown at the entire original 11g RAC environment on-prem.

Lift and Shift, Strike 3:  The workload on the 19c environment was completely different than on-prem

When I brought this to the CSA and he asked the customer, we discovered that they were just doing "load testing" against the environment and then escalated when they experienced the latency. The problem is, their code logic was the real reason the system couldn't scale, not Oracle or Azure. We just ended up addressing it with the two latter platforms to work around the software problem.

The reason they hadn't experienced this on 11g on-prem is that they'd never run the workload at this level before, but we have to ask: what was the purpose of this type of load testing?

Set Load Testing Up for Success

What do I mean by this?  Some might think the job of load testing is to find out how much you can throw at the cloud environment you’ve built before it falls over.  The truth is, this isn’t very helpful in the long run:

  1.  It can be considerably time-consuming and provides very little valuable information about a globally distributed infrastructure, which contains numerous layers of redundancy and a resilient interconnected network.
  2. It can create fake issues to troubleshoot or escalate.
  3. It doesn't provide you with real data about how your on-prem environment will perform in the cloud.
  4. It doesn't prove that the cloud can't handle it; it just means you didn't add enough resources to handle the silly monstrosity you decided to run.

The goal of load testing should be to take a current snapshot of your workload, including peak requirements and run it against what you’ve architected to support your deployment.

Take the time to identify a real workload, or consider using Oracle Database Replay. It's been around as part of the Real Application Testing pack since 11g and is loved by many. It will allow you to take the SAME workload you run in your production system and duplicate everything in a new one, without scrambling around trying to figure out what really changed.
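
If you go the Database Replay route, a minimal sketch of starting a capture with DBMS_WORKLOAD_CAPTURE (the directory path, capture name and duration are illustrative):


CREATE OR REPLACE DIRECTORY capture_dir AS '/u01/app/oracle/capture';

BEGIN
  DBMS_WORKLOAD_CAPTURE.START_CAPTURE(
    name     => 'peak_workload',
    dir      => 'CAPTURE_DIR',
    duration => 3600);   -- capture one hour of production workload (in seconds)
END;
/

-- Or stop it manually before the duration expires
EXEC DBMS_WORKLOAD_CAPTURE.FINISH_CAPTURE;

The capture files can then be preprocessed and replayed against the new environment with DBMS_WORKLOAD_REPLAY, which is what gives you a real, apples-to-apples workload instead of a synthetic one.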

 


