Oakies Blog Aggregator

Determined on Determinism

I’m feeling very determined on this one. Yes, I have a lot of determination to inform blog readers about determinism, and yes, I have run out of words that sound like DETERMINISTIC. But one of the most common misconceptions I see for PL/SQL functions is that developers treat them as if they were “extending” the existing database kernel. By this I mean that developers often assume that wherever an existing built-in function (for example TO_NUMBER or SUBSTR) could be used, a PL/SQL function of their own creation will work in exactly the same way.

Often that will be the case, but the most common scenario I see tripping people up is using PL/SQL functions within SQL statements. Consider the following simple example, where a PL/SQL function uses the built-in SYSTIMESTAMP and TO_CHAR functions.


SQL> create or replace
  2  function f(i varchar2) return varchar2 is
  3  begin
  4    return i||'-'||to_char(systimestamp,'HH24MISS:FF');
  5    --dbms_lock.sleep(0.5);
  6  end;
  7  /

Function created.

Let us compare the output from the function when used within a SQL statement with the results from the same built-in functions called directly in the SQL statement.


SQL> select rownum, to_char(systimestamp,'HH24MISS:FF') x1, f(rownum) x2
  2  from   dual
  3  connect by level <= 9;

    ROWNUM X1                             X2
---------- ------------------------------ ------------------------------
         1 181557:351000                  1-181557:351000000
         2 181557:351000                  2-181557:361000000
         3 181557:351000                  3-181557:361000000
         4 181557:351000                  4-181557:364000000
         5 181557:351000                  5-181557:364000000
         6 181557:351000                  6-181557:366000000
         7 181557:351000                  7-181557:366000000
         8 181557:351000                  8-181557:372000000
         9 181557:351000                  9-181557:372000000

9 rows selected.

A direct call to SYSTIMESTAMP is fixed for the duration of the execution of a SQL statement, but this is NOT the case for the SYSTIMESTAMP call made within the PL/SQL function. The PL/SQL function is being called multiple times during the single execution of the SQL statement, and hence each execution is totally entitled to return a “fresh” result from SYSTIMESTAMP.

Moreover, the database makes no guarantees that a PL/SQL function will be called once per row encountered in a SQL statement, so if your PL/SQL function changes session state in some way (for example, a package variable) then you can never assume that there will be a 1-to-1 relationship between rows processed and PL/SQL function executions.
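
If you want to see this for yourself, a minimal sketch along the following lines (the package and function names are just illustrative) lets you count how many times your function is really invoked during a single execution of a SQL statement – depending on the plan, the count may or may not match the number of rows processed.


create or replace package counter_pkg as
  calls number := 0;
end;
/

create or replace
function counted_f(i varchar2) return varchar2 is
begin
  -- record every invocation in session state
  counter_pkg.calls := counter_pkg.calls + 1;
  return i;
end;
/

set serveroutput on
exec counter_pkg.calls := 0
select max(counted_f(rownum)) from dual connect by level <= 9;
exec dbms_output.put_line('function calls: '||counter_pkg.calls)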

The only way to be sure that you won’t get unexpected results from PL/SQL function calls within SQL is for those functions to be deterministic, and responsibility for that lies entirely with the developer, not with the database. So please don’t think that the solution is simply throwing in the DETERMINISTIC keyword. You need to inspect your code and ensure you won’t get spurious results from that PL/SQL function when it is used from SQL.
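
To be clear, adding the keyword itself is the easy part; the hard part is making sure the claim is true. Here is a minimal sketch of a function where the declaration is genuinely justified, because the result depends on nothing but the input:


create or replace
function normalise_code(p_code varchar2) return varchar2
  deterministic is
begin
  -- depends only on the input value, so the same input always
  -- yields the same output no matter when or how often it is called
  return upper(trim(p_code));
end;
/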

PostgresConf 2019 Summary

[Image: https://ardentperf.files.wordpress.com/2019/03/pgconf-2.jpg]

PostgresConf 2019 in New York City is a wrap! Now that I’ve had a few days to recover a little brain capacity, I think I’m ready to attempt a summary. I love conferences for the learning opportunities… inside of sessions and also outside of them. Being the largest PostgreSQL-centered conference in the United States, PostgresConf offers unique opportunities to:

  1. Watch seasoned technical presenters go deep on PostgreSQL internals.
  2. Hear serious users present the latest about how they are using PostgreSQL and what they’ve learned.
  3. Connect businesses who are doing redundant work so they can collaborate instead of each repeating the other’s work.

Last week I published a summary of the first two days of the conference which included the summits, tutorials and training sessions.

I tried to keep a few notes during the final three days of the conference as well so I could share a bit about my experience this year. But before I dive into the day-by-day summary… first, a few short high-level points.

Highlights

Who attends PostgresConf? During my session about Wait Events I asked two questions to scratch the surface a little bit. First, it seemed clear that the vast majority of session attendees had been working with relational databases for more than a decade, and a number for more than 25 years. Second, it seemed clear that the vast majority of session attendees had worked seriously with a relational database other than PostgreSQL as part of their job. Of course I expected these things would be true for some attendees, but it surprised me just how many – it seemed to me like almost everyone in the room.

[Image: https://ardentperf.files.wordpress.com/2019/03/pgconf-1-1.jpg]

What was great about PostgresConf? First and foremost, I give PostgresConf an “A-plus” … highest possible marks … on the two things I personally value the most from a user conference: technical content and user representation. I went to as many sessions as I could and I can personally vouch that there was quality material. And I met a bunch of top notch engineers from a wide variety of industries using PostgreSQL in serious ways.

Besides that, I can point out two other general things I thought PostgresConf did especially well:

  • Running a special track for regulated industries using PostgreSQL. They support this class of users not only in the topic selection, but even in the room setup – for example rules that prohibit recording.
  • Encouraging as many different people as possible to attend and participate. The conference code of conduct was taken seriously, with a third party to independently receive, review and investigate any reports. There was an excellent general session and a panel discussion that started conversations – which may not have otherwise happened – around workplace dynamics and common assumptions we make. (If I could hazard summarizing one takeaway: I still do things at work which unintentionally assume that my co-workers are similar to me in subtle ways where they might actually differ!) Finally, on Friday afternoon, a time and space was provided for people to have conversations about career trajectories and the job market. With the rapidly growing demand I see for PostgreSQL skills, we’re going to need everyone who can help! So let’s listen to (and become better advocates for) those in our industry who aren’t sure whether they can fully be part of it.

Wednesday March 20

Interesting conversations:

  • Two large, established software vendors whose software ships with embedded PostgreSQL as a default data store (one of these software packages supports a number of customer-managed DB backends but many customers use the default embedded PostgreSQL database).
  • A major university (over 150 years old) actively working to migrate critical backend business systems off commercial databases and onto open source. They have learned that you need to look at the stack as a whole; it proved infeasible to change the backend of applications that are tightly integrated with one specific database when both are sold by a single vendor. Changing the application itself is sometimes necessary.
  • Two FinTech companies: a credit/lending support company and a brokerage both already relying on PostgreSQL in their business.
  • Medium-sized non-profit (100ish people in IT), historically a commercial database shop, exploring open source databases and cloud-based architectures.
  • Two individual DBAs whose industries I didn’t catch. One was following up from Monday’s Hands-On Lab and the other was asking how to identify applications which might be good starting points for migrating to cloud-based PostgreSQL. We talked about migration success factors like application complexity and database procedural code.

I was able to attend three sessions:

  • The opening general session. It was fun to hear the PostgresConf organizers talk a bit about how the conference is continuing to grow (mark your calendars: they already announced PostgresConf 2020 will be March 23rd through 27th!) and right after that to hear Marc Linster fill in the back-story with the changes and growth that happened over the last ten years. One thing I remember Marc saying was that the questions people ask have changed: five years ago people were trying to understand PostgreSQL and today they are increasingly asking questions about integrating with their larger environment.
  • Dennis Tighe talked about PostgreSQL Security Best Practices from the perspective of Amazon RDS. I was very impressed with this talk for its breadth of security-related topics: external roles/principals/access, data encryption and key management, sharing data, network segmentation and wire encryption, database account/role management, password management, auditing… and all of these topics covered from the perspective of PostgreSQL. I told Dennis several times that talks like this should be available at more user conferences!
  • Denish Patel gave a session on SQL performance tips titled: why isn’t my query using an index? Denish gave an overview of join types and optimizations, introduced the PostgreSQL Genetic Query Optimizer, and then dove into topics including cardinality, partial indexes, histograms and optimizer parameters. I especially enjoyed his samples drawn from the public flight arrival on-time statistics for 2018.


Thursday March 21

Interesting conversations:

  • Small international software vendor in the healthcare industry. Their application supports a couple configurations for the database backend but most of their customers use the default embedded PostgreSQL database. Great conversation about troubleshooting a database-related challenge one of their customers is facing (which they are supporting). Also discussed procedural languages, taking block dumps and viewing raw data from PostgreSQL data files, and the future of javascript in the database with current industry investment in WebAssembly.
  • Individual DBA talking about their current PostgreSQL configuration: 400-500 databases consolidated into a single PostgreSQL instance, typically running around 800-1000 connections but needing to support around 3000ish connections. IIRC each database had its own app server connecting to it. All of this running on a single bare metal server with only 8GB memory! (BTW, while this ratio of memory to connection count is working for them, it wouldn’t work for every workload.)
  • DBAs from my own rather large employer! I mention this because it was a real treat to have DBAs from a few different teams around who could share their own stories and lessons learned as they have been running real production workloads on PostgreSQL and migrating new workloads from commercial databases onto PostgreSQL. I love running into these guys at conferences!

Sessions I attended:

[Image: https://ardentperf.files.wordpress.com/2019/03/pgconf-3.jpg]
  • Baron Schwartz said we were the very first people to find out that he is re-titling his talk: “Everything You Need To Know About PostgreSQL EXPLAIN.” This was a high-quality, accessible talk introducing the topic of execution plans and describing why they are important. After introducing them and peeling back the layers a bit, he then discussed what can go wrong and how to troubleshoot.
  • Grant McAlister gave a session titled HOT – UNDERSTANDING THIS IMPORTANT UPDATE OPTIMIZATION. I did point out to Grant that it’s the only session at PostgresConf which is shouting in all capital letters; he wasn’t quite sure why that happened. But I’m not complaining, because if you like database internals as much as I do then this might have been the best presentation at the conference. I once told someone that I think Grant is a combination of Julian Dyke’s ability to make genius slide animations with Jonathan Lewis’ ability to construct simple tests demonstrating complex database behaviors.
  • My session about Wait Events in PostgreSQL! Around lunch time I went to go hide somewhere and review the content. As I was reviewing, it became evident that I likely had too much content… and I ended up a bit stuck for the next few hours trying to work on that. Well… ask anyone who came to the session… I did not succeed in trimming up my content enough!! But the session seemed successful anyway based on the positive feedback from attendees.

Work:

Like many other attendees I wasn’t quite able to get rid of all my responsibilities for the week. In case you wonder how much hands-on work I do… well I ended up spending a couple hours Thursday evening on a rather fun project I’ve had this year which involved writing several hundred lines of SQL against a very large Postgres-based column-store analytical database. I get to use CTEs & window functions and leverage a deep understanding of the physical layout to write queries that run in minutes instead of hours/days. Important strategies for this project: making a single pass on source data, leveraging push-down filter processing as much as possible, avoiding unnecessary sorts, and minimizing the need to transfer data over the network between cluster nodes during different stages of query execution. (I love this stuff!)

Friday March 22

The conversations I most remember from Friday were around user group and community organizing. Conferences in South Africa and Spain, user groups in Chicago and Denver and Seattle, and a contact in Raleigh.

In addition to the general sessions on Friday, I made sure to attend Jim Nasby’s talk titled: All The Dirt On Vacuum (PG 11 Edition). There’s a version of this talk from 2015 on YouTube but it’s always worth catching Nasby live if you can, because he’s constantly updating and improving the talk. The question that I most often ask about vacuum: “how much longer until this finishes?” The PostgreSQL docs list the Vacuum Phases but they don’t mention that some of these phases might get repeated multiple times. Nasby’s talk brings it all together so that you can understand the whole process and the configuration settings that impact it.
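
On the “how much longer” question, one rough gauge – it only covers the heap-scan portion, not the index passes – is the progress view that arrived in PostgreSQL 9.6; a quick sketch:


select pid,
       datname,
       relid::regclass as table_name,
       phase,
       heap_blks_scanned,
       heap_blks_total,
       round(100.0 * heap_blks_scanned / nullif(heap_blks_total, 0), 1) as pct_heap_scanned
from   pg_stat_progress_vacuum;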

[Image: https://ardentperf.files.wordpress.com/2019/03/all-the-buttons.jpg]

Last but not least… last year I was one of the mentors helping high-school students all around the globe who made contributions to open source PostgreSQL through Google’s Code-In contest. One of the contributions these students made was to create some really cool sticker and pin designs that could be used to promote the open source project!

PostgresConf sponsored ordering a bunch of real stickers and pins with the designs created by these high school students. And we had some fun with this on the exhibition floor: each booth got one or two designs, and attendees who wanted to get all the sticker/pin designs would have to hunt around. In case you’re wondering whether anyone got them all… on Friday, I discovered at least one person who did: Ryan Lambert of RustProof Labs!

Growth Areas

So I’m fully aware (and JD recently reminded me just to make sure) that suggesting growth areas is tantamount to volunteering to implement them… but I’m going to take that risk anyway and throw out two ideas for next year, if the volunteer bandwidth exists to implement them.

Long running scheduler jobs

One of the nice things about the job scheduler in the Oracle database is the easily interpreted interval settings you can apply for job frequency. The days of cryptic strings like “sysdate+0.000694444”, when all you really wanted to say was “just run this job every minute”, are a thing of the past. I covered how to get the database to convert interval strings into real execution dates here.
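
As a quick refresher, if you just want to preview when a given calendar string will fire without creating a job, an anonymous block along these lines – a sketch using DBMS_SCHEDULER.EVALUATE_CALENDAR_STRING – will list the next few execution times.


set serveroutput on
declare
  l_next timestamp with time zone := systimestamp;
begin
  for i in 1 .. 5 loop
    dbms_scheduler.evaluate_calendar_string(
      calendar_string   => 'freq=minutely;bysecond=0;',
      start_date        => systimestamp,
      return_date_after => l_next,        -- walk forward from the previous result
      next_run_date     => l_next);
    dbms_output.put_line(to_char(l_next,'DD-MON-YYYY HH24:MI:SS'));
  end loop;
end;
/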

But it raises the question: What if I have a job that is scheduled to run every minute, but it takes more than 1 minute to run? Will the scheduler just crank out more and more concurrent executions of that job? Will I swamp my system with ever more background jobs? So I thought I’d find out with a simple test.

I created a table which will record the start and end time for executions of a procedure, and then crafted that procedure to always run for at least 2 minutes using dbms_lock.sleep. (If you are on 18c, you can replace this with dbms_session.sleep to avoid the need for an explicit grant.) Then I set this procedure to be run every minute via dbms_scheduler.


SQL> create table t ( tag varchar2(10), d date);

Table created.

SQL>
SQL> create or replace
  2  procedure P is
  3  begin
  4    insert into t values ('start',sysdate);
  5    commit;
  6    dbms_lock.sleep(120);
  7    insert into t values ('end',sysdate);
  8    commit;
  9  end;
 10  /

Procedure created.


SQL> begin
  2    dbms_scheduler.create_job (
  3      job_name        => 'JOB1',
  4      job_type        => 'PLSQL_BLOCK',
  5      job_action      => 'begin p; end;',
  6      start_date      => systimestamp,
  7      repeat_interval => 'freq=minutely;bysecond=0;',
  8      enabled         => true);
  9  end;
 10  /

PL/SQL procedure successfully completed.

I waited 20 minutes and then looked at both my table and the scheduler logs to see how many concurrent executions were recorded.


SQL> select * from t order by d, tag;

TAG        D
---------- -------------------
start      25/03/2019 21:36:00
end        25/03/2019 21:38:00
start      25/03/2019 21:38:00
end        25/03/2019 21:40:00
start      25/03/2019 21:40:00
end        25/03/2019 21:42:00
start      25/03/2019 21:42:01
end        25/03/2019 21:44:01
start      25/03/2019 21:44:01
end        25/03/2019 21:46:01
start      25/03/2019 21:46:01
end        25/03/2019 21:48:01
start      25/03/2019 21:48:01
end        25/03/2019 21:50:01
start      25/03/2019 21:50:01
end        25/03/2019 21:52:01
start      25/03/2019 21:52:01
end        25/03/2019 21:54:01

18 rows selected.

SQL> select log_date
  2  from   dba_scheduler_job_log
  3  where job_name = 'JOB1'
  4  order by log_date;

LOG_DATE
-----------------------------------------------
25-MAR-19 09.38.00.866000 PM +08:00
25-MAR-19 09.40.00.920000 PM +08:00
25-MAR-19 09.42.00.998000 PM +08:00
25-MAR-19 09.44.01.037000 PM +08:00
25-MAR-19 09.46.01.078000 PM +08:00
25-MAR-19 09.48.01.143000 PM +08:00
25-MAR-19 09.50.01.171000 PM +08:00
25-MAR-19 09.52.01.206000 PM +08:00
25-MAR-19 09.54.01.272000 PM +08:00

9 rows selected.

As you can see, the scheduler is not going to swamp your system. It will not run the “next” occurrence of your submitted job until the current execution has completed. So even though we requested an execution each minute, we are bound by the run time duration of the job itself. One additional nice thing is that once the job has finished, the scheduler immediately sees that the next execution is overdue and launches the next job straight away. So no need to worry about an excessive number of jobs all running together.
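
You can cross-check this from the scheduler’s own run history as well – assuming you have access to the DBA view, something like the following shows the start time and duration of each run, with no overlaps:


select actual_start_date, run_duration, status
from   dba_scheduler_job_run_details
where  job_name = 'JOB1'
order  by actual_start_date;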

Just as an aside, when you have a scheduler job that is “always” in a running state, then you need to take care when dropping the job because by default you cannot drop a running job. In such instances, you can always add the FORCE parameter to kill the current execution and remove the job from the scheduler.


SQL> exec dbms_scheduler.drop_job('JOB1')
BEGIN dbms_scheduler.drop_job('JOB1'); END;

*
ERROR at line 1:
ORA-27478: job "MCDONAC"."JOB1" is running
ORA-06512: at "SYS.DBMS_ISCHED", line 274
ORA-06512: at "SYS.DBMS_SCHEDULER", line 753
ORA-06512: at line 1


SQL> exec dbms_scheduler.drop_job('JOB1',force=>true)

PL/SQL procedure successfully completed.

 

Stats advisor

This is just a little shout-out about the Stats Advisor – if you decide to give it a go, what sort of things is it likely to tell you? The answer is in a dynamic performance view called v$stats_advisor_rules, which I’ve listed below from an instance running 18.3.0.0.


SQL> set linesize 180
SQL> set trimspool on
SQL> set pagesize 40
SQL> column description format a75
SQL> column name format a32
SQL> break on rule_type duplicate skip 1
SQL> select * from v$stats_advisor_rules;

  RULE_ID NAME                             RULE_TYPE DESCRIPTION                                                                     CON_ID
---------- -------------------------------- --------- --------------------------------------------------------------------------- ----------
         0                                  SYSTEM                                                                                         0
         1 UseAutoJob                       SYSTEM    Use Auto Job for Statistics Collection                                               0
         2 CompleteAutoJob                  SYSTEM    Auto Statistics Gather Job should complete successfully                              0
         3 MaintainStatsHistory             SYSTEM    Maintain Statistics History                                                          0
         4 UseConcurrent                    SYSTEM    Use Concurrent preference for Statistics Collection                                  0
         5 UseDefaultPreference             SYSTEM    Use Default Preference for Stats Collection                                          0
         6 TurnOnSQLPlanDirective           SYSTEM    SQL Plan Directives should not be disabled                                           0

         7 AvoidSetProcedures               OPERATION Avoid Set Statistics Procedures                                                      0
         8 UseDefaultParams                 OPERATION Use Default Parameters in Statistics Collection Procedures                           0
         9 UseGatherSchemaStats             OPERATION Use gather_schema_stats procedure                                                    0
        10 AvoidInefficientStatsOprSeq      OPERATION Avoid inefficient statistics operation sequences                                     0

        11 AvoidUnnecessaryStatsCollection  OBJECT    Avoid unnecessary statistics collection                                              0
        12 AvoidStaleStats                  OBJECT    Avoid objects with stale or no statistics                                            0
        13 GatherStatsAfterBulkDML          OBJECT    Do not gather statistics right before bulk DML                                       0
        14 LockVolatileTable                OBJECT    Statistics for objects with volatile data should be locked                           0
        15 UnlockNonVolatileTable           OBJECT    Statistics for objects with non-volatile should not be locked                        0
        16 MaintainStatsConsistency         OBJECT    Statistics of dependent objects should be consistent                                 0
        17 AvoidDropRecreate                OBJECT    Avoid drop and recreate object seqauences                                            0
        18 UseIncremental                   OBJECT    Statistics should be maintained incrementally when it is beneficial                  0
        19 NotUseIncremental                OBJECT    Statistics should not be maintained incrementally when it is not beneficial          0
        20 AvoidOutOfRange                  OBJECT    Avoid Out of Range Histogram endpoints                                               0
        21 UseAutoDegree                    OBJECT    Use Auto Degree for statistics collection                                            0
        22 UseDefaultObjectPreference       OBJECT    Use Default Object Preference for statistics collection                              0
        23 AvoidAnalyzeTable                OBJECT    Avoid using analyze table commands for statistics collection                         0

24 rows selected.

As you can see, the rules fall into three groups – system, operation, and object – and at all three levels you can’t help noticing how often the theme is simply: “just stick with the defaults!”.

As so often happens when I start writing a catch-up or “remind myself” note, I found that Tim Hall has already written all about it.
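
For reference, driving the advisor yourself is only a handful of calls – roughly along these lines (a sketch using the DBMS_STATS advisor task API; see Tim’s article for the full set of options):


set long 1000000 longchunksize 100000 pagesize 0
variable tname varchar2(128)

exec :tname := dbms_stats.create_advisor_task('MY_STATS_ADV_TASK')
exec :tname := dbms_stats.execute_advisor_task('MY_STATS_ADV_TASK')

select dbms_stats.report_advisor_task('MY_STATS_ADV_TASK') from dual;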

Intro: Initial Thoughts On Oracle Autonomous Database Cloud Services (Automatic For The People)

I’m currently writing up a few blog pieces on indexing in the Oracle Autonomous Database environments, but I thought I’d begin by introducing what exactly Oracle Autonomous Database Cloud Services are and some of my initial thoughts, as there’s still some confusion on all this. Introduced by Uncle Larry at Oracle OpenWorld 2017, Oracle Autonomous […]

Lost time

Here’s a little puzzle that came up in the ODC database forum yesterday – I’ve got a query that has been captured by SQL Monitor, and it’s taking much longer to run than it should but the monitoring report isn’t telling me what I need to know about the time.

Here’s a little model to demonstrate the problem – I’m going to join a table to itself (the self join isn’t a necessary feature of the demonstration, I’ve just been a bit lazy in preparing data). Here’s a (completely truthful) description of the table:

SQL> desc t1
 Name                          Null?    Type
 ----------------------------- -------- --------------------
 ID                                     NUMBER
 MOD_1000                               NUMBER
 V1                                     VARCHAR2(40)
 V2                                     VARCHAR2(40)
 PADDING                                VARCHAR2(100)

SQL> select num_rows, blocks from user_tables where table_name = 'T1';

  NUM_ROWS     BLOCKS
---------- ----------
    400000       7798

1 row selected.


And here’s the text version of the output I get from dbms_monitor.report_sql_monitor() for the query running on 18.3.0.0


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 Session             :  TEST_USER (261:36685)
 SQL ID              :  g6j671u7zc9mn
 SQL Execution ID    :  16777218
 Execution Started   :  03/21/2019 08:54:56
 First Refresh Time  :  03/21/2019 08:54:56
 Last Refresh Time   :  03/21/2019 08:55:17
 Duration            :  21s
 Module/Action       :  MyModule/MyAction
 Service             :  SYS$USERS
 Program             :  sqlplus@linux183.localdomain (TNS V1-V3)
 Fetch Calls         :  2

Global Stats
==========================================================================
| Elapsed |   Cpu   |    IO    | PL/SQL  | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Time(s) | Calls |  Gets  | Reqs | Bytes |
==========================================================================
|      21 |      21 |     0.04 |    0.07 |     2 |     3M |  215 | 180MB |
==========================================================================

SQL Plan Monitoring Details (Plan Hash Value=83896840)
==================================================================================================================================================
| Id |       Operation       | Name |  Rows   | Cost |   Time    | Start  | Execs |   Rows   | Read | Read  |  Mem  | Activity | Activity Detail |
|    |                       |      | (Estim) |      | Active(s) | Active |       | (Actual) | Reqs | Bytes | (Max) |   (%)    |   (# samples)   |
==================================================================================================================================================
|  0 | SELECT STATEMENT      |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  1 |   FILTER              |      |         |      |        21 |     +1 |     1 |        1 |      |       |     . |          |                 |
|  2 |    HASH JOIN OUTER    |      |     400 | 2014 |        21 |     +1 |     1 |      400 |      |       |   2MB |          |                 |
|  3 |     TABLE ACCESS FULL | T1   |     400 | 1005 |         1 |     +1 |     1 |      400 |   62 |  60MB |     . |          |                 |
|  4 |     TABLE ACCESS FULL | T1   |    400K | 1001 |        21 |     +1 |     1 |     400K |      |       |     . |          |                 |
==================================================================================================================================================

As you can see, the total elapsed time is 21 seconds, of which the CPU time accounts for the whole 21 seconds.

This seems a little large for a simple hash join, so we should look at the “Activity Detail” section of the plan because that will report any ASH samples that can be found for this execution of this query. And that’s where the problem lies: despite using 21 CPU seconds in 21 seconds there are no ASH samples for the execution! Moreover – looking for other numerical oddities – the plan says we read 60MB in 62 read requests (that’s the first tablescan of the 7,798 blocks of t1), but the summary says we read 180MB – where did we lose (or gain) 120MB?

It might help to see the query (which is why I didn’t show it in the output above) and it might help to see the predicate section (which SQL Monitor doesn’t report). So here’s the query and its plan, with the resulting predicate section, pulled from memory:


SQL_ID  g6j671u7zc9mn, child number 0
-------------------------------------
select  /*+ monitor */  t1a.mod_1000, t1b.mod_1000 from  t1 t1a,  t1
t1b where  t1a.mod_1000 = 500 and t1b.id(+) = t1a.id and
nvl(t1b.mod_1000,0) + f2(t1a.id) + f3(t1a.id) > 0

Plan hash value: 83896840

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |  2014 (100)|          |
|*  1 |  FILTER             |      |       |       |            |          |
|*  2 |   HASH JOIN OUTER   |      |   400 |  7200 |  2014   (4)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   400 |  3600 |  1005   (3)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   |   400K|  3515K|  1001   (3)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(NVL("T1B"."MOD_1000",0)+"F2"("T1A"."ID")+"F3"("T1A"."ID")>0)
   2 - access("T1B"."ID"="T1A"."ID")
   3 - filter("T1A"."MOD_1000"=500)
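
(For reference, a plan with its predicate section can be pulled from the cursor cache with a call like the following sketch, assuming the SQL_ID and child number shown above:)

select  *
from    table(dbms_xplan.display_cursor('g6j671u7zc9mn', 0));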


Notice the two function calls that are part of the FILTER operation. That’s where the time is going – but I don’t think you can infer that from the SQL Monitor report (which rather suggests that the problem might be with the second full tablescan of t1 at operation 4).

Functions f2() and f3() both do a tablescan of another table that is as large as t1 – and it looks as if they’re both going to be called 400 times, which is a lot of work. The 120MB of “lost” reads comes from the first tablescan of each of the two tables; the 3M buffer gets (in the summary, if you didn’t notice it) come from the repeated tablescans as each row comes out of the hash join and the filter is applied.

Why doesn’t the report show us the CPU (and disk) samples? Because it’s querying ASH (v$active_session_history) by the SQL_ID of the principal SQL statement – and the work is being done by two other statements with different SQL_IDs.
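
(You can see where the time really went by grouping the session’s ASH samples for the run by SQL_ID instead of filtering on the monitored statement; a minimal sketch, where the :sid bind and the five-minute window are assumptions you would adjust for your own test:)

select  sql_id, count(*)
from    v$active_session_history
where   session_id  = :sid                      -- assumed: the session running the test
and     sample_time > sysdate - 5/1440          -- assumed: the last five minutes
group by
        sql_id
order by
        count(*) desc
;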

Since it’s PL/SQL functions doing all the work why isn’t the 21 CPU seconds showing up in the “PL/SQL time(s)” summary figure? Because the time isn’t being spent in PL/SQL, it’s being spent in the SQL being run by the PL/SQL.

Just to finish off, let’s try to find the SQL being run by the PL/SQL. I’ll post the script to recreate the example at the end of the posting but for the moment I’ll just show you the query against v$sql that I ran to extract the SQL that’s embedded in the functions:


select  sql_id, executions, disk_reads, buffer_gets, sql_text
from    V$sql 
where   sql_text like 'SELECT%T2%' 
or      sql_text like 'SELECT%T3%'
;

SQL_ID        EXECUTIONS DISK_READS BUFFER_GETS SQL_TEXT
------------- ---------- ---------- ----------- --------------------------------------------------
12ytf1rry45d9        400       7683     3072817 SELECT MAX(ID) FROM T2 WHERE MOD_1000 = :B1
85dmsgqg3bh4w          1       7680        7698 SELECT MAX(ID) FROM T3 WHERE MOD_1000 = :B1

As you can see, there are roughly 120MB of disk I/O and 3M buffer gets due to these two statements – and one of them has run the 400 times we expected. It looks as if Oracle has done a cute little optimisation with the other function, though. If you look at the predicate it says:

  • NVL(“T1B”.”MOD_1000″,0)+”F2″(“T1A”.”ID”)+”F3″(“T1A”.”ID”)>0

It looks as if Oracle has either cached the result of the f3() function call (in which case why didn’t it also cache the f2() result) or it’s stopped evaluating the predicate as soon as the running total exceeded zero (but that might be a problem since f3() could return a negative number!). I don’t know exactly why Oracle is doing what it’s doing – but if I reverse the order of the functions in the predicate, the f3() query will run 400 times and the f2() query will run once.

Update

Following the comment below from “anonymous”, pointing out that the “Active” SQL Monitor report (and, of course, the equivalent OEM screen) has a section showing the impact of everything the session has been doing while the query ran, I re-ran my call to dbms_sqltune.report_sql_monitor() with the “type” parameter set to ‘ACTIVE’ rather than ‘TEXT’. After spooling this to a text file (set linesize 255, set pagesize 0) with the suffix “html” and trimming the top and bottom off so that the file started and ended with the opening and closing html tags, I opened it in Firefox.
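
(A minimal sketch of the kind of script I mean, assuming the SQL_ID from earlier; the spool file name is arbitrary:)

set linesize 255
set pagesize 0
set long 1000000
set longchunksize 1000000
set trimspool on

spool active_monitor.html

select
        dbms_sqltune.report_sql_monitor(
                sql_id => 'g6j671u7zc9mn',
                type   => 'ACTIVE'
        )
from    dual
;

spool off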

You have to have Adobe Flash Player for this to work, and you have to be connected to the Internet as the file contains content that will call out to Oracle’s home domain. After getting Adobe Flash to work, here’s a snapshot of the resulting display:

[Image: active_monitor.jpg – snapshot of the Active SQL Monitor display]

The print may be a little small, but in the lower half of the screen (which usually shows the plan with execution statistics) I’ve selected the “Activity” tab, and this has produced a graphic display of all the samples captured for the session while the query was running. I took this report after re-running the test case, but I had swapped the order of the two functions in the interim, so this output is showing that (from Oracle’s perspective) all the work done during the execution of the query was done by a statement with SQL_ID 85dmsgqg3bh4w.

Appendix

If you want to re-run some tests here’s the code I used to create the demo:


create table t1
nologging
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum-1,1000)              mod_1000,
        lpad(rownum,10,'0')             v1,
        lpad(rownum,10,'0')             v2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 4e5 -- > comment to avoid WordPress format issue
;

-- alter table t1 add constraint t1_pk primary key(id);

create table t2 nologging as select * from t1;
create table t3 nologging as select * from t1;

create function f2 (i_in number) return number
as
        m_ret number;
begin
        select max(id)
        into    m_ret
        from    t2
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/

create function f3 (i_in number) return number
as
        m_ret number;
begin
        select  max(id)
        into    m_ret
        from    t3
        where   mod_1000 = i_in
        ;

        return m_ret;
end;
/


begin
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T1',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T2',
                method_opt  => 'for all columns size 1'
        );
        dbms_stats.gather_table_stats(
                ownname     => null,
                tabname     => 'T3',
                method_opt  => 'for all columns size 1'
        );
end;
/

alter system flush buffer_cache;

variable b1 number
variable b2 number

set timing on

select
        /*+ monitor */
        t1a.mod_1000, t1b.mod_1000
from
        t1      t1a,
        t1      t1b
where
        t1a.mod_1000 = 500
and     t1b.id(+) = t1a.id
and     nvl(t1b.mod_1000,0) + f3(t1a.id) + f2(t1a.id) > 0
;


One interesting little detail, if you try the tests, is that the join has to be an outer join for the FILTER operation to appear.

Announcement: “Oracle Performance Diagnostics and Tuning” Webinar – 9-12 July 2019 !!

I’m very excited to announce that the first public running of my new “Oracle Performance Diagnostics and Tuning” webinar will run between 9-12 July 2019 (6pm-10pm AEST). This is a must-attend seminar aimed at Oracle professionals (both DBAs and Developers) who are interested in Performance Tuning. […]

Oracle vs. SQL Server Architecture

There are a lot of DBAs who are expected to manage both Oracle and MSSQL environments. This is only going to become more common as database platform variations continue to multiply with the introduction of the cloud. A database is a database in our management’s world, and we’re expected to understand it all.

It’s not an easy topic, but I’m going to post on it, taking it step by step, and hopefully the diagrams will help. It’s also not an apples-to-apples comparison, but by starting at the base and working my way in with the closest feature comparisons I’m able to make, I hope it will make sense for those out there who need to understand it.

We have a number of customers that are migrating Oracle to Azure, and many love Oracle and want to keep their Oracle database as is, just bringing their licenses over to the cloud. The important point is that those databases may end up with Azure/SQL DBAs managing them, so I’m here to help.

To begin, let’s start with a diagram that I *believe* gives the best basic (and pretty high-level) comparison between the two database platforms:

[Image: oracle_sql.jpg – Oracle vs. SQL Server architecture diagram]
Oracle and SQL Server

Instance vs. Database

The first thing you’ll notice is that what Oracle refers to as an INSTANCE is different from what SQL Server calls one.

Oracle’s instance is most closely related to what SQL Server calls a database (although the SQL Server database also includes the files that Oracle puts in its own definition of a database), and the Oracle home is *roughly* SQL Server’s version of an instance.

Please also note that the Oracle architecture in the diagram isn’t 12c+ multi-tenant, isn’t RAC (Real Application Clusters), and doesn’t have an ASM (Automatic Storage Management) secondary instance managing the database files. All of this we’ll take on in further posts, but for today we’ll stick to the generic, traditional architecture and the simplest installation.

Background Processes

The second major difference is in the background processing. Where Oracle has an Oracle executable, and SQL Server has one for its database engine too, Oracle also has numerous background processes running for each database. These processes each perform a job or back up another process as part of a secondary responsibility. SQL Server uses a multi-threaded architecture, and although I can quickly assess which threads are performing which responsibilities from inside the database, I’ve been unable to assess this from the server-level tools, whereas in Oracle each process is clearly named at the OS level, so you can tell what it does.
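
If you want to see the background processes for yourself, a minimal sketch of a query against v$bgprocess (the view lists every possible background process; the paddr filter keeps only the ones actually running):

select  name, description
from    v$bgprocess
where   paddr != hextoraw('00')        -- only processes that are currently running
order by
        name
;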

Two of the major background processes are PMON (Process Monitor) and SMON (System Monitor, though you will hear some people refer to it as the Session Monitor). If either of these processes dies or is killed, the database instance (i.e. the running processes and accessibility) will terminate.

Other important background processes, like RECO (Recovery), DBWR (Database Writer), LGWR (Log Writer), QMNC (Queueing Coordinator) and CKPT (Checkpoint), address important and obvious responsibilities in a relational database. Some of these individual processes can be killed and the database will restart the process and recover without the instance failing.

Performance Data Collection

There are also two background processes, MMON (Manageability Monitor) and MMNL (Manageability Monitor Lite), along with dedicated memory allocations, that handle the Automatic Workload Repository (AWR) and Active Session History (ASH). For those in the MSSQL world, these are like the Query Store and Dynamic Management View (DMV) performance and session data. One of the important design features of the memory buffer isolated for this work is that it is written in one direction while users read in the other, eliminating much, if any, locking. There are both the V$ view data and a set of aggregated snapshots (AWR) and samples (ASH, also written to the snapshots) persisted to the AWR repository in the SYSAUX tablespace.
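
A minimal sketch of the kind of queries you would use to look at this data, one against the in-memory ASH samples and one against the persisted AWR snapshots (the ten-minute window is just an example):

-- recent in-memory ASH samples
select  sample_time, session_id, sql_id, event
from    v$active_session_history
where   sample_time > sysdate - 10/1440
order by
        sample_time;

-- AWR snapshots persisted to the SYSAUX tablespace
select  snap_id, begin_interval_time, end_interval_time
from    dba_hist_snapshot
order by
        snap_id;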

Tablespace = Filegroups

There’s that funny word, too: “tablespace”. A tablespace isn’t much different from a “filegroup” in MSSQL. It’s just a way of grouping logical objects into a logical space, backed by one or more datafiles.
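
A minimal sketch of how you would see the mapping between tablespaces and the datafiles behind them:

select  tablespace_name, file_name, bytes/1024/1024 as size_mb
from    dba_data_files
order by
        tablespace_name, file_name;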

With the introduction of multi-tenant, Oracle’s emphasis is shifting from schema-centric data to separate tenants, i.e. pluggable databases. This means that the architecture is becoming more similar to SQL/Azure DB, along with other multi-tenant databases, and with that comes simpler datafile and tablespace design. Having all the data in one tablespace isn’t viewed as negatively as it once was, and it simplifies the management of data, databases and development.

Control Files

Control files are the God controls for the database. They are binary files that contain everything from the transactional status of the database and the undo and redo sequence information to the physical status of the datafiles. Without a control file, a DBA is in a world of hurt trying to recover an Oracle database. Keeping mirrored copies of the control file is one of the first things a DBA learns as part of DBA 101.
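
To check how many control file copies you have and where they live, a minimal sketch:

select  name, status
from    v$controlfile;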

Redo Logs

These are another important aspect of Oracle database architecture. Each database contains an UNDO tablespace that tracks the undo needed if something is rolled back. All of this undo, along with any redo, is written to the redo logs. These are created in groups, just like transaction logs, so that one can handle the ongoing workload while a second is being archived and another is available. The busier the database, the larger and more numerous the redo logs. They are also mirrored in case of corruption or loss, as they are necessary in a recovery to undo and redo the database back to an accessible state.

The redo logs are written to archive logs at regular intervals to ensure the database can be recovered from backup in point-in-time (PIT) recovery situations.
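
A minimal sketch of how to see the redo log groups, their sizes and their archive status:

select  group#, thread#, sequence#, bytes/1024/1024 as size_mb,
        members, archived, status
from    v$log
order by
        group#;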

SP and Password Files

The SP (server parameter) file is binary, but it was once a text file called simply the pfile, so you may hear that term as well from those still using one. Similar to SQL Server’s integrated SP_CONFIGURE, it is used to set the parameters for the database configuration, including version, naming, etc. The file is included in backups and can easily be copied out to the text pfile version.

These parameters can be updated from the database management console (UI) or the command line. A change can be made in memory only (where it is lost at the next restart), written only to the SP file (where a database cycle is required before it takes effect), or, for many parameters, made with scope=both, which takes effect immediately and is also written to the file.
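
A minimal sketch of what those changes look like from the command line; the parameter names, values and the pfile path are just examples, not recommendations:

-- dynamic parameter: takes effect immediately and is written to the spfile
alter system set pga_aggregate_target = 2G scope=both;

-- static parameter: can only be written to the spfile, needs a restart to take effect
alter system set processes = 600 scope=spfile;

-- the binary spfile can always be dumped out to a readable pfile
create pfile = '/tmp/init_copy.ora' from spfile;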

The password file holds the password configuration for administrative management of the database, including remote connectivity for management. A shared option means that it can be shared among databases, eliminating extra management, and it controls which accounts have SYSDBA access to the database.
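
To see which accounts are currently in the password file, a minimal sketch:

select  username, sysdba, sysoper
from    v$pwfile_users;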

PGA- What is it?

The PGA (Program Global Area) is an allocation of memory used for sorting, hashing and PL/SQL tables, among other things. As Oracle doesn’t have a TEMP database to perform these tasks, and 99% of Oracle tables are heap organized rather than clustered, having this memory is essential for good performance. The memory is outside of the SGA (System Global Area) and outside of the configuration for SGA memory. This is important when a DBA is sizing out a machine: there are distinct limits per process, process type and workload on PGA allocation, no matter how much PGA is set in the SP file.

Why is the sizing of the PGA important? If you don’t have enough PGA allocated, or if the SQL is written poorly, a process won’t be able to do its work in memory and will spill to the TEMP tablespace. Unlike a TEMP database, a tablespace is a disk allocation, and disk, unless it’s SSD, is very slow.
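
For a quick feel for PGA usage, a minimal sketch against v$pgastat; the statistic names listed are the ones I would expect to find, so adjust if your version reports them differently:

select  name, value, unit
from    v$pgastat
where   name in (
                'aggregate PGA target parameter',
                'total PGA allocated',
                'total PGA inuse',
                'over allocation count'
        );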

I’m hoping this was a good introduction to how Oracle is similar to, and different from, SQL Server. I’ll try to continue with this topic and dig in deeper as we go along, and hopefully I didn’t melt anyone’s brain.




Copyright © DBAKevlar [Oracle vs. SQL Server Architecture], All Right Reserved. 2019.