This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!
I spent some time last month getting up to speed on MySQL. One of the nice perks of working at Pythian is the ability to study during the workday. They could have easily said “You are an Oracle DBA, you don’t need to know MySQL. We have enough REAL MySQL experts”, but they didn’t, and I appreciate.
So how does an Oracle DBA goes about learning MySQL?
Obviously you start by reading the docs. Specifically, I looked for the MySQL equivalent of the famous Oracle “Concepts Guide”.
Unfortunately, it doesn’t exist. I couldn’t find any similar overview of the architecture and the ideas behind the database. The first chapter of “High Performance MySQL” had a high level architecture review, which was useful but being just one chapter in a book, it lacked many of the details I wanted to learn. Peter Zaitsev’s “InnoDB Architecture” presentation had the kind of information I needed – but covered just InnoDB.
Thats really too bad because I definitely feel the lack – which I can easily tell you what Oracle does when you connect to a database, run a select, an update, commit or rollback – I can’t say the same about MySQL. So far I managed without this knowledge, but I have a constant worry that this will come back and bite me later.
Lacking a concepts guide, I read the documentation I had access to: Sheeri has nice presentations available for Pythian employees (and probably customers too. I’m not sure if she ever released them to the whole world). The official documentation is not bad either – it covers syntax without obvious errors and serves as a decent “how do I do X?” guide.
But reading docs is only half the battle. The easier half too. So I installed MySQL 5.1 on my Ubuntu from ready packages. Then I installed MySQL 5.5 from the tarball – which was not nearly as much fun, but by the time this worked I know much more about where everything is located and the various ways one can mis-configure MySQL.
Once the installation was successfull, I played a bit with users, schemas and databases. MySQL is weird – Schemas are called databases, users have many-to-many relation with databases. If a user logs in from a differnet IP, it is almost like a different user. If you delete all the data files and restart MySQL – it will create new empty data files instead. You can easily start a new MySQL server on the same physical box by modifying one file and creating few directories.
MySQL docs make a very big deal about storage engines. There are only 2 things that are important to rememeber though: MyISAM is non-transactional and is used for mysql schema (the data dictionary), it doesn’t have foreign keys or row level locks. InnoDB is transactional, has row level locks and is used everywhere else.
There are a confusing bunch of tools for backing up MySQL. MySQLDump is the MySQL equivalent of Export. Except that it creates a file full of the SQL commands required to recreate the database. These files can grow huge very fast, but it is very easy to restore from them, restore any parts of the schema or even modifying the data or schema before restoring.
XTRABackup is a tool for consistent backups of InnoDB schema (remember that in MyISAM there are no transactions so consistent backups is rather meaningless). It is easy to use – one command to backup, two commands to restore. You can do PITR of sorts with it, and you can restore specific data files. It doesn’t try to manage the backup policies for you the way RMAN does – so cleaning old backups is your responsibility.
Replication is considered a basic skill, not an advanced skill like in the Oracle world. Indeed once you know how to restore from a backup, setting up replication is trivial. It took me about 2 hours to configure my first replication in MySQL. I think in Oracle Streams it took me few days, and that was on top of years of other Oracle experience.
Having access to experienced colleagues who are happy to spend time teaching a newbie is priceless. I already mentioned Sheeri’s docs. Chris Schneider volunteered around 2 hours of his time to introduce me to various important configuration parameters, innoDB secrets and replication tips and tricks. Raj Thukral helped me by providing step by step installation and replication guidance and helping debug my work. I’m so happy to work with such awesome folks.
To my shock and horror, at that point I felt like I was done. I learned almost everything important there was to know about MySQL. It took a month. As an Oracle DBA, after two years I still felt like a complete newbie, and even today there are many areas I wish I had better expertise. I’m sure it is partially because I don’t know how much I don’t know, but MySQL really is a rather simple DB – there is less to tweak, less to configure, fewer components, less tools to learn.
Jonathan Lewis once said that he was lucky to learn Oracle with version 6, because back then it was still relatively simple to learn, but the concepts didn’t change much since so what he learned back then is still relevant today. Maybe in 10 years I’ll be saying the same about MySQL.
There are only three join mechanisms used by Oracle: merge join, hash join and nested loop join. Here’s an important thought: all three join methods are nested loop joins with different startup cost. Discuss. (I’ll be saying more about this in about a week’s time.) And while I’m asking questions: what’s the perfect tense of [...]![]()
前回の続きでKingstonのお買い得SSD4本で構築したRAID-0のTPC-Hベンチマークを行った。
初めは、1セッション、パラレル度=6でテスト:
そのときのディスク転送量は:
300-400MB/s程度しか出ていない。
CrystalDiskMarkで計った限界量は
1GBのReadで最高522MB/sを出しているのに、およそ120MB足りない。
そこで、4セッションにして、もう一度実行してみた:
限界転送量が出た。
522MB/sはすごい。でも、1セッションだと、パラレル度を増やしても、そこまで行かなかった。
Indexレンジスキャンなどが動かないように:
optimizer_index_cost_adj=1000;
と設定している。
それにTemporary sortも起こさないようにPGA_AGREGATE_TARGETも十分にとっている。
そしてCPUは合計で30%-70%程度しか消費していない。
そして、限界転送量を出し続けているにもかかわらず、qphは減速を繰り返し、最後は計測時間のタイムオーバーとなった:
始めはqph性能を上げるのではなくdirect path readで522MB/sの限界転送量を出すことにこだわってみたい。
| waits | event | micro_sec | sql |
| 7 | direct path read | 101175 | select c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l_quantity) from customer, orders, lineitem where o_orderkey in ( select l_orderkey from lineitem group by l_orderkey having sum(l_quantity) > 313) and c_custkey = o_custkey and o_orderkey = l_orderkey group by c_name, c_custkey , o_orderkey , o_orderdate , o_totalprice order by o_totalprice desc, o_orderdate |
| 4 | direct path read | 106464 | select nation, o_year, sum(amount) as sum_profit from ( select n_name as nation, extract(year from o_orderdate) as o_year , l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity as amount from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%navy%') profit group by nation, o_year order by nation, o_year desc |
| 4 | direct path read | 111444 | select l_shipmode, sum(case when o_orderpriority = '1-URGENT' or o_orderpriority = '2-HIGH' then 1 else 0 end) as high_line_count , sum(case when o_orderpriority <> '1-URGENT' and o_orderpriority <> '2-HIGH' then 1 else 0 end) as low_line_count from orders, lineitem where o_orderkey = l_orderkey and l_shipmode in ('SHIP', 'FOB') and l_commitdate < l_receiptdate and l_shipdate < l_commitdate and l_receiptdate >= date '1997-01-01' and l_receiptdate < date '1997-01-01' + interval '1' year group by l_shipmode order by l_shipmode |
| 4 | direct path read | 112964 | select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus ='F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate) and s_nationkey= n_nationkey and n_name = 'CHINA' group by s_name order by numwait desc, s_name |
| 9 | direct path read | 127263 | select s_name, count(*) as numwait from supplier, lineitem l1, orders, nation where s_suppkey = l1.l_suppkey and o_orderkey = l1.l_orderkey and o_orderstatus = 'F' and l1.l_receiptdate > l1.l_commitdate and exists ( select * from lineitem l2 where l2.l_orderkey = l1.l_orderkey and l2.l_suppkey <> l1.l_suppkey) and not exists ( select * from lineitem l3 where l3.l_orderkey = l1.l_orderkey and l3.l_suppkey <> l1.l_suppkey and l3.l_receiptdate > l3.l_commitdate) and s_nationkey = n_nationkey and n_name = 'EGYPT' group by s_name order by numwait desc, s_name |
| 5 | direct path read | 161197 | select supp_nation, cust_nation, l_year, sum(volume) as revenue from ( select n1.n_name as supp_nation, n2.n_name as cust_nation , extract(year from l_shipdate) as l_year, l_extendedprice * (1 - l_discount) as volume from supplier, lineitem, orders, customer, nation n1, nation n2 where s_suppkey = l_suppkey and o_orderkey = l_orderkey and c_custkey = o_custkey and s_nationkey = n1.n_nationkey and c_nationkey = n2.n_nationkey and ( (n1.n_name = 'JAPAN' and n2.n_name = 'ETHIOPIA') or (n1.n_name = 'ETHIOPIA' and n2.n_name = 'JAPAN')) and l_shipdate between date '1995-01-01' and date '1996-12-31') shipping group by supp_nation, cust_nation, l_year order by supp_nation, cust_nation, l_year |
| 22 | direct path read | 169514 | select s_suppkey, s_name, s_address, s_phone, total_revenue from supplier, revenue0 where s_suppkey = supplier_no and total_revenue = ( select max(total_revenue) from revenue0) order by s_suppkey |
SQL> select count(*),event,sum(TIME_WAITED) TIME_WAITED,SQL_TEXT
from v$active_session_history a, v$sql b
where sample_time > sysdate -1/144
and event is not null
and a.sql_id is not null
and a.sql_id=b.sql_id
group by event, SQL_TEXT
order by TIME_WAITED;
When two tables are equi-partitioned on their join keys, the query optimizer is able to take advantage of partition-wise joins. To make sure that the tables are equi-partitioned, as of Oracle Database 11g reference partitioning can be used. In fact, per definition, with reference partitioning all “related” tables have exactly the same partitioning schema. If [...]
TPC-Hの環境をKingstonのお買い得SSD4本で構築したRAID-0に構築した。
以前のCrystalDiskMarkの結果は以下のとおりだった(SATA直結のRAID-0、exFAT 512K):
比較対象としたのは、Western Digital社のWD6000HLHX。SATAIII 6Gb/秒で接続可能な10,000 RPMハードドライブ
CrystalDiskMarkの結果は以下のとおりだった:
150万件のORDERSを外部表として使う:
SQL> desc orders
名前 NULL? 型
----------------------------------------- -------- ----------------------------
O_ORDERDATE DATE
O_ORDERKEY NOT NULL NUMBER
O_CUSTKEY NOT NULL NUMBER
O_ORDERPRIORITY CHAR(15)
O_SHIPPRIORITY NUMBER
O_CLERK CHAR(15)
O_ORDERSTATUS CHAR(1)
O_TOTALPRICE NUMBER
O_COMMENT VARCHAR2(79)
SQL> select count(*) from orders;
COUNT(*)
----------
1500000
Parallel度は4でテストを行った:
SQL> alter session force parallel query parallel 4;
テストSQL:
SQL> select * from orders where O_COMMENT like '%abcde%';
| 内部表 | 外部表 | |
| SSD | 0.82秒 | 2.15秒 |
| HDD | 3.08秒 | 2.74秒 |
テストは、外部表を作成後マシンのリブートを行い、すべてのキャッシュをクリア後の1回目の結果を取った。
結果Full Scanに於いて、
SSDの内部表、通常のテーブル、が圧倒的に速い。
HDD上では内部も外部も速度は変わらない。
SSDの外部表がHDDの内部表よりも速い。
150万件のORDERテーブルでもdirect path readでScanすると「いずれにせよ高速」だということが分かる。
外部表作成SQLとテストSQLの実行計画:
SQL> CREATE TABLE orders_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY UNLOADER_DATA
6 LOCATION ( 'orders_xt.dmp' )
7 )
8 AS
9 SELECT *
10 FROM orders;
内部表
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 75000 | 8203K| 568 (2)| 00:00:07| | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM)| :TQ10000 | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | PCWC | |
|* 4 | TABLE ACCESS FULL| ORDERS | 75000 | 8203K| 568 (2)| 00:00:07| Q1,00 | PCWP | |
--------------------------------------------------------------------------------------------------------------
外部表
------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 20 | 2780 | 3 (0)| 00:00:01 | | | |
| 1 | PX COORDINATOR | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10000 | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | P->S | QC (RAND) |
| 3 | PX BLOCK ITERATOR | | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | PCWC | |
|* 4 | EXTERNAL TABLE ACCESS FULL| ORDERS_XT | 20 | 2780 | 3 (0)| 00:00:01 | Q1,00 | PCWP | |
------------------------------------------------------------------------------------------------------------------------
最後に、
TPC-HのORDERSを外部表とするのはリアリティがある。
CSVで定期的に注文情報が更新されそのままDWHに反映されるシステムはスマートだ。
Back on June 17th WordPress 3.0 “Thelonious” was released and it offered up a handful of new features. Just a few days ago (July 29th) the 3.0.1 release went GA so I decided it was time to investigate what the new 3.0 ready themes had to offer. After looking through a handful of themes I decided to give the Magazine Basic theme a try for now. It offered a 1024 pixel wide layout and threaded comments; two of the features I was really looking for. Feel free to share your comments: good, bad or otherwise. Thanks! Here is a capture of the previous version just in case you don’t recall what it looked like (click for full size).
This post originally appeared over at Pythian. There are also some very smart comments over there that you shouldn’t miss, go take a look!
This should have been the easiest task on my todo list: Install Oracle 10.2.0.3 EE standalone on a new Linux RHEL 5 server, later to be used as a standby for a production RAC system. This means 2 lines of “runinstall -silent …”, less than 5 minutes of DBA work and maybe 20 minutes of waiting. I did not expect to spend over 5 hours doing this.
Problems started when I discovered that I don’t have the 10.2.0.3 patchset and another patch that exists on production and should be installed on the standby. I had to wait for my Metalink credentials to be approved for this customer CSI before I could download the patches for them.
“Why don’t you just clone the software from production?” asked a helpful colleague.
Sounds like a great suggestion. I cloned Oracle software before and it is a simple process: tar $ORACLE_HOME, copy the tar file to the new server, untar, run the cloning script which will register the new home with the inventory, and you are done!
In theory, at least.
Here is what actually happened:
What I should have done: (I’m not sure if it is supported by Oracle, but at least it works)
I hope that I’m not the only DBA who always have to find the most difficult way to accomplish a task, and that this post will be useful to others. Perhaps the best piece of advice I can offer is to avoid this type of cloning in the first place.
I finally finished my first Exadata performance troubleshooting article.
This explains one bug I did hit when stress testing an Exadata v2 box, which caused smart scan to go very slow – and how I troubleshooted it:
Thanks to my secret startup company I’ve been way too busy to write anything serious lately, but apparently staying up until 6am helped this time! :-) Anyway, maybe next weekend I can repeat this and write Part 2 in the Exadata troubleshooting series ;-)
Enjoy! Comments are welcome to this blog entry as I haven’t figured out a good way to enable comments in the google sites page I’m using…
1.書き込みキャッシュ
Write Cacheがパフォーマンスに及ぼす影響は今回のテストで分かった。
また、揮発性のWrite Cacheだけの書き込みがブロック障害の原因になる可能性もあるということも気になる。
しかし、それを言っているときりがない。HDD自体にもWrite Cacheはあるのだから
今回のディスク WD6000HLHX
32 MB キャッシュ 600 GB,SATA 6 Gb/s,10,000 RPM
同タイプでRAID用「高ストライピング前提」「少Write Cacheタイプ」のモデルもある
WD3000BLFS 16 MB キャッシュ 300 GB,SATA 3 Gb/s,10,000 RPM
→だから容量が少ない。だから安全性優先でWrite Cacheが少ない。
2.読み込みキャッシュ
ハイブリッド型RAIDカードタイプ
Adaotec社HPより
アダプテックMaxIQ SSD キャッシュ - 8倍のIO/秒
アダプテックMaxIQ SSDキャッシュソフトウェアは、最高のリードパフォーマンスを発揮するために、読み出し頻度の高い(「ホット」)データを見分けて、このデータをSSDキャッシュ内にコピーする特許出願中の「ラーンドパス」アルゴリズムを導入しています。
SSD in HDDタイプ シーゲイト Momentus® XT ドライブ
シーゲートHPより
Adaptive Memory テクノロジは、頻繁に使用されるアプリケーションとデータ・ファイルを知的にモニタし、それらをドライブのソリッド・ステート部分に配置して、素早く呼び出しできるようにします。
Flash PCI Express Cardタイプ(以前紹介したioXtream)
「第4回 Oracle ExadataによるDWH高速化」より
ほとんどのデータを、低コストなストレージに格納しておき、利用頻度の高いデータのみを透過的にFlashメモリに移動します。このキャッシング・アルゴリズムは、現時点では公開されておりませんが、Read効率を求めないREDOログ・ファイルなどに対しては、データのキャッシングは行いません。キャッシュすべきオブジェクトをアプリケーションから明示的に設定することも可能です。
select count(*) ...????
最後に、
2TBのHDDは通常64MBの大きなWrite Cacheが装備されている。
そのため、停電によるブロック障害の可能性が高くなる。
だからSun Flash Cardの中央に白いバッテリーが見える。
そしてExadata(SATA版)では168玉X64MB=1GBのWrite Cacheが書き込み性能を支えている。
DB Optimizer XE product demo from Kyle Hailey on Vimeo.
Recent comments
21 weeks 1 day ago
31 weeks 5 hours ago
32 weeks 5 days ago
35 weeks 6 days ago
38 weeks 1 day ago
47 weeks 5 days ago
49 weeks 2 days ago
50 weeks 2 days ago
50 weeks 3 days ago
1 year 1 week ago