Python+Peewee ORM+SQLiteで1億レコード最速insertチャレンジ

イワシの大群が特に大規模になったとき、それをサーディンランと呼び、個体数は数千万とも数億とも数十億ともいわれるのだそうです。そのような生物量がそれほど密集したとき酸素濃度は足りるんだろうかと心配です。

さて、データ処理の一環で億オーダーのレコード数(ディスク上で~100GB)をもつSQLiteテーブルを構築しようということになり、データ自体は生CSVがある状態でこれをなるべく短時間でDBに流し込むという雑なチャレンジをしてみたので、雑な記録をまとめておきました。

できるだけPythonで閉じさせたかったため、C++などで書くという選択肢はなし。
またDBサイズがサイズなのでインメモリではなくファイルに吐き出します。
またスキーマ定義をさくっとやりたい・DB構築後の扱いを楽にしたいということで、PythonベースのORM Peeweeを使用することにしています。なおPeeweeについて詳細は触れないですが、(ユーザビリティ的な意味で)とても軽量でありまたPythonicであるので個人的にはとても好きです。selectのwhereの比較演算子を文字列で渡すとかしなくていいのです。

DB関連は全くの素人なので、もっといい方法があったらぜひアドバイスをいただきたいです。

完全なコードをGist上のipynbにまとめています。

Continue reading

Core i7 7700Kとi7 9700KのUnixBench

あけましておめでとうございます。毎年が魚年です。

手元のデスクトップPCのベンチマークをUnixBenchでとってみたのでメモ。

項目 自宅新PC 業務PC 自宅旧PC
CPU i7 9700K
8C8T 3.6GHz TB4.9GHz
i7 7700K
4C8T 4.2GHz TB4.5GHz
i5 3570
4C4T 3.4GHz TB3.8GHz
メモリ DDR4-2666 8GBx2 (16GB) DDR4-2400 16GBx2 (32GB) DDR3-1600 8GBx2 (16GB)
SSD WD Black 500GB High-Performance NVMe Crucial MX300 525GB N/A (Live USB)
(本来はCrucial M4 256GB)
OS Ubuntu MATE 18.04.1 Ubuntu MATE 18.04.1 Ubuntu MATE 18.04.1 (Liveモード)
GCC 7.3.0 7.3.0 7.3.0

6年戦士の自宅旧PCがLive USBなのは、処分前でディスクをすでに抜いてしまっため。File Copy関連は参考にならない(低く出ているはず)です。

いずれもCPU scaling governerをperformanceにしています。

% sudo cpufreq-set -g performance

なおIntel CPU搭載のUbuntuであればperformanceとpowersaveモードが利用可能ですが、powersaveにしてもスコアは誤差レベルでしか変わりませんでした。

i7 9700K

X Benchmarks (Version 5.1.3)

   System: sillaginoid: GNU/Linux
   OS: GNU/Linux -- 4.15.0-43-generic -- #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018
   Machine: x86_64 (x86_64)
   Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
   CPU 0: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 1: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 2: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 3: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 4: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 5: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 6: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   CPU 7: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz (7200.0 bogomips)
          Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
   00:01:57 up 2 min,  4 users,  load average: 0.14, 0.05, 0.01; runlevel 2018-12-31

 ------------------------------------------------------------------------
Benchmark Run: 火  1月 01 2019 00:01:57 - 00:32:27
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       62027018.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3567.3 MWIPS (25.1 s, 7 samples)
Execl Throughput                               5934.1 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1738508.8 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          457482.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       4629351.3 KBps  (30.0 s, 2 samples)
Pipe Throughput                             2933013.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 282824.7 lps   (10.0 s, 7 samples)
Process Creation                               2678.1 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                   3386.2 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5832.1 lpm   (60.0 s, 2 samples)
System Call Overhead                        4218840.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   62027018.9   5315.1
Double-Precision Whetstone                       55.0       3567.3    648.6
Execl Throughput                                 43.0       5934.1   1380.0
File Copy 1024 bufsize 2000 maxblocks          3960.0    1738508.8   4390.2
File Copy 256 bufsize 500 maxblocks            1655.0     457482.9   2764.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    4629351.3   7981.6
Pipe Throughput                               12440.0    2933013.1   2357.7
Pipe-based Context Switching                   4000.0     282824.7    707.1
Process Creation                                126.0       2678.1    212.5
Shell Scripts (1 concurrent)                     42.4       3386.2    798.6
Shell Scripts (8 concurrent)                      6.0       5832.1   9720.1
System Call Overhead                          15000.0    4218840.7   2812.6
                                                                   ========
System Benchmarks Index Score                                        1977.0

 ------------------------------------------------------------------------
Benchmark Run: 火  1月 01 2019 00:32:27 - 01:02:03
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables      360870045.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    27462.9 MWIPS (19.2 s, 7 samples)
Execl Throughput                              30488.8 lps   (29.7 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1423574.4 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          368401.6 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       4716377.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                            17216147.8 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                2686191.6 lps   (10.0 s, 7 samples)
Process Creation                              84876.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  59599.6 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   9173.1 lpm   (60.0 s, 2 samples)
System Call Overhead                        7779244.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  360870045.0  30922.9
Double-Precision Whetstone                       55.0      27462.9   4993.3
Execl Throughput                                 43.0      30488.8   7090.4
File Copy 1024 bufsize 2000 maxblocks          3960.0    1423574.4   3594.9
File Copy 256 bufsize 500 maxblocks            1655.0     368401.6   2226.0
File Copy 4096 bufsize 8000 maxblocks          5800.0    4716377.0   8131.7
Pipe Throughput                               12440.0   17216147.8  13839.3
Pipe-based Context Switching                   4000.0    2686191.6   6715.5
Process Creation                                126.0      84876.2   6736.2
Shell Scripts (1 concurrent)                     42.4      59599.6  14056.5
Shell Scripts (8 concurrent)                      6.0       9173.1  15288.5
System Call Overhead                          15000.0    7779244.7   5186.2
                                                                   ========
System Benchmarks Index Score                                        7787.0

アイドル時は800MHzまでクロックが落ちるのでいいですね。
CPUファンをちょっと良いやつにしてみたので、CPU温度もアイドル時30度・UnixBench時や動画並列エンコードなど最大負荷をかけても50度にはいかない程度にとどまるので、とても安心感あります。空冷でもちゃんとしたやつならハイエンドCPUに対しても十分みたいですね。

この構成で1070Tiを載せてパーツ代全部で税込み17万円ぐらい。BTOだったら20万を下回ることはない構成(2018年12月現在)なので自作だとやはり結構抑えられますね。PC組むの久々なのでまる1日仕事にはなりましたが。

i7 7700K

Benchmark Run: 火  1月 01 2019 03:48:55 - 04:19:10
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       56530085.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3475.1 MWIPS (23.2 s, 7 samples)
Execl Throughput                               4584.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1132522.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          293491.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2832499.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1581458.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 248860.2 lps   (10.0 s, 7 samples)
Process Creation                              18400.0 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  15613.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5388.5 lpm   (60.0 s, 2 samples)
System Call Overhead                        1167379.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   56530085.0   4844.1
Double-Precision Whetstone                       55.0       3475.1    631.8
Execl Throughput                                 43.0       4584.7   1066.2
File Copy 1024 bufsize 2000 maxblocks          3960.0    1132522.6   2859.9
File Copy 256 bufsize 500 maxblocks            1655.0     293491.9   1773.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2832499.6   4883.6
Pipe Throughput                               12440.0    1581458.1   1271.3
Pipe-based Context Switching                   4000.0     248860.2    622.2
Process Creation                                126.0      18400.0   1460.3
Shell Scripts (1 concurrent)                     42.4      15613.3   3682.4
Shell Scripts (8 concurrent)                      6.0       5388.5   8980.8
System Call Overhead                          15000.0    1167379.7    778.3
                                                                   ========
System Benchmarks Index Score                                        1913.0

 ------------------------------------------------------------------------
Benchmark Run: 火  1月 01 2019 03:48:55 - 04:19:10
8 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables       56530085.0 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                     3475.1 MWIPS (23.2 s, 7 samples)
Execl Throughput                               4584.7 lps   (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1132522.6 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          293491.9 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       2832499.6 KBps  (30.0 s, 2 samples)
Pipe Throughput                             1581458.1 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                 248860.2 lps   (10.0 s, 7 samples)
Process Creation                              18400.0 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  15613.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   5388.5 lpm   (60.0 s, 2 samples)
System Call Overhead                        1167379.7 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0   56530085.0   4844.1
Double-Precision Whetstone                       55.0       3475.1    631.8
Execl Throughput                                 43.0       4584.7   1066.2
File Copy 1024 bufsize 2000 maxblocks          3960.0    1132522.6   2859.9
File Copy 256 bufsize 500 maxblocks            1655.0     293491.9   1773.4
File Copy 4096 bufsize 8000 maxblocks          5800.0    2832499.6   4883.6
Pipe Throughput                               12440.0    1581458.1   1271.3
Pipe-based Context Switching                   4000.0     248860.2    622.2
Process Creation                                126.0      18400.0   1460.3
Shell Scripts (1 concurrent)                     42.4      15613.3   3682.4
Shell Scripts (8 concurrent)                      6.0       5388.5   8980.8
System Call Overhead                          15000.0    1167379.7    778.3
                                                                   ========
System Benchmarks Index Score                                        1913.0

 ------------------------------------------------------------------------
Benchmark Run: 火  1月 01 2019 04:19:10 - 04:50:33
8 CPUs in system; running 8 parallel copies of tests

Dhrystone 2 using register variables      325161813.9 lps   (10.0 s, 7 samples)
Double-Precision Whetstone                    19474.7 MWIPS (28.5 s, 7 samples)
Execl Throughput                              24605.3 lps   (29.9 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks       1695174.5 KBps  (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks          445061.6 KBps  (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks       4865765.0 KBps  (30.0 s, 2 samples)
Pipe Throughput                             9554265.8 lps   (10.0 s, 7 samples)
Pipe-based Context Switching                1938914.1 lps   (10.0 s, 7 samples)
Process Creation                              86469.2 lps   (30.0 s, 2 samples)
Shell Scripts (1 concurrent)                  51215.3 lpm   (60.0 s, 2 samples)
Shell Scripts (8 concurrent)                   7675.0 lpm   (60.0 s, 2 samples)
System Call Overhead                        7493113.4 lps   (10.0 s, 7 samples)

System Benchmarks Index Values               BASELINE       RESULT    INDEX
Dhrystone 2 using register variables         116700.0  325161813.9  27863.1
Double-Precision Whetstone                       55.0      19474.7   3540.9
Execl Throughput                                 43.0      24605.3   5722.2
File Copy 1024 bufsize 2000 maxblocks          3960.0    1695174.5   4280.7
File Copy 256 bufsize 500 maxblocks            1655.0     445061.6   2689.2
File Copy 4096 bufsize 8000 maxblocks          5800.0    4865765.0   8389.2
Pipe Throughput                               12440.0    9554265.8   7680.3
Pipe-based Context Switching                   4000.0    1938914.1   4847.3
Process Creation                                126.0      86469.2   6862.6
Shell Scripts (1 concurrent)                     42.4      51215.3  12079.1
Shell Scripts (8 concurrent)                      6.0       7675.0  12791.7
System Call Overhead                          15000.0    7493113.4   4995.4
                                                                   ========
System Benchmarks Index Score                                        6854.2

i5 3570

BYTE UNIX Benchmarks (Version 5.1.3)</pre>
System: ubuntu-mate: GNU/Linux
OS: GNU/Linux -- 4.15.0-29-generic -- #31-Ubuntu SMP Tue Jul 17 15:39:52 UTC 2018
Machine: x86_64 (x86_64)
Language: en_US.utf8 (charmap="UTF-8", collate="UTF-8")
CPU 0: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (6799.9 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 1: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (6799.9 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 2: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (6799.9 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
CPU 3: Intel(R) Core(TM) i5-3570 CPU @ 3.40GHz (6799.9 bogomips)
Hyper-Threading, x86-64, MMX, Physical Address Ext, SYSENTER/SYSEXIT, SYSCALL/SYSRET, Intel virtualization
18:56:39 up 2:06, 1 user, load average: 0.00, 0.00, 0.09; runlevel 2018-12-31

 ------------------------------------------------------------------------
Benchmark Run: Mon Dec 31 2018 18:56:39 - 19:25:38
4 CPUs in system; running 1 parallel copy of tests

Dhrystone 2 using register variables 46001947.1 lps (10.0 s, 7 samples)
Double-Precision Whetstone 3571.7 MWIPS (15.9 s, 7 samples)
Execl Throughput 5504.8 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 1080708.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 282868.5 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 2941169.9 KBps (30.0 s, 2 samples)
Pipe Throughput 1295184.1 lps (10.0 s, 7 samples)
Pipe-based Context Switching 209435.3 lps (10.0 s, 7 samples)
Process Creation 13663.2 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 9563.6 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 3438.4 lpm (60.0 s, 2 samples)
System Call Overhead 936350.5 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 46001947.1 3941.9
Double-Precision Whetstone 55.0 3571.7 649.4
Execl Throughput 43.0 5504.8 1280.2
File Copy 1024 bufsize 2000 maxblocks 3960.0 1080708.2 2729.1
File Copy 256 bufsize 500 maxblocks 1655.0 282868.5 1709.2
File Copy 4096 bufsize 8000 maxblocks 5800.0 2941169.9 5071.0
Pipe Throughput 12440.0 1295184.1 1041.1
Pipe-based Context Switching 4000.0 209435.3 523.6
Process Creation 126.0 13663.2 1084.4
Shell Scripts (1 concurrent) 42.4 9563.6 2255.6
Shell Scripts (8 concurrent) 6.0 3438.4 5730.6
System Call Overhead 15000.0 936350.5 624.2
========
System Benchmarks Index Score 1636.8

 ------------------------------------------------------------------------
Benchmark Run: Mon Dec 31 2018 19:25:38 - 19:54:23
4 CPUs in system; running 4 parallel copies of tests

Dhrystone 2 using register variables 172882513.2 lps (10.0 s, 7 samples)
Double-Precision Whetstone 14921.4 MWIPS (14.4 s, 7 samples)
Execl Throughput 19555.6 lps (30.0 s, 2 samples)
File Copy 1024 bufsize 2000 maxblocks 3048166.0 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks 849643.4 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks 7705026.7 KBps (30.0 s, 2 samples)
Pipe Throughput 4863898.8 lps (10.0 s, 7 samples)
Pipe-based Context Switching 1002740.5 lps (10.0 s, 7 samples)
Process Creation 47665.8 lps (30.0 s, 2 samples)
Shell Scripts (1 concurrent) 28147.8 lpm (60.0 s, 2 samples)
Shell Scripts (8 concurrent) 4407.1 lpm (60.0 s, 2 samples)
System Call Overhead 3330777.8 lps (10.0 s, 7 samples)

System Benchmarks Index Values BASELINE RESULT INDEX
Dhrystone 2 using register variables 116700.0 172882513.2 14814.3
Double-Precision Whetstone 55.0 14921.4 2713.0
Execl Throughput 43.0 19555.6 4547.8
File Copy 1024 bufsize 2000 maxblocks 3960.0 3048166.0 7697.4
File Copy 256 bufsize 500 maxblocks 1655.0 849643.4 5133.8
File Copy 4096 bufsize 8000 maxblocks 5800.0 7705026.7 13284.5
Pipe Throughput 12440.0 4863898.8 3909.9
Pipe-based Context Switching 4000.0 1002740.5 2506.9
Process Creation 126.0 47665.8 3783.0
Shell Scripts (1 concurrent) 42.4 28147.8 6638.6
Shell Scripts (8 concurrent) 6.0 4407.1 7345.2
System Call Overhead 15000.0 3330777.8 2220.5
========
System Benchmarks Index Score 5193.7

所感

i5 3570→i7 9700Kであっても倍精度演算(Whetstone MWIPS)が実はコア当たりだと全然速くなってないらしいのが意外。SIMDでなければ1FLOP/1clockに漸近していくからかな?
OS関連、プロセスとかシステムコールとかが圧倒的に速くなってるみたい。日頃の動作で差を感じることは多そう。