Some weeks back I helped a customer reduce the time it takes to bulk-load data into MySQL. At the time they were using a MySQL dump file (containing SQL statements) to populate their tables during nightly jobs.
By using the LOAD DATA INFILE command and creating secondary indexes after the bulk load, load time went down by a factor of almost 2x.
My test environment:
DB: MySQL 5.6.23
OS: Ubuntu 14.04
HW: My Toshiba Portege laptop with 2 cores and an SSD disk
Commands/tables used in tests:
CREATE TABLE t1PK (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32)) ENGINE=InnoDB;

CREATE TABLE t1 (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32) UNIQUE KEY) ENGINE=InnoDB;

LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1 FIELDS TERMINATED BY ',';
I created a file named 1000000 that contains two columns and 1 million rows:
$ head -5 1000000
0,test0
1,test1
2,test2
3,test3
4,test4
.....
$ wc -l 1000000
1000000 1000000
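The post does not show how the file was generated; one simple way to create an equivalent file is a short shell pipeline (a sketch, assuming seq and awk are available):

$ seq 0 999999 | awk '{print $1 ",test" $1}' > 1000000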
The next step is to run some tests. If you do not have a BBWC (battery-backed write cache) on your disk system, you might consider setting innodb_flush_log_at_trx_commit to 2 during the bulk load, so the redo log is flushed to disk about once per second instead of at every commit.
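You can verify the current value from the MySQL shell before and after the load:

mysql> SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

Just remember to set it back to 1 afterwards if you need full durability; with a value of 2 you can lose up to about a second of transactions if the OS or host crashes.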
First let's create the full table and import the 1000000 file to get a baseline:
mysql> set global innodb_flush_log_at_trx_commit=2;
mysql> CREATE TABLE t1 (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32) UNIQUE KEY) ENGINE=InnoDB;
mysql> LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1 FIELDS TERMINATED BY ',';
mysql> set global innodb_flush_log_at_trx_commit=1;
Result: Loaded 1,000,000 rows in 4.3 seconds (average over 10 runs)
Next, let's load the file into a table with only the PK and then add the secondary index afterwards:
mysql> set global innodb_flush_log_at_trx_commit=2;
mysql> CREATE TABLE t1PK (i BIGINT UNSIGNED PRIMARY KEY, art VARCHAR(32)) ENGINE=InnoDB;
mysql> LOAD DATA INFILE '/home/ted/labb/load-data/1000000' INTO TABLE test.t1PK FIELDS TERMINATED BY ',';
mysql> ALTER TABLE t1PK ADD INDEX(art);
mysql> set global innodb_flush_log_at_trx_commit=1;
Result: Loaded 1,000,000 rows in 2.1 seconds (average over 10 runs), plus another 1.9 seconds to add the secondary index, for a total of 4.0 seconds on average. This improvement is due to InnoDB Fast Index Creation.
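In MySQL 5.6 you can also request the in-place algorithm explicitly; the statement below is a sketch of that, and MySQL will return an error instead of silently copying the table if in-place index creation is not possible:

mysql> ALTER TABLE t1PK ADD INDEX(art), ALGORITHM=INPLACE, LOCK=NONE;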
Conclusion:
In my environment it took almost 6 seconds to load a dump file of table t1 into MySQL (using innodb_flush_log_at_trx_commit=2). The exact numbers will depend on how many keys you have and many other factors, but as you can see, the LOAD DATA INFILE command is a fast way to load data into MySQL!
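For reference, the dump-file baseline can be reproduced along these lines (a sketch; the original post does not show the exact commands used):

$ mysqldump test t1 > t1-dump.sql
$ time mysql test < t1-dump.sql

mysqldump includes DROP TABLE and CREATE TABLE statements by default, so replaying the file rebuilds the table including its secondary index.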