The purpose of this post if to show how you can automate the creation and pruning of partitioned tables. If you want to read about partitioning I recommend reading our manual.
In short partitioning makes it possible to spread your individual tables across a file system according to the partition rules you specify.
Reasons for partition your tables might be:
- Insert performance, smaller index trees for stable insert throughput.
- Select performance, only read data from selected (aka partitioning pruning) partitions.
- Delete performance, drop partitioning is must quicker than doing range deletes of old data.
Partitioning definitions is part for the CREATE/ALTER table statements, in the samples below we will use RANGE partitions on a DATE column in our table. This design would be appropriate for log/transaction tables where you want to remove old data.
Lets first create a simple table called logs:
CREATE TABLE logs (
id INT AUTO_INCREMENT NOT NULL,
message VARCHAR(500) NOT NULL,
code TINYINT UNSIGNED NOT NULL,
created DATETIME DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id,created)
);
Next step we want to partition this table using the column created, we will use range partitoning and partition data per day like:
ALTER TABLE logs PARTITION BY RANGE COLUMNS(created) (
PARTITION p20200217 VALUES LESS THAN ('2020-02-17'),
PARTITION p20200218 VALUES LESS THAN ('2020-02-18'),
PARTITION p20200219 VALUES LESS THAN ('2020-02-19'),
PARTITION p20200220 VALUES LESS THAN ('2020-02-20'),
PARTITION p20200221 VALUES LESS THAN ('2020-02-21')
);
After the initial setup of our partitioned table it's now time to think about how to make sure we always have new partitions to insert data to and also remove old partitions. You can of course do this as a manual task using ALTER TABLE statements but that will become a tedious and error prune task over time.
To solve this problem we can use events and stored procedures in MySQL, we will also have a helper table to store some meta data of the tables we want to automate partitioning for.
Lets first create a table that will contain some information about the tables we want to automate partition handling for.
CREATE TABLE IF NOT EXISTS `MyPartTables` (
`schema_name` VARCHAR(100) NOT NULL,
`table_name` VARCHAR(100) NOT NULL,
`no_of_future_partitions` INT NOT NULL,
`max_partitions` INT NOT NULL,
PRIMARY KEY (`schema_name`,`table_name`)
);
By looking at column names its quite clear what they mean, you simply insert one row per table you want to automate partition handling for. For our example table above this would look something like:
INSERT INTO MyPartTables VALUES ('ted','logs',2,20);
So, for out logs table in schema ted this means, create 2 partitions in the future and keep a maximum of 20 partitions before starting to remove old partitions.
Next we need to have some stored procedures to handle the create of new partitions and removal of old partitions. Both of these stored procedures will use INFORMATION_SCHEMA.PARTITIONS to look at current state and then add/remove any needed partitions as specified by information in MyPartTables for the specific table.
First lets have a look at AddPartitions, this procedure will fetch all rows from MyPartTables and then add any needed partitions. It's today hard coded to work only with RANGE partitioning using days for tables that look like our test table logs above.
CREATE PROCEDURE `AddPartitions`()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE maxp INT DEFAULT 0;
DECLARE pdate INT DEFAULT 0;
DECLARE pname VARCHAR(20);
DECLARE v_table_name VARCHAR(100);
DECLARE v_schema_name VARCHAR(100);
DECLARE v_no_future_part INT;
DECLARE list CURSOR FOR SELECT `table_name`,`schema_name`,`no_of_future_partitions` FROM MyPartTables;
DECLARE CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;
OPEN list;
tloop: LOOP
IF done THEN
CLOSE list;
LEAVE tloop;
END IF;
FETCH list INTO v_table_name,v_schema_name,v_no_future_part;
-- DEBUG SELECT v_table_name,v_schema_name,v_no_future_part;
SET pdate=TO_DAYS(DATE(NOW() + INTERVAL (v_no_future_part) DAY));
SELECT TO_DAYS(TRIM(BOTH "'" FROM MAX(PARTITION_DESCRIPTION))) INTO maxp
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME=v_table_name
AND TABLE_SCHEMA=v_schema_name
AND PARTITION_DESCRIPTION!="MAXVALUE";
-- DEBUG' SELECT pdate, maxp;
WHILE pdate > maxp DO
SET maxp = maxp + 1;
SET pname = CONCAT('p', DATE_FORMAT(FROM_DAYS(maxp), '%Y%m%d'));
SET @qry = CONCAT('ALTER TABLE ',v_schema_name,'.',v_table_name, ' ADD PARTITION (PARTITION ',pname,' VALUES LESS THAN ("',DATE_FOR
MAT(FROM_DAYS(maxp),'%Y-%m-%d'),'"))');
-- DEBUG SELECT @qry;
PREPARE stmt FROM @qry;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END WHILE;
END LOOP;
END ;;
Next lets look at the the procedure for removing old partitions, this is structured in the same way as the AddPartitions procedure above but instead of adding partitions it removes any partitions above the limit given in MyPartTables.
CREATE PROCEDURE RemovePartitions ()
BEGIN
DECLARE done BOOL DEFAULT FALSE;
DECLARE v_table_name VARCHAR(100);
DECLARE v_schema_name VARCHAR(100);
DECLARE v_max_partitions INT;
DECLARE v_no_of_partitions_to_remove INT;
DECLARE cur CURSOR FOR SELECT `table_name`,`schema_name`,`max_partitions` FROM MyPartTables;
DECLARE CONTINUE HANDLER FOR
SQLSTATE '02000'
SET done = TRUE;
OPEN cur;
tloop: LOOP
FETCH cur INTO v_table_name,v_schema_name,v_max_partitions;
IF done THEN
CLOSE cur;
LEAVE tloop;
END IF;
-- DEBUG SELECT v_table_name,v_schema_name,v_max_partitions;
SET @qry = CONCAT('SELECT COUNT(DISTINCT(PARTITION_DESCRIPTION)) INTO @v_no_of_partitions ',
'FROM INFORMATION_SCHEMA.PARTITIONS ',
'WHERE TABLE_NAME="',v_table_name, '" ',
'AND TABLE_SCHEMA = "',v_schema_name, '" ');
-- DEBUG SELECT @qry;
PREPARE stmt FROM @qry;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
-- DEBUG SELECT @v_no_of_partitions, v_max_partitions;
IF @v_no_of_partitions > v_max_partitions THEN
SET v_no_of_partitions_to_remove = @v_no_of_partitions - v_max_partitions;
SET @qry = CONCAT('SELECT GROUP_CONCAT(DISTINCT(PARTITION_NAME)) INTO @v_list_of_partitions_to_remove FROM ( ',
'SELECT PARTITION_NAME ',
'FROM INFORMATION_SCHEMA.PARTITIONS ',
'WHERE TABLE_NAME = "',v_table_name, '" ',
'AND TABLE_SCHEMA = "',v_schema_name, '" ',
'AND PARTITION_DESCRIPTION!="MAXVALUE" ',
'ORDER BY PARTITION_DESCRIPTION LIMIT ',v_no_of_partitions_to_remove,
' ) as slabb');
-- DEBUG SELECT @qry;
-- DEBUG SELECT @v_partitions_to_remove;
PREPARE stmt FROM @qry;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
SET @qry = CONCAT('ALTER TABLE ',v_table_name, ' DROP PARTITION ', @v_list_of_partitions_to_remove);
-- DEBUG SELECT @qry;
PREPARE stmt FROM @qry;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END IF;
END LOOP;
END ;;
Now we only need to create two events that will run our add and remove partition procedures above, as we are dealing with daily partitions its enough to run the procedures once per day like:
CREATE EVENT AddPartitions ON SCHEDULE EVERY 1 DAY
DO
BEGIN
CALL AddPartitions();
END ;;
CREATE EVENT RemovePartitions ON SCHEDULE EVERY 1 DAY
DO
BEGIN
CALL RemovePartitions();
END ;;
If you want to try this out there is complete running example here.
This is just an example on how you can automate the procedure of handling your partitioned tables in MySQL, the code in the stored procedures is hard coded to only work with RANGE and daily partitioned tables but you adopt these to work with any kind of partition setups.
Having a metadata table like MyPartTables makes it easy handle all of our partitioned tables.
If the result from the GROUP_CONCAT operation is large you need to increase the group_concat_max_len parameter, I recommend increasing group_concat_max_len to 50000.
Inga kommentarer:
Skicka en kommentar