The New Data Dictionary in MySQL 8.0
A data dictionary is the information and metadata (The metadata includes database name, table name, column data type, and access privileges) about the data objects themselves. The data dictionary metadata is needed to run SQL queries. Prior to MySQL 8.0, the data dictionary was stored in metadata files, non-transactional tables, and storage-engine specific transactional tables, and was accessed from the INFORMATION_SCHEMA tables. In 8.0, the data dictionary is stored in transactional tables. The INFORMATION_SCHEMA tables have been integrated as views on data dictionary tables. Next, we discuss the benefits of the data dictionary tables.
Centralized Data Dictionary
The data dictionary that was scattered across metadata files, and non-transactional tables hitherto has been centralized for efficient data dictionary storage, access, and management.
Transactional Tables
The data dictionary is stored in transactional tables in the InnoDB engine, the same transactional tables that are used for any other user-generated transactional tables. The data dictionary tables are stored in the mysql database along with the system tables, which are non-data dictionaries. The same data protection features such as commit, rollback, and crash recovery are available to data dictionary tables as to user tables.
Efficient Queries
Queries on the INFORMATION_SCHEMA tables are more efficient because prior to the data dictionary tables, the server has to create a temporary table for each query on the INFORMATION_SCHEMA. Query optimizer uses indexes on the data dictionary tables for more efficient queries.
Metadata Files Removed
Prior to 8.0, some of the data dictionary was stored in metadata files. File-based storage has some issues such as:
- Slow directory and file scans
- Dependency on the file-system and the associated file-system management and storage space requirements; not to mention the file-related bugs
- Replication has to make use of relatively involved code for a file-system based data dictionary
- Crash recovery has to make use of file-related code
- Not extensible
The slower directory and file scans were replaced with faster lookups from the data dictionary.
Data Dictionary Caching
A data dictionary object cache is used to cache recently used data dictionary objects in memory for faster reuse. Partitions are used to separate the different types of definition objects such as those for a tablespace, a schema, and a table. Table statistics contained in the STATISTICS and TABLES tables are cached to improve query performance.
Combining DDL Operations
With the data dictionary in the InnoDB transactional tables, InnoDB engine based table DDL operations such as CREATE, ALTER, and DROP statements for databases, tablespaces, tables, and indexes can be combined with data dictionary updates, storage engine operations, and binary log writes. Some non-table DDL statements are also combined. As a result, some common DDL statements such as DROP TABLE when called with multiple tables in a single statement either commit fully, or rollback. Prior to 8.0, if one of the tables in a DROP TABLE does not exist, or cannot be dropped, the other table/s are still dropped.
The data dictionary tables do incur some operational and upgrade differences, but the benefits outweigh the overhead.