Course
Materialized views are a powerful feature in SQL databases that help optimize query performance by storing the results of a query physically on disk, offering faster query performance by reducing recomputation. This makes them particularly useful for handling complex, resource-intensive queries that involve joins, aggregations, and large datasets.
As we get started, I recommend taking DataCamp’s Introduction to SQL course and SQL Fundamentals skill track to learn the basics of SQL and how to extract data using queries. The SQL Basics Cheat Sheet will be a helpful guide for common SQL functions for filtering and aggregating data.
What are SQL Materialized Views?
Materialized views are a special type of database object that physically stores a query's results rather than calculating them on the fly like regular views. While a regular SQL view is a saved SQL query that generates its results dynamically each time it is accessed, a materialized view precomputes and stores the data in a table-like structure.
By storing the result set on disk, materialized views can significantly reduce the load on a database, improve query performance, and help streamline the processing of computationally intensive operations.
Creating a Materialized View in SQL
The process of creating a materialized view involves using the CREATE MATERIALIZED VIEW syntax, which varies slightly across different SQL databases. The following methods show how to create materialized views in SQL Server, PostgreSQL, and Oracle.
Materialized view in PostgreSQL
In PostgreSQL, you can create a materialized view using the following syntax. This example creates a materialized view called sales_summary that aggregates each product's total quantity and revenue.
-- Create a materialized view to summarize sales data
CREATE MATERIALIZED VIEW sales_summary AS
SELECT product_id,
SUM(quantity) AS total_quantity,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY product_id;
Materialized view in SQL Server
In SQL Server, materialized views are called "Indexed Views." The underlying tables must meet specific requirements to create an indexed view, such as enabling the WITH SCHEMABINDING option.
The WITH SCHEMABINDING option ensures the schema cannot change while the indexed view exists. A unique clustered index must be created for the view to be materialized.
-- Create an indexed view with schema binding to summarize sales data
CREATE VIEW sales_summary
WITH SCHEMABINDING
AS
SELECT product_id,
COUNT_BIG(*) AS record_count,
SUM(ISNULL(quantity, 0)) AS total_quantity,
SUM(ISNULL(price, 0) * ISNULL(quantity, 0)) AS total_revenue
FROM sales
GROUP BY product_id;
GO
-- Create a unique clustered index to materialize the view
CREATE UNIQUE CLUSTERED INDEX IX_sales_summary
ON sales_summary (product_id);
GO
If you want to learn more about SQL Server, I recommend checking out our SQL Server Fundamentals skill track to familiarize yourself with different SQL skills for data analysis.
Materialized view in Oracle
The syntax for creating materialized views in Oracle is similar to that of the PostgreSQL database. We can also specify refresh options, such as ON DEMAND or ON COMMIT.
-- Create a materialized view to summarize sales data
CREATE MATERIALIZED VIEW sales_summary
BUILD IMMEDIATE
REFRESH FAST ON DEMAND
AS
SELECT product_id, SUM(quantity) AS total_quantity, SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY product_id;
Materialized view in distributed databases
Materialized views can also specify distribution methods for better performance for databases like Azure Synapse or Amazon Redshift that support distributed data warehouses.
Hash Distribution for Amazon Redshift
The query below creates a materialized view called sales_summary that aggregates total quantity and revenue by product_id. The DISTSTYLE KEY and DISTKEY(product_id) options ensure that data is distributed across nodes based on product_id, improving performance for queries that join on this column.
-- Create a materialized view with key-based distribution for efficient joins
CREATE MATERIALIZED VIEW sales_summary
DISTSTYLE KEY
DISTKEY(product_id)
AS
SELECT product_id,
SUM(quantity) AS total_quantity,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY product_id;
Round-Robin Distribution for Azure Synapse
This query creates a materialized view named sales_summary that aggregates total quantity and revenue by product_id. The ROUND_ROBIN distribution evenly spreads data across nodes, which is useful for scenarios that do not rely heavily on joins.
-- Create a materialized view with round-robin distribution for balanced data storage
CREATE MATERIALIZED VIEW sales_summary
WITH (DISTRIBUTION = ROUND_ROBIN)
AS
SELECT product_id,
SUM(quantity) AS total_quantity,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY product_id;Associate Data Engineer in SQL
Refreshing a Materialized View in SQL
Data in materialized views can be refreshed to stay updated with the underlying tables. The choice of refresh method depends on the business requirements and performance considerations of the particular database. Let us look at the following data refresh methods of materialized views.
Manual refresh
In manual refresh, the materialized view is refreshed only when explicitly requested by the user. This approach gives the most control over when data is updated, making it suitable for scenarios where data changes infrequently, or updates are performed during off-peak hours.
The following query shows the manual refresh method in PostgreSQL.
REFRESH MATERIALIZED VIEW sales_summary;
Periodic refresh
The materialized view is automatically refreshed at specified intervals during the refresh period, ensuring data is up-to-date without user intervention. This method is useful for time-sensitive applications where data needs to be relatively current.
The example below shows how to include periodic refreshes in Oracle by defining refresh schedules directly in the CREATE MATERIALIZED VIEW statement. The refresh time is set at hourly intervals.
-- Create a materialized view to aggregate sales data
-- Set to refresh completely every hour
CREATE MATERIALIZED VIEW sales_summary
REFRESH COMPLETE START WITH (SYSDATE) NEXT (SYSDATE + 1/24)
AS
SELECT product_id,
SUM(quantity) AS total_quantity,
SUM(price * quantity) AS total_revenue
FROM sales
GROUP BY product_id;
On-demand refresh
The on-demand refresh occurs whenever the underlying data changes, typically through a trigger mechanism. This ensures that the materialized view always contains the updated data. For example, in PostgreSQL, triggers can be set up to refresh the view when changes occur in the underlying table.
Full vs. incremental refreshes
You can refresh the data in full or incremental refreshes in materialized views. The following table summarizes the two methods and the use cases.
| Refresh Type | Description | Advantages | Drawbacks |
|---|---|---|---|
| Full Refresh | Reloads the entire dataset, replacing all existing data in the view | - Simple to implement - Rebuilds the entire view |
- Resource-intensive for large datasets - Longer refresh times |
| Incremental Refresh | Updates only the changed portions of the view | - More efficient, processing only modified data - Suitable for large datasets with frequent changes |
- Requires additional setup (e.g., logs to track changes) - Not always supported for all queries |
Best Practices for Materialized Views
When using materialized views, it is important to consider the following practices for optimal usage.
- Choosing the Right Queries to Materialize: Materialize complex, resource-intensive queries such as joins, aggregations, and subqueries. Such queries would benefit from precomputed results reducing the load on the database.
- Balancing Data Freshness and Performance: Depending on your use case, choose the appropriate refresh strategy, such as manual, periodic, or on-demand refresh methods. Use incremental refresh to reduce computational load and monitor data change frequency to decide refresh intervals.
- Using Materialized Views to Optimize Query-Heavy Workloads: Leverage materialized views for BI reports and dashboards where quick response times are essential. You can also index the columns used in materialized views for faster filtering and sorting.
Materialized Views in Different Database Systems
As you have seen, different database systems offer varying support for materialized views. The following table summarizes the unique features and limitations of materialized views in these databases.
| Database System | Refresh Methods | Incremental Refresh | Automatic Refresh | Special Features/Limitations |
|---|---|---|---|---|
| PostgreSQL | Manual (REFRESH MATERIALIZED VIEW) | No | No | Lacks native incremental refresh. Manual scheduling is needed. |
| SQL Server | Automatic (Indexed Views) | Yes (Automatic Sync) | Yes |
Requires |
| Oracle | Manual, On Commit, Scheduled | Yes (Fast Refresh) | Yes | Supports fast refresh, partitioning, and parallelism. Requires view logs. |
| Amazon Redshift | Manual, Scheduled | Yes | Yes | Supports distributed data with hash or round-robin distribution. |
| MySQL | Not Supported Natively | No | No | Workarounds needed (e.g., temporary tables, third-party tools). |
| Azure Synapse | Manual, Scheduled | Yes | Yes | Allows for different distribution strategies for optimization. |
If you use SQL Server as your preferred database, I recommend taking DataCamp’s Introduction to SQL Server course to master the basics of Microsoft SQL Server for data analysis. Also, check out our SQL Server Developer career track to understand how to optimize queries and troubleshoot issues in SQL Server.
Additional Things to Consider
While SQL materialized views are useful for query optimization, they also come with some challenges and limitations. The following are the common issues with materialized views and how to address them.
- Storage Overhead: Materialized views store query results physically on disk, which increases storage requirements. To avoid consuming unnecessary storage space, only materialize views for resource-intensive queries and partition the materialized views for large datasets.
- Update Costs and Refresh Overhead: Keeping materialized views in sync with the underlying tables can be resource-intensive, especially for views that require frequent updates or involve complex calculations. To avoid the refresh overhead, use incremental refresh where supported or set the appropriate refresh intervals when database usage is lower.
- Data Consistency and Synchronization: Materialized views can become outdated if the underlying data changes frequently, leading to stale data issues. To avoid this problem, select the appropriate refresh strategy and monitor the data changes to adjust the refresh strategy as needed.
- Maintenance Overhead: Materialized views require ongoing maintenance, such as setting appropriate refresh schedules, monitoring storage usage, and tracking dependencies on the underlying tables. To overcome this challenge, always use automated refresh schedules, monitor system performance, and set up alerts for failed refreshes.
Conclusion
Materialized views are useful in SQL databases for optimizing query performance. They store the query's results physically on disk, offering faster query performance by reducing recomputation. This feature makes the materialized views useful for handling complex, resource-intensive queries that involve joins, aggregations, and large datasets. Understanding how to implement materialized views in different databases will help you enhance your query and database optimization skills.
If you are looking to advance your SQL skills, I recommend trying out DataCamp’s Associate Data Analyst in SQL career track to become a proficient data analyst. The Reporting in SQL course will also help you become proficient in building complex reports and dashboards for effective data presentation. Finally, you should obtain the SQL Associate Certification to showcase your mastery in using SQL to solve business problems and stand out among other professionals.
Become SQL Certified
FAQs
What is a materialized view in SQL?
A materialized view is a database object that physically stores the result of a query, optimizing performance by avoiding recomputation.
How is a materialized view different from a regular view?
Unlike regular views, which dynamically retrieve data on each access, materialized views store data as physical tables, allowing for faster query execution.
What is the difference between full and incremental refresh?
A full refresh recalculates the entire view, while an incremental refresh updates only the changed data, improving efficiency.
Which databases support materialized views?
PostgreSQL, SQL Server (indexed views), Oracle, Amazon Redshift, and Azure Synapse Analytics support materialized views, each with different features and limitations, while MySQL lacks native support.
Do materialized views consume additional storage?
Yes, since they store data physically, materialized views increase storage requirements.
