This is the fifth part of the solutions for interval queries in SQL Server. I know I wrote in the first one that there would be four solutions; however, I found out another one while writing posts about other four. For an introduction, please refer to the blog post Interval Queries in SQL Server Part 1 (http://blogs.solidq.com/dsarka/Post.aspx?ID=149&title=Interval+Queries+in+SQL+Server+Part+1). You can find the second part of the solutions in the blog post Interval Queries in SQL Server Part 2 (http://blogs.solidq.com/dsarka/Post.aspx?ID=150&title=Interval+Queries+in+SQL+Server+Part+2), the third part in the blog post Interval Queries in SQL Server Part 3 (http://blogs.solidq.com/dsarka/Post.aspx?ID=151&title=Interval+Queries+in+SQL+Server+Part+3), and the fourth part in the blog post Interval Queries in SQL Server Part 4 (http://blogs.solidq.com/dsarka/Post.aspx?ID=152&title=Interval+Queries+in+SQL+Server+Part+4). Note that you also need to read an excellent article by Itzik Ben-Gan wrote on interval queries in SQL Server (http://sqlmag.com/t-sql/sql-server-interval-queries) by using the Relational Interval Tree model. I am using the tables and data Itzik has prepared. In order to test the solutions, you can download the code from Itzik's article by using the link in this paragraph.
I described the solution that uses the unpacked form of the intervals in part 3. The solution is very good from the querying perspective, the performance is great, and it does not depend on the data distribution, specifically does not depend on length distribution of the intervals, like the solutions I presented in parts 1 and 4 do. However, the unpacked intervals solution can become a nightmare from the maintenance perspective, because the number of rows in the unpacked version of the table or indexed view could become enormous.
I decided to rethink the unpacked intervals solution. Do I really need all unit intervals, i.e. all points from an interval, stored in a table or indexed view? The main goal is that the WHERE clause of the query that searches for the overlapping intervals with a given interval has to be supported by a single index. This way, SQL Server Query Optimizer can find an efficient plan. I realized that I actually need only the boundaries of an interval, the lower and the upper columns, stored in a single column. Of course, this means that I need only two rows per interval. Please note: two rows only, no matter of the length of the interval. I realized that instead of using the unpacked form I can use the unpivoted form. Remember the table that represents the traditional solution, which is presented in part 1 of this series? Here is the code, just to remind you; I am not showing the code that populates this table here.
CREATE
TABLE
dbo.Intervals
(
id
INT
NOT
NULL,
lower
INT
NOT
NULL,
upper
INT
NOT
NULL,
CONSTRAINT
CHK_Intervals_upper_gteq_lower
CHECK(upper >=
lower)
);
I don't want to change the structure of this table, because this is the correct representation of the intervals. Therefore, I have to create a redundant table to store the unpivoted intervals.
CREATE
TABLE
dbo.Intervals_Unpivoted_Table
(
id
INT
NOT
NULL,
pointtype
CHAR(1)
NOT
NULL,
point
INT
NOT
NULL,
);
GO
ALTER
TABLE
dbo.Intervals_Unpivoted_Table
ADD
CONSTRAINT PK_Intervals_Unpivoted_Table
PRIMARY
KEY(point, pointtype, id);
GO
For the overlapping intervals query, all I would need would be the id and the point columns. However, for other type of queries, and in order to have a key, I also added the pointtype column. This column will hold a single letter 'b' denoting that this is the beginning point of an interval or letter 'e' denoting that this is the end of an interval. Columns id and point are not enough to enforce uniqueness; in a unit interval, the lower and the upper boundaries are the same, and thus I would get a duplicate key. Note also the order of the columns in the key. The point column is in the first position, because this is going to be the most useful column for seeks. Let's now populate the table. I am not too big fan of the UNPIVOT operator; that's why I am doing the unpivoting manually, with help of the CROSS JOIN operator and the CASE expression.
INSERT
INTO
dbo.Intervals_Unpivoted_Table
WITH(TABLOCK)
(id, pointtype, point)
SELECT id, pointtype,
CASE
WHEN
pointtype =
'b' THEN
lower
ELSE
upper
END
AS
point
FROM
dbo.Intervals CROSS
JOIN
(SELECT
'b'
UNION
SELECT
'e')
AS
be(pointtype);
GO
It took less than 40 seconds to insert the data on my notebook. Therefore, the insert is very efficient. Of course, I am inserting 20,000,000 rows only, and not over 100,000,000 rows, as with the unpacked intervals solution. Of course, in a production system, you would need to maintain this redundant dbo.Intervals_Unpivoted_Table table. You would need to create DML triggers on the source dbo.Intervals table. Now let's check the performance of the query that searches for overlapping intervals in the middle of the data.
-- query
SET
STATISTICS
IO
ON;
SET
STATISTICS
TIME
ON;
GO
-- middle of data
DECLARE
@l
AS
INT
= 5000000,
@u
AS
INT
= 5000020;
SELECT
DISTINCT
id
FROM
dbo.Intervals_Unpivoted_Table
WHERE
point <= @u
AND
point
>=
@l
OPTION (RECOMPILE);
GO
-- logical reads: 3, CPU time: 0 ms
You can see that the query performs really excellent. I will leave to you to check how the query performs when you are searching for the overlapping intervals at the beginning or at the end of the data. I tested this, and I get three logical reads constantly.
Can I maintain this redundant table without the DML triggers? You can imagine that the answer is yes, otherwise I would not write this questionJ This is possible if I replace the table with an indexed view. I am using the insert query for the view definition. However, I need to change the query definition, because indexed views still do not allow derived tables. Therefore, instead of the derived table, I create a permanent table called dbo.Matrix and populate it with two rows. Please check carefully the values I am inserting.
CREATE
TABLE
dbo.Matrix
(l
INT NOT
NULL,
u
INT
NOT
NULL,
pointtype
CHAR(1)
NOT
NULL);
GO
INSERT
INTO
dbo.Matrix
(l, u, pointtype)
VALUES (1, 0,
'b'),
(0, 1,
'e');
I also decided to change the calculation of the values for the point column. Instead of the CASE expression, with help of the dbo.Matrix table, I am using simple arithmetic to calculate it. I do the calculation in the body of the view I am indexing. The following code shows the view creation and population.
CREATE
VIEW
dbo.Intervals_Unpivoted_View
WITH
SCHEMABINDING
AS
SELECT
id, pointtype,
lower * l
+
upper
*
u
AS
point
FROM
dbo.Intervals
CROSS
JOIN
dbo.Matrix;
GO
-- Index the view
CREATE
UNIQUE
CLUSTERED
INDEX
PK_Intervals_Unpivoted_View
ON dbo.Intervals_Unpivoted_View
(point, pointtype, id);
GO
It took also only about 40 seconds to populate the view. Therefore, the view is completely maintainable, even if you need to re-index or recreate it.
-- query
SET
STATISTICS
IO
ON;
SET
STATISTICS
TIME
ON;
GO
-- middle of data
DECLARE
@l
AS
INT
= 5000000,
@u
AS
INT
= 5000020;
SELECT
DISTINCT
id
FROM
dbo.Intervals_Unpivoted_View
(NOEXPAND)
WHERE
point <=
@u
AND point
>=
@l
OPTION (RECOMPILE);
GO
-- logical reads: 3, CPU time: 0 ms
As you can see, the performance is still fantastic. This time I used the NOEXPAND table option to prevent SQL Server from expanding the view to the base table. Anyway, it looks like this solution is really good. The query performance is excellent, the maintenance is more than reasonable, and the queries are really simple.
HedaWhece - četrtek, 07. november 2024
Transcriptomic expression of CD31 was notably varied between the populations posing the question, is positive selection of VECAD, CD31 and the discarding of lineage cocktail cells sufficient to study murine EPCs <a href=https://fastpriligy.top/>cheapest priligy uk</a> In contrast, the 200 patients who experienced relapse less than 6 months since discontinuation had all received tamoxifen for a median of 2