I am continuing with presenting different solutions for interval queries in SQL Server. For an introduction, please refer to the blog post Interval Queries in SQL Server Part 1 (http://blogs.solidq.com/dsarka/Post.aspx?ID=149&title=Interval+Queries+in+SQL+Server+Part+1). Note that you also need to read an excellent article by Itzik Ben-Gan wrote on interval queries in SQL Server (http://sqlmag.com/t-sql/sql-server-interval-queries) by using the Relational Interval Tree model. I am using the tables and data Itzik has prepared. In order to test the solutions, you can download the code from Itzik's article by using the link in this paragraph.
The second solution in addition to Itzik's two RI tree solutions and to the classical solution was introduced to me by my friend and distinguished SQL Server MVP Davide Mauri. Therefore, the merits for this post go to him.
Davide found a way to use the existing SQL Server infrastructure. Although there is no temporal data support, spatial data support exists in SQL Server from version 2008, if I remember correctly. Besides two spatial data types, the geometry and geography types, SQL Server also features specific spatial indexes. Both data types include a method called STIntersects, which returns 1 if a spatial type (geometry or geography) instance intersects another spatial type instance and returns 0 if it does not. If you remember, Albert Einstein actually unified space and time; Davide did the same, therefore we can call him ZweisteinJ
Without further delays, let's create and populate the table that uses the geometry data type to store the intervals. Here is the creation and population code that also creates a regular primary key and a spatial index.
CREATE
TABLE
dbo.IntervalsSP
(
id
INT
NOT
NULL,
segment
geometry
NOT
NULL
);
-- SLOW! (20+ min!)
INSERT
INTO
dbo.IntervalsSP
WITH(TABLOCK)
(id, segment)
SELECT
id,
segment =
CASE
WHEN
lower !=
upper
then
geometry::STGeomFromText('LINESTRING
(' +
CAST(lower
AS
VARCHAR(50))
+
' 0, '
+
CAST(upper
AS
VARCHAR(50))
+
' 0)', 0)
ELSE
geometry::STGeomFromText('POINT
(' +
CAST(lower
AS
VARCHAR(50))
+
' 0)', 0)
END
FROM
dbo.Stage;
GO
-- ~ 1 min
ALTER
TABLE
dbo.IntervalsSP
ADD
CONSTRAINT
PK_IntervalsSP
PRIMARY
KEY(id);
GO
-- ~ 3 min
CREATE
SPATIAL
INDEX
ixs_1
ON
dbo.IntervalsSP
(segment)
USING GEOMETRY_AUTO_GRID
WITH (BOUNDING_BOX =
(0, 0, 10000000,1));
GO
Without further delays, let's create and populate the table that uses the geometry data type to store the intervals. Here is the creation and population code that also creates a regular primary key and a spatial index. From the comments in the code, you can notice that the table population is pretty slow. It took more than 20 minutes on my computer. Nevertheless, remember I am inserting 10,000,000 rows in the table; for a production system, 10,000,000 rows in 20 minutes might be completely adequate performance.
Time for querying the table. I am showing here the query that selects the same rows from the middle of the data as the query in my previous article and as the queries in Itzik's article.
-- query
SET
STATISTICS
IO
ON
SET
STATISTICS
TIME
ON
GO
-- middle of data
SELECT
id
FROM
dbo.IntervalsSP
WHERE segment.STIntersects
(geometry::STGeomFromText('LINESTRING (5000000 0, 5000020 0)', 0))
= 1
ORDER
BY
id;
-- logical reads: 396 (284 table + 112 index), CPU time: 16 ms
The performance is slightly worse that you can get with the RI tree solution, or with the solution I introduced in the part 1 of this blog series. In addition, the insert performance is much worse. Still, this is a viable solution, because it is very simple and uses only existing SQL Server features, without complex queries and mathematics. I also like Davide's Italo-Balkan way of thinking. We are actually very good at NOT using things for what they are intended forJ