In the topic [url=http://www.sqlservercentral.com/Forums/Topic1760478-3412-2.aspx#bm1762255]overlapping date ranges[/url], ChrisM@Work suggested 'starting a new thread'; this is that new thread.

With the following data:
[code="sql"]--=====================================================================================================================
-- Create a 10 million row test table.
--=====================================================================================================================
--===== If the test table exists, drop it to make reruns in SSMS easier
    SET NOCOUNT ON;
     IF OBJECT_ID(N'TestTable') IS NOT NULL DROP TABLE TestTable;
GO
--===== Create and populate a 10 million row test table on the fly.
     -- 1,000,000 random IDs with ~10 random date spans of 0 to 14 days each
Declare @widestring varchar(4000) = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
SET @widestring = @widestring+@widestring+@widestring+@widestring;
SET @widestring = @widestring+@widestring+@widestring+@widestring; -- 416 characters
-- print datalength(@widestring);
   WITH
cteGenDates AS
(
 SELECT TOP 10000000 -- 10E6 rows
        SubscriberID = ABS(CHECKSUM(NEWID()))%1000000+1
       ,StartDate    = DATEADD(dd,ABS(CHECKSUM(NEWID()))%DATEDIFF(dd,'2000','20160310'),'2000')
       ,Span         = ABS(CHECKSUM(NEWID()))%(15) -- period of 0 to 14 days.
       ,realism      = ABS(CHECKSUM(NEWID()))%(36) -- offset of 0 to 35 days, only used for the insert order.
   FROM sys.all_columns ac1
  CROSS JOIN sys.all_columns ac2
)
 SELECT SubscriptionID = IDENTITY(INT,1,1)
       ,SubscriberID
       ,StartDate
       ,EndDate = DATEADD(dd,Span,StartDate)
       ,substring(@widestring,ABS(CHECKSUM(NEWID()))%400, 999) wide
   INTO TestTable
   FROM cteGenDates
  ORDER BY (StartDate + realism) -- For a bit of realism
;
update TestTable set EndDate = NULL where EndDate > '20160301' and SubscriberID%3 = 1;

-- The basis for this data was supplied by Jeff Moden.
-- Changes:
-- Entries are in the past or the near future.
-- The period of a subscription is shortened.
-- A field (wide) is added to make each row much wider.
-- A 'small' number of recent rows have the enddate set to NULL.
-- This is testing data. In the real data the average length of the period is about the same, but
-- the spread of the period is far greater and the maximum length of the period can be years.
-- Also some 'older' entries have a NULL for the enddate.
-- Actual data does have a bit more variation.
[/code]

The basis for this data was supplied by Jeff Moden.
In real life the data has more variation (e.g. much longer periods).

The query to optimise is:
[code="sql"]select sum(datalength(wide)) Some_field_information from testtable
 where StartDate < '20160301' and (EndDate >= '20160227' or EndDate is NULL)
-- Some operation on a field from the main table.
-- The important part of the query is the selection on start and end.
-- In real life there is more complexity. (More tables, more fields etc.)
[/code]

But also:
[code="sql"]select sum(datalength(wide)) Some_field_information from testtable
 where StartDate < '20060301' and (EndDate >= '20060227' or EndDate is NULL)
[/code]

Performance is measured with two metrics: the total elapsed time and the amount of cache that is used to solve the query.
The code I used for this was:
[code="sql"]SET STATISTICS TIME,IO ON;
dbcc dropcleanbuffers; -- Clear the cache.
dbcc freeproccache;    -- Clear the proccache. (?)
DECLARE @starttime datetime = getdate();
-----------------------------------The query to optimize.--------------------------------------
select sum(datalength(wide)) Some_field_information from testtable
 where StartDate < '20160301' and (EndDate >= '20160227' or EndDate is NULL)
-----------------------------------------------------------------------------------------------
-- Show time and the use of cache.
SELECT '--' [--]
      ,DB_NAME() AS [Database Name]
      ,CAST(COUNT(*) * 8/1024.0 AS DECIMAL (10,2)) AS [Cached Size (MB)] -- 8 KB pages to MB
      ,convert(float,(GETDATE() - @starttime))*60*60*24 duration          -- elapsed seconds
  FROM sys.dm_os_buffer_descriptors WITH (NOLOCK)
 WHERE DB_NAME(database_id) = DB_NAME()
 GROUP BY DB_NAME(database_id)
 ORDER BY [Cached Size (MB)] DESC OPTION (RECOMPILE);
[/code]

Without indexes the query takes about 30 seconds and the cache is filled in my environment.
With indexes the query runs faster and uses less cache.
I am still working on variations of the indexes and of the query to see their effects. I'll publish the results a bit later on.

In the real database, most tables have a clustered index, often starting with the subscriber.
This clustered index is used a lot. Sometimes there are extra indexes as well. Because a table can have only one clustered index, a different clustered index is often not an option.

[b]Question:
What are good queries/techniques/indexes for a table with start and end dates, where the selection is usually around the current date and occasionally a period in the past?[/b]

Ben