Hi, I have two tables in SQL Server 2014: staging and yearweek. The yearweek table holds the year-week code (yrwk), the year, the week, and a running week number starting from 2011 (built with a tally function). I join these two tables in the CTE below to extract the sales week ranges that satisfy some criteria. The staging table holds data imported from multiple huge CSV files and has no primary key.

For every unique SKU (Description column), I am trying to find the consecutive sales week ranges, allowing gaps (blank or zero sales) of fewer than 3 weeks inside a range. A gap of 3 or more weeks splits a range in two. I have been able to find the correct gaps. The result of the CTE looks like this:
[code="plain"]
SCOUNTRY SCHAR  DESCRIPTION WEEKS CNTWKS MINWEEK MAXWEEK
SPAIN    GLOBAL 0161003613      7      2  201207  201208
SPAIN    GLOBAL 0161003613     41     14  201141  201202
SPAIN    GLOBAL 0161003613      9     27  201109  201135
SPAIN    GLOBAL 0161003850     14     23  201214  201236
SPAIN    GLOBAL 0161003850     41     22  201141  201210
SPAIN    GLOBAL 0241004245     24      4  201324  201327
SPAIN    GLOBAL 0241004245     31      5  201331  201335
SPAIN    GLOBAL 0241004245      2     18  201302  201319
SPAIN    GLOBAL 0241004245     37     14  201237  201250
SPAIN    GLOBAL 0241004258     22     18  201322  201339
SPAIN    GLOBAL 0241004258     10      9  201310  201318
SPAIN    GLOBAL 0241004258     41     17  201241  201305
[/code]
Attaching a sample file containing the example shown above.

Staging table structure:
[code="sql"]
USE [master]
GO
DROP TABLE [dbo].[staging]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[staging](
    [Level] [varchar](5) NULL,
    [Week] [varchar](9) NULL,
    [Category] [varchar](50) NULL,
    [Manufacturer] [varchar](50) NULL,
    [Brand] [varchar](50) NULL,
    [Description] [varchar](100) NULL,
    [EAN] [varchar](100) NULL,
    [Sales Value with Innovation] [float] NULL,
    [Sales Units with Innovation] [float] NULL,
    [Price Per Item] [float] NULL,
    [Importance Value w Innovation] [float] NULL,
    [Importance Units w Innovation] [float] NULL,
    [Numeric Distribution] [float] NULL,
    [Weighted Distribution] [float] NULL,
    [Average Number of Item] [float] NULL,
    [Value] [float] NULL,
    [Volume] [float] NULL,
    [Units] [float] NULL,
    [Sales Value New Manufacturer] [float] NULL,
    [Sales Value New Brand] [float] NULL,
    [Sales Value New Line Extension] [float] NULL,
    [Sales Value New Packaging] [float] NULL,
    [Sales Value New Size] [float] NULL,
    [Sales Value New Product Form] [float] NULL,
    [Sales Value New Style Type] [float] NULL,
    [Sales Value New Flavour Fragr] [float] NULL,
    [Sales Value New Claim] [float] NULL,
    [Sales Units New Manufacturer] [float] NULL,
    [Sales Units New Brand] [float] NULL,
    [Sales Units New Line Extension] [float] NULL,
    [Sales Units New Packaging] [float] NULL,
    [Sales Units New Size] [float] NULL,
    [Sales Units New Product Form] [float] NULL,
    [Sales Units New Style Type] [float] NULL,
    [Sales Units New Flavour Fragr] [float] NULL,
    [Sales Units New Claim] [float] NULL,
    [filename] [nvarchar](260) NULL,
    [importdate] [datetime] NULL CONSTRAINT [DF_staging_importdate] DEFAULT (getdate()),
    [sCountry] [varchar](50) NULL,
    [sChar] [varchar](50) NULL,
    [yr] [int] NULL,
    [wk] [int] NULL,
    [wkno] [int] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
[/code]
YearWeek table structure:
[code="sql"]
USE [master]
GO
DROP TABLE [dbo].[yearweek]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[yearweek](
    [yrwk] [varchar](6) NULL,
    [yr] [int] NULL,
    [wk] [int] NULL,
    [RN] [bigint] NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
[/code]
The following CTE gives the proper sales week ranges:
[code="sql"]
USE MASTER
GO
WITH Salesrows AS (
    SELECT [SCOUNTRY], [SCHAR], [CATEGORY], [MANUFACTURER], [BRAND], [DESCRIPTION], [EAN],
           [SALES VALUE WITH INNOVATION]=IIF([SALES VALUE WITH INNOVATION] IS NULL,0,[SALES VALUE WITH INNOVATION]),
           CONVERT(INT, SUBSTRING([WEEK], 8, 2)) Wk,
           CONVERT(INT, SUBSTRING([WEEK], 3, 4)) Yr,
           [wkno],
           ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY [WEEK]) RN
    FROM STAGING
    WHERE ([Level] = 'Item')
),
-- Sales value of the two previous (L1, L2) and two following (L5, L6) rows per SKU
SalesRanges as (
    SELECT *,
           LAG([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY RN) L1,
           LAG([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY RN) L2,
           LEAD([SALES VALUE WITH INNOVATION], 1) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY RN) L5,
           LEAD([SALES VALUE WITH INNOVATION], 2) OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY RN) L6
    FROM SalesRows
),
-- Flag zero-sales rows that sit in a gap of 3 or more rows
-- (missing neighbours at the partition edges count as zero)
Clearcontents as (
    SELECT *,
           (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND ISNULL(L2,0) = 0 THEN 1 ELSE 0 END) RemoveMe0,
           (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L5,0) = 0 AND ISNULL(L6,0) = 0 THEN 1 ELSE 0 END) RemoveMe1,
           (CASE WHEN ISNULL([SALES VALUE WITH INNOVATION], 0) = 0 AND ISNULL(L1,0) = 0 AND L2<>0 AND ISNULL(L5,0) = 0 AND L6<>0 THEN 1 ELSE 0 END) RemoveMe2
    FROM SalesRanges
),
-- Drop the flagged rows and renumber what is left
CleanedData AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN] ORDER BY yr, RN) NewRn
    FROM ClearContents
    WHERE RemoveMe0 != 1 and RemoveMe1 != 1 and RemoveMe2 != 1
),
-- Ref stays constant within one unbroken range and changes after each removed gap
WeekGaps as (
    SELECT *, (NewRn - Rn) Ref
    FROM CleanedData
),
-- One row per SKU and range: week count plus first and last wkno
CorrectWeekPeriods as (
    SELECT [SCOUNTRY], [SCHAR], [CATEGORY], [MANUFACTURER], [BRAND], [DESCRIPTION], [EAN],
           COUNT([wkno]) AS CNTWKS,
           MIN([wkno]) AS MINWEEK,
           MAX([wkno]) AS MAXWEEK,
           REF
    FROM WeekGaps
    GROUP BY [SCOUNTRY],[SCHAR],[CATEGORY],[MANUFACTURER],[BRAND],[DESCRIPTION],[EAN],[REF]
)
SELECT C.[SCOUNTRY], C.[SCHAR], C.[CATEGORY], C.[MANUFACTURER], C.[BRAND], C.[DESCRIPTION], C.[EAN],
       CONVERT(INT, SUBSTRING(yw1.yrwk ,5,2)) WEEKS,
       C.CNTWKS,
       yw1.yrwk AS MINWEEK,
       yw2.yrwk AS MAXWEEK
FROM CorrectWeekPeriods AS C
INNER JOIN yearweek AS yw1 ON C.MINWEEK = yw1.rn
INNER JOIN yearweek AS yw2 ON C.MAXWEEK = yw2.rn
WHERE
    (C.CNTWKS > 13)
    AND (C.CNTWKS <= 52)
    AND (C.CNTWKS = (SELECT MAX(A.CNTWKS)
                     FROM CorrectWeekPeriods A
                     WHERE C.[SCOUNTRY]=A.[SCOUNTRY]
                       AND C.[SCHAR]=A.[SCHAR]
                       AND C.[DESCRIPTION]=A.[DESCRIPTION]))
    AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1
ORDER BY [EAN],[DESCRIPTION]
[/code]
1. Which fields of the CTE do I need to join to the staging table so that only the rows in these selected periods are returned?
2. I am sure this query can be optimized and made more concise, but how?
3. If I **comment** out the last **WHERE** clause of the query on **CorrectWeekPeriods** above and run the query multiple times, I get **different row counts**. I checked the execution plan and do not see any errors. If I **uncomment** either this WHERE clause:
WHERE (C.CNTWKS > 13) AND (C.CNTWKS <= 52) AND (C.CNTWKS=(SELECT MAX(A.CNTWKS) FROM CorrectWeekPeriods A WHERE C.[SCOUNTRY]=A.
or this one:
WHERE C.Description='0241004245'
I get the proper min and max sales week ranges.
4. Also, if I **uncomment** WHERE C.Description='0241004245', the execution plan shows this missing-index message:
[quote] /* Missing Index Details from SQL_Correct Gaps.sql - ABC.master (ALPHA\SIFAR (52)) The Query Processor estimates that implementing the following index could improve the query cost by 97.7228%. */[/quote]
[code="sql"]
/*
USE [master]
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[staging] ([Level],[Description])
INCLUDE ([Week],[Sales Value with Innovation],[sCountry],[sChar],[wkno])
GO
*/
[/code]
But if I keep this last WHERE clause **commented**, I do not get this message. I have already created the above index, so I do not know why it is asking me to create the same index again. Any reason why this happens?

Also, the last few commented lines of code are the rules I was trying to implement but could not write properly. Here are the rules:
1. If there are 2 or more sales week ranges for a SKU, pick the longest one (preferably one that starts from week 1 of 2011).
2. 
Exclude any ranges that are >52, to bring them to <=52.
3. If all sales week ranges of a SKU are >13 and <=52, keep only the longest one (preferably one that starts from week 1 of 2011).
4. Exclude any ranges <=13.

I hope somebody can guide me in the right direction, especially on my main point 1: joining back to the staging table to extract the appropriate SKU sales week ranges.

Edit: I uncommented the last WHERE clause again:
[code="sql"]
WHERE (C.CNTWKS > 13)
  AND (C.CNTWKS <= 52)
  AND (C.CNTWKS = (SELECT MAX(A.CNTWKS)
                   FROM CorrectWeekPeriods A
                   WHERE C.[SCOUNTRY]=A.[SCOUNTRY]
                     AND C.[SCHAR]=A.[SCHAR]
                     AND C.[DESCRIPTION]=A.[DESCRIPTION]))
  AND SUBSTRING(CAST(yw1.yrwk AS VARCHAR(6)),5,2) >= 1
[/code]
and looked at the execution plan. It shows warnings on the SORT and HASH operators. The warning message is:
[quote] Operator used tempdb to spill data during execution with spill level 1[/quote]
Every time I execute the query, I get a different row count. The query also takes about 1 minute to execute. I think it is somehow related to the joins to the **yearweek** table, but I do not know how to resolve this issue.

Any help would be most appreciated.
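
For reference, rules 1 to 4 and the join back to staging (question 1) could be sketched roughly as below. This is only a sketch under assumptions: the table variables @Periods and @Staging are hypothetical stand-ins for the CorrectWeekPeriods output and for dbo.staging, the wkno values are made up for illustration, rule 2 is read as "drop ranges longer than 52 weeks", and a SKU is identified by sCountry, sChar and Description only (as in the correlated subquery above).

[code="sql"]
-- Hypothetical stand-in for the CorrectWeekPeriods output
-- (MinWeek/MaxWeek are wkno running numbers, values are illustrative only).
DECLARE @Periods TABLE (
    sCountry      varchar(50),
    sChar         varchar(50),
    [Description] varchar(100),
    CntWks        int,
    MinWeek       int,
    MaxWeek       int
);
INSERT INTO @Periods VALUES
    ('SPAIN', 'GLOBAL', '0161003613',  2, 59, 60),
    ('SPAIN', 'GLOBAL', '0161003613', 14, 41, 54),
    ('SPAIN', 'GLOBAL', '0161003613', 27,  9, 35),
    ('SPAIN', 'GLOBAL', '0161003850', 23, 66, 88),
    ('SPAIN', 'GLOBAL', '0161003850', 22, 41, 62);

-- Hypothetical stand-in for dbo.staging, reduced to the join columns.
DECLARE @Staging TABLE (
    sCountry      varchar(50),
    sChar         varchar(50),
    [Description] varchar(100),
    wkno          int
);
INSERT INTO @Staging VALUES
    ('SPAIN', 'GLOBAL', '0161003613',  8),
    ('SPAIN', 'GLOBAL', '0161003613',  9),
    ('SPAIN', 'GLOBAL', '0161003613', 35),
    ('SPAIN', 'GLOBAL', '0161003613', 36),
    ('SPAIN', 'GLOBAL', '0161003613', 41),
    ('SPAIN', 'GLOBAL', '0161003850', 41),
    ('SPAIN', 'GLOBAL', '0161003850', 62),
    ('SPAIN', 'GLOBAL', '0161003850', 66),
    ('SPAIN', 'GLOBAL', '0161003850', 88);

WITH Ranked AS (
    SELECT *,
           -- Longest range first; ties broken by the earliest start week
           ROW_NUMBER() OVER (PARTITION BY sCountry, sChar, [Description]
                              ORDER BY CntWks DESC, MinWeek ASC) AS rnk
    FROM @Periods
    WHERE CntWks > 13      -- rule 4: drop ranges of 13 weeks or fewer
      AND CntWks <= 52     -- rule 2 (assumed reading): drop ranges over 52 weeks
)
SELECT s.*
FROM @Staging AS s
INNER JOIN Ranked AS r
        ON r.sCountry      = s.sCountry
       AND r.sChar         = s.sChar
       AND r.[Description] = s.[Description]
       AND s.wkno BETWEEN r.MinWeek AND r.MaxWeek   -- keep only weeks inside the chosen range
WHERE r.rnk = 1;            -- rules 1 and 3: one range per SKU
[/code]

With this sample data the query keeps wkno 9 and 35 for SKU 0161003613 and wkno 66 and 88 for SKU 0161003850. Ranking with ROW_NUMBER() reads CorrectWeekPeriods once instead of re-running it through a correlated MAX subquery, and the join uses the raw wkno bounds rather than the yrwk labels from yearweek.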
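
One possible explanation for the changing row counts (an assumption, not verified against the data): ROW_NUMBER() ... ORDER BY [WEEK] in Salesrows is only deterministic when [WEEK] is unique within each partition. Since staging has no primary key and is loaded from several CSV files, the same SKU and week may appear more than once, and such ties can be numbered differently on each run, which changes NewRn - Rn and therefore the grouped ranges. A quick check for such ties against the staging table defined above could look like this:

[code="sql"]
-- List SKU/week combinations that occur more than once among the Item rows.
-- Any row returned here means ORDER BY [WEEK] does not define a unique order.
SELECT [sCountry], [sChar], [Category], [Manufacturer], [Brand], [Description], [EAN], [Week],
       COUNT(*) AS RowsPerWeek
FROM [dbo].[staging]
WHERE [Level] = 'Item'
GROUP BY [sCountry], [sChar], [Category], [Manufacturer], [Brand], [Description], [EAN], [Week]
HAVING COUNT(*) > 1
ORDER BY RowsPerWeek DESC;
[/code]

If this returns rows, de-duplicating the data or adding a tie-breaking column to the ORDER BY would be the next thing to try.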