Channel: Answers for "Has anyone tried variations on 8K Splitter?"
Answer by Usman Butt

As I said, we also did some testing on VARCHAR(MAX) versions, because the tally table solution does not behave well with wider strings. Two of them performed really well:

1. The inline recursive CTE solution (Split_RCTE, tweaked a little for better performance)
2. The TVF XML solution by Oleg (Split_XML_Solution_By_Oleg)

Surprisingly, when the RCTE won it won by a good margin, and when it lost it lost only narrowly. With the same 10000 rows and 1133 elements of 125-150 characters each, the RCTE beat the XML solution marginally. But when the number of elements was decreased to 500, the RCTE was almost twice as fast as XML. I would not count out either solution, though, as each could fit depending on the requirements and the environment.

I must say the RCTE solution is more resource intensive. The Index Spool and Sort operators in its execution plan clearly indicate that it would hit memory, tempdb and the processors hard. That also suggests that with more CPUs, more memory and a capacity-planned tempdb, this solution has more room to perform better than the others. Having said that, I may still tilt towards the XML solution.

To me, the reason the RCTE performs much better than the tally table solution (both use CHARINDEX and SUBSTRING) is that the tally table solution compares every character with the delimiter, and because of the out-of-row storage of VARCHAR(MAX) values this does not scale well. The RCTE solution, by contrast, does the work only once per delimiter. Moreover, up to VARCHAR(8000), SUBSTRING(@pString, N, 1) = @pDelimiter is handled as a predicate in addition to the Seek predicate. For VARCHAR(MAX), it is split into two steps, i.e. a Seek predicate followed by a Filter predicate, which decreases the performance quite a bit.
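To make the difference in work concrete, here is a minimal sketch that finds the delimiter positions both ways on a single variable. The sample string and the inline `Tally` CTE are assumptions made for illustration only; they are not the tally table or the splitters that were actually benchmarked.

```sql
DECLARE @pString    VARCHAR(MAX),
        @pDelimiter CHAR(1);

SELECT  @pString    = 'alpha,beta,gamma',
        @pDelimiter = ',';

-- Tally-style scan: one row per character position, and every position
-- is compared with the delimiter (16 comparisons for this string).
WITH Tally (N) AS
(
    SELECT TOP (DATALENGTH(@pString))
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
      FROM sys.all_columns ac1
     CROSS JOIN sys.all_columns ac2   -- illustrative row source only
)
SELECT DelimiterPosition = N
  FROM Tally
 WHERE SUBSTRING(@pString, N, 1) = @pDelimiter;

-- RCTE-style scan: CHARINDEX jumps straight to the next delimiter,
-- so the recursion runs once per delimiter plus one final miss.
WITH cteScan AS
(
    SELECT EndPosition = CONVERT(INT, CHARINDEX(@pDelimiter, @pString))
    UNION ALL
    SELECT CONVERT(INT, CHARINDEX(@pDelimiter, @pString, EndPosition + 1))
      FROM cteScan
     WHERE EndPosition > 0
)
SELECT DelimiterPosition = EndPosition
  FROM cteScan
 WHERE EndPosition > 0
OPTION (MAXRECURSION 0);
```

Both queries return positions 6 and 11 for this string. The first evaluates the SUBSTRING predicate for every character, while the second calls CHARINDEX only three times.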
Now the performance test for the RCTE and XML solutions:

    USE [tempdb]
    GO
    SET NOCOUNT ON;
    GO
    IF OBJECT_ID(N'dbo.iFunction', N'V') IS NOT NULL
        DROP VIEW iFunction
    GO
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    CREATE VIEW dbo.iFunction AS
    /**********************************************************************************************************************
     Purpose:
     This view is callable from UDF's which allows us to indirectly get a NEWID() within a function where we can't do
     such a thing directly in the function. This view also solves the same problem for GETDATE().

     Usage:
     SELECT MyNewID FROM dbo.iFunction; --Returns a GUID
     SELECT MyDate  FROM dbo.iFunction; --Returns a Date

     Revision History:
     Rev 00 - 06 Jun 2004 - Jeff Moden - Initial creation
     Rev 01 - 06 Mar 2011 - Jeff Moden - Formalize code. No logic changes.
    **********************************************************************************************************************/
    SELECT MyNewID = NEWID(),
           MyDate  = GETDATE();
    GO
    IF OBJECT_ID(N'dbo.CsvTest', N'U') IS NOT NULL
        DROP TABLE CsvTest
    GO
    DECLARE @MaxElementWidth  INT,
            @MinElementWidth  INT,
            @NumberOfElements INT,
            @NumberOfRows     INT
    /*======== PARAMETER VALUES ==============================*/
    SELECT @MaxElementWidth  = 150
         , @MinElementWidth  = 125
         , @NumberOfElements = 500
         , @NumberOfRows     = 10000
    /*========================================================*/
    SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
           ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
           CSV =
           (--==== This creates each CSV
            SELECT CAST(
                       STUFF( --=== STUFF gets rid of the leading comma
                            ( --=== This builds CSV row with a leading comma
                             SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
                                    ','
                                  + LEFT(--==== Builds random length variable within element width constraints
                                         LEFT(REPLICATE('1234567890',CEILING(@MaxElementWidth/10.0)), @MaxElementWidth),
                                         ABS(CHECKSUM((SELECT MyNewID FROM dbo.iFunction)))
                                         % (@MaxElementWidth - @MinElementWidth + 1) + @MinElementWidth
                                        )
                               FROM sys.All_Columns ac3       --Classic cross join pseudo-cursor
                              CROSS JOIN sys.All_Columns ac4  --can produce row sets up 16 million.
                              WHERE ac3.Object_ID <> ac1.Object_ID --Without this line, all rows would be the same.
                                FOR XML PATH('')
                            )
                       ,1,1,'')
                   AS VARCHAR(MAX))
           )
      INTO CsvTest
      FROM sys.All_Columns ac1       --Classic cross join pseudo-cursor
     CROSS JOIN sys.All_Columns ac2  --can produce row sets up 16 million rows
    GO
    PRINT '/*====== dbo.CSVTest Population completed ================*/'
    PRINT CHAR(10) + CHAR(13)
    GO
    ALTER TABLE CsvTest
      ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
    GO
    IF OBJECT_ID(N'dbo.Split_RCTE', N'IF') IS NOT NULL
        DROP FUNCTION dbo.Split_RCTE
    GO
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    --Create Split_RCTE function VARCHAR(MAX) version
    CREATE FUNCTION dbo.Split_RCTE
    (
        @pString    VARCHAR(MAX)
       ,@pDelimiter VARCHAR(1)
    )
    RETURNS TABLE
    WITH SCHEMABINDING
    AS
    RETURN
    WITH cteSplit AS
    (
        SELECT StartPosition = 0
             , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN))
        UNION ALL
        SELECT StartPosition = EndPosition + 1
             , EndPosition   = CONVERT(INT, CHARINDEX(@pDelimiter, @pString COLLATE Latin1_General_BIN, EndPosition + 1))
          FROM cteSplit
         WHERE EndPosition > 0
    )
    SELECT [ItemNumber] = ROW_NUMBER() OVER ( ORDER BY StartPosition )
         , SUBSTRING(@pString, StartPosition,
                     CASE EndPosition
                          WHEN 0 THEN CONVERT(INT, LEN(@pString)) + 1
                          ELSE EndPosition - StartPosition
                     END) ItemValue
      FROM cteSplit
    GO
    IF OBJECT_ID(N'dbo.Split_XML_Solution_By_Oleg', N'TF') IS NOT NULL
        DROP FUNCTION dbo.Split_XML_Solution_By_Oleg
    GO
    --Create Split_XML function VARCHAR(MAX) version
    SET QUOTED_IDENTIFIER ON
    SET ANSI_NULLS ON
    GO
    CREATE FUNCTION dbo.Split_XML_Solution_By_Oleg
    (
        @Parameter VARCHAR(MAX)
       ,@Delimiter VARCHAR(1)
    )
    RETURNS @Result TABLE
    (
        ItemNumber INT
       ,ItemValue  VARCHAR(MAX)
    )
    AS
    BEGIN
        DECLARE @XML XML ;
        --Entitize any XML special characters in the input first
        SET @Parameter = ( SELECT @Parameter FOR XML PATH('') ) ;
        --Wrap every element in <r>...</r> so that nodes('//r') can shred it
        SELECT @XML = '<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>' ;

        INSERT INTO @Result ( ItemNumber ,ItemValue )
        SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL) ) AS ItemNumber
             , Item.value('text()[1]', 'VARCHAR(MAX)') AS ItemValue
          FROM @XML.nodes('//r') R ( Item ) ;
        RETURN ;
    END ;
    GO
    PRINT '/*====== dbo.Split_RCTE ==================================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(MAX) ;
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split_RCTE(D.Csv, ',') V
    OPTION ( MAXRECURSION 0 )
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO
    PRINT '/*====== dbo.Split_XML ===================================*/'
    DBCC FREEPROCCACHE WITH NO_INFOMSGS
    SET STATISTICS TIME ON
    DECLARE @ItemNumber BIGINT
          , @Item       VARCHAR(MAX) ;
    SELECT @ItemNumber = V.ItemNumber
         , @Item       = V.ItemValue
      FROM dbo.CsvTest D
     CROSS APPLY dbo.Split_XML_Solution_By_Oleg(D.Csv, ',') V
    SET STATISTICS TIME OFF
    PRINT '/*========================================================*/'
    PRINT CHAR(10) + CHAR(13)
    GO

As always, everyone's mileage may differ. Please note that the testing was done on SQL Server 2005. On `SQL 2008`, the conversion to **INT** may not be needed in the RCTE solution.
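Before running the timing test, a quick sanity check on a short literal can confirm that both functions split the same way. This is only a sketch: the input string is an assumption, and it presumes the two functions from the script above have already been created in tempdb.

```sql
USE [tempdb];
GO
-- Each query should list the items a, b and c, numbered 1 to 3.
SELECT ItemNumber, ItemValue
  FROM dbo.Split_RCTE('a,b,c', ',')
OPTION (MAXRECURSION 0);   -- needed once a string has more than 100 delimiters

SELECT ItemNumber, ItemValue
  FROM dbo.Split_XML_Solution_By_Oleg('a,b,c', ',');
GO
```

Note that the XML version numbers its items with ORDER BY (SELECT NULL), so the numbering following document order is typical behavior, not something the query itself guarantees.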
