As I said, we also did some testing on the VARCHAR(MAX) versions, because the tally table solution does not behave well with wider strings. Two of them performed really well:
1. Inline Recursive CTE solution (Split_RCTE, tweaked slightly for better performance)
2. TVF XML solution by Oleg (Split_XML_Solution_By_Oleg)
Surprisingly, though, the RCTE won on several occasions, and its winning margins were larger than its losing ones. With the same 10,000 rows and 1,133 elements of 125-150 characters each, the RCTE beat the XML solution marginally. When the number of elements was reduced to 500, the RCTE was almost twice as fast as the XML solution. Still, I would not count out either solution, as either could be the better fit depending on the requirements and environment.
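To put those element counts in perspective, here is a small Python sketch of the row-length arithmetic. It assumes, as the generator in the test script does, that every element is 125 to 150 characters wide and that elements are separated by single commas; csv_length_bounds is a hypothetical helper, not part of the test script.

```python
# Hypothetical helper: bounds on the length of one generated CSV row.
def csv_length_bounds(n_elements, min_width, max_width):
    """Return (shortest, longest) row length: n elements plus n - 1 comma delimiters."""
    commas = n_elements - 1
    return (n_elements * min_width + commas, n_elements * max_width + commas)


VARCHAR_LIMIT = 8000  # longest non-MAX VARCHAR, in characters

for n in (1133, 500):
    shortest, longest = csv_length_bounds(n, 125, 150)
    print(f"{n} elements: {shortest:,} to {longest:,} characters, "
          f"beyond VARCHAR(8000): {shortest > VARCHAR_LIMIT}")
# 1133 elements: 142,757 to 171,082 characters, beyond VARCHAR(8000): True
# 500 elements: 62,999 to 75,499 characters, beyond VARCHAR(8000): True
```

Even the 500-element rows are roughly eight to nine times the VARCHAR(8000) limit, which is why only the VARCHAR(MAX) versions apply to these tests.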
I must say the RCTE solution is more resource-intensive. The Index Spool and Sort operators in its execution plan clearly indicate that it will be hard on memory, tempdb and CPU. It also suggests that with more CPUs, more memory and a properly capacity-planned tempdb, the solution could perform even better relative to the other solutions. Having said that, I would still lean toward the XML solution.
As I see it, the RCTE performs much better than the tally table solution (both use CHARINDEX and SUBSTRING) because the tally table solution compares every character with the delimiter, and because VARCHAR(MAX) values can be stored out of row, this does not scale well. The RCTE solution, by contrast, searches only once per element. Moreover, up to VARCHAR(8000), the
SUBSTRING(@pString, N,1) = @pDelimiter
predicate is evaluated as a residual predicate on the same operator as the Seek predicate. For VARCHAR(MAX), however, the work is split into two steps, a Seek predicate followed by a separate Filter operator, which hurts performance considerably.
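A rough way to see the difference is to count the work units each approach generates. The Python sketch below is a language-neutral model, not either T-SQL function: tally_style_split tests every position against the delimiter (one tally row per character), while rcte_style_split jumps from delimiter to delimiter with str.find, standing in for CHARINDEX, so it takes one step per element. Both function names are hypothetical. The counts model plan-level rows, not raw character reads; str.find still scans characters internally, but within a single call, much as CHARINDEX does.

```python
from typing import List, Tuple


def tally_style_split(s: str, delim: str) -> Tuple[List[str], int]:
    """Tally-table model: test every position N against the delimiter."""
    tests = 0
    starts = [0]  # 0-based start of each element
    for n in range(len(s)):
        tests += 1  # one SUBSTRING(@pString, N, 1) = @pDelimiter test per character
        if s[n] == delim:
            starts.append(n + 1)
    items = []
    for start in starts:
        end = s.find(delim, start)  # element end, as CHARINDEX would report it
        items.append(s[start:] if end == -1 else s[start:end])
    return items, tests


def rcte_style_split(s: str, delim: str) -> Tuple[List[str], int]:
    """Recursive-CTE model: jump from one delimiter to the next."""
    items, steps, start = [], 0, 0
    while True:
        steps += 1  # one recursion level, i.e. one CHARINDEX call
        end = s.find(delim, start)
        if end == -1:  # CHARINDEX returned 0: this is the last element
            items.append(s[start:])
            return items, steps
        items.append(s[start:end])
        start = end + 1


row = ",".join(["abc"] * 4)  # "abc,abc,abc,abc", 15 characters
print(tally_style_split(row, ","))  # (['abc', 'abc', 'abc', 'abc'], 15)
print(rcte_style_split(row, ","))   # (['abc', 'abc', 'abc', 'abc'], 4)
```

Both models return the same elements; only the number of units of work differs. For a 500-element row of 125-150 character elements, the tally-style model performs tens of thousands of tests per row, while the RCTE-style model takes 500 steps.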
Here is the performance test for the RCTE and XML solutions:
USE [tempdb]
GO
SET NOCOUNT ON;
GO
IF OBJECT_ID(N'dbo.iFunction', N'V') IS NOT NULL
DROP VIEW iFunction
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE VIEW dbo.iFunction AS
/**********************************************************************************************************************
Purpose:
This view is callable from UDF's which allows us to indirectly get a NEWID() within a function where we can't do such
a thing directly in the function. This view also solves the same problem for GETDATE().
Usage:
SELECT MyNewID FROM dbo.iFunction; --Returns a GUID
SELECT MyDate FROM dbo.iFunction; --Returns a Date
Revision History:
Rev 00 - 06 Jun 2004 - Jeff Moden - Initial creation
Rev 01 - 06 Mar 2011 - Jeff Moden - Formalize code. No logic changes.
**********************************************************************************************************************/
SELECT MyNewID = NEWID(),
MyDate = GETDATE();
GO
IF OBJECT_ID(N'dbo.CsvTest', N'U') IS NOT NULL
DROP TABLE CsvTest
GO
DECLARE @MaxElementWidth INT,
@MinElementWidth INT,
@NumberOfElements INT,
@NumberOfRows INT
/*======== PARAMETER VALUES ==============================*/
SELECT @MaxElementWidth = 150
, @MinElementWidth = 125
, @NumberOfElements = 500
, @NumberOfRows = 10000
/*========================================================*/
SELECT TOP (@NumberOfRows) --Controls the number of rows in the test table
ISNULL(ROW_NUMBER() OVER (ORDER BY(SELECT NULL)),0) AS RowNum,
CSV =
(--==== This creates each CSV
SELECT CAST(
STUFF( --=== STUFF gets rid of the leading comma
( --=== This builds CSV row with a leading comma
SELECT TOP (@NumberOfElements) --Controls the number of CSV elements in each row
','
+ LEFT(--==== Builds random length variable within element width constraints
LEFT(REPLICATE('1234567890',CEILING(@MaxElementWidth/10.0)), @MaxElementWidth),
ABS(CHECKSUM((SELECT MyNewID FROM dbo.iFunction)))
% (@MaxElementWidth - @MinElementWidth + 1) + @MinElementWidth
)
FROM sys.All_Columns ac3 --Classic cross join pseudo-cursor
CROSS JOIN sys.All_Columns ac4 --can produce row sets up to 16 million.
WHERE ac3.Object_ID <> ac1.Object_ID --Without this line, all rows would be the same.
FOR XML PATH('')
)
,1,1,'')
AS VARCHAR(MAX))
)
INTO CsvTest
FROM sys.All_Columns ac1 --Classic cross join pseudo-cursor
CROSS JOIN sys.All_Columns ac2 --can produce row sets up to 16 million rows
GO
PRINT '/*====== dbo.CSVTest Population completed ================*/'
PRINT CHAR(10) + CHAR(13)
GO
ALTER TABLE CsvTest
ADD PRIMARY KEY CLUSTERED (RowNum) WITH FILLFACTOR = 100;
GO
IF OBJECT_ID(N'dbo.Split_RCTE', N'IF') IS NOT NULL
DROP FUNCTION dbo.Split_RCTE
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
--Create Split_RCTE function VARCHAR(MAX) version
CREATE FUNCTION dbo.Split_RCTE
(
@pString VARCHAR(MAX)
,@pDelimiter VARCHAR(1)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
WITH cteSplit
AS ( SELECT StartPosition = 0
, EndPosition = CONVERT(INT, CHARINDEX(@pDelimiter,
@pString COLLATE Latin1_General_BIN))
UNION ALL
SELECT StartPosition = EndPosition + 1
, EndPosition = CONVERT(INT, CHARINDEX(@pDelimiter,
@pString COLLATE Latin1_General_BIN,
EndPosition + 1))
FROM cteSplit
WHERE EndPosition > 0
)
SELECT [ItemNumber] = ROW_NUMBER() OVER ( ORDER BY StartPosition )
, SUBSTRING(@pString, StartPosition,
CASE EndPosition
WHEN 0 THEN CONVERT(INT, LEN(@pString)) + 1
ELSE EndPosition - StartPosition
END) ItemValue
FROM cteSplit
GO
IF OBJECT_ID(N'dbo.Split_XML_Solution_By_Oleg', N'TF') IS NOT NULL
DROP FUNCTION dbo.Split_XML_Solution_By_Oleg
GO
--Create Split_XML function VARCHAR(MAX) version
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE FUNCTION dbo.Split_XML_Solution_By_Oleg
(
@Parameter VARCHAR(MAX)
,@Delimiter VARCHAR(1)
)
RETURNS @Result TABLE
(
ItemNumber INT
,ItemValue VARCHAR(MAX)
)
AS
BEGIN
DECLARE @XML XML ;
SET @Parameter = ( SELECT @Parameter
FOR XML PATH('')
) ;
SELECT @XML = '<r>' + REPLACE(@Parameter, @Delimiter, '</r><r>') + '</r>' ;
INSERT INTO @Result
(
ItemNumber
,ItemValue
)
SELECT ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL) ) AS ItemNumber
, Item.value('text()[1]', 'VARCHAR(MAX)') AS ItemValue
FROM @XML.nodes('//r') R ( Item ) ;
RETURN ;
END ;
GO
PRINT '/*====== dbo.Split_RCTE ==================================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT
, @Item VARCHAR(MAX) ;
SELECT @ItemNumber = V.ItemNumber
, @Item = V.ItemValue
FROM dbo.CsvTest D
CROSS APPLY dbo.Split_RCTE(D.Csv, ',') V
OPTION ( MAXRECURSION 0 )
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
PRINT '/*====== dbo.Split_XML ===================================*/'
DBCC FREEPROCCACHE WITH NO_INFOMSGS
SET STATISTICS TIME ON
DECLARE @ItemNumber BIGINT
, @Item VARCHAR(MAX) ;
SELECT @ItemNumber = V.ItemNumber
, @Item = V.ItemValue
FROM dbo.CsvTest D
CROSS APPLY dbo.Split_XML_Solution_By_Oleg(D.Csv, ',') V
SET STATISTICS TIME OFF
PRINT '/*========================================================*/'
PRINT CHAR(10) + CHAR(13)
GO
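The XML solution relies on a trick that is easy to miss: first entitize the string so that characters such as & and < cannot break the XML, then turn every delimiter into a closing/opening tag pair and shred the resulting r nodes. The following Python sketch mirrors those steps with xml.sax.saxutils.escape and xml.etree.ElementTree. It is an illustration under assumptions, not a port of Split_XML_Solution_By_Oleg: split_via_xml is a hypothetical name, the extra root wrapper exists only because ElementTree needs a single root element (SQL Server's XML type accepts the bare fragment), and the delimiter is assumed to be a character such as a comma that cannot occur inside an entity like &amp;.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from xml.sax.saxutils import escape


def split_via_xml(s: str, delim: str) -> List[Optional[str]]:
    # Step 1: entitize &, < and > (roughly what SELECT ... FOR XML PATH('') does)
    safe = escape(s)
    # Step 2: each delimiter closes one <r> element and opens the next
    body = "<r>" + safe.replace(delim, "</r><r>") + "</r>"
    # Step 3: parse and read each element's text; an empty element yields None,
    # much as text()[1] yields NULL in the T-SQL version
    root = ET.fromstring("<root>" + body + "</root>")
    return [r.text for r in root.iter("r")]


print(split_via_xml("a<b,c&d,e", ","))  # ['a<b', 'c&d', 'e']
```

Step 1 is why the T-SQL function runs the input through FOR XML PATH('') before the REPLACE: without it, a value containing & or < would not be valid XML.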
As always, your mileage may vary. Please note that testing was done on SQL Server 2005. On SQL Server 2008, the conversion to INT may not be needed in the RCTE solution.