Slow While Loop, Query Improvement Assistance
I am working on creating a data warehouse.
I have created a Time Dimension (Dim_Time), at 5-minute intervals. Hour aggregations will have [Minute] = NULL.
For the purpose of this example:
CREATE TABLE [dbo].[Dim_Time](
    [TimeID] [int] IDENTITY(1,1) NOT NULL,
    [StartDateTime] [datetime] NULL,
    [Hour] [int] NULL,
    [Minute] [int] NULL,
    CONSTRAINT [PK_Dim_Time] PRIMARY KEY CLUSTERED
    ([TimeID] ASC)
) ON [PRIMARY]
GO
Then I have an Incoming Table, which is updated every 5 minutes from the OLTP Database.
CREATE TABLE [dbo].[Stg_IncomingQueue](
    [IncomingID] [int] IDENTITY(1,1) NOT NULL,
    [CustomerID] [int] NOT NULL,
    [TimeID] [int] NULL,
    [InsertTime] [datetime] NULL,
    CONSTRAINT [PK_IncomingQueueMonitor] PRIMARY KEY CLUSTERED
    ([IncomingID] ASC)
) ON [PRIMARY]
GO
I then have the following WHILE loop. Its purpose is to get the correct 5-minute time slot (TimeID) that relates to a particular incoming row:
WHILE 0 < (SELECT COUNT(*) FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL)
BEGIN
    SELECT TOP 1 @IncomingID = IncomingID, @RowInserTime = InsertTime
    FROM [dba_local].[dbo].[Stg_IncomingQueue] WHERE TimeID IS NULL
    ;WITH DimTime
    AS (
        SELECT MAX(TimeID) AS MaxTimeID FROM [dba_local].[dbo].[Dim_Time]
        WHERE StartDateTime < @RowInserTime AND [Minute] IS NOT NULL
    )
    UPDATE [dba_local].[dbo].[Stg_IncomingQueue]
    SET TimeID = (SELECT MaxTimeID FROM DimTime)
    WHERE IncomingID = @IncomingID
END
It's such a simple process, and yet I cannot figure out a simpler way to update the TimeID. As per the CTE SELECT in the loop, I need to get the MAX(TimeID) where the StartDateTime is less than the row's InsertTime.
Because time is the only relationship, I am struggling to find a way to do this in one query without the loop, but I feel it is possible.
Please can someone help me out here either with a better option or confirming that this is the simplest way.
Thank you very much for your time and assistance.
Wade
sql-server t-sql sql-server-2016
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
21 mins ago
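To make the "all of the rows" idea concrete, here is a minimal sketch of what the loop body looks like when it is turned into one statement. It is only a direct translation of the loop in the question, not a tuned solution: it assumes the question's table definitions (including the [Minute] column), assumes TimeID increases along with StartDateTime as the MAX(TimeID) logic implies, and drops the dba_local database prefix for brevity.

```sql
-- Set-based translation of the WHILE loop (a sketch under the assumptions above).
-- The correlated subquery plays the role of the CTE inside the loop:
-- for every unprocessed row, pick the highest TimeID whose slot starts
-- before the row's InsertTime, ignoring the hour-aggregation rows.
UPDATE iq
SET    iq.TimeID = (
           SELECT MAX(dt.TimeID)
           FROM   dbo.Dim_Time AS dt
           WHERE  dt.StartDateTime < iq.InsertTime
             AND  dt.[Minute] IS NOT NULL
       )
FROM   dbo.Stg_IncomingQueue AS iq
WHERE  iq.TimeID IS NULL;
```

One behavioural difference worth knowing about: if a row has no earlier slot in Dim_Time, the subquery returns NULL and the row simply stays unassigned here, whereas the loop version would, as far as I can tell, keep counting that row as unprocessed and never finish.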
edited 21 mins ago by Aaron Bertrand♦
asked 54 mins ago by WadeH
1 Answer
I created the following minimal, complete, and verifiable example, based on the two tables in your original question. It uses the LEAD window function to obtain a time range for each row of the dbo.Dim_Time table, which can then be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
    DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
    DROP TABLE dbo.Dim_Time;
CREATE TABLE dbo.Dim_Time(
    TimeID int IDENTITY(1,1) NOT NULL,
    StartDateTime time(0) NULL,
    CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
    (TimeID ASC)
) ON [PRIMARY]
GO
-- Digits 0-9, cross joined three times to build a small numbers table
;WITH src AS
(
    SELECT TOP (10) sv.number
    FROM master.dbo.spt_values sv
    WHERE sv.type = N'P'
    ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
    CROSS JOIN src s2
    CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number;
CREATE TABLE dbo.Stg_IncomingQueue(
    IncomingID int IDENTITY(1,1) NOT NULL,
    CustomerID int NOT NULL,
    TimeID int NULL,
    InsertTime datetime NULL,
    CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
    (IncomingID ASC)
) ON [PRIMARY]
GO
-- Four sample rows with random InsertTime values
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
    , (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE loop with a single UPDATE statement, which is both more efficient and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
    INNER JOIN (
        SELECT dt.TimeID
            , dt.StartDateTime
            , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
        FROM dbo.Dim_Time dt
    ) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
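One thing that may be worth checking before relying on this: LEAD returns NULL for the last row in the ordering, so the final slot has no EndDateTime and the `<` comparison can never be true for it. Any duplicate StartDateTime values would likewise produce zero-length slots (the sample load above inserts 289 rows, and I believe the last one wraps back to 00:00:00, though I have not verified that). The two queries below are only a diagnostic sketch against the sample tables above; they change nothing.

```sql
-- Slots that cannot match any row: no end value, or an end that is not after the start.
SELECT t.TimeID
     , t.StartDateTime
     , t.EndDateTime
FROM (
    SELECT dt.TimeID
         , dt.StartDateTime
         , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
    FROM dbo.Dim_Time dt
) t
WHERE t.EndDateTime IS NULL
   OR t.EndDateTime <= t.StartDateTime;

-- Staged rows that were not assigned to any slot by the UPDATE.
SELECT iq.IncomingID
     , iq.InsertTime
FROM dbo.Stg_IncomingQueue iq
WHERE iq.TimeID IS NULL;
```

If the second query returns rows, treating a NULL EndDateTime as "open-ended" in the join condition is one possible way to handle them.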
Assuming there isn't a massive number of incoming rows, this may work fairly well. Be aware, I'm using CONVERT() to convert the incoming datetime column into a time(0) value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the update statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
    IncomingID int IDENTITY(1,1) NOT NULL,
    CustomerID int NOT NULL,
    TimeID int NULL,
    InsertTime datetime NULL,
    -- Time-of-day portion of InsertTime, computed once at insert time
    InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED,
    CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
    (IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
    INNER JOIN (
        SELECT dt.TimeID
            , dt.StartDateTime
            , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
        FROM dbo.Dim_Time dt
    ) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
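Because the computed column is persisted, it can also be indexed, which might give the optimizer more options for the range join. The following is a sketch only: the index names are made up for illustration, and whether either index is actually used depends on row counts and the plan the optimizer picks, so it is worth comparing the actual execution plans before and after.

```sql
-- Hypothetical index names; neither index appears in the original answer.
-- Index on the persisted time-of-day column used in the join predicate.
CREATE NONCLUSTERED INDEX IX_Stg_IncomingQueue_InsertTime0
    ON dbo.Stg_IncomingQueue (InsertTime0);

-- Index on the slot start time, which is also the ORDER BY column for LEAD.
CREATE NONCLUSTERED INDEX IX_Dim_Time_StartDateTime
    ON dbo.Dim_Time (StartDateTime);
```

Keep in mind that dbo.Stg_IncomingQueue is a staging table that is written to every 5 minutes, so any extra index adds some cost to those inserts.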
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
7 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a time(0) value.
– Max Vernon
5 mins ago
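For readers following the comments, here is a rough sketch of what the date-aware variant described above might look like. It assumes the table definitions from the question (StartDateTime is a full datetime, and hour-aggregation rows have [Minute] = NULL) rather than the time(0) sample tables from the answer. It also filters on TimeID IS NULL, which is presumably what was meant by "so that rows are not re-processed"; the comment as written says IS NOT NULL, which would skip the unprocessed rows instead.

```sql
-- Sketch: match on the full datetime instead of the time of day.
-- Assumes dbo.Dim_Time holds one row per 5-minute slot across all dates,
-- as in the question's schema.
UPDATE iq
SET    iq.TimeID = t.TimeID
FROM   dbo.Stg_IncomingQueue AS iq
INNER JOIN (
    SELECT dt.TimeID
         , dt.StartDateTime
         , EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
    FROM   dbo.Dim_Time AS dt
    WHERE  dt.[Minute] IS NOT NULL          -- skip the hour-aggregation rows
) AS t
    ON  iq.InsertTime >= t.StartDateTime
    AND (iq.InsertTime < t.EndDateTime
         OR t.EndDateTime IS NULL)          -- the latest slot has no successor
WHERE  iq.TimeID IS NULL;                   -- only rows not yet assigned
```

Note that this uses >= on the slot start, like the answer, whereas the loop in the question used a strict < comparison; the two differ only for rows whose InsertTime falls exactly on a slot boundary.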
add a comment |
Your Answer
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "182"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230890%2fslow-while-loop-query-improvment-assistance%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
7 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
5 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
This looks great thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare that entire DateTime values as I have a few years of data to load. In addition I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better as expected. Thank you.
– WadeH
7 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to atime(0)
value.
– Max Vernon
5 mins ago
add a comment |
I created the following minimally complete and verifiable example, based on the two tables in your original question. It uses the LEAD T-SQL statement to obtain a time range from the dbo.Dim_Time table, which can be compared to the incoming rows quite easily.
IF OBJECT_ID(N'dbo.Stg_IncomingQueue', N'U') IS NOT NULL
DROP TABLE dbo.Stg_IncomingQueue;
IF OBJECT_ID(N'dbo.Dim_Time', N'U') IS NOT NULL
DROP TABLE dbo.Dim_time;
CREATE TABLE dbo.Dim_Time(
TimeID int IDENTITY(1,1) NOT NULL,
StartDateTime time(0) NULL,
CONSTRAINT PK_Dim_Time PRIMARY KEY CLUSTERED
(TimeID ASC)
) ON [PRIMARY]
GO
;WITH src AS
(
SELECT TOP (10) sv.number
FROM master.dbo.spt_values sv
WHERE sv.type = N'P'
ORDER BY sv.number
)
INSERT INTO dbo.Dim_Time (StartDateTime)
SELECT TOP(289) CONVERT(time(0), DATEADD(minute, (s3.number * 100 + s2.number * 10 + s1.number) * 5, CONVERT(time(0), '00:00:00')))
FROM src s1
CROSS JOIN src s2
CROSS JOIN src s3
ORDER BY s3.number * 100 + s2.number * 10 + s1.number
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
INSERT INTO dbo.Stg_IncomingQueue (CustomerID, InsertTime)
VALUES (1, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (2, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (3, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'))
, (4, DATEADD(SECOND, CONVERT(int, CRYPT_GEN_RANDOM(4), 0), '1901-01-01 00:00:00'));
This piece replaces your entire WHILE
loop, with a single UPDATE
statement, which is both more efficient, and easier to understand.
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON CONVERT(time(0), iq.InsertTime) >= t.StartDateTime AND CONVERT(time(0), iq.InsertTime) < t.EndDateTime;
The results, compared side-by-side with the Dim_Time table:
SELECT *
FROM dbo.Stg_IncomingQueue iq
INNER JOIN dbo.Dim_Time dt ON iq.TimeID = dt.TimeID;
The output looks like:
╔════════════╦════════════╦════════╦═════════════════════════╦════════╦═══════════════╗
║ IncomingID ║ CustomerID ║ TimeID ║ InsertTime ║ TimeID ║ StartDateTime ║
╠════════════╬════════════╬════════╬═════════════════════════╬════════╬═══════════════╣
║ 1 ║ 1 ║ 271 ║ 1875-06-30 22:31:49.000 ║ 271 ║ 22:30:00 ║
║ 2 ║ 2 ║ 116 ║ 1857-07-01 09:38:59.000 ║ 116 ║ 09:35:00 ║
║ 3 ║ 3 ║ 218 ║ 1854-09-18 18:08:39.000 ║ 218 ║ 18:05:00 ║
║ 4 ║ 4 ║ 221 ║ 1860-05-31 18:22:25.000 ║ 221 ║ 18:20:00 ║
╚════════════╩════════════╩════════╩═════════════════════════╩════════╩═══════════════╝
Assuming there isn't a massive amount of incoming rows, this may work fairly well. Be aware, I'm using CONVERT()
to convert the incoming datetime
column into a time(0)
value, which comes at a cost of the query optimizer not being able to use available statistics to help create a great plan. The "actual" query plan for the insert statement shows this warning:
Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)>=[dt].[StartDateTime]) may affect "SeekPlan" in query plan choice, Type conversion in expression (CONVERT(time(0),[iq].[InsertTime],0)<[Expr1002]) may affect "SeekPlan" in query plan choice.
If you need to avoid the type-conversion during the update, you can move that workload to the insert operation by updating the definition of dbo.Stg_IncomingQueue
to include a persisted computed column, as in:
CREATE TABLE dbo.Stg_IncomingQueue(
IncomingID int IDENTITY(1,1) NOT NULL,
CustomerID int NOT NULL,
TimeID int NULL,
InsertTime datetime NULL,
InsertTime0 AS CONVERT(TIME(0), InsertTime) PERSISTED
CONSTRAINT PK_IncomingQueueMonitor PRIMARY KEY CLUSTERED
(IncomingID ASC)
) ON [PRIMARY]
GO
The update statement then becomes:
UPDATE dbo.Stg_IncomingQueue
SET TimeID = t.TimeID
FROM dbo.Stg_IncomingQueue iq
INNER JOIN (
SELECT dt.TimeID
, dt.StartDateTime
, EndDateTime = LEAD(dt.StartDateTime, 1) OVER (ORDER BY dt.StartDateTime)
FROM dbo.Dim_Time dt
) t ON iq.InsertTime0 >= t.StartDateTime AND iq.InsertTime0 < t.EndDateTime;
edited 1 min ago
answered 23 mins ago
Max Vernon
This looks great, thank you. I am busy testing it now. I removed the CONVERT(time(0) as I need to compare the entire DateTime values, since I have a few years of data to load. In addition, I added "WHERE iq.TimeID IS NOT NULL" so that rows are not re-processed. Performance is way better, as expected. Thank you.
– WadeH
7 mins ago
Your question seems to imply you only need the 5-minute time-slot for each row, which is date-independent. That's the purpose of the conversion to a time(0) value.
– Max Vernon
5 mins ago
Why is this a while loop in the first place? When you think "I need to do x to each row" try to change your thinking to "I need to do x to all of the rows."
– Aaron Bertrand♦
21 mins ago