Substring Between Two Characters
-
Posted on October 9, 2012 by Derek Dieter
-
1
Though SQL may not be the most elegant language for string handling, it does perform most functions and in a set based manner. The SQL substring function basically has the same syntax as found in other languages. In this example we will take the common scenario of extracting a string from between two fixed characters.
In our example we’ll start with a domain name that contains the subdomain. We will extract only the top level domain from the results. Keep in mind for the purposes of this example all the URLs listed in the table need to have a subdomain.
Setup the Data
Let’s begin by creating a new table named URLs and insert some data.
[cc lang=”sql”]
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N’dbo.URLs’) AND OBJECTPROPERTY(id, N’IsUserTable’) = 1)
–DROP TABLE dbo.URLs
CREATE TABLE dbo.URLs(
url nvarchar(1024) NULL
)
INSERT dbo.URLs (url) VALUES (N’www.litwareinc.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.treyresearch.net/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.adatum.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.northwindtraders.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.wideworldimporters.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.proseware.com/’)
[/cc]
Now that we have the data setup let’s start with the extraction. For the substring function, the first parameter is the string we want to parse from, the next parameter is the starting position to be extracted and the last parameter is the number of characters to extract.
[cc lang=”sql”]
DECLARE @first_char nvarchar(10)
DECLARE @second_char nvarchar(10)
SET @first_char = ‘.’;
SET @second_char = ‘/’;
SELECT
SUBSTRING
(
— column
url
— start position
,CHARINDEX(@first_char, url , 1) + 1
— length
,CASE
WHEN (CHARINDEX(@second_char, url , 0) – CHARINDEX(@first_char, url, 0)) > 0
THEN CHARINDEX(@second_char, url, 0) – CHARINDEX(@first_char, url, 0) – 1
ELSE 0
END
)
from dbo.URLs u
[/cc]
From here we see that the thir parameter is the most difficult to understand. What we are doing is subtracting a single character’s length from the number of characters to extract. This is because we do not want to include the @second_char in our string. What we have to do however is use a case statement to determine if the @second_char exists. If it does not, then we end up subtracting 1 from 0, leaving us with -1 for the 3rd parameter which is invalid.
And the result:
litwareinc.com
treyresearch.net
adatum.com
northwindtraders.com
wideworldimporters.com
proseware.com
If we want to return just the domain name without the .com or .net, we need to alter this a bit:
[cc lang=”sql”]
DECLARE @first_char nvarchar(10)
SET @first_char = ‘.’;
SELECT
SUBSTRING
(
— column
url
— start position
,CHARINDEX(@first_char, url , 1) + 1
— length
,LEN(url) – CHARINDEX(@first_char, url) – CHARINDEX(@first_char, REVERSE(url))
)
from dbo.URLs u
[/cc]
And the result for this:
litwareinc
treyresearch
adatum
northwindtraders
wideworldimporters
proseware
- Comments (RSS)
- Trackback
- Permalink