Substring Between Two Characters

Though SQL may not be the most elegant language for string handling, it does perform most functions and in a set based manner. The SQL substring function basically has the same syntax as found in other languages. In this example we will take the common scenario of extracting a string from between two fixed characters.

Use caution when using the substring function. Invalid parameters (negative numbers) will cause errors

In our example we’ll start with a domain name that contains the subdomain. We will extract only the top level domain from the results. Keep in mind for the purposes of this example all the URLs listed in the table need to have a subdomain.

Setup the Data

Let’s begin by creating a new table named URLs and insert some data.
[cc lang=”sql”]
IF EXISTS (SELECT * FROM dbo.sysobjects WHERE id = OBJECT_ID(N’dbo.URLs’) AND OBJECTPROPERTY(id, N’IsUserTable’) = 1)
–DROP TABLE dbo.URLs

CREATE TABLE dbo.URLs(
url nvarchar(1024) NULL
)
INSERT dbo.URLs (url) VALUES (N’www.litwareinc.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.treyresearch.net/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.adatum.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.northwindtraders.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.wideworldimporters.com/’)
GO
INSERT dbo.URLs (url) VALUES (N’www.proseware.com/’)
[/cc]

Now that we have the data setup let’s start with the extraction. For the substring function, the first parameter is the string we want to parse from, the next parameter is the starting position to be extracted and the last parameter is the number of characters to extract.
[cc lang=”sql”]
DECLARE @first_char nvarchar(10)
DECLARE @second_char nvarchar(10)

SET @first_char = ‘.’;
SET @second_char = ‘/’;

SELECT
SUBSTRING
(
— column
url
— start position
,CHARINDEX(@first_char, url , 1) + 1
— length
,CASE
WHEN (CHARINDEX(@second_char, url , 0) – CHARINDEX(@first_char, url, 0)) > 0
THEN CHARINDEX(@second_char, url, 0) – CHARINDEX(@first_char, url, 0) – 1
ELSE 0
END
)
from dbo.URLs u
[/cc]

From here we see that the thir parameter is the most difficult to understand. What we are doing is subtracting a single character’s length from the number of characters to extract. This is because we do not want to include the @second_char in our string. What we have to do however is use a case statement to determine if the @second_char exists. If it does not, then we end up subtracting 1 from 0, leaving us with -1 for the 3rd parameter which is invalid.

And the result:

litwareinc.com
treyresearch.net
adatum.com
northwindtraders.com
wideworldimporters.com
proseware.com

If we want to return just the domain name without the .com or .net, we need to alter this a bit:

[cc lang=”sql”]

DECLARE @first_char nvarchar(10)
SET @first_char = ‘.’;

SELECT
SUBSTRING
(
— column
url
— start position
,CHARINDEX(@first_char, url , 1) + 1
— length
,LEN(url) – CHARINDEX(@first_char, url) – CHARINDEX(@first_char, REVERSE(url))
)
from dbo.URLs u
[/cc]

And the result for this:

litwareinc
treyresearch
adatum
northwindtraders
wideworldimporters
proseware

2 comments
Eric Smith 24 Aug 2015 at 11:24 pm

Here’s a dirty hack that works if you need to do something quick and you are sure the server does not have more than 4 parts:

select PARSENAME(‘www.treyresearch.net’, 2) + ‘.’ + PARSENAME(‘www.treyresearch.net’,1)

Result:
treyresearch.net

Carlos 24 Feb 2015 at 11:40 pm

I found this post well explained and very useful. Thank you for doing this. 🙂

Featured Articles

 Site Author

  • Thanks for visiting!