Find Duplicate Fields in a Table

Posted on June 6, 2010 by Derek Dieter
6

A common scenario when querying tables is the need to find duplicate fields within the same table. To do this is simple, it requires utilizing the GROUP BY clause and counting the number of recurrences. For example, lets take a customers table. Within the customers table, we want to find all the records where the customers FirstNames are the same. We also want to find which FirstNames are the same and count them. First off, let’s get a count of how many customers share the same first name: [cc lang=”sql”] SELECT FirstName ,DuplicateCount = COUNT(1) FROM SalesLT.Customer GROUP BY FirstName HAVING COUNT(1) > 1 — more than one value ORDER BY COUNT(1) DESC — sort by most duplicates [/cc] So here we see the results of the query sorted by the names which are duplicated the most. Using the having clause restricts the result set to only those customers that have duplicates. The order by clause orders the results by those who are duplicated the most. This method can also be expanded to include multiple columns, like FirstName and LastName. In order to expand the criteria, we simply add the columns to the select list and the group by clause. [cc lang=”sql”] SELECT FirstName ,LastName ,DuplicateCount = COUNT(1) FROM SalesLT.Customer GROUP BY FirstName ,LastName HAVING COUNT(1) > 1 — more than one value ORDER BY COUNT(1) DESC — sort by most duplicates [/cc] Ok, so now that we have found the duplicate items, how do we join that back on […] Continue reading ...

Using OpenQuery

Posted on June 6, 2010 by Derek Dieter
4

The OPENQUERY command is used to initiate an ad-hoc distributed query using a linked-server. It is initiated by specifying OPENQUERY as the table name in the from clause. Essentially, it opens a linked server, then executes a query as if executing from that server. While executing queries directly and receiving data directly in this way is not bad, there are implications when trying to join the results of OPENQUERY with a local table. Typically joining remote tables with local tables across the network is not the most efficient method of querying. In some cases it may be better to subscribe to a remote table by importing it locally, then joining it locally. Setup Ad Hoc Distributed Queries In order to utilize, a server-level configuration needs to be applied to the default configuration in order to allow ‘Ad Hoc Distributed Queries’. In order to see the current setting, run the following: [cc lang=”sql”] EXEC sp_configure [/cc] If you see the config_value set to 1 (as the figure below), then the option is already set. Otherwise you need to change the configuration. To set the configuration, simply run the following code below: [cc lang=”sql”] — Enable Ad Hoc Distributed Queries EXEC sp_configure ‘Ad Hoc Distributed Queries’,1 RECONFIGURE WITH OVERRIDE GO [/cc] OpenQuery Example The next step is to find or define the link server required and specify the linked server name as the first parameter in OPENQUERY. This tutorial already assumes you have a linked server setup, so from here we execute […]

Clustered Index

Posted on June 6, 2010 by Derek Dieter
No

The clustered index on a table can be defined as: the sort order for how the data for the table is actually stored. Being that the clustered index is the actual data itself, you cannot have two clustered indexes. You can however have many non clustered indexes. These non clustered indexes are a subset of the columns for the table that point back to the clustered index itself. If a clustered index is not defined, then the table is considered to be a heap. Or just a scattered array of rows with no particular sort order. The only time a heap is considered useful, is when insert speed is a factor. When new rows are inserted into a heap, they do not need to get stored in any particular order and thus do not need to be squeezed into any particular “page” of an index. Whereas, the clustered index insert needs to be placed (or squeezed) into a specific location in order to maintain the underlying sort order. This process of being squeezed occurs on what is called a page of the index. Multiple pages comprise the entire index, and there is also a setting defining how “full” pages should be kept. This fullness percentage is called “fill factor” and a lower percentage accommodates inserts into the proper sort order without having to push rows to the next page in order to fit the incoming rows. So if insert speed is not a factor, or inserts are not happening often […] Continue reading ...

SQL Server Cast

Posted on June 4, 2010 by Derek Dieter
8

The SQL Server cast function is the easiest data type conversion function to be used compared to the CONVERT function. It takes only one parameter followed by the AS clause to convert a specified value. A quick example is the following: [cc lang=”sql”] SELECT UserID_String = CAST(UserID AS varchar(50)) FROM dbo.User [/cc] This example will convert the integer to a character value. So what is the use of this? Well, more likely that not if you’re on this page you know the use, however it is usually used to concatenate or perform string like comparisons on numeric values, or to convert a numeric or string value in order to compare. An example is when trying to manually construct date parts into a custom format, you need to append the number comprising the date into a string. Say you wanted to make the date 2010/06/05 into a custom format of: 20100605. In order to do so, you need to utilize the datepart function then concatenate the results. The problem is, when you try to concatenate the results of datepart, instead of concatenating, the number get summed together. To get around this, simply use CAST. [cc lang=”sql”] DECLARE @mydate as DATETIME SET @mydate = GETDATE() — wrong way SELECT DATEPART(yy, @mydate) + DATEPART(mm, @mydate) + DATEPART(dd, @mydate) — output: 2020 — correct way SELECT CAST(DATEPART(yy, @mydate) AS VARCHAR(10)) — append a zero before the date so we get two digits on a one digit month + RIGHT(‘0’ + CAST(DATEPART(mm, @mydate) AS VARCHAR(10)), […]

using sp_addrolemember

Posted on June 4, 2010 by Derek Dieter
No

To add a user or group to a role, you need to use sp_addrolemember. This procedure is easy to use as it only accepts two parameters, the role name, and the username (or group). Roles are utilized in order to provide a layer of abstraction from permissions from being applied directly to users. While there are two different places in which roles can be defined (the server level or the database level), the sp_addrolemember procedure only works at the database level. (In order to add users or groups to the server-level roles, use: sp_addsrvrolemember). To add a user or group to a particular role execute the following: [cc lang=”sql”] USE YourDatabaseHere GO EXEC sp_addrolemember ‘db_datareader’, ‘derekdieter’ [/cc] This code adds my own account to the dv_datareader role for my database. While this practice may be relatively ok, it is still far better to create a custom role and assign read permissions to the custom role, then use sp_addrolemember to add all the users or the group. As an example, let’s create a sample role: [cc lang=”sql”] CREATE ROLE ReadExecute AUTHORIZATION dbo [/cc] Now that we have created the empty role, let’s assign it read & execute permissions to the entire ‘dbo’ schema. This will allow read & execute permissions for the ReadExecute role to the entire dbo schema. [cc lang=”sql”] use YourDatabaseHere GO GRANT EXECUTE ON SCHEMA::[dbo] TO [ReadExecute] GO use YourDatabaseHere GO GRANT SELECT ON SCHEMA::[dbo] TO [ReadExecute] GO [/cc] Finally, we add the role members. Remember, you can […] Continue reading ...

Find Duplicate Fields in a Table

Using OpenQuery

Clustered Index

SQL Server Cast

using sp_addrolemember

Featured Articles

Site Author

Resources