David M. Richter's Blog

SQL Server- Get value of not grouped columns ( Group By )

by David on Okt.02, 2009, under Software-Development

I have some table with documents in. Now there are to delete the all the versions of the documents but the latest one. First thought went to Groupings in SQL to get the latest (MAX)Date of the grouped document rows. But I found that I cannot put a column after the select Command which is not in the Group. Here is the case and my solution.

First the table:
create table #Hierarchies (documentId int not null, publishedDate datetime not null, ComplianceId tinyint, languageId tinyint, instId int not null, HierarchyName varchar(255) not null)
– HierarchyName would be the folder  the document belongs to
– instId is just some Tag as ComplianceId

Here would be the data:
documentId  publishedDate    ComplianceId  languageId   instId      HierarchyName

———– ———————– —————– ———- ———– ———-

368334      2009-04-15 17:15:11.000 110               102        302371      Corporate

368335      2009-04-15 17:15:11.000 110               101        302371      Corporate

370644      2009-05-05 09:46:08.000 110               102        302371      Corporate

370645      2009-05-05 09:46:08.000 110               101        302371      Corporate

379675      2009-07-23 13:51:25.000 110               102        302371      Corporate

379676      2009-07-23 13:51:25.000 110               101        302371      Corporate

382243      2009-08-13 04:30:34.000 110               102        302371      Corporate

382244      2009-08-13 04:30:34.000 110               101        302371      Corporate

383903      2009-08-27 17:07:41.000 110               102        302371      Corporate

383904      2009-08-27 17:07:41.000 110               101        302371      Corporate

First I find the duplicates and put them in an quasi temporary table:
With Duplicates( instId, HierarchyName, languageId, ComplianceId, MaxDate) AS
(
select distinct
h.instId, h.HierarchyName, h.languageId, h.ComplianceId, Max(publishedDate) as MaxDate
from #Hierarchies h
group By h.instId, h.HierarchyName , h.languageId, h.ComplianceId
having Count(documentId) >1
)

This gives me for the document distinction criteria as there are instId, HierarchyName, LanguageId and ComplianceId a list of all the documents which do not differ within the grouping criteria, but  have more than one occurrence which does differ in documentId or in PublishedDate. Also I get the most recent date with MAX(publishedDate).

Now I want to get all the documentIds of the older documents, in order to delete them.
Or one could get the DocumentId of the most recent document and then delete all the other older documents.
Unfortunately one cannot put a not grouped column after the select, in my case documentId. But how could I get to those DocumentIds?! I need those to delete the documents!!

After a bit of thinking the solution was pretty straightforward:
Join the Duplicates with #Hierarchies on the former grouping criterias and there the publishedDate is not as the most recent date.

select h.documentID from #Hierarchies h, Duplicates d
where h.publishedDate <> d.MaxDate
AND d. ComplianceId = h. ComplianceId
AND d.languageId = h.languageId
AND d.instId = h.instId
AND d.HierarchyName = h.HierarchyName

So that was the solution to get column values of group by selects which are not in the grouping.
Most probably there is a better solution. I would highly appreciate if someone lets me know or has any comments about the issue…

:,

2 Comments for this entry

  • Daniel J

    Untested, but the following should to the same in one shot:

    Cheers
    Daniel

    delete x from #Hierarchies x
    inner join
    (
    select
    max(publishedDate) maxdate,
    h.instId instId,
    h.HierarchyName hierarchyName,
    h.languageId languageId,
    h.ComplianceId complianceId
    from
    #Hierarchies h
    group by h.instId, h.HierarchyName , h.languageId, h.ComplianceId
    ) t
    on
    x.instId = t.instId and
    x.HierarchyName = t.HierarchyName and
    x.languageId = t.languageId and
    x.ComplianceId = t.ComplianceId
    where
    x.publishedDate < t.maxdate

  • David

    Yes right, but I require the the records in Duplicates for further processing
    So I have to know later which records were processed.

Leave a Reply

Looking for something?

Use the form below to search the site:

Still not finding what you're looking for? Drop a comment on a post or contact us so we can take care of it!

Visit our friends!

A few highly recommended friends...

Archives

All entries, chronologically...