SQL Server- Get value of not grouped columns ( Group By )
by David on Okt.02, 2009, under Software-Development
I have some table with documents in. Now there are to delete the all the versions of the documents but the latest one. First thought went to Groupings in SQL to get the latest (MAX)Date of the grouped document rows. But I found that I cannot put a column after the select Command which is not in the Group. Here is the case and my solution.
First the table:
create table #Hierarchies (documentId int not null, publishedDate datetime not null, ComplianceId tinyint, languageId tinyint, instId int not null, HierarchyName varchar(255) not null)
– HierarchyName would be the folder the document belongs to
– instId is just some Tag as ComplianceId
Here would be the data:
documentId publishedDate ComplianceId languageId instId HierarchyName
———– ———————– —————– ———- ———– ———-
368334 2009-04-15 17:15:11.000 110 102 302371 Corporate
368335 2009-04-15 17:15:11.000 110 101 302371 Corporate
370644 2009-05-05 09:46:08.000 110 102 302371 Corporate
370645 2009-05-05 09:46:08.000 110 101 302371 Corporate
379675 2009-07-23 13:51:25.000 110 102 302371 Corporate
379676 2009-07-23 13:51:25.000 110 101 302371 Corporate
382243 2009-08-13 04:30:34.000 110 102 302371 Corporate
382244 2009-08-13 04:30:34.000 110 101 302371 Corporate
383903 2009-08-27 17:07:41.000 110 102 302371 Corporate
383904 2009-08-27 17:07:41.000 110 101 302371 Corporate
First I find the duplicates and put them in an quasi temporary table:
With Duplicates( instId, HierarchyName, languageId, ComplianceId, MaxDate) AS
(
select distinct
h.instId, h.HierarchyName, h.languageId, h.ComplianceId, Max(publishedDate) as MaxDate
from #Hierarchies h
group By h.instId, h.HierarchyName , h.languageId, h.ComplianceId
having Count(documentId) >1
)
This gives me for the document distinction criteria as there are instId, HierarchyName, LanguageId and ComplianceId a list of all the documents which do not differ within the grouping criteria, but have more than one occurrence which does differ in documentId or in PublishedDate. Also I get the most recent date with MAX(publishedDate).
Now I want to get all the documentIds of the older documents, in order to delete them.
Or one could get the DocumentId of the most recent document and then delete all the other older documents.
Unfortunately one cannot put a not grouped column after the select, in my case documentId. But how could I get to those DocumentIds?! I need those to delete the documents!!
After a bit of thinking the solution was pretty straightforward:
Join the Duplicates with #Hierarchies on the former grouping criterias and there the publishedDate is not as the most recent date.
select h.documentID from #Hierarchies h, Duplicates d
where h.publishedDate <> d.MaxDate
AND d. ComplianceId = h. ComplianceId
AND d.languageId = h.languageId
AND d.instId = h.instId
AND d.HierarchyName = h.HierarchyName
So that was the solution to get column values of group by selects which are not in the grouping.
Most probably there is a better solution. I would highly appreciate if someone lets me know or has any comments about the issue…
Oktober 6th, 2009 on 17:20
Untested, but the following should to the same in one shot:
Cheers
Daniel
delete x from #Hierarchies x
inner join
(
select
max(publishedDate) maxdate,
h.instId instId,
h.HierarchyName hierarchyName,
h.languageId languageId,
h.ComplianceId complianceId
from
#Hierarchies h
group by h.instId, h.HierarchyName , h.languageId, h.ComplianceId
) t
on
x.instId = t.instId and
x.HierarchyName = t.HierarchyName and
x.languageId = t.languageId and
x.ComplianceId = t.ComplianceId
where
x.publishedDate < t.maxdate
Oktober 22nd, 2009 on 12:49
Yes right, but I require the the records in Duplicates for further processing
So I have to know later which records were processed.