A pivot query is when you want to take some data such as:
C1 C2 C3
----- ----- ------
a1 b1 x1
a1 b1 x2
a1 b1 x3
and you would like to display it as:
C1 C2 C3(1) C3(2) C3(3)
----- ----- ------ ----- ----
a1 b1 x1 x2 x3
Basically it turning rows into columns. For example taking the distinct jobs within a department and
making them be columns so the output would look like:
DEPTNO JOB_1 JOB_2 JOB_3
---------- --------- --------- ---------
10 CLERK MANAGER PRESIDENT
20 ANALYST ANALYST CLERK
30 CLERK MANAGER SALESMAN
instead of
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN
I'm going to show two examples for pivots. The first will be another implementation of the
preceding question. The second shows how to pivot any result set in a generic fashion and gives you
a template for doing so.
In the first case let's say you wanted to show the top 3 salary earners in each department as
COLUMNS. That is the query would return exactly 1 row per department and the row would have 4
columns the DEPTNO, the name of the highest paid employee in the department, the name of the next
highest paid and so on. Using this new functionality this is almost easy (before these functions
this was virtually impossible):
ops$tkyte@DEV816> select deptno,
2 max(decode(seq,1,ename,null)) highest_paid,
3 max(decode(seq,2,ename,null)) second_highest,
4 max(decode(seq,3,ename,null)) third_highest
5 from ( SELECT deptno, ename,
6 row_number() OVER
7 (PARTITION BY deptno
8 ORDER BY sal desc NULLS LAST ) seq
9 FROM emp )
10 where seq <= 3
11 group by deptno
12 /
DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING CLARK MILLER
20 SCOTT FORD JONES
30 BLAKE ALLEN TURNER
That simply created an inner result set that had a sequence assigned to employees BY DEPTNO in
order of SAL. The decode in the outer query keeps only rows with sequences 1, 2, or 3 and assigns
them to the correct "column". The group by gets rid of the redundant rows and we are left with our
collapsed result. It may be easier to understand what I mean by that if you see the result set
without the group by and max:
scott@TKYTE816> select deptno,
2 (decode(seq,1,ename,null)) highest_paid,
3 (decode(seq,2,ename,null)) second_highest,
4 (decode(seq,3,ename,null)) third_highest
5 from ( SELECT deptno, ename,
6 row_number() OVER
7 (PARTITION BY deptno
8 ORDER BY sal desc NULLS LAST ) seq
9 FROM emp )
10 where seq <= 3
11 /
DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING
10 CLARK
10 MILLER
20 SCOTT
20 FORD
20 JONES
30 ALLEN
30 BLAKE
30 MARTIN
9 rows selected.
The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO
above only one row will have a non null value for HIGHTEST_PAID, the remaining rows in that group
will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence
the group by and MAX will collapse our result set, removing the NULL values from it and giving us
what we want.
In general, if you have a table T with columns C1, C2 and you would like to get a result like:
C1 C2(1) C2(2) . C2(N)
Where column C1 is to stay "cross record" and column C2 will be pivoted to be "in record" the
values of C2 are to become columns instead of rows you will generate a query of the form:
Select c1
max(decode(rn,1,c2,null)) c2_1,
max(decode(rn,2,c2,null)) c2_2,
max(decode(rn,N,c2,null)) c2_N
from ( select c1, c2
row_number() over ( partition by C1
order by
from T
)
group by C1
. In the above example, C1 was simply DEPTNO and C2 was ENAME. Since we ordered by SAL DESC, the
first three columns we retrieved where the top three paid employees in that department (bearing in
mind that if four people made the top three, we would of course lose one).
The second example is a more generic "I want to pivot my result set". Here, instead of having a
single column C1 to anchor on and a single column C2 to pivot we'll look at the more general case
where C1 is a set of columns as is C2. As it turns out, this is very similar to the above.
Suppose you want to report by JOB and DEPTNO the employees in that job and their salary. The
report needs to have the employees going ACROSS the page as columns however, not down the page
the same with their salaries. Additionally, the employees need to appear from left to right in
order of their salary. The steps would be:
scott@TKYTE816> select max(count(*)) from emp group by deptno, job;
MAX(COUNT(*))
-------------
4
That tells us the number of columns, now we can generate the query:
scott@TKYTE816> select deptno, job,
2 max( decode( rn, 1, ename, null )) ename_1,
3 max( decode( rn, 1, sal, null )) sal_1,
4 max( decode( rn, 2, ename, null )) ename_2,
5 max( decode( rn, 2, sal, null )) sal_2,
6 max( decode( rn, 3, ename, null )) ename_3,
7 max( decode( rn, 3, sal, null )) sal_3,
8 max( decode( rn, 4, ename, null )) ename_4,
9 max( decode( rn, 4, sal, null )) sal_4
10 from ( select deptno, job, ename, sal,
11 row_number() over ( partition by deptno, job
12 order by sal, ename ) rn
13 from emp
14 )
15 group by deptno, job
16 /
DEPTNO JOB ENAME_1 SAL_1 ENAME_2 SAL_2 ENAME_3 SAL_3 ENAME_ SAL_4
------ --------- ------ ----- --------- ----- ---------- ----- ------ -----
10 CLERK MILLER 1300
10 MANAGER CLARK 2450
10 PRESIDENT KING 5000
20 ANALYST FORD 3000 SCOTT 3000
20 CLERK SMITH 800 ADAMS 1100
20 MANAGER JONES 2975
30 CLERK JAMES 99
30 MANAGER BLAKE 99
30 SALESMAN ALLEN 99 MARTIN 99 TURNER 99 WARD 99
9 rows selected.
In general, to pivot a result set, we can generalize further. If you have a set of columns C1, C2,
C3, CN and you want to keep columns C1 .. Cx cross record (going down the page) and Cx+1 CN
in record (across the page), you can:
Select C1, C2, CX,
max(decode(rn,1,C{X+1},null)) cx+1_1, max(decode(rn,1,CN,null)) CN_1
max(decode(rn,2,C{X+1},null)) cx+1_2, max(decode(rn,1,CN,null)) CN_2
max(decode(rn,N,c{X+1},null)) cx+1_N, max(decode(rn,1,CN,null)) CN_N
from ( select C1, C2, CN
row_number() over ( partition by C1, C2, CX
order by
from T
)
group by C1, C2, CX
In the example, we used C1, C2 = DEPTNO, JOB and C3, C4 = ENAME, SAL
One other thing we must know is the MAXIMUM number of rows per partition we anticipate. This will
dictate the number of columns we will be generating. Without it we cannot pivot. SQL needs to
know the number of columns and there is no way around that fact. That leads us into the next more
generic example of pivoting. If we do not know the number of total columns until runtime, we'll
have to use dynamic SQL to deal with the fact that the SELECT list is variable. We can use PL/SQL
to demonstrate how to do this and end up with a generic routine that can be reused whenever you
need a pivot. This routine will have the following specification:
scott@TKYTE816> create or replace package my_pkg
2 as
3 type refcursor is ref cursor;
4 type array is table of varchar2(30);
5
6 procedure pivot( p_max_cols in number default NULL,
7 p_max_cols_query in varchar2 default NULL,
8 p_query in varchar2,
9 p_anchor in array,
10 p_pivot in array,
11 p_cursor in out refcursor );
12 end;
13 /
Package created.
Here, you must send in either P_MAX_COLS or P_MAX_COLS_QUERY. SQL needs to know the number of
columns in a query and this parameter will allow us to build a query with the proper number of
columns. The value you should send in here will be the output of a query similar to:
scott@TKYTE816> select max(count(*)) from emp group by deptno, job;
That is: it is the count of the discrete values that are currently in ROWS that we will put into
COLUMNS. You can either send in the query to get this number, or the number if you already know
it.
The P_QUERY parameter is simply the query that gathers your data together. Using the last example
from above the query would be:
10 from ( select deptno, job, ename, sal,
11 row_number() over ( partition by deptno, job
12 order by sal, ename ) rn
13 from emp
14 )
The next two inputs are arrays of column names. The P_ANCHOR tells us what columns will stay CROSS
RECORD (down the page) and P_PIVOT states the columns that will go IN RECORD (across the page). In
our example from above, P_ANCHOR = ( DEPTNO, JOB ) and P_PIVOT = (ENAME,SAL). Skipping
over the implementation for a moment, the entire call put together might look like this:
scott@TKYTE816> variable x refcursor
scott@TKYTE816> set autoprint on
scott@TKYTE816> begin
2 my_pkg.pivot
3 ( p_max_cols_query => 'select max(count(*)) from emp
group by deptno,job',
4 p_query => 'select deptno, job, ename, sal,
5 row_number() over ( partition by deptno, job
6 order by sal, ename ) rn
7 from emp a',
8 p_anchor => my_pkg.array( 'DEPTNO','JOB' ),
9 p_pivot => my_pkg.array( 'ENAME', 'SAL' ),
10 p_cursor => :x );
11 end;
12 /
PL/SQL procedure successfully completed.
DEPTNO JOB ENAME_ SAL_1 ENAME_2 SAL_2 ENAME_3 SAL_3 ENAME_ SAL_4
------ --------- ------ ----- ---------- ----- ---------- ----- ------ -----
10 CLERK MILLER 1300
10 MANAGER CLARK 2450
10 PRESIDENT KING 5000
20 ANALYST FORD 3000 SCOTT 3000
20 CLERK SMITH 800 ADAMS 1100
20 MANAGER JONES 2975
30 CLERK JAMES 99
30 MANAGER BLAKE 99
30 SALESMAN ALLEN 99 MARTIN 99 TURNER 99 WARD 99
9 rows selected.
As you can see that dynamically rewrote our query using the generalized template we developed.
The implementation of the package body is straightforward:
scott@TKYTE816> create or replace package body my_pkg
2 as
3
4 procedure pivot( p_max_cols in number default NULL,
5 p_max_cols_query in varchar2 default NULL,
6 p_query in varchar2,
7 p_anchor in array,
8 p_pivot in array,
9 p_cursor in out refcursor )
10 as
11 l_max_cols number;
12 l_query long;
13 l_cnames array;
14 begin
15 -- figure out the number of columns we must support
16 -- we either KNOW this or we have a query that can tell us
17 if ( p_max_cols is not null )
18 then
19 l_max_cols := p_max_cols;
20 elsif ( p_max_cols_query is not null )
21 then
22 execute immediate p_max_cols_query into l_max_cols;
23 else
24 raise_application_error(-20001, 'Cannot figure out max cols');
25 end if;
26
27
28 -- Now, construct the query that can answer the question for us...
29 -- start with the C1, C2, ... CX columns:
30
31 l_query := 'select ';
32 for i in 1 .. p_anchor.count
33 loop
34 l_query := l_query || p_anchor(i) || ',';
35 end loop;
36
37 -- Now add in the C{x+1}... CN columns to be pivoted:
38 -- the format is "max(decode(rn,1,C{X+1},null)) cx+1_1"
39
40 for i in 1 .. l_max_cols
41 loop
42 for j in 1 .. p_pivot.count
43 loop
44 l_query := l_query ||
45 'max(decode(rn,'||i||','||
46 p_pivot(j)||',null)) ' ||
47 p_pivot(j) || '_' || i || ',';
48 end loop;
49 end loop;
50
51 -- Now just add in the original query
52 l_query := rtrim(l_query,',')||' from ( '||p_query||') group by ';
53
54 -- and then the group by columns...
55
56 for i in 1 .. p_anchor.count
57 loop
58 l_query := l_query || p_anchor(i) || ',';
59 end loop;
60 l_query := rtrim(l_query,',');
61
62 -- and return it
63 execute immediate 'alter session set cursor_sharing=force';
64 open p_cursor for l_query;
65 execute immediate 'alter session set cursor_sharing=exact';
66 end;
67
68 end;
69 /
Package body created.
It only does a little string manipulation to rewrite the query and open a REF CURSOR dynamically.
In the likely event the query had a predicate with constants and such in it, we set cursor sharing
on and then back off for the parse of this query to facilitate bind variables (see the section on
tuning for more information on that). Now we have a fully parsed query that is ready to be fetched
from.
Tom,
I have a tricky turning "columns into rows" predicament...
Here are the steps to set up my problem...
----------------------------
-- TARGET table to store list of account_ids
----------------------------
create table demo_account_list
(
account_id number
)
/
----------------------------
-- SOURCE table of account_ids
----------------------------
create table demo_account_sources
(
acct_1 number
, acct_2 number
, acct_3 number
, acct_4 number
)
/
----------------------------
-- Populate the account_id SOURCE table
----------------------------
insert into demo_account_sources values (1,2,3,4);
insert into demo_account_sources values (13,22,433,44261);
insert into demo_account_sources values (10,342,32342,33443);
insert into demo_account_sources values (15,26,737,48);
commit;
------------------------------
-- Want to capture only the EVEN account_ids
-- This doesn't work!
------------------------------
insert into demo_account_list
(
select acct_1, acct_2, acct_3, acct_4
from demo_account_sources
-----------------------------
-- Want to load only EVEN account_ids
-----------------------------
where mod(acct_1,2) = 0
or mod(acct_2,2) = 0
or mod(acct_3,2) = 0
or mod(acct_4,2) = 0
)
/
----------------------------
Of course, the above query returns this error...
----------------------------
insert into demo_account_list
*
ERROR at line 1:
ORA-00913: too many values
----------------------------
My question is, how can I write this query in such a way that I can select >1 rows yet insert into
a single column?
Thanks,
Robert
Followup December 20, 2006 - 7pm US/Eastern:
ops$tkyte%ORA10GR2> select decode( r, 1, acct_1, 2, acct_2, 3, acct_3, 4, acct_4 ) acct
2 from demo_account_sources,
3 (select 1 r from dual union all select 2 from dual
4 union all select 3 from dual union all select 4 from dual )
5 /
ACCT
----------
1
13
10
15
2
22
342
26
3
433
32342
737
4
44261
33443
48
16 rows selected.
No comments:
Post a Comment