The Group By Operator in Hive execution plans

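Group By Operator: the grouping and aggregation operator. Its common attributes are:

aggregations: the aggregate function(s) computed for each group
mode: the aggregation mode, generally hash on the map side, where partial aggregates are kept in a hash table keyed by the grouping keys
keys: the grouping columns. When there is no keys attribute, all rows form a single group.
outputColumnNames: temporary column names for the output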

For example:

explain select sum(sal) from tb_emp;

The Group By Operator in its plan looks like this:

+---------------------------------------------------------------------------------------------+
|Explain                                                                                      |
+---------------------------------------------------------------------------------------------+
|              Group By Operator                                                              |
|                aggregations: sum(sal)                                                       |
|                mode: hash                                                                   |
|                outputColumnNames: _col0                                                     |
|                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE|
+---------------------------------------------------------------------------------------------+

Another example, this time with a grouping column:

explain select deptno,sum(sal) from tb_emp group by deptno;

Its Group By Operator now has a keys attribute and a second output column:

+------------------------------------------------------------------------------------------------+
|Explain                                                                                         |
+------------------------------------------------------------------------------------------------+
|              Group By Operator                                                                 |
|                aggregations: sum(sal)                                                          |
|                keys: deptno (type: int)                                                        |
|                mode: hash                                                                      |
|                outputColumnNames: _col0, _col1                                                 |
|                Statistics: Num rows: 89 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
+------------------------------------------------------------------------------------------------+
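To make the role of keys and mode: hash more concrete, here is a small Python sketch of hash-based aggregation. It is not Hive's code: the function name hash_aggregate and the sample rows are made up for illustration. With no keys, every row falls under the same empty key, which is why the first plan above has only one group.

```python
from collections import defaultdict

def hash_aggregate(rows, keys, agg_col):
    """Sum agg_col per distinct combination of the key columns.

    Illustrative only: mimics a hash-table aggregation, not Hive internals.
    """
    table = defaultdict(int)  # hash table: key tuple -> running sum
    for row in rows:
        group_key = tuple(row[k] for k in keys)  # () when keys is empty
        table[group_key] += row[agg_col]
    return dict(table)

# Hypothetical tb_emp rows, invented for this example
rows = [
    {"deptno": 10, "sal": 1000},
    {"deptno": 20, "sal": 800},
    {"deptno": 10, "sal": 1500},
]

no_keys = hash_aggregate(rows, [], "sal")          # like: select sum(sal) from tb_emp
by_dept = hash_aggregate(rows, ["deptno"], "sal")  # like: ... group by deptno
```

The only difference between the two calls is the list of key columns, which mirrors the only structural difference between the two plans: the presence of the keys attribute.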

How GROUP BY is implemented
A GROUP BY query is turned into a MapReduce (MR) job as follows:

Map: generate key-value pairs, using the columns in the GROUP BY clause as the key and the (partial) result of the aggregate function as the value
Shuffle: hash each key and send the key-value pairs to different reducers according to the hash value, so that all pairs with the same key reach the same reducer
Reduce: merge the values for each key with the aggregate function and output the columns of the SELECT clause
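The three phases can be sketched in plain Python for the query select deptno,sum(sal) from tb_emp group by deptno. This is a simplified model, not how Hive is actually written: the function names, the two input splits, and NUM_REDUCERS are assumptions made for the example, and the map phase is assumed to pre-aggregate its own split.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed value for the example

def map_phase(split):
    """Emit (deptno, partial_sum) pairs for one input split."""
    partial = defaultdict(int)
    for deptno, sal in split:
        partial[deptno] += sal
    return list(partial.items())

def shuffle(pairs, num_reducers):
    """Route each pair to a reducer chosen by the hash of its key."""
    buckets = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets

def reduce_phase(bucket):
    """Merge the partial sums that arrived for each key."""
    totals = defaultdict(int)
    for key, value in bucket:
        totals[key] += value
    return dict(totals)

# Two hypothetical input splits of (deptno, sal) rows
splits = [
    [(10, 1000), (20, 800), (10, 1500)],
    [(20, 1200), (30, 950)],
]

mapped = [pair for split in splits for pair in map_phase(split)]
buckets = shuffle(mapped, NUM_REDUCERS)

result = {}
for bucket in buckets:
    result.update(reduce_phase(bucket))
```

Because the reducer is chosen from the hash of the key alone, both partial sums for deptno 20 land in the same bucket, so each reducer can finish its groups without seeing any other reducer's data.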

Conclusion
The Group By Operator has four common attributes: aggregations, mode, keys, and outputColumnNames.