Group By Operator: group aggregation. Its common attributes are:
aggregations: which aggregation function is applied to each group
mode: the aggregation mode, generally hash, which computes the hash of keys
keys: the grouping columns. When there is no keys attribute, there is only one group.
outputColumnNames: temporary column names for the output
For example
explain select sum(sal) from tb_emp;
Look at its Group By Operator
+---------------------------------------------------------------------------------------------+
|Explain |
+---------------------------------------------------------------------------------------------+
| Group By Operator |
| aggregations: sum(sal) |
| mode: hash |
| outputColumnNames: _col0 |
| Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE|
+---------------------------------------------------------------------------------------------+
As another example
explain select deptno,sum(sal) from tb_emp group by deptno;
Look at its Group By Operator
+------------------------------------------------------------------------------------------------+
|Explain |
+------------------------------------------------------------------------------------------------+
| Group By Operator |
| aggregations: sum(sal) |
| keys: deptno (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1 |
| Statistics: Num rows: 89 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
+------------------------------------------------------------------------------------------------+
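The two plans differ only in the keys attribute. As a rough illustration, hash aggregation can be pictured as a dictionary keyed by the values of the grouping columns; with no keys, every row falls under the same empty key, so there is only one group. This is a minimal Python sketch, not Hive's actual implementation, and the function name hash_group_sum and the sample rows are hypothetical.

```python
def hash_group_sum(rows, keys, agg_col):
    """Sum agg_col per group, using a dict as the hash table."""
    table = {}
    for row in rows:
        # The group key is the tuple of grouping-column values.
        # With no keys it is the empty tuple, so all rows share one group.
        group = tuple(row[k] for k in keys)
        table[group] = table.get(group, 0) + row[agg_col]
    return table


# Hypothetical sample rows standing in for tb_emp.
rows = [
    {"deptno": 10, "sal": 1000},
    {"deptno": 20, "sal": 2000},
    {"deptno": 10, "sal": 1500},
]

# Comparable to: select sum(sal) from tb_emp
no_keys = hash_group_sum(rows, [], "sal")
# Comparable to: select deptno, sum(sal) from tb_emp group by deptno
by_dept = hash_group_sum(rows, ["deptno"], "sal")

print(no_keys)
print(by_dept)
```

With an empty key list the table holds a single entry, which matches the one-row result of the first plan; with deptno as the key it holds one entry per department.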
How GROUP BY is implemented
A GROUP BY query is transformed into a MapReduce (MR) job as follows:
Map: generate key-value pairs, using the columns in the GROUP BY clause as the Key and the result of the aggregation function as the Value.
Shuffle: hash the Key and send each key-value pair to a Reducer chosen by the hash value, so that all pairs with the same Key reach the same Reducer.
Reduce: combine the Values of each Key with the aggregation function and output the columns of the SELECT clause.
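The three phases can be imitated in a few lines of Python. This is only a sketch under simplifying assumptions: the aggregation is a sum, each mapper pre-aggregates its own split, and the reducer count, the function names, and the sample rows are all hypothetical. Hive's real execution is considerably more involved.

```python
from collections import defaultdict

NUM_REDUCERS = 3  # hypothetical reducer count


def map_phase(rows):
    """Emit (key, value) pairs: key = GROUP BY column, value = partial sum."""
    partial = defaultdict(int)
    for deptno, sal in rows:
        partial[deptno] += sal  # map-side partial aggregation
    return list(partial.items())


def shuffle_phase(map_outputs, num_reducers):
    """Route every pair to a reducer chosen by the hash of its key."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def reduce_phase(bucket):
    """Merge the partial sums of each key into the final aggregate."""
    return {key: sum(values) for key, values in bucket.items()}


# Two hypothetical input splits of (deptno, sal) rows, one per mapper.
splits = [
    [(10, 1000), (20, 2000), (10, 1500)],
    [(20, 500), (30, 3000)],
]

map_outputs = [map_phase(split) for split in splits]
buckets = shuffle_phase(map_outputs, NUM_REDUCERS)

result = {}
for bucket in buckets:
    result.update(reduce_phase(bucket))

print(sorted(result.items()))
```

Because the reducer is chosen from the hash of the key alone, the partial sums for deptno 20 produced by the two mappers land in the same bucket and are merged there.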
Conclusion
The Group By Operator has four common attributes: aggregations, mode, keys, and outputColumnNames.
A query without a GROUP BY clause can also have a Group By Operator, as the first example shows.