explain select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);
1. The plan is divided into stages
This query produces two stages:
+------------------------------------+
|Explain |
+------------------------------------+
|STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1|
+------------------------------------+
Stage-0 depends on Stage-1, so Stage-1 is executed first and Stage-0 runs after it.
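As a rough sketch, the same query can be re-run with the optional EXTENDED or DEPENDENCY keyword, both of which are described in the Hive LanguageManual Explain page. EXTENDED prints additional detail for the operators in each stage, and DEPENDENCY lists the tables and partitions the query reads. The output is not reproduced here because it varies with the Hive version.

```sql
-- More detail for the operators in each stage of the plan.
explain extended
select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);

-- Only the inputs (tables and partitions) that the query depends on.
explain dependency
select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);
```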
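2. View the Map phase of Stage-1
Most of the work is done in the Map phase:
TableScan reads tb_emp, and its Statistics line gives the estimated row count and data size.
Select Operator evaluates the expressions deptno, year(hiredate) and sal.
Group By Operator (mode: hash) pre-aggregates sum(sal) for each (deptno, year) key.
Reduce Output Operator sorts and partitions the rows by the two grouping keys before they are sent to the Reduce phase.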
+-------------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------------+
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: tb_emp |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: deptno (type: int), year(hiredate) (type: int), sal (type: float) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: sum(_col2) |
| keys: _col0 (type: int), _col1 (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: int), _col1 (type: int) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: int), _col1 (type: int) |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
| value expressions: _col2 (type: double) |
+-------------------------------------------------------------------------------------------------+
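The Group By Operator with mode: hash in the Map Operator Tree is map-side partial aggregation. One way to confirm this is to turn off the hive.map.aggr setting for the session and compare plans. The sketch below assumes hive.map.aggr was at its default (true) when the plan above was generated. With it off, the Map Operator Tree would be expected to lose its Group By Operator and leave the whole aggregation to the Reduce phase; the exact plan text may differ between Hive versions.

```sql
-- Assumption: the plan above was produced with hive.map.aggr=true (the default).
-- Disable map-side aggregation for this session only.
set hive.map.aggr=false;

explain select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);

-- Restore the default so later queries are not affected.
set hive.map.aggr=true;
```

Comparing the two plans shows which operators exist only because of the map-side pre-aggregation.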
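3. Look at the Reduce phase
The Reduce phase merges the partial sums for each key (mode: mergepartial) and writes the result through a File Output Operator. The table block of that operator shows the input format, output format and SerDe used for the result.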
+-------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------+
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0) |
| keys: KEY._col0 (type: int), KEY._col1 (type: int) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE|
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+-------------------------------------------------------------------------------------------+
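Every Statistics line in both operator trees reports Basic stats: COMPLETE Column stats: NONE, which means the row counts and data sizes are estimates made without column-level statistics. As a sketch, column statistics can be gathered with ANALYZE TABLE and the plan requested again. Whether the Statistics lines then change depends on the Hive version and on settings such as hive.stats.fetch.column.stats, so no output is shown here.

```sql
-- Gather column-level statistics for tb_emp (this runs a job over the table).
analyze table tb_emp compute statistics for columns;

-- Request the plan again and compare the Statistics lines with the ones above.
explain select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);
```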