
Viewing a Hive execution plan

Prefix a query with EXPLAIN to see the plan Hive will run for it. For example:

explain select deptno `dept`,
       year(hiredate) `year`,
       sum(sal)
from tb_emp
group by deptno, year(hiredate);

1. See how many stages there are
This query has two stages:

+------------------------------------+
|Explain                             |
+------------------------------------+
|STAGE DEPENDENCIES:                 |
|  Stage-1 is a root stage           |
|  Stage-0 depends on stages: Stage-1|
+------------------------------------+

Stage-0 depends on Stage-1, so Stage-1 (the MapReduce job) runs first and Stage-0 runs after it.

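To make the dependency order concrete, here is a small Python sketch (Python 3.9+, for graphlib) that reads the STAGE DEPENDENCIES lines and returns an order in which the stages can run. It is only an illustration: stage_order is a hypothetical helper, and it understands just the two line forms shown above ("is a root stage" and "depends on stages:"), not every form Hive can print.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# STAGE DEPENDENCIES section copied from the EXPLAIN output above
PLAN = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
"""


def stage_order(plan_text: str) -> list[str]:
    """Hypothetical helper: order stages so each runs after the stages it depends on.

    Only the "is a root stage" and "depends on stages:" line forms are handled.
    """
    graph: dict[str, set[str]] = {}
    for raw in plan_text.splitlines():
        line = raw.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            graph.setdefault(root.group(1), set())
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            parents = {name.strip() for name in dep.group(2).split(",")}
            graph.setdefault(dep.group(1), set()).update(parents)
    # static_order() yields each stage only after all of its prerequisites
    return list(TopologicalSorter(graph).static_order())


print(stage_order(PLAN))  # ['Stage-1', 'Stage-0']
```

With more stages the same idea applies: any stage listed after "depends on stages:" must finish before the stage on the left of that line starts.
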
2. Look at the Map phase of Stage-1
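The Map phase mainly does the following: TableScan reads tb_emp, and its Statistics line gives the estimated row count and data size; the Select Operator computes the columns listed under expressions; the Group By Operator performs a partial aggregation on the map side (mode: hash); and the Reduce Output Operator emits rows keyed on the two group-by columns, sorted on both keys (sort order: ++) and partitioned by them, so that every row of a given group reaches the same reducer. Note that sum() over the float column sal produces a double.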

+-------------------------------------------------------------------------------------------------+
|Explain                                                                                          |
+-------------------------------------------------------------------------------------------------+
|    Map Reduce                                                                                   |
|      Map Operator Tree:                                                                         |
|          TableScan                                                                              |
|            alias: tb_emp                                                                        |
|            Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE      |
|            Select Operator                                                                      |
|              expressions: deptno (type: int), year(hiredate) (type: int), sal (type: float)     |
|              outputColumnNames: _col0, _col1, _col2                                             |
|              Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE    |
|              Group By Operator                                                                  |
|                aggregations: sum(_col2)                                                         |
|                keys: _col0 (type: int), _col1 (type: int)                                       |
|                mode: hash                                                                       |
|                outputColumnNames: _col0, _col1, _col2                                           |
|                Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE  |
|                Reduce Output Operator                                                           |
|                  key expressions: _col0 (type: int), _col1 (type: int)                          |
|                  sort order: ++                                                                 |
|                  Map-reduce partition columns: _col0 (type: int), _col1 (type: int)             |
|                  Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
|                  value expressions: _col2 (type: double)                                        |
+-------------------------------------------------------------------------------------------------+
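
The Map Operator Tree can be mimicked in a few lines of Python to see what each operator contributes. This is a simplified sketch, not Hive's implementation: the sample rows are made up, map_task is a hypothetical helper, and Python's built-in hash() stands in for the partitioning hash Hive actually uses.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows of tb_emp: (deptno, hiredate, sal). Not the real table data.
ROWS = [
    (10, date(1981, 6, 9), 2450.0),
    (10, date(1981, 11, 17), 5000.0),
    (20, date(1980, 12, 17), 800.0),
    (20, date(1981, 4, 2), 2975.0),
    (30, date(1981, 2, 20), 1600.0),
    (30, date(1981, 2, 22), 1250.0),
]


def map_task(rows, num_reducers=2):
    """Hypothetical helper mimicking the Map Operator Tree of one map task.

    Returns {reducer_id: [((deptno, year), partial_sum), ...]}.
    """
    partial = defaultdict(float)  # Group By Operator, mode: hash
    for deptno, hiredate, sal in rows:  # TableScan: read every row
        key = (deptno, hiredate.year)  # Select Operator: _col0, _col1
        partial[key] += sal  # sum(_col2), partial result per key
    # Reduce Output Operator: keys sorted on both columns (sort order: ++) and
    # routed by the partition columns. Python's hash() is only a stand-in here.
    output = defaultdict(list)
    for key in sorted(partial):
        output[hash(key) % num_reducers].append((key, partial[key]))
    return dict(output)


for reducer_id, pairs in sorted(map_task(ROWS).items()):
    print(reducer_id, pairs)
```

In this made-up data the six input rows collapse to four partial results before the shuffle, which is the point of doing the hash-mode aggregation on the map side.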

3. Look at the Reduce phase
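The Reduce phase merges the partial sums produced by the map tasks (Group By Operator, mode: mergepartial), which brings the estimated row count down from 6 to 3, and then writes the result through the File Output Operator. Its table section shows the input format, output format and SerDe used for that output.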

+-------------------------------------------------------------------------------------------+
|Explain                                                                                    |
+-------------------------------------------------------------------------------------------+
|      Reduce Operator Tree:                                                                |
|        Group By Operator                                                                  |
|          aggregations: sum(VALUE._col0)                                                   |
|          keys: KEY._col0 (type: int), KEY._col1 (type: int)                               |
|          mode: mergepartial                                                               |
|          outputColumnNames: _col0, _col1, _col2                                           |
|          Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE  |
|          File Output Operator                                                             |
|            compressed: false                                                              |
|            Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE|
|            table:                                                                         |
|                input format: org.apache.hadoop.mapred.SequenceFileInputFormat             |
|                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat   |
|                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                  |
+-------------------------------------------------------------------------------------------+
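
On the reduce side, each reducer receives the partial sums for its keys, sorted by key, and adds them up. The Python sketch below assumes two map tasks produced overlapping partial results for the same (deptno, year) key; reduce_task and the numbers are hypothetical and only illustrate the idea behind mode: mergepartial.

```python
from itertools import groupby
from operator import itemgetter


def reduce_task(partials):
    """Hypothetical helper mimicking Group By Operator (mode: mergepartial).

    `partials` holds ((deptno, year), partial_sum) pairs from several map
    tasks; the same key can appear once per map task.
    """
    rows = []
    # The shuffle delivers input sorted by key, so equal keys are adjacent.
    ordered = sorted(partials, key=itemgetter(0))
    for (deptno, year), group in groupby(ordered, key=itemgetter(0)):
        total = sum(value for _, value in group)  # sum(VALUE._col0)
        rows.append((deptno, year, total))  # row handed to File Output Operator
    return rows


# Hypothetical partial results from two map tasks
from_map_1 = [((10, 1981), 2450.0), ((20, 1980), 800.0)]
from_map_2 = [((10, 1981), 5000.0), ((30, 1981), 2850.0)]
print(reduce_task(from_map_1 + from_map_2))
# [(10, 1981, 7450.0), (20, 1980, 800.0), (30, 1981, 2850.0)]
```

Because the input arrives sorted, the reducer can aggregate one key at a time instead of keeping a hash table of all keys.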
