explain select deptno `dept`,
year(hiredate) `year`,
sum(sal)
from tb_emp
group by deptno, year(hiredate);
1. Look at the stage dependencies
A plan is split into one or more stages. This query has two:
+------------------------------------+
|Explain |
+------------------------------------+
|STAGE DEPENDENCIES: |
| Stage-1 is a root stage |
| Stage-0 depends on stages: Stage-1|
+------------------------------------+
Stage-0 depends on Stage-1, so Stage-1 is executed first, followed by Stage-0.
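The ordering rule above can be sketched in a few lines of Python. This is only an illustration, not how Hive schedules stages internally: the helper name `stage_order` is hypothetical, and it assumes the simple "is a root stage" / "depends on stages:" line shapes shown in the output above (conditional stages with extra clauses are not handled).

```python
import re
from graphlib import TopologicalSorter  # Python 3.9+


def stage_order(plan_text):
    """Return stage names in an order that respects the listed dependencies.

    Hypothetical helper: only understands the two line shapes that
    appear in the STAGE DEPENDENCIES block of this example.
    """
    deps = {}
    for line in plan_text.splitlines():
        # Drop surrounding whitespace and beeline table borders.
        line = line.strip().strip("|").strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps.setdefault(root.group(1), set())
            continue
        dep = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if dep:
            parents = {p.strip() for p in dep.group(2).split(",")}
            deps.setdefault(dep.group(1), set()).update(parents)
    # static_order() yields each stage after all stages it depends on.
    return list(TopologicalSorter(deps).static_order())


plan = """
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
"""
print(stage_order(plan))
```

For the two-stage plan above, the root stage comes out first and the dependent stage second.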
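2. View the Map phase of Stage-1
Most of the work is done in the Map phase:
TableScan reads tb_emp, and its Statistics line shows the estimated row count and data size.
Select Operator evaluates the expressions deptno, year(hiredate) and sal.
Group By Operator (mode: hash) pre-aggregates sum(sal) per key on the map side.
Reduce Output Operator sorts and partitions the rows by the group keys before they are sent to the reducers.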
+-------------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------------+
| Map Reduce |
| Map Operator Tree: |
| TableScan |
| alias: tb_emp |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Select Operator |
| expressions: deptno (type: int), year(hiredate) (type: int), sal (type: float) |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Group By Operator |
| aggregations: sum(_col2) |
| keys: _col0 (type: int), _col1 (type: int) |
| mode: hash |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE |
| Reduce Output Operator |
| key expressions: _col0 (type: int), _col1 (type: int) |
| sort order: ++ |
| Map-reduce partition columns: _col0 (type: int), _col1 (type: int) |
| Statistics: Num rows: 6 Data size: 718 Basic stats: COMPLETE Column stats: NONE|
| value expressions: _col2 (type: double) |
+-------------------------------------------------------------------------------------------------+
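To make the operator chain in the Map phase more concrete, here is a rough Python simulation of what one mapper does. It is a sketch under stated assumptions: the function name `map_side`, the sample rows, and the `hash(key) % num_reducers` partitioning are all made up for illustration and are not Hive's actual implementation.

```python
from collections import defaultdict
from datetime import date


def map_side(rows, num_reducers=2):
    """Simulate one mapper: Select -> Group By (hash) -> Reduce Output."""
    # Select Operator: project deptno, year(hiredate), sal
    # as _col0, _col1, _col2.
    projected = [(deptno, hiredate.year, sal) for deptno, hiredate, sal in rows]

    # Group By Operator, mode: hash. Partial sums are kept in an
    # in-memory hash table keyed by (_col0, _col1).
    partial = defaultdict(float)
    for deptno, year, sal in projected:
        partial[(deptno, year)] += sal

    # Reduce Output Operator: emit keys in ascending order (sort order: ++)
    # and assign each key to a reducer. The modulo-hash partitioner is an
    # assumption for illustration only.
    out = defaultdict(list)
    for key in sorted(partial):
        out[hash(key) % num_reducers].append((key, partial[key]))
    return dict(out)


# Hypothetical sample rows: (deptno, hiredate, sal)
rows = [
    (10, date(2018, 1, 1), 1000.0),
    (10, date(2018, 6, 1), 500.0),
    (20, date(2019, 3, 1), 800.0),
]
partitions = map_side(rows)
```

The point of the hash-mode Group By is that the two rows for department 10 in 2018 leave the mapper as a single partial sum, which reduces the data shuffled to the reducers.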
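3. Look at the Reduce phase
The Reduce phase merges the partial sums coming from the mappers (mode: mergepartial). File Output Operator then shows the input format, output format and serde used to write the result.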
+-------------------------------------------------------------------------------------------+
|Explain |
+-------------------------------------------------------------------------------------------+
| Reduce Operator Tree: |
| Group By Operator |
| aggregations: sum(VALUE._col0) |
| keys: KEY._col0 (type: int), KEY._col1 (type: int) |
| mode: mergepartial |
| outputColumnNames: _col0, _col1, _col2 |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE |
| File Output Operator |
| compressed: false |
| Statistics: Num rows: 3 Data size: 359 Basic stats: COMPLETE Column stats: NONE|
| table: |
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat |
| output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat |
| serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe |
+-------------------------------------------------------------------------------------------+
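The mergepartial step can be pictured as follows. This is a minimal sketch, assuming each mapper hands over a list of (key, partial_sum) pairs; the function name `reduce_side` and the sample values are hypothetical.

```python
from collections import defaultdict


def reduce_side(mapper_outputs):
    """Merge partial sums from several mappers into final sums per key."""
    merged = defaultdict(float)
    for output in mapper_outputs:
        for key, partial_sum in output:
            # mode: mergepartial. Partial aggregates for the same
            # (deptno, year) key are added together.
            merged[key] += partial_sum
    # Return the final rows ordered by key.
    return sorted(merged.items())


# Hypothetical partial results from two mappers
mapper_a = [((10, 2018), 1500.0)]
mapper_b = [((10, 2018), 200.0), ((20, 2019), 800.0)]
print(reduce_side([mapper_a, mapper_b]))
```

Because sum is associative, adding up partial sums gives the same result as summing all the original rows in one place.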