Hive SQL syntax error and corresponding solutions

Hive SQL syntax is different from the frequently used MySQL syntax. SQL written according to the habit of writing MySQL often reports errors, and it is difficult to see the cause of the problem. Therefore, this paper records the phenomenon of the problem and the solution

If you don’t find any problem with the alias: select from error ‘> select from error = 4200* From a) treror: error while compiling statement: failed: semanticexception [error 10025]: expression not in group by key ID (state = 42000, code = 10025)
cause: fields in the select statement but not in the group by statement will cause the error
solution: change the select id, name from a group by name to select collect_ Set (ID), name from a group by NameError: error while compiling statement: failed: semanticexception [error 10004]: Line 1:13 invalid table alias or column reference ‘ID’:
cause: the corresponding field in the subquery statement has changed, such as using a function or renaming
solution: select id, name from (select collect)_ Set (ID), name from a group by name) t “is changed to” select id, name from (select collect)_ Set (ID) id, name from a group by name) t or select t.id, name from (select collect)_ Set (ID), name from a group by name) tproblem: unable to query data after hive multiple SQL unions
cause: the data after union is saved in HDFS to multiple new directories under the table directory
solution: add configuration (which can be directly input on the CLI command line) set mapred.input.dir .recursive=true;
Or use a select statement to package multiple union statements and then execute hsql on tez to report an error. Out of memory
needs to adjust the size of the container
set hive.tez.container .size=4096;
set hive.tez.java . opts = – xmx3072m; hive does not query subdirectories recursively by default, so when creating a table, if there are subdirectories in the specified directory, it will report ERROR:not a file
You can perform the following four configurations in hive cli to enable recursive access to subdirectories in the callback. Instead of recursive query, all the data under the directory will be loaded in. Therefore, when the subdirectories are very deep or there are many subdirectories, the speed will be very slow.
set hive.input.dir .recursive=true; 
set hive.mapred.supports .subdirectories=true; 
set hive.supports.subdirectories=true ; 
set mapred.input.dir .recursive=true;

Read More: