Hive: How to Process Data (Delete, View, Query, etc.)

Operation commands

Data preprocessing: remove any rows in which one of the checked fields is null

INSERT OVERWRITE TABLE result01
    select * from salary
    where userid is not null and dept_id is not null and salarys is not null;
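
Before overwriting result01, it can help to see how many rows the filter will drop. The following is a minimal sketch, assuming that userid, dept_id, and salarys are the only columns that need checking, as in the statement above; the output aliases are illustrative. It relies on count(col) counting only non-null values.

```sql
-- Count NULLs per column in the source table before filtering.
-- count(*) counts all rows; count(col) skips rows where col is NULL.
SELECT count(*)                  AS total_rows,
       count(*) - count(userid)  AS null_userid,
       count(*) - count(dept_id) AS null_dept_id,
       count(*) - count(salarys) AS null_salarys
FROM salary;
```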

Eliminate rows whose identity value is outside the range 0-100

INSERT OVERWRITE TABLE result
    SELECT * FROM hittable
    WHERE identity between 0 and 100 ;
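
BETWEEN is inclusive at both ends and does not evaluate to true for NULL, so the statement above also drops rows whose identity is NULL. A sketch for checking how many rows would be discarded, using the same table and column (the alias discarded is illustrative):

```sql
-- Count the rows that the BETWEEN filter would discard.
-- NULL identities are listed explicitly because a comparison
-- with NULL is never true.
SELECT count(*) AS discarded
FROM hittable
WHERE identity IS NULL
   OR identity < 0
   OR identity > 100;
```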

View the total number of rows

select count(*) from item;

Query the total number of rows in which the first product (item01) is beer or wine

select count(*) as num from item where item01 = 'Beer' or item01 = 'Wine';

Query the sales ranking of the first-purchased products (item01), best-selling first

select item01,count(item01) as num from item 
group by item01 
order by num desc;
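
If only the top of the ranking is needed, the same query can be limited. This is a sketch; the cutoff of 10 is an arbitrary assumption, and the NULL filter is added so that rows without a first product do not appear as a group with a count of 0.

```sql
-- Top 10 first-purchased products by number of rows.
SELECT item01, count(item01) AS num
FROM item
WHERE item01 IS NOT NULL   -- skip rows with no first product
GROUP BY item01
ORDER BY num DESC
LIMIT 10;                  -- 10 is an illustrative cutoff
```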

Query the probability that milk appears among the purchased products in a row

select b.num/a.num as rate from(
select count(*) num from item) a,
(select count(*) num from item
where item01='Milk'
or item02='Milk'
or item03='Milk'
or item04='Milk') b ;
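
The same rate can also be computed in one pass over the table, without joining two subqueries. A minimal sketch, assuming the products are held in the columns item01 through item04 as in the query above:

```sql
-- Share of rows in which any product column is 'Milk'.
-- The CASE yields 1 for matching rows and 0 otherwise, so the
-- sum is the number of rows containing milk; Hive's "/" operator
-- returns a double, so the result is a fraction, not an integer.
SELECT sum(CASE WHEN item01 = 'Milk'
                  OR item02 = 'Milk'
                  OR item03 = 'Milk'
                  OR item04 = 'Milk'
                THEN 1 ELSE 0 END) / count(*) AS rate
FROM item;
```

This form avoids the cross join between the two subqueries, which some Hive configurations reject in strict mode.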