Hive QL場景題第五部分詳解
21、使用sql實現如下:
樣例資料: t1表
gender,cookie,ip,timestampe,ua
F,1707041428491566106,111。200。195。186,1208524973899,Dalvik%2F2。1。0%20%28Linux%3B%20U%3B%20Android
。。。具體資料如下圖
將圖片中的awk修改為使用sql編寫,然後將上訴題作出回答?
統計pv/uv的使用sql,其它問題語言描述即可。
22、使用hive求出兩個資料集的差集?
資料 diff_t1表:
id
name
1
zs
2
ls
diff_t2表:
id
name
1
zs
3
ww
結果如下:
id
name
2
ls
3
ww
create
table
diff_t1
(
id
string
,
name
string
)
row
format
delimited
fields
terminated
by
‘ ’
;
load
data
local
inpath
‘。/hivedata/diff_t1。txt’
overwrite
into
table
diff_t1
;
create
table
diff_t2
(
id
string
,
name
string
)
row
format
delimited
fields
terminated
by
‘ ’
;
load
data
local
inpath
‘。/hivedata/diff_t2。txt’
overwrite
into
table
diff_t2
;
分析:求差值
select
t1
。
id
id
,
t1
。
name
name
from
diff_t1
t1
left
join
diff_t2
t2
on
t1
。
id
=
t2
。
id
where
t2
。
id
is
null
union
select
t2
。
id
id
,
t2
。
name
name
from
diff_t1
t1
right
join
diff_t2
t2
on
t1
。
id
=
t2
。
id
where
t1
。
id
is
null
result
:
2
ls
3
ww
23、使用hive的hql實現如下需求
現有某網站購買記錄欄位如下
orderid
,
userid
,
productid
,
price
,
timestamp
,
date
121
,
張三
,
3
,
100
,
1535945356
,
2018
-
08
-
07
122
,
張三
,
3
,
200
,
1535945356
,
2018
-
08
-
08
123
,
李四
,
3
,
200
,
1535945356
,
2018
-
08
-
08
124
,
王五
,
1
,
200
,
1535945356
,
2018
-
08
-
08
125
,
張三
,
3
,
200
,
1535945356
,
2018
-
08
-
09
126
,
張三
,
2
,
200
,
1535945356
,
2018
-
08
-
09
127
,
李四
,
3
,
200
,
1535945356
,
2018
-
08
-
09
128
,
李四
,
3
,
200
,
1535945356
,
2018
-
08
-
10
129
,
李四
,
3
,
200
,
1535945356
,
2018
-
08
-
11
用sql統計今日及昨日都購買過商品productid為3的使用者及其昨日消費。
drop
table
product
;
create
table
product
(
orderid
string
,
userid
string
,
productid
int
,
price
int
,
tamp
int
,
dt
date
)
row
format
delimited
fields
terminated
by
‘,’
;
load
data
local
inpath
‘。/hivedata/product。txt’
overwrite
into
table
product
;
分析:統計今天和昨天都買了某一商品,
——-使用者
count
()
和昨日的消費
???
select
*
from
(
select
userid
,
dt
,
lag
(
dt
,
1
)
over
(
partition
by
userid
order
by
dt
)
yestoday
,
case
when
datediff
(
dt
,
lag
(
dt
,
1
)
over
(
partition
by
userid
order
by
dt
))
=
1
then
lag
(
total_price
)
over
(
partition
by
userid
order
by
dt
)
else
null
end
yestoday_price
from
(
select
userid
,
dt
,
productid
,
sum
(
price
)
over
(
partition
by
userid
,
dt
order
by
dt
)
total_price
from
product
order
by
userid
,
dt
)
A
where
productid
=
3
)
B
where
yestoday_price
id
not
null
;
24、使用hive的hql如下:
表user_action_log使用者行為故據
uid
time
action
1
Time1
Read
3
Time2
Comment
1
Time3
Share
2
Time4
Like
1
Time5
Write
2
Time6
like
3
Time7
Write
2
Time8
Read
分析使用者行為習慣找到毎一個使用者在表中的第一次行為
drop
table
user_action_log
;
create
table
user_action_log
(
uid
int
,
time
string
,
action
string
)
row
format
delimited
fields
terminated
by
‘\t’
;
load
data
local
inpath
‘。/hivedata/user_action_log。txt’
overwrite
into
table
user_action_log
;
使用程式碼實現
分析:
每一個使用者
group
by
uid
表中的第一次行為
select
uid
,
time
,
action
from
(
select
uid
,
time
,
action
,
row_number
()
over
(
partition
by
uid
order
by
time
)
rn
from
user_action_log
)
A
where
rn
=
1
;
result
:
1
Time1
Read
2
Time4
Like
3
Time2
Comment
25、每個使用者連續登陸的最大天數?
資料: user_login表
uid
,
dt
1
,
2019
-
08
-
01
1
,
2019
-
08
-
02
1
,
2019
-
08
-
03
2
,
2019
-
08
-
01
2
,
2019
-
08
-
02
3
,
2019
-
08
-
01
3
,
2019
-
08
-
03
4
,
2019
-
07
-
28
4
,
2019
-
07
-
29
4
,
2019
-
08
-
01
4
,
2019
-
08
-
02
4
,
2019
-
08
-
03
結果如下:
uid
cnt_days
1
3
2
2
3
1
4
3
create
table
user_login
(
uid
int
,
dt
date
)
row
format
delimited
fields
terminated
by
‘,’
;
load
data
local
inpath
‘。/hivedata/user_login。txt’
overwrite
into
table
user_login
;
分析:
透過使用者進行分組,
求連續登入的最大天數
select
uid
,
max
(
cnt
)
from
(
select
uid
,
dt_sub
,
count
(
1
)
cnt
from
(
select
uid
,
dt
,
date_sub
(
dt
,
row_number
()
over
(
partition
by
uid
order
by
dt
))
dt_sub
from
user_login
)
A
group
by
uid
,
dt_sub
)
B
group
by
uid
;
rusult
:
OK
1
3
2
2
3
1
4
3