您當前的位置:首頁 > 遊戲

Hive QL場景題第五部分詳解

作者:由 Dream丶aju 發表于 遊戲時間:2021-10-05

21、使用sql實現如下:

樣例資料: t1表

gender,cookie,ip,timestampe,ua

F,1707041428491566106,111。200。195。186,1208524973899,Dalvik%2F2。1。0%20%28Linux%3B%20U%3B%20Android

。。。具體資料如下圖

將圖片中的awk修改為使用sql編寫,然後將上訴題作出回答?

統計pv/uv的使用sql,其它問題語言描述即可。

22、使用hive求出兩個資料集的差集?

資料 diff_t1表:

id

name

1

zs

2

ls

diff_t2表:

id

name

1

zs

3

ww

結果如下:

id

name

2

ls

3

ww

create

table

diff_t1

id

string

name

string

row

format

delimited

fields

terminated

by

‘ ’

load

data

local

inpath

‘。/hivedata/diff_t1。txt’

overwrite

into

table

diff_t1

create

table

diff_t2

id

string

name

string

row

format

delimited

fields

terminated

by

‘ ’

load

data

local

inpath

‘。/hivedata/diff_t2。txt’

overwrite

into

table

diff_t2

分析:求差值

select

t1

id

id

t1

name

name

from

diff_t1

t1

left

join

diff_t2

t2

on

t1

id

=

t2

id

where

t2

id

is

null

union

select

t2

id

id

t2

name

name

from

diff_t1

t1

right

join

diff_t2

t2

on

t1

id

=

t2

id

where

t1

id

is

null

result

2

ls

3

ww

23、使用hive的hql實現如下需求

現有某網站購買記錄欄位如下

orderid

userid

productid

price

timestamp

date

121

張三

3

100

1535945356

2018

-

08

-

07

122

張三

3

200

1535945356

2018

-

08

-

08

123

李四

3

200

1535945356

2018

-

08

-

08

124

王五

1

200

1535945356

2018

-

08

-

08

125

張三

3

200

1535945356

2018

-

08

-

09

126

張三

2

200

1535945356

2018

-

08

-

09

127

李四

3

200

1535945356

2018

-

08

-

09

128

李四

3

200

1535945356

2018

-

08

-

10

129

李四

3

200

1535945356

2018

-

08

-

11

用sql統計今日及昨日都購買過商品productid為3的使用者及其昨日消費。

drop

table

product

create

table

product

orderid

string

userid

string

productid

int

price

int

tamp

int

dt

date

row

format

delimited

fields

terminated

by

‘,’

load

data

local

inpath

‘。/hivedata/product。txt’

overwrite

into

table

product

分析:統計今天和昨天都買了某一商品,

——-使用者

count

()

和昨日的消費

???

select

*

from

select

userid

dt

lag

dt

1

over

partition

by

userid

order

by

dt

yestoday

case

when

datediff

dt

lag

dt

1

over

partition

by

userid

order

by

dt

))

=

1

then

lag

total_price

over

partition

by

userid

order

by

dt

else

null

end

yestoday_price

from

select

userid

dt

productid

sum

price

over

partition

by

userid

dt

order

by

dt

total_price

from

product

order

by

userid

dt

A

where

productid

=

3

B

where

yestoday_price

id

not

null

24、使用hive的hql如下:

表user_action_log使用者行為故據

uid

time

action

1

Time1

Read

3

Time2

Comment

1

Time3

Share

2

Time4

Like

1

Time5

Write

2

Time6

like

3

Time7

Write

2

Time8

Read

分析使用者行為習慣找到毎一個使用者在表中的第一次行為

drop

table

user_action_log

create

table

user_action_log

uid

int

time

string

action

string

row

format

delimited

fields

terminated

by

‘\t’

load

data

local

inpath

‘。/hivedata/user_action_log。txt’

overwrite

into

table

user_action_log

使用程式碼實現

分析:

每一個使用者

group

by

uid

表中的第一次行為

select

uid

time

action

from

select

uid

time

action

row_number

()

over

partition

by

uid

order

by

time

rn

from

user_action_log

A

where

rn

=

1

result

1

Time1

Read

2

Time4

Like

3

Time2

Comment

25、每個使用者連續登陸的最大天數?

資料: user_login表

uid

dt

1

2019

-

08

-

01

1

2019

-

08

-

02

1

2019

-

08

-

03

2

2019

-

08

-

01

2

2019

-

08

-

02

3

2019

-

08

-

01

3

2019

-

08

-

03

4

2019

-

07

-

28

4

2019

-

07

-

29

4

2019

-

08

-

01

4

2019

-

08

-

02

4

2019

-

08

-

03

結果如下:

uid

cnt_days

1

3

2

2

3

1

4

3

create

table

user_login

uid

int

dt

date

row

format

delimited

fields

terminated

by

‘,’

load

data

local

inpath

‘。/hivedata/user_login。txt’

overwrite

into

table

user_login

分析:

透過使用者進行分組,

求連續登入的最大天數

select

uid

max

cnt

from

select

uid

dt_sub

count

1

cnt

from

select

uid

dt

date_sub

dt

row_number

()

over

partition

by

uid

order

by

dt

))

dt_sub

from

user_login

A

group

by

uid

dt_sub

B

group

by

uid

rusult

OK

1

3

2

2

3

1

4

3

標簽: 08  2019  dt  200  1535945356