
Mastering Hive Table Design


Implement and get drilled on Hive Table design problems.


Created by Good Sam (United States)




by Good Sam
1

Practice Problem #1 - Create a simple Hive table:

Create a table named employees with four columns (id, name, age, department). The ROW FORMAT DELIMITED clause tells Hive how to interpret the raw data so it fits this table schema.

Solution:

CREATE TABLE employees (
    id INT,
    name STRING,
    age INT,
    department STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

2

Practice Problem #2 - Design a Hive table:

Let's say you're given a dataset containing user activity logs with fields: timestamp, user_id, activity_type, and activity_details. Design a Hive table to store this data, partitioned by activity_type and optimized for querying by user_id.

Solution:

CREATE TABLE user_activity_logs (
    timestamp BIGINT,
    user_id INT,
    activity_details STRING
)
PARTITIONED BY (activity_type STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/path/to/user/activity/logs';
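Bucketing by user_id also enables efficient sampling. As a sketch (assuming the user_activity_logs table above), Hive's TABLESAMPLE clause can read a single bucket instead of the full table:

```sql
-- Read only bucket 1 of 32; with CLUSTERED BY (user_id), this scans
-- roughly 1/32 of the data instead of the whole table.
SELECT user_id, activity_details
FROM user_activity_logs
TABLESAMPLE (BUCKET 1 OUT OF 32 ON user_id);
```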

3

Practice Problem #3:

Given a dataset of product reviews with fields: review_id, product_id, review_text, user_id, rating, and review_date (in YYYY-MM-DD format), design a Hive table to store this data, optimized for querying reviews by product and date. Think about how you would partition and store the table.

Solution:

CREATE EXTERNAL TABLE product_reviews (
    review_id INT,
    review_text STRING,
    user_id INT,
    rating INT
)
PARTITIONED BY (product_id INT, review_date STRING)
STORED AS ORC
LOCATION '/path/to/product/reviews';

4

Practice Problem #4 - Daily Transaction Logs: Design a Hive table for the scenario

Scenario: You have daily transaction logs containing transaction_id, user_id, transaction_amount, and transaction_date.

Solution:

CREATE TABLE daily_transactions (
    transaction_id INT,
    user_id INT,
    transaction_amount DECIMAL(10, 2)
)
PARTITIONED BY (transaction_date DATE)
STORED AS PARQUET;

5

Practice Problem #5 - User Login History: Design a Hive table for the scenario

Scenario: Track user login history with login_id, user_id, login_timestamp, and logout_timestamp, optimizing for queries on monthly login activity.

Solution:

-- Staging table creation
CREATE EXTERNAL TABLE login_history_staging (
    login_id INT,
    user_id INT,
    login_timestamp TIMESTAMP,
    logout_timestamp TIMESTAMP
)
STORED AS ORC
LOCATION '/path/to/login/history';

-- Main table creation with partitioning
CREATE TABLE login_history (
    login_id INT,
    user_id INT,
    login_timestamp TIMESTAMP,
    logout_timestamp TIMESTAMP
)
PARTITIONED BY (login_month STRING)
STORED AS ORC;

-- Data insertion from staging to main table
INSERT INTO TABLE login_history PARTITION (login_month)
SELECT
    login_id,
    user_id,
    login_timestamp,
    logout_timestamp,
    date_format(login_timestamp, 'yyyy-MM') AS login_month
FROM login_history_staging;
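An INSERT whose partition value comes from the SELECT list rather than a static literal is a dynamic partition insert, and Hive rejects it unless dynamic partitioning is enabled. A typical session setup (these are standard Hive settings; defaults vary by version):

```sql
-- Allow dynamic partitions at all (on by default in recent Hive versions).
SET hive.exec.dynamic.partition = true;
-- 'nonstrict' permits an INSERT where every partition column is dynamic;
-- the default 'strict' mode requires at least one static partition value.
SET hive.exec.dynamic.partition.mode = nonstrict;
```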

6

Practice Problem #6 - Product Inventory: Design a Hive table for the scenario

Scenario: Store product inventory records including product_id, store_location, inventory_count, and last_update_date, optimized for querying inventory by location.

Solution:

CREATE EXTERNAL TABLE product_inventory (
    product_id INT,
    inventory_count INT,
    last_update_date DATE
)
PARTITIONED BY (store_location STRING)
STORED AS ORC
LOCATION '/path/to/inventory';
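Because this is an external table, partition directories dropped under the table's LOCATION by an outside process are not visible to queries until they are registered in the metastore. One common way to sync them:

```sql
-- Scan the table's location and add any partition directories
-- that exist on HDFS but are missing from the metastore.
MSCK REPAIR TABLE product_inventory;
```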

7

Practice Problem #7 - Customer Feedback Messages: Design a Hive table for the scenario

Scenario: Manage customer feedback with feedback_id, customer_id, message, category, and received_date, optimized for reviewing feedback by category and date.

Solution:

CREATE TABLE customer_feedback (
    feedback_id INT,
    customer_id INT,
    message STRING
)
PARTITIONED BY (category STRING, received_date DATE)
STORED AS TEXTFILE;

8

Practice Problem #8 - Sales Records with Geography: Design a Hive table for the scenario

Scenario: Analyze sales records with sale_id, product_id, sale_amount, sale_date, and region, needing frequent access by region and specific dates.

Solution:

CREATE TABLE sales_records (
    sale_id INT,
    product_id INT,
    sale_amount DECIMAL(10, 2)
)
PARTITIONED BY (region STRING, sale_date DATE)
STORED AS ORC;

9

Problem #9: Financial Transactions (Parquet)

Scenario: You are tasked with managing a dataset of financial transactions that includes transaction_id, account_id, amount, transaction_type, and transaction_date. You need efficient querying by account_id and transaction_date.

Solution:

CREATE TABLE financial_transactions (
    transaction_id INT,
    account_id INT,
    amount DECIMAL(10,2),
    transaction_type STRING
)
PARTITIONED BY (transaction_date DATE)
CLUSTERED BY (account_id) INTO 100 BUCKETS
STORED AS PARQUET;
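One caveat worth knowing: on Hive releases before 2.0, writes only honor the CLUSTERED BY clause when bucketing enforcement is switched on; from Hive 2.0 onward it is always enforced. A session setting for legacy clusters:

```sql
-- Pre-Hive-2.0 only: make INSERTs actually hash rows into the
-- declared 100 buckets instead of writing arbitrary files.
SET hive.enforce.bucketing = true;
```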

10

Problem #10: Customer Profiles (Avro)

Scenario: You need to store customer profile data including customer_id, name, email, and signup_date. The data must support evolving schemas as new fields might be added in the future.

Solution:

CREATE EXTERNAL TABLE customer_profiles (
    customer_id INT,
    name STRING,
    email STRING,
    signup_date DATE
)
PARTITIONED BY (year INT)
STORED AS AVRO
LOCATION '/path/to/customer/profiles';

11

Problem #11: Event Logs (ORC)

Scenario: Design a table to manage web event logs with fields: event_id, user_id, event_type, event_details, and event_date. You expect frequent complex queries involving multiple fields.

Solution:

CREATE TABLE event_logs (
    event_id INT,
    user_id INT,
    event_type STRING,
    event_details STRING
)
PARTITIONED BY (event_date DATE)
STORED AS ORC;

12

Problem #12: Marketing Campaign Data (JSON)

Scenario: Store marketing campaign data including campaign_id, campaign_name, and budget, partitioned by start_year. The data is occasionally queried by marketing analysts who prefer a readable format for ad-hoc queries.

Solution:

CREATE EXTERNAL TABLE marketing_campaigns (
    campaign_id INT,
    campaign_name STRING,
    budget DECIMAL(10,2)
)
PARTITIONED BY (start_year INT)
STORED AS JSONFILE
LOCATION '/path/to/marketing/campaigns';
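Hive only gained a native JSON shorthand (STORED AS JSONFILE) in the 4.x line; on older versions the usual route is the JSON SerDe shipped with hive-hcatalog-core. A sketch of an equivalent table, assuming that jar is on the classpath (the table name here is hypothetical, to avoid clashing with the one above):

```sql
CREATE EXTERNAL TABLE marketing_campaigns_json (
    campaign_id INT,
    campaign_name STRING,
    budget DECIMAL(10,2)
)
PARTITIONED BY (start_year INT)
-- JSON SerDe from hive-hcatalog-core; rows are stored as plain text lines.
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/path/to/marketing/campaigns';
```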

13

Problem #13: Research Data (TEXTFILE)

Scenario: Store research data including record_id, researcher_id, study_field, data, and entry_date. Data is primarily textual and occasionally accessed.

Solution:

CREATE TABLE research_data (
    record_id INT,
    researcher_id INT,
    study_field STRING,
    data STRING
)
PARTITIONED BY (entry_date DATE)
STORED AS TEXTFILE;

14

Problem #14: Implementing Constraints

Scenario: Design a table to store user information with a unique user_id and a reference to a department_id from a departments table.

Solution:

CREATE TABLE departments (
    department_id INT,
    department_name STRING,
    CONSTRAINT pk_dept PRIMARY KEY (department_id)
)
STORED AS ORC;

CREATE TABLE users (
    user_id INT,
    user_name STRING,
    department_id INT,
    CONSTRAINT pk_user PRIMARY KEY (user_id),
    CONSTRAINT fk_dept FOREIGN KEY (department_id) REFERENCES departments (department_id)
)
STORED AS ORC;
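Note that Hive does not enforce primary or foreign keys; they are informational metadata used by the optimizer. On real clusters this DDL typically needs DISABLE NOVALIDATE appended to each constraint, otherwise CREATE TABLE fails. A hedged variant for the departments table:

```sql
CREATE TABLE departments (
    department_id INT,
    department_name STRING,
    -- Hive stores the constraint as metadata only; DISABLE NOVALIDATE
    -- tells it not to enforce or validate the key.
    CONSTRAINT pk_dept PRIMARY KEY (department_id) DISABLE NOVALIDATE
)
STORED AS ORC;
```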

15

Problem #15: Table Schema Modification

Scenario: You already have a products table and need to add a new column category_id and change the data type of the existing price column.

Solution:

ALTER TABLE products ADD COLUMNS (category_id INT);
ALTER TABLE products CHANGE COLUMN price price DECIMAL(10, 2);
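If products is partitioned, ADD COLUMNS and CHANGE COLUMN by default only touch the table-level schema; existing partitions keep their old metadata. The CASCADE keyword pushes the change down:

```sql
-- CASCADE updates the metadata of all existing partitions as well,
-- so older partitions also expose the new column (returning NULL).
ALTER TABLE products ADD COLUMNS (category_id INT) CASCADE;
```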

16

Problem #16: Hive SQL Query

Scenario: Calculate and update the average sales for each product category in a sales_summary table.

Solution:

INSERT OVERWRITE TABLE sales_summary
SELECT category_id, AVG(sales_amount)
FROM sales
GROUP BY category_id;

17

Problem #17: Loading Data into a Hive Table

Scenario: Load data into a transactions table from a CSV file located in HDFS.

Solution:

LOAD DATA INPATH '/path/to/transactions.csv' INTO TABLE transactions;
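Two variants worth knowing: INPATH moves files that are already in HDFS into the table's directory, while LOCAL INPATH copies from the client's local filesystem, and OVERWRITE replaces the table's current contents instead of appending. A sketch combining both:

```sql
-- Copy a file from the local machine and replace any existing data.
LOAD DATA LOCAL INPATH '/path/to/transactions.csv'
OVERWRITE INTO TABLE transactions;
```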

18

Problem #18: Filtering, Aggregation, and Join

Scenario: Retrieve the total sales by department from a sales table and a departments table.

Solution:

SELECT d.department_name, SUM(s.amount) AS total_sales
FROM sales s
JOIN departments d
ON s.department_id = d.department_id
GROUP BY d.department_name;

19

Problem #19: Temporary Tables

Scenario: Create a temporary table to hold daily sales data for analysis within a session.

Solution:

CREATE TEMPORARY TABLE temp_daily_sales AS
SELECT transaction_date, SUM(amount) AS daily_total
FROM sales
GROUP BY transaction_date;

20

Problem #20: Creating and Using Views

Scenario: Create a view to simplify access to customer demographics data without exposing sensitive details like personal IDs or payment methods.

Solution:

CREATE VIEW customer_demographics AS
SELECT customer_name, age, region
FROM customers;
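The view can then be queried like any table; Hive expands it into the underlying SELECT at query time. A sketch with a hypothetical aggregate:

```sql
-- Counts customers per region without ever touching the
-- sensitive columns excluded from the view.
SELECT region, COUNT(*) AS customers
FROM customer_demographics
GROUP BY region;
```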

21

Problem #21: Configuring Schema Evolution for Avro

Avro supports schema evolution out of the box with Hive. When using Avro, the schema is stored alongside the data, which helps Hive manage changes seamlessly. However, to explicitly enable and manage Avro schema evolution, you can point the table at an external schema file with a table property:

Solution:

CREATE TABLE avro_table (
    id INT,
    name STRING
)
STORED AS AVRO
TBLPROPERTIES ('avro.schema.url' = 'hdfs://path/to/schema/file');
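The file behind avro.schema.url is a plain Avro schema in JSON. A minimal sketch (the first two fields mirror the table above; the email field is a hypothetical addition, shown with a null default because that is how a new column is introduced without breaking readers of older files):

```json
{
  "type": "record",
  "name": "avro_table",
  "fields": [
    {"name": "id",    "type": "int"},
    {"name": "name",  "type": "string"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```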

22

Problem #22: Configuring Schema Evolution for ORC

ORC supports schema evolution through its columnar format and metadata storage capabilities. To manage schema changes, you might need to adjust the following Hive configuration settings:

SET hive.exec.orc.split.strategy = ETL;
SET hive.exec.orc.schema.evolution = true;

hive.exec.orc.split.strategy: setting this to ETL optimizes reading of ORC files that might have evolved schemas.

hive.exec.orc.schema.evolution: enabling this allows Hive to handle changes in ORC file schemas over time.

Additionally, when creating ORC tables, consider enabling column renaming as part of schema evolution:

CREATE TABLE orc_table (
    id INT,
    first_name STRING
)
STORED AS ORC
TBLPROPERTIES ('orc.schema.evolution.case.sensitive' = 'false', 'orc.column.renames.allowed' = 'true');

23

Problem #23: Configuring Schema Evolution for PARQUET

Parquet also supports schema evolution to a degree, especially the addition of new columns. To use Parquet effectively with schema evolution in Hive, ensure that your Hive version and settings align with Parquet's capabilities:

CREATE TABLE parquet_table (
    id INT,
    name STRING
)
STORED AS PARQUET;

For schema evolution in Parquet, the changes are mostly handled transparently by Hive, but you can ensure better management with configurations like:

SET parquet.enable.dictionary = true;
