# Data Re-run Function
# I. Overview
The data re-run function queries data in the TA database with SQL statements and writes the returned results back into the TA database to generate new events or new user features.
# II. Instructions for Use
# 2.1 Command Description
Log in to any TA server and run `su - ta` to switch to the TA user.
Run `ta-tool user_event_import -conf` to read the configuration file. The command is as follows:
ta-tool user_event_import -conf <config file> [--date data date]
# 2.2 Command Parameter Description
# 2.2.1 -conf
Required. The path of the data re-run task configuration file. Wildcards are supported, for example `/data/config/*` or `./config/*.json`.
# 2.2.2 --date
Optional. The data date, which is the reference time used when time macros are replaced. If it is omitted, the current date is used. The format is YYYY-MM-DD. For details on time macros, see section 2.4 Time Macro Use.
# 2.2.3 Example
ta-tool user_event_import -conf /data/home/ta/import_configs/*.json
The parameter is the full path of the configuration file. A wildcard can be used to read multiple configuration files.
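The command line above can also be assembled programmatically, for example from a scheduler script. The following is a minimal sketch, not part of ta-tool: the helper name `build_rerun_command` is hypothetical, and it only builds the argument list described in section 2.2 and checks that `--date` uses the YYYY-MM-DD format. Whether the wildcard is expanded by the shell or by ta-tool itself is not stated here, so the path is passed through unchanged.

```python
import re
from datetime import datetime


def build_rerun_command(conf_path, data_date=None):
    """Return the argument list for `ta-tool user_event_import` (hypothetical helper)."""
    argv = ["ta-tool", "user_event_import", "-conf", conf_path]
    if data_date is not None:
        # --date is documented as YYYY-MM-DD; reject anything else early.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", data_date):
            raise ValueError("data date must be formatted as YYYY-MM-DD")
        datetime.strptime(data_date, "%Y-%m-%d")  # also rejects dates such as 2018-06-31
        argv += ["--date", data_date]
    return argv


print(" ".join(build_rerun_command("/data/home/ta/import_configs/*.json", "2018-07-01")))
```

Validating the date before invoking the tool keeps a malformed value from silently changing how the time macros are replaced.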
# 2.3 Description of Configuration File
# 2.3.1 Sample Configuration File
The core of the data re-run function is a configuration file that contains the query statement and the configuration parameters. Each configuration file corresponds to one data re-run task. The configuration file for a backtracking event looks like this:
{
  "event_desc": {
    "ltv_event": "User Life Cycle"
  },
  "appid": "APPID",
  "type": "event",
  "property_desc": {
    "register_date": "registration date",
    "date_prop": "LTV Days"
  },
  "sql": "SELECT 'thinkinggame' \"#account_id\",'ltv_event' \"#event_name\",register_date \"#time\",register_date,ltv,date_prop FROM (SELECT recharge_money ltv,register_date,CASE date_trunc('day', cast(register_date AS TIMESTAMP)) WHEN CURRENT_DATE - interval '1' DAY THEN 'Day 2' WHEN CURRENT_DATE - interval '2' DAY THEN 'Day 3' WHEN CURRENT_DATE - interval '6' DAY THEN 'Day 7' WHEN CURRENT_DATE - interval '13' DAY THEN 'Day 14' WHEN CURRENT_DATE - interval '29' DAY THEN 'Day 30' ELSE NULL END date_prop FROM (SELECT sum(recharge_money) recharge_money ,register_date FROM (SELECT \"#user_id\" ,sum(recharge_value) recharge_money ,\"$part_date\" recharge_date FROM v_event_0 WHERE \"$part_event\" = 'recharge' AND \"$part_date\" > '2018-06-30' AND \"$part_date\" < '2018-07-30' GROUP BY \"#user_id\" , \"$part_date\") a LEFT JOIN (SELECT \"#user_id\" , \"$part_date\" register_date FROM v_event_0 WHERE \"$part_event\" = 'player_register' AND \"$part_date\" > '2018-06-30' AND \"$part_date\" < '2018-07-30') b ON a.\"#user_id\" = b.\"#user_id\" WHERE b.\"#user_id\" IS NOT NULL AND recharge_date >= register_date GROUP BY register_date) c) d WHERE date_prop IS NOT NULL"
}
::: tips
When you need to backtrack a List-type attribute, note that the underlying storage keeps a list as a string separated by `\t` (tab), so you need to convert it as follows:
`split("arrayColumn", chr(0009)) as arrayColumn`
:::
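To make the storage format in the tip concrete, here is a small illustrative sketch in Python (not TA code). It assumes only what the tip states: a list is stored as one string whose elements are joined by the tab character, which is the character with code point 9.

```python
TAB = chr(9)  # code point 9 is the tab character, i.e. "\t"


def split_stored_list(stored):
    """Split a tab-joined list value into its elements."""
    return stored.split(TAB)


print(split_stored_list("sword\tshield\tpotion"))  # ['sword', 'shield', 'potion']
```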
# 2.3.2 Description of configuration parameters
Each configuration file is represented as JSON, and the following is the meaning of each element:
- `event_desc`
  - Description: Optional. A JSON object that sets the display names of the new events
  - Key: event name
  - Value: display name
- `appid`
  - Description: Required. The APPID of the target project that the query results are written to
- `type`
  - Description: Required. Whether to write to the `event` table or the `user` table of the target project
  - Available values: `event`, `user`
- `property_desc`
  - Description: Optional. A JSON object that sets the display names of the attributes
  - Key: attribute name
  - Value: display name
- `sql`
  - Description: Required. A string containing the query statement. Note that the column names of the returned result determine the meaning of each column. The result must contain the following columns:
    - Required column 1: `#account_id` or `#distinct_id`. At least one of them must be present. They correspond to the account ID and the anonymous ID of the triggering user. If the generated event is not triggered by an individual user, such as the LTV event in the example above, it is recommended to use a fixed value outside the ID rules, such as 'system' or 'admin'
    - Required column 2: `#event_name`, the event type. Setting a value is recommended
    - Required column 3: `#time`, the time the event occurred. The format must be `yyyy-MM-dd HH:mm:ss` or `yyyy-MM-dd HH:mm:ss.SSS`
All columns other than those listed above are used as attributes of the event, with the column name as the attribute name.
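Before a task is run, the rules above can be checked mechanically. The following is a rough sketch under stated assumptions: `check_config` is a hypothetical helper, not part of ta-tool, and it detects the required columns with a simple text search for the quoted column name in the SQL string, which is a heuristic rather than real SQL parsing.

```python
ID_COLUMNS = ("#account_id", "#distinct_id")


def check_config(config):
    """Return a list of problems found in a re-run config dict (empty if none)."""
    problems = []
    if not config.get("appid"):
        problems.append("appid is required")
    kind = config.get("type")
    if kind not in ("event", "user"):
        problems.append("type must be 'event' or 'user'")
    for key in ("event_desc", "property_desc"):
        if key in config and not isinstance(config[key], dict):
            problems.append(key + " must be a JSON object")
    sql = config.get("sql")
    if not isinstance(sql, str) or not sql.strip():
        problems.append("sql is required")
        return problems

    def has_column(name):
        # Heuristic: the column appears as a double-quoted name in the query.
        return '"' + name + '"' in sql

    if not any(has_column(name) for name in ID_COLUMNS):
        problems.append("sql must return #account_id or #distinct_id")
    if not has_column("#time"):
        problems.append("sql must return #time")
    if kind == "event" and not has_column("#event_name"):
        problems.append("sql must return #event_name for event tasks")
    return problems


sample = {
    "appid": "APPID",
    "type": "event",
    "sql": "SELECT 'system' \"#account_id\", 'ltv_event' \"#event_name\", "
           "register_date \"#time\", ltv FROM some_table",
}
print(check_config(sample))  # []
```

A text search cannot tell whether a column is actually part of the outermost result, so a clean result here is only a first check.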
Besides backtracking events, the query results can also be used to generate new user features or to overwrite existing ones. The configuration file looks like this:
{
  "appid": "8d1820678a064397bbfcc9732f352e75",
  "type": "user",
  "property_desc": {
    "user_level": "User Level",
    "coin_num": "Gold coin stock"
  },
  "sql": "select \"#account_id\",localtimestamp \"#time\",user_level,coin_num from v_user_0"
}
Like the configuration file for a backtracking event, this file is written in JSON. It differs from the backtracking event configuration as follows:
- `event_desc` is not needed
- `type` is `user`
- The required columns in `sql` do not include `#event_name`. Only the following two are needed:
  - Required column 1: `#account_id` or `#distinct_id`, at least one of them
  - Required column 2: `#time`, indicating the time
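Because the query must be embedded in a JSON string, every double quote around a column name has to be escaped as `\"`. One way to avoid escaping by hand is to generate the file. The sketch below assumes nothing beyond the fields described above; the helper name `render_user_config` is hypothetical, and "APPID" is a placeholder.

```python
import json


def render_user_config(appid, sql, property_desc=None):
    """Serialize a user-type re-run config; json.dumps escapes the quotes in sql."""
    config = {"appid": appid, "type": "user", "sql": sql}
    if property_desc:
        config["property_desc"] = property_desc
    # ensure_ascii=False keeps non-ASCII display names readable in the file.
    return json.dumps(config, indent=2, ensure_ascii=False)


text = render_user_config(
    "APPID",
    'select "#account_id",localtimestamp "#time",user_level,coin_num from v_user_0',
    {"user_level": "User Level", "coin_num": "Gold coin stock"},
)
print(text)
```

The printed text can be saved as a `.json` file and passed to the `-conf` parameter.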
# 2.4 Time Macro Use
You can use time macros in place of time parameters inside the data re-run task configuration file. When the data re-run command is executed, ta-tool uses `--date` as the reference time, calculates the time offset from the parameters of each time macro, and replaces the macro in the configuration file with the result. Supported time macro formats include `@[{yyyyMMdd}]`, `@[{yyyyMMdd} - {nday}]`, and `@[{yyyyMMdd} + {nday}]`.
- `yyyyMMdd` can be replaced with any date format that Java `DateFormat` can parse, for example `yyyy-MM-dd HH:mm:ss.SSS` or `yyyyMMddHH000000`
- `n` can be any integer and represents the size of the time offset
- `day` is the unit of the time offset and can be one of `day`, `hour`, `minute`, `week`, `month`
- Example: suppose the current time is `2018-07-01 15:13:23.234`
  - `@[{yyyyMMdd}]` is replaced with `20180701`
  - `@[{yyyy-MM-dd} - {1day}]` is replaced with `2018-06-30`
  - `@[{yyyyMMddHH} + {2hour}]` is replaced with `2018070117`
  - `@[{yyyyMMddHHmm00} - {10minute}]` is replaced with `20180701150300`
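The replacement rule can be sketched in a few lines of Python. This is an illustration of the behaviour described in this section, not ta-tool's actual implementation. It assumes a small subset of the Java pattern letters (`yyyy`, `MM`, `dd`, `HH`, `mm`, `ss`, `SSS`), treats all other characters as literals, and clamps month offsets to the last day of the target month. All function names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta

# @[{pattern}] optionally followed by +/- {<n><unit>}
MACRO = re.compile(
    r"@\[\{([^{}]+)\}"
    r"(?:\s*([+-])\s*\{\s*(\d+)\s*(day|hour|minute|week|month)\s*\})?"
    r"\s*\]"
)
TOKEN = re.compile(r"yyyy|MM|dd|HH|mm|ss|SSS")


def shift_months(moment, months):
    """Move by whole months, clamping the day (an assumption of this sketch)."""
    index = moment.year * 12 + (moment.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return moment.replace(year=year, month=month0 + 1, day=min(moment.day, last_day))


def render(pattern, moment):
    """Format a datetime using the supported subset of Java pattern letters."""
    values = {
        "yyyy": f"{moment.year:04d}", "MM": f"{moment.month:02d}",
        "dd": f"{moment.day:02d}", "HH": f"{moment.hour:02d}",
        "mm": f"{moment.minute:02d}", "ss": f"{moment.second:02d}",
        "SSS": f"{moment.microsecond // 1000:03d}",
    }
    return TOKEN.sub(lambda m: values[m.group(0)], pattern)


def expand_time_macros(text, reference):
    """Replace every time macro in text, using reference as the base time."""
    def replace(match):
        pattern, sign, amount, unit = match.groups()
        moment = reference
        if sign is not None:
            delta = int(amount) if sign == "+" else -int(amount)
            if unit == "month":
                moment = shift_months(reference, delta)
            else:
                moment = reference + timedelta(**{unit + "s": delta})
        return render(pattern, moment)
    return MACRO.sub(replace, text)


now = datetime(2018, 7, 1, 15, 13, 23, 234000)
print(expand_time_macros("\"$part_date\" > '@[{yyyy-MM-dd} - {30day}]'", now))
```

With the reference time from the example, the printed condition compares `"$part_date"` with `'2018-06-01'`, which shows how a macro can replace a hard-coded date in the `sql` field.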