# External User Attribute Association Import Function
# I. Introduction
In some cases, you need to import external user data into the TA cluster, but the user ID in that data is neither #account_id nor #distinct_id in the TA system. For example, the data may use a mobile phone number, an ID card number, or another identifier as its primary key. To import such data into the TA system as user features, use the `update_user_by_foreignkey` command to define the association and update the external user features in the TA system. All data sources supported by DataX are currently supported.
# II. Instructions for Use
# 2.1 Command Description
The command for data import is as follows:

    ta-tool update_user_by_foreignkey -conf <config files> [--date xxx]
# 2.2 Command Parameter Description
# 2.2.1 -conf
The path of the import task's configuration file. Each task has its own configuration file. Multiple tasks can be imported at the same time, and wildcards are supported, for example `/data/config/*` or `./config/*.json`.
# 2.2.2 --date
Optional parameter **--date**: the data date. Time macros are replaced relative to this reference time. If it is not passed, the current date is used by default. The format is `YYYY-MM-DD`. For details on time macros, see Time Macros Usage (section 2.3).
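
As an illustration of the two parameters above, the following Python sketch assembles the argument list for the command and checks the `--date` format before anything is run. The helper name `build_import_command` is hypothetical and not part of ta-tool; only the command name and the `-conf` and `--date` parameters come from this document.

```python
import re
from datetime import datetime
from typing import List, Optional

# Strict YYYY-MM-DD shape (strptime alone would also accept "2018-7-1").
_DATE_SHAPE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def build_import_command(conf: str, date: Optional[str] = None) -> List[str]:
    """Build argv for: ta-tool update_user_by_foreignkey -conf <conf> [--date <date>].

    Hypothetical helper for illustration; it only assembles the arguments
    described in this document and does not run anything.
    """
    if not conf:
        raise ValueError("conf must be a non-empty path or wildcard pattern")
    argv = ["ta-tool", "update_user_by_foreignkey", "-conf", conf]
    if date is not None:
        if not _DATE_SHAPE.match(date):
            raise ValueError("date must use the format YYYY-MM-DD")
        # Raises ValueError for impossible dates such as 2018-06-31.
        datetime.strptime(date, "%Y-%m-%d")
        argv += ["--date", date]
    return argv


if __name__ == "__main__":
    print(build_import_command("./config/*.json", "2018-07-01"))
```

The resulting list could be handed to `subprocess.run(argv, check=True)`. Passed that way the wildcard reaches the tool unexpanded, which assumes the tool resolves it itself.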
# 2.3 Time Macros Usage
Inside the configuration file, you can use time macros in place of time parameters. ta-tool takes the import start time as the reference, calculates the time offset from the parameters of the time macro, and replaces the time macro in the configuration file. Supported time macro formats: `@[{yyyyMMdd}]`, `@[{yyyyMMdd} - {nday}]`, `@[{yyyyMMdd} + {nday}]`, etc.
- `yyyyMMdd` can be replaced with any date format that Java dateFormat can parse, for example: `yyyy-MM-dd HH:mm:ss.SSS`, `yyyyMMddHH000000`
- `n` can be any integer and represents the size of the time offset
- `day` represents the unit of the time offset, which can be one of: `day`, `hour`, `minute`, `week`, `month`
- Example: suppose the current time is `2018-07-01 15:13:23.234`
  - `@[{yyyyMMdd}]` is replaced with `20180701`
  - `@[{yyyy-MM-dd} - {1day}]` is replaced with `2018-06-30`
  - `@[{yyyyMMddHH} + {2hour}]` is replaced with `2018070117`
  - `@[{yyyyMMddHHmm00} - {10minute}]` is replaced with `20180701150300`
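
The replacement rule can be sketched in Python as follows. This is a minimal approximation, not ta-tool's implementation: it handles only the pattern letters `yyyy`, `MM`, `dd`, `HH`, `mm`, `ss` and `SSS`, treats every other character as a literal, and assumes that a `month` offset clamps the day to the end of a shorter month. The function names are hypothetical.

```python
import calendar
import re
from datetime import datetime, timedelta

# Subset of Java date-pattern letters handled by this sketch.
_FIELDS = {
    "yyyy": lambda t: f"{t.year:04d}",
    "MM": lambda t: f"{t.month:02d}",
    "dd": lambda t: f"{t.day:02d}",
    "HH": lambda t: f"{t.hour:02d}",
    "mm": lambda t: f"{t.minute:02d}",
    "ss": lambda t: f"{t.second:02d}",
    "SSS": lambda t: f"{t.microsecond // 1000:03d}",
}
_TOKEN_RE = re.compile(r"yyyy|SSS|MM|dd|HH|mm|ss")
# @[{pattern}] optionally followed by "+ {n unit}" or "- {n unit}".
_MACRO_RE = re.compile(
    r"@\[\{([^}]*)\}(?:\s*([+-])\s*\{(\d+)(day|hour|minute|week|month)\})?\]"
)
_DELTA_ARG = {"day": "days", "hour": "hours", "minute": "minutes", "week": "weeks"}


def _add_months(t: datetime, months: int) -> datetime:
    """Shift by whole months; clamping the day is an assumption of this sketch."""
    index = t.year * 12 + (t.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(t.day, calendar.monthrange(year, month0 + 1)[1])
    return t.replace(year=year, month=month0 + 1, day=day)


def _format(pattern: str, t: datetime) -> str:
    """Replace known pattern letters; anything else is copied unchanged."""
    return _TOKEN_RE.sub(lambda m: _FIELDS[m.group(0)](t), pattern)


def expand_time_macros(text: str, reference: datetime) -> str:
    """Replace every time macro in `text` relative to `reference`."""
    def replace(match: "re.Match") -> str:
        pattern, sign, n, unit = match.groups()
        t = reference
        if sign:
            amount = int(n) if sign == "+" else -int(n)
            if unit == "month":
                t = _add_months(t, amount)
            else:
                t = t + timedelta(**{_DELTA_ARG[unit]: amount})
        return _format(pattern, t)

    return _MACRO_RE.sub(replace, text)


if __name__ == "__main__":
    now = datetime(2018, 7, 1, 15, 13, 23, 234000)
    print(expand_time_macros("@[{yyyy-MM-dd} - {1day}]", now))
    print(expand_time_macros("@[{yyyyMMddHH} + {2hour}]", now))
```

Because the macro is replaced wherever it appears in the text, the same function also works on a whole `querySql` string that contains a macro inside a date filter.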
# III. Function Description
# 3.1 Sample Configuration
    {
      "job": {
        "content": [
          {
            "reader": {
              "name": "mysqlreader",
              "parameter": {
                "username": "username",
                "password": "password",
                "connection": [
                  {
                    "querySql": [
                      "SELECT card_id, property1, property2, property3 FROM table1;"
                    ],
                    "jdbcUrl": [
                      "jdbc:mysql://ip:port/database"
                    ]
                  }
                ]
              }
            },
            "writer": {
              "parameter": {
                "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
                "column": [
                  {
                    "type": "string",
                    "name": "card_id"
                  },
                  {
                    "type": "string",
                    "name": "property1"
                  },
                  {
                    "type": "string",
                    "name": "property2"
                  },
                  {
                    "type": "double",
                    "name": "property3"
                  }
                ],
                "joinkey": {
                  "importDataKey": ["card_id"],
                  "taUserTableKey": ["card_id"]
                }
              }
            }
          }
        ]
      }
    }
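
To make the role of `joinkey` in this sample concrete, here is a conceptual Python sketch of a foreign-key update. It is only an illustration under stated assumptions: in-memory lists stand in for the imported rows and the TA user table, rows without a matching user are collected and returned rather than written, and the join-key column itself is not copied onto the user. None of this is confirmed as ta-tool's actual behavior, and the function name is hypothetical.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def apply_external_attributes(
    users: List[Record],
    rows: List[Record],
    import_keys: List[str],
    user_keys: List[str],
) -> List[Record]:
    """Copy attributes from `rows` onto the user whose key columns match.

    `import_keys` plays the role of joinkey.importDataKey and `user_keys`
    the role of joinkey.taUserTableKey. Returns the rows that matched no user
    (an assumption of this sketch, not documented tool behavior).
    """
    # Index users by the tuple of their join-key values.
    index = {}
    for user in users:
        key = tuple(user.get(k) for k in user_keys)
        if None not in key:
            index[key] = user

    unmatched = []
    for row in rows:
        user = index.get(tuple(row.get(k) for k in import_keys))
        if user is None:
            unmatched.append(row)
            continue
        for name, value in row.items():
            if name not in import_keys:
                user[name] = value  # update the user feature in place
    return unmatched


if __name__ == "__main__":
    users = [{"#account_id": "a1", "card_id": "C100"}]
    rows = [{"card_id": "C100", "property1": "x", "property2": "y", "property3": 1.5}]
    apply_external_attributes(users, rows, ["card_id"], ["card_id"])
    print(users[0])
```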
# 3.2 Parameter Description
# 3.2.1 reader part
- The reader configuration is the same as for any reader supported by DataX.
# 3.2.2 writer part
- appid
  - Description: The appid of the corresponding project.
  - Required: Yes
  - Default: None
- column
  - Description: The list of fields to read. `type` specifies the data type; `name` specifies the column at the corresponding position in the reader and is also the attribute name used when importing into the TA system. Column field information is configured as follows:

        [
          {
            "type": "double",
            "name": "property1"
          },
          {
            "type": "string",
            "name": "property2"
          },
          {
            "type": "bigint",
            "name": "property3"
          }
        ]
- joinkey.importDataKey
  - Description: The name of the column, taken from the writer's `column` list in the configuration, that is used as the association key on the imported data side.
  - Required: Yes
  - Default: None
- joinkey.taUserTableKey
  - Description: The name of the column in the TA system's user table that is used as the association key.
  - Required: Yes
  - Default: None
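
The required writer parameters listed above can be checked before a task is submitted. The following Python sketch is a hypothetical pre-flight check, not part of ta-tool: it verifies that `appid`, `column`, `joinkey.importDataKey` and `joinkey.taUserTableKey` are present, and that each `importDataKey` entry names a column declared in the writer.

```python
import json
from typing import Any, Dict, List


def validate_writer_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the writer sections (empty if none)."""
    problems = []
    contents = config.get("job", {}).get("content") or []
    if not contents:
        problems.append("job.content is missing or empty")
    for i, item in enumerate(contents):
        where = f"content[{i}].writer.parameter"
        params = item.get("writer", {}).get("parameter") or {}

        if not params.get("appid"):
            problems.append(f"{where}.appid is required")

        columns = params.get("column") or []
        if not columns:
            problems.append(f"{where}.column is required")
        for j, col in enumerate(columns):
            if not col.get("type") or not col.get("name"):
                problems.append(f"{where}.column[{j}] needs both type and name")
        names = {col.get("name") for col in columns}

        joinkey = params.get("joinkey") or {}
        import_keys = joinkey.get("importDataKey") or []
        if not import_keys:
            problems.append(f"{where}.joinkey.importDataKey is required")
        if not joinkey.get("taUserTableKey"):
            problems.append(f"{where}.joinkey.taUserTableKey is required")
        # The import-side key must be one of the writer's declared columns.
        for key in import_keys:
            if key not in names:
                problems.append(f"{where}.joinkey.importDataKey '{key}' is not in column")
    return problems


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        for problem in validate_writer_config(json.load(handle)):
            print(problem)
```

Run as a script with a configuration file path as its only argument, it prints one line per problem and nothing when the writer section passes these checks.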
# 3.3 Type Conversion
| DataX internal type | HIVE data type |
|---|---|
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, VARCHAR, CHAR |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |
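
The table can also be expressed as a lookup, which is convenient when checking a list of column types in a script. The Python sketch below encodes exactly the rows above. The helper name `datax_internal_type` is hypothetical, and stripping a length suffix such as `(64)` is a convenience of this sketch rather than documented behavior.

```python
# Mapping taken row by row from the type conversion table.
HIVE_TO_DATAX = {
    "TINYINT": "Long",
    "SMALLINT": "Long",
    "INT": "Long",
    "BIGINT": "Long",
    "FLOAT": "Double",
    "DOUBLE": "Double",
    "STRING": "String",
    "VARCHAR": "String",
    "CHAR": "String",
    "BOOLEAN": "Boolean",
    "DATE": "Date",
    "TIMESTAMP": "Date",
}


def datax_internal_type(hive_type: str) -> str:
    """Return the DataX internal type for a HIVE type name (case-insensitive)."""
    # Drop an optional length/precision suffix, e.g. "varchar(64)" -> "VARCHAR".
    base = hive_type.strip().upper().split("(")[0].strip()
    try:
        return HIVE_TO_DATAX[base]
    except KeyError:
        raise ValueError(f"type not listed in the conversion table: {hive_type!r}") from None


if __name__ == "__main__":
    for name in ("string", "double", "bigint"):
        print(name, "->", datax_internal_type(name))
```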