
# External User Attribute Association Import Function

# I. Introduction

In some cases, you need to import external user data into the TA cluster, but the user ID in that data is neither #account_id nor #distinct_id in the TA system; for example, the data uses a mobile phone number, an ID number, or some other identifier as its primary key. To import such data into the TA system as user features, use the update_user_by_foreignkey command to define the association and update the external user features in the TA system. Every data source supported by DataX can be used.

# II. Instructions for Use

# 2.1 Command Description

The command for data import is as follows:

ta-tool update_user_by_foreignkey -conf <config files> [--date xxx]

# 2.2 Command Parameter Description

# 2.2.1 -conf

The value is the path of the import task's configuration file. Each task has its own configuration file. Multiple tasks can be imported at the same time, and wildcards are supported, for example: /data/config/* or ./config/*.json

# 2.2.2 --date

Optional. This parameter specifies the data date, which is the reference time used when time macros are replaced. If it is omitted, the current date is used by default. The format is YYYY-MM-DD. For details on time macros, see section 2.3, Time Macros Usage.
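As a minimal sketch of how the two parameters fit together, the following Python helper assembles the argument list for the command shown in section 2.1. The function name build_command is hypothetical, and the strict date check is an assumption about what a caller might want to verify before invoking ta-tool; it does not describe how ta-tool itself validates its input.

```python
import re
from datetime import datetime


def build_command(conf, data_date=None):
    """Assemble the argument list for ta-tool update_user_by_foreignkey."""
    args = ["ta-tool", "update_user_by_foreignkey", "-conf", conf]
    if data_date is not None:
        # Require the exact YYYY-MM-DD shape, then check it is a real date.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", data_date):
            raise ValueError("date must use the YYYY-MM-DD format")
        datetime.strptime(data_date, "%Y-%m-%d")  # raises ValueError if invalid
        args += ["--date", data_date]
    return args


print(" ".join(build_command("./config/*.json", "2018-07-01")))
```

The helper only builds the arguments and does not run anything, so it can be used to preview a command before handing it to a scheduler or a shell.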

# 2.3 Time Macros Usage

Inside the configuration file, you can use time macros in place of time parameters. ta-tool takes the import start time as the reference, calculates the time offset from the parameters of each time macro, and replaces the macro in the configuration file with the result. Supported time macro formats: @[{yyyyMMdd}], @[{yyyyMMdd} - {nday}], @[{yyyyMMdd} + {nday}], etc.

  • yyyyMMdd can be replaced with any date format that Java DateFormat can parse, for example: yyyy-MM-dd HH:mm:ss.SSS, yyyyMMddHH000000
  • n can be any integer and represents the size of the time offset
  • day is the unit of the time offset and can be one of: day, hour, minute, week, month
  • Example: suppose the current time is 2018-07-01 15:13:23.234
    • @[{yyyyMMdd}] is replaced with 20180701
    • @[{yyyy-MM-dd} - {1day}] is replaced with 2018-06-30
    • @[{yyyyMMddHH} + {2hour}] is replaced with 2018070117
    • @[{yyyyMMddHHmm00} - {10minute}] is replaced with 20180701150300
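The replacement rules above can be sketched in Python as follows. This is an illustration of the described behavior, not ta-tool's implementation: the function names are hypothetical, only a small subset of Java pattern letters is handled (yyyy, MM, dd, HH, mm, ss, SSS), and clamping a month offset to the last day of a shorter month is an assumption. The table name t and column dt in the usage line are also made up.

```python
import calendar
import re
from datetime import datetime, timedelta

# Matches @[{pattern}] optionally followed by +/- {<n><unit>}.
MACRO = re.compile(
    r"@\[\{([^}]+)\}(?:\s*([+-])\s*\{(\d+)(day|hour|minute|week|month)\})?\]"
)


def format_java_style(moment, pattern):
    """Format a datetime using a small subset of Java pattern letters."""
    values = {
        "yyyy": f"{moment.year:04d}",
        "MM": f"{moment.month:02d}",
        "dd": f"{moment.day:02d}",
        "HH": f"{moment.hour:02d}",
        "mm": f"{moment.minute:02d}",
        "ss": f"{moment.second:02d}",
        "SSS": f"{moment.microsecond // 1000:03d}",
    }
    # Anything that is not a known token (digits, dashes, ...) stays literal.
    return re.sub(r"yyyy|SSS|MM|dd|HH|mm|ss", lambda m: values[m.group(0)], pattern)


def shift(moment, amount, unit):
    """Move a datetime by a signed number of units."""
    if unit == "month":
        index = moment.year * 12 + (moment.month - 1) + amount
        year, month = divmod(index, 12)
        month += 1
        # Assumption: clamp to the last valid day of the target month.
        day = min(moment.day, calendar.monthrange(year, month)[1])
        return moment.replace(year=year, month=month, day=day)
    return moment + timedelta(**{unit + "s": amount})


def expand_macros(text, base):
    """Replace every time macro in text, using base as the reference time."""
    def replace(match):
        pattern, sign, amount, unit = match.groups()
        moment = base
        if sign:
            offset = int(amount) if sign == "+" else -int(amount)
            moment = shift(base, offset, unit)
        return format_java_style(moment, pattern)
    return MACRO.sub(replace, text)


base = datetime(2018, 7, 1, 15, 13, 23, 234000)
print(expand_macros("SELECT * FROM t WHERE dt = '@[{yyyy-MM-dd} - {1day}]'", base))
# SELECT * FROM t WHERE dt = '2018-06-30'
```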

# III. Function Description

# 3.1 Sample Configuration

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "username",
                        "password": "password",
                        "connection": [
                            {
                                "querySql": [
                                    "SELECT card_id, property1, property2, property3 FROM table1;"
                                ],
                                "jdbcUrl": [
                                    "jdbc:mysql://ip:port/database"
                                ]
                            }
                        ]
                    }
                },
                "writer": {
                    "parameter": {
                        "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
                        "column": [
                            {
                                "type": "string",
                                "name": "card_id"
                            },
                            {
                                "type": "string",
                                "name": "property1"
                            },
                            {
                                "type": "string",
                                "name": "property2"
                            },
                            {
                                "type": "double",
                                "name": "property3"
                            }
                        ],
                        "joinkey": {
                            "importDataKey": ["card_id"],
                            "taUserTableKey": ["card_id"]
                        }
                    }
                }
            }
        ]
    }
}
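To make the effect of the joinkey setting in this sample more concrete, here is a rough in-memory sketch of an update by foreign key. It is only an illustration under stated assumptions: users and imported rows are modeled as plain Python dictionaries, the keys in importDataKey and taUserTableKey are assumed to pair up by position, and the function name is hypothetical. How the TA system actually treats unmatched rows or duplicate keys is not described here.

```python
def update_users_by_foreign_key(users, rows, import_keys, user_keys):
    """Copy imported attributes onto every user whose join key matches.

    Returns the number of users that were updated.
    """
    def key_of(record, columns):
        return tuple(record.get(column) for column in columns)

    # Index the imported rows by their join key for quick lookup.
    rows_by_key = {key_of(row, import_keys): row for row in rows}

    updated = 0
    for user in users:
        row = rows_by_key.get(key_of(user, user_keys))
        if row is None:
            continue  # no external data for this user
        for name, value in row.items():
            if name not in import_keys:
                user[name] = value  # write the external feature onto the user
        updated += 1
    return updated


users = [
    {"#account_id": "u1", "card_id": "A100"},
    {"#account_id": "u2", "card_id": "B200"},
]
rows = [
    {"card_id": "A100", "property1": "gold", "property2": "east", "property3": 1.5},
]
print(update_users_by_foreign_key(users, rows, ["card_id"], ["card_id"]))
print(users[0])
```

In this sketch only the user whose card_id equals A100 receives property1, property2 and property3; the other user is left unchanged.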

# 3.2 Parameter Description

# 3.2.1 reader part

  • The reader is configured in the same way as any reader supported by DataX.

Refer to the DataX documentation for details.

# 3.2.2 writer part

  • appid
    • Description: The appid of the corresponding project.
    • Required: Yes
    • Default: None
  • column
    • Description: The list of fields to read. type specifies the data type; name specifies the column at the corresponding position in the reader output and is also the attribute name used when the data is imported into the TA system.

The column field information can be configured as follows:

[
  {
    "type": "double",
    "name": "property1"
  },
  {
    "type": "string",
    "name": "property2"
  },
  {
    "type": "bigint",
    "name": "property3"
  }
]
  • joinkey.importDataKey
    • Description: The column name, taken from the writer's column list in the configuration, that is used as the join key on the imported data side.
    • Required: Yes
    • Default: None
  • joinkey.taUserTableKey
    • Description: The column name in the TA system's user table that is used as the join key.
    • Required: Yes
    • Default: None
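The requirements listed for the writer part can be checked before a task is submitted. The sketch below is one possible pre-flight check, assuming the configuration has the same shape as the sample in section 3.1; the function name check_writer and the wording of the messages are hypothetical, and ta-tool may perform different or additional validation of its own.

```python
import json


def check_writer(config_text):
    """Return a list of problems found in the writer settings of a config."""
    config = json.loads(config_text)
    problems = []
    for index, item in enumerate(config["job"]["content"]):
        params = item["writer"]["parameter"]
        columns = {column["name"] for column in params.get("column", [])}
        joinkey = params.get("joinkey", {})
        import_keys = joinkey.get("importDataKey", [])
        user_keys = joinkey.get("taUserTableKey", [])

        # appid and both joinkey entries are marked as required.
        if not params.get("appid"):
            problems.append(f"content[{index}]: appid is required")
        if not import_keys:
            problems.append(f"content[{index}]: joinkey.importDataKey is required")
        if not user_keys:
            problems.append(f"content[{index}]: joinkey.taUserTableKey is required")

        # importDataKey must name columns declared in the writer's column list.
        for key in import_keys:
            if key not in columns:
                problems.append(
                    f"content[{index}]: importDataKey '{key}' is not in column"
                )
    return problems


sample = """
{"job": {"content": [{"writer": {"parameter": {
    "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
    "column": [{"type": "string", "name": "card_id"},
               {"type": "double", "name": "property3"}],
    "joinkey": {"importDataKey": ["card_id"], "taUserTableKey": ["card_id"]}
}}}]}}
"""
print(check_writer(sample))  # []
```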

# 3.3 Type Conversion

| DataX internal type | HIVE data type |
| --- | --- |
| Long | TINYINT, SMALLINT, INT, BIGINT |
| Double | FLOAT, DOUBLE |
| String | STRING, VARCHAR, CHAR |
| Boolean | BOOLEAN |
| Date | DATE, TIMESTAMP |
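The mapping listed above can be expressed as a simple lookup. The following sketch encodes exactly those pairs; the function name datax_type is hypothetical, and stripping a length suffix such as VARCHAR(32) before the lookup is an assumption added for convenience.

```python
# HIVE data type -> DataX internal type, as listed in section 3.3.
HIVE_TO_DATAX = {
    "TINYINT": "Long",
    "SMALLINT": "Long",
    "INT": "Long",
    "BIGINT": "Long",
    "FLOAT": "Double",
    "DOUBLE": "Double",
    "STRING": "String",
    "VARCHAR": "String",
    "CHAR": "String",
    "BOOLEAN": "Boolean",
    "DATE": "Date",
    "TIMESTAMP": "Date",
}


def datax_type(hive_type):
    """Return the DataX internal type for a HIVE type name."""
    # Assumption: ignore case and any "(length)" suffix, e.g. VARCHAR(32).
    base = hive_type.strip().upper().split("(")[0]
    if base not in HIVE_TO_DATAX:
        raise KeyError(f"type not listed in the conversion table: {hive_type}")
    return HIVE_TO_DATAX[base]


print(datax_type("varchar(32)"))  # String
```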