
# TaDataWriter Plug-ins

# I. Introduction

TaDataWriter lets DataX transfer data to a TA cluster. The data is sent to the TA receiver.

# II. Functions and Limitations

TaDataWriter converts records from the DataX protocol into the TA cluster's internal data format. It has the following functions and limitations:

  1. Writes to TA clusters only.
  2. Supports data compression. The available compression formats are gzip, lzo, lz4 and snappy.
  3. Supports multi-threaded transmission.
  4. Runs on TA nodes only.

# III. Function Description

# 3.1 Sample Configuration

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "value": "ABCDEFG-123-abc",
                "type": "string"
              },
              {
                "value": "F53A58ED-E5DA-4F18-B082-7E1228746E88",
                "type": "string"
              },
              {
                "value": "login",
                "type": "string"
              },
              {
                "value": "2020-01-01 01:01:01",
                "type": "date"
              },
              {
                "value": "abcdefg",
                "type": "string"
              },
              {
                "value": "2019-08-08 08:08:08",
                "type": "date"
              },
              {
                "value": 123456,
                "type": "long"
              },
              {
                "value": true,
                "type": "bool"
              }
            ],
            "sliceRecordCount": 1000
          }
        },
        "writer": {
          "name": "ta-data-writer",
          "parameter": {
            "type": "track",
            "appid": "34c703a885014208a737911748a7b51c",
            "column": [
              {
                "index": "0",
                "colTargetName": "#account_id",
                "type": "string"
              },
              {
                "index": "1",
                "colTargetName": "#distinct_id"
              },
              {
                "index": "2",
                "colTargetName": "#event_name"
              },
              {
                "index": "3",
                "colTargetName": "#time",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "4",
                "colTargetName": "testString",
                "type": "string"
              },
              {
                "index": "5",
                "colTargetName": "testDate",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "6",
                "colTargetName": "testLong",
                "type": "number"
              },
              {
                "index": "7",
                "colTargetName": "testBoolean",
                "type": "boolean"
              },
              {
                "colTargetName": "add_clo",
                "value": "addFlag",
                "type": "string"
              }
            ]
          }
        }
      }
    ]
  }
}
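
To make the sample concrete, the sketch below shows how the writer's column entries could pick values out of one reader row: an entry with index copies the reader column at that position, and an entry with only value emits a constant. The function name map_row is hypothetical, the column list is a subset of the sample, and this only illustrates the configuration semantics; it is not the plug-in's actual implementation.

```python
# Illustrative only: mimics how the writer's "column" entries select values
# from one reader row. This is not TaDataWriter's real code.

def map_row(reader_row, writer_columns):
    """Build {colTargetName: value} for a single reader row."""
    record = {}
    for col in writer_columns:
        if "index" in col:
            # "index" is the 0-based position in the reader's column list.
            # The sample config writes it as a string, so coerce to int.
            record[col["colTargetName"]] = reader_row[int(col["index"])]
        else:
            # No index: "value" is a constant emitted for every row.
            record[col["colTargetName"]] = col["value"]
    return record


# One row as produced by the streamreader in the sample configuration.
reader_row = [
    "ABCDEFG-123-abc",
    "F53A58ED-E5DA-4F18-B082-7E1228746E88",
    "login",
    "2020-01-01 01:01:01",
    "abcdefg",
    "2019-08-08 08:08:08",
    123456,
    True,
]

# A subset of the writer columns from the sample configuration.
writer_columns = [
    {"index": "0", "colTargetName": "#account_id", "type": "string"},
    {"index": "2", "colTargetName": "#event_name"},
    {"index": "6", "colTargetName": "testLong", "type": "number"},
    {"colTargetName": "add_clo", "value": "addFlag", "type": "string"},
]

record = map_row(reader_row, writer_columns)
print(record)
```

The add_clo entry never touches the reader row, which is why it can be added without changing the reader's column list.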

# 3.2 Parameter Description

  • type
    • Description: The type of data to write: user_set or track.
    • Required: Yes
    • Default: None
  • appid
    • Description: The appid of the target project.
    • Required: Yes
    • Default: None
  • thread
    • Description: Number of threads.
    • Required: No
    • Default: 3
  • compress
    • Description: Text compression type. If this parameter is omitted, no compression is applied. Supported compression types are zip, lzo, lzop, tgz, bzip2.
    • Required: No
    • Default: No compression
  • connType
    • Description: How data is delivered inside the cluster: through the receiver, or sent directly to Kafka.
    • Required: No
    • Default: http
  • column
    • Description: The list of fields to write. type specifies the data type. index specifies the corresponding reader column (starting from 0). value marks the column as a constant: nothing is read from the reader, and the column is generated from the given value.

The user can specify the column field information, configured as follows:

[
  {
    "type": "Number",
    "colTargetName": "test_col", // name of the generated column
    "index": 0 // take the first reader column and write it as a Number field
  },
  {
    "type": "string",
    "value": "testvalue",
    "colTargetName": "test_col" // constant column: TaDataWriter generates the string "testvalue" for this field
  },
  {
    "index": 0,
    "type": "date",
    "colTargetName": "testDate",
    "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
  }
]
    • For each user-specified column, either index or value must be set. type is optional. When type is date, dateFormat can also be set, and it is optional as well.
    • Required: Yes
    • Default: all columns are read using the reader's types

# 3.3 Type Conversion

TaDataWriter maps DataX internal types to TaDataWriter data types as follows:

| DataX internal type | TaDataWriter data type |
| --- | --- |
| Int | Number |
| Long | Number |
| Double | Number |
| String | String |
| Boolean | Boolean |
| Date | Date |
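
The mapping above can be written as a small lookup, for example when generating writer column entries from a reader schema. This is a sketch only: TYPE_MAP and ta_type are hypothetical names, and the case-insensitive matching is an assumption made for convenience, not documented plug-in behavior.

```python
# Lookup built from the type conversion list above.
# Keys are lower-cased DataX internal type names.
TYPE_MAP = {
    "int": "Number",
    "long": "Number",
    "double": "Number",
    "string": "String",
    "boolean": "Boolean",
    "date": "Date",
}


def ta_type(datax_type):
    """Return the TaDataWriter data type for a DataX internal type name."""
    key = datax_type.strip().lower()  # assumption: match case-insensitively
    if key not in TYPE_MAP:
        raise ValueError(f"no documented mapping for type: {datax_type}")
    return TYPE_MAP[key]


print(ta_type("Long"))
print(ta_type("Date"))
```

Types that are not in the list raise an error instead of falling back to a guess, since this section does not define a mapping for them.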