# TaDataxWriter Plug-in

# Introduction

Ta-Datax-Writer is a DataX writer plug-in that transfers data to TA clusters within the DataX ecosystem. Deploy DataX on the data transmission server, then pair the reader plug-in for any data source that DataX supports with this plug-in to synchronize data between multiple data sources and TA clusters.

To learn more about DataX, visit DataX's GitHub homepage.

Data is transferred by sending it to the TA receiver.

# Functions and Limitations

TaDataWriter converts data from the DataX protocol into the internal data format of TA clusters. It has the following functions and limitations:

- Writes to TA clusters only.
- Supports data compression. The available compression formats are gzip, lzo, lz4, and snappy.
- Supports multi-threaded transmission.

# Instructions for Use

# 3.1 Download DataX

Visit the DataX official website, or download the DataX toolkit directly:

wget http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

# 3.2 Decompress DataX

tar -zxvf datax.tar.gz

# 3.3 Install ta-datax-writer plugin

Download the ta-datax-writer plug-in:

wget https://download.thinkingdata.cn/tools/release/ta-datax-writer.tar.gz

Copy ta-datax-writer.tar.gz to datax/plugin/writer:

cp ta-datax-writer.tar.gz datax/plugin/writer

Decompress the plug-in package:

tar -zxvf ta-datax-writer.tar.gz

Delete the package:

rm -rf ta-datax-writer.tar.gz

# Function Description

# 4.1 Sample configuration

{
  "job": {
    "setting": {
      "speed": {
        "channel": 1
      }
    },
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "column": [
              {
                "value": "123123",
                "type": "string"
              },
              {
                "value": "testbuy",
                "type": "string"
              },
              {
                "value": "2019-08-16 08:08:08",
                "type": "date"
              },
              {
                "value": "2222",
                "type": "string"
              },
              {
                "value": "2019-08-16 08:08:08",
                "type": "date"
              },
              {
                "value": "test",
                "type": "bytes"
              },
              {
                "value": true,
                "type": "bool"
              }
            ],
            "sliceRecordCount": 10
          }
        },
        "writer": {
          "name": "ta-datax-writer",
          "parameter": {
            "thread": 3,
            "type": "track",
            "pushUrl": "http://{data receiving address}",
            "appid": "6f9e64da5bc74792b9e9c1db4e3e3822",
            "column": [
              {
                "index": "0",
                "colTargetName": "#distinct_id"
              },
              {
                "index": "1",
                "colTargetName": "#event_name"
              },
              {
                "index": "2",
                "colTargetName": "#time",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "3",
                "colTargetName": "#account_id",
                "type": "string"
              },
              {
                "index": "4",
                "colTargetName": "testDate",
                "type": "date",
                "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
              },
              {
                "index": "5",
                "colTargetName": "os_1",
                "type": "string"
              },
              {
                "index": "6",
                "colTargetName": "testBoolean",
                "type": "boolean"
              },
              {
                "colTargetName": "add_clo",
                "value": "123123",
                "type": "string"
              }
            ]
          }
        }
      }
    ]
  }
}
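
Before running a job file like the one above, it can help to sanity-check it. The following is a minimal sketch, not part of DataX or the plug-in: `check_job` is a hypothetical helper that assumes the reader lists its columns under `parameter.column` (as `streamreader` does in the sample), and it checks only the writer parameters marked as required in section 4.2 plus the `index` values of the writer columns.

```python
REQUIRED_WRITER_PARAMS = ("pushUrl", "type", "appid")  # "Required: Yes" in section 4.2
ALLOWED_TYPES = ("track", "user_set")


def check_job(job: dict) -> list:
    """Return a list of human-readable problems found in a DataX job dict.

    Hypothetical helper for illustration; neither DataX nor the plug-in ships it.
    """
    problems = []
    for n, content in enumerate(job["job"]["content"]):
        reader_cols = content["reader"]["parameter"].get("column", [])
        writer_param = content["writer"]["parameter"]

        for key in REQUIRED_WRITER_PARAMS:
            if key not in writer_param:
                problems.append(f"content[{n}]: missing required writer parameter '{key}'")

        if writer_param.get("type") not in ALLOWED_TYPES:
            problems.append(f"content[{n}]: type must be one of {ALLOWED_TYPES}")

        for col in writer_param.get("column", []):
            name = col.get("colTargetName", "<unnamed>")
            if "index" in col:
                idx = int(col["index"])  # the sample writes indexes as strings
                if not 0 <= idx < len(reader_cols):
                    problems.append(
                        f"content[{n}]: column '{name}' index {idx} has no reader column"
                    )
            elif "value" not in col:
                problems.append(
                    f"content[{n}]: column '{name}' needs either 'index' or 'value'"
                )
    return problems
```

To use it, load the job file with `json.load` and pass the resulting dict to `check_job`; an empty list means none of these checks failed.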

# 4.2 Parameter description

thread
- Description: number of threads used concurrently within each channel; independent of the number of DataX channels.
- Required: No
- Default value: 3

pushUrl
- Description: access point address.
- Required: Yes
- Default value: none

uuid
- Description: whether to add "#uuid": "uuid value" to the transferred data; used together with the data unique ID function.
- Required: No
- Default value: false

type
- Description: type of data written, either user_set or track.
- Required: Yes
- Default value: none

compress
- Description: text compression type. Leave it unset for no compression. The compression types are zip, lzo, lzop, tgz, and bzip2.
- Required: No
- Default value: no compression

appid
- Description: project appid.
- Required: Yes
- Default value: none

column
- Description: list of fields to write. type specifies the data type. index specifies the corresponding reader column (starting at 0). value marks the column as a constant: instead of reading data from the reader, the writer generates the column from the given value.

You can specify the column field information as follows:

[
  {
    "type": "Number",
    "colTargetName": "test_col", // name of the column in the generated data
    "index": 0 // take the first reader column (index 0) as a Number field
  },
  {
    "type": "string",
    "value": "testvalue",
    "colTargetName": "test_col"
    // constant column: TaDataWriter generates a string field with the value "testvalue"
  },
  {
    "index": 0,
    "type": "date",
    "colTargetName": "testdate",
    "dateFormat": "yyyy-MM-dd HH:mm:ss.SSS"
  }
]
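
The difference between `index` columns and `value` columns can be sketched in a few lines of Python. This only illustrates the semantics described above and is not the plug-in's actual implementation: `build_record` is a hypothetical function, and it deliberately ignores `type` conversion and `dateFormat`.

```python
def build_record(row, columns):
    """Map one reader row (a list of values) to {colTargetName: value}.

    Hypothetical illustration of the column rules; no type conversion is done.
    """
    record = {}
    for col in columns:
        if "value" in col:
            # Constant column: taken from the config, not from the reader.
            record[col["colTargetName"]] = col["value"]
        else:
            # Indexed column: position in the reader row, starting at 0.
            record[col["colTargetName"]] = row[int(col["index"])]
    return record


columns = [
    {"index": 0, "colTargetName": "#distinct_id"},
    {"index": 1, "colTargetName": "#event_name"},
    {"colTargetName": "add_clo", "value": "123123", "type": "string"},
]
print(build_record(["123123", "testbuy"], columns))
# {'#distinct_id': '123123', '#event_name': 'testbuy', 'add_clo': '123123'}
```

The constant column appears in every record regardless of what the reader supplies, which is why it needs no `index`.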

# 4.3 Array type description

When using the array type, the data on the read end must be a string whose elements are separated by \t (tab).
- Sample data on read end: "aaa\tbbb\tccc\tddd"
- Converted result: ["aaa","bbb","ccc","ddd"]
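
The conversion above amounts to splitting the string on tab characters. A minimal Python sketch, where `to_array` is a hypothetical name and the handling of edge cases such as empty strings is an assumption rather than documented plug-in behavior:

```python
def to_array(field: str) -> list:
    """Split a tab-separated string into a list of string elements."""
    return field.split("\t")


print(to_array("aaa\tbbb\tccc\tddd"))
# ['aaa', 'bbb', 'ccc', 'ddd']
```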

# 4.4 Type conversion

TaDataWriter defines the following type mapping:

| DataX internal type | TaDataWriter data type |
| --- | --- |
| Int | Number |
| Long | Number |
| Double | Number |
| String | String |
| Boolean | Boolean |
| Date | Date |
| Array | Array |
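
For scripts that generate or check column definitions, the mapping above can be kept as a simple lookup. This is a sketch for illustration only: `TYPE_MAP` and `ta_type` are hypothetical names, and the choice to raise an error for unlisted types is an assumption, not documented plug-in behavior.

```python
# DataX internal type -> TaDataWriter data type, copied from the table above.
TYPE_MAP = {
    "Int": "Number",
    "Long": "Number",
    "Double": "Number",
    "String": "String",
    "Boolean": "Boolean",
    "Date": "Date",
    "Array": "Array",
}


def ta_type(datax_type: str) -> str:
    """Look up the TaDataWriter type for a DataX internal type."""
    try:
        return TYPE_MAP[datax_type]
    except KeyError:
        raise ValueError(
            f"no TaDataWriter type listed for DataX type {datax_type!r}"
        ) from None


print(ta_type("Long"))
# Number
```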
