# LogBus Windows User Guide
This section introduces how to use the Windows version of the data transmission tool LogBus.
Before integrating, please read the data rules first; once you are familiar with TA's data format and data rules, come back to this user guide.
LogBus must upload data in TA's data format.
# Download LogBus Windows
Latest version: 1.3.0
Update time: 2021-10-19
# I. Introduction to LogBus
LogBus is mainly used to import back-end log data into the TA backend in real time. Its core working principle is similar to that of Flume: it monitors the files in a server log directory, and when any log file in the directory has new data, it validates the new data and sends it to the TA backend in real time.
We recommend that the following types of users access data by using LogBus:
- Users who use server SDK and upload data through LogBus
- Users who have high requirements for data accuracy and dimensions, whose needs cannot be met by the client SDK alone, or for whom integrating the client SDK is inconvenient
- Users who don't want to develop the back-end data push process by themselves
- Users who need to transmit bulk historical data
# II. Data preparation before use
First, convert the data to be transmitted into TA's data format via ETL, and write it to local files or transmit it to a Kafka cluster. If you use the local-file or Kafka consumer of a server SDK (such as the Java SDK), the data is already in the correct format and no further conversion is needed.
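For reference, each event in TA's data format is one JSON object per line, roughly like the illustrative example below (the field values are made up here; see the data rules documentation for the authoritative field list and format):
{"#account_id":"demo_user","#type":"track","#event_name":"purchase","#time":"2021-10-19 12:00:00.000","properties":{"amount":9.9}}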
Determine the directory where the data files to be uploaded are stored, or the address and topic of Kafka, and configure LogBus accordingly. LogBus will then monitor file changes in that directory (watch for new files or tail existing files) or subscribe to data in Kafka.
Do not rename the data logs stored in the monitoring directory. Renaming a log is equivalent to creating a new file, and LogBus may re-upload it, resulting in duplicate data.
Since the LogBus data transmission component contains a data buffer, the LogBus directory may occupy somewhat more disk space. Make sure the node where LogBus is installed has sufficient disk space, and reserve at least 10 GB of storage for each project (that is, each APP_ID) that data is transmitted to.
# III. Installation and update of LogBus
# 3.1 Install LogBus
Download and decompress the LogBus package.
Decompressed directory structure:
bin: startup folder
conf: configuration file folder
lib: library folder
# IV. Parameter setting of LogBus
- Enter the decompressed conf directory, which contains the configuration file template logBus.conf.Template. This file contains all configuration parameters of LogBus; rename it to logBus.conf when using it for the first time.
- Open the logBus.conf file and set the related parameters.
# 4.1 Project and data source setting (required)
- Project APP_ID
The same APP_ID cannot be configured more than once
##APPID comes from your project's token on the TA official website; get the APPID of the corresponding project on the project configuration page of the TA backend and fill it in here. Separate multiple APPIDs with ","
APPID=APPID_1,APPID_2
- Monitored data source configuration (choose one of the following, required)
# 4.1.1. When the data source is a local file
## Path and file name of the data files read by LogBus (the file name supports fuzzy matching); you must have read permission
## Different APPIDs are split by ",", and different directories under the same APPID are split by spaces
## The file name of TAIL_FILE supports wildcard matching
TAIL_FILE=C:/path1/dir*/log.*,C:/path3/txt.*
TAIL_FILE supports monitoring multiple files in multiple sub-directories from multiple paths
Corresponding parameter setting:
APPID=APPID1,APPID2
TAIL_FILE=C:/root/log_dir1/dir_*/log.* C:/root/log_dir*/log*/log.*,C:/test_log/*
Specific rules are as follows:
- Multiple monitoring paths under the same APP_ID are separated by spaces
- Monitoring paths under different APP_IDs are separated by commas ","; after splitting by comma, each group of paths corresponds to the APP_ID in the same position
- Directories in the monitoring path can be matched with wildcards
- File names can be matched with wildcards
- For path separators, use "/" or "\\"; do not use a single "\". For example, C:/root/*.log or C:\\root\\*.log
Do not store the log files that need to be monitored in the root directory of the server.
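For example, with the illustrative settings below (the paths are made up), files matching C:/game1_logs/log.* are uploaded under APPID_1, while files matching C:/game2_logs/log.* and C:/game2_backup/log.* are both uploaded under APPID_2:
APPID=APPID_1,APPID_2
TAIL_FILE=C:/game1_logs/log.*,C:/game2_logs/log.* C:/game2_backup/log.*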
# 4.1.2. When the data source is Kafka
When KAFKA_TOPICS needs to monitor multiple topics, separate the topics with spaces; if there are multiple APP_IDs, separate the topics belonging to different APP_IDs with ",". KAFKA_GROUPID must be unique. KAFKA_OFFSET_RESET sets Kafka's kafka.consumer.auto.offset.reset parameter; the possible values are earliest and latest, and the default is earliest.
Note: The Kafka version of the data source must be 0.10.1.0 or higher
Example of single APP_ID:
APPID=appid1
######kafka configuration
#KAFKA_GROUPID=tga.group
#KAFKA_SERVERS=localhost:9092
#KAFKA_TOPICS=topic1 topic2
#KAFKA_OFFSET_RESET=earliest
Example of multiple APP_IDs:
APPID=appid1,appid2
######kafka configuration
#KAFKA_GROUPID=tga.group
#KAFKA_SERVERS=localhost:9092
#KAFKA_TOPICS=topic1 topic2,topic3 topic4
#KAFKA_OFFSET_RESET=earliest
# 4.2 Transmission parameter setting (required)
##Transmission setting
##Transmission url
##http transmission
PUSH_URL=https://global-receiver-ta.thinkingdata.cn/logbus
##If you use a private deployment, change the transmission URL to http://data acquisition address/logbus
##Maximum amount per transmission
#BATCH=10000
##Minimum transmission time interval (unit: second)
#INTERVAL_SECONDS=600
##Number of transmission threads; the default is a single thread, which is recommended under poor network conditions; multiple threads consume more memory and CPU resources
#NUMTHREAD=1
##Compression format for file transmission: gzip, snappy, none
#COMPRESS_FORMAT=none
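To override the defaults, uncomment the corresponding parameters and set your own values, for example (the values below are only illustrative):
BATCH=5000
INTERVAL_SECONDS=60
NUMTHREAD=1
COMPRESS_FORMAT=gzip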
# 4.3 Converter configuration (optional)
##Converter type; currently supported: json, csv, regex, splitter
#PARSE_TYPE=json
##Additional fixed attributes, format: name value, name1 value1
#LABELS=
##Property names and types (when PARSE_TYPE is csv, regex, or splitter), format: name type, name1 type1
##Types: float int string date list bool
#SCHEMA=
##Specify the splitter; must not be empty when PARSE_TYPE is csv or splitter
#SPLITTER=
##Splitter for list-type values, used when list-type properties exist; default is ","
#LIST_SPLITTER=,
##Regular expression; must not be empty when PARSE_TYPE is regex
#FORMAT_REGEX=
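As a hypothetical sketch of how these parameters fit together, suppose each log line looks like 1001|buy|9.9 and should be parsed with a splitter. The field names, types, and delimiter below are made up for illustration; confirm the exact parsing semantics with TA staff before relying on them:
PARSE_TYPE=splitter
SPLITTER=|
SCHEMA=user_id string,event string,amount float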
# 4.4 Monitor file deletion configuration (optional)
# Deletion of uploaded files in the monitored directory; uncomment the following to enable the file-deletion function
# Deletion unit: only day or hour are supported
# UNIT_REMOVE=hour
# Delete uploaded files older than this offset (in UNIT_REMOVE units)
# OFFSET_REMOVE=20
# How often deletion of already-uploaded monitored files is performed
# FREQUENCY_REMOVE=60
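For example, uncommenting all three parameters as below would delete monitored files that have already been uploaded and are older than 20 hours, with the cleanup running at the FREQUENCY_REMOVE interval (these are the sample values shown above; confirm the exact unit of FREQUENCY_REMOVE with TA staff if needed):
UNIT_REMOVE=hour
OFFSET_REMOVE=20
FREQUENCY_REMOVE=60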
# 4.5 Configuration file example
##################################################################################
## Thinkingdata data analysis platform transmission tool logBus configuration file
##Uncommented parameters are required; commented parameters are optional and can be configured as appropriate
##Environment requirements: Java 8; please refer to the TA official website for more detailed requirements
##http://doc.thinkinggame.cn/tdamanual/installation/logbus_installation.html
##################################################################################
##APPID comes from your project's token on the TA official website
##Different APPIDs are separated by "," and cannot be configured repeatedly
APPID=from_tga1,from_tga2
#-----------------------------------source----------------------------------------
######file-source
##Path and file name of the data files read by LogBus (the file name supports fuzzy matching); you must have read permission
##Different APPIDs are separated by ",", and different directories under the same APPID are separated by spaces
##The file name part of TAIL_FILE supports Java regular expressions
TAIL_FILE=C:/path1/log.* C:/path2/txt.*,C:/path3/log.* C:/path4/log.* C:/path5/txt.*
######kafka-source
#KAFKA_GROUPID=tga.flume
#KAFKA_SERVERS=
#KAFKA_TOPICS=
#KAFKA_OFFSET_RESET=earliest
#------------------------------------sink-----------------------------------------
##Transmission setting
##Transmission url
##If you use a private deployment, change the transmission URL to http://data acquisition address/logbus
##PUSH_URL=https://global-receiver-ta.thinkingdata.cn/logbus
PUSH_URL=http://${data collection address}/logbus
##Maximum amount per transmission
#BATCH=10000
##Minimum transmission time interval (unit: second)
#INTERVAL_SECONDS=60
##### http transmission
##Compression format for file transmission:gzip,snappy,none
#COMPRESS_FORMAT=none
##Add the uuid property in each piece of data or not
#IS_ADD_UUID=true
#------------------------------------parse----------------------------------------
##Converter type; currently supported: json, csv, regex, splitter
#PARSE_TYPE=json
##Additional fixed attributes, format: name value, name1 value1
#LABELS=
##Property names and types (when PARSE_TYPE is csv, regex, or splitter), format: name type, name1 type1
##Types: float int string date list bool
#SCHEMA=
##Specify the splitter; must not be empty when PARSE_TYPE is csv or splitter
#SPLITTER=
##Splitter for list-type values, used when list-type properties exist; default is ","
#LIST_SPLITTER=,
##Regular expression; must not be empty when PARSE_TYPE is regex
#FORMAT_REGEX=
#------------------------------------other----------------------------------------
##Deletion of uploaded files in the monitored directory; uncomment both of the following fields to enable the file-deletion function; the deletion program runs every hour
##Files older than OFFSET_REMOVE (measured in UNIT_REMOVE units) will be deleted
##Offset: delete files older than this value
#OFFSET_REMOVE=
##Unit for deletion: day or hour
#UNIT_REMOVE=
# V. Start LogBus
Please check the following before first start:
- Check the Java version
The bin directory contains two scripts, check_java.bat and logbus.bat.
check_java is used to test whether the Java version meets the requirements. Run this script; if the Java version does not meet the requirements, you will see prompts such as "Java version is less than 1.8" or "Can't find java, please install jre first".
You can update the JDK version, or see the next item to install a separate JDK for LogBus.
- Install an independent JDK for LogBus
If the JDK version on the LogBus node does not meet LogBus's requirements, and it cannot be replaced with a suitable JDK version because of the environment, you can use this function.
The bin directory contains install_logbus_jdk.bat.
Running this script adds a java directory to the LogBus working directory, and LogBus will use the JDK environment in this directory by default.
- Configure logBus.conf and run the parameter check command
For the configuration of logBus.conf, please refer to the parameter setting section above.
After configuration, run the env command to check whether the configuration parameters are correct:
logbus.bat env
If a red exception message is output, it means that there is a problem with the configuration, and you need to modify it again until there is a prompt that the configuration file has no exception.
After you modify the configuration of logBus.conf, you need to restart LogBus to put the new configuration into effect
- Start LogBus
logbus.bat start
After starting, do not open logkit.exe repeatedly, otherwise data may be uploaded repeatedly.
# VI. Details of LogBus command
# 6.1 Help information
Running logbus.bat without a parameter, or with --help or -h, shows the help information.
The main LogBus commands are listed below:
usage: logbus <command|auxiliary command> [option]
Command:
start Start logBus.
restart Restart logBus.
stop Stop logBus.
reset Reset logBus read records
stop_atOnce Stop logBus immediately.
Auxiliary command:
env Verify runtime environment
server [-url <url>|-url <url> -appid <appid>] Test the receiver network
show_conf Show current logBus configuration information.
version Show the version number.
update Update logbus to the latest version.
Options:
-appid <appid> Project appid
-h,--help Show help information and exit.
-path <path> Specify the absolute path of the test file
-url <url> Specify the test url
Example:
logbus.bat start Start logBus.
logbus.bat stop Stop logBus.
logbus.bat restart Restart logBus.
logbus.bat server -url http://${receiver address}/logbus -appid ***** Test the receiver network
# 6.2 Transmission channel check server -url
After verifying the data format, you should check whether the data channel is open; you can do this with the server -url command. You can pass in the APP_ID obtained from the TA platform while checking. Please note that the APP_ID is bound to your project, so make sure the APP_ID you enter corresponds to your project:
logbus.bat server -url http://${receiver address}/logbus -appid ${appid}
# 6.3 Show configuration information show_conf
You can view the current LogBus configuration information by issuing the show_conf command:
logbus.bat show_conf
# 6.4 Startup environment check env
You can check the startup environment by using the env command. If any item in the output is followed by an asterisk, that configuration item is faulty and needs to be modified until no asterisks remain.
logbus.bat env
# 6.5 Update LogBus version update
You can update LogBus online by using the update command; after that, LogBus will be updated to the latest version.
logbus.bat update
# 6.6 Start start
After the format verification, data channel check, and environment check have passed, you can start LogBus to upload data. LogBus will automatically detect whether your files have new data written; if there is new data, it will be uploaded.
logbus.bat start
# 6.7 Stop stop
If you want to stop LogBus, please issue the stop command. This command takes some time to complete, but no data will be lost.
logbus.bat stop
# 6.8 Stop stop_atOnce
If you want to stop LogBus immediately, please issue the stop_atOnce command; note that this may cause data loss.
logbus.bat stop_atOnce
# 6.9 Restart restart
After modifying the configuration parameters, you can restart LogBus by issuing the restart command to put the new configuration into effect.
logbus.bat restart
# 6.10 Reset reset
The reset command resets LogBus. Please issue this command with caution: once issued, the file transfer records are cleared and LogBus will re-upload all data. Issuing this command in unclear circumstances may cause data duplication, so it is recommended to issue it only after communicating with TA staff.
logbus.bat reset
After issuing the reset command, you need to execute start to resume data transmission.
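For example, to clear the transfer records and re-upload everything (use with caution, as described above):
logbus.bat reset
logbus.bat start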
# 6.11 View version number version
If you want to know your LogBus version, you can use the version command. If your LogBus does not have this command, you are running an earlier version.
logbus.bat version
# VII. ChangeLog
# Version 1.3.0 --- 2021/10/19
- Support non-TA data upload
# Version 1.2.0 --- 2021/05/26
- Support Cygwin
# Version 1.1.0 --- 2020/08/28
- Add #UUID
- Support #event_id and #first_check_id
- Support multi-thread sending
- Support splitter parsing and regular parsing
# Version 1.0.0 --- 2020/06/25
- LogBus-Windows release