# Pre-access Preparations
Pre-access Preparations
TIP
Basic knowledge section introduces the TA data rules that must be understood before access.
Access essential information section lists the system parameters you need to prepare before accessing.
TA (Thinking Analytics) system provides a full-end data access scheme.
Generally, access to TA requires three steps: first, sort out the data acquisition scheme according to the business needs, and a number of technical analysts will assist you in this work; Then, the research and development personnel complete the data access according to the data acquisition scheme; Finally, verify the correctness of data access. The access process is shown in the following diagram:
Before data access, it is very important to understand the basics of the TA system. This doc will provide an overview of the necessary access-related knowledge, and will also list how to get help when you want to learn more about relevant content.
The target readers of this article are all colleagues related to access, including business personnel, research and development personnel, testers, etc.
# I. Overview
TA provides a full-end data access scheme, the main access methods include:
- Client side SDK : It can collect device information and user behavior data that does not communicate with the server, which is simple and easy to use
- Server level SDK : more accurate content acquisition, suitable for collecting core business data
- Data import tool : usually used for historical data import; server level SDK with LogBus is also a more common server level data collection scheme
For general application and web development, we provide:
- React Native SDK Usage Guide
- Flutter SDK User Guide
- iOS SDK Weex Framework Support
- Android SDK Weex Framework Support
- Original SDK: Android SDK, iOS SDK
- Third-party frameworks: Flutter , Reactive Native , Weex ( Android , iOS )
- H5 development: JavaScript SDK , H5 and native SDK open solution
- Mainstream Mini Program, Fast Application Platform: Mini Program SDK
For small game development, we provide:
- Mainstream game engine support: LayaBox , Egret Egret Engine , Cocos Creator
- Mainstream small game, fast game platform support: small game SDK
For mobile game development, we provide:
For server level acquisition scenarios, we recommend the server level SDK + Logbus . This scheme has better performance in the stability, real-time and efficiency of data import.
If you have some heterogeneous historical data that needs to be imported, or if some data needs to be added to the TA system, you can consider using DataX import. Unlike the Logbus solution , DataX is not a resident service and cannot monitor the generation of new data and import it in time, so it cannot guarantee the real-time performance of the data. The advantage of DataX is that it supports heterogeneous data import from multiple data sources and is easy to operate.
If you use Filebeat and Logstash to collect logs and want to import log data into TA systems, you can use the Filebeat + Logstash scheme.
When designing a data collection scheme, you can choose a solution suitable for your product technical architecture and business needs according to your business situation. If you have any questions about the collection scheme, you can consult our analysts or technical support colleagues in the support group.
# II. Basic knowledge
# 2.1 TA Data Model
Before data access, we first need to understand the data in TA.
The design of the data collection scheme is actually the process of determining which user behavior events to collect according to the objectives of business analysis. For example, if you want to analyze the user recharge situation, what you want to collect may be user payment behavior data. User behavior data can be broken down into: who (WHO), when (WHEN), where (WHERE), and in what way (HOW), recharge behavior (WHAT), as shown in the following figure:
User behavior data is organized into user-related data and event-related data in TA, and stored in user tables and transaction tables respectively. User data is mainly used to describe the user's state and attributes that do not change frequently. Event data is used to describe information related to specific behavior events.
In a data collection scenario, you need to determine when to trigger the reporting of user data and when to trigger the reporting of events. For further understanding of user data and event data, please refer to: User features and event attributes setting .
In all our data access guides, we will explain how to report event data and user data respectively.
# 2.2 User Identification Rules
For each piece of user data or event data, it is necessary to specify which user the data belongs to. In scenarios without an account system, you can use a device-related ID to uniquely identify the user. However, for scenarios with account systems, a user may generate data in multiple devices. In the analysis, it is necessary to combine the user's data at multiple ends for analysis, and the unique ID related to the device is not applicable.
In order to deal with the above two scenarios, in the process of data access, it is necessary to combine two user IDs to identify the user:
- Guest ID (#distinct_id): By default, the client side generates a random guest ID to identify the user, and also provides an interface to read and modify the default guest ID.
- Account ID (#account_id): When the customer logs in, the account ID can be set. Through the account ID, data in multiple devices can be associated.
Each piece of data must contain a guest ID or account ID. The client SDK will randomly generate a guest ID by default. After you call the login
interface to set the account ID, all data will be reported with both the guest ID and the account ID. When reporting through the server level, you need to pass in at least one of the IDs.
In the TA background, the unique ID that identifies a user is the TA user ID (#user_id field). When data is received, we will create a new user according to the specified user identification rules or bind the data to an existing user.
User identification rules are very important content. If the user ID is not set correctly, the data may be bound to the wrong user and affect the analysis effect. Before accessing, please be sure to carefully understand the rule and specify the user identification scheme in the data collection scheme.
# 2.3 Data Format
No matter which way data is accessed, a uniform data format and the same data restrictions are used when sending it to the data receiver. The Data Rules chapter provides a detailed description of the data format and corresponding data limitations.
If you connect data through the SDK, you only need to call the corresponding interface, and the SDK will organize the data into the required data format for reporting; if you access data through the data import tool or the Restful API , you need to organize the data according to the data rules Description in the format, and then report.
Regarding data formats, special attention should be paid to name rules and data types:
- Name rules: Event names and attribute names can only contain letters, numbers, and underscores _, start with letters, and cannot exceed 50 characters
Note: Property names are not case sensitive; event names and property values are case sensitive
- Attribute value data type:
TA data type | Sample value | Value selection description | Data type |
---|---|---|---|
Numerical value | 123,1.23 | The data range is -9E15 to 9E15 | Number |
Text | "ABC", "Shanghai" | The default limit for characters is 2KB | String |
Time | "2019-01-01 00:00:00","2019-01-01 00:00:00.000" | "Yyyy-MM-dd HH: mm: ss. SSS" or "yyyy-MM-dd HH: mm: ss", if you want to indicate the date, you can use "yyyy-MM-dd 00:00:00" | String |
Boolean | true,false | - | Boolean |
List | ["a","1","true"] | The elements in the list will be converted to a string type, and the most in the list | Array(String) |
Object | {hero_name: "Liu Bei", hero_level: 22, hero_equipment: ["Male and Female Double Sword", "Lu"], hero_if_support: False} | Each child attribute (Key) in the object has its own data type. Please refer to the general attribute of the corresponding type above for the value selection description · Up to 100 child attributes within the object | Object |
Object group | [{hero_name: "Liu Bei", hero_level: 22, hero_equipment: ["Male and Female Double Sword", "Lu"], hero_if_support: False}, {hero_name: "Liu Bei", hero_level: 22, hero_equipment: ["Male and Female Double Sword", "Lu"], hero_if_support: False}] | Each child attribute (Key) in the object group has its own data type. For value selection description, please refer to the general attributes of the corresponding type above Up to 500 objects in the object group | Array(Object) |
Note: In TA, the type of attribute value will be determined according to the type of attribute value received for the first time. If the type of value of a certain attribute in the data does not match the previously determined type, the attribute will be discarded.
In the TA background, you may notice that some attribute names start with #, and such attribute are preset attributes. Preset attributes do not require special settings, and the SDK collects them by default. For details, please refer to: Preset attribites and system fields .
You should pay attention to the fact that data cannot be stored when the data format or data type is not set correctly. Therefore, in the access phase and after access, you may need to check or observe the correctness of data reporting through the Data Reporting Managementmodule, and correct the problems in time.
# III. Access Required Information
Before data access is formally carried out by R & D, it is necessary to ensure that the following information is ready:
- Project APP ID: When creating a project in the background of TA, the APP ID of the project will be generated, which can also be viewed on the project management page.
- Determine the address of the data receiving end
- If using Cloud as a Service, the receiver address is: https://ta-receiver.thinkingdata.io
- In the case of private deployment, you need to bind the domain name in the private cluster (or access point) and configure the SSL certificate
- Verify that the receiving end address: the browser visits https://YOUR_RECEIVER_URL/health-check, and the page returns ok to indicate correct
- A data collection scheme that includes:
- Data access methods: client side SDK, server level SDK, data import tools, or a combination of several solutions
- Content and trigger timing of data to be accessed
You completed the reading of preparing doc before accessing. Now, you can start data access by referring to the corresponding access guide document based on the selected access method.