Data Collection Overview
One of the primary purposes of the Systems Directorate application is to collect data from agents and help administrators analyze and report on it.
To do this, the administrator can set up any number of data collection packages to be deployed to agents. Each package is made up of one or more data collection items.
Data Sources
Data can be read from many different sources, and collection can be customized further with user-provided external applications or Python code.
- WMI tables in any namespace
- Static registry values in HKLM
- Dynamic registry values or lists of registry records in HKLM
- Delimited files using almost any type of field and row separators
- Output from user-provided external programs or tools, which should be in delimited format
- Database queries against a SQL Server or PostgreSQL table located on the managed computer
- Custom Python script code
- User-provided .NET DLL plugins
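As a rough illustration of the delimited-file and external-program sources, the sketch below splits text on caller-chosen field and row separators. The function name `parse_delimited` and its parameters are hypothetical and are not part of the Directorate product; the sketch also assumes the first row is a header.

```python
def parse_delimited(text, field_sep="\t", row_sep="\n", has_header=True):
    """Split delimited text into a list of dict rows (hypothetical helper)."""
    # Drop empty rows, such as the one left by a trailing row separator.
    raw_rows = [row for row in text.split(row_sep) if row.strip()]
    records = [row.split(field_sep) for row in raw_rows]
    if not records:
        return []
    if has_header:
        header, records = records[0], records[1:]
    else:
        # No header supplied: fall back to generated column names.
        header = [f"col{i}" for i in range(len(records[0]))]
    return [dict(zip(header, record)) for record in records]


# Example: output from a tool that uses ';' between fields and '~' between rows.
sample = "name;size~C:;100~D:;250~"
rows = parse_delimited(sample, field_sep=";", row_sep="~")
```

All values come back as strings; any type conversion would be a separate step.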
Post-Processing
Data collected from the sources above can also be adjusted or extended by a user-defined Python script. For example, if you are collecting file system information, you might want to add a custom field that calculates the proper minimum free space to allow on a drive based on its total size, purpose, and even host information. This custom field can then be used by a monitoring package to trigger alerts.
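The free-space example might look something like the sketch below. It does not use the actual Directorate scripting interface, which is not described here; the row layout, the field names `drive`, `total_bytes`, and `min_free_bytes`, and the percentage rules are all assumptions chosen for illustration.

```python
GIB = 1024 ** 3

# Hypothetical policy: percent of total size to keep free, by drive purpose.
PERCENT_BY_PURPOSE = {"system": 15, "data": 10, "backup": 5}


def min_free_bytes(total_bytes, purpose="data"):
    """Return the minimum free space for a drive (illustrative rule only)."""
    percent = PERCENT_BY_PURPOSE.get(purpose, 10)
    wanted = total_bytes * percent // 100  # integer math avoids float rounding
    # Never go below 1 GiB, and never demand more than 50 GiB.
    return min(max(wanted, 1 * GIB), 50 * GIB)


def add_min_free(rows, purposes):
    """Add a custom 'min_free_bytes' field to each collected row."""
    for row in rows:
        purpose = purposes.get(row["drive"], "data")
        row["min_free_bytes"] = min_free_bytes(int(row["total_bytes"]), purpose)
    return rows


collected = [
    {"drive": "C:", "total_bytes": str(100 * GIB)},
    {"drive": "D:", "total_bytes": str(4 * GIB)},
]
extended = add_min_free(collected, {"C:": "system"})
```

A monitoring package could then compare the drive's current free space against the computed field.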
Storage on the Agent
All collected data is stored in a data cache located in the agent’s data directory. The data is kept in standard tab-delimited CSV format in UTF-8 encoded files, so it can be read with ordinary text tools.
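Because the cache is plain tab-delimited UTF-8 text, it can be read with Python's standard `csv` module. The sketch below assumes each file starts with a header row; the file name and column names are made up for the demonstration and do not reflect the real cache layout.

```python
import csv
import os
import tempfile


def read_cache_file(path):
    """Read a tab-delimited, UTF-8 encoded file into a list of dict rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Demonstration with a temporary file standing in for a real cache file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "disk_inventory.txt")  # hypothetical file name
    with open(path, "w", newline="", encoding="utf-8") as handle:
        handle.write("drive\ttotal_bytes\nC:\t1000\nD:\t2000\n")
    records = read_cache_file(path)
```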
Transmission
Data is sent to the Directorate server in one of two ways, depending on the type of data.
- Inventory data is sent immediately after a package is finished collecting all its defined items. Each item is sent as a separate upload to reduce individual upload sizes and to provide the server a better ability to optimize loading into the database. Inventory packages are usually set to run once per day.
- Performance data is sent once per day after midnight local time. The raw data is collected all day and then is summarized based on the package definition. The summarized data is then sent to the server. Performance data is usually collected once every 1 or 5 minutes depending on the package definition.
Summarization can be defined as raw, 5 minutes, 10 minutes, 15 minutes, 30 minutes, or 60 minutes.
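To make the summarization intervals concrete, the sketch below groups raw samples into fixed-width time buckets. The text does not say which aggregate Directorate applies, so averaging is an assumption here, and the function name `summarize` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def summarize(samples, minutes=None):
    """Group (timestamp, value) samples into buckets of the given width.

    minutes=None stands for "raw": the samples are returned unchanged, sorted.
    The interval is expected to divide 60 (5, 10, 15, 30, or 60).
    """
    if minutes is None:
        return sorted(samples)
    buckets = defaultdict(list)
    for stamp, value in samples:
        # Floor the timestamp to the start of its bucket.
        start = stamp.replace(
            minute=stamp.minute - stamp.minute % minutes, second=0, microsecond=0
        )
        buckets[start].append(value)
    # Assumption: each bucket is reduced to the mean of its samples.
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


raw = [
    (datetime(2024, 1, 1, 0, 0), 10.0),
    (datetime(2024, 1, 1, 0, 1), 20.0),
    (datetime(2024, 1, 1, 0, 4), 30.0),
    (datetime(2024, 1, 1, 0, 5), 40.0),
]
five_minute = summarize(raw, minutes=5)
```

With one-minute collection and a 5-minute summary, each day's upload shrinks from up to 1440 rows per instance to 288.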
Storage on the Server
Data is stored within the database on the management server. Directorate automatically creates three types of tables based on the data item definition. These tables are also automatically updated if the item definition is changed in the web interface.
- Inventory tables start with “inv_” and store the last copy of inventory information collected from an agent.
- History tables start with “hist_” and store every copy of inventory information collected from an agent.
- Performance tables start with “perf_” and store every time-based instance of data from a performance collection. If a performance file is resent, it will fill in any time gaps found, but will not overwrite the data for a specific time if it is already there.
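The resend behavior for performance data (fill gaps, never overwrite) can be sketched as a merge that only adds missing keys. How the server actually identifies an existing row is not stated; keying on `(computer_id, rec_time_utc, rec_instance)` is an assumption, and `merge_resent` is a hypothetical name.

```python
def merge_resent(stored, resent):
    """Merge a resent performance upload into already stored rows.

    Both arguments map a key tuple to a row dict. Rows whose key is already
    stored are left untouched; only missing time slots are filled in.
    """
    merged = dict(stored)
    for key, row in resent.items():
        merged.setdefault(key, row)  # insert only if the key is absent
    return merged


stored = {
    ("pc-1", "2024-01-01T00:00", "cpu0"): {"value": 10},
    ("pc-1", "2024-01-01T00:10", "cpu0"): {"value": 30},
}
resent = {
    ("pc-1", "2024-01-01T00:00", "cpu0"): {"value": 99},  # already present
    ("pc-1", "2024-01-01T00:05", "cpu0"): {"value": 20},  # fills a gap
}
result = merge_resent(stored, resent)
```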
Both history and performance tables are pruned based on settings in the region configuration. By default, they keep 90 days of data each.
Each row in these tables contains at least the following fields, which allow for easy cross-referencing of data for reporting.
Field | Value |
---|---|
computer_id | The GUID of the computer |
rec_time | The agent local time when the data was collected |
rec_time_utc | The UTC time when the data was collected |
rec_order | The order of the record in the table |
rec_instance | A unique instance name to identify the row |