Data Collection

Data Collection Overview

One of the primary purposes of the Systems Directorate application is to collect data from agents and help administrators analyze and report on it.

To do this, administrators can set up multiple data collection packages to be deployed to agents. Each package is made up of one or more data collection items.

Data Sources

Data can be read from many different sources and customized even further with user-provided external applications or Python code.

  • WMI tables in any namespace
  • Static registry values in HKLM
  • Dynamic registry values or lists of registry records in HKLM
  • Delimited files using almost any type of field and row separators
  • Output from user-provided external programs or tools. Output should be in delimited format.
  • Database queries against a SQL Server or PostgreSQL table located on the managed computer
  • Custom Python script code
  • User-provided .NET DLL plugins
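The delimited-file and external-program sources both come down to splitting text on a field separator and a row separator. The sketch below shows that idea in Python. The function `parse_delimited`, its parameters, and the header-row convention are hypothetical illustrations, not Directorate's actual parser or its configuration options.

```python
from typing import Dict, List


def parse_delimited(text: str, field_sep: str = "\t", row_sep: str = "\n",
                    has_header: bool = True) -> List[Dict[str, str]]:
    """Split delimited text into records keyed by column name (hypothetical helper)."""
    # Tolerate Windows line endings when rows are split on "\n".
    if row_sep == "\n":
        text = text.replace("\r\n", "\n")
    # Skip blank rows, such as the one produced by a trailing row separator.
    rows = [r for r in text.split(row_sep) if r.strip()]
    if not rows:
        return []
    split_rows = [r.split(field_sep) for r in rows]
    if has_header:
        header, data = split_rows[0], split_rows[1:]
    else:
        # Without a header row, name the columns col0, col1, ...
        width = max(len(r) for r in split_rows)
        header, data = [f"col{i}" for i in range(width)], split_rows
    records = []
    for values in data:
        # Pad short rows with empty strings; zip() drops any extra fields.
        padded = values + [""] * (len(header) - len(values))
        records.append(dict(zip(header, padded)))
    return records
```

For example, `parse_delimited("drive|free_gb;C:|40;", field_sep="|", row_sep=";")` returns `[{'drive': 'C:', 'free_gb': '40'}]`.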

Post-Processing

Data collected by the above sources can also be adjusted or extended with user-defined Python scripts. For example, if you are collecting file system information, you might add a custom field that calculates the minimum free space to allow on a drive based on its total size, its purpose, and even host information. A monitoring package can then use this custom field to trigger alerts.
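A minimal sketch of the free-space example, assuming a post-processing hook that receives each collected row as a dictionary of strings and returns the row with extra fields added. The hook shape, the input field names (`total_gb`, `purpose`, `host_role`), and every threshold below are hypothetical; Directorate's actual post-processing interface may differ.

```python
from typing import Dict

# Hypothetical policy values for illustration; tune them to your own standards.
MIN_FREE_PCT_BY_PURPOSE = {"system": 0.15, "database": 0.20}
DEFAULT_MIN_FREE_PCT = 0.10
MIN_FREE_FLOOR_GB = 5.0
PRODUCTION_MARGIN = 1.5


def add_min_free_field(row: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of a drive row with a calculated 'min_free_gb' field.

    Assumes hypothetical input fields: total_gb, purpose, and host_role.
    """
    total_gb = float(row["total_gb"])
    # The percentage of the drive to keep free depends on what the drive is used for.
    pct = MIN_FREE_PCT_BY_PURPOSE.get(row.get("purpose", ""), DEFAULT_MIN_FREE_PCT)
    # Never require less than a fixed floor, even on small drives.
    required = max(total_gb * pct, MIN_FREE_FLOOR_GB)
    # Host information: production servers keep a larger safety margin.
    if row.get("host_role") == "production":
        required *= PRODUCTION_MARGIN
    out = dict(row)
    # Collected data is stored as text, so the new field is written as a string.
    out["min_free_gb"] = f"{required:.1f}"
    return out
```

Returning a new dictionary rather than modifying the input keeps the originally collected values intact, so a monitoring rule can compare the drive's free space against `min_free_gb` without losing the raw data.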

Storage on the Agent

All collected data is stored in a data cache in the agent’s data directory. The data is kept in standard tab-delimited, CSV-style text files with UTF-8 encoding, so it is human-readable and easy to open with common tools.
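Because the cache files are plain tab-delimited UTF-8 text, they can be inspected with ordinary scripts. A minimal sketch using Python's standard `csv` module, assuming each file begins with a header row of column names (an assumption; check your own cache files). The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List


def read_cache_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse tab-delimited lines, using the first line as column names (assumed)."""
    return list(csv.DictReader(lines, delimiter="\t"))


def read_cache_file(path: str) -> List[Dict[str, str]]:
    """Read one cache file from disk."""
    # "utf-8-sig" also accepts files that start with a UTF-8 byte-order mark;
    # newline="" lets the csv module handle Windows line endings itself.
    with open(path, encoding="utf-8-sig", newline="") as f:
        return read_cache_rows(f)
```

For example, `read_cache_file("disk_inventory.txt")` (a hypothetical file name) returns one dictionary per data row, keyed by column name.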

Transmission

Data is sent to the Directorate server in one of two ways, depending on whether it is inventory or performance data.

  1. Inventory data is sent immediately after a package finishes collecting all of its defined items. Each item is sent as a separate upload, which keeps individual uploads small and gives the server more room to optimize loading into the database. Inventory packages are usually set to run once per day.
  2. Performance data is sent once per day, after midnight local time. Raw data is collected throughout the day and summarized according to the package definition, and the summarized data is then sent to the server. Performance data is usually collected every 1 or 5 minutes, depending on the package definition.

The summarization interval can be set to raw, 5, 10, 15, 30, or 60 minutes.
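A minimal sketch of interval summarization, assuming each raw sample is a `(timestamp, value)` pair and that a summarized value is the average of the samples falling in its interval. The averaging rule and the function names are assumptions; a package may aggregate differently (for example, keeping minimum and maximum values as well).

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, List, Tuple

VALID_INTERVALS = (5, 10, 15, 30, 60)


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Round a timestamp down to the start of its summarization interval."""
    # Every valid interval divides 60, so flooring within the hour is enough.
    floored_minute = (ts.minute // minutes) * minutes
    return ts.replace(minute=floored_minute, second=0, microsecond=0)


def summarize(samples: List[Tuple[datetime, float]],
              interval: str) -> List[Tuple[datetime, float]]:
    """Summarize raw samples into per-interval averages.

    interval is "raw" or a number of minutes from VALID_INTERVALS.
    """
    if interval == "raw":
        # Raw data is passed through unchanged (only sorted by time).
        return sorted(samples)
    minutes = int(interval)
    if minutes not in VALID_INTERVALS:
        raise ValueError(f"unsupported interval: {interval}")
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_start(ts, minutes)].append(value)
    # One averaged row per interval, in time order.
    return [(start, mean(values)) for start, values in sorted(buckets.items())]
```

With one sample per minute, a 5-minute interval reduces 1,440 daily samples to 288 rows, which is why summarizing before upload keeps the daily transfer small.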

Storage on the Server

Data is stored within the database on the management server. Directorate automatically creates three types of tables based on the data item definition. These tables are also automatically updated if the item definition is changed in the web interface.

  • Inventory tables start with “inv_” and store the last copy of inventory information collected from an agent.
  • History tables start with “hist_” and store every copy of inventory information collected from an agent.
  • Performance tables start with “perf_” and store every time-based instance of data from a performance collection. If a performance file is resent, its data fills in any time gaps, but data already stored for a given time is never overwritten.
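The loading rules for the three table types can be modeled in memory as a sketch. The class and method names are hypothetical, and keying performance rows on `(computer_id, rec_time_utc, rec_instance)` is an assumption about what "a specific time" means; the server's real loader works against database tables, not Python dictionaries.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]


class ItemTables:
    """In-memory model of one data item's inv_, hist_ and perf_ tables (illustrative only)."""

    def __init__(self) -> None:
        self.inv: Dict[str, List[Row]] = {}               # computer_id -> latest upload
        self.hist: List[Row] = []                         # every upload, appended
        self.perf: Dict[Tuple[str, str, str], Row] = {}   # assumed time-slot key -> row

    def load_inventory(self, computer_id: str, rows: List[Row]) -> None:
        # inv_: the new upload replaces the previous copy for this computer.
        self.inv[computer_id] = list(rows)
        # hist_: every copy is kept.
        self.hist.extend(rows)

    def load_performance(self, rows: List[Row]) -> int:
        """Insert rows whose time slot is empty; never overwrite. Returns rows added."""
        added = 0
        for row in rows:
            key = (row["computer_id"], row["rec_time_utc"], row["rec_instance"])
            if key not in self.perf:
                self.perf[key] = row
                added += 1
        return added
```

Resending a performance file with this logic is idempotent: rows for time slots that already exist are skipped, so only the gaps are filled.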

Both history and performance tables are pruned based on settings in the region configuration. By default, each keeps 90 days of data.
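The retention rule amounts to a cutoff of "now minus N days". A minimal sketch of that arithmetic, assuming rows are compared on `rec_time_utc` and that timestamps are ISO 8601 strings (both assumptions); the function names are hypothetical, and the server applies pruning in its database rather than in Python.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional

DEFAULT_RETENTION_DAYS = 90  # documented default for history and performance tables


def _parse_utc(text: str) -> datetime:
    """Parse an ISO 8601 timestamp; treat values without an offset as UTC."""
    dt = datetime.fromisoformat(text)
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def prune(rows: List[Dict[str, str]], retention_days: int = DEFAULT_RETENTION_DAYS,
          now: Optional[datetime] = None) -> List[Dict[str, str]]:
    """Return only the rows whose rec_time_utc falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [r for r in rows if _parse_utc(r["rec_time_utc"]) >= cutoff]
```

For example, with `now` set to 2024-04-01 00:00 UTC and the default 90 days, the cutoff is 2024-01-02 00:00 UTC.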

Every row in these tables contains at least the following fields, which make it easy to cross-reference data in reports.

Field           Description
computer_id     The GUID of the computer
rec_time        The agent's local time when the data was collected
rec_time_utc    The agent's time, in UTC, when the data was collected
rec_order       The order of the record in the table
rec_instance    A unique instance name that identifies the row
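Because every table carries `computer_id` and the time fields, inventory and performance data can be joined in reports. The sketch below uses Python's built-in `sqlite3` module with in-memory tables purely so it runs anywhere. The table names `inv_disk` and `perf_cpu`, their value columns, and the sample rows are hypothetical, and the management server's actual database engine and schema may differ.

```python
import sqlite3

# Hypothetical tables: an inventory item "disk" and a performance item "cpu".
SCHEMA = """
CREATE TABLE inv_disk (computer_id TEXT, rec_time TEXT, rec_time_utc TEXT,
                       rec_order INTEGER, rec_instance TEXT, total_gb REAL);
CREATE TABLE perf_cpu (computer_id TEXT, rec_time TEXT, rec_time_utc TEXT,
                       rec_order INTEGER, rec_instance TEXT, pct_busy REAL);
"""

# Cross-reference each computer's drives with its average CPU usage.
REPORT_SQL = """
SELECT d.computer_id, d.rec_instance AS drive, d.total_gb,
       AVG(c.pct_busy) AS avg_cpu
FROM inv_disk AS d
JOIN perf_cpu AS c ON c.computer_id = d.computer_id
GROUP BY d.computer_id, d.rec_instance, d.total_gb
ORDER BY d.computer_id, d.rec_instance
"""


def build_demo() -> sqlite3.Connection:
    """Create the demo tables in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO inv_disk VALUES (?, ?, ?, ?, ?, ?)", [
        ("guid-1", "2024-01-01 08:00", "2024-01-01 13:00", 1, "C:", 256.0),
    ])
    conn.executemany("INSERT INTO perf_cpu VALUES (?, ?, ?, ?, ?, ?)", [
        ("guid-1", "2024-01-01 08:00", "2024-01-01 13:00", 1, "_Total", 20.0),
        ("guid-1", "2024-01-01 08:05", "2024-01-01 13:05", 2, "_Total", 40.0),
        ("guid-2", "2024-01-01 08:00", "2024-01-01 13:00", 1, "_Total", 90.0),
    ])
    return conn
```

Running `build_demo().execute(REPORT_SQL).fetchall()` returns one row for `guid-1`; `guid-2` is excluded because the inner join finds no inventory row for it.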