Our customer service engineers wanted a script to list unique asset tags, along with metadata about assets. Currently, there is no straightforward way, such as invoking a single API, to list asset tags; you can only list the asset tags for a specific asset. This first implementation discovers all the unique asset tags and counts their usage, placing the result in a CSV file. The outline is as follows:
- Collect all the assets via an asset export
- Run through the asset tags in the asset collection and check each for uniqueness.
The code for this blog is at blog1_uniq_asset_tags.py. As in some previous code, logging is used; the log file is uniq_asset_tags.log. And again, it is not managed, so you will have to clean it up as you see fit. The script produces the two-column CSV file, uniq_asset_tags.csv.
Asset collection is done via the “Data Exports” APIs. The code was modified from earlier export code; and by the way, there is a newer stand-alone export_assets.py. Let’s go over the code in blog1_uniq_asset_tags.py.
First, the code checks if there is a command-line parameter, a search ID. In most cases, this option should not be used; it is there in case the asset export takes too long. On the normal path, the search ID will be zero, which means the code will request an asset export and then check its status. The number 50,000 is used as a guess and is used to calculate the wait time.
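As a sketch of that setup, here is how the optional search ID and the size-based wait time might be handled. The 50,000-records-per-wait-unit heuristic is from the post; the pacing constant and function names are my assumptions.

```python
import sys

RECORDS_PER_WAIT_UNIT = 50_000   # guess from the post: records per wait "unit"
SECONDS_PER_WAIT_UNIT = 60       # hypothetical pacing, not from the script

def get_search_id(argv=None):
    """Return the optional command-line search ID, or 0 to request a new export."""
    argv = sys.argv if argv is None else argv
    return int(argv[1]) if len(argv) > 1 else 0

def calc_wait_seconds(record_count):
    """Scale the between-poll wait to the export size (heuristic)."""
    units = max(1, round(record_count / RECORDS_PER_WAIT_UNIT))
    return units * SECONDS_PER_WAIT_UNIT
```

With a search ID of zero the script requests a fresh export; a non-zero ID skips straight to status checking of an already-requested export.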
Let’s delve into the export request. Via filter_params, the request body specifies an export of active assets in the JSONL format. As you recall, JSONL takes fewer resources to process than JSON. For more details, check out the “Exporting Asset Data” section in “Acquiring Vulnerabilities Per Asset“. The “Request Data Export” API is invoked. A difference from previous code is that the except has been removed and error checking is done with a direct status check. The search_id and record_count from the API response are returned.
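A minimal sketch of that request, assuming the api.kennasecurity.com base URL and the X-Risk-Token auth header from Kenna’s API documentation; the exact body field names are reproduced from memory and should be verified against the current API reference.

```python
KENNA_BASE_URL = "https://api.kennasecurity.com"  # assumed base URL

def build_export_request():
    # filter_params: restrict the export to active assets, in JSONL format.
    return {
        "status": ["active"],
        "export_settings": {
            "format": "jsonl",   # JSONL streams line-by-line, cheaper than JSON
            "model": "asset",
        },
    }

def request_data_export(api_token):
    """POST the export request; return (search_id, record_count)."""
    import requests  # third-party: pip install requests
    headers = {"X-Risk-Token": api_token, "Content-Type": "application/json"}
    resp = requests.post(f"{KENNA_BASE_URL}/data_exports",
                         headers=headers, json=build_export_request())
    # Direct status check instead of a try/except around the call.
    if resp.status_code != 200:
        raise SystemExit(f"Request Data Export failed: HTTP {resp.status_code}")
    body = resp.json()
    return body["search_id"], body["record_count"]
```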
check_export_status() is used to call get_export_status(), which invokes the “Check Data Export Status” API to discover if the export is ready. When the returned message is “Export ready for download”, we’re good to go.
In check_export_status(), the export status is immediately checked, so that if the export is already done, there is no wait time. The rest of the code sleeps and checks the export status in a loop. A maximum export time is calculated; if it is exceeded, the script ends with instructions. Any mathematician should relate to 2718. Hopefully, the function returns as successful; if not, the script is ended.
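The polling loop might look like the sketch below. I’m treating 2718 as the cap in seconds (the post only says it’s a number any mathematician should relate to, i.e. e ≈ 2.718), and get_status is injected as a callable so the loop can be exercised without the network; both are my design choices, not necessarily the script’s.

```python
import time

MAX_EXPORT_SECONDS = 2718  # a nod to e = 2.718... (assumed to be seconds)

def export_ready(status_json):
    """True when the “Check Data Export Status” API reports completion."""
    return status_json.get("message") == "Export ready for download"

def check_export_status(get_status, wait_seconds, now=time.time, sleep=time.sleep):
    """Poll until the export is ready or the maximum export time passes."""
    deadline = now() + MAX_EXPORT_SECONDS
    if export_ready(get_status()):  # immediate check: no wait if already done
        return True
    while now() < deadline:
        sleep(wait_seconds)
        if export_ready(get_status()):
            return True
    return False  # caller should end the script with instructions
```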
Once the asset export is ready, we retrieve the data into a .gz file. The file name is prefixed with asset_, and it is unzipped into a file also prefixed with asset_. For example, asset_14967.gz is unzipped into asset_14967.jsonl. Looking at the details, the code first checks if there is an existing JSONL file; if there is, the function returns that file name.
If the JSONL file doesn’t exist, it is fetched using the “Retrieve Data Export” API. Before the API is invoked, the HTTP Accept header is modified to accept gzip. Also note that the stream parameter is set to True. Next, the gzip file is read, un-gzipped using Python’s gzip library, and written to a JSONL file. The JSONL file name is returned. I used the “Extracting only one file” section of “Unzip a file in Python: 5 Scenarios You Should Know” as a guide.
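A sketch of the retrieve-and-unzip step. The endpoint, query parameter, and header names are my recollection of Kenna’s “Retrieve Data Export” API and should be checked against the docs; the decompression itself uses only the standard library.

```python
import gzip
import os
import shutil

def gz_to_jsonl(gz_path):
    """Decompress asset_<id>.gz into asset_<id>.jsonl and return the new name."""
    jsonl_path = gz_path.removesuffix(".gz") + ".jsonl"
    with gzip.open(gz_path, "rb") as f_in, open(jsonl_path, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # stream-copy, no full read into memory
    return jsonl_path

def retrieve_export(api_token, search_id):
    """Fetch the gzipped export, reusing an existing JSONL file if present."""
    jsonl_path = f"asset_{search_id}.jsonl"
    if os.path.exists(jsonl_path):
        return jsonl_path                 # already fetched and unzipped earlier
    import requests                       # third-party: pip install requests
    headers = {"X-Risk-Token": api_token,       # assumed auth header
               "Accept": "application/gzip"}    # accept gzip, not JSON
    resp = requests.get("https://api.kennasecurity.com/data_exports",
                        params={"search_id": search_id},
                        headers=headers, stream=True)  # stream the large file
    gz_path = f"asset_{search_id}.gz"
    with open(gz_path, "wb") as f:
        for chunk in resp.iter_content(chunk_size=8192):
            f.write(chunk)
    return gz_to_jsonl(gz_path)
```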
Processing the JSONL File
Now that the assets have been collected, it is time to process them. The JSONL file’s lines, or records, are counted. If the line count is equal to one, then either it is a very short JSONL file or it is not a JSONL file at all; I took the stance that it is not a JSONL file.
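That line-count sanity check might look like this (the function name is mine):

```python
def looks_like_jsonl(path):
    """Heuristic from the post: a one-line file is treated as not JSONL."""
    with open(path) as f:
        line_count = sum(1 for _ in f)  # count records without loading the file
    return line_count > 1
```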
Here is where the JSONL file is processed. The function parameters are the JSONL file name and the asset_tags dictionary. (Since asset_tags is a dictionary, it can be modified in place.) The JSONL file is read one line at a time, and each line is converted to a dictionary. I decided to wrap a function around json.loads() because of the exception handling: if a line can’t be converted, it is assumed that the file is not JSONL, but possibly XML or CSV.
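The wrapper is small; a sketch (function name is mine):

```python
import json

def line_to_dict(line):
    """Convert one JSONL line to a dict, or return None if the line is not
    JSON (e.g. the file is really XML or CSV)."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return None
```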
After the line is converted to a dictionary, the code checks for “locator.” Why? As far as I know, every asset has a “locator.” If the asset has “tags,” the asset tags are processed.
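That per-record filtering can be sketched as follows; the function name is mine, and the tag counter is injected as a callable so this fragment stands alone.

```python
def process_asset(asset, asset_tags, count_tags):
    """Skip records without "locator"; hand any "tags" to the tag counter."""
    if "locator" not in asset:
        return False                 # every real asset should have a locator
    if "tags" in asset:
        count_tags(asset.get("id"), asset["tags"], asset_tags)
    return True
```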
The asset tag uniqueness and counting are done in this function. The parameters are:
- asset_id – the unique ID of the asset.
- tags_to_process – an array of asset tags to process for uniqueness.
- asset_tags – the asset tag dictionary, which maps each asset tag name to an Asset_Tag_Info object.
Each asset tag in the tags_to_process array is examined to see if it is a key in the asset_tags dictionary. If the asset tag is a key, it is counted via the incr() method; however, if the tag is not a key, an Asset_Tag_Info object is created and attached to the key with a count of one.
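A sketch of that counting step. The post only tells us Asset_Tag_Info carries a count and an incr() method, so the class internals and the function name here are my guesses.

```python
class Asset_Tag_Info:
    """Per-tag usage info (internals assumed; the post names only incr())."""
    def __init__(self):
        self.count = 1        # a newly seen tag starts with a count of one
    def incr(self):
        self.count += 1

def count_asset_tags(asset_id, tags_to_process, asset_tags):
    """Count each tag in tags_to_process into the asset_tags dictionary."""
    # asset_id is accepted to match the post's parameter list (unused here).
    for tag in tags_to_process:
        if tag in asset_tags:
            asset_tags[tag].incr()              # existing key: bump the count
        else:
            asset_tags[tag] = Asset_Tag_Info()  # new key: count of one
```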
After all the tags in all the assets are processed, the asset_tags dictionary is written to a CSV file using the Python csv library. Finally, the CSV file is made available.
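The CSV step might look like this; the header names are my guess, and any object with a count attribute works in place of Asset_Tag_Info.

```python
import csv

def write_tag_csv(asset_tags, csv_name="uniq_asset_tags.csv"):
    """Write one row per tag: (tag name, usage count)."""
    with open(csv_name, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["asset_tag", "count"])  # header names assumed
        for tag_name, info in sorted(asset_tags.items()):
            writer.writerow([tag_name, info.count])
    return csv_name
```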
The results are in the CSV file, uniq_asset_tags.csv, which you can sort either by asset tag name or by usage count. Once you know all your asset tag names, you might find some that are under-utilized or even over-utilized. As always, this code is in Kenna Security’s GitHub repository.
Until next time,