[DOCS-10653] Obs Pipelines OCSF Custom Configuration #28892

Open

maycmlee wants to merge 12 commits into may/obs-pipelines-2.5-components from may/op-ocsf-custom-configuration

+427 −3

Contributor

maycmlee commented Apr 21, 2025

What does this PR do? What is the motivation?

Adds OP OCSF Custom Configuration doc.
Adds 4th-level nav item.
Moves Remap OCSF to its own folder.

Merge instructions

Merge readiness:

Ready for merge

For Datadog employees:
Merge queue is enabled in this repo. Your branch name MUST follow the <name>/<description> convention and include the forward slash (/). Without this format, your pull request will not pass in CI, the GitLab pipeline will not run, and you won't get a branch preview. Getting a branch preview makes it easier for us to check any issues with your PR, such as broken links.

If your branch doesn't follow this format, rename it or create a new branch and PR.

To have your PR automatically merged after it receives the required reviews, add the following PR comment:

/merge

Additional notes


          add custom config doc

8ec4bc1

maycmlee added the WORK IN PROGRESS label

maycmlee requested a review from a team as a code owner

April 21, 2025 19:53

Contributor

github-actions bot commented Apr 21, 2025

Preview links (active after the `build_preview` check completes)

New or renamed files

https://docs-staging.datadoghq.com/may/op-ocsf-custom-configuration/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format

Renamed files

https://docs-staging.datadoghq.com/may/op-ocsf-custom-configuration/observability_pipelines/processors/remap_ocsf/

maycmlee added 2 commits

April 21, 2025 16:34


          updates

5dad014


          add fourth-level nav item

1aad8ad

maycmlee requested a review from a team as a code owner

April 21, 2025 20:46

github-actions bot added the Architecture label


          edits

ac14b4d

maycmlee commented

content/en/observability_pipelines/processors/remap_ocsf/custom_configuration.md Outdated

    
              |------------|-----------|-----------------------------------------------------------------------------------------------------------------------|

              | `version`    | Yes       | Must be set to `1` to indicate the mapping descriptor format version.                                                 |

              | `metadata`   | Yes       | Contains a set of hard-coded description fields about the event class. See [Metadata section](#metadata-section) for details.                         |

              | `preprocess` | No        | Lists an ordered series of preprocessing steps. The preprocessors rework the data to allow the field-to-field mappings. Each entry is an object consisting of a `function` name (required) and parameters associated with that function. See [Preprocessors](#preprocessors) for more information. |

Contributor Author

maycmlee Apr 22, 2025

What does "rework" mean here? And how does it allow field-to-field mappings?

Contributor

bruceg Apr 22, 2025

These are operations that may be necessary to transform incoming raw data from original sources into an object suitable for the mapping steps below. "Reformat" might be a better descriptor.

content/en/observability_pipelines/processors/remap_ocsf/custom_configuration.md Outdated

    
              The `parse_csv` preprocessor:

              1. Extracts a `source` field.

Contributor Author

maycmlee Apr 22, 2025

An example source field that gets extracted could be helpful.

content/en/observability_pipelines/processors/remap_ocsf/custom_configuration.md Outdated

    
              All enumerated name or label fields identified in the OCSF schema are converted to their sibling `id` field. For example, the string field `severity` is automatically converted to the numeric field `severity_id` based on the standard enum value table defined in the OCSF schema. If no matching value is found in the lookup table, the `id` field is set to `99` to represent `Other`.

              If one of the listed `profiles` in the metadata section is `datetime`, the mapping

              automatically has all numeric timestamps identified in the OCSF schema converted into the sibling field `{DEST}_dt`. For example, the numeric `time` field is converted into `time_dt`, which contains a string representation of that timestamp. No additional work is required to support the `datetime` profile.

Contributor Author

maycmlee Apr 22, 2025

Is there a link to the OCSF schema we can add here?

Contributor

bruceg Apr 22, 2025

OCSF is composed of multiple schemas, the choice of which is specified by the version, class, and profiles given in the metadata section. The complete schema can be browsed here: https://schema.ocsf.io/
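As a rough illustration of the implicit mappings quoted above, the following Python sketch converts a `severity` label to `severity_id` with the `99` (`Other`) fallback, and derives a `time_dt` string from a numeric `time`. The function names are hypothetical, the severity table is the one listed for `severity_id` in the OCSF schema browser, and both the case-sensitive matching and the millisecond unit for `time` are assumptions, not confirmed behavior of the processor.

```python
from datetime import datetime, timezone

# Severity labels and IDs as listed for `severity_id` in the OCSF schema.
SEVERITY_IDS = {
    "Unknown": 0,
    "Informational": 1,
    "Low": 2,
    "Medium": 3,
    "High": 4,
    "Critical": 5,
    "Fatal": 6,
}
OTHER_ID = 99  # used when no matching value is found in the lookup table


def severity_to_id(label):
    """Return the sibling id for a severity label, or 99 (Other) if unmatched.

    Assumption: matching is exact and case-sensitive.
    """
    return SEVERITY_IDS.get(label, OTHER_ID)


def time_to_dt(time_ms):
    """Build a string timestamp for the sibling `time_dt` field.

    Assumption: `time` holds milliseconds since the Unix epoch.
    """
    return datetime.fromtimestamp(time_ms / 1000, tz=timezone.utc).isoformat()
```

With these assumptions, `severity_to_id("High")` yields `4`, and an unrecognized label such as `"Severe"` falls back to `99`.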

content/en/observability_pipelines/processors/remap_ocsf/custom_configuration.md Outdated

    
              ### Mapping Functions

              A function applies an operation to the value extracted from the source, before assigning to the destination.

Contributor Author

maycmlee Apr 22, 2025

Before assigning the extracted value to the destination? Is that correct?

Contributor

bruceg Apr 22, 2025

Correct.
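The ordering confirmed above (extract the value, apply the function, then assign to the destination) can be sketched in Python as follows. This is only an illustration: the `function` key on a mapping entry, the dotted path syntax, and the `to_upper` function in the usage note are hypothetical and are not taken from the doc under review.

```python
def get_path(event, path):
    """Read a dotted path such as "user.name" from nested dicts (None if missing)."""
    current = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def set_path(event, path, value):
    """Write a value at a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    current = event
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value


def apply_mapping(entry, source_event, ocsf_event, functions):
    """Apply one mapping entry: extract, optionally transform, then assign."""
    # A literal `value` takes the place of a `source` lookup.
    if "value" in entry:
        value = entry["value"]
    else:
        value = get_path(source_event, entry["source"])
    # Hypothetical key: the function runs on the extracted value
    # before that value is assigned to the destination.
    name = entry.get("function")
    if name is not None:
        value = functions[name](value)
    set_path(ocsf_event, entry["dest"], value)
```

For example, calling `apply_mapping({"source": "user.name", "dest": "actor.user.name", "function": "to_upper"}, src, out, {"to_upper": str.upper})` with `src = {"user": {"name": "alice"}}` and an empty `out` leaves `out["actor"]["user"]["name"]` set to `"ALICE"`.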

maycmlee added 4 commits

April 22, 2025 15:22


          Apply suggestions from code review

6b7e250


          Update content/en/observability_pipelines/processors/remap_ocsf/custo…

5e4d309

…m_configuration.md


          updates

b86aac7


          Merge branch 'master' into may/op-ocsf-custom-configuration

df48e95

maycmlee requested review from a team as code owners

April 24, 2025 14:54

maycmlee requested review from TovRudyy, vitor-de-araujo and RomainGuarinoni

April 24, 2025 14:54

github-actions bot added Images Guide labels


          Merge branch 'may/obs-pipelines-2.5-components' into may/op-ocsf-cust…

50b0c47

…om-configuration

github-actions bot removed Images Guide labels

maycmlee removed request for a team, TovRudyy, RomainGuarinoni and vitor-de-araujo

April 24, 2025 15:12


          updates

a313033

maycmlee commented

content/en/observability_pipelines/processors/remap_ocsf/custom_configuration.md Outdated

    
              1. Parses `source` as CSV and maps the values to the `columns` listed.

              1. Inserts the mapped data in the `dest` field.

              Columns with a `null` value are dropped. One of the columns can have a wildcard (`*`), in which case it is assigned a string containing all the text from the fields that remain after those before and after have been mapped.

Contributor Author

maycmlee Apr 24, 2025

Need clarification on what this means: "assigned a string containing all the text from the fields that remain after those before and after have been mapped."

Contributor Author

maycmlee Apr 24, 2025

@bruceg could you clarify that part of the sentence?

Contributor

bruceg Apr 25, 2025

See the example posted earlier. Effectively, the output column is assigned everything that is left after handling the remainder of the fields after the wildcard.
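One possible reading of the wildcard behavior discussed in this thread, sketched in Python. The example referred to above is not part of this page, so everything here is an assumption: the wildcard column is written as a name prefixed with `*`, dropped columns are written as `None`, and the leftover fields are rejoined with commas. The function name is hypothetical.

```python
import csv


def parse_csv_columns(text, columns):
    """Map CSV fields to column names, with at most one wildcard column.

    Assumptions: a wildcard column is a name starting with "*", a None
    column is dropped, and leftover fields are rejoined with commas.
    """
    fields = next(csv.reader([text]))
    wild = [i for i, name in enumerate(columns)
            if name is not None and name.startswith("*")]
    result = {}
    if not wild:
        for name, field in zip(columns, fields):
            if name is not None:
                result[name] = field
        return result

    w = wild[0]
    before, after = columns[:w], columns[w + 1:]
    # Columns after the wildcard are matched against the trailing fields.
    tail_start = max(w, len(fields) - len(after))
    for name, field in zip(before, fields[:w]):
        if name is not None:
            result[name] = field
    for name, field in zip(after, fields[tail_start:]):
        if name is not None:
            result[name] = field
    # The wildcard column receives everything left in the middle.
    result[columns[w][1:]] = ",".join(fields[w:tail_start])
    return result
```

Under these assumptions, the line `2025-04-22,alice,login failed,bad password,10.0.0.1` with columns `["time", None, "*message", "src_ip"]` assigns `login failed,bad password` to `message`, because `time` and the dropped column consume the first two fields and `src_ip` consumes the last one.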

maycmlee added 2 commits

April 24, 2025 13:39


          small edits

09fb946


          update title and add link

6d82b9b

drichards-87 requested changes

Contributor

drichards-87 left a comment

Looks good overall. I just had a few minor suggestions.

content/en/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format.md

    
              | `version`    | Yes       | Must be set to `1` to indicate the mapping descriptor format version.                                                 |

              | `metadata`   | Yes       | Contains a set of hard-coded description fields about the event class. See [Metadata section](#metadata-section) for details.                         |

              | `preprocess` | No        | An ordered list of preprocessing steps. The preprocessors reformat raw data from the sources so that the data can be converted to OCSF format based on `mapping`. Each preprocessor entry consists of a `function` and parameters associated with that function. See [Preprocessors](#preprocess-section) for more information. |

              | `mapping`    | Yes       | An ordered list of field-to-field assignments, where a `source` field is assigned to a `dest` field in the output OCSF event. See  for more information. Each [mapping](#mapping-section) may have a `conversion` specified by a [lookup table](#mapping-lookup-tables) or post-processing [mapping function](#mapping-functions). |

Contributor

drichards-87 Apr 24, 2025

I think there might be a missing link in the mapping description. (See for more information.)
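For readers skimming the table quoted above, the required and optional top-level fields can be summarized as a small Python check. This is a hypothetical helper for illustration only, not part of the processor; it assumes the mapping descriptor has already been loaded into a Python dict and encodes only the constraints stated in the table.

```python
REQUIRED_FIELDS = ("version", "metadata", "mapping")


def validate_descriptor(descriptor):
    """Return a list of problems found in a mapping descriptor dict."""
    errors = []
    for key in REQUIRED_FIELDS:
        if key not in descriptor:
            errors.append(f"missing required field: {key}")
    # `version` must be set to 1.
    if "version" in descriptor and descriptor["version"] != 1:
        errors.append("version must be 1")
    # `preprocess` (optional) and `mapping` are ordered lists.
    for key in ("preprocess", "mapping"):
        if key in descriptor and not isinstance(descriptor[key], list):
            errors.append(f"{key} must be an ordered list")
    # Each preprocessor entry names the `function` it runs.
    preprocess = descriptor.get("preprocess", [])
    if isinstance(preprocess, list):
        for index, step in enumerate(preprocess):
            if not isinstance(step, dict) or "function" not in step:
                errors.append(f"preprocess[{index}] is missing `function`")
    return errors
```

A descriptor such as `{"version": 1, "metadata": {}, "mapping": []}` passes with no errors, while omitting `mapping` or setting `version` to `2` is reported.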

content/en/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format.md

    
              ### Preprocess section

              The `preprocess` section lists the preprocessors that reformats the data to allow field-to-field mappings. Each entry in this section consists of a `function` and the parameters associated with that function.

Contributor

drichards-87 Apr 24, 2025

Suggested change

      
            The `preprocess` section lists the preprocessors that reformats the data to allow field-to-field mappings. Each entry in this section consists of a `function` and the parameters associated with that function.
          
            The `preprocess` section lists the preprocessors that reformat the data to enable field-to-field mappings. Each entry in this section consists of a `function` and the parameters associated with that function.

content/en/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format.md

    
              ### Mapping section

              The `mapping` section is an ordered list of field-to-field assignments. Each mapping entry consists of a `dest` path that refers to the destination field in the OCSF event and either a `source` path that refers to a field in the source event or a `value` that contains a literal constant to insert at that destination.

Contributor

drichards-87 Apr 24, 2025

Suggested change

      
            The `mapping` section is an ordered list of field-to-field assignments. Each mapping entry consists of a `dest` path that refers to the destination field in the OCSF event and either a `source` path that refers to a field in the source event or a `value` that contains a literal constant to insert at that destination.
          
            The `mapping` section is an ordered list of field-to-field assignments. Each mapping entry consists of a `dest` path that refers to the destination field in the OCSF event, and either a `source` path that refers to a field in the source event, or a `value` that contains a literal constant to insert at that destination.

content/en/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format.md

    
              #### Implicit mappings

              All enumerated name or label fields identified in the OCSF schema are converted to their sibling `id` field. For example, the string field `severity` is automatically converted to the numeric OCSF field `severity_id` based on the values defined in the OCSF schema. See `severity_id` in [Authentication [3002]][1] for the OCSF values. If no matching value is found in the lookup table, the `id` field is set to `99` to represent `Other`.

Contributor

drichards-87 Apr 24, 2025

Suggested change

      
            All enumerated name or label fields identified in the OCSF schema are converted to their sibling `id` field. For example, the string field `severity` is automatically converted to the numeric OCSF field `severity_id` based on the values defined in the OCSF schema. See `severity_id` in [Authentication [3002]][1] for the OCSF values. If no matching value is found in the lookup table, the `id` field is set to `99` to represent `Other`.
          
            All enumerated name or label fields identified in the OCSF schema are converted to their sibling `id` field. For example, the string field `severity` is automatically converted to the numeric OCSF field `severity_id` based on the values defined in the OCSF schema. See `severity_id` in [Authentication][1] for the OCSF values. If no matching value is found in the lookup table, the `id` field is set to `99` to represent `Other`.

content/en/observability_pipelines/processors/remap_ocsf/custom_mapping_configuration_format.md

    
              ### `reshape_array`

              The `reshape_array` function extracts data from a source array to create a new array of values. The function filters only array elements containing a field that matches a condition from the list in [Mapping lookup tables](#mapping-lookup-tables), and extracts another field into the output array.

Contributor

drichards-87 Apr 24, 2025

Suggested change

      
            The `reshape_array` function extracts data from a source array to create a new array of values. The function filters only array elements containing a field that matches a condition from the list in [Mapping lookup tables](#mapping-lookup-tables), and extracts another field into the output array.
          
            The `reshape_array` function extracts data from a source array to create a new array of values. The function filters array elements, selecting only those that contain a field matching a condition from the [Mapping lookup tables](#mapping-lookup-tables), and extracts another field into the output array.

Contributor

drichards-87 Apr 24, 2025

This might be because I'm not very familiar with OCSF or Observability Pipelines, but I'm not sure I understand what 'another field' refers to in the phrase 'and extracts another field into the output array.'

Contributor

bruceg Apr 25, 2025

That is a reference to a field within the elements of the array. For example, if the array contains `[{"a":1},{"a":2}]`, you could indicate that you want to extract field `a`, and the result would be `[1, 2]`.
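The behavior described in this reply can be sketched in Python as follows. The parameter names (`extract`, `filter_field`, `allowed`) are hypothetical, and treating the filter as a simple membership test against a set of allowed values is an assumption; the doc under review defines the real parameters.

```python
def reshape_array(items, extract, filter_field=None, allowed=None):
    """Build a new array from one field of each matching element.

    Assumptions: `items` is a list of dicts, and an element matches when
    its `filter_field` value is in `allowed` (no filter means keep all).
    """
    output = []
    for item in items:
        if filter_field is not None and item.get(filter_field) not in allowed:
            continue  # element does not satisfy the filter condition
        if extract in item:
            output.append(item[extract])
    return output
```

With no filter, `reshape_array([{"a": 1}, {"a": 2}], "a")` returns `[1, 2]`, which matches the example given in the reply. Adding a filter, such as keeping only elements whose `type` is `"ip"`, narrows which elements contribute to the output array.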

Labels

Architecture WORK IN PROGRESS