Converter configuration reference

This reference covers examples and operators for the Amazon S3 Import and Google Cloud Storage (GCS) converter configuration. Read the S3 guide or the GCS guide for more information.

skip_user_properties_sync

Because many cloud storage source imports are batch uploads of historical data, syncing the latest user properties for historical events might not make sense. For this reason, Amplitude sets $skip_user_properties_sync to true by default. To include user properties with your events, set it to false in the converter.

For more information about $skip_user_properties_sync, refer to the Data Backfill Guide.

ignoreEventFlag

Use ignoreEventFlag to selectively skip events from ingestion based on a computed boolean condition. Set ignoreEventFlag to a field name (for example, "$ignore"), then define that field in convertToAmplitudeFunc using any DataLang boolean expression. When the field evaluates to true for a given event, Amplitude skips the event without counting it as an error. When it evaluates to false or is absent, Amplitude ingests the event normally.

This option is useful when your source data contains events you want to filter, such as internal test events, bot traffic, or specific event types, without preprocessing the files upstream.

Example: filter out specific event types

The following example skips any event where event_type is page_view or session_start:

json
{
  "converterConfig": {
    "ignoreEventFlag": "$ignore",
    "convertToAmplitudeFunc": {
      "event_type": "event_type",
      "user_id": "user_id",
      "$ignore": ["or",
        ["equals", ["path", "event_type"], ["value", "page_view"]],
        ["equals", ["path", "event_type"], ["value", "session_start"]]
      ]
    }
  }
}

In this example:

  • "ignoreEventFlag": "$ignore" tells Amplitude to look for a field named $ignore in the converted event.
  • "$ignore" in convertToAmplitudeFunc defines a boolean expression using the or and equals operators.
  • If event_type is page_view or session_start, $ignore evaluates to true and Amplitude skips the event.
  • Amplitude ingests all other events normally.

You can use any boolean operator or combination of operators to build the condition.
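As a further sketch of combining operators, the configuration below skips events from internal accounts unless they are purchases. It uses only the and, not, contains, and equals operators from this reference. The email field, the @example.com domain, and the purchase event type are hypothetical names chosen for illustration, not part of any real schema.

```json
{
  "converterConfig": {
    "ignoreEventFlag": "$ignore",
    "convertToAmplitudeFunc": {
      "event_type": "event_type",
      "user_id": "user_id",
      "$ignore": ["and",
        ["contains", "@example.com", ["path", "email"]],
        ["not", ["equals", ["path", "event_type"], ["value", "purchase"]]]
      ]
    }
  }
}
```

Because contains evaluates to false when its source is null, events without an email field aren't flagged by this condition.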

convertToAmplitudeFunc

Conversion rules in convertToAmplitudeFunc instruct the ingestion service on how to construct events in Amplitude.

Example converter with convertToAmplitudeFunc

json
{
    "config_name": ["Event sample converter"],
    "converterConfig": {
        "fileType": "parquet",
        "compressionType": "none",
        "convertToAmplitudeFunc": {
          "event_type": "action",
          "user_id": "user",
          "device_id": "device",
          "event_properties": {
              "business_id_encid": "business_id"
          },
          "user_properties": {
              "utm_channel_category": "utm_channel_c",
              "utm_channel_source": "utm_channel_s"
          },
          "time": "epoch",
          "session_id": "session_id",
          "app_version": "app_version"
        }
    },

    "keyValidatorConfig": {
        "filterPattern": "folder1/folder2/ds=202011[1-2][0-9]/.*\\.parquet"
    }
}

Example constructed event

json
{
  "event_type": "watch tv",
  "user_id": "john",
  "device_id": "host1",
  "event_properties": {
    "business_id_encid": "123"
  },
  "user_properties": {
    "utm_channel_category": "discovery",
    "utm_channel_source": "network"
  },
  "time": "1645066434189",
  "session_id": "1",
  "app_version": "1"
}

Values in the event come from the fields specified by convertToAmplitudeFunc. For example, the value watch tv in the event_type field comes from the "action" field in the ingested data files. Because the event_type value isn't ["value", "$identify"] or ["value", "$groupidentify"], Amplitude ingests these events the same way it ingests events sent through the HTTP V2 API.

Operators

List operators

Use list operators

If the source description is a list, the first item must be a string that names the function, and the remaining items are that function's parameters. In the syntax descriptions, the "|" character separates non-repeating arguments from repeating ones: you can specify any argument after "|" any number of times. Every non-repeating argument of a multiple-argument operator must be present. For example, you can't specify only one of three arguments; you must include all three.
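A minimal sketch of how list-form source descriptions look inside convertToAmplitudeFunc. The source field names (payload, name, uid) are hypothetical. The path operator takes only repeating arguments, while any takes one required argument followed by any number of fallbacks.

```json
{
  "convertToAmplitudeFunc": {
    "event_type": ["path", "payload", "name"],
    "user_id": ["any", ["path", "user_id"], ["path", "uid"]],
    "platform": ["value", "web"]
  }
}
```

Here event_type reads payload.name from each record, user_id falls back to uid when user_id yields no value, and platform is a static string.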

Operator: path
Description: Evaluates each SOURCE_DESCRIPTION sequentially on the returned JsonElement. Equivalent to evaluating a specific path when chaining BasicPaths. This also works for indexing into an array. For example, ["path", "foo", "1"] chooses the element at index 1 (the second element) in the array at "foo". The index must be provided as a string.
Syntax: ["path", | SOURCE_DESCRIPTION...]
Example: ["path", "foo", "bar"] => obj['foo']['bar']

Operator: any
Description: Returns the first value that a SOURCE_DESCRIPTION in the list returns.
Syntax: ["any", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: value
Description: Escapes a single JSON value so you can create a static value.
Example: ["value", "amplitude-vacuum"...

Operator: dict
Description: Creates a dictionary (object) where the raw_strings are the keys and the evaluated SOURCE_DESCRIPTIONS are the values.
Syntax: ["dict", "raw_string", SOURCE_DESCRIPTION, | "raw_string", SOURCE_DESCRIPTION...]

Operator: array
Description: Returns an array whose elements are the values returned by evaluating each SOURCE_DESCRIPTION. If a SOURCE_DESCRIPTION fails to evaluate, Amplitude skips it.
Syntax: ["array", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: condition
Description: Finds the first BOOLEAN_SOURCE that evaluates to true and returns the result of the SOURCE_DESCRIPTION that follows it. Throws a NoValueFoundAtSource exception if nothing evaluates to true.
Syntax: ["condition"| "cond", BOOLEAN_SOURCE, SOURCE_DESCRIPTION, | BOOLEAN_SOURCE, SOURCE_DESCRIPTION...]

Operator: ifelse
Description: If the BOOLEAN_SOURCE evaluates to true, returns the first SOURCE_DESCRIPTION. Otherwise, returns the second SOURCE_DESCRIPTION.
Syntax: ["ifelse", BOOLEAN_SOURCE, SOURCE_DESCRIPTION, SOURCE_DESCRIPTION]

Operator: sample_md5
Description: Evaluates the given sampleKey (second arg) with the samplePercent (first arg) to determine whether it should be in the sample. Returns a boolean.
Syntax: ["sample_md5", SOURCE_DESCRIPTION, SOURCE_DESCRIPTION]

Operator: iso_time_to_ms
Description: Assumes the string returned by SOURCE_DESCRIPTION is an ISO datetime string, for example, YYYY-MM-DDTHH:MM:SS, and converts it to milliseconds since epoch.
Syntax: ["iso_time_to_ms", SOURCE_DESCRIPTION]

Operator: ms_to_iso_time
Description: Assumes the string returned by SOURCE_DESCRIPTION is milliseconds since epoch and converts it to an ISO datetime string, for example, YYYY-MM-DDTHH:MM:SS.
Syntax: ["ms_to_iso_time", SOURCE_DESCRIPTION]

Operator: iso_time_now
Description: Generates an ISO datetime string for the current time.
Syntax: ["iso_time_now"]

Operator: ms_time_now
Description: Generates the milliseconds since epoch for the current time.
Syntax: ["ms_time_now"]

Operator: int96_time_to_ms
Description: Assumes the string returned by SOURCE_DESCRIPTION is a base64-encoded INT96, for example, AP6qCz41AAAwhCUA, and converts it to milliseconds since epoch.
Syntax: ["int96_time_to_ms", SOURCE_DESCRIPTION]

Operator: parse_time_to_ms
Description: Takes a RAW_STRING time format, for example, M/d/yyyy H:mm:ss, and a SOURCE_DESCRIPTION that returns a string in that format, for example, '1/1/2021 5:06:07', and converts it to milliseconds since epoch.
Syntax: ["parse_time_to_ms", RAW_STRING, SOURCE_DESCRIPTION]

Operator: parse_json_element
Description: Assumes the value returned by SOURCE_DESCRIPTION is a string containing a JSON blob and returns the parsed JSON value.
Syntax: ["parse_json_element"| "parse_json_object", SOURCE_DESCRIPTION]

Operator: merge_dicts
Description: Merges the JSON objects that each SOURCE_DESCRIPTION evaluates to.
Syntax: ["merge_dicts", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: flatten_dict
Description: Flattens a nested JSON object into a single-layer JSON object.
Syntax: ["flatten_dict", "raw_string", INTEGER_SOURCE, SOURCE_DESCRIPTION]

Operator: exclude_keys
Description: Evaluates the specified SOURCE_DESCRIPTION and returns the resulting JsonElement without the requested fields.
Syntax: ["exclude_keys", SOURCE_DESCRIPTION, | "raw_string"...]

Operator: concat
Description: Treats the result of each SOURCE_DESCRIPTION as a string and returns the concatenated string.
Syntax: ["concat", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: replace_with
Description: Replaces every occurrence of old_string within the value returned by SOURCE_DESCRIPTION with new_string. Returns a string, or raises a NoValueException if SOURCE_DESCRIPTION can't be evaluated to a string. The old_string supports Java's regex syntax for matching patterns. For details, see https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Syntax: ["replace_with", "old_string", "new_string", SOURCE_DESCRIPTION]

Operator: split
Description: Splits the value returned by SOURCE_DESCRIPTION by the specified character sequence. Returns a jsonArray, or raises a NoValueException if SOURCE_DESCRIPTION can't be evaluated to a string.
Syntax: ["split", "raw_string", SOURCE_DESCRIPTION]

Operator: lowercase
Description: Returns the lowercase string.
Syntax: ["lowercase"| "lower", SOURCE_DESCRIPTION]

Operator: typeof
Description: Returns the type of the source description as a string: 'string', 'list', 'dict', 'bool', 'number', or 'null'.
Syntax: ["typeof", SOURCE_DESCRIPTION]
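The following sketch shows several list operators working together in one conversion. All source field names (action, account_id, created_at, properties, password, ssn) and the acct- prefix are hypothetical. It also assumes created_at holds an ISO datetime string, as iso_time_to_ms requires.

```json
{
  "convertToAmplitudeFunc": {
    "event_type": ["lowercase", ["path", "action"]],
    "user_id": ["concat", ["value", "acct-"], ["path", "account_id"]],
    "time": ["iso_time_to_ms", ["path", "created_at"]],
    "event_properties": ["exclude_keys", ["path", "properties"], "password", "ssn"]
  }
}
```

If the source timestamps use a custom format instead, parse_time_to_ms with an explicit format string may be the closer fit.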

Boolean operators

These operators return a JsonPrimitive of type Boolean, so they're valid to use with cond and ifelse.

Operator: bool
Description: Evaluates as a static boolean value. Throws an exception during initialization if RAW_JSON isn't a boolean value.
Syntax: ["bool", any_json]

Operator: not
Description: Returns the negation of the argument. Null values are treated as false, and the string 'true' or 'false' is cast to a boolean.
Syntax: ["not"|"!", SOURCE_DESCRIPTION]

Operator: and
Description: Returns whether all arguments are true. Null values are treated as false, and the string 'true' or 'false' is cast to a boolean.
Syntax: ["and"|"&&", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: or
Description: Returns whether at least one argument is true. Null values are treated as false, and the string 'true' or 'false' is cast to a boolean.
Syntax: ["or"|"||", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: equals
Description: Evaluates to true if and only if the two arguments are equal.
Syntax: ["equals"|"eq"|"=", SOURCE_DESCRIPTION, SOURCE_DESCRIPTION]

Operator: contains
Description: Evaluates to true if the evaluated SOURCE_DESCRIPTION (second arg) contains the given raw string. If the SOURCE_DESCRIPTION is null, evaluates to false.
Syntax: ["contains"|"is_substring", "raw_string", SOURCE_DESCRIPTION]
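As an illustration of feeding a boolean operator into ifelse, the sketch below derives the event type from a status field. The status field and the two event names are hypothetical.

```json
{
  "convertToAmplitudeFunc": {
    "event_type": ["ifelse",
      ["equals", ["path", "status"], ["value", "refunded"]],
      ["value", "order refunded"],
      ["value", "order completed"]
    ],
    "user_id": "user_id"
  }
}
```

With more than two outcomes, the cond operator accepts repeated BOOLEAN_SOURCE and SOURCE_DESCRIPTION pairs instead of a single branch.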

Integer and float operators

The following operators return a JsonPrimitive of type Integer, except the add operator, which returns a JsonPrimitive of type Float.

Operator: int
Description: Evaluates as a static int value. Throws an exception during initialization if RAW_JSON isn't an int value.
Syntax: ["int", RAW_JSON]

Operator: round
Description: Rounds the argument to the nearest integer. Amplitude attempts to convert strings to integers and treats null values as zero.
Syntax: ["round", SOURCE_DESCRIPTION]

Operator: add
Description: Returns the sum of the arguments as an integer. Amplitude attempts to convert strings to integers and treats null values as zero.
Syntax: ["add"|"+", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: subtract
Description: Subtracts the second argument from the first one. Amplitude attempts to convert strings to integers and treats null values as zero.
Syntax: ["subtract"|"-", SOURCE_DESCRIPTION, SOURCE_DESCRIPTION]

Operator: multiply
Description: Returns the product of the arguments as an integer. Amplitude attempts to convert strings to integers and treats null values as zero.
Syntax: ["multiply"|"*", SOURCE_DESCRIPTION, | SOURCE_DESCRIPTION...]

Operator: divide
Description: Divides the first argument by the second one. Amplitude attempts to convert strings to integers and treats null values as zero.
Syntax: ["divide"|"/", SOURCE_DESCRIPTION, SOURCE_DESCRIPTION]
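One common use of the arithmetic operators is unit conversion. The sketch below assumes a hypothetical ts_seconds field holding a timestamp in seconds and multiplies it by 1000 to produce milliseconds. The items_in_cart and items_saved fields are likewise hypothetical.

```json
{
  "convertToAmplitudeFunc": {
    "event_type": "action",
    "user_id": "user",
    "time": ["multiply", ["path", "ts_seconds"], ["int", 1000]],
    "event_properties": {
      "total_items": ["add", ["path", "items_in_cart"], ["path", "items_saved"]]
    }
  }
}
```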

JSON operator

Operator: N/A
Description: As syntactic sugar, Amplitude converts a JSON object to a "dict" LIST_OPERATOR.
Syntax: The following two descriptions are equivalent:
{"key1": SOURCE_DESCRIPTION, "key2": SOURCE_DESCRIPTION, …}
["dict", "key1", SOURCE_DESCRIPTION, "key2", SOURCE_DESCRIPTION, ...]
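To make the equivalence concrete, the sketch below writes the same event_properties mapping in both forms. The wrapper keys object_form and dict_form are labels for this illustration only, not converter settings, and the plan field is hypothetical.

```json
{
  "object_form": {
    "event_properties": {
      "plan": ["path", "plan"],
      "source": ["value", "import"]
    }
  },
  "dict_form": {
    "event_properties": ["dict",
      "plan", ["path", "plan"],
      "source", ["value", "import"]
    ]
  }
}
```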

User property operations

The converter supports the same user property operators as the Identify API. Refer to the Identify documentation for details.
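As a rough sketch, the Identify API operations $set, $setOnce, and $add could appear in a converter as shown below. Nesting them under user_properties is an assumption based on how the Identify API structures its payload, and the source field names (plan, signup_date, purchases) are hypothetical.

```json
{
  "convertToAmplitudeFunc": {
    "event_type": "action",
    "user_id": "user",
    "user_properties": {
      "$set": {
        "plan": ["path", "plan"]
      },
      "$setOnce": {
        "first_seen": ["path", "signup_date"]
      },
      "$add": {
        "purchase_count": ["path", "purchases"]
      }
    }
  }
}
```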
