EmbeddingsNormalization
A set of XUnit tests that demonstrate how unstructured data can be classified and normalized using embeddings models. The technique can validate both inputs and outputs: these demos classify user input, but the same approach can also constrain system output, perhaps coming from an LLM like GPT, to a set of known valid outputs.
Embeddings can be used to normalize inputs, so that free-text input can be limited to a specific set of results. In this example, we construct the foundation for a simple text-adventure, perhaps one using voice-to-text for input, that constrains the results to only known valid responses. Thus, if the input is "head east" instead of "go east", the system will identify that input properly as "go east".
Additionally, this code exposes attempted prompt injection attacks by using a threshold distance to identify statements that are clearly not attempts at valid input. Any statement beyond that somewhat arbitrary distance is classified as "other", allowing the programmer to respond appropriately, perhaps with a "did you mean..." or "try again".
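The thresholded normalization described above can be sketched roughly as follows. This is a minimal Python illustration, not the repository's XUnit code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embeddings model, and the command list and the `0.6` threshold are assumptions chosen only to make the example run.

```python
import math
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embeddings model: word counts."""
    return Counter(text.lower().split())

def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity; 1.0 when either vector is empty."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def normalize(text: str, valid_commands: list[str], threshold: float) -> str:
    """Map free text to the nearest valid command, or "other" if too far."""
    query = toy_embed(text)
    distances = {c: cosine_distance(query, toy_embed(c)) for c in valid_commands}
    best = min(distances, key=distances.get)
    # Anything farther than the threshold is treated as not a valid input.
    return best if distances[best] <= threshold else "other"

# Assumed command set and threshold, for illustration only.
COMMANDS = ["go east", "go west", "take lamp"]
THRESHOLD = 0.6
```

With these assumptions, `normalize("head east", COMMANDS, THRESHOLD)` resolves to `"go east"`, while an input such as `"ignore all previous instructions"` shares nothing with any command and falls through to `"other"`. A real embeddings model would capture meaning rather than shared words, and its threshold would need tuning against sample inputs.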
Similarly, embeddings can be used to normalize the input to a job/role survey, so that free-text input can be classified into one of just a few categories. Regardless of the phrasing of the user's response, if that response is within a threshold distance of one of the valid values, it is classified as that value. Anything outside that threshold is identified as "other".
In this example, phrases are classified as best falling into the categories of "Rock", "Paper", "Scissors", "Lizard" or "Spock". All inputs in this example will be classified into one of those five groups, even if there is no good match or if the input could fall into multiple categories.
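For contrast with the thresholded examples, here is a minimal Python sketch of nearest-category classification with no "other" bucket. It is an illustration under stated assumptions rather than the repository's implementation: `toy_embed` is a hypothetical word-count stand-in for a real embeddings model, and the category descriptions are invented for the example.

```python
import math
from collections import Counter

# Hypothetical descriptions used to give each category something to match.
CATEGORIES = {
    "Rock": "hard heavy stone boulder",
    "Paper": "thin flat sheet page",
    "Scissors": "sharp cutting blades",
    "Lizard": "small green reptile",
    "Spock": "logical vulcan science officer",
}

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embeddings model: word counts."""
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def classify(text: str) -> str:
    """Always return the most similar category; there is no threshold."""
    query = toy_embed(text)
    return max(
        CATEGORIES,
        key=lambda name: cosine_similarity(query, toy_embed(CATEGORIES[name])),
    )
```

Because nothing is ever rejected, an input with no meaningful match (for example `classify("xyzzy")`) still comes back as one of the five categories. That is the trade-off this example makes relative to the thresholded ones.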