Skip to content

EmbeddingsNormalization

Barry Stahl edited this page Jan 14, 2024 · 4 revisions

Embeddings Normalization Demos

A set of XUnit tests that demonstrate how unstructured data can be classified & normalized using embeddings models. This can be used to validate both inputs and outputs. That is, these demos show how user input can be classified, but it can also be used to constrain system output, perhaps coming from an LLM like GPT, into a set of known valid outputs.

Normalizing Free-Text Input to a Text Adventure Game

Embeddings can be used to normalize inputs, so that free-text input can be limited to a specific set of results. In this example, we construct the foundation for a simple text-adventure, perhaps one using voice-to-text for input, that constrains the results to only known valid responses. Thus, if the input is "head east" instead of "go east", the system will identify that input properly as "go east".

Additionally, this code exposes the attempt at a prompt injection attack by identifying statements that are clearly not attempts at valid inputs using a threshold distance. Any statement beyond that somewhat arbitrary distance, are classified as "other", allowing the programmer to respond appropriately, perhaps with a "did you mean..." or "try again".

Classifying Free-Text Input to a Job Survey

Similarly, embeddings can be used to normalize the input to a job/role survey, so that free-text input can be classified into one of just a few categories. Regarless of the phrasing of the user's response, if that response is within a threshold distance of one of the valid values, it is classified as that response. Outside of that threshold, is identified as "other".

Classifying Inputs into Known Categories

In this example, phrases are classified as best falling into the categories of "Rock", "Paper", "Scisssors", "Lizard" or "Spock". All inputs in this example will be classified into one of those 5 groups, even if there are no good matches, or if it could fall into multiple categories.

Clone this wiki locally