
gsoc2024-frame-blending

Overview

This Google Summer of Code (GSoC) 2024 project "Frame Blending by LLMs" is contributed by Zhongheng Cheng with Red Hen Lab.

My personal progress blog can be found here.


Installation

Follow the instructions below to set up the environment.

Modules to load on CWRU HPC:

  • Python/3.11.3
  • PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
  • PyYAML/6.0-GCCcore-12.3.0
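CWRU HPC uses Lmod, so the modules above are typically loaded before anything else; a sketch of the setup commands, assuming the module names have not drifted since this was written:

```shell
# Load the required toolchain on CWRU HPC (Lmod)
module load Python/3.11.3
module load PyTorch/2.1.2-foss-2023a-CUDA-12.1.1
module load PyYAML/6.0-GCCcore-12.3.0

# Verify the interpreter provided by the module
python --version
```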
# Cloning the repository
git clone https://github.com/Zhongheng-Cheng/gsoc2024-frame-blending
cd gsoc2024-frame-blending

# [Optional] Creating virtual environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Set up the Hugging Face API key
touch .env
# Enter your Hugging Face API key in ".env" like this:
# HUGGINGFACE_API_KEY="..."
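The `.env` file holds a simple `KEY="value"` line. How such a file can be read into the environment is sketched below with the standard library only; the project itself may rely on a package such as `python-dotenv` instead, so treat this as illustrative:

```python
import os

def load_env(path=".env"):
    """Read simple KEY="value" lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and comments
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            # Strip optional surrounding quotes from the value
            os.environ[key.strip()] = value.strip().strip('"')
```

After `load_env()`, the key is available as `os.environ["HUGGINGFACE_API_KEY"]`.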

Frame Blender

Introduction

This is a terminal application built for CWRU HPC that generates frame blending examples. Users can enter multiple frames, select hierarchically related frames, and generate frame blending results with options including zero/one/few-shot prompting, chain-of-thought (CoT), and rhetorical devices.

Prerequisites

  1. Basic installation mentioned in Installation.

  2. Set up the FrameNet dataset.

    • Copy the FrameNet dataset folder frame/ to the work directory.
    • Create JSON-format FrameNet dataset folder frame_json/ using FrameNet XML Parser.
    python framenet_xml_parser.py
  3. Request an interactive job on the GPU node of CWRU HPC.

Usage

When you initialize your temp workspace for the first time, or when you encounter any error related to the temp workspace, run this command.

This script creates a workspace in /scratch/users/<caseID>. Note that this directory is not permanent; store any files you want to keep in your gallina home.

./reset_workspace.sh --user=<caseID>

Use this script to request a CPU/GPU node.

Once the workspace is set up, this is the only script you need to start the working environment, without going through the setup again.

./request_node.sh --user=<caseID> --node=cpu

Use this command to start Frame Blender.

On a GPU node, omit the --debug flag so that the LLM actually generates responses.

./frame_blender --encoding=ascii --debug=True   # (on a CPU node)
./frame_blender --encoding=ascii                # (on a GPU node)

Use the Esc key to exit Frame Blender.

Use the exit command to leave the CPU/GPU node:

exit

Use this script to sync the data/ folder to zxc808's gallina home:

./update_gallina_home.sh --user=<caseID>

Key Bindings

When in Frame Blender interface:

  • Esc: Quit
  • Tab: Move to next window
  • +/-: Add/Remove input window
  • In Settings window:
    • Arrow keys: Switch between settings and change their values
  • In Input window:
    • Type characters in an Input window; matching search results appear automatically in the Hierarchy window when available
    • Enter: Enter Hierarchy window
    • Backspace: Cancel confirmed frame
  • In Hierarchy window:
    • \: Quit Hierarchy window
    • Arrow keys: Switch different frames/frame relations
    • Enter: Confirm word
  • When needed frames are all confirmed:
    • /: Start generating result
  • In Result Window / Evaluation Window:
    • \: Quit Result/Evaluation window
    • Tab: Switch to Evaluation window
    • Arrow keys/Text input: Enter evaluation
    • Enter: Submit evaluation form (stored to /data/evaluation.json)
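Submitted evaluation forms are stored as JSON. Below is a hypothetical sketch of how such a record might be appended to the file; the field names and exact schema are assumptions, not the project's documented format:

```python
import json
import os

def append_evaluation(record, path="data/evaluation.json"):
    """Append one evaluation record to a JSON list on disk."""
    records = []
    if os.path.exists(path):
        with open(path) as f:
            records = json.load(f)
    records.append(record)
    # Create the data/ directory on first use
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w") as f:
        json.dump(records, f, indent=2)

# Illustrative call with assumed field names:
# append_evaluation({"frames": ["Motion", "Commerce_buy"], "rating": 4})
```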

Demonstration Video

Please check out this link for a demonstration video:

https://zhongheng-cheng.github.io/2024/08/16/Week-12.html#guidance-to-run-frame-blender-on-cwru-hpc

Frame Hierarchy Analyzer

Introduction

This section of the project focuses on analyzing linguistic frame hierarchies. It involves constructing a tree-like structure to represent frame relations, searching within this structure, and performing other relevant analyses.

Usage

from frame_hierarchy_analyzer import analyze_hierarchy, save_hierarchy_to_file

# Example of building a hierarchy
frames = ['Event', 'Action', ...]
frame_relation = 'Inheritance'
reverse_order = False # False: In direction of "Is Inherited by"; True: In direction of "Inherits from"
root = analyze_hierarchy(frames, frame_relation, reverse_order) # Returns the root node of the tree hierarchy

# Finding a specific frame node
node = root.find('Event')

# Print the visualized hierarchy of any node with its subnodes
print(node)

# Counts the total number of nodes in the subtree including this node
total_number = node.count()

# Get the list of immediate child nodes of this node
children = node.children()

# Saving the hierarchy to a file
save_hierarchy_to_file(root, 'output_hierarchy.txt')
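The node API used above (`find`, `count`, `children`, and a printable hierarchy) can be implemented as a simple recursive tree; a minimal sketch, not the project's actual node class:

```python
class FrameNode:
    """A node in a frame-relation tree (illustrative sketch)."""

    def __init__(self, name):
        self.name = name
        self._children = []

    def add_child(self, node):
        self._children.append(node)

    def children(self):
        """Immediate child nodes of this node."""
        return list(self._children)

    def find(self, name):
        """Depth-first search for a node by frame name."""
        if self.name == name:
            return self
        for child in self._children:
            found = child.find(name)
            if found:
                return found
        return None

    def count(self):
        """Total number of nodes in this subtree, including this node."""
        return 1 + sum(child.count() for child in self._children)

    def __str__(self, depth=0):
        # Indent each level to visualize the hierarchy
        lines = ["  " * depth + self.name]
        for child in self._children:
            lines.append(child.__str__(depth + 1))
        return "\n".join(lines)
```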

FrameNet XML Parser

Introduction

This code transforms the original FrameNet data from XML to JSON format, leaving out information unimportant for frame analysis, such as the frame ID and creation date. It was mainly developed by Rohan; minor modifications were made to accommodate the JSON-format FrameNet data input for the Frame Hierarchy Analyzer.

Usage

from framenet_xml_parser import parse

# Example of parsing a directory of .xml files
xml_folder_path = "frame"
json_folder_path = "frame_json"
parse(xml_folder_path, json_folder_path)
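The conversion itself needs nothing beyond the standard library. Below is a rough sketch of the idea for a single file; the element and attribute names are simplified assumptions, not the full FrameNet schema handled by the real parser:

```python
import json
import xml.etree.ElementTree as ET

def xml_frame_to_json(xml_path, json_path):
    """Convert one FrameNet-style frame XML file to JSON,
    dropping bookkeeping attributes such as ID and creation date."""
    root = ET.parse(xml_path).getroot()
    # Keep only attributes relevant to frame analysis
    frame = {k: v for k, v in root.attrib.items()
             if k not in ("ID", "cDate")}
    # Collect frame element (FE) names from child tags
    frame["elements"] = [child.attrib.get("name")
                         for child in root
                         if child.tag.endswith("FE")]
    with open(json_path, "w") as f:
        json.dump(frame, f, indent=2)
```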

RAG for Llama2 (Huggingface)

Introduction

This code uses Llama2-7b-chat through the Hugging Face API and achieves Retrieval-Augmented Generation (RAG) by leveraging LlamaIndex. Specifically, a JSON parser reads all the JSON-format FrameNet frame data and creates a query engine with a vector store index for querying.

When you call get_query_engine(), the index created from the data files is saved to ./query_engine.index/ and automatically loaded the next time you get the query engine. To avoid saving index data locally, pass save_index=False to get_query_engine().

Usage

from rag import get_query_engine, generate_response

prompt = "..."
query_engine = get_query_engine()
response = generate_response(query_engine, prompt)
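The index caching described above (save on first build, load on later calls) follows a common cache-on-first-use pattern. A generic sketch of that pattern, where `build_index` is a hypothetical stand-in for the expensive LlamaIndex construction step, not the project's actual code:

```python
import json
import os

def get_or_build_index(build_index,
                       cache_path="./query_engine.index/index.json",
                       save_index=True):
    """Load a previously saved index if present; otherwise build it,
    optionally persisting the result for the next call."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    index = build_index()          # the expensive step
    if save_index:
        os.makedirs(os.path.dirname(cache_path), exist_ok=True)
        with open(cache_path, "w") as f:
            json.dump(index, f)
    return index
```

With `save_index=True`, the second call loads from disk instead of rebuilding, which is the behavior described for get_query_engine().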

Llama2 (Meta)

The steps below follow Meta's guide, 5 Steps to Getting Started with Llama 2.

  1. Create a virtual environment
python -m venv venv
source venv/bin/activate   # enter the virtual environment
  2. Install dependencies
pip install -r requirements.txt
  3. Download the model

Request download access to Llama 2 here

git clone https://github.com/facebookresearch/llama
cd llama
./download.sh # requires the pre-signed URL from Meta License
  4. Convert the model weights to run with Hugging Face
# in the llama/ directory

# create a link to the tokenizer
ln -h ./tokenizer.model ./llama-2-7b-chat/tokenizer.model

# convert to hugging face format
TRANSFORM=`python -c "import transformers;print('/'.join(transformers.__file__.split('/')[:-1])+'/models/llama/convert_llama_weights_to_hf.py')"`
pip install protobuf && python $TRANSFORM --input_dir ./llama-2-7b-chat --model_size 7B --output_dir ./llama-2-7b-chat-hf
  5. Write Python scripts and run the model
python main.py
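The backtick-quoted `TRANSFORM=` line in step 4 locates the conversion script inside the installed `transformers` package by deriving the package directory from its `__file__` attribute. The same derivation is demonstrated below with the standard-library `json` module, since `transformers` may not be installed:

```python
import json

# Derive an installed package's directory from its __file__,
# the same trick the TRANSFORM one-liner applies to `transformers`
package_dir = "/".join(json.__file__.split("/")[:-1])

# The real one-liner then appends the conversion script's relative path
script_path = package_dir + "/models/llama/convert_llama_weights_to_hf.py"
print(script_path)
```

The script only actually exists under `transformers`; the path construction here just shows the mechanism.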

Future Work

Enhance generation performance so that the LLM really "blends" the frames rather than merely "mixes" or "uses" them.

  • Improve the model. Try newer and more powerful models (e.g. Llama 3) and larger parameter counts (e.g. Llama2-13b, Llama2-70b).
  • Training. Using the Frame Blender workflow and the evaluation data it may collect in the future, feed generation results along with their evaluations back into the model to train the LLM to construct better frame blending examples.
