Commit ccdd2d0

Add hyperlinks and paths validation. (#699)
* Add hyperlinks and paths validation.
* Fix format issue.
* Change runs-on
* Add hyperlinks and paths validation.
* Fix format issue.
* Change runs-on
* Change link head.
* Fix issue.
* Add output.
* Change serch rules.
* Change output and fix error
* For test
* Fix error
* Fix error.
* Fix error.
* test.
* Fix issue and add output
* Fix issue and test
* Add PR's own detection.
* reduce output
* Remove debug code.
* test
* test.
* Compatible with the origin of PR.
* Ignore links that require verification by a real person. Restore test files.
* Change the judgment method.
* Add need ignore link.
* Change runs-on.
* Redefine output.

Signed-off-by: ZePan110 <[email protected]>
1 parent e29865e commit ccdd2d0

5 files changed: +125 -4 lines changed


.github/workflows/pr-dockerfile-path-scan.yaml

Lines changed: 121 additions & 0 deletions
@@ -156,3 +156,124 @@ jobs:
             echo "Please modify the corresponding README in GenAIExamples repo and ask [email protected] for final confirmation."
             exit 1
           fi
+
+  check-the-validity-of-hyperlinks-in-README:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Check the Validity of Hyperlinks
+        # ignore_links=("https://platform.openai.com/docs/api-reference/fine-tuning"
+        #               "https://platform.openai.com/docs/api-reference/"
+        #               "https://openai.com/index/whisper/"
+        #               "https://platform.openai.com/docs/api-reference/chat/create")
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
+          if [ -n "$url_lines" ]; then
+            for url_line in $url_lines; do
+              url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
+              path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              if [[ "https://platform.openai.com/docs/api-reference/fine-tuning" == "$url" || "https://platform.openai.com/docs/api-reference/" == "$url" || "https://openai.com/index/whisper/" == "$url" || "https://platform.openai.com/docs/api-reference/chat/create" == "$url" ]]; then
+                echo "Link "$url" from ${{github.workspace}}/$path need to be verified by a real person."
+              else
+                response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Invalid link from ${{github.workspace}}/$path: $url"
+                    fail="TRUE"
+                  fi
+                fi
+              fi
+            done
+          fi
+
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
+
+  check-the-validity-of-relative-path:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Checking Relative Path Validity
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          repo_name=${{ github.event.pull_request.head.repo.full_name }}
+          if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
+            owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
+            branch="https://github.com/$owner/GenAIComps/tree/${{ github.event.pull_request.head.ref }}"
+          else
+            branch="https://github.com/opea-project/GenAIComps/blob/${{ github.event.pull_request.head.ref }}"
+          fi
+          link_head="https://github.com/opea-project/GenAIComps/blob/main"
+          png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
+          if [ -n "$png_lines" ]; then
+            for png_line in $png_lines; do
+              refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
+              if [[ "${png_path:0:1}" == "/" ]]; then
+                check_path=${{github.workspace}}$png_path
+              elif [[ "${png_path:0:1}" == "#" ]]; then
+                check_path=${{github.workspace}}/$refer_path$png_path
+              else
+                check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
+              fi
+              real_path=$(realpath $check_path)
+              if [ $? -ne 0 ]; then
+                echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
+                fail="TRUE"
+              else
+                url=$link_head$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
+                    url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                    response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
+                    if [ "$response" -ne 200 ]; then
+                      echo "**********Validation failed, try again**********"
+                      response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
+                      if [ "$response_retry" -eq 200 ]; then
+                        echo "*****Retry successfully*****"
+                      else
+                        echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
+                        fail="TRUE"
+                      fi
+                    else
+                      echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
+                    fi
+                  fi
+                fi
+              fi
+            done
+          fi
+
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
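
The hyperlink job in this workflow iterates with `for url_line in $url_lines`, which splits grep's output on whitespace, so a README path or link containing a space would be broken into several items. Its retry also calls curl without `-L`, so a redirecting link cannot pass on the second attempt. The following is a minimal sketch of one possible alternative, assuming GNU grep and curl are available. The function names `parse_link_line`, `list_links`, and `check_url` are hypothetical and not part of this commit, and the ignore list is left out for brevity.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of the commit.

# Split one line of `grep -o` output ("./path/file.md:](https://host/x)")
# into "file<TAB>url", mirroring the cut/sed pipeline used in the workflow.
parse_link_line() {
  local line="$1" file url
  file=${line%%:*}        # text before the first ':' is the referring file
  url=${line#*']('}       # drop everything through the first "]("
  url=${url%')'}          # drop the closing ')'
  url=${url%.git}         # same effect as sed 's/\.git$//'
  printf '%s\t%s\n' "${file#./}" "$url"
}

# List every http(s) Markdown link under a directory (default: current one).
# Reading line by line keeps paths that contain spaces intact.
list_links() {
  grep -rEo --include='*.md' '\]\(https?://[^)]+\)' "${1:-.}" |
    while IFS= read -r line; do parse_link_line "$line"; done
}

# Succeed when the URL answers 200 within two attempts, following redirects
# on both attempts. Needs network access.
check_url() {
  local url="$1" attempt code
  for attempt in 1 2; do
    code=$(curl -L -s -o /dev/null -w '%{http_code}' --max-time 20 "$url")
    [ "$code" = "200" ] && return 0
  done
  return 1
}

# Usage (needs network):
#   list_links . | while IFS=$'\t' read -r file url; do
#     check_url "$url" || echo "Invalid link from $file: $url"
#   done
```

Because the loop body runs in a pipeline subshell, a real job would need to collect failures in a file or use process substitution rather than set a `fail` variable inside the loop.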

comps/dataprep/README.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ apt-get install libreoffice
 
 ## Use LVM (Large Vision Model) for Summarizing Image Data
 
-Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
+Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/llava/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
 
 ```bash
 export SUMMARIZE_IMAGE_VIA_LVM=1
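
The link corrected in this hunk (`../lvms/llava/README.md`) is the kind of target the relative-path job resolves against the referring file's directory. The workflow then confirms the target over HTTP. GNU `realpath`, unless given `-e`, accepts a path whose last component is missing, so the `$?` test alone would not catch a missing file. As a sketch of a local-only variant, assuming the repository is already checked out, the three cases the job distinguishes could be resolved and tested on disk. `resolve_link` and `link_exists` are hypothetical names, not part of this commit.

```bash
#!/usr/bin/env bash
# resolve_link REPO_ROOT REFERRING_FILE TARGET
# Hypothetical helper: prints the filesystem path a Markdown link target
# refers to, following the same three cases as the workflow
# (leading "/", leading "#", anything else).
resolve_link() {
  local root="$1" file="$2" target="$3"
  target="${target%%#*}"                      # ignore any "#fragment" suffix
  case "$target" in
    "") printf '%s\n' "$root/$file" ;;        # pure "#anchor": the file itself
    /*) printf '%s\n' "$root$target" ;;       # path relative to the repo root
    *)  printf '%s\n' "$root/$(dirname "$file")/$target" ;;
  esac
}

# link_exists REPO_ROOT REFERRING_FILE TARGET
# Succeeds when the resolved path exists in the checkout (no network needed).
link_exists() {
  [ -e "$(resolve_link "$@")" ]
}

# Usage:
#   link_exists "$PWD" comps/dataprep/README.md ../lvms/llava/README.md \
#     || echo "broken relative link"
```

This checks only that the file exists on disk; it does not verify that an anchor inside the file is valid.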

comps/finetuning/README.md

Lines changed: 1 addition & 1 deletion
@@ -219,7 +219,7 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ
 
 ### 3.4 Leverage fine-tuned model
 
-After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
+After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/fastrag/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
 
 ## 🚀4. Descriptions for Finetuning parameters

comps/guardrails/llama_guard/langchain/README.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ curl 127.0.0.1:8088/generate \
 
 ### 1.4 Start Guardrails Service
 
-Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](## 1.2 Start TGI Gaudi Service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
+Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](#12-start-tgi-gaudi-service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
 
 ```bash
 export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
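
The hunk above replaces a raw heading used as a link target (`## 1.2 Start TGI Gaudi Service`) with the anchor form `#12-start-tgi-gaudi-service`. The sketch below approximates how such an anchor can be derived from a heading: lowercase the text, drop punctuation, and turn spaces into hyphens. It is an approximation for plain ASCII headings only. GitHub's actual rules (for example the numeric suffixes added to duplicate headings) are not reproduced, and `slugify_heading` is a hypothetical name.

```bash
#!/usr/bin/env bash
# slugify_heading "## 1.2 Start TGI Gaudi Service"
# Hypothetical helper: approximates a Markdown heading anchor.
slugify_heading() {
  printf '%s' "$1" |
    tr '[:upper:]' '[:lower:]' |          # lowercase
    sed -e 's/^#* *//' \
        -e 's/[^a-z0-9 _-]//g' \
        -e 's/ /-/g'                      # strip "#" markers and punctuation, spaces to hyphens
}

# Usage:
#   echo "#$(slugify_heading '## 1.2 Start TGI Gaudi Service')"
```

A checker could compare the fragment of each `#...` link against the slugs of the headings in the target file, which the workflow in this commit does not attempt.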

comps/vectorstores/pathway/README.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 Set the environment variables for Pathway, and the embedding model.
 
 > Note: If you are using `TEI_EMBEDDING_ENDPOINT`, make sure embedding service is already running.
-> See the instructions under [here](../../../retrievers/langchain/pathway/README.md)
+> See the instructions under [here](../../retrievers/pathway/langchain/README.md)
 
 ```bash
 export PATHWAY_HOST=0.0.0.0
