Add hyperlinks and paths validation. (#699)

ZePan110 · web-flow · commit ccdd2d0d816b · 2024-09-19T15:58:37.000+08:00
* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix format issue.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change runs-on

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Add hyperlinks and paths validation.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix format issue.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change runs-on

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change link head.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix issue.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Add output.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change serch rules.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change output and fix error

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* For test

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix error

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix error.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix error.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* test.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix issue and add output

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Fix issue and test

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Add PR's own detection.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* reduce output

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Remove debug code.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* test

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* test.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Compatible with the origin of PR.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Ignore links that require verification by a real person.
Restore test files.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change the judgment method.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Add need ignore link.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Change runs-on.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

* Redefine output.

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;

---------

Signed-off-by: ZePan110 &lt;ze.pan@intel.com&gt;
diff --git a/.github/workflows/pr-dockerfile-path-scan.yaml b/.github/workflows/pr-dockerfile-path-scan.yaml
@@ -156,3 +156,124 @@ jobs:
             echo "Please modify the corresponding README in GenAIExamples repo and ask suyue.chen@intel.com for final confirmation."
             exit 1
           fi
+
+  check-the-validity-of-hyperlinks-in-README:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean Up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Check the Validity of Hyperlinks
+        # ignore_links=("https://platform.openai.com/docs/api-reference/fine-tuning"
+          #               "https://platform.openai.com/docs/api-reference/"
+          #               "https://openai.com/index/whisper/"
+          #               "https://platform.openai.com/docs/api-reference/chat/create")
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          url_lines=$(grep -Eo '\]\(http[s]?://[^)]+\)' --include='*.md' -r .)
+          if [ -n "$url_lines" ]; then
+            for url_line in $url_lines; do
+              url=$(echo "$url_line"|cut -d '(' -f2 | cut -d ')' -f1|sed 's/\.git$//')
+              path=$(echo "$url_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              if [[ "https://platform.openai.com/docs/api-reference/fine-tuning" == "$url" || "https://platform.openai.com/docs/api-reference/" == "$url" || "https://openai.com/index/whisper/" == "$url" || "https://platform.openai.com/docs/api-reference/chat/create" == "$url" ]]; then
+                echo "Link "$url" from ${{github.workspace}}/$path need to be verified by a real person."
+              else
+                response=$(curl -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Invalid link from ${{github.workspace}}/$path: $url"
+                    fail="TRUE"
+                  fi
+                fi
+              fi
+            done
+          fi
+
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
+
+  check-the-validity-of-relative-path:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Clean up Working Directory
+        run: sudo rm -rf ${{github.workspace}}/*
+
+      - name: Checkout Repo GenAIComps
+        uses: actions/checkout@v4
+
+      - name: Checking Relative Path Validity
+        run: |
+          cd ${{github.workspace}}
+          fail="FALSE"
+          repo_name=${{ github.event.pull_request.head.repo.full_name }}
+          if [ "$(echo "$repo_name"|cut -d'/' -f1)" != "opea-project" ]; then
+            owner=$(echo "${{ github.event.pull_request.head.repo.full_name }}" |cut -d'/' -f1)
+            branch="https://github.com/$owner/GenAIComps/tree/${{ github.event.pull_request.head.ref }}"
+          else
+            branch="https://github.com/opea-project/GenAIComps/blob/${{ github.event.pull_request.head.ref }}"
+          fi
+          link_head="https://github.com/opea-project/GenAIComps/blob/main"
+          png_lines=$(grep -Eo '\]\([^)]+\)' --include='*.md' -r .|grep -Ev 'http')
+          if [ -n "$png_lines" ]; then
+            for png_line in $png_lines; do
+              refer_path=$(echo "$png_line"|cut -d':' -f1 | cut -d'/' -f2-)
+              png_path=$(echo "$png_line"|cut -d '(' -f2 | cut -d ')' -f1)
+              if [[ "${png_path:0:1}" == "/" ]]; then
+                check_path=${{github.workspace}}$png_path
+              elif [[ "${png_path:0:1}" == "#" ]]; then
+                check_path=${{github.workspace}}/$refer_path$png_path
+              else
+                check_path=${{github.workspace}}/$(dirname "$refer_path")/$png_path
+              fi
+              real_path=$(realpath $check_path)
+              if [ $? -ne 0 ]; then
+                echo "Path $png_path in file ${{github.workspace}}/$refer_path does not exist"
+                fail="TRUE"
+              else
+                url=$link_head$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url")
+                if [ "$response" -ne 200 ]; then
+                  echo "**********Validation failed, try again**********"
+                  response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url")
+                  if [ "$response_retry" -eq 200 ]; then
+                    echo "*****Retry successfully*****"
+                  else
+                    echo "Retry failed. Check branch ${{ github.event.pull_request.head.ref }}"
+                    url_dev=$branch$(echo "$real_path" | sed 's|.*/GenAIComps||')
+                    response=$(curl -I -L -s -o /dev/null -w "%{http_code}" "$url_dev")
+                    if [ "$response" -ne 200 ]; then
+                      echo "**********Validation failed, try again**********"
+                      response_retry=$(curl -s -o /dev/null -w "%{http_code}" "$url_dev")
+                      if [ "$response_retry" -eq 200 ]; then
+                        echo "*****Retry successfully*****"
+                      else
+                        echo "Invalid path from ${{github.workspace}}/$refer_path: $png_path"
+                        fail="TRUE"
+                      fi
+                    else
+                      echo "Check branch ${{ github.event.pull_request.head.ref }} successfully."
+                    fi
+                  fi
+                fi
+              fi
+            done
+          fi
+
+          if [[ "$fail" == "TRUE" ]]; then
+            exit 1
+          else
+            echo "All hyperlinks are valid."
+          fi
+        shell: bash
diff --git a/comps/dataprep/README.md b/comps/dataprep/README.md
@@ -11,7 +11,7 @@ apt-get install libreoffice
 
 ## Use LVM (Large Vision Model) for Summarizing Image Data
 
-Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
+Occasionally unstructured data will contain image data, to convert the image data to the text data, LVM can be used to summarize the image. To leverage LVM, please refer to this [readme](../lvms/llava/README.md) to start the LVM microservice first and then set the below environment variable, before starting any dataprep microservice.
 
 ```bash
 export SUMMARIZE_IMAGE_VIA_LVM=1
diff --git a/comps/finetuning/README.md b/comps/finetuning/README.md
@@ -219,7 +219,7 @@ curl http://${your_ip}:8015/v1/finetune/list_checkpoints -X POST -H "Content-Typ
 
 ### 3.4 Leverage fine-tuned model
 
-After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
+After fine-tuning job is done, fine-tuned model can be chosen from listed checkpoints, then the fine-tuned model can be used in other microservices. For example, fine-tuned reranking model can be used in [reranks](../reranks/fastrag/README.md) microservice by assign its path to the environment variable `RERANK_MODEL_ID`, fine-tuned embedding model can be used in [embeddings](../embeddings/README.md) microservice by assign its path to the environment variable `model`, LLMs after instruction tuning can be used in [llms](../llms/text-generation/README.md) microservice by assign its path to the environment variable `your_hf_llm_model`.
 
 ## 🚀4. Descriptions for Finetuning parameters
 
diff --git a/comps/guardrails/llama_guard/langchain/README.md b/comps/guardrails/llama_guard/langchain/README.md
@@ -51,7 +51,7 @@ curl 127.0.0.1:8088/generate \
 
 ### 1.4 Start Guardrails Service
 
-Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](## 1.2 Start TGI Gaudi Service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
+Optional: If you have deployed a Guardrails model with TGI Gaudi Service other than default model (i.e., `meta-llama/Meta-Llama-Guard-2-8B`) [from section 1.2](#12-start-tgi-gaudi-service), you will need to add the eviornment variable `SAFETY_GUARD_MODEL_ID` containing the model id. For example, the following informs the Guardrails Service the deployed model used LlamaGuard2:
 
 ```bash
 export SAFETY_GUARD_MODEL_ID="meta-llama/Meta-Llama-Guard-2-8B"
diff --git a/comps/vectorstores/pathway/README.md b/comps/vectorstores/pathway/README.md
@@ -3,7 +3,7 @@
 Set the environment variables for Pathway, and the embedding model.
 
 > Note: If you are using `TEI_EMBEDDING_ENDPOINT`, make sure embedding service is already running.
-> See the instructions under [here](../../../retrievers/langchain/pathway/README.md)
+> See the instructions under [here](../../retrievers/pathway/langchain/README.md)
 
 ```bash
 export PATHWAY_HOST=0.0.0.0