DOC: Update title caps validation script to step through directories #55685

Closed
wants to merge 1 commit into from
56 changes: 47 additions & 9 deletions scripts/validate_rst_title_capitalization.py
@@ -14,6 +14,7 @@
 from __future__ import annotations

 import argparse
+import os
 import re
 import sys
 from typing import TYPE_CHECKING
@@ -31,6 +32,7 @@
     "Excel",
     "JSON",
     "HTML",
+    "XML",
     "SAS",
     "SQL",
     "BigQuery",
@@ -159,6 +161,10 @@
     "Liveserve",
     "I",
     "VSCode",
+    "RangeIndex",
+    "SparseArray",
+    "SparseDtype",
+    "HTTP",
 }

 CAP_EXCEPTIONS_DICT = {word.lower(): word for word in CAPITALIZATION_EXCEPTIONS}
@@ -244,14 +250,40 @@ def find_titles(rst_file: str) -> Iterable[tuple[str, int]]:
             previous_line = line_no_last_elem


+def _collect_errors(filename: str) -> int:
+    """
+    Helper method to collect the errors per file
+
+    Parameters
+    ----------
+    filename : str
+        A file to validate, provided from the main method
+
+    Returns
+    -------
+    int
+        Number of incorrect headings found.
+    """
+    errors: int = 0
+    for title, line_number in find_titles(filename):
+        if title != correct_title_capitalization(title):
+            print(
+                f"""{filename}:{line_number}:{err_msg} "{title}" to "{
+                    correct_title_capitalization(title)}" """
+            )
+            errors += 1
+    return errors
+
+
 def main(source_paths: list[str]) -> int:
     """
     The main method to print all headings with incorrect capitalization.

     Parameters
     ----------
     source_paths : str
-        List of directories to validate, provided through command line arguments.
+        List of directories or files to validate,
+        provided through command line arguments.

     Returns
     -------
@@ -261,14 +293,20 @@ def main(source_paths: list[str]) -> int:

     number_of_errors: int = 0

-    for filename in source_paths:
Member

Instead of this implementation, why don't you simply implement a function that, given source_paths, yields all the files to check? I think it would be much clearer than what you're doing here. The only change you'd need in this function would be to replace this line with `for filename in walk_paths(source_paths):`.

Also, please use pathlib instead of os.path.

I don't understand why you skip the png files. If this is to validate rst files, shouldn't we ignore everything that is not rst?

Finally, do you know if this script is being called in the CI? If not, why not? Are all titles correct after your changes? If that's the case, can you start running it, so we make sure all titles are correct at all times?

I don't think the CI errors are related to your changes. If you update your local branch and push, the CI may become green.

-        for title, line_number in find_titles(filename):
-            if title != correct_title_capitalization(title):
-                print(
-                    f"""{filename}:{line_number}:{err_msg} "{title}" to "{
-                        correct_title_capitalization(title)}" """
-                )
-                number_of_errors += 1
+    for path in source_paths:
+        # If `path` is a dir, walk it to find the files
+        if os.path.isdir(path):
+            dirs = os.walk(path)
+            for dir in dirs:
+                files = dir[2]
+                for filename in files:
+                    if not filename.endswith(".png"):
+                        number_of_errors += _collect_errors(
+                            os.path.join(dir[0], filename)
+                        )
+        else:
+            # Otherwise `path` is a single file; validate it directly
+            number_of_errors += _collect_errors(path)

     return number_of_errors
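
The generator the reviewer asks for could look roughly like the sketch below. It is only an illustration, not code from this PR or from pandas: the name `walk_paths` comes from the review comment, while the signature, the sorted traversal, and the choice to pass explicitly named files through unfiltered are assumptions.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterable, Iterator


def walk_paths(source_paths: Iterable[str]) -> Iterator[str]:
    """
    Yield the files to validate (hypothetical helper, not part of the PR).

    Directories are searched recursively and only ``.rst`` files are kept.
    Paths that are not directories are yielded unchanged, on the assumption
    that the caller named them on purpose.
    """
    for source_path in source_paths:
        path = Path(source_path)
        if path.is_dir():
            # Sorting is an assumption made here to keep the output stable.
            for candidate in sorted(path.rglob("*.rst")):
                if candidate.is_file():
                    yield str(candidate)
        else:
            yield str(path)
```

With a helper of this shape, the loop in `main` would reduce to `for filename in walk_paths(source_paths):`, as the review comment proposes, and the `.png` special case would no longer be needed.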
