Implement to_lower, to_upper, to_title and reverse for string_type #335

Closed
awvwgk opened this issue Mar 12, 2021 · 8 comments · Fixed by #346
Labels
easy Difficulty level is easy and good for starting into this project topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

Comments

@awvwgk
Member

awvwgk commented Mar 12, 2021

The stdlib_string_type currently implements only the bare minimum of functionality. As a first step towards extending it, the functions to_lower, to_upper, to_title and reverse implemented in stdlib_ascii should be made available for stdlib_string_type as well.

@awvwgk awvwgk added topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... easy Difficulty level is easy and good for starting into this project labels Mar 12, 2021
@aman-godara
Member

aman-godara commented Mar 12, 2021

Hey @awvwgk!
Can I take this issue? I am working on the strings project under GSoC 2021, so handling this will also give me a good knowledge of the codebase.

@awvwgk
Member Author

awvwgk commented Mar 12, 2021

@aman-godara Thanks, feel free to submit a patch for this issue. Let me know if you need any help. I recommend starting by forking the stdlib repo and compiling a local version to familiarize yourself with the general workflow.

The implementation itself boils down to adding interface blocks and using stdlib_ascii in the stdlib_string_type module to add the new functionality. Also have a look at the tests in src/tests to add unit tests for the new functions and the specification in doc/specs to document the new functions.

@aman-godara
Member

aman-godara commented Mar 13, 2021

Hey @awvwgk!
I was going through the implementation of the reverse function in stdlib_ascii. The implementation has O(n) space complexity. Since character variables are mutable in Fortran, can we take advantage of this and implement reverse differently? The only difference would be that the input itself is modified.

[EDIT] Should I add a new function in stdlib_ascii that swaps the characters at two given positions of the input character variable, e.g. swap(character_sequence, i, j)? We could make use of this function in the reverse function of the stdlib_ascii module as well.

@awvwgk
Member Author

awvwgk commented Mar 13, 2021

Maybe as an addition, but not as a substitute for the current reverse function.

interface
    pure function reverse(string) result(reverse_string)
        character(len=*), intent(in) :: string
        character(len=len(string)) :: reverse_string
    end function reverse
end interface

This interface allows usage on character variables, which are in principle mutable but are intent(in) in this context and therefore immutable, as well as on character literals and parameters, which are always immutable. An in-place reverse with an interface like the one below wouldn't work for the latter kind of character values and is therefore more limited in functionality.

interface
    pure subroutine reverse_inplace(string)
        character(len=*), intent(inout) :: string
    end subroutine reverse_inplace
end interface

You can of course propose an additional in-place reverse subroutine for the stdlib.
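
As an illustration of the in-place variant discussed above, here is a minimal sketch that swaps characters from both ends towards the middle. The module name demo_reverse is hypothetical and the body is not taken from stdlib; only the reverse_inplace interface comes from this thread.

```fortran
! Hypothetical demo module, not part of stdlib.
module demo_reverse
    implicit none
    private
    public :: reverse_inplace

contains

    !> Reverse a character variable in place with O(1) extra storage.
    pure subroutine reverse_inplace(string)
        character(len=*), intent(inout) :: string
        character(len=1) :: tmp
        integer :: i, j

        ! Walk inwards from both ends, swapping one pair per iteration.
        i = 1
        j = len(string)
        do while (i < j)
            tmp = string(i:i)
            string(i:i) = string(j:j)
            string(j:j) = tmp
            i = i + 1
            j = j - 1
        end do
    end subroutine reverse_inplace

end module demo_reverse
```

Calling reverse_inplace(word) on a character variable word modifies it directly. Passing a literal such as "hello" or a named constant is not possible, because the intent(inout) dummy argument requires a definable actual argument, which is the limitation described above.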

@wclodius2
Contributor

Implementing to_title will require more than ASCII. Allowing more than just ASCII will require access to the Unicode character database, https://unicode.org/ucd/. This database will also be required for to_upper, to_lower, and reverse if more than ASCII is involved. This database consists of several tens of megabytes of files, http://www.unicode.org/Public/UCD/latest/, and including it in the Standard Library will be controversial, but requiring users to download and install it on their own will also be controversial. FWIW I have a couple of modules to process the more important files in the database.

@aman-godara
Member

What naming convention should I follow for the interfaces in the stdlib_string_type module? I cannot use to_lower, to_upper, to_title or reverse as the names of the interfaces that I am creating, as these correspond to the names of the functions imported from the stdlib_ascii module.

module stdlib_string_type
    use stdlib_ascii, only: to_lower, to_upper, to_title, reverse

    interface to_lower1
        module procedure :: to_lower_string
    end interface to_lower1

This is how the interface looks at this moment.

@awvwgk
Member Author

awvwgk commented Mar 14, 2021

You can add an interface block to stdlib_ascii as well; this allows you to use the same name in both modules:

interface to_lower
    module procedure :: to_lower
end interface to_lower
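
To make the suggestion concrete, here is a minimal two-module sketch of extending a use-associated generic name. The modules demo_ascii and demo_string_type and the component name raw are hypothetical stand-ins, not the actual stdlib sources, and the sketch assumes the string's raw component is allocated before to_lower is called.

```fortran
! Hypothetical stand-in for stdlib_ascii.
module demo_ascii
    implicit none
    private
    public :: to_lower

    ! The generic name may equal the name of one of its specific procedures.
    interface to_lower
        module procedure :: to_lower
    end interface to_lower

contains

    !> Convert ASCII upper-case letters to lower case.
    pure function to_lower(string) result(lower_string)
        character(len=*), intent(in) :: string
        character(len=len(string)) :: lower_string
        integer :: i, code

        do i = 1, len(string)
            code = iachar(string(i:i))
            if (code >= iachar('A') .and. code <= iachar('Z')) then
                lower_string(i:i) = achar(code + 32)
            else
                lower_string(i:i) = string(i:i)
            end if
        end do
    end function to_lower

end module demo_ascii

! Hypothetical stand-in for stdlib_string_type.
module demo_string_type
    use demo_ascii, only: to_lower
    implicit none
    private
    public :: string_type, to_lower

    type :: string_type
        character(len=:), allocatable :: raw
    end type string_type

    ! Extend the imported generic with a specific for string_type.
    interface to_lower
        module procedure :: to_lower_string
    end interface to_lower

contains

    !> Assumes string%raw is allocated.
    pure function to_lower_string(string) result(lower_string)
        type(string_type), intent(in) :: string
        type(string_type) :: lower_string

        ! Resolves to the character specific from demo_ascii.
        lower_string%raw = to_lower(string%raw)
    end function to_lower_string

end module demo_string_type
```

A program that uses demo_string_type can then call to_lower with either a character value or a string_type value, and the compiler selects the matching specific procedure from the argument type.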
