-
Notifications
You must be signed in to change notification settings - Fork 185
Unique values in a 1-D array and their locations #670
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I wonder why you say that this code is inefficient. I tried to think of
different implementations, but the code as written seems to be O(N**2),
with sorted data you might get down to O(N*ln(N)), but that would presume
the data can be sorted (in that case, you would need more functionality
than just an equality operation). Just wondering if you have a particular
algorithm in mind.
Op za 30 jul. 2022 om 16:57 schreef Beliavsky ***@***.***>:
… Given a 1-D array of character variables, integers, or a user-defined type
such as dates, if there are repeated values one may want to create factors
<https://www.stat.berkeley.edu/~spector/s133/factors.html> (the R term)
that are integer variables corresponding to the unique values. Below is a
small code that does this inefficiently for an array of characters. A
library subroutine that works for any data type for which an equality
operator is defined would be useful.
module factors_mod
implicit none
private
public :: compress
contains
pure subroutine compress(words,factors,values)
character (len=*) , intent(in) :: words(:) ! (n)
integer , intent(out) :: factors(:) ! (n) integer values corresponding to words(:)
character (len=len(words)), allocatable, intent(out) :: values(:) ! unique values of words(:)
integer :: i,n,nfac,imatch
n = size(words)
if (size(factors) /= n) error stop "in compress, need size(words) == size(factors)"
if (n < 1) then
allocate (values(0))
return
end if
allocate (values(n))
values(1) = words(1)
factors(1) = 1
nfac = 1
do i=2,n
imatch = findloc(values(:nfac),words(i),dim=1)
if (imatch == 0) then
nfac = nfac + 1
factors(i) = nfac
values(nfac) = words(i)
else
factors(i) = imatch
end if
end do
values = values(:nfac)
end subroutine compress
end module factors_mod
program xfactors
use factors_mod, only: compress
implicit none
integer, parameter :: n = 5, nlen = 1
character (len=nlen) :: words(n)
character (len=nlen), allocatable :: values(:)
integer :: factors(n)
words = ["a","c","a","b","c"]
call compress(words,factors,values)
print "(a,*(1x,a))", "data:",words
print "(a,*(1x,i0))", "factors:",factors
print "(a,*(1x,a))", "values:",values
end program xfactors
! output:
! data: a c a b c
! factors: 1 2 1 3 2
! values: a c b
—
Reply to this email directly, view it on GitHub
<#670>, or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAN6YRY7BRKCUS5X7S4HYZLVWU7GTANCNFSM55DLAWRQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
Thank you @Beliavsky for this proposition.
This can be easiliy done with My current main questions concern the module (name) in which such a procedure should go, and the API of this procedure |
If stdlib gets a function to return the unique values in an array, it would be nice to have some set operations. For arrays |
Given a 1-D array of character variables, integers, or a user-defined type such as dates, if there are repeated values one may want to create factors (the R term) that are integer variables corresponding to the unique values. Below is a small code that does this inefficiently for an array of character variables. A library subroutine that works for any data type for which an equality operator is defined would be useful.
The text was updated successfully, but these errors were encountered: