allow_duplicate_genes not working #39

Closed
JanKulbinski opened this issue Apr 12, 2021 · 7 comments
Labels
bug Something isn't working

Comments

@JanKulbinski

JanKulbinski commented Apr 12, 2021

Hi!

I am trying to solve TSP with GA and it seems like allow_duplicate_genes is not working.

Reproduction:
TSP with 32 cities; each city is represented by a number in [0, ..., 31]

ga_instance = pygad.GA(num_generations=5,
                       num_parents_mating=2,
                       fitness_func=fitness,
                       init_range_low=0,
                       init_range_high=32,
                       num_genes=32,
                       gene_space=a = np.arange(0,32,1),
                       gene_type=int,
                       allow_duplicate_genes=False,
                       )

a = ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
print(f'{solution}')
solution.sort(axis=0)
print(solution)

It gives:
[25 15 20 1 30 1 19 13 29 10 28 3 24 12 12 5 0 26 26 6 7 2 23 16 20 18 8 11 18 3 17 26]
[ 0 1 1 2 3 3 5 6 7 8 10 11 12 12 13 15 16 17 18 18 19 20 20 23 24 25 26 26 26 28 29 30]

As you can see, the numbers 1, 3, 12, 18, 20, and 26 are duplicated.
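
The duplicates in the printed solution can be confirmed without PyGAD. The sketch below uses only the standard library and the 32 values from the output above; the variable names are illustrative.

```python
from collections import Counter

# Best solution as printed above (32 genes, intended to be a permutation of 0..31).
solution = [25, 15, 20, 1, 30, 1, 19, 13, 29, 10, 28, 3, 24, 12, 12, 5,
            0, 26, 26, 6, 7, 2, 23, 16, 20, 18, 8, 11, 18, 3, 17, 26]

counts = Counter(solution)

# Genes that occur more than once, and cities that never appear.
duplicated = sorted(gene for gene, n in counts.items() if n > 1)
missing = sorted(set(range(32)) - set(solution))

print("duplicated:", duplicated)  # duplicated: [1, 3, 12, 18, 20, 26]
print("missing:", missing)        # missing: [4, 9, 14, 21, 22, 27, 31]
```

Every duplicated value pushes another city out of the tour, which is why the result is not a valid TSP route.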

@ahmedfgad
Owner

Hi,

Thanks for using PyGAD!

I have some comments on your code:
  1. You set gene_space=a = np.arange(0,32,1), which is not valid Python. Where is the variable a defined? I wonder whether that code runs at all.
  2. The parameter sol_per_pop is missing. Both it and num_genes must be set whenever the initial_population parameter is not used.

I am using the latest version of PyGAD and I did not see any duplicates while allow_duplicate_genes=False. Note that I built a fitness function that returns random fitness values.

The code I tested computes the difference between the following 2 sets:

  1. The set of unique values in the solution.
  2. The set of unique gene values (i.e. np.arange(0,32,1))

Because there are 32 genes and the gene space has exactly 32 values, a solution without duplicates must use every value, so the difference between those 2 sets must be empty. This is what happens in my code. So, I think there is no issue with the allow_duplicate_genes parameter.

If my code does not reflect yours, please let me know.

import pygad
import numpy as np

def fitness(sol, idx):
    ss = set(np.unique(sol))
    
    r = set(np.arange(0,32,1)) - ss
    print(r)
    
    if len(r) > 0 :
        print("\n\nSomething is WRONG\n\n")
    
    return np.random.rand()

ga_instance = pygad.GA(num_generations=50,
                       num_parents_mating=2,
                       fitness_func=fitness,
                       init_range_low=0,
                       init_range_high=32,
                       sol_per_pop = 10,
                       num_genes=32,
                       gene_space=np.arange(0,32,1),
                       gene_type=int,
                       allow_duplicate_genes=False)

ga_instance.run()
solution, solution_fitness, solution_idx = ga_instance.best_solution()
# print(f'{solution}')
solution.sort(axis=0)
# print(solution)

ss = set(np.unique(solution))

r = set(np.arange(0,32,1)) - ss
print(r)

if len(r) > 0 :
    print("\n\nSomething is WRONG\n\n")

@JanKulbinski
Author

JanKulbinski commented Apr 12, 2021

Yes, it is working. The cause was the lack of a gene_space parameter. Thank you for the response and this amazing library.

@KevinGalassi

ga_instance = pygad.GA(num_generations = num_generations,
                       num_parents_mating = num_parents_mating,
                       sol_per_pop  = population_size,
                       fitness_func = fitness_function,  
                       num_genes = list_size,
                       gene_type = int,
                       gene_space = np.arange(0,list_size,1),
                       allow_duplicate_genes = False,
                       mutation_type = None,
                       on_start=on_start,
                       on_fitness=on_fitness,
                       on_parents=on_parents,
                       on_crossover=on_crossover,
                       on_mutation=on_mutation,
                       on_generation=on_generation,
                       on_stop=on_stop,
                       save_solutions = True)

ga_instance.run()

print('From this')
print(ga_instance.initial_population)

print('To this...')
print(ga_instance.population)

And I am getting solutions with duplicated genes like:

[10 3 14 4 17 10 5 0 7 6 11 8 15 16 13 1 14 4 2 9]]

Mutation is not enabled, but I guess there is something I'm missing... Should allow_duplicate_genes also block duplicates after the mating?

Thanks

@ahmedfgad
Owner

@KevinGalassi, allow_duplicate_genes is enforced only after mutation is applied. The reason is that a duplicate can be resolved by mutation, since mutation can generate a new value for the affected gene.

Crossover, on the other hand, only combines the genes from 2 solutions. It is not meant to introduce new gene values on its own.

But I think it would be a good feature to support. A warning may be raised if mutation is disabled while allow_duplicate_genes=False.
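
To illustrate why crossover alone can break uniqueness, here is a minimal sketch in plain Python. It is not PyGAD's internal code: `single_point_crossover` and `repair_duplicates` are hypothetical helpers, and the repair strategy (swap each repeated gene for an unused value from the gene space) is only one possible approach, assumed here for illustration.

```python
import random

def single_point_crossover(parent_a, parent_b, point):
    """Take the head of parent_a and the tail of parent_b."""
    return parent_a[:point] + parent_b[point:]

def repair_duplicates(child, gene_space, rng):
    """Replace every repeated gene with a value from gene_space not yet used.

    Assumes len(child) <= len(gene_space) and that all genes come from gene_space.
    """
    used = set(child)
    unused = [g for g in gene_space if g not in used]
    rng.shuffle(unused)
    seen = set()
    repaired = []
    for gene in child:
        if gene in seen:          # second (or later) occurrence: swap it out
            gene = unused.pop()
        seen.add(gene)
        repaired.append(gene)
    return repaired

gene_space = list(range(8))
parent_a = [0, 1, 2, 3, 4, 5, 6, 7]   # both parents are valid permutations
parent_b = [7, 6, 5, 4, 3, 2, 1, 0]

child = single_point_crossover(parent_a, parent_b, point=3)
print(child)  # [0, 1, 2, 4, 3, 2, 1, 0] -> 0, 1 and 2 appear twice

repaired = repair_duplicates(child, gene_space, random.Random(0))
print(sorted(repaired) == gene_space)  # True
```

Both parents are duplicate-free, yet the child is not, because the head and the tail were cut from different orderings of the same values.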

@ahmedfgad ahmedfgad reopened this Apr 8, 2022
@KevinGalassi

KevinGalassi commented Apr 9, 2022

My bad, when I looked at the wiki I didn't find this stated explicitly. I avoided mutation because of the possibility of multiple genes with the same value, but the same problem may arise with crossover too.

BTW I'm trying to solve a kind of 'Travelling Salesman Problem'; I guess I'll look online.

Thanks
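
For permutation problems such as TSP, one commonly used alternative is a crossover that never produces duplicates in the first place. The sketch below is a simplified order-based crossover in plain Python (it fills the open slots left to right, rather than starting after the second cut point as the classic OX operator does). The function name and cut points are illustrative assumptions; how to plug such a function into PyGAD (for example through `crossover_type`) is not shown here and should be checked against the documentation.

```python
def order_crossover(parent_a, parent_b, start, end):
    """Keep parent_a[start:end] in place and fill the remaining slots
    with the other genes in the order they appear in parent_b."""
    size = len(parent_a)
    child = [None] * size
    child[start:end] = parent_a[start:end]

    kept = set(parent_a[start:end])
    # Genes still needed, in parent_b's order.
    fill = iter(g for g in parent_b if g not in kept)

    for i in range(size):
        if child[i] is None:
            child[i] = next(fill)
    return child

parent_a = [0, 1, 2, 3, 4, 5, 6, 7]
parent_b = [7, 6, 5, 4, 3, 2, 1, 0]

child = order_crossover(parent_a, parent_b, start=2, end=5)
print(child)  # [7, 6, 2, 3, 4, 5, 1, 0]
```

As long as both parents are permutations of the same values, the child is one too, so no repair step is needed afterwards.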

ahmedfgad added a commit that referenced this issue Jul 8, 2022
## PyGAD 2.17.0

Release Date: 8 July 2022

1. An issue is solved when the `gene_space` parameter is given a fixed value. e.g. gene_space=[range(5), 4]. The second gene's value is static (4) which causes an exception.
2. Fixed the issue where the `allow_duplicate_genes` parameter did not work when mutation is disabled (i.e. `mutation_type=None`). This is by checking for duplicates after crossover directly. #39
3. Solve an issue in the `tournament_selection()` method as the indices of the selected parents were incorrect. #89
4. Reuse the fitness values of the previously explored solutions rather than recalculating them. This feature only works if `save_solutions=True`.
5. Parallel processing is supported. This is by the introduction of a new parameter named `parallel_processing` in the constructor of the `pygad.GA` class. Thanks to [@windowshopr](https://github.com/windowshopr) for opening the issue [#78](#78) at GitHub. Check the [Parallel Processing in PyGAD](https://pygad.readthedocs.io/en/latest/README_pygad_ReadTheDocs.html#parallel-processing-in-pygad) section for more information and examples.
@gabrieldelpozo

gabrieldelpozo commented Oct 18, 2022

I might be doing something wrong, but allow_duplicate_genes=False is not working for me. Even the best solutions for the fitness function I am using have duplicate genes.

In my real case the fitness function takes around 20 minutes, but the dummy fitness function below also returns solutions with duplicated genes, such as the ones printed at the end:

def Genes_Trial(x, x_idx):
    rng_noise =  np.random.default_rng(678910)
    dummy_fit = rng_noise.random()*100
    x = np.sort(x)
    return dummy_fit


gene_space = np.arange(1,41,1)

ga_instance = pygad.GA(num_generations = 300,
                           num_parents_mating = 40,
                           sol_per_pop = 50,
                           num_genes = 6,
                           init_range_low = gene_space[0],
                           init_range_high = gene_space[-1],
                           gene_space = gene_space,
                           gene_type = int,
                           keep_elitism = 2,
                           mutation_probability = 0.025,
                           fitness_func = Genes_Trial,
                           save_solutions = False,
                           allow_duplicate_genes = False,
                           save_best_solutions = True,
                           random_seed=12345
                           )
ga_instance.run()

trial = ga_instance.solutions
trial = np.sort(trial)

unique_genes = []
for i_genes in range(trial.shape[0]):
    unique_genes.append(np.unique(trial[i_genes,:]))

for i_sol in range(len(unique_genes)):
    if len(unique_genes[i_sol])<n_sensors:print(np.array(ga_instance.solutions[i_sol]))

Initially I tried adaptive mutation and thought that was the problem. But when mutation_type is left at its default and mutation_probability is set, there are duplicates. When mutation_probability is also left at its default, no duplicates are generated.

So I am not sure how to proceed, since I am not sure mutation is happening at all when mutation_type and mutation_probability are left at their defaults.
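
The check at the end of the snippet above reads `ga_instance.solutions` while `save_solutions = False` is passed, and it compares against `n_sensors`, which is never defined. A self-contained version of the same check could look like the sketch below. It uses plain Python, and `sample` is hypothetical data standing in for whatever list of solutions is actually available.

```python
def rows_with_duplicates(solutions):
    """Return (index, row) pairs for solutions that repeat a gene value."""
    return [(i, list(row)) for i, row in enumerate(solutions)
            if len(set(row)) < len(row)]

# Hypothetical sample: 3 solutions with 6 genes each, values drawn from 1..40.
sample = [
    [3, 17, 22, 8, 40, 1],    # all genes unique
    [5, 5, 12, 30, 9, 2],     # 5 appears twice
    [11, 4, 4, 4, 27, 36],    # 4 appears three times
]

flagged = rows_with_duplicates(sample)
for index, row in flagged:
    print(index, row)
```

Comparing each row's length with the size of its set avoids hard-coding the number of genes.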

@ahmedfgad
Owner

@gabrieldelpozo,

A new release with a fix for this issue will be pushed soon. It happens because crossover creates duplicate genes that are sometimes not resolved.

ahmedfgad added a commit that referenced this issue Feb 22, 2023
PyGAD 2.19.0 Release Notes
1. A new `summary()` method is supported to return a Keras-like summary of the PyGAD lifecycle.
2. A new optional parameter called `fitness_batch_size` is supported to calculate the fitness function in batches. If it is assigned the value `1` or `None` (default), then the normal flow is used where the fitness function is called for each individual solution. If the `fitness_batch_size` parameter is assigned a value satisfying this condition `1 < fitness_batch_size <= sol_per_pop`, then the solutions are grouped into batches of size `fitness_batch_size` and the fitness function is called once for each batch. In this case, the fitness function must return a list/tuple/numpy.ndarray with a length equal to the number of solutions passed. #136.
3. The `cloudpickle` library (https://github.com/cloudpipe/cloudpickle) is used instead of the `pickle` library to pickle the `pygad.GA` objects. This solves the issue of having to redefine the functions (e.g. fitness function). The `cloudpickle` library is added as a dependency in the `requirements.txt` file. #159
4. Support of assigning methods to these parameters: `fitness_func`, `crossover_type`, `mutation_type`, `parent_selection_type`, `on_start`, `on_fitness`, `on_parents`, `on_crossover`, `on_mutation`, `on_generation`, and `on_stop`. #92 #138
5. Validating the output of the parent selection, crossover, and mutation functions.
6. The built-in parent selection operators return the parent's indices as a NumPy array.
7. The outputs of the parent selection, crossover, and mutation operators must be NumPy arrays.
8. Fix an issue when `allow_duplicate_genes=True`. #39
9. Fix an issue creating scatter plots of the solutions' fitness.
10. Sampling from a `set()` is no longer supported in Python 3.11. Instead, sampling happens from a `list()`. Thanks `Marco Brenna` for pointing to this issue.
11. The lifecycle is updated to reflect that the new population's fitness is calculated at the end of the lifecycle not at the beginning. #154 (comment)
12. There was an issue when `save_solutions=True` that causes the fitness function to be called for solutions already explored and have their fitness pre-calculated. #160
13. A new instance attribute named `last_generation_elitism_indices` added to hold the indices of the selected elitism. This attribute helps to re-use the fitness of the elitism instead of calling the fitness function.
14. Fewer calls to the `best_solution()` method, which in turn saves some calls to the fitness function.
15. Some updates in the documentation to give more details about the `cal_pop_fitness()` method. #79 (comment)
@ahmedfgad ahmedfgad added the bug Something isn't working label Feb 25, 2023