SD_API_Pictures: Character json tags processing, suffix processing and translations triggers #1034
Added suffixes to the SD-side prompts, which can act as a toggle for specific baked-in tags in the nsfw_prompts and anti_nsfw_prompts parameters. Added character self-detection in the prompt. On self-detection, the prompt changes to ask the character to describe its clothing, and a marker is set to add the positive_sd tag from the character json to the positive SD prompt and the negative_sd tag from the character json to the negative SD prompt. If SD translations is toggled, the extension will look into extensions/sd_api/pictures/translations.json and add tags to the SD-side prompt if a string in the descriptive_word array matches a string in either the prompt sent to the text generation AI or the response from the text generation AI. This is useful, for instance, if you want the extension to recognize the word "tennis" to trigger a tennis-focused LORA in SD and add tags you would always want in a tennis-related image.
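A minimal sketch of the translation lookup described above, for illustration only. The `descriptive_word` array comes from the description; the surrounding layout (`pairs`, `sd_tags`) and the function names `collect_triggered_tags` and `add_translations` are assumptions and may not match the actual translations.json or script.py.

```python
import json

# Hypothetical layout for translations.json; only "descriptive_word" is
# named in the PR description, the other keys are assumptions.
SAMPLE_TRANSLATIONS = json.loads("""
{
  "pairs": [
    {"descriptive_word": ["tennis"], "sd_tags": "tennis court, racket, sportswear"},
    {"descriptive_word": ["beach", "seaside"], "sd_tags": "beach, ocean, sand"}
  ]
}
""")


def collect_triggered_tags(text, translations):
    """Return the SD tags whose trigger words appear in `text` (case-insensitive)."""
    lowered = text.lower()
    triggered = []
    for pair in translations["pairs"]:
        if any(word.lower() in lowered for word in pair["descriptive_word"]):
            triggered.append(pair["sd_tags"])
    return triggered


def add_translations(sd_prompt, request, response, translations):
    """Append tags triggered by either the user's request or the model's response."""
    tags = collect_triggered_tags(request + " " + response, translations)
    return ", ".join([sd_prompt] + tags) if tags else sd_prompt
```

With this layout, a request mentioning "tennis" appends the tennis tags to the SD prompt, and a LORA tag could be placed in the same `sd_tags` string by users who have that LORA installed.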
Added a file with sample words to SD tags translation
Fixed a bug that caused the character-focused prompt to be overwritten
|
Oh man, I went a completely different direction: #1038 |
|
@ClayShoaf It goes a long way for weak/simple models where having it describe the character thoroughly, its clothing/accessories, its environment and then its action is too much. Offloading the character description helps a lot, and it makes sure that the character won't forget part of its self description. It allows people to carefully craft their character's appearance in Auto1111, and then copy the prompt to their character sheet and reliably get a similar looking character. And it also allows for embedding specific LORAs for the character in the character sheet. |
|
LGTM Tested it out and it seems to be working correctly. The only thing I would mention is that the examples in I know that's a little nitpicky and people who make translations files can edit them to work as intended. Great work on this PR! |
|
@ClayShoaf Noted for the translations.json file, I'm mostly copying examples from better prompters than me for the sample translations, and I'm not sure what the best examples are. I do have my own tuned translations for personal usage at this point, including some that load SD LORAs (the main use for translations for me) that work very well, but I wouldn't want to put these in source code (and the LORAs wouldn't load anyway if the user doesn't have them). Unless someone has a better example, I'll use yours. Your name detection logic in your PR just gave me the idea that it should also be possible to have the extension pull and add positive_sd and negative_sd from a second character sheet if that character is mentioned in the description or request. It would be a cool feature, but I'm not going to work on it soon because it's unlikely to give satisfactory results, as it would be subject to the standard pitfalls you often see with SD when two detailed characters are requested. What's particularly great with this field though is that we can expect that much of the work done now will get us better results later, as people hook it up with better and better models on the text gen side and on the Auto1111 side. |
|
Another point to note: I've been testing the extension and tuning my prompts with Alpaca-7b. I would be surprised if the character self description prompt ("Describe what you are currently wearing, your environment and yourself performing the following action: ") would work significantly worse on a better model, but I haven't managed to get anything better to run on my paltry RTX3070 yet, so it would be great if people could test it on more recent models like Vicuna and GPT 4 x Alpaca, and with more parameters, to check that it behaves satisfactorily with them too. |
Added @ClayShoaf 's suggestions to make the sample tags more general.
Adjusted the sample for the translations, to follow the suggestions from @ClayShoaf , and to add a LORA to the example. I prefer adding the LORA example to the readme rather than the translations.json file, as it could cause issues if applied by users who don't have that LORA, or by users who don't edit the translations.json file.
|
I'm testing it a little more and I'm getting some little errors. I'll send a PR to your repo soon. |
I had issues using yaml.safe_load with tabs in my JSON character so I had it switch to json.loads when it detects a json extension.
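A rough sketch of that switch, choosing the parser from the file extension. The function name `load_character` is hypothetical and the real loader lives in the main webui code, so treat this as an illustration of the idea rather than the actual change. PyYAML is imported lazily so the JSON path does not depend on it.

```python
import json
from pathlib import Path


def load_character(path):
    """Parse a character sheet, choosing the parser from the file extension."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        # json.loads accepts tab-indented JSON without complaint.
        return json.loads(text)
    # .yaml / .yml (and anything else) goes through PyYAML, a third-party package.
    import yaml
    return yaml.safe_load(text)
```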
If using the sd_api_pictures extension, these tags will be forwarded directly to the prompt for Auto1111's API if the extension detects the character is sending a picture of itself.
This mostly fixes a few bugs, namely that yaml/yml files were not loading when I was testing with the Example character. There were also error messages being kicked if there was no `positive_sd` or `negative_sd` in the character's json file. I also put some boilerplate nsfw params, since I couldn't see anywhere that `params['nsfw_prompts']` was being updated.
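One way to avoid those errors, sketched under assumptions: read the tags with `dict.get` and a default, and treat a missing character as having no tags. The helper names `character_sd_tags` and `build_prompts` are made up for this example, and the key names follow this comment (`positive_sd`, `negative_sd`).

```python
def character_sd_tags(character):
    """Return (positive, negative) tag strings; empty when absent or no character."""
    if not character:
        # Covers both None and an empty dict.
        return "", ""
    return character.get("positive_sd", ""), character.get("negative_sd", "")


def build_prompts(base_positive, base_negative, character):
    """Append the character's tags to the base prompts, skipping empty parts."""
    positive_tags, negative_tags = character_sd_tags(character)
    positive = ", ".join(p for p in (base_positive, positive_tags) if p)
    negative = ", ".join(p for p in (base_negative, negative_tags) if p)
    return positive, negative
```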
|
Alright, I think I did that right. I'm still not completely familiar with how github works. Apparently forking oobabooga and your repo at the same time is not possible. I need to spend the time to sit down and do a formal educational session on how to push, pull, and merge all of this stuff correctly. Anyways. Let me know what you think. |
Removed NSFW prompts from negative suffix creation; I made this update earlier in my repo as I found this would give counterintuitive results if a user added a LORA in the NSFW parameter (the LORA would be loaded even if it is in the negative prompt).
Fixed some bugs and added nsfw_params
|
@ClayShoaf Excellent, I had some weird results testing that I couldn't identify and couldn't rule out as just being a quirky model or messy context/history, and I wasn't set up to inspect the final prompt sent to the language model. In hindsight, part of the action being stripped out makes a lot of sense. I made a slight edit to the non-nsfw yes-characterfocus negative suffix to remove the nsfw tags; I had edited that out a bit earlier (maybe before you started to work on your changes) as 1) it's not in-line with the non-nsfw non-characterfocus result and 2) it can give counterintuitive results if a user puts a LORA in the nsfw_prompt (according to the Auto1111 documentation: "Lora cannot be added to the negative prompt.", so we should probably make sure we avoid users inadvertently doing it). |
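The change here simply drops the nsfw tags from the negative suffix. As a possible extra guard (not something this PR implements), LORA tags could be stripped from any string before it is used as a negative prompt. The sketch below assumes the usual Auto1111 `<lora:name:weight>` syntax, and `strip_lora_tags` is a hypothetical helper.

```python
import re

# Matches Auto1111-style LORA tags such as <lora:tennis_v1:0.8>.
LORA_TAG = re.compile(r"<lora:[^>]+>", re.IGNORECASE)


def strip_lora_tags(prompt):
    """Remove LORA tags from a comma-separated prompt and tidy the separators."""
    without_lora = LORA_TAG.sub("", prompt)
    parts = [part.strip() for part in without_lora.split(",")]
    return ", ".join(part for part in parts if part)
```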
Moved string evaluation outside of input_modifier, changed input_modifier so that characterfocus would not interfere with picturebook / adventure mode.
|
Moved string evaluation outside of input_modifier, as it was interfering with picturebook / adventure mode. For now I'm leaving characterfocus outside of picturebook / adventure mode, so the entire weight of describing itself for that mode is on the language model. Maybe in the future a toggle for that mode or string evaluation for that mode could allow it to trigger characterfocus. |
A little unrelated, but I've probably done over 100K SD generations at this point. From what I've seen, a lot of the "better prompters" are very good at making one specific thing with one specific model and setup. A lot of the stuff that is included in generations is superfluous. I've done a lot of XYZ grids testing out the different popular tags, and while they have some effect it's not necessarily "better" in most cases, it works more like an extra bit of random entropy. I am, by no means, the authoritative voice on the matter, but I have a much better understanding of how SD generation parameters/prompts work than I do, for example, the mechanics of git, haha. I look forward to having even an XY grid for oobabooga. I would try to write it myself, but I'm worried that by the time I have something presentable, someone else will have made one that is better and all my time will have been wasted.
I'm ashamed to say, I hadn't even tested picturebook mode. I don't have the code up right now, but it seems like something that could be handled with an EDIT: I see it now. I won't have time to test it until sometime tomorrow |
One small bug introduced with toggle_generation() (which renders force_pic and suppress_pic buttons non-operational), other comments are my personal preferences in naming and defaults.
Other than that, LGTM, great work!
Is anyone else willing to look at those changes?
|
Also cross-linking: #1038 (comment) . In part I asked to factor out the logic for Also, I really need to plan the separation of script.py into submodules, as it's already too big to grasp at a glance. |
Renamed is_of_is_in to string_evaluation, removed force toggle generation off in mode 2 and mode 0, changed default nsfw string to nsfw, changed check for existence of tags in character sheet to look for negative tags in negative suffix, renamed character sheet tags sd_tags_positive and sd_tags_negative
Changed sd tags positive/negative
Moved out request string generation from the string evaluation
@Brawlence I like the idea of those changes, it would help make the experience more immersive than a character being completely submissive to every request for pictures. I'm not sure of the flow of the inputs from the main text-generation-webui, so I'd be out of my depth writing that for now. Of course there are also many improvements we could write for a more immersive experience around input evaluation. For now I just separated out the generation of the request to textgen from the evaluation of the input; that should make the new change you were talking about easier, as you can add a case that won't toggle generation but will toggle a trigger for special inspection on the next message for terms of agreement or disagreement. I've renamed "if_of_is_in" to a more helpful "string_evaluation". With regards to separation in submodules, it's getting to that point, yes. I think input evaluation, UI, payload generation could all be separate. |
|
I think I'm done with improvements on this PR for now, unless it's bug fixes. If it's merged the community will probably come up with other ideas for improvements. For the couple of weeks I've been playing with it, this extension has been such a game changer for local models; hopefully these improvements will take it a bit further |
Would fail if character was set to None and the character was still asked to send a picture of itself.
If merged, character sheet is now best used to describe the look of the character.
Also fixed issues with picturebook mode and forcing generation not detecting translations
Hires options are made visible or invisible with toggle of HiRes
|
Fixed many bugs found by @altoiddealer and has latest improvement recommended by @Brawlence |
This reverts commit e21db99.
Fixed README.md changes to reflect the change of the "nsfw" options to "secondary prompt"
|
Alright, so that this PR stops growing I'll hold off until merged for further adjustments. |
|
Thanks for the roadmap @GuizzyQC, I'll try to test everything today. |
Will now only load translations file once per request instead of 3.
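A sketch of one way to get that behavior: read the file lazily and reuse the parsed result until the next request starts. The class `TranslationsCache` and its `reads` counter are illustrative only; the actual commit may simply load the file once and pass the data along.

```python
import json


class TranslationsCache:
    """Read the translations file at most once per request."""

    def __init__(self, path):
        self.path = path
        self._data = None
        self.reads = 0  # number of disk reads, kept only for illustration

    def begin_request(self):
        """Drop the cached data so edits to the file are picked up next request."""
        self._data = None

    def get(self):
        """Return the parsed translations, reading from disk only if needed."""
        if self._data is None:
            with open(self.path, encoding="utf-8") as handle:
                self._data = json.load(handle)
            self.reads += 1
        return self._data
```

Resetting per request rather than caching forever keeps the file editable while the webui is running.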
|
I think that this PR has valid changes that can improve the immersion of the SD extension, but it
I encourage you to create your own customized fork of the extension and submit it here for others to download: https://github.com/oobabooga/text-generation-webui-extensions |
|
Makes a lot of sense, I've moved it to its own repo and am submitting it |
@Brawlence Moved the character json and NSFW tag processing to a separate suffix processing function. Created a function that, upon matching specific words in text model prompt or text model response will add specific tags to the SD prompt. With a populated translations.json file, this can make for a much more seamless experience than adding tags manually to the prefix when something more complex and precise is requested from SD.
The new character description request to the text model I have in there is giving good results with Alpaca in getting the character to first state what it's wearing, then describe its environment and finally itself doing an action.