If you enter the keyword 'beverage', the beverages are recognized normally. However, with more descriptive prompts like 'The white beverage on the upper shelf' or 'White and transparent beverage', the recognition performance is not very good.
I further discovered that the model makes incorrect judgments about comparative and positional terms such as "larger", "smaller", "left", "right", or "first layer" and "second layer". For example, it misidentified the yellow beverage to the left of the blue beverage as belonging to the first column.
Then I tried labeling the drinks with numbers. When the prompt is 'number', the model recognizes all of the numbers normally, but it fails to recognize specific numbers like 123456.