Dedup imports_granularity = "Item" #4737
A new 6-line test file, shown up to the line the review thread below is attached to:

```rust
use crate::lexer;
```
Review comment: Could you add some test files that have multiple groups, and perhaps some that have comments too? Maybe something like this:

```rust
use crate::lexer::{tokens::TokenData /* foo */ };
use crate::lexer::{tokens::TokenData};
```

It might also be useful to have some duplicates but with varying visibility levels (we may not handle this well with the other variants either FWIW, but would be good to check while we're working on this one).
Author reply: I didn't add such test files as there are some issues, especially related to comments, so maybe it is better to discuss first (here or in a new issue) which of these issues should be resolved, whether there are cases where the original code should be used, what the expected results should be, etc. For example, for the following input:

```rust
use crate::foo;
use crate::foo::self;
use crate::foo::self /* foo 1 */;
use crate::foo::{self /* foo 2 */ };
use crate::foo::{tokens::TokenData};
use crate::foo::{tokens::TokenData /* foo 3 */ };
use crate::foo::{self, tokens::TokenData /* foo 4 */};
use crate::bar::self /* bar 1 */;
use crate::bar::self;
use crate::foobar::self; /* foobar 1 */
use crate::foobar::self;
```

"Crate" granularity produces the following output:

```rust
use crate::bar; /* bar 1 */;
use crate::foo; /* foo 1 */;
use crate::foobar; /* foobar 1 */
use crate::{
    bar, foo,
    foo::{self /* foo 2 */, tokens::TokenData},
    foobar,
};
```

Comments are one visible problem here: an extra `;` follows each trailing comment, and `/* foo 3 */` and `/* foo 4 */` are dropped entirely.

"Module" granularity produces the following output:

```rust
use crate::bar; /* bar 1 */;
use crate::foo; /* foo 1 */;
use crate::foo::tokens::TokenData;
use crate::foo::{self /* foo 2 */, self};
use crate::foobar; /* foobar 1 */
use crate::{bar, foo, foobar};
```

In addition to the same issues as for "Crate", "Item" granularity produces the following output:

```rust
use crate::bar; /* bar 1 */;
use crate::foo;
use crate::foo::tokens::TokenData;
use crate::foo::{self /* foo 2 */};
use crate::foobar; /* foobar 1 */
```
Review comment: wow there sure is 🙃 We've always had comment issues with this option (which was named […]):

```rust
use rustc_ast::ast /* abc */;
```

Good call 👍

Author reply: Submitted PR #4759 to fix the extra semicolon issue. I will also try to resolve the issue where, in some cases, if the same import module appears twice (once with a comment and once without a comment) it appears twice in the formatted output, e.g. in the above output for […].

Update: regarding rustfmt/src/formatting/imports.rs lines 170 to 172 (at a9876e8): this is probably a patch added because import comments are not handled well, but it ignores duplicates. There is a relatively simple way to remove these duplicates, by ignoring all non-commented flattened items that have a commented duplicate. However, because duplicates with and without comments are not expected to be common in practice, and because no user has opened an issue about this problem, it does not seem worth adding this enhancement.
The remainder of the file:

```rust
use crate::lexer::tokens::TokenData;
use crate::lexer::{tokens::TokenData};
use crate::lexer::self;
use crate::lexer::{self};
use crate::lexer::{self, tokens::TokenData};
```
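The duplicate-removal idea from the review thread above (drop a comment-less flattened item whenever a commented duplicate of the same path exists) could be sketched roughly as follows. `FlatUse` and `dedup_prefer_commented` are hypothetical names for illustration, not rustfmt's actual types or code:

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a flattened import: the normalized path text
// plus an optional attached comment.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct FlatUse {
    path: String,
    comment: Option<String>,
}

// Drop a comment-less item whenever a commented duplicate of the same path
// exists, then keep only the first occurrence of each remaining item.
fn dedup_prefer_commented(items: Vec<FlatUse>) -> Vec<FlatUse> {
    // Paths that appear at least once with a comment attached.
    let commented: HashSet<String> = items
        .iter()
        .filter(|it| it.comment.is_some())
        .map(|it| it.path.clone())
        .collect();

    let mut seen = HashSet::new();
    items
        .into_iter()
        .filter(|it| {
            !(it.comment.is_none() && commented.contains(&it.path))
                && seen.insert(it.clone())
        })
        .collect()
}

fn main() {
    let out = dedup_prefer_commented(vec![
        FlatUse { path: "crate::bar".into(), comment: Some("/* bar 1 */".into()) },
        FlatUse { path: "crate::bar".into(), comment: None },
        FlatUse { path: "crate::foo".into(), comment: None },
    ]);
    // The comment-less `crate::bar` duplicate is gone; `crate::foo` survives.
    assert_eq!(out.len(), 2);
    assert_eq!(out[0].path, "crate::bar");
    assert!(out[0].comment.is_some());
    assert_eq!(out[1].path, "crate::foo");
}
```

As the author notes, this refinement was judged not worth the complexity in practice; the sketch only shows that it would be straightforward.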
Another new test file (13 lines):

```rust
// rustfmt-imports_granularity: Item
// rustfmt-reorder_imports: false
// rustfmt-group_imports: StdExternalCrate

use crate::lexer;
use crate::lexer;
use crate::lexer::tokens::TokenData;
use crate::lexer::{tokens::TokenData};
use crate::lexer::self;
use crate::lexer;
use crate::lexer;
use crate::lexer::{self};
use crate::lexer::{self, tokens::TokenData};
```
Another new test file (11 lines):

```rust
// rustfmt-imports_granularity: Item

use crate::lexer;
use crate::lexer;
use crate::lexer::tokens::TokenData;
use crate::lexer::{tokens::TokenData};
use crate::lexer::self;
use crate::lexer;
use crate::lexer;
use crate::lexer::{self};
use crate::lexer::{self, tokens::TokenData};
```
Another new test file (6 lines):

```rust
use crate::lexer;
use crate::lexer;
use crate::lexer::tokens::TokenData;
use crate::lexer::tokens::TokenData;
use crate::lexer::{self};
use crate::lexer::{self, tokens::TokenData};
```
Another new test file (7 lines):

```rust
// rustfmt-imports_granularity: Item
// rustfmt-reorder_imports: false
// rustfmt-group_imports: StdExternalCrate

use crate::lexer;
use crate::lexer::tokens::TokenData;
use crate::lexer::{self};
```
Another new test file (5 lines):

```rust
// rustfmt-imports_granularity: Item

use crate::lexer;
use crate::lexer::tokens::TokenData;
use crate::lexer::{self};
```
Review comment: I would much rather implement `Hash` on `UseTree` so that we can just use `unique()` directly instead of taking the hit on calling `format`. We're really just dealing with the path, so I think this would suffice (along with the default `Hash` derive on `UseSegment`):
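The reviewer's suggested snippet did not survive the page capture. A plausible sketch of the idea follows, with `UseSegment` and `UseTree` reduced to simplified hypothetical stand-ins for rustfmt's internal types (the real ones carry more fields and variants):

```rust
use std::hash::{Hash, Hasher};

// Simplified stand-in: the default derive works for a segment enum whose
// fields all implement Hash themselves.
#[derive(Hash, PartialEq, Eq, Clone)]
enum UseSegment {
    Ident(String),
    Slf,
    Glob,
    List(Vec<UseSegment>), // nested `{...}` group
}

struct UseTree {
    path: Vec<UseSegment>,
    // ...other fields (e.g. AST spans) that may not implement Hash...
}

// Manual impl: only the path participates in the hash, so fields without
// a Hash impl are simply skipped.
impl Hash for UseTree {
    fn hash<H: Hasher>(&self, state: &mut H) {
        for segment in &self.path {
            segment.hash(state);
        }
    }
}

fn main() {
    use std::collections::hash_map::DefaultHasher;

    let hash_of = |tree: &UseTree| {
        let mut hasher = DefaultHasher::new();
        tree.hash(&mut hasher);
        hasher.finish()
    };

    let a = UseTree { path: vec![UseSegment::Ident("lexer".into()), UseSegment::Slf] };
    let b = UseTree { path: vec![UseSegment::Ident("lexer".into()), UseSegment::Slf] };

    // Equal paths hash equally, which is what set-based dedup relies on.
    assert_eq!(hash_of(&a), hash_of(&b));
}
```

The point is that deriving `Hash` on the segment type plus a manual, path-only impl on the tree avoids formatting each tree into a `String` just to deduplicate.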
Author reply: Before I try to implement this, I would like to make sure I fully understand the implementation. My main question is: what is the default `Hash` derive on `UseSegment`? How and where is it implemented?

I am asking because my first approach was to implement `Hasher` for both `UseTree` and `UseSegment`, by generating the list of the segments' text and then hashing the generated text. I ended up with something very similar to the `UseTree` `format`, and realized that using `format` is about the same from a performance point of view.

One more question, for my information (assuming that the answer is short): I didn't fully understand how the hash function works. Taking the hash function you suggested above as an example: first, it doesn't have a return value, so where is the hash value stored and how is it used? Second, through the iterations I assume that several numeric hash values are calculated for the non-List `UseSegment`s. If this is the case, it is not clear to me if and how these hash values are "combined" to form the hash of the initial `UseTree`.
Review comment: I was referring to the `Hash` trait, and that we should be able to leverage the derive attribute for `UseSegment` instead of having to manually implement the `Hash` trait ourselves. I'm not sure we'll be able to leverage `derive` on `UseTree`, because that only works when all the fields themselves have a `Hash` impl, and I know that not all of the AST nodes do (feel free to check/test though!). Once there's a `Hash` impl for both `UseTree` and `UseSegment`, then you can use `unique()` directly, instead of formatting to a temporary `String`.

I don't think there will be a need to implement the `Hasher` trait on any of our custom types to resolve this issue. My assumption here was that the itertools functions we were hoping to leverage utilize some hash-based data structure for the deduplication feature we needed, and that in turn would require that the items (`UseTree` in our case) implement the `Hash` trait (side note: looks like they do leverage a hash set internally 😉).

This is indeed the case, as highlighted in the itertools docs for the unique/unique_by functions. If you were to try to just use `unique()` on the current code, you'd presumably get errors complaining about unbound constraints, because our `UseTree` does not have a `Hash` implementation.

Your initial proposal works because you're allocating an intermediate `String` representation for each `UseTree` (which we have to do again later for the actual emitting of the formatted code), and that satisfies the itertools constraints because `String` does have a `Hash` impl. We don't want to do this, though, because it is unnecessary; we should simply ensure that we have a `Hash` impl.

Sorry, but that's a little too off topic for a code review within rustfmt 😄 Hopefully the above info helps somewhat, but otherwise I'd refer you to materials like the Book, the stdlib docs (https://doc.rust-lang.org/std/index.html), or even the rustc source if you really want to drill into it.
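The reply above can be illustrated with a small sketch. Once the item type implements `Hash` and `Eq` (here via `derive` on a hypothetical `Import` stand-in), a hash-set-based first-occurrence dedup, which is what itertools' `unique()` does internally, needs no intermediate `String`:

```rust
use std::collections::HashSet;
use std::hash::Hash;

// Hypothetical import type; once it is Hash + Eq, dedup needs no
// intermediate String allocation.
#[derive(Hash, PartialEq, Eq, Clone, Debug)]
struct Import {
    path: Vec<String>,
}

// Keep the first occurrence of each distinct item, preserving input order:
// the same behavior Itertools::unique provides, done with a std HashSet.
fn unique<T: Hash + Eq + Clone>(items: Vec<T>) -> Vec<T> {
    let mut seen = HashSet::new();
    items.into_iter().filter(|it| seen.insert(it.clone())).collect()
}

fn main() {
    let a = Import { path: vec!["crate".into(), "lexer".into()] };
    let dup = a.clone();
    let b = Import { path: vec!["crate".into(), "lexer".into(), "tokens".into()] };

    let out = unique(vec![a.clone(), dup, b.clone()]);
    // The duplicate of `a` is dropped; order is preserved.
    assert_eq!(out, vec![a, b]);
}
```

Without the `Hash` bound satisfied, the `HashSet::insert` call (and likewise itertools' `unique()`) would fail to compile, which is the "unbound constraints" error the reviewer mentions.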
Author reply: Thanks for the detailed response! The reference to the stdlib also helped, as I realized I had missed the `finish` method that emits the hash calculation result.

I have now changed `unique_by` to `unique` using your suggested code.
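The `finish` point mentioned above can be demonstrated in a few lines: `Hash::hash` returns nothing because it feeds bytes into the `Hasher`'s accumulated state, and `Hasher::finish` emits the single combined value:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn main() {
    let segments = ["crate", "lexer", "tokens"];

    // Each `hash` call mixes more data into the same Hasher state...
    let mut hasher = DefaultHasher::new();
    for seg in &segments {
        seg.hash(&mut hasher);
    }
    // ...and one final `finish` yields the combined u64 for the whole sequence.
    let combined = hasher.finish();

    // Re-hashing the same sequence reproduces the same combined value.
    let mut hasher2 = DefaultHasher::new();
    for seg in &segments {
        seg.hash(&mut hasher2);
    }
    assert_eq!(combined, hasher2.finish());
}
```

This is also how the per-segment hashes get "combined" in the manual `UseTree` impl discussed earlier: they all write into one `Hasher`, and the caller's single `finish` produces the tree's hash.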