Brainstorm ways to shrink RPM metadata #399

@dralley

#395 and other recent PRs have brought up the topic of shrinking RPM metadata once again.

I'm not thrilled with such approaches. I can live with them, but it's yak-shaving over just a few percent.

Therefore I'd like to have a discussion about potentially more meaningful approaches.

This ancient wiki page suggests excluding icon and documentation entries (e.g. /usr/share/doc, /usr/share/icons) from filelists.xml, given that they make up a huge proportion of the entries there and in practice should likely never be used as dependencies.

The data is compelling (though it dates from 2010, so recomputing it would be useful):

2.4 million files total in pkgs in rawhide
2.3 million of those are in /usr
1.8 million of those are /usr/share
Top 3 dirs by file count under /usr/share:
533046 /usr/share/doc
120555 /usr/share/javadoc
105591 /usr/share/icons
45 file-requires requiring something in /usr/share
none of those file-requires are in the top 3 /usr/share dirs
- most of them are fonts.
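
Recomputing those numbers could look roughly like the sketch below. It streams a filelists.xml and counts non-directory entries by the first few components of their directory. The function name `count_files_by_prefix`, the `depth` parameter and the embedded sample document are made up for illustration; a real run would open a decompressed filelists.xml from a repository instead.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by filelists.xml in RPM repodata.
FILELISTS_NS = "{http://linux.duke.edu/metadata/filelists}"

def count_files_by_prefix(stream, depth=3):
    """Count non-directory <file> entries by the first `depth` components
    of their directory, e.g. /usr/share/doc for depth=3."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == FILELISTS_NS + "file":
            if elem.get("type") != "dir" and elem.text:
                directory = posixpath.dirname(elem.text.strip())
                parts = directory.split("/")[1 : depth + 1]
                counts["/" + "/".join(parts)] += 1
        elif elem.tag == FILELISTS_NS + "package":
            elem.clear()  # release the package's children once counted
    return counts

# Tiny hypothetical sample, only to show the shape of the input.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<filelists xmlns="http://linux.duke.edu/metadata/filelists" packages="1">
<package pkgid="abc" name="demo" arch="x86_64">
  <version epoch="0" ver="1.0" rel="1"/>
  <file>/usr/bin/demo</file>
  <file type="dir">/usr/share/doc/demo</file>
  <file>/usr/share/doc/demo/README</file>
  <file>/usr/share/doc/demo/html/index.html</file>
  <file>/usr/share/icons/hicolor/48x48/apps/demo.png</file>
</package>
</filelists>
"""

if __name__ == "__main__":
    for prefix, n in count_files_by_prefix(io.BytesIO(SAMPLE)).most_common():
        print(n, prefix)
```

Counting file-requires that point into those directories would need primary.xml as well, which this sketch does not touch.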

This six-year-old discussion brings up the same point:

AIUI @james-antill did some analysis versus Debian and he concluded that the "file dependencies" were a major part of the wire size. And yes holy cow, I just looked at a filelists.xml. I think my vote there would be to only do file entries for "entrypoints" like /usr/bin - there's really no sane scenario where an RPM package should Require: /usr/share/doc/GeographicLib-doc/html/C/annotated.html or whatever.
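
To compare the two filtering ideas so far (dropping doc/icon directories versus keeping only "entrypoints"), here is a minimal sketch. The prefix lists and function names are assumptions chosen for illustration, not anything an existing tool implements, and the sample paths are invented apart from the one quoted above.

```python
# Hypothetical prefix lists; the right sets would need to be agreed on.
EXCLUDED_PREFIXES = ("/usr/share/doc/", "/usr/share/icons/")
ENTRYPOINT_PREFIXES = ("/usr/bin/", "/usr/sbin/")

def drop_excluded_dirs(paths, excluded=EXCLUDED_PREFIXES):
    """Denylist: keep everything except paths under the excluded dirs."""
    return [p for p in paths if not p.startswith(excluded)]

def keep_entrypoints_only(paths, entrypoints=ENTRYPOINT_PREFIXES):
    """Allowlist: keep only paths under the entrypoint dirs."""
    return [p for p in paths if p.startswith(entrypoints)]

if __name__ == "__main__":
    paths = [
        "/usr/bin/0ad",
        "/usr/share/doc/GeographicLib-doc/html/C/annotated.html",
        "/usr/share/icons/hicolor/48x48/apps/demo.png",
        "/usr/share/fonts/demo/demo.ttf",
    ]
    print("denylist keeps: ", drop_excluded_dirs(paths))
    print("allowlist keeps:", keep_entrypoints_only(paths))
```

The allowlist is the more aggressive of the two: with these assumed prefixes it would also drop font paths, which is where most of the /usr/share file-requires in the 2010 data pointed.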

It also makes a second suggestion:

One idea I had is to "presolve" - a lot of this data is completely redundant dependencies. Take this chunk from the very first package I looked at, 0ad:

<rpm:requires>
  <rpm:entry name="libstdc++.so.6()(64bit)"/>
  <rpm:entry name="libstdc++.so.6(CXXABI_1.3)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(CXXABI_1.3.5)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(CXXABI_1.3.8)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(CXXABI_1.3.9)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.11)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.14)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.15)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.18)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.19)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.20)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.21)(64bit)"/>
  <rpm:entry name="libstdc++.so.6(GLIBCXX_3.4.9)(64bit)"/>

...

But those are all provides of the libstdc++ package - and I don't think we're ever going to have different symbol versions provided by separate packages.

So doing a pass where we just drop redundant requires would probably make a notable difference.
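
One way such a pass might work, as a sketch only: treat two requires as redundant when exactly the same set of packages in the repo provides them, and keep one representative per set. The function name and the in-memory `provides_by_package` mapping are hypothetical, and version flags on requires are ignored.

```python
from collections import defaultdict

def drop_redundant_requires(requires, provides_by_package):
    """Keep one require per distinct set of providing packages.

    requires: list of capability names required by one package.
    provides_by_package: dict mapping package name -> iterable of
        capability names it provides (hypothetical in-memory form).
    """
    # Invert the mapping: capability -> packages that provide it.
    providers = defaultdict(set)
    for pkg, caps in provides_by_package.items():
        for cap in caps:
            providers[cap].add(pkg)

    kept, seen_provider_sets = [], set()
    for req in requires:
        provider_set = frozenset(providers.get(req, ()))
        if not provider_set:
            kept.append(req)  # nothing in this repo provides it; keep it
        elif provider_set not in seen_provider_sets:
            seen_provider_sets.add(provider_set)
            kept.append(req)  # first require satisfied by this set
    return kept

if __name__ == "__main__":
    requires = [
        "libstdc++.so.6()(64bit)",
        "libstdc++.so.6(CXXABI_1.3)(64bit)",
        "libstdc++.so.6(GLIBCXX_3.4.21)(64bit)",
        "libc.so.6()(64bit)",
    ]
    provides = {
        "libstdc++": requires[:3],
        "glibc": ["libc.so.6()(64bit)"],
    }
    print(drop_redundant_requires(requires, provides))
```

The result is only equivalent within a single repo snapshot. A client that also sees an older libstdc++ build from another repo would lose the symbol-version constraint, so which representative to keep (for example the newest symbol version) would need more thought.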
