Here's the story. I was working on a tool that was supposed to compare metadata files in a repodata directory. For huge repositories, repodata files can be megabytes, even hundreds of megabytes, in size (unpacked). At first I tried to use the etree library, but my laptop ran out of memory quite quickly, which is no surprise when you consider that you have to open two copies of repodata to compare them. So I rewrote my tool to use SAX parsing and stored the nodes lazily, so the tool read them only when they were needed. That helped with reading, but the diff result also had to be stored somewhere. So I dug further into the SAX library and found the SAX XML generator. That's how I discovered that the SAX generator is much faster than etree.
Then I had the idea that we could rewrite the pulp metadata generator for updateinfo to use SAX instead of etree, because at that time we were trying to improve publish performance in every possible way. After I did that, the comparison results showed a significant speed improvement.
I think we've never encountered memory issues due to metadata publishing, or if we have, it was only very occasionally. That doesn't mean it can't happen, and I think it's better to save memory for more useful things than the greedy etree library, not to mention that you also get better performance.
I checked
https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/distributors/yum/metadata/updateinfo.py
and
https://github.com/pulp/pulp_rpm/blob/master/plugins/pulp_rpm/plugins/distributors/yum/metadata/package.py
and etree is still used there. Of course, primary, files and other metadata are composed by sticking individual pieces from the db together, so I don't think we could make any performance improvement there. But for comps and updateinfo, I think there's room for improvement.
I will provide a patch with a SAX generator for comps and updateinfo. It basically replaces the whole package.py and updateinfo.py files and provides a saxwriter library for them. It should be easy to test.
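To give a rough idea of the approach, here is a minimal sketch built on the standard library's xml.sax.saxutils.XMLGenerator. This is not the saxwriter code from the patch. The function name write_updateinfo and the dict keys "id" and "type" are made up for illustration, and the real updateinfo format has many more elements. The point is only that each element is written to the stream as soon as it is produced, so no tree is ever held in memory.

```python
import io
from xml.sax.saxutils import XMLGenerator


def write_updateinfo(stream, updates):
    """Write a simplified <updates> document to `stream` element by element.

    `updates` can be any iterable (a generator works too), so the records
    do not need to be loaded into memory all at once.
    NOTE: hypothetical helper, not part of pulp_rpm or the patch.
    """
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startDocument()
    gen.startElement("updates", {})
    for update in updates:
        # Attributes are passed as a plain mapping of name -> value.
        gen.startElement("update", {"type": update["type"]})
        gen.startElement("id", {})
        # characters() escapes &, < and > in the text for us.
        gen.characters(update["id"])
        gen.endElement("id")
        gen.endElement("update")
    gen.endElement("updates")
    gen.endDocument()


# Usage: an in-memory text buffer stands in for the real metadata file.
buf = io.StringIO()
write_updateinfo(buf, [{"id": "EXAMPLE-0001 <test>", "type": "security"}])
xml_text = buf.getvalue()
```

In a real publish the stream would be the metadata file itself, and `updates` would be a generator over the units coming from the db.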
Add SAX writer to generate XML without etree
re #1716 https://pulp.plan.io/issues/1716