Issue #1903
closed
RPM import traceback (non-utf-8 metadata slipping through)
Description
Hi,
I'm getting a "special" problem with a few RPMs.
Those RPMs have Finnish alphabet characters in their authors' names in the %description metadata.
`Authors:
Juha Yrj<F6>l<E4> <jyrjola@cc.hut.fi>
Antti Tapaninen <aet@cc.hut.fi>
Timo Ter<E4>s <timo.teras@iki.fi>
Olaf Kirch <okir@suse.de>
%files`
When you run the spec through iconv from ISO-8859-1 to UTF-8, it all magically clears up:
`iconv -f iso8859-1 -t utf-8 opensc.spec | grep -A6 Authors
Authors:
Juha Yrjölä <jyrjola@cc.hut.fi>
Antti Tapaninen <aet@cc.hut.fi>
Timo Teräs <timo.teras@iki.fi>
Olaf Kirch <okir@suse.de>
%files`
So my impression is that the SPEC file is supposed to have utf-8 content (yay) but is in fact stored as latin1 (NOES).
Trying to upload one of these packages triggers a traceback.
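To make the encoding claim concrete, here is a minimal Python 3 sketch (not taken from Pulp) that checks one of the author lines at the byte level. The byte string is copied from the `<F6>`/`<E4>` values shown above; the helper name `is_valid_utf8` is made up for this example.

```python
# One author line as raw bytes, using the 0xF6 / 0xE4 bytes shown in the spec dump.
raw = b"Juha Yrj\xf6l\xe4 <jyrjola@cc.hut.fi>"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Decoding as ISO-8859-1 never fails, because every byte maps to a code point.
as_text = raw.decode("iso-8859-1")

# Re-encoding the decoded text yields bytes that are valid UTF-8,
# which is what the iconv call does for the whole file.
reencoded = as_text.encode("utf-8")

print(is_valid_utf8(raw))        # the raw Latin-1 bytes
print(as_text)
print(is_valid_utf8(reencoded))  # after conversion
```

The raw bytes are rejected as UTF-8 while the same bytes read as ISO-8859-1 give the expected names, which matches the iconv result.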
We try to mirror all packages in all versions we need, so a re-issue of the package doesn't really solve matters.
(Example to clarify this: Let's assume they're on one of the OS DVDs and we want to mirror them 1:1.)
An example package is opensc-0.11.6-5.27.1.x86_64.rpm from SLES11SP2:
It can apparently be obtained via
http://mirror.mes.edu.cu/SLES_11_SP2/CD1/suse/x86_64/opensc-0.11.6-5.27.1.x86_64.rpm
md5: 0c463515b28998ac9966400e5d14588d
The traceback looks like this:
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) unexpected error occurred importing uploaded file
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) Traceback (most recent call last):
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 118, in upload
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) handlers[type_id](repo, type_id, unit_key, metadata, file_path, conduit, config)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 390, in _handle_package
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) unit.save_and_import_content(file_path)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py", line 802, in save_and_import_content
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) self.save()
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib/python2.7/site-packages/mongoengine/document.py", line 324, in save
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) object_id = collection.save(doc, **write_concern)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 2180, in save
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) check_keys, False, manipulate, write_concern)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 709, in _update
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) codec_options=self.codec_options).copy()
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 216, in command
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) self._raise_connection_failure(error)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 343, in _raise_connection_failure
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) raise error
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) InvalidStringData: strings in documents must be valid UTF-8: 'OpenSC provides a set of libraries and utilities to access smart cards.\nIt mainly focuses on cards that support cryptographic operations. It\nfacilitates their use in security applications such as mail encryption,\nauthentication, and digital signature. OpenSC implements the PKCS#11\nAPI. Applications supporting this API, such as Mozilla Firefox and\nThunderbird, can use it. OpenSC implements the PKCS#15 standard and\naims to be compatible with every software that does so, too.\n\nBefore purchasing any cards, please read carefully documentation in\n/usr/share/doc/packages/opensc/wiki/index.html - only some cards are\nsupported. Not only card type matters, but also card version, card OS\nversion and preloaded applet. Only subset of possible operations may be\nsupported for your card. Card initialization may require third party\nproprietary software.\n\n\n\nAuthors:\n--------\n Juha Yrj\xf6l\xe4 <jyrjola@cc.hut.fi>\n Antti Tapaninen <aet@cc.hut.fi>\n Timo Ter\xe4s <timo.teras@iki.fi>\n Olaf Kirch <okir@suse.de>'
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912) upsert=upsert, **write_concern)
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912)
I was pointed to string_to_unicode(data) in pulp_rpm/yum_plugin/util.py. In my test, all but two lines go through the utf-8 path; the two lines with the 'umlauts' fail there and (supposedly) go through the iso-8859-1 path.
That seems to work, but later, when self.save() runs, there seem to be demons.
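For illustration, a minimal sketch of the kind of input handling I have in mind, written for Python 3 (the traceback is from Python 2.7, where the str/unicode split differs). This is not Pulp's string_to_unicode; `to_text`, `sanitize_metadata`, `FALLBACK_ENCODINGS` and the sample `metadata` dict are hypothetical names and data for this example. It assumes the failure comes from at least one metadata field still holding undecoded bytes when the document is saved, which I have not confirmed.

```python
# Encodings to try, in order. ISO-8859-1 accepts any byte sequence, so it
# acts as the catch-all after UTF-8 has been rejected.
FALLBACK_ENCODINGS = ("utf-8", "iso-8859-1")


def to_text(value):
    """Return text for str or bytes input, trying each encoding in order."""
    if isinstance(value, str):
        return value
    for encoding in FALLBACK_ENCODINGS:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reachable if the catch-all encoding is removed from the tuple above.
    return value.decode("utf-8", errors="replace")


def sanitize_metadata(obj):
    """Recursively convert every bytes value (and key) in a structure to text."""
    if isinstance(obj, (bytes, str)):
        return to_text(obj)
    if isinstance(obj, dict):
        return {
            (to_text(k) if isinstance(k, (bytes, str)) else k): sanitize_metadata(v)
            for k, v in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [sanitize_metadata(item) for item in obj]
    return obj  # numbers, None, etc. pass through unchanged


# Hypothetical metadata shaped loosely like what an RPM header might yield.
metadata = {
    "name": "opensc",
    "description": b"Authors:\n Timo Ter\xe4s <timo.teras@iki.fi>",
    "changelog": [b"Juha Yrj\xf6l\xe4", "Olaf Kirch"],
}

clean = sanitize_metadata(metadata)
print(clean["description"])
```

Running every field through one function just before the save would mean a single missed code path cannot leave a stray byte string in the document.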
I'm attaching the extracted SPEC, too.
Since the issue occurs with more or less random RPMs that I have no influence over, I'd love to help improve the input handling here so that it generally survives them.
Files
Related issues
Fix non-utf8 when found in uploaded RPMs. closes #1903