Issue #1903

closed

RPM import traceback (non-utf-8 metadata slipping through)

Added by darkfader almost 8 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
1. Low
Version:
2.8.2
Platform Release:
2.12.2
OS:
CentOS 7
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:
Sprint 15
Quarter:

Description

Hi,

I'm getting a "special" problem with a few RPMs.
Those RPMs have Finnish alphabet characters in their authors' names in the %description metadata.

`Authors:

Juha Yrj<F6>l<E4> <jyrjola@cc.hut.fi>
Antti Tapaninen <aet@cc.hut.fi>
Timo Ter<E4>s <timo.teras@iki.fi>
Olaf Kirch <okir@suse.de>

%files`

When you run them through iconv from latin to utf-8 it all magically clears up:

`iconv -f iso8859-1 -t utf-8 opensc.spec | grep -A6 Authors
Authors:

Juha Yrjölä <jyrjola@cc.hut.fi>
Antti Tapaninen <aet@cc.hut.fi>
Timo Teräs <timo.teras@iki.fi>
Olaf Kirch <okir@suse.de>

%files`

So my impression is that the SPEC file has utf-8 content (yay) but is in fact stored as latin1 (NOES).
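
To make the byte-level picture concrete, here is a minimal Python 3 sketch (the thread itself runs Python 2.7). The byte string is copied from the Authors line in the traceback further down; the variable names are only illustrative and nothing here is taken from Pulp.

```python
# Bytes as they appear in the RPM %description (from the traceback in this issue).
raw = b"Juha Yrj\xf6l\xe4"

# Decoding as ISO-8859-1 (latin1) yields the intended Finnish name.
as_latin1 = raw.decode("iso-8859-1")
print(as_latin1)

# Strict UTF-8 decoding rejects the same bytes, which is the condition
# the InvalidStringData error in the traceback complains about.
try:
    raw.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False

# Re-encoding the latin1 text as UTF-8 mirrors `iconv -f iso8859-1 -t utf-8`.
as_utf8_bytes = as_latin1.encode("utf-8")
```

In other words, the stored bytes are latin1, and only the iconv output is real UTF-8.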

Trying to upload one of these packages triggers a traceback.

We try to mirror all packages in all versions we need, so a re-issue of the package doesn't really solve matters.
(Example to clarify this: Let's assume they're on one of the OS DVDs and we want to mirror them 1:1)

An example package is opensc-0.11.6-5.27.1.x86_64.rpm from SLES11SP2:
It can apparently be obtained via
http://mirror.mes.edu.cu/SLES_11_SP2/CD1/suse/x86_64/opensc-0.11.6-5.27.1.x86_64.rpm

md5: 0c463515b28998ac9966400e5d14588d

The traceback looks like this:
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) unexpected error occurred importing uploaded file
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) Traceback (most recent call last):
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 118, in upload
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     handlers[type_id](repo, type_id, unit_key, metadata, file_path, conduit, config)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 390, in _handle_package
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     unit.save_and_import_content(file_path)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py", line 802, in save_and_import_content
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     self.save()
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib/python2.7/site-packages/mongoengine/document.py", line 324, in save
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     object_id = collection.save(doc, **write_concern)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 2180, in save
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     check_keys, False, manipulate, write_concern)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 709, in _update
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     codec_options=self.codec_options).copy()
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 216, in command
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     self._raise_connection_failure(error)
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 343, in _raise_connection_failure
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912)     raise error
May 06 14:49:18 myhost pulp[27466]: pulp_rpm.plugins.importers.yum.upload:ERROR: (27466-26912) InvalidStringData: strings in documents must be valid UTF-8: 'OpenSC provides a set of libraries and utilities to access smart cards.\nIt mainly focuses on cards that support cryptographic operations. It\nfacilitates their use in security applications such as mail encryption,\nauthentication, and digital signature. OpenSC implements the PKCS#11\nAPI. Applications supporting this API, such as Mozilla Firefox and\nThunderbird, can use it. OpenSC implements the PKCS#15 standard and\naims to be compatible with every software that does so, too.\n\nBefore purchasing any cards, please read carefully documentation in\n/usr/share/doc/packages/opensc/wiki/index.html - only some cards are\nsupported. Not only card type matters, but also card version, card OS\nversion and preloaded applet. Only subset of possible operations may be\nsupported for your card. Card initialization may require third party\nproprietary software.\n\n\n\nAuthors:\n--------\n Juha Yrj\xf6l\xe4 <jyrjola@cc.hut.fi>\n Antti Tapaninen <aet@cc.hut.fi>\n Timo Ter\xe4s <timo.teras@iki.fi>\n Olaf Kirch <okir@suse.de>'
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912) /usr/lib/python2.7/site-packages/mongoengine/document.py:367: DeprecationWarning: update is deprecated. Use replace_one, update_one or update_many instead.
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912)   upsert=upsert, **write_concern)
May 06 14:49:18 myhost pulp[27466]: py.warnings:WARNING: (27466-26912)

I was pointed to string_to_unicode(data) in pulp_rpm/yum_plugin/util.py. In my test, all but two lines go through the utf-8 path; the two lines with the umlauts fail there and (supposedly) go through the iso-8859-1 path.

That seems to work, but later, when self.save() runs, it seems there are demons.

I'm attaching the extracted SPEC, too.

Since the issue occurs with more or less random RPMs that I have no influence on, I'd love to help improve the input handling here so it generally survives them.


Files

opensc.spec (22.2 KB) opensc.spec darkfader, 05/09/2016 05:30 PM
1903.patch (1.13 KB) 1903.patch PoC patch that theoretically might fix it (untested) mhrivnak, 05/09/2016 06:16 PM

Related issues

Related to RPM Support - Issue #2622: Sync fails when non-ASCII characters are present in primary.xml (CLOSED - CURRENTRELEASE, ttereshc)
Actions #1

Updated by darkfader almost 8 years ago

Attachment: the funny SPECfile!

Btw, for reasons we don't know yet a rather recent rpmlint on SLES12 doesn't alert on this, while the one on CentOS7 does.
We're trying to track that one down.

Actions #2

Updated by mhrivnak almost 8 years ago

During sync, there is some logic to run the XML snippet for the package through a function called "string_to_unicode", which ensures all the text can be saved in our DB. This defends against RPMs with non-standard encodings in metadata. However, no such defense is present in the upload workflow.

For this particular issue, running the string on this line: https://github.com/pulp/pulp_rpm/blob/pulp-rpm-2.8.2-1/plugins/pulp_rpm/plugins/importers/yum/upload.py#L542

... through the "string_to_unicode" function would likely resolve the problem. There may be other metadata fields we should consider running through that function.

Actions #3

Updated by darkfader almost 8 years ago

For the record:
https://pulp-rpm.readthedocs.io/en/latest/user-guide/troubleshooting.html refers to a similar case.

A possible solution seems to be "ftfy" from pip, which apparently knows how to detect double-miscodings like this.
Handling mildly broken input in some way other than a traceback would be highly desirable, too, since the traceback causes other issues.
(i.e. people will need to exclude "this kind of traceback" from Pulp's monitoring :-)

Actions #4

Updated by darkfader almost 8 years ago

using string_to_unicode on this works.
I'll add a PR tomorrow.

Actions #5

Updated by mhrivnak almost 8 years ago

Actions #6

Updated by mhrivnak almost 8 years ago

Following up on today's triage discussion, it looks like we've been through this problem at least once before.

In that earlier case, we determined that SUSE had removed the offending package, and that we would document that pulp_rpm requires utf-8. We also determined that when createrepo faces such a package, it falls back to decoding as latin1 (based on a quick test just now, createrepo_c seems to do the same).

https://pulp.plan.io/issues/490

Similar to createrepo's behavior, pulp's sync workflow will try to decode with utf-8, and if that fails, will fall back to latin1. That behavior originates here:

https://bugzilla.redhat.com/show_bug.cgi?id=911650

And was vastly improved here:

https://bugzilla.redhat.com/show_bug.cgi?id=923448
https://github.com/pulp/pulp_rpm/pull/157

Given all that, I think the easiest resolution is to call this a duplicate of #490. It looks like the same exact problem, and we could happily stick with our previous decision on it.

That said, I'm not sure which is the better user experience. Consider a user trying to manage RPMs they got from somewhere else, which is most of our users. Is there harm in decoding as latin1, even though we know it's not the right encoding? From that user's standpoint, having a few incorrect characters in pulp's "description" field is probably more useful than not having the RPM in pulp at all. Are there negative consequences to that besides a few weird characters in the description? That's the behavior we have right now during sync, so if we decide to be strict, we should probably adjust the sync as well.

However, if we decide to have a policy that is more strict than other tools that work with the same data, such as createrepo*, we need to think carefully about why and explain it well.

Actions #7

Updated by rbarlow almost 8 years ago

The attached patch is dangerous. The only reason it works (it doesn't actually work, it just hides the error message) is that latin-1 assigns a character to every one of the 256 possible byte values, so no byte sequence can raise a decoding error: every byte is valid, though not necessarily correct. The most populous continent of the world does not (and cannot) use latin-1. RPMs with metadata in those languages will end up with garbage data throughout their fields. If Pulp's fields are of so little value that we are willing to fill them with garbage data intentionally, then why not just remove the fields?

The sensible approach is to raise an error message for RPMs that do not conform to the standard (UTF-8, which includes ASCII). This allows the user to take appropriate action empowered with a correct error message. Rather than walking away believing that Pulp is garbage and it ate their data, they can instead solve the problem or bring the matter to the vendor of the RPM so they can solve it.
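
The point about latin-1 never raising can be checked directly. The sketch below (Python 3, standard library only) decodes text that was encoded as EUC-JP, chosen here purely as an example of a non-latin legacy encoding; the sample string is made up for illustration and does not come from any RPM in this issue.

```python
# Japanese text stored in a legacy, non-UTF-8 encoding (illustrative sample).
original = "日本語"
raw = original.encode("euc_jp")

# latin-1 assigns a character to each of the 256 byte values,
# so this decode cannot fail, whatever the bytes really mean.
garbled = raw.decode("latin-1")

print(ascii(garbled))        # one character per input byte
print(garbled == original)   # no error was raised, but the text is not the original
```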

Actions #8

Updated by darkfader almost 8 years ago

I'm currently traveling but will write something about 'sensible' as soon as I can.
Basically, you can pick between "not correct" and "useless for the main purpose".

Florian

Actions #9

Updated by rbarlow almost 8 years ago

Another option is to allow the RPM into Pulp, but set any non-UTF-8 data to None. This way users can still have the package in Pulp, and we don't have garbage data either. Missing data is better than incorrect data, and we can still give warning/error messages to the user about why the data is null.
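
A minimal sketch of that alternative in Python 3. The helper name `utf8_or_none` and the field names are hypothetical and are not Pulp APIs; the only real behavior relied on is that strict UTF-8 decoding raises `UnicodeDecodeError` on invalid input.

```python
import logging

log = logging.getLogger(__name__)


def utf8_or_none(field, data):
    """Decode bytes as strict UTF-8; on failure, warn and return None."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Tell the user why the field ends up empty.
        log.warning("dropping %s: not valid UTF-8 (%s)", field, exc)
        return None


summary = utf8_or_none("summary", b"Smart card library")
description = utf8_or_none("description", b"Juha Yrj\xf6l\xe4")
```

Here `summary` is kept as text, while `description` becomes None and a warning is logged.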

Actions #10

Updated by mhrivnak almost 8 years ago

  • Priority changed from Normal to Low
  • Severity changed from 2. Medium to 1. Low
  • Triaged changed from No to Yes
Actions #13

Updated by darkfader over 7 years ago

Hi,

I kind of forgot about this, although we probably lost the functionality again with the last update.
Anyway, the very summarized feedback:
If Pulp removes invalid (wrongly encoded) data, that is "meh" but better than breaking.
Ideally, it should leave a message in that case.

Besides that:
Pulp iirc was made to manage repositories of RPM packages. Not to maintain consistency of a low-criticality metadata field. Given that, yes, let it do whatever it takes to manage the RPM repo.
Not uploading an RPM that is manageable by RPM, yum, zypper and createrepo is failing the main job.
If we need to run rpmlint and put that in our release process, we can do that (and likely DO). It's just not related to the most basic use case here.
And breaking that to ensure there's no bad encoding on the field is illogical.

Actions #14

Updated by mhrivnak over 7 years ago

  • Sprint Candidate changed from No to Yes
Actions #16

Updated by semyers over 7 years ago

  • Groomed changed from No to Yes

darkfader wrote:

Pulp iirc was made to manage repositories of RPM packages. Not to maintain consistency of a low-criticality metadata field.

After discussing this some more, we generally agree. I've opened up a separate (non-blocking) task[0] to document our conclusions, which are basically that "If createrepo_c can make a repo with the RPM, Pulp should be able to sync it". createrepo_c tries to use utf-8, and if that fails it munges the data by converting to latin1, just like createrepo would.

Bonus points to whoever implements this for logging when Pulp encounters this case that the incoming data encoding was invalid so Pulp punted to using latin1.

[0]: https://pulp.plan.io/issues/2480
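
The logging suggestion could look roughly like this in Python 3. `decode_with_fallback` and the logger name are hypothetical, shown only to illustrate recording when the latin1 fallback is taken; this is not code from pulp_rpm.

```python
import logging

log = logging.getLogger("rpm_metadata_sketch")  # hypothetical logger name


def decode_with_fallback(data, unit_name):
    """Try UTF-8 first; fall back to latin1 and log that it happened."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        log.warning("%s: metadata is not valid UTF-8, decoding as latin1",
                    unit_name)
        # latin1 maps every byte to a character, so this cannot raise.
        return data.decode("latin-1")


text = decode_with_fallback(b"Timo Ter\xe4s",
                            "opensc-0.11.6-5.27.1.x86_64.rpm")
```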

Actions #17

Updated by bmbouter over 7 years ago

  • Project changed from Pulp to RPM Support

Moving to the RPM tracker.

Actions #18

Updated by bmbouter over 7 years ago

  • Status changed from NEW to POST
Actions #19

Updated by ipanova@redhat.com over 7 years ago

  • Sprint/Milestone set to 31
Actions #20

Updated by bmbouter over 7 years ago

After learning more about unicode, I have a new idea on how to fix this better than falling back to latin-1. tl;dr: we should decode as utf-8 using the 'replace' error handler: data.decode('utf-8', 'replace').

Consider this example, where we have Pulp try to decode invalid utf-8 encoded data. Note that the byte '\x9a' cannot start a valid utf-8 sequence.

[bmbouter@localhost devel]$ python
Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
[GCC 6.2.1 20160916 (Red Hat 6.2.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> utf8_snowman = u"\u2603".encode('utf-8')
>>> print utf8_snowman
☃
>>> my_utf_8_bytes =  utf8_snowman + '\x9a'
>>> my_utf_8_bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 3: invalid start byte
>>> my_utf_8_bytes.decode('latin-1')
u'\xe2\x98\x83\x9a'
>>> print(my_utf_8_bytes.decode('latin-1'))
â

You can see the snowman is nowhere to be found even though the snowman symbol itself was not corrupted. This demonstrates how falling back to latin-1 can garble the entire string even though only a single byte is corrupted.

Consider now using the 'replace' option instead of falling back to 'latin-1'.

[bmbouter@localhost devel]$ python
Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
[GCC 6.2.1 20160916 (Red Hat 6.2.1-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> utf8_snowman = u"\u2603".encode('utf-8')
>>> print utf8_snowman
☃
>>> my_utf_8_bytes =  utf8_snowman + '\x9a'
>>> my_utf_8_bytes.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9a in position 3: invalid start byte
>>> my_utf_8_bytes.decode('utf-8', 'replace')
u'\u2603\ufffd'
>>> print my_utf_8_bytes.decode('utf-8', 'replace')
☃�
>>> 

Observe that the snowman is still preserved! Notice also that the corrupted byte is replaced by U+FFFD, the unicode replacement character that stands for an unknown symbol.
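
For readers on Python 3, the same comparison can be reproduced as below. The second sample is the author name from this issue's traceback; it is included because the two strategies behave differently on it, which is the trade-off debated earlier in the thread. Variable names are illustrative only.

```python
snowman_plus_junk = "\u2603".encode("utf-8") + b"\x9a"
latin1_name = b"Yrj\xf6l\xe4"

for raw in (snowman_plus_junk, latin1_name):
    # 'replace' keeps every valid UTF-8 sequence and marks the rest with U+FFFD.
    replaced = raw.decode("utf-8", "replace")
    # latin-1 never fails, but reinterprets each byte as its own character.
    fallback = raw.decode("latin-1")
    print(ascii(replaced), ascii(fallback))
```

With 'replace', valid UTF-8 survives and genuinely latin1 text loses its accented letters; with the latin-1 fallback it is the other way around.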

To accomplish this, we should apply a patch that is roughly:

diff --git a/plugins/pulp_rpm/yum_plugin/util.py b/plugins/pulp_rpm/yum_plugin/util.py
index 2a6acff..1b44007 100644
--- a/plugins/pulp_rpm/yum_plugin/util.py
+++ b/plugins/pulp_rpm/yum_plugin/util.py
@@ -115,12 +115,7 @@ def string_to_unicode(data):
     :return: data as a unicode object
     :rtype:  unicode
     """
-    for code in ENCODING_LIST:
-        try:
-            return data.decode(code)
-        except UnicodeError:
-            # try others
-            continue
+    return data.decode('utf-8', 'replace')


 LISTING_FILE_NAME = 'listing'
Actions #21

Updated by dkliban@redhat.com over 7 years ago

+1 to using the replace.

Actions #22

Updated by mhrivnak over 7 years ago

That sounds perfect.

Actions #23

Updated by dkliban@redhat.com about 7 years ago

  • Sprint/Milestone changed from 31 to 32
Actions #25

Updated by mhrivnak about 7 years ago

  • Sprint/Milestone changed from 32 to 33
Actions #26

Updated by mhrivnak about 7 years ago

  • Status changed from POST to NEW
  • Priority changed from Low to Normal

I'm putting this back at NEW since I think we need a wholly different patch, and the PR appears to be abandoned by its author.

Actions #27

Updated by jortel@redhat.com about 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jortel@redhat.com
Actions #28

Updated by jortel@redhat.com about 7 years ago

Nice leg work on this bmbouter.

Looks like there are two identical string_to_unicode() functions in the RPM package [0][1]. Any objection to getting rid of both and changing the code to use decode('utf-8', 'replace') directly? According to the documentation, the replace error policy will prevent UnicodeError from being raised. I don't see any value provided by the util functions. One consideration is that getting rid of the functions could break 3rd party plugins using this code. Thoughts?

[0] https://github.com/pulp/pulp_rpm/blob/2.12-dev/plugins/pulp_rpm/yum_plugin/util.py#L104
[1] https://github.com/pulp/pulp_rpm/blob/2.12-dev/plugins/pulp_rpm/plugins/importers/yum/parse/rpm.py#L80

Actions #29

Updated by mhrivnak about 7 years ago

I think that's a great plan. I don't know of any 3rd-party plugins that import code from this plugin, and we certainly don't make any guarantees at this point about using a plugin as a library.

Actions #30

Updated by jortel@redhat.com about 7 years ago

  • Status changed from ASSIGNED to POST
Actions #31

Updated by bmbouter about 7 years ago

@jortel, +1 to your proposed plan.

Added by jortel@redhat.com about 7 years ago

Revision a4ccac97 | View on GitHub

Fix non-utf8 when found in uploaded RPMs. closes #1903

Actions #32

Updated by jortel@redhat.com about 7 years ago

  • Status changed from POST to MODIFIED
Actions #33

Updated by semyers about 7 years ago

  • Platform Release set to 2.13.0
Actions #34

Updated by semyers about 7 years ago

  • Platform Release changed from 2.13.0 to 2.12.2

woops, should've been 2.12.2

Actions #36

Updated by semyers about 7 years ago

  • Status changed from MODIFIED to 5
Actions #37

Updated by Ichimonji10 about 7 years ago

  • Status changed from 5 to ASSIGNED

Uploading the RPM listed in the original bug description causes a traceback. Here's a script demonstrating the issue:

wget 'http://mirror.mes.edu.cu/SLES_11_SP2/CD1/suse/x86_64/opensc-0.11.6-5.27.1.x86_64.rpm'
pulp-admin login -u admin
pulp-admin rpm repo create --repo-id foo
pulp-admin rpm repo uploads rpm --repo-id foo --file opensc-0.11.6-5.27.1.x86_64.rpm

Sample output from last step:

[root@fedora-24-pulp-2-12 ~]# pulp-admin rpm repo uploads rpm --repo-id foo --file opensc-0.11.6-5.27.1.x86_64.rpm
+----------------------------------------------------------------------+
                              Unit Upload
+----------------------------------------------------------------------+

Extracting necessary metadata for each request...
[==================================================] 100%
Analyzing: opensc-0.11.6-5.27.1.x86_64.rpm
... completed

Creating upload requests on the server...
[==================================================] 100%
Initializing: opensc-0.11.6-5.27.1.x86_64.rpm
... completed

Starting upload of selected units. If this process is stopped through ctrl+c,
the uploads will be paused and may be resumed later using the resume command or
canceled entirely using the cancel command.

Uploading: opensc-0.11.6-5.27.1.x86_64.rpm
[==================================================] 100%
503870/503870 bytes
... completed

Importing into the repository...
This command may be exited via ctrl+c without affecting the request.

[\]
Running...

Task Failed

The importer yum_importer indicated a failed response when uploading rpm unit to
repository foo.

Deleting the upload request...
... completed

Output from journalctl:

Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472) unexpected error occurred importing uploaded file: strings in documents must be valid UTF-8: 'OpenSC provides a set of libraries and utilities to access smart cards.\nIt mainly focuses on cards that support cryptographic operations. It\nfacilitates their use in security applications such as mail encryption,\nauthentication, and digital signature. OpenSC implements the PKCS#11\nAPI. Applications supporting this API, such as Mozilla Firefox and\nThunderbird, can use it. OpenSC implements the PKCS#15 standard and\naims to be compatible with every software that does so, too.\n\nBefore purchasing any cards, please read carefully documentation in\n/usr/share/doc/packages/opensc/wiki/index.html - only some cards are\nsupported. Not only card type matters, but also card version, card OS\nversion and preloaded applet. Only subset of possible operations may be\nsupported for your card. Card initialization may require third party\nproprietary software.\n\n\n\nAuthors:\n--------\n    Juha Yrj\xf6l\xe4 <jyrjola@cc.hut.fi>\n    Antti Tapaninen <aet@cc.hut.fi>\n    Timo Ter\xe4s <timo.teras@iki.fi>\n    Olaf Kirch <okir@suse.de>'
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472) Traceback (most recent call last):
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 118, in upload
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     handlers[type_id](repo, type_id, unit_key, metadata, file_path, conduit, config)
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/importers/yum/upload.py", line 434, in _handle_package
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     unit.save_and_import_content(file_path)
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib/python2.7/site-packages/pulp/server/db/model/__init__.py", line 906, in save_and_import_content
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     self.save()
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib/python2.7/site-packages/mongoengine/document.py", line 324, in save
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     object_id = collection.save(doc, **write_concern)
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 2185, in save
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     check_keys, False, manipulate, write_concern)
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 709, in _update
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     codec_options=self.codec_options).copy()
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 214, in command
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     self._raise_connection_failure(error)
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)   File "/usr/lib64/python2.7/site-packages/pymongo/pool.py", line 342, in _raise_connection_failure
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472)     raise error
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp_rpm.plugins.importers.yum.upload:ERROR: (1228-25472) InvalidStringData: strings in documents must be valid UTF-8: 'OpenSC provides a set of libraries and utilities to access smart cards.\nIt mainly focuses on cards that support cryptographic operations. It\nfacilitates their use in security applications such as mail encryption,\nauthentication, and digital signature. OpenSC implements the PKCS#11\nAPI. Applications supporting this API, such as Mozilla Firefox and\nThunderbird, can use it. OpenSC implements the PKCS#15 standard and\naims to be compatible with every software that does so, too.\n\nBefore purchasing any cards, please read carefully documentation in\n/usr/share/doc/packages/opensc/wiki/index.html - only some cards are\nsupported. Not only card type matters, but also card version, card OS\nversion and preloaded applet. Only subset of possible operations may be\nsupported for your card. Card initialization may require third party\nproprietary software.\n\n\n\nAuthors:\n--------\n    Juha Yrj\xf6l\xe4 <jyrjola@cc.hut.fi>\n    Antti Tapaninen <aet@cc.hut.fi>\n    Timo Ter\xe4s <timo.teras@iki.fi>\n    Olaf Kirch <okir@suse.de>'
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp.server.managers.content.upload:ERROR: (1228-25472) Error from the importer while importing uploaded unit to repository [foo]
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp.server.managers.content.upload:ERROR: (1228-25472) Traceback (most recent call last):
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp.server.managers.content.upload:ERROR: (1228-25472)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/content/upload.py", line 223, in import_uploaded_unit
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp.server.managers.content.upload:ERROR: (1228-25472)     unit_type=unit_type_id, summary=result['summary'], details=result['details']
Mar 13 16:35:10 fedora-24-pulp-2-12 pulp[1228]: pulp.server.managers.content.upload:ERROR: (1228-25472) PulpCodedException: The importer yum_importer indicated a failed response when uploading rpm unit to repository foo.

I observe this error on at least Fedora 24 and Fedora 25. Here are the packages installed on one of my test systems:

[root@fedora-24-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort
pulp-admin-client-2.12.2-0.1.beta.fc24.noarch
pulp-docker-admin-extensions-2.3.0-1.fc24.noarch
pulp-docker-plugins-2.3.0-1.fc24.noarch
pulp-ostree-admin-extensions-1.2.1-0.1.beta.fc24.noarch
pulp-ostree-plugins-1.2.1-0.1.beta.fc24.noarch
pulp-puppet-admin-extensions-2.12.2-0.1.beta.fc24.noarch
pulp-puppet-plugins-2.12.2-0.1.beta.fc24.noarch
pulp-python-admin-extensions-1.1.3-1.fc24.noarch
pulp-python-plugins-1.1.3-1.fc24.noarch
pulp-rpm-admin-extensions-2.12.2-0.1.beta.fc24.noarch
pulp-rpm-plugins-2.12.2-0.1.beta.fc24.noarch
pulp-selinux-2.12.2-0.1.beta.fc24.noarch
pulp-server-2.12.2-0.1.beta.fc24.noarch
python-kombu-3.0.33-6.pulp.fc24.noarch
python-pulp-bindings-2.12.2-0.1.beta.fc24.noarch
python-pulp-client-lib-2.12.2-0.1.beta.fc24.noarch
python-pulp-common-2.12.2-0.1.beta.fc24.noarch
python-pulp-docker-common-2.3.0-1.fc24.noarch
python-pulp-oid_validation-2.12.2-0.1.beta.fc24.noarch
python-pulp-ostree-common-1.2.1-0.1.beta.fc24.noarch
python-pulp-puppet-common-2.12.2-0.1.beta.fc24.noarch
python-pulp-python-common-1.1.3-1.fc24.noarch
python-pulp-repoauth-2.12.2-0.1.beta.fc24.noarch
python-pulp-rpm-common-2.12.2-0.1.beta.fc24.noarch
python-pulp-streamer-2.12.2-0.1.beta.fc24.noarch
Actions #39

Updated by semyers about 7 years ago

We had a good team chat about the "Verification Required" flag on Monday, and decided that the release of 2.12.2 should not be blocked on the verification of this issue.

Added by ulif about 7 years ago

Revision f1102a99 | View on GitHub

Fix non-utf8 when found in uploaded RPMs (really).

When uploading RPM packages with non-utf8 metadata, the upload was aborted. A fix already applied apparently did not fix this completely.

Ensures that metadata from uploaded packages can be encoded to utf-8. Where no such encoding is possible, replacement chars are inserted.

fixes #1903 https://pulp.plan.io/issues/1903

Actions #40

Updated by jortel@redhat.com about 7 years ago

Community member ulif has indicated they will be submitting a PR.

Actions #41

Updated by bmbouter about 7 years ago

  • Status changed from ASSIGNED to POST
  • Platform Release deleted (2.12.2)

PR from ulif available at: https://github.com/pulp/pulp_rpm/pull/1040/files

I'm unsetting the platform release since we aren't sure whether this will be included in the 2.12.2 release, which is being cut today. This should not release

Actions #42

Updated by jortel@redhat.com about 7 years ago

The original patch only fixed encoding issues in the primary XML fragment, but opensc-0.11.6-5.27.1.x86_64.rpm also has invalid utf8 in the changelog, which is found in both the others and filelists fragments. PR https://github.com/pulp/pulp_rpm/pull/1040 encodes the entire dictionary.
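
To illustrate what handling "the entire dictionary" can mean, here is a hedged Python 3 sketch that walks a nested metadata structure and decodes every byte string with the 'replace' handler. The function name `decode_all` and the sample keys and values are hypothetical; the actual change is the one in the linked PR.

```python
def decode_all(value):
    """Recursively decode every bytes value as UTF-8 with 'replace'."""
    if isinstance(value, bytes):
        return value.decode("utf-8", "replace")
    if isinstance(value, dict):
        return {decode_all(k): decode_all(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        # Tuples come back as lists, which is fine for this sketch.
        return [decode_all(item) for item in value]
    return value  # numbers, None, already-decoded text


metadata = {
    "description": b"Authors: Juha Yrj\xf6l\xe4",
    "changelog": [(1210000000, b"Timo Ter\xe4s", b"update")],
}
clean = decode_all(metadata)
```

After this pass every string in `clean` is valid unicode text, so a later save to the database no longer depends on which field happened to carry the bad bytes.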

Actions #43

Updated by ulif about 7 years ago

  • Status changed from POST to MODIFIED
Actions #44

Updated by bizhang about 7 years ago

  • Platform Release set to 2.12.2
Actions #45

Updated by bizhang about 7 years ago

  • Status changed from MODIFIED to 5
Actions #46

Updated by bizhang about 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #48

Updated by ttereshc over 6 years ago

  • Related to Issue #2622: Sync fails when non-ASCII characters are present in primary.xml added
Actions #49

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 15
Actions #50

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (33)
Actions #51

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added
