Issue #2620

closed

All RPM repo searches are broken

Added by Ichimonji10 almost 8 years ago. Updated over 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: High
Assignee:
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 3. High
Version:
Platform Release: 2.12.2
OS:
Triaged: Yes
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint: Sprint 16
Quarter:

Description

Let's say that one has created an RPM repository with an ID of "foo". One can search for content units in that RPM repository by making an HTTP POST request to /pulp/api/v2/repositories/foo/search/units/ with a body of at least {"criteria": {}}. (A more typical body is {"criteria": {"type_ids": ["rpm"]}}.) Unfortunately, executing such a search causes Pulp to return an HTTP 500. The following is logged:
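The request described above can be sketched in Python using only the standard library. The helper name build_unit_search is hypothetical, and the sketch only builds the path and JSON body; it does not send anything, since the host and credentials are deployment-specific.

```python
import json


def build_unit_search(repo_id, type_ids=None):
    """Return (path, body) for a repository unit search.

    Hypothetical helper: only the path and the body shape come from
    the issue description. Nothing is sent over the network here.
    """
    path = "/pulp/api/v2/repositories/{}/search/units/".format(repo_id)
    criteria = {}
    if type_ids is not None:
        criteria["type_ids"] = list(type_ids)
    return path, json.dumps({"criteria": criteria})


path, body = build_unit_search("foo", type_ids=["rpm"])
print("POST", path)
print(body)
```

Calling build_unit_search("foo") with no type IDs yields the minimal body, {"criteria": {}}.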

Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: Unhandled Exception
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856) 'utf8' codec can't decode byte 0x9c in position 1: invalid start byte
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856) Traceback (most recent call last):
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/django/core/handlers/base.py", line 112, in get_response
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     response = wrapped_callback(request, *callback_args, **callback_kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 69, in view
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return self.dispatch(request, *args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/django/views/generic/base.py", line 87, in dispatch
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return handler(request, *args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/decorators.py", line 241, in _auth_decorator
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return _verify_auth(self, operation, super_user_only, method, *args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/decorators.py", line 195, in _verify_auth
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     value = method(self, *args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/util.py", line 130, in wrapper
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return func(*args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/search.py", line 127, in post
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return self._generate_response(query, options, *args, **kwargs)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/repositories.py", line 294, in _generate_response
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return generate_json_response_with_pulp_encoder(units)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib/python2.7/site-packages/pulp/server/webservices/views/util.py", line 52, in generate_json_response
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     json_obj = json.dumps(content, default=default)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib64/python2.7/json/__init__.py", line 250, in dumps
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     sort_keys=sort_keys, **kw).encode(obj)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib64/python2.7/json/encoder.py", line 207, in encode
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     chunks = self.iterencode(o, _one_shot=True)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)   File "/usr/lib64/python2.7/json/encoder.py", line 270, in iterencode
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856)     return _iterencode(o, 0)
Mar 06 11:33:11 rhel-7-3-pulp-2-12 pulp[3309]: pulp.server.webservices.middleware.exception:ERROR: (3309-85856) UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c in position 1: invalid start byte

A similar error is logged for all RPM repository searches. Triggering this failure is as easy as executing the following script:

pulp-admin rpm repo create --repo-id foo --feed https://repos.fedorapeople.org/pulp/pulp/fixtures/rpm-unsigned/
pulp-admin rpm repo sync run --repo-id foo
pulp-admin rpm repo content rpm --repo-id foo

The final command spits out an error:

$ pulp-admin rpm repo content rpm --repo-id foo                                                            
An internal error occurred on the Pulp server:

RequestException: POST request
on /pulp/api/v2/repositories/foo/search/units/ failed with 500 - 'utf8' codec
can't decode byte 0x9c in position 1: invalid start byte

This error is present for Pulp 2.12 and 2.13 nightlies on all supported platforms. The error is reproducible both on Jenkins and on my personal systems. Here are the packages installed on one of my systems:

[root@rhel-7-3-pulp-2-12 ~]# rpm -qa | grep -i pulp | sort
pulp-admin-client-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
pulp-docker-admin-extensions-2.3.1-0.1.alpha.git.5.052c506.el7.noarch
pulp-docker-plugins-2.3.1-0.1.alpha.git.5.052c506.el7.noarch
pulp-ostree-admin-extensions-1.2.1-0.1.alpha.git.27.7f9a84b.el7.noarch
pulp-ostree-plugins-1.2.1-0.1.alpha.git.27.7f9a84b.el7.noarch
pulp-puppet-admin-extensions-2.12.2-0.1.alpha.git.2.f338f5d.el7.noarch
pulp-puppet-plugins-2.12.2-0.1.alpha.git.2.f338f5d.el7.noarch
pulp-python-admin-extensions-2.0.1-0.1.alpha.git.6.8c46f3f.el7.noarch
pulp-python-plugins-2.0.1-0.1.alpha.git.6.8c46f3f.el7.noarch
pulp-rpm-admin-extensions-2.12.2-0.1.alpha.git.19.da51b5f.el7.noarch
pulp-rpm-plugins-2.12.2-0.1.alpha.git.19.da51b5f.el7.noarch
pulp-selinux-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
pulp-server-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-isodate-0.5.0-4.pulp.el7.noarch
python-kombu-3.0.33-6.pulp.el7.noarch
python-pulp-bindings-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-pulp-client-lib-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-pulp-common-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-pulp-docker-common-2.3.1-0.1.alpha.git.5.052c506.el7.noarch
python-pulp-oid_validation-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-pulp-ostree-common-1.2.1-0.1.alpha.git.27.7f9a84b.el7.noarch
python-pulp-puppet-common-2.12.2-0.1.alpha.git.2.f338f5d.el7.noarch
python-pulp-python-common-2.0.1-0.1.alpha.git.6.8c46f3f.el7.noarch
python-pulp-repoauth-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch
python-pulp-rpm-common-2.12.2-0.1.alpha.git.19.da51b5f.el7.noarch
python-pulp-streamer-2.12.2-0.1.alpha.git.17.8ca5cf2.el7.noarch

Related issues

Has duplicate Pulp - Issue #2753: Cannot search rpm content from the CLI (CLOSED - DUPLICATE)
Actions #1

Updated by Ichimonji10 almost 8 years ago

  • Description updated (diff)
Actions #2

Updated by Ichimonji10 almost 8 years ago

  • Description updated (diff)
Actions #3

Updated by Ichimonji10 almost 8 years ago

  • Description updated (diff)
Actions #4

Updated by semyers almost 8 years ago

Ichimonji10 wrote:

This error is present for Pulp 2.12 and 2.13 on all supported platforms. ...

Do you know if this error is present for both 2.12.0 and 2.12.1, or if it's just 2.12.1?

Actions #5

Updated by Ichimonji10 almost 8 years ago

Do you know if this error is present for both 2.12.0 and 2.12.1, or if it's just 2.12.1?

AFAIK, it's only present for the development versions of 2.12.1 and 2.13. I didn't observe it in the 2.12.1 release.

Issue updated to state "2.12 and 2.13 nightlies."

Actions #6

Updated by Ichimonji10 almost 8 years ago

  • Description updated (diff)
Actions #7

Updated by semyers almost 8 years ago

  • Priority changed from Normal to High
  • Severity changed from 2. Medium to 3. High

Tentatively marking as a blocker for 2.12.z+, and raising the prio/sev each to high accordingly for a release blocker, pending review by the team during triage tomorrow.

Actions #8

Updated by ttereshc almost 8 years ago

+1 for blocker

During search all the units data from db is selected including gzipped XML snippets for RPM/SRPM units, and this issue happens when the json module tries to serialize this data to generate the response.
I have not found any other calls affected by a similar issue.
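The failure mode can be illustrated with a short Python 3 sketch. It assumes the stored snippet is a zlib stream, and the sample XML is invented. A default-level zlib stream begins with the bytes 0x78 0x9c, which is consistent with the "byte 0x9c in position 1" message in the log. Python 2's json.dumps attempted to decode byte strings as UTF-8, whereas Python 3's json module rejects bytes outright, so the sketch decodes explicitly to trigger the same kind of error.

```python
import zlib

# Invented sample standing in for an RPM unit's stored XML snippet.
snippet = zlib.compress(b"<package><name>bear</name></package>")

try:
    snippet.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # The second byte of the compressed stream is not valid UTF-8.
    error = exc

print(snippet[:2].hex())
print(error)
```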

It is possible to solve this issue in two ways I think:
- do not return to user these XML snippets at all
- unzip the data and return it to user

I am leaning towards the first option. These XML snippets are needed only for publishing, yet during search we return everything we have in the DB, including data like this that is meant for internal use only.
This is hacky, but it shows that one way or another we can exclude 'repodata' from the unit data:

diff --git a/server/pulp/server/webservices/views/repositories.py b/server/pulp/server/webservices/views/repositories.py
index e0389c8..6ef4943 100644
--- a/server/pulp/server/webservices/views/repositories.py
+++ b/server/pulp/server/webservices/views/repositories.py
@@ -291,6 +291,7 @@ class RepoUnitSearch(search.SearchView):
             units = manager.get_units(repo_id, criteria=criteria)
         for unit in units:
             content.remap_fields_with_serializer(unit['metadata'])
+            unit['metadata'].pop('repodata', None)
         return generate_json_response_with_pulp_encoder(units)

The potential downside(?) is that we no longer return XML snippets like we did before, though I can't come up with a use case where someone would need this data.

Actions #9

Updated by Ichimonji10 almost 8 years ago

During search all the units data from db is selected including gzipped XML snippets for RPM/SRPM units,

This is consistent with Pulp's behaviour. Tests which search for DRPM units are not affected, as I discovered while walking through the failing tests and writing up Pulp Smash #575.

Actions #10

Updated by bizhang almost 8 years ago

  • Sprint/Milestone set to 34
  • Triaged changed from No to Yes
Actions #11

Updated by ipanova@redhat.com almost 8 years ago

I confirm that it happens just with rpm and srpm.

Actions #12

Updated by daviddavis almost 8 years ago

Could you maybe use projection to not return the field?

https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results/

Regardless, option 1 sounds good to me.

Actions #13

Updated by jortel@redhat.com almost 8 years ago

Seems like not returning the metadata would perform better, but won't that break semantic versioning by altering the data returned by the API?

Actions #14

Updated by semyers almost 8 years ago

We chatted about this in IRC and agree that option 1 would probably break semver, leaving option 2. ttereshc has updated comment 8 accordingly.
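The two options from comment 8 can be compared side by side in a small Python 3 sketch. The sample unit is invented: the nesting under "metadata" follows the diff in comment 8, while the "primary" key inside "repodata" and the use of zlib are assumptions. Option 1 changes the shape of the response, which is the semver concern raised above; option 2 keeps the field and makes it serializable.

```python
import json
import zlib


def make_unit():
    # Invented sample; the "primary" key inside "repodata" is an assumption.
    return {
        "metadata": {
            "name": "bear",
            "repodata": {"primary": zlib.compress(b"<package/>")},
        }
    }


def drop_repodata(unit):
    """Option 1: omit the compressed snippets from the response."""
    metadata = dict(unit["metadata"])
    metadata.pop("repodata", None)
    return {"metadata": metadata}


def decompress_repodata(unit):
    """Option 2: return the snippets as decompressed text."""
    metadata = dict(unit["metadata"])
    metadata["repodata"] = {
        key: zlib.decompress(value).decode("utf-8")
        for key, value in metadata.get("repodata", {}).items()
    }
    return {"metadata": metadata}


option_1 = json.dumps(drop_repodata(make_unit()), sort_keys=True)
option_2 = json.dumps(decompress_repodata(make_unit()), sort_keys=True)
print(option_1)
print(option_2)
```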

Actions #15

Updated by ttereshc almost 8 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to ttereshc
Actions #16

Updated by semyers almost 8 years ago

  • Status changed from ASSIGNED to POST

After a lot of investigation, and a process of elimination going through successively less simple solutions, we put together these PRs to fix views affected by this (hopefully) without breaking anything else:
https://github.com/pulp/pulp/pull/2962
https://github.com/pulp/pulp_rpm/pull/1037

Added by ttereshc almost 8 years ago

Revision 22efbda4 | View on GitHub

Replace remap_fields with serialize_unit

In order to fix an RPM bug, an opportunity presented itself to slightly improve this interface, effectively renaming remap_fields_with_serializer to serialize_unit_with_serializer, and exposing a new "serialize" method on ModelSerializer that can be overridden in subclasses as desired to help with serializing "exotic" field types, such as BLObs and other things not easily handled with our custom JSON encoder.

closes #2620 https://pulp.plan.io/issues/2620

Added by ttereshc almost 8 years ago

Revision 313fe912 | View on GitHub

Decompress RPM/SRPM metadata in their serializer

re #2620 https://pulp.plan.io/issues/2620
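The pattern described in the commit messages above, an overridable "serialize" hook with a subclass that decompresses its own fields, might look roughly like the following Python 3 sketch. Both class names are hypothetical stand-ins and not Pulp's actual ModelSerializer; the unit layout and the use of zlib are likewise assumptions for illustration.

```python
import json
import zlib


class UnitSerializer:
    """Hypothetical base class; not Pulp's actual ModelSerializer."""

    def serialize(self, unit):
        # Default hook: return a shallow copy unchanged.
        return dict(unit)


class RpmSerializer(UnitSerializer):
    """Hypothetical subclass that makes compressed fields JSON-safe."""

    def serialize(self, unit):
        data = super().serialize(unit)
        repodata = data.get("repodata", {})
        # Build a new dict so the caller's compressed values stay intact.
        data["repodata"] = {
            key: zlib.decompress(value).decode("utf-8")
            for key, value in repodata.items()
        }
        return data


unit = {"name": "bear", "repodata": {"primary": zlib.compress(b"<package/>")}}
serialized = RpmSerializer().serialize(unit)
print(json.dumps(serialized, sort_keys=True))
```

Keeping the decompression in the subclass means unit types without compressed fields keep the default behaviour and pay no extra cost.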

Actions #17

Updated by ttereshc almost 8 years ago

  • Status changed from POST to MODIFIED
Actions #18

Updated by semyers almost 8 years ago

  • Platform Release set to 2.12.2
Actions #21

Updated by semyers almost 8 years ago

  • Status changed from MODIFIED to 5
Actions #22

Updated by semyers almost 8 years ago

We had a good team chat about the "Verification Required" flag on Monday, and decided that the release of 2.12.2 should not be blocked on the verification of this issue.

Actions #23

Updated by Ichimonji10 almost 8 years ago

We had a good team chat about the "Verification Required" flag on Monday, and decided that the release of 2.12.2 should not be blocked on the verification of this issue.

Hunh? This issue is already verified. The 2.12.2 release wouldn't be blocked no matter the status of the "Verification Required" flag on this issue.

Also, this issue should definitely block the 2.12.2 release. (But the issue is fixed and has been verified, so there's no need to block.)

Actions #25

Updated by bizhang over 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #26

Updated by daviddavis over 7 years ago

  • Has duplicate Issue #2753: Cannot search rpm content from the CLI added
Actions #27

Updated by bmbouter almost 7 years ago

  • Sprint set to Sprint 16
Actions #28

Updated by bmbouter almost 7 years ago

  • Sprint/Milestone deleted (34)
Actions #29

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added
