Project

Profile

Help

Issue #2136

closed

publish step error handling incorrectly assumes open file

Added by mhrivnak over 7 years ago. Updated about 5 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Category:
-
Sprint/Milestone:
-
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Master
Platform Release:
2.11.1
OS:
Triaged:
Yes
Groomed:
Yes
Sprint Candidate:
Yes
Tags:
Pulp 2
Sprint:
Sprint 13
Quarter:

Description

This call to finalize():

https://github.com/pulp/pulp/blob/1980a78d2758c139077c4ccbafe64a8bc10a234b/server/pulp/plugins/util/publish_step.py#L257

Assumes an open file, at least in the case of how pulp_rpm uses this base class.

Either initialize() should be called before the try: block, or finalize() (and all of its subclass implementations) should not assume that initialize() accomplished anything (like opening a file).

From the bugzilla, which references pulp 2.6 code:

Description of problem:
In /usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py: line 165:

    def process(self):
..
            try:
                self.total_units = self._get_total()
                self.report_progress()
                self.initialize()
                self.report_progress()
..
            finally:
                # Always call finalize to allow cleanup of file handles
                self.finalize()
            self.post_process()

Assume an exception (example below) is raised such that the file expected to be created has not been created at all. Calling self.finalize() then fails with:

Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) Task pulp.server.managers.repo.publish.publish[d83cb083-161b-472d-ba43-05d121dde87f] raised unexpected: OSError(2, 'No such file or directory')
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) Traceback (most recent call last):
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 240, in trace_task
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     R = retval = fun(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 328, in __call__
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return super(Task, self).__call__(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/celery/app/trace.py", line 437, in __protected_call__
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self.run(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/publish.py", line 99, in publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     transfer_repo, conduit, call_config)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/managers/repo/publish.py", line 127, in _do_publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     publish_report = publish_repo(transfer_repo, conduit, call_config)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/server/async/tasks.py", line 495, in wrap_f
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return f(*args, **kwargs)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/distributor.py", line 143, in publish_repo
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self._publisher.publish()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 558, in publish
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return self.process_lifecycle()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 503, in process_lifecycle
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     super(PluginStep, self).process_lifecycle()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 127, in process_lifecycle
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     step.process()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 211, in process
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     self.finalize()
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/publish.py", line 419, in finalize
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     self.other_context.checksum)
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/metadata/repomd.py", line 82, in add_metadata_file_metadata
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     timestamp_element.text = str(int(os.path.getmtime(file_path)))
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)   File "/usr/lib64/python2.7/genericpath.py", line 54, in getmtime
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664)     return os.stat(filename).st_mtime
Apr  5 14:50:36 Satellite pulp: celery.worker.job:ERROR: (1275-37664) OSError: [Errno 2] No such file or directory: '/var/lib/pulp/working/repos/Default_Organization-Red_Hat_Enterprise_Linux_Server-Red_Hat_Enterprise_Linux_7_Server_RPMs_x86_64_7Server/distributors/yum_distributor/repodata/other.xml.gz'

This happens due to the fact that MetadataFileContext finalize method sees the file is not opened, so it does nothing, but PublishRpmStep finalize assumes there has been created file for other_context. (same applies when an exception is raised for file_lists or primary context).

The exception above is practically impossible to troubleshoot. It is so due to the fact a swallowed exception is silently ignored. Simple printing warning with exception details would help a lot to understand what happened.

One real example what exception could be raised & swallowed:

Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448) 01612026: caught exception: Traceback (most recent call last):
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/publish_step.py", line 183, in process
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self.initialize()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp_rpm/plugins/distributors/yum/publish.py", line 403, in initialize
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     context.initialize()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/metadata_writer.py", line 74, in initialize
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._open_metadata_file_handle()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib/python2.7/site-packages/pulp/plugins/util/metadata_writer.py", line 364, in _open_metadata_file_handle
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     content = gzip_handle.read(BUFFER_SIZE)
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 261, in read
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._read(readsize)
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 325, in _read
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     self._read_eof()
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)   File "/usr/lib64/python2.7/gzip.py", line 347, in _read_eof
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448)     hex(self.crc)))
Apr 14 18:02:36 Satellite pulp: pulp.plugins.util.publish_step:WARNING: (11964-64448) IOError: CRC check failed 0x7d4c4209 != 0x60a4e457L

I.e. CRC check on the gzip file is wrong -> exception raised -> silently ignored -> no other (or filelist or primary) context file exists -> cryptic "No such file exists" raised.

Another possibility might be (havent tested/seen) file permissions, too many open FDs, some I/O problem,.. many.

Please add some warning what exception was raised, like:

                # Double check & return if we have been canceled
                if self.canceled:
                    return
            except Exception as e:
                _LOG.warn('caught exception: %s' % traceback.format_exc())
            finally:
                # Always call finalize to allow cleanup of file handles
                self.finalize()

Version-Release number of selected component (if applicable):
pulp-server-2.6.0.20-1.el7sat.noarch

How reproducible:
100% (if reproducer known)

Steps to Reproduce:
1. Somehow let damage primar / other / filelist gzip for some repo (not sure on this step)
2. Try to sync & publish the repo
3. Try to understand from logs what happened

Actual results:
2. fails
3. no idea what happened, "No such file or directory: other.xml.gz" is use-less info hiding the real cause.

Expected results:
2. fails
3. original exception raised that prevented opening the metadata / context file.
Actions #1

Updated by amacdona@redhat.com over 7 years ago

  • Triaged changed from No to Yes
Actions #2

Updated by jortel@redhat.com over 7 years ago

  • Groomed changed from No to Yes
  • Sprint Candidate changed from No to Yes
Actions #3

Updated by daviddavis over 7 years ago

  • Sprint/Milestone set to 31
Actions #4

Updated by jortel@redhat.com over 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to jortel@redhat.com
Actions #5

Updated by jortel@redhat.com over 7 years ago

  • Status changed from ASSIGNED to POST

https://github.com/pulp/pulp/pull/2896

The code has changed quite a bit since 2.6 but testing indicated the problem still exists in 2.10.

Added by jortel@redhat.com over 7 years ago

Revision 68392c56 | View on GitHub

Prevent exceptions raised in finally: closes #2136

Added by jortel@redhat.com over 7 years ago

Revision 68392c56 | View on GitHub

Prevent exceptions raised in finally: closes #2136

Actions #6

Updated by jortel@redhat.com over 7 years ago

  • Status changed from POST to MODIFIED
Actions #7

Updated by semyers over 7 years ago

  • Platform Release set to 2.11.1
Actions #8

Updated by semyers over 7 years ago

  • Status changed from MODIFIED to 5
Actions #10

Updated by semyers over 7 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #11

Updated by bmbouter about 6 years ago

  • Sprint set to Sprint 13
Actions #12

Updated by bmbouter about 6 years ago

  • Sprint/Milestone deleted (31)
Actions #13

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF