Pulp: Issues - https://pulp.plan.io/ - 2017-01-25T17:51:02Z
Pulp - Issue #2545 (CLOSED - CURRENTRELEASE): pulp-manage-db wait time calculation can end up wit... - https://pulp.plan.io/issues/2545 - 2017-01-25T17:51:02Z - elyezer (erezende@redhat.com)
<p>Running the automation jobs showed that pulp-manage-db can wait for hours instead of the expected maximum of 92 seconds (the limit comes from <code>constants.MIGRATION_WAIT_TIME</code>).</p>
<p>I SSHed into the machine, where pulp-manage-db had been running for 3 hours and counting; the running time was determined by checking ps and date, as demonstrated below:</p>
<pre><code>$ ps aux | grep pulp-manage-db
root 8505 0.0 0.0 193332 2788 ? S 09:35 0:00 sudo -u apache pulp-manage-db
$ date
Wed Jan 25 12:41:43 EST 2017
</code></pre>
<p>I did some investigation and found, for example, that pulp_resource_manager has been inactive for the same amount of time:</p>
<pre><code>$ systemctl status pulp_resource_manager
● pulp_resource_manager.service - Pulp Resource Manager
Loaded: loaded (/usr/lib/systemd/system/pulp_resource_manager.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Wed 2017-01-25 09:33:30 EST; 3h 10min ago
Main PID: 5252 (code=exited, status=0/SUCCESS)
</code></pre>
<p>Running pulp-manage-db again shows that it will wait for about 2 more hours:</p>
<pre><code>$ sudo -u apache pulp-manage-db
Attempting to connect to localhost:27017
Attempting to connect to localhost:27017
Write concern for Mongo connection: {}
The following processes might still be running:
scheduler@host-172-16-46-204.openstacklocal
reserved_resource_worker-0@host-172-16-46-204.openstacklocal
reserved_resource_worker-1@host-172-16-46-204.openstacklocal
resource_manager@host-172-16-46-204.openstacklocal
Please wait 6570 seconds while Pulp confirms this.^C
Traceback (most recent call last):
File "/bin/pulp-manage-db", line 9, in <module>
load_entry_point('pulp-server==2.12c2', 'console_scripts', 'pulp-manage-db')()
File "/usr/lib/python2.7/site-packages/pulp/server/db/manage.py", line 214, in main
time.sleep(1)
KeyboardInterrupt
</code></pre>
<p>I decided to check the source code and ran a Python shell session to see what the code is doing; the math there is what makes the wait time that long:</p>
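<p>Reading the session below, the wait seems to come from subtracting a UTC-aware heartbeat timestamp from a naive local <code>datetime.now()</code>: the result is a negative timedelta, so subtracting it from the 92-second limit inflates the wait to hours. A minimal standalone sketch of that arithmetic (my own illustration using the timestamps from this report, not Pulp code):</p>
<pre><code># Sketch only: the heartbeat is stored in UTC, but datetime.now() returns the
# local (EST) wall clock, which ends up being compared as if it were UTC.
from datetime import datetime, timedelta

MIGRATION_WAIT_TIME = 92  # seconds, per constants.MIGRATION_WAIT_TIME

last_worker_time = datetime(2017, 1, 25, 14, 33, 22)    # heartbeat, UTC
now_misread_as_utc = datetime(2017, 1, 25, 12, 41, 43)  # EST wall clock treated as UTC

time_from_last_worker = now_misread_as_utc - last_worker_time
print(time_from_last_worker)   # -1 day, 22:08:21 -- negative instead of a few seconds

wait_time = timedelta(seconds=MIGRATION_WAIT_TIME) - time_from_last_worker
print(wait_time)               # 1:53:11 -- hours instead of at most 92 seconds
</code></pre>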
<pre><code>$ sudo -u apache python
>>> from pulp.server.managers import status
>>> from pulp.server.db import connection
>>> from pulp.server.db.fields import UTCDateTimeField
>>> from datetime import datetime, timedelta
>>> from pulp.common import constants
>>> connection.initialize()
>>> active_workers = status.get_workers()
>>> last_worker_time = max([worker['last_heartbeat'] for worker in active_workers])
>>> last_worker_time
datetime.datetime(2017, 1, 25, 14, 33, 22, 289000, tzinfo=<isodate.tzinfo.Utc object at 0x1f83510>)
>>> time_from_last_worker = UTCDateTimeField().to_python(datetime.now()) - last_worker_time
>>> time_from_last_worker
datetime.timedelta(-1, 79217, 994870)
>>> time_from_last_worker.seconds
79217
>>> wait_time = timedelta(seconds=constants.MIGRATION_WAIT_TIME) - time_from_last_worker
>>> wait_time
datetime.timedelta(0, 7241, 855873)
>>> wait_time.seconds
7241
>>> str(wait_time)
'2:00:41.855873'
>>> constants.MIGRATION_WAIT_TIME
92
</code></pre>
Pulp - Story #2525 (CLOSED - CURRENTRELEASE): Failover events should be explicitly logged as such... - https://pulp.plan.io/issues/2525 - 2017-01-13T19:46:16Z - dalley (dalley@redhat.com)
<a name="Problem"></a>
<h3 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h3>
<p>pulp_resource_manager and pulp_celerybeat both use hot-spare processes for high availability. The current behavior logs when a worker goes missing, when a new one comes online, and when the lock is acquired.</p>
<a name="Solution"></a>
<h3 >Solution<a href="#Solution" class="wiki-anchor">¶</a></h3>
<p>Whenever a pulp_resource_manager or pulp_celerybeat instance that has been a hot spare (could not acquire the lock) becomes the primary (acquires the lock), the logs should notify the user that failover has occurred. Specifically, it should log the following at WARNING level:</p>
<p><code>Failover occurred: xxxxxx is now the primary</code></p>
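<p>A minimal sketch of what this could look like at the lock-acquisition point (hypothetical function and argument names, not Pulp's actual code; it also reflects the DEBUG change discussed below):</p>
<pre><code>import logging

_logger = logging.getLogger(__name__)

def on_lock_acquired(instance_name, was_hot_spare):
    # Hypothetical sketch: demote the routine acquisition message to DEBUG and
    # surface the failover itself at WARNING so users notice it.
    _logger.debug('Lock acquired by %s', instance_name)
    if was_hot_spare:
        _logger.warning('Failover occurred: %s is now the primary', instance_name)
</code></pre>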
<p>Additionally, the "lock acquired" log statement should be switched to DEBUG since the WARNING statement will be shown to the user at that same moment.</p>
Pulp - Story #2509 (CLOSED - CURRENTRELEASE): Pulp process failure detection and any failover sho... - https://pulp.plan.io/issues/2509 - 2017-01-03T19:59:57Z - bizhang (bizhang@redhat.com)
<a name="Problem"></a>
<h2 >Problem<a href="#Problem" class="wiki-anchor">¶</a></h2>
<p>Pulp's failure detection and failover currently take a long time. This involves two primary areas: (1) marking workers as dead and (2) pulp_celerybeat and resource_manager hot-spare failover.</p>
<p>The current timings are:</p>
<ul>
<li>worker heartbeat is 30 seconds</li>
<li>celerybeat heartbeat is 90 seconds</li>
<li>worker ageout time is 300 seconds</li>
<li>celerybeat lock ageout time is 200 seconds</li>
<li>resource manager lock check is 60 seconds</li>
</ul>
<p>This means that when a worker process dies it could take 300-390 seconds for it to be considered dead. It also means that it takes 200-290 seconds for celerybeat to fail over and 300-360 seconds for the resource_manager to fail over.</p>
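<p>As a quick sanity check on those ranges (my own arithmetic, not part of the ticket), a process is noticed somewhere between its ageout time and the ageout time plus the relevant check interval after its last heartbeat:</p>
<pre><code># Worst-case detection windows implied by the current timings above.
def detection_window(ageout, check_interval):
    return (ageout, ageout + check_interval)

print(detection_window(300, 90))  # worker marked dead:        (300, 390) seconds
print(detection_window(200, 90))  # celerybeat lock failover:  (200, 290) seconds
print(detection_window(300, 60))  # resource_manager failover: (300, 360) seconds
</code></pre>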
<a name="Solution"></a>
<h2 >Solution<a href="#Solution" class="wiki-anchor">¶</a></h2>
<p>This time should be shorter. We should declare Pulp workers to be dead if they have been missing for 30 seconds. We should also have both pulp_celerybeat and pulp_resource_manager fail over within 30 seconds.</p>
<p>We can do this by updating the worker heartbeat to 5 seconds, the celerybeat heartbeat to 5 seconds and the worker ageout time to 25 seconds.</p>
<p>This would mean that a worker that has not checked in for 5 heartbeats (25s) would be considered missing the next time celerybeat checks (25-30s after the last time the worker checked in).</p>
<p>In addition, we need to update the current logic of the resource manager lock failover to match celerybeat's in order to ensure a 30s failover (see comment 5 for details).</p>
<p>The proposed timings are:</p>
<ul>
<li>worker heartbeat 5s</li>
<li>celerybeat heartbeat 5s</li>
<li>worker ageout time 25s</li>
<li>celerybeat lock ageout time 25s</li>
<li>resource manager lock heartbeat 5s</li>
</ul>
Docker Support - Issue #2441 (CLOSED - CURRENTRELEASE): unassociating a docker_manifest removes d... - https://pulp.plan.io/issues/2441 - 2016-11-24T12:22:16Z - jluza (jluza@redhat.com)
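<p>As I read the reproducer below, the expected rule is that a blob should only be removed from the repo when no remaining manifest still references it. A minimal sketch of that rule (hypothetical helper, not Pulp code):</p>
<pre><code># Hypothetical illustration: keep any blob that a remaining manifest still uses.
def blobs_to_remove(removed_manifest_blobs, remaining_manifests):
    still_referenced = set()
    for manifest_blobs in remaining_manifests:
        still_referenced.update(manifest_blobs)
    return [blob for blob in removed_manifest_blobs if blob not in still_referenced]

# Using the digests from the reproducer (shortened): removing manifest c7a76...
# should drop blob c157e9... but keep the shared blob 16dc1f...
print(blobs_to_remove(['c157e9', '16dc1f'], [['6e0419', '16dc1f']]))  # ['c157e9']
</code></pre>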
<pre><code>reproducer:
repo content:
Manifest: sha256:70753876404a22bc39af3ddcae831f9f2d32950f27f964dd781c66da051be957
|- Blob: sha256:6e0419422ad90f146b41a787cbfa826ee3dd372b1dc0e3f100dbd71d5b63d47d
|- Blob: sha256:16dc1f96e3a1bb628be2e00518fec2bb97bd5933859de592a00e2eb7774b6ecf
Manifest: sha256:c7a76dc3a509a42c9af781674826ed5c107017299aea2f10b06e658abbe5299f
|- Blob: sha256:c157e91881d411d2e1847078ae5635d2b52f947ad07c218e248806768700fcdb
|- Blob: sha256:16dc1f96e3a1bb628be2e00518fec2bb97bd5933859de592a00e2eb7774b6ecf
1. unassociate {"filters": {"unit": {"digest": "sha256:c7a76dc3a509a42c9af781674826ed5c107017299aea2f10b06e658abbe5299f"}}}
2. Result:
repo doesn't have sha256:c157e91881d411d2e1847078ae5635d2b52f947ad07c218e248806768700fcdb
repo doesn't have sha256:16dc1f96e3a1bb628be2e00518fec2bb97bd5933859de592a00e2eb7774b6ecf
Expected behaviour:
repo doesn't have sha256:c157e91881d411d2e1847078ae5635d2b52f947ad07c218e248806768700fcdb
repo DOES have sha256:16dc1f96e3a1bb628be2e00518fec2bb97bd5933859de592a00e2eb7774b6ecf
</code></pre>
Pulp - Issue #2264 (CLOSED - CURRENTRELEASE): better error reporting during import/association - https://pulp.plan.io/issues/2264 - 2016-09-16T05:22:53Z - jluza (jluza@redhat.com)
<p>During import/association, I would like to know what happened if an exception was raised. Currently Pulp re-raises PulpGeneralException. As a user, I would like to see the original exception.</p>
Docker Support - Story #2189 (CLOSED - CURRENTRELEASE): As a user, I can update docker_tag units ... - https://pulp.plan.io/issues/2189 - 2016-08-19T11:41:25Z - twaugh (twaugh@redhat.com)
<p>When there are multiple docker_manifest units in a repository all sharing the same tag name, and a docker_tag unit referencing one of those manifests, it should be possible to update that docker_tag unit to refer to a different docker_manifest unit already in the repository (one which has the correct tag).</p>
<p>When there is a docker_manifest unit in the repository but no docker_tag unit for it (perhaps because it or its manifest has previously been deleted), it should be possible to create a docker_tag unit for that manifest.</p>
<p>Currently these operations are only possible with the help of an external docker v2 registry, and using the disassociate and sync operations.</p>
Pulp - Story #2186 (CLOSED - CURRENTRELEASE): As a user, pulp-manage-db refuses to run if other p... - https://pulp.plan.io/issues/2186 - 2016-08-18T17:48:23Z - jsherril@redhat.com
<p>Pulp DB migrations are not meant to be run while Pulp processes are active; running them with active workers can cause major data issues.</p>
<p>There should be a check at the startup of the pulp-manage-db process to prevent this.</p>
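<p>A minimal sketch of such a guard (hypothetical names, loosely based on the <code>status.get_workers()</code> session shown in issue #2545, not the actual manage.py implementation):</p>
<pre><code>from datetime import datetime, timedelta

MIGRATION_WAIT_TIME = 92  # seconds a worker heartbeat is still considered recent

def workers_look_active(worker_heartbeats, now=None):
    # worker_heartbeats: list of UTC datetimes of the last heartbeat per worker.
    now = now or datetime.utcnow()
    cutoff = now - timedelta(seconds=MIGRATION_WAIT_TIME)
    return any(heartbeat > cutoff for heartbeat in worker_heartbeats)

# pulp-manage-db could refuse to run (or keep waiting) while this returns True.
</code></pre>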
<p>[UPDATE] See the implementation plan in comment 21.</p>
RPM Support - Story #1976 (CLOSED - CURRENTRELEASE): As user, I can have packages sorted in "Pack... - https://pulp.plan.io/issues/1976 - 2016-06-06T14:34:43Z - jluza (jluza@redhat.com)
<p>See comment 32 for the use case requirements. See comment 42 for a clarification on the requirements.</p>
Pulp - Story #1939 (CLOSED - CURRENTRELEASE): As a user, I would like to be able to profile Pulp ... - https://pulp.plan.io/issues/1939 - 2016-05-23T19:20:08Z - jcline@redhat.com
<a name="Problem-Overview"></a>
<h2 >Problem Overview<a href="#Problem-Overview" class="wiki-anchor">¶</a></h2>
<p>Often, users (both developers and people out in the "real world") want to be able to figure out why a task is taking a long time to complete. What bits are taking the majority of the time? It is possible to answer this question with profiling tools. Python's standard library contains ``profile`` and ``cProfile``. It would be nice if I could flip a configuration setting and have every task Pulp dispatches be profiled.</p>
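<p>A minimal sketch of the idea (my own illustration with a hypothetical <code>profile_tasks</code> setting; the Implementation Plan below discusses where such a wrapper could hook in):</p>
<pre><code># Sketch only: wrap a callable with cProfile when a hypothetical setting is on.
import cProfile
import functools

profile_tasks = True  # stand-in for a server config option

def maybe_profiled(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not profile_tasks:
            return func(*args, **kwargs)
        profiler = cProfile.Profile()
        try:
            return profiler.runcall(func, *args, **kwargs)
        finally:
            # Dump per-call stats somewhere inspectable with pstats.
            profiler.dump_stats('/tmp/%s.profile' % func.__name__)
    return wrapper
</code></pre>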
<a name="Implementation-Plan"></a>
<h2 >Implementation Plan<a href="#Implementation-Plan" class="wiki-anchor">¶</a></h2>
<p>It might be possible (and even easy) to have the task decorator we use on functions check the server config for a setting, ``profile_tasks``. If true, it wraps the task function with cProfile and sends it on its merry way. There are, of course, other ways we could do this; that's just one that came to mind.</p>
RPM Support - Issue #1823 (CLOSED - CURRENTRELEASE): RPMs partially downloaded - https://pulp.plan.io/issues/1823 - 2016-04-06T19:16:44Z - mhrivnak (mhrivnak@redhat.com)
<p>There have been multiple reports of RPMs ending up in /var/lib/pulp/content/ with either 0 bytes, or partially downloaded. Looking at the 6.1 code, it is difficult to identify how that is possible.</p>
<p>You can see here that in pulp 2.6, an rpm does not get saved into the DB until after validation has happened, and the file has been moved into place without errors. Katello has assured us that they supply "validate: true" with each sync request, so validation should be happening.</p>
<p><a href="https://github.com/pulp/pulp_rpm/blob/2.6-release/plugins/pulp_rpm/plugins/importers/yum/listener.py#L85" class="external">https://github.com/pulp/pulp_rpm/blob/2.6-release/plugins/pulp_rpm/plugins/importers/yum/listener.py#L85</a></p>
<p>And yet, users are seeing this happen, so we need to investigate further.</p>
Pulp - Task #1488 (CLOSED - CURRENTRELEASE): Deprecate nodes - https://pulp.plan.io/issues/1488 - 2016-01-07T20:37:45Z - bmbouter (bmbouter@redhat.com)
<p>Based on multiple discussions, nodes support is being deprecated with 2.8.0. As such, we need to take a few steps to properly deprecate it. See the checklist for details. The nodes docs referred to are here[0].</p>
<p>NOTE: with DeprecationWarning we need to ensure the warnings are having the effect we think they are. Two things to consider/test (see the sketch after this list):</p>
<ul>
<li>In Python 2.6 DeprecationWarning was loud by default, but has been silenced in 2.7+.</li>
<li>The -W argument (iirc) enables/disables warnings, so that could be a meaningful difference between a dev and production environment.</li>
</ul>
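<p>A quick way to verify the warning actually fires (standard-library behavior, not Pulp code):</p>
<pre><code>import warnings

# DeprecationWarning is ignored by default on Python 2.7+; force it on for the
# check (the interpreter's -W flag can achieve the same in a dev environment).
warnings.simplefilter('always', DeprecationWarning)
warnings.warn('Pulp nodes support is deprecated', DeprecationWarning)
</code></pre>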
<p>[0]: <a href="http://pulp.readthedocs.org/en/latest/user-guide/nodes.html" class="external">http://pulp.readthedocs.org/en/latest/user-guide/nodes.html</a></p>
Pulp - Issue #1321 (CLOSED - CURRENTRELEASE): Commit message requirements for Pulp contributions ... - https://pulp.plan.io/issues/1321 - 2015-10-17T16:02:12Z - bmbouter (bmbouter@redhat.com)
<p>Currently we recommend the old-style "bug number - title" as the only requirement for the contents of a squashed commit message. This is out of date since we now use the commit keyword integration w/ Redmine (see <a class="issue tracker-2 status-11 priority-6 priority-default closed" title="Task: Document the commit message keywords that will interact with Redmine (CLOSED - CURRENTRELEASE)" href="https://pulp.plan.io/issues/103">#103</a>). Also it should have a link to the issue/story; I've seen that a lot already by convention. We should also ask contributors to write a description of their commit as a paragraph style body. This is a common style I see across the Internet. See an <a href="http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html" class="external">example here</a>.</p>
<p>The fix should describe things similarly to <a href="http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html" class="external">how they are done here</a>, and also refer the user to the commit keywords (see <a class="issue tracker-2 status-11 priority-6 priority-default closed" title="Task: Document the commit message keywords that will interact with Redmine (CLOSED - CURRENTRELEASE)" href="https://pulp.plan.io/issues/103">#103</a>).</p>
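<p>For illustration, one possible shape for such a squashed commit message (my own example combining a summary line, a paragraph body, the issue link, and a keyword from #103; the exact requirements are what this issue asks to document):</p>
<pre><code>Fix the wait time calculation in pulp-manage-db

Explain in a short paragraph what changed and why, wrapped at a reasonable
width, so reviewers and future readers get the context without opening the
diff.

closes #2545
https://pulp.plan.io/issues/2545
</code></pre>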
<p>You probably need to <a href="http://pulp.readthedocs.org/en/2.6-release/dev-guide/contributing/branching.html#commit-messages" class="external">update this section</a>, and maybe some places on <a href="http://pulp.readthedocs.org/en/2.6-release/dev-guide/contributing/merging.html" class="external">the merging page</a> that reference "commit messages".</p>
Pulp - Story #1268 (CLOSED - CURRENTRELEASE): As a user, orphan content delete reports how many u... - https://pulp.plan.io/issues/1268 - 2015-09-24T14:57:27Z - mmccune@redhat.com
<p>The API DELETE /pulp/api/v2/content/orphans/ will spawn a task to delete content.</p>
<p>These tasks don't seem to report what they actually deleted. As a user, it is useful to know what was deleted so that changes made to the filesystem and storage can be validated.</p>
<p>The task should report how many units it deleted of each type.</p>
<p>For example, the task status report could look like this to show that 32 rpms were removed:</p>
<pre><code> {
"exception": null,
"task_type": "pulp.server.managers.content.orphan.delete_all_orphans",
"_href": "/pulp/api/v2/tasks/7b2405ed-ae44-48e7-9d16-6fdcdc0af3f0/",
"task_id": "7b2405ed-ae44-48e7-9d16-6fdcdc0af3f0",
"tags": [
"pulp:content_unit:orphans"
],
"finish_time": "2016-11-04T18:18:46Z",
"_ns": "task_status",
"start_time": "2016-11-04T18:18:46Z",
"traceback": null,
"spawned_tasks": [],
"progress_report": {},
"queue": "celery.dq",
"state": "finished",
"worker_name": "celery",
"result": {
"rpm": 32
},
"error": null,
"_id": {
"$oid": "581cd103e96cb59365cf1133"
},
"id": "581cd103e96cb59365cf1133"
}
</code></pre>
Pulp - Task #103 (CLOSED - CURRENTRELEASE): Document the commit message keywords that will intera... - https://pulp.plan.io/issues/103 - 2015-01-09T14:34:44Z - rbarlow
<p>Redmine watches our commit messages for references to issue numbers with special keywords. Add documentation to our contributor guide explaining these keywords. Here are the keywords we currently support:</p>
<p>These two will create a reference relationship, but will not change the state or %done of the referenced issue.</p>
<pre><code>re #123
ref #123
</code></pre>
<p>These create a reference relationship, change the state of the referenced issue to MODIFIED, and set the %done to 100 on the referenced issue.</p>
<pre><code>fixes #123
closes #123
</code></pre>