Story #1759
Updated by dkliban@redhat.com over 8 years ago
Pulp users often want to make content available from multiple web servers. This is beneficial in situations where content needs to be highly available or available in different geographic locations.

The rsync distributor will allow users of RPM's yum_distributor to push repositories published by the yum_distributor to any server that supports key-based SSH communication.

The rsync distributor will allow users of Docker's docker_distributor to push repositories published by the docker_distributor to any server that supports key-based SSH communication.

We currently have independent Pulp servers for syncing upstream repos. However, when syncing repositories, the checksums for the metadata files differ between the servers, which causes yum to throw a 404 when getting the metadata.

pulpserver 1 syncs upstream and generates filenames for the metadata like:

<snip>
<location href="repodata/c4e924cc643ac6ef1f377a39beace6a152c82d0acb5e88027c21b39d070dd7e9-filelists.xml.gz"/>
</snip>

pulpserver 2 syncs the same upstream (identical packages, identical setup for the repo):

<snip>
<location href="repodata/6214a1a00b4d3ee74582b5f47dd052bb4a391657c7ba739d4d20bf4854fcee4a-filelists.xml.gz"/>
</snip>

We had round-robin DNS entries that point to all Pulp servers, which causes yum to download repomd.xml from server 1 and then try to download filelists.xml.gz from server 2. That results in a 404 because of the different filenames, followed by a retry against server 1, which usually succeeds. yum seems to assume that the metadata is identical on all nodes referenced by a single URL.

We decided against a clustered Pulp setup because it requires the shared storage to always be available. NFS-mounting the same content-related folders from server 1 on server 2 requires server 1 to always be online, which turns it into a single point of failure. Using some kind of external storage turns that storage into the single point of failure instead, with all sorts of problems when running across network links outside of a single DC.

For now we mitigate the problem by skipping the round-robin, which in effect turns the setup into a single-ish point of failure again. (We actually add two URLs to the .repo files on the machines, which behaves differently: yum seems to always pick the first URL and only switches to the secondary URL when the first one is unavailable.)

After a very helpful trip to the IRC channel (ty bmbouter) I was asked to open this ticket. Do you need any other info? I am happy to contribute info and anything else I can help out with.

I also noted that the code for repomd would in fact support setting the checksum to None, but that does not seem to be achievable through the APIs. Another possibility we see is an option to copy the upstream metadata 1:1 into the local repo.
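To make the failure mode concrete, here is a minimal Python sketch (not part of Pulp) for checking whether two independently synced servers publish the same metadata file names. It parses each server's repomd.xml and reports every metadata type whose location href differs; since the hex prefix of each file name is that file's checksum, any mismatch means a client mixing servers can request a file that does not exist. It assumes the standard repomd namespace `http://linux.duke.edu/metadata/repo`; the function names are hypothetical, and fetching repomd.xml over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# Assumption: the standard namespace used in repomd.xml files.
REPO_NS = {"repo": "http://linux.duke.edu/metadata/repo"}


def metadata_locations(repomd_xml: str) -> Dict[str, str]:
    """Hypothetical helper: map each metadata type (primary, filelists, ...)
    to the location href listed for it in repomd.xml."""
    root = ET.fromstring(repomd_xml)
    locations: Dict[str, str] = {}
    for data in root.findall("repo:data", REPO_NS):
        location = data.find("repo:location", REPO_NS)
        if location is not None:
            locations[data.get("type")] = location.get("href")
    return locations


def mismatched_metadata(
    repomd_a: str, repomd_b: str
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Hypothetical helper: return the metadata types whose file names differ
    between two servers, as {type: (href_on_a, href_on_b)}.

    A client that reads repomd.xml from server A and then requests one of
    these files from server B gets a 404.
    """
    a = metadata_locations(repomd_a)
    b = metadata_locations(repomd_b)
    return {
        kind: (a.get(kind), b.get(kind))
        for kind in sorted(set(a) | set(b))
        if a.get(kind) != b.get(kind)
    }
```

Run against the repomd.xml of pulpserver 1 and pulpserver 2 from the report, this would list `filelists` with the two different hrefs. An empty result means round-robin between the two servers is safe for the metadata as currently published.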