Issue #7660
Closed: pulp fails downloading Amazon Linux repository
Added by vchepkov about 4 years ago. Updated about 4 years ago.
Description
Pulp 2.21.3 fails to download packages from the Amazon Linux repository. The problem is the Amazon repository metadata, specifically the time attribute of the RPM packages, which contains a fractional part (for example 1601491199.805464).
On line 135, importers/yum/repomd/primary.py expects the value to be an integer:
package_info['time'] = int(time_element.attrib['file'])
Changing this line to
package_info['time'] = int(time_element.attrib['file'].split('.')[0])
fixed the issue for me.
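A minimal Python sketch of the change described above, using a hypothetical helper `parse_time_attr` in place of the inline expression on line 135 of primary.py. It parses the `<time>` element for unbound-python quoted in the IRC log with `xml.etree.ElementTree`, shows the original `int()` call failing, and then applies the `split('.')` workaround:

```python
# Sketch only: parse_time_attr is a hypothetical helper, not Pulp code.
import xml.etree.ElementTree as ET


def parse_time_attr(value: str) -> int:
    """Return the whole-seconds part of an RPM timestamp string.

    "1601491199"        -> 1601491199 (CentOS-style integer)
    "1601491199.805464" -> 1601491199 (Amazon Linux-style fractional value)
    """
    # Keep everything before the first '.'; strings without a '.' pass through unchanged.
    return int(value.split('.')[0])


# The <time> element as it appears in the Amazon Linux 2 primary.xml.
time_element = ET.fromstring('<time file="1601491199.805464" build="1589818701"/>')

try:
    int(time_element.attrib['file'])  # the original line 135 behaviour
except ValueError as exc:
    print("original parse fails:", exc)

package_info = {}
package_info['time'] = parse_time_attr(time_element.attrib['file'])
print(package_info)
```

Splitting on '.' simply drops the fractional seconds and leaves integer strings untouched, which is why the same expression also works for CentOS metadata.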
This is the stack trace from the failure:
https://gist.github.com/vchepkov/1d12035982b46ddd15c8714779eeaac5
Updated by ggainey about 4 years ago
IRC conversation with a potential fix:
<ggainey> vchepkov: I expect changing "package_info['time'] = int(time_element.attrib['file'])" to package_info['time'] = int(time_element.attrib['file'].split('.')[0]) would work
<ggainey> (135 in primary.py)
Full conversation:
<vchepkov> Hi. I have issue with syncing Amazon Linux 2 repo. it downloads metadata fine, but then crashes with an error
<vchepkov> invalid literal for int() with base 10: '1601491199.805464'
<dkliban> vchepkov: what version of pulp?
<vchepkov> 2.21.3 on Centos 7
<vchepkov> pulp-admin rpm repo create --repo-id=amzn2-core-x86_64 --relative-url=amazon_linux/2/x86_64/amzn-core --feed http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list
<vchepkov> that's how I created repo
<dkliban> vchepkov: gotcha
<dkliban> could you share the full traceback?
<dkliban> from /var/log/messages or if there is one present in the task
<dkliban> or your logs might be in journalctl
<vchepkov> sure, give me a sec, I will create a gist
<vchepkov> https://gist.github.com/vchepkov/1d12035982b46ddd15c8714779eeaac5
<dkliban> looks like the metadata for one of the packages doesn't have a valid time tag
<dkliban> i don't think there is any way to work around that
<vchepkov> oh. I can open ticket with amazon if that's the case
<dkliban> vchepkov: pulp 2 is in maintenance mode. the only thing i can recommend is to try syncing with pulp 3
<vchepkov> This is what foreman/katello installed, I haven't selected the version
<dkliban> gotcha
<dkliban> vchepkov: which version of katello?
<vchepkov> 3.17
<vchepkov> How can I tell which package doesn't have proper data ?
<dkliban> vchepkov: you would need to add some logging to the python code
<dkliban> can you manually download the metadata?
<dkliban> i tried to access that repo but i was not authorized
<vchepkov> I can with no issues.
<vchepkov> I suspect mirror-list provides you with an unique ID
<dkliban> inside repodata/primary.xml you need to find '1601491199.805464'
<dkliban> that will tell you which package has that set as the time
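To find which package carries the offending value without adding logging to Pulp, the metadata can also be scanned directly. The script below is a sketch that assumes primary.xml.gz has already been downloaded from the repository's repodata/ directory; it streams the file with `xml.etree.ElementTree.iterparse` and reports every package whose `<time>` attributes are missing or not plain integers:

```python
# Sketch: list packages in a gzipped primary.xml whose <time> attributes
# are not plain integers (e.g. Amazon Linux's "1601491199.805464").
import gzip
import sys
import xml.etree.ElementTree as ET

# Default namespace of the <package> elements in primary.xml.
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def non_integer_times(fileobj):
    """Yield (package name, attribute, value) for suspicious time attributes."""
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag != COMMON_NS + "package":
            continue
        name = elem.findtext(COMMON_NS + "name")
        time_el = elem.find(COMMON_NS + "time")
        if time_el is not None:
            for attr in ("file", "build"):
                value = time_el.get(attr, "")
                # Flags values that are missing or contain anything but digits.
                if not value.isdigit():
                    yield name, attr, value
        elem.clear()  # free memory; primary.xml can be large


if __name__ == "__main__" and len(sys.argv) > 1:
    with gzip.open(sys.argv[1], "rb") as f:
        for name, attr, value in non_integer_times(f):
            print(f"{name}: time {attr}={value}")
```

Run it with the path to the downloaded file as its only argument; against the Amazon Linux 2 metadata it would be expected to list unbound-python among others, since most packages there use fractional file times.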
<vchepkov> For the heck of it, I logged in from a new IP and was able to download it
<vchepkov> http://amazonlinux.us-east-1.amazonaws.com/2/core/2.0/x86_64/a3ab6bd64043e16700a1be13947a8d2155362e6d4e61908a43440ffc45becdce/repodata/primary.xml.gz
<vchepkov> let me take a look
<vchepkov> <time file="1601491199.805464" build="1589818701"/>
<vchepkov> unbound-python
<vchepkov> I compared to centos 7 file and it looks pretty much the same
<dkliban> vchepkov: i am looking at this file now also
<dkliban> and it looks like this is the very first package in the list
<dkliban> and the rest of the packages have similar timestamps
<vchepkov> yep, centos 7 too
<dkliban> vchepkov: so the difference is that CentOS is using an integer for this field and amazon linux is using a float
<vchepkov> ah, yes
<vchepkov> not sure what the standard there.
<dkliban> vchepkov: i am not sure either
<dkliban> but Pulp is definitely expecting an int
<dkliban> vchepkov: i suspect that this is fixed in Pulp 3
<vchepkov> both refer to schema that it's not there
<vchepkov> http://linux.duke.edu/metadata/rpm
<ggainey> dkliban: vchepkov : yeah, time is a milliseconds-timestamp - looks like AWS adds nanos :(
<ggainey> vchepkov: RPM has been under-specified its entire history, alas :(
<dkliban> jsherrill: which version of katello allows users to use pulp 3?
<dkliban> for rpm content
<vchepkov> I wonder if I can strip 'nanos' to make it happy
<ggainey> vchepkov: I expect changing "package_info['time'] = int(time_element.attrib['file'])" to package_info['time'] = int(time_element.attrib['file'].split('.')[0]) would work
<ggainey> (135 in primary.py)
<vchepkov> trying
<ggainey> I haven't run that, mind - I don't have a pulp2 env up right this second, doing some massive pulp3 testing that wants All Of My Memory :)
<ggainey> but it should work, even on timestamps that do *not* have nanos
<vchepkov> didn't like it though
<ggainey> :'(
<ggainey> is it the same line? (I mean, it's likely, because I haven't actually run the code-snippet I'm suggesting here)
<vchepkov> I might need to add parentheses
<ggainey> hm
<vchepkov> since int still complains about float
<vchepkov> package_info['time'] = int(time_element.attrib['file'].split('.')[0])
<vchepkov> ValueError: invalid literal for int() with base 10: '1601491199.805464'
<ggainey> or wait - the next line is build_time, which likely has the same problem
<vchepkov> no, same line, I think
<vchepkov> line 135, in process_package_element
<ggainey> time is line 135, built_time is 136, in primary.py - check the stacktrace
<vchepkov> I did, it complains about 135
<ggainey> bah, ok. hm
<ggainey> it still finds the whole float?
<ggainey> hrm
<vchepkov> yep
<ggainey> did you restart pulp after changing?
<vchepkov> I didn't but it shows split in the stack
<ggainey> restart and try again, if you would
<vchepkov> btw, build is integer in primary.xml
<ggainey> well that's good :)
<ggainey> here's me experimenting in python-shell : https://paste.centos.org/view/d39b1456
<ggainey> trying to understand how the split could be missing
<vchepkov> rebooted the whole thing, to be sure
<ggainey> heh - "it's the only way to be sure" :)
<vchepkov> well, it is downloading :)
<vchepkov> if that's the only issue, that would be great :)
Updated by ttereshc about 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Triaged changed from No to Yes
- Sprint set to Sprint 83
Updated by ggainey about 4 years ago
- Status changed from ASSIGNED to POST
Updated by pulpbot about 4 years ago
Added by ggainey about 4 years ago
Updated by ggainey about 4 years ago
- Status changed from POST to MODIFIED
Applied in changeset 59486d55ba2157655ce74e7255029df994657448.
Added by ggainey about 4 years ago
Revision d072c8e8
Handled the case on sync where package.time or .buildtime are floats
Happens with Amazon Linux repositories.
fixes #7660
(cherry picked from commit 59486d55ba2157655ce74e7255029df994657448)
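The committed fix handles both `package.time` and `.buildtime`. Its actual code is not reproduced in this issue, so the following is only a hypothetical sketch of the same idea; the helper names and the `'build_time'` dictionary key are assumptions. It tries `int()` first and falls back to truncating a float:

```python
# Hypothetical sketch, not the code from changeset 59486d55.
import xml.etree.ElementTree as ET


def to_epoch_seconds(value: str) -> int:
    """Convert '1589818701' or '1601491199.805464' to whole seconds."""
    try:
        return int(value)
    except ValueError:
        # Fractional timestamps (seen in Amazon Linux metadata) are truncated.
        return int(float(value))


def read_times(time_element):
    """Read both attributes of a primary.xml <time> element as integers."""
    return {
        'time': to_epoch_seconds(time_element.attrib['file']),
        'build_time': to_epoch_seconds(time_element.attrib['build']),
    }


element = ET.fromstring('<time file="1601491199.805464" build="1589818701"/>')
print(read_times(element))  # → {'time': 1601491199, 'build_time': 1589818701}
```

Truncating with `int()` discards the fractional seconds, matching what the split-based workaround in the description does.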
Updated by ggainey about 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE