Issue #7660
closed
pulp fails downloading Amazon Linux repository
Description
pulp 2.21.3 fails to download packages from the Amazon Linux repository. The problem is in the repository metadata, specifically the time attribute of the RPM package.
importers/yum/repomd/primary.py, line 135, expects the value to be an integer:
package_info['time'] = int(time_element.attrib['file'])
Changing this line to
package_info['time'] = int(time_element.attrib['file'].split('.')[0])
fixed the issue for me.
This is the stack trace from the failure:
https://gist.github.com/vchepkov/1d12035982b46ddd15c8714779eeaac5
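A minimal sketch of the change described above, outside of Pulp. The helper name parse_timestamp is hypothetical and is not part of primary.py; it only illustrates why int() rejects the Amazon Linux value and why keeping the part before the decimal point works for both forms.

```python
def parse_timestamp(value):
    """Return whole seconds from a timestamp string.

    Hypothetical helper, not Pulp code. It accepts both
    '1589818701' and '1601491199.805464'.
    """
    # Keep only the text before the first '.', then convert to int.
    # A string without a '.' is returned unchanged by split().
    return int(value.split('.')[0])


raw = '1601491199.805464'  # the value reported in this issue

try:
    int(raw)  # what line 135 effectively did before the change
except ValueError as exc:
    print('int() rejects the float-style string:', exc)

print(parse_timestamp(raw))
print(parse_timestamp('1589818701'))
```

Splitting the string truncates the fractional part rather than rounding it, which matches the one-line change proposed in the description.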
Updated by ggainey about 4 years ago
IRC conversation with a potential fix:
<ggainey> vchepkov: I expect changing "package_info['time'] = int(time_element.attrib['file'])" to package_info['time'] = int(time_element.attrib['file'].split('.')[0]) would work
<ggainey> (135 in primary.py)
Full conversation:
<vchepkov> Hi. I have issue with syncing Amazon Linux 2 repo. it downloads metadata fine, but then crashes with an error
<vchepkov> invalid literal for int() with base 10: '1601491199.805464'
<dkliban> vchepkov: what version of pulp?
<vchepkov> 2.21.3 on Centos 7
<vchepkov> pulp-admin rpm repo create --repo-id=amzn2-core-x86_64 --relative-url=amazon_linux/2/x86_64/amzn-core --feed http://amazonlinux.us-east-1.amazonaws.com/2/core/latest/x86_64/mirror.list
<vchepkov> that's how I created repo
<dkliban> vchepkov: gotcha
<dkliban> could you share the full traceback?
<dkliban> from /var/log/messages or if there is one present in the task
<dkliban> or your logs might be in journalctl
<vchepkov> sure, give me a sec, I will create a gist
<vchepkov> https://gist.github.com/vchepkov/1d12035982b46ddd15c8714779eeaac5
<dkliban> looks like the metadata for one of the packages doesn't have a valid time tag
<dkliban> i don't think there is any way to work around that
<vchepkov> oh. I can open ticket with amazon if that's the case
<dkliban> vchepkov: pulp 2 is in maintenance mode. the only thing i can recommend is to try syncing with pulp 3
<vchepkov> This is what foreman/katello installed, I haven't selected the version
<dkliban> gotcha
<dkliban> vchepkov: which version of katello?
<vchepkov> 3.17
<vchepkov> How can I tell which package doesn't have proper data ?
<dkliban> vchepkov: you would need to add some logging to the python code
<dkliban> can you manually download the metadata?
<dkliban> i tried to access that repo but i was not authorized
<vchepkov> I can with no issues.
<vchepkov> I suspect mirror-list provides you with an unique ID
<dkliban> inside repodata/primary.xml you need to find '1601491199.805464'
<dkliban> that will tell you which package has that set as the time
<vchepkov> For the heck of it, I logged in from a new IP and was able to download it
<vchepkov> http://amazonlinux.us-east-1.amazonaws.com/2/core/2.0/x86_64/a3ab6bd64043e16700a1be13947a8d2155362e6d4e61908a43440ffc45becdce/repodata/primary.xml.gz
<vchepkov> let me take a look
<vchepkov> <time file="1601491199.805464" build="1589818701"/>
<vchepkov> unbound-python
<vchepkov> I compared to centos 7 file and it looks pretty much the same
<dkliban> vchepkov: i am looking at this file now also
<dkliban> and it looks like this is the very first package in the list
<dkliban> and the rest of the packages have similar timestamps
<vchepkov> yep, centos 7 too
<dkliban> vchepkov: so the difference is that CentOS is using an integer for this field and amazon linux is using a float
<vchepkov> ah, yes
<vchepkov> not sure what the standard is there.
<dkliban> vchepkov: i am not sure either
<dkliban> but Pulp is definitely expecting an int
<dkliban> vchepkov: i suspect that this is fixed in Pulp 3
<vchepkov> both refer to a schema that is not there
<vchepkov> http://linux.duke.edu/metadata/rpm
<ggainey> dkliban: vchepkov : yeah, time is a milliseconds-timestamp - looks like AWS adds nanos :(
<ggainey> vchepkov: RPM has been under-specified its entire history, alas :(
<dkliban> jsherrill: which version of katello allows users to use pulp 3?
<dkliban> for rpm content
<vchepkov> I wonder if I can strip 'nanos' to make it happy
<ggainey> vchepkov: I expect changing "package_info['time'] = int(time_element.attrib['file'])" to package_info['time'] = int(time_element.attrib['file'].split('.')[0]) would work
<ggainey> (135 in primary.py)
<vchepkov> trying
<ggainey> I haven't run that, mind - I don't have a pulp2 env up right this second, doing some massive pulp3 testing that wants All Of My Memory :)
<ggainey> but it should work, even on timestamps that do *not* have nanos
<vchepkov> didn't like it though
<ggainey> :'(
<ggainey> is it the same line? (I mean, it's likely, because I haven't actually run the code-snippet I'm suggesting here)
<vchepkov> I might need to add parentheses
<ggainey> hm
<vchepkov> since int still complains about float
<vchepkov> package_info['time'] = int(time_element.attrib['file'].split('.')[0])
<vchepkov> ValueError: invalid literal for int() with base 10: '1601491199.805464'
<ggainey> or wait - the next line is build_time, which likely has the same problem
<vchepkov> no, same line, I think
<vchepkov> line 135, in process_package_element
<ggainey> time is line 135, build_time is 136, in primary.py - check the stacktrace
<vchepkov> I did, it complains about 135
<ggainey> bah, ok. hm
<ggainey> it still finds the whole float?
<ggainey> hrm
<vchepkov> yep
<ggainey> did you restart pulp after changing?
<vchepkov> I didn't but it shows split in the stack
<ggainey> restart and try again, if you would
<vchepkov> btw, build is integer in primary.xml
<ggainey> well that's good :)
<ggainey> here's me experimenting in python-shell : https://paste.centos.org/view/d39b1456
<ggainey> trying to understand how the split could be missing
<vchepkov> rebooted the whole thing, to be sure
<ggainey> heh - "it's the only way to be sure" :)
<vchepkov> well, it
<vchepkov> is downloading :)
<vchepkov> if that's the only issue, that would be great :)
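The search through repodata/primary.xml suggested in the conversation can be sketched as a short script. This is an illustration only, not Pulp code: the function name packages_with_fractional_time and the second sample package (example-pkg) are made up, and the sample XML is a trimmed stand-in for a real primary.xml, which carries many more elements per package.

```python
import xml.etree.ElementTree as ET

# Default namespace used by primary.xml.
COMMON_NS = 'http://linux.duke.edu/metadata/common'


def packages_with_fractional_time(xml_text):
    """List (name, attribute, value) for non-integer time attributes."""
    root = ET.fromstring(xml_text)
    hits = []
    for pkg in root.iter('{%s}package' % COMMON_NS):
        name = pkg.findtext('{%s}name' % COMMON_NS)
        time_el = pkg.find('{%s}time' % COMMON_NS)
        if time_el is None:
            continue
        for attr in ('file', 'build'):
            value = time_el.attrib.get(attr)
            # Report values that are present but not plain digits.
            if value is not None and not value.isdigit():
                hits.append((name, attr, value))
    return hits


# Trimmed sample: the first entry mirrors the <time> tag quoted above,
# the second is a hypothetical package with integer timestamps.
SAMPLE = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>unbound-python</name>
    <time file="1601491199.805464" build="1589818701"/>
  </package>
  <package type="rpm">
    <name>example-pkg</name>
    <time file="1601491199" build="1589818701"/>
  </package>
</metadata>"""

for hit in packages_with_fractional_time(SAMPLE):
    print(hit)
```

For a downloaded primary.xml.gz, the text could be read with gzip.open(path, 'rb').read() and passed to the same function.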
Updated by ttereshc about 4 years ago
- Status changed from NEW to ASSIGNED
- Assignee set to ggainey
- Triaged changed from No to Yes
- Sprint set to Sprint 83
Updated by ggainey about 4 years ago
- Status changed from ASSIGNED to POST
Updated by pulpbot about 4 years ago
Updated by ggainey about 4 years ago
- Status changed from POST to MODIFIED
Applied in changeset 59486d55ba2157655ce74e7255029df994657448.
Added by ggainey about 4 years ago
Revision d072c8e8 | View on GitHub
Handled the case on sync where package.time or .buildtime are floats
Happens with Amazon Linux repositories.
fixes #7660
(cherry picked from commit 59486d55ba2157655ce74e7255029df994657448)
Updated by ggainey about 4 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE