Issue #3370

Unable to sync Ubuntu Trusty Universe

Added by dustball almost 4 years ago. Updated over 2 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version - Debian:
Platform Release: 2.16.4
Target Release - Debian:
OS: RHEL 7
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

When I try to synchronize the repository named in the title, the sync fails with "'utf8' codec can't decode byte 0xf1 in position 1186: invalid continuation byte". I think it also happens with the xenial universe repository.

The command used to create the repo was "pulp-admin deb repo create --repo-id 'trusty-universe' --feed 'http://de.archive.ubuntu.com/ubuntu' --releases 'trusty' --components 'universe' --architectures 'amd64' --serve-http true --relative-url trusty-universe", if that matters in any way.

I've attached the journal output.

journal.txt (17.5 KB), dustball, 02/13/2018 09:12 PM
jessie-main.txt (14.1 KB), dustball, 02/16/2018 10:42 AM
encoding_issue.patch (4.71 KB), sbernhard, 03/13/2018 12:42 PM
deb822.patch (2.37 KB), mihai.ibanescu@gmail.com, 07/10/2018 09:56 PM
debpkgr.patch (3.32 KB), mihai.ibanescu@gmail.com, 07/10/2018 09:59 PM

History

#1 Updated by dustball almost 4 years ago

Aaaand I've found another one, this time in the main repo of debian jessie.

I'm somewhat expecting this to be a packaging bug, but as far as I can tell Debian and Ubuntu themselves work fine with the "broken" packages.

I've also found this somewhere in debian Wheezy, but looking at the EOL, I doubt that is worth investigating.

#2 Updated by mihai.ibanescu@gmail.com almost 4 years ago

Working on reproducing this locally.

#3 Updated by mihai.ibanescu@gmail.com almost 4 years ago

... and I ran out of space. Is there any chance you can reproduce this on a smaller repository?

#4 Updated by dustball almost 4 years ago

So far I've only found it in ubuntu universe and jessie-main. I assume jessie-main is by far the smaller of the two, yet you'll still need a sizeable disk. I think it happens on wheezy-main as well, but again, that one is still somewhat large.

#5 Updated by dustball almost 4 years ago

Update: I've pinned down the offending package in jessie-main as "aspell-is". I had hoped to reproduce this error in full by creating an empty repo and uploading only the broken package. However, while it does not publish the package, the error message is far less clear:

Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f] succeeded in 0.0136467898265s: None
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3] succeeded in 0.0346576478332s: {'exception': None, 'repo_id': u'aspell-test', 'traceback': None, 'started': '2018-02-24T17:06:42Z', '_ns':...
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef] succeeded in 0.0030136150308s: None

(And yes, this is everything that shows up in the logs when trying to publish the single broken package.)

#6 Updated by mihai.ibanescu@gmail.com almost 4 years ago

Awesome, I have a reproducer.

Downloading this:

http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb

And then running:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb

#7 Updated by mihai.ibanescu@gmail.com almost 4 years ago

On first look, the fix is really ugly.

If the package is really old, the filenames may be encoded with ISO-8859-1.

I could try to open the file as utf-8 and fall back to iso-8859-1 if I get a UnicodeDecodeError.

Unfortunately, debian.debfile.DebPart.get_file wraps the file in a codecs.EncodedFile object, but fails to specify the output encoding as UTF-8, so I end up with a bunch of strings in the original encoding.

Subclassing DebFile and overriding get_file may be the way to go, since I don't know how to provide patches to upstream python-debian.
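For reference, the "try utf-8, fall back to iso-8859-1" idea mentioned above can be sketched as follows. This is only an illustration written for Python 3, not the code in debpkgr or python-debian; `decode_with_fallback` is a hypothetical helper name.

```python
def decode_with_fallback(data, encodings=("utf-8", "iso-8859-1")):
    """Decode bytes with the first encoding in `encodings` that accepts them.

    Hypothetical helper for illustration only. ISO-8859-1 maps every byte
    value to a character, so with the default tuple the final fallback
    cannot raise UnicodeDecodeError.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding; try the next candidate.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Valid UTF-8 input is decoded as UTF-8.
modern_name = decode_with_fallback(b"caf\xc3\xa9")
# A lone 0xED byte is invalid UTF-8, so the ISO-8859-1 fallback is used.
legacy_name = decode_with_fallback(b"\xedslenska.alias")
```

The fallback is a guess: bytes that are invalid UTF-8 are not necessarily ISO-8859-1, so the result can be mojibake for packages that used some other legacy encoding.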

#8 Updated by sbernhard over 3 years ago

The command to test the issue needs to be:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb

mihai.ibanescu@gmail.com wrote:

Awesome, I have a reproducer.

Downloading this:

http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb

And then running:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" ~/Downloads/aspell-is_0.51-0-4_all.deb

#9 Updated by sbernhard over 3 years ago

I'm not 100% sure, but the following file in aspell looks pretty bad:

-rw-r--r-- root/root 72 2004-03-10 12:23 ./usr/lib/aspell/\355slenska.alias

#10 Updated by mihai.ibanescu@gmail.com over 3 years ago

That is indeed the problem.

It is an old package that has filenames in a non-UTF-8 encoding. That breaks python-debian.
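A quick way to see the failure in isolation (a Python 3 sketch, independent of python-debian): it assumes the `\355` in the listing is the octal escape for the single byte 0xED, which ISO-8859-1 maps to "í". `try_decode` is a hypothetical helper.

```python
# Raw filename bytes, assuming \355 in the tar listing means byte 0xED.
raw_name = b"./usr/lib/aspell/\355slenska.alias"


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in `encoding`."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


# In UTF-8, 0xED starts a multi-byte sequence, but the next byte is a plain
# ASCII "s", so decoding fails with "invalid continuation byte".
utf8_result = try_decode(raw_name, "utf-8")

# ISO-8859-1 accepts every byte value, so this always succeeds.
latin1_result = try_decode(raw_name, "iso-8859-1")

print(utf8_result)
print(latin1_result)
```

This is the same class of error as the one in the original report, just triggered by a different byte.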

#11 Updated by sbernhard over 3 years ago

I had a look at newer versions of python-debian and they don't fix the issue. Do you know how to fix it, or how to add a workaround in python-debpkgr?

BTW, the git repo for python-debian can be found here: https://anonscm.debian.org/gitweb/?p=pkg-python-debian/python-debian.git

#12 Updated by sbernhard over 3 years ago

Hi,

Mihai, please have a look at the attached patch. Please let me know what you think about it.

Another way would be to "ignore" the md5sums in case of a UnicodeDecodeError, since "# existance of md5sums in control part is optional". What do you think?

Best regards,
Bernhard
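To make the second suggestion concrete, here is a rough Python 3 sketch of "treat md5sums as absent when it cannot be decoded". It is not the content of the attached patch, and `parse_md5sums` is a hypothetical helper rather than a debpkgr or python-debian function.

```python
def parse_md5sums(raw):
    """Parse the bytes of an md5sums control file into {path: digest}.

    Hypothetical helper for illustration. If the content is not valid UTF-8,
    the file is treated as if it were missing and an empty dict is returned.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # md5sums is optional, so skip it rather than fail the whole package.
        return {}

    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<digest>  <path>"; split on the first whitespace run.
        digest, path = line.split(None, 1)
        sums[path] = digest
    return sums


good = parse_md5sums(b"d41d8cd98f00b204e9800998ecf8427e  usr/share/doc/x\n")
bad = parse_md5sums(b"d41d8cd98f00b204e9800998ecf8427e  usr/lib/aspell/\xedslenska.alias\n")
```

The trade-off is that a single undecodable filename discards the checksums for every file in the package, not just the offending one.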

#13 Updated by mihai.ibanescu@gmail.com over 3 years ago

I had started working on a fix, but realized it's ugly, and backed away from it. And then other things became priorities.

I will try to get back to it today/tomorrow.

#14 Updated by mihai.ibanescu@gmail.com over 3 years ago

I've submitted a patch upstream.

Depending on how quickly they respond, I would like to include this in the next version of pulp. Let's see how quickly that goes.

With that patch applied, the fix in debpkgr is much simpler (also attached).

#16 Updated by daviddavis over 3 years ago

  • Platform Release set to 2.16.4

#17 Updated by dustball over 3 years ago

It may be just me, but I haven't managed to get either of those patches working so far (by "either" I mean the first encoding_issue.patch, as well as deb822.patch + debpkgr.patch). Both fail with the same issue.

#18 Updated by mihai.ibanescu@gmail.com over 3 years ago

debpkgr 1.1.0 should include sbernhard's workaround until python-debian merges the fix for python2 dump().

#19 Updated by mihai.ibanescu@gmail.com over 3 years ago

  • Status changed from NEW to MODIFIED

#20 Updated by daviddavis over 3 years ago

  • Status changed from MODIFIED to 5

#21 Updated by daviddavis over 3 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE

#22 Updated by bmbouter over 2 years ago

  • Tags Pulp 2 added
