Issue #3370

closed

Unable to sync Ubuntu Trusty Universe

Added by dustball over 6 years ago. Updated over 5 years ago.

Status: CLOSED - CURRENTRELEASE
Priority: Normal
Assignee: -
Sprint/Milestone: -
Start date:
Due date:
Estimated time:
Severity: 2. Medium
Version - Debian:
Platform Release: 2.16.4
Target Release - Debian:
OS: RHEL 7
Triaged: No
Groomed: No
Sprint Candidate: No
Tags: Pulp 2
Sprint:
Quarter:

Description

When I'm trying to synchronize the named repository, it fails with "'utf8' codec can't decode byte 0xf1 in position 1186: invalid continuation byte". I think it also happens with the xenial universe repository.

The command used to create the repo was "pulp-admin deb repo create --repo-id 'trusty-universe' --feed 'http://de.archive.ubuntu.com/ubuntu' --releases 'trusty' --components 'universe' --architectures 'amd64' --serve-http true --relative-url trusty-universe", if that matters in any way.

I've attached the journal output.


Files

journal.txt (17.5 KB), dustball, 02/13/2018 09:12 PM
jessie-main.txt (14.1 KB), dustball, 02/16/2018 10:42 AM
encoding_issue.patch (4.71 KB), sbernhard, 03/13/2018 12:42 PM
deb822.patch (2.37 KB), mihai.ibanescu@gmail.com, 07/10/2018 09:56 PM
debpkgr.patch (3.32 KB), mihai.ibanescu@gmail.com, 07/10/2018 09:59 PM
Actions #1

Updated by dustball over 6 years ago

Aaaand I've found another one, this time in the main repo of Debian Jessie.

I'm somewhat expecting this to be a packaging bug, but as far as I can tell Debian and Ubuntu work fine with the "broken" packages.

I've also found this somewhere in Debian Wheezy, but given that it is EOL, I doubt that is worth investigating.

Actions #2

Updated by mihai.ibanescu@gmail.com over 6 years ago

Working on reproducing this locally.

Actions #3

Updated by mihai.ibanescu@gmail.com over 6 years ago

... and I ran out of space. Is there any chance you can reproduce this on a smaller repository?

Actions #4

Updated by dustball over 6 years ago

So far I've only found it in Ubuntu universe and jessie-main. I assume jessie-main is much smaller, but you'll still need a sizeable disk. I think it happens on wheezy-main as well, but that one is still fairly large too.

Actions #5

Updated by dustball over 6 years ago

Update: I've pinned the offending package in jessie-main down to "aspell-is". I had hoped to reproduce the error in full by creating an empty repo and uploading only the broken package. However, while the package does not get published, the error message is far less clear:

Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f] succeeded in 0.0136467898265s: None
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3] succeeded in 0.0346576478332s: {'exception': None, 'repo_id': u'aspell-test', 'traceback': None, 'started': '2018-02-24T17:06:42Z', '_ns':...
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef] succeeded in 0.0030136150308s: None

(And yes, this is everything that shows up in the logs when trying to publish the single broken package.)

Actions #6

Updated by mihai.ibanescu@gmail.com over 6 years ago

Awesome, I have a reproducer.

Downloading this:

http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb

And then running:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb
Actions #7

Updated by mihai.ibanescu@gmail.com over 6 years ago

On first look, the fix is really ugly.

If the package is really old, the filenames may be encoded with ISO-8859-1.

I could try to open the file as utf-8 and fall back to iso-8859-1 if I get a UnicodeDecodeError.
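That fallback could look roughly like the sketch below. The helper name `decode_with_fallback` and the sample bytes are made up for illustration; this is not code from debpkgr or python-debian.

```python
def decode_with_fallback(raw):
    """Decode bytes as UTF-8, falling back to ISO-8859-1 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every possible byte to a code point,
        # so this second attempt cannot raise.
        return raw.decode("iso-8859-1")


print(decode_with_fallback(b"caf\xc3\xa9"))  # valid UTF-8 input
print(decode_with_fallback(b"caf\xe9"))      # ISO-8859-1 input, not valid UTF-8
```

Because ISO-8859-1 accepts any byte sequence, the fallback never fails, but it can silently mis-decode data that was actually written in some other legacy encoding.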

Unfortunately, debian.debfile.DebPart.get_file wraps the file in a codecs.EncodedFile object, but fails to specify the output encoding as UTF-8, so I end up with a bunch of strings in the original encoding.
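To illustrate the behaviour of the standard library's `codecs.EncodedFile` on its own (this is a sketch with made-up sample bytes, not the actual python-debian code): when only one encoding is passed, the file encoding defaults to the data encoding, so the bytes come back unchanged. Passing both encodings is what triggers the recoding.

```python
import codecs
import io

# Sample ISO-8859-1 bytes; 0xED is "í" in that encoding.
raw = b"\xedslenska.alias\n"

# Only one encoding given: file_encoding defaults to data_encoding,
# so reading returns the bytes as they were (still ISO-8859-1).
unchanged = codecs.EncodedFile(io.BytesIO(raw), "iso-8859-1").read()

# Both encodings given: the underlying bytes are decoded as ISO-8859-1
# and handed back re-encoded as UTF-8.
recoded = codecs.EncodedFile(io.BytesIO(raw), "utf-8", "iso-8859-1").read()

print(unchanged == raw)
print(recoded.decode("utf-8"))
```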

Subclassing DebFile and overriding get_file may be the way to go, since I don't know how to submit patches to upstream python-debian.

Actions #8

Updated by sbernhard over 6 years ago

The command to test the issue needs to be:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb

mihai.ibanescu@gmail.com wrote:

Awesome, I have a reproducer.

Downloading this:

http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb

And then running:

python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" ~/Downloads/aspell-is_0.51-0-4_all.deb

Actions #9

Updated by sbernhard over 6 years ago

I'm not 100% sure, but the following file in aspell looks pretty bad:

-rw-r--r-- root/root 72 2004-03-10 12:23 ./usr/lib/aspell/\355slenska.alias

Actions #10

Updated by mihai.ibanescu@gmail.com over 6 years ago

That is indeed the problem.

It is an old package that has filenames in a non-UTF-8 encoding. That breaks python-debian.
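A quick way to see why that one file name trips the decoder, as a sketch that assumes the name is ISO-8859-1 (as guessed earlier in this thread): the `\355` in the listing is an octal escape for a single byte, which is not valid UTF-8 when followed by a plain ASCII letter.

```python
# The \355 escape shown in the archive listing is octal.
byte = int("355", 8)
raw_name = bytes([byte]) + b"slenska.alias"

print(hex(byte))
# Assumption: the packager's encoding was ISO-8859-1.
print(raw_name.decode("iso-8859-1"))

try:
    raw_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)
```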

Actions #11

Updated by sbernhard over 6 years ago

I had a look at newer versions of python-debian and they don't fix the issue. So, do you know how to fix this or add a workaround in python-debpkgr?

BTW, the git repo for python-debian can be found here: https://anonscm.debian.org/gitweb/?p=pkg-python-debian/python-debian.git

Actions #12

Updated by sbernhard over 6 years ago

Hi,

Mihai, please have a look at the attached patch. Please let me know what you think about it.

Another way would be to "ignore" the md5sums when a UnicodeDecodeError occurs, since, as the code comment says, "# existance of md5sums in control part is optional". What do you think?

Best regards,
Bernhard
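The second option, treating md5sums as absent when it cannot be decoded, might look like the sketch below. `parse_md5sums` is a hypothetical helper, not part of python-debian or debpkgr, and it assumes the usual "digest, whitespace, path" layout of an md5sums file.

```python
def parse_md5sums(raw):
    """Return {path: md5} from md5sums bytes, or {} if not valid UTF-8.

    Hypothetical helper for illustration only.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # md5sums is optional, so behave as if the file were missing.
        return {}

    sums = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, path = parts
        sums[path] = digest
    return sums


good = b"d41d8cd98f00b204e9800998ecf8427e  usr/bin/foo\n"
bad = b"d41d8cd98f00b204e9800998ecf8427e  usr/lib/aspell/\xedslenska.alias\n"
print(parse_md5sums(good))
print(parse_md5sums(bad))
```

The trade-off is that a single badly encoded file name discards the checksums for every file in the package, not just the affected one.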

Actions #13

Updated by mihai.ibanescu@gmail.com over 6 years ago

I had started working on a fix, but realized it's ugly, and backed away from it. And then other things became priorities.

I will try to get back to it today/tomorrow.

Actions #14

Updated by mihai.ibanescu@gmail.com about 6 years ago

I've submitted a patch upstream.

Depending on how quickly they respond, I would like to include this in the next version of pulp. Let's see how quickly that goes.

With that patch applied, the fix in debpkgr is much simpler (also attached).

Actions #16

Updated by daviddavis about 6 years ago

  • Platform Release set to 2.16.4
Actions #17

Updated by dustball about 6 years ago

It may be just me, but I haven't managed to get either of those patch sets working so far (by either I mean the first encoding_issue.patch, as well as deb822.patch + debpkgr.patch). Both fail with the same error.

Actions #18

Updated by mihai.ibanescu@gmail.com about 6 years ago

debpkgr 1.1.0 should include sbernhard's workaround until python-debian merges the fix for python2 dump().

Actions #19

Updated by mihai.ibanescu@gmail.com about 6 years ago

  • Status changed from NEW to MODIFIED
Actions #20

Updated by daviddavis almost 6 years ago

  • Status changed from MODIFIED to 5
Actions #21

Updated by daviddavis almost 6 years ago

  • Status changed from 5 to CLOSED - CURRENTRELEASE
Actions #22

Updated by bmbouter over 5 years ago

  • Tags Pulp 2 added
