Issue #3370
closed
Unable to sync Ubuntu Trusty Universe
Description
When I'm trying to synchronize the named repository, it fails with "'utf8' codec can't decode byte 0xf1 in position 1186: invalid continuation byte". I think it also happens with the xenial universe repository.
The command used to create the repo was "pulp-admin deb repo create --repo-id 'trusty-universe' --feed 'http://de.archive.ubuntu.com/ubuntu' --releases 'trusty' --components 'universe' --architectures 'amd64' --serve-http true --relative-url trusty-universe", if that matters in any way.
I've attached the journal output.
Updated by dustball almost 7 years ago
- File jessie-main.txt jessie-main.txt added
Aaaand I've found another one, this time in the main repo of Debian jessie.
I'm somewhat expecting this to be a packaging bug, but as far as I can tell Debian and Ubuntu work fine with the "broken" packages.
I've also found this somewhere in Debian wheezy, but given its EOL, I doubt that is worth investigating.
Updated by mihai.ibanescu@gmail.com almost 7 years ago
Working on reproducing this locally.
Updated by mihai.ibanescu@gmail.com almost 7 years ago
... and I ran out of space. Is there any chance you can reproduce this on a smaller repository?
Updated by dustball almost 7 years ago
So far I've only found ubuntu universe and jessie-main. I assume jessie-main is by far the smaller one, yet you'll still need a sizeable disk. I think it happens on wheezy-main as well, but again, that one is still somewhat large.
Updated by dustball almost 7 years ago
Update: I've pinned the package for jessie-main down to "aspell-is". I had hoped to reproduce this error in full by creating an empty repo and uploading only the broken package. However, while it does not publish the package, the error message is far less clear:
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2204]: celery.worker.job:INFO: Task pulp.server.async.tasks._queue_reserved_task[8e15464b-c90f-452c-8b54-18cb68bc7b1f] succeeded in 0.0136467898265s: None
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.strategy:INFO: Received task: pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef]
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.managers.repo.publish.publish[71c030f1-4ca5-44f9-baa1-ca95c3b69cc3] succeeded in 0.0346576478332s: {'exception': None, 'repo_id': u'aspell-test', 'traceback': None, 'started': '2018-02-24T17:06:42Z', '_ns':...
Feb 24 18:06:42 pulp01vp.office.noris.de pulp[2302]: celery.worker.job:INFO: Task pulp.server.async.tasks._release_resource[f45eb252-5f80-4e85-abae-b8f07022ebef] succeeded in 0.0030136150308s: None
(And yes, this is everything that shows up in the logs when trying to publish the single broken package.)
Updated by mihai.ibanescu@gmail.com almost 7 years ago
Awesome, I have a reproducer.
Downloading this:
http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb
And then running:
python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb
Updated by mihai.ibanescu@gmail.com almost 7 years ago
On first look, the fix is really ugly.
If the package is really old, the filenames may be encoded with ISO-8859-1.
I could try to open the file as utf-8 and fall back to iso-8859-1 if I get a UnicodeDecodeError.
Unfortunately, debian.debfile.DebPart.get_file wraps the file in a codecs.EncodedFile object, but fails to specify the output encoding as UTF-8, so I end up with a bunch of strings in the original encoding.
Subclassing DebFile and overriding get_file may be the way to go, since I don't know how to provide patches to upstream python-debian.
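The "try UTF-8, fall back to ISO-8859-1" idea described above can be sketched roughly as follows. This is only an illustration of the decoding strategy on raw bytes, written for Python 3; `decode_with_fallback` is a hypothetical helper name and is not part of debpkgr or python-debian.

```python
def decode_with_fallback(raw, primary="utf-8", fallback="iso-8859-1"):
    """Decode bytes as `primary`; on failure, decode them as `fallback`.

    Hypothetical helper for illustration only. ISO-8859-1 maps every
    byte value to a character, so the fallback decode cannot fail.
    """
    try:
        return raw.decode(primary)
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A valid UTF-8 name is decoded as UTF-8 ...
utf8_name = decode_with_fallback("íslenska.alias".encode("utf-8"))
# ... while a legacy ISO-8859-1 name goes through the fallback path.
legacy_name = decode_with_fallback(b"\xedslenska.alias")
```

Because ISO-8859-1 accepts any byte sequence, the fallback never raises, but it may silently produce wrong characters for names that are actually in some other legacy encoding.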
Updated by sbernhard almost 7 years ago
The command to test the issue needs to be:
python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" aspell-is_0.51-0-4_all.deb
mihai.ibanescu@gmail.com wrote:
Awesome, I have a reproducer.
Downloading this:
http://ubuntu-master.mirror.tudos.de/ubuntu/pool/universe/a/aspell-is/aspell-is_0.51-0-4_all.deb
And then running:
python -c "import sys; from debpkgr import debpkg; debpkg.DebPkg.from_file(sys.argv[1])" ~/Downloads/aspell-is_0.51-0-4_all.deb
Updated by sbernhard almost 7 years ago
I'm not 100% sure, but the following file in aspell looks pretty bad:
-rw-r--r-- root/root 72 2004-03-10 12:23 ./usr/lib/aspell/\355slenska.alias
Updated by mihai.ibanescu@gmail.com almost 7 years ago
That is indeed the problem.
It is an old package that has filenames in a non-UTF-8 encoding. That breaks python-debian.
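For illustration, the listing's `\355` is the octal escape for byte 0xED, which is "í" in ISO-8859-1 but an incomplete multi-byte sequence in UTF-8. A small Python 3 sketch, using the path from the listing as a byte string (this is not the code path python-debian actually takes):

```python
# The file name from the archive listing; \355 (octal) is byte 0xED.
raw_name = b"./usr/lib/aspell/\xedslenska.alias"

try:
    raw_name.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    # 0xED starts a multi-byte UTF-8 sequence, but the next byte ("s")
    # is not a continuation byte, so strict decoding fails here.
    utf8_error = exc

# The same bytes decode cleanly as ISO-8859-1.
latin1_name = raw_name.decode("iso-8859-1")

print(utf8_error)
print(latin1_name)
```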
Updated by sbernhard almost 7 years ago
I had a look at newer versions of python-debian and they don't fix the issue. Therefore, do you know how to fix this or add a workaround in python-debpkgr?
BTW, the git repo for python-debian can be found here: https://anonscm.debian.org/gitweb/?p=pkg-python-debian/python-debian.git
Updated by sbernhard almost 7 years ago
- File encoding_issue.patch encoding_issue.patch added
Hi,
Mihai, please have a look at the attached patch. Please let me know what you think about it.
Another way would be to ignore the md5sums in case of a UnicodeDecodeError, since "# existance of md5sums in control part is optional". What do you think?
Best regards,
Bernhard
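The second option mentioned above (treating md5sums as absent when they cannot be decoded) might look roughly like this. It is a sketch in Python 3 under stated assumptions: `parse_md5sums` is a hypothetical function, not the contents of the attached patch and not debpkgr's API, and it assumes the usual "digest, whitespace, path" line layout of a md5sums file.

```python
def parse_md5sums(raw):
    """Return {path: md5 digest} parsed from raw md5sums bytes.

    Hypothetical helper: if the content is not valid UTF-8, behave as
    if the (optional) md5sums file were missing and return an empty dict.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return {}

    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        digest, path = line.split(None, 1)
        sums[path] = digest
    return sums


good = parse_md5sums(b"d41d8cd98f00b204e9800998ecf8427e  usr/share/doc/x\n")
bad = parse_md5sums(
    b"d41d8cd98f00b204e9800998ecf8427e  usr/lib/aspell/\xedslenska.alias\n"
)
```

The trade-off is that a single undecodable file name discards the checksums for the whole package, which is simpler than a per-name encoding fallback but loses information.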
Updated by mihai.ibanescu@gmail.com almost 7 years ago
I had started working on a fix, but realized it's ugly, and backed away from it. And then other things became priorities.
I will try to get back to it today/tomorrow.
Updated by mihai.ibanescu@gmail.com over 6 years ago
- File deb822.patch deb822.patch added
- File debpkgr.patch debpkgr.patch added
I've submitted a patch upstream.
Depending on how quickly they respond, I would like to include this in the next version of pulp. Let's see how quickly that goes.
With that patch applied, the fix in debpkgr is much simpler (also attached).
Updated by mihai.ibanescu@gmail.com over 6 years ago
Merge request for upstream python-debian: https://salsa.debian.org/python-debian-team/python-debian/merge_requests/4
Updated by dustball over 6 years ago
It may be just me, but I haven't managed to get either of those patches working so far (by either I mean the first encoding_issue.patch, as well as deb822.patch + debpkgr.patch), both fail with the same issue.
Updated by mihai.ibanescu@gmail.com over 6 years ago
debpkgr 1.1.0 should include sbernhard's workaround until python-debian merges the fix for python2 dump().
Updated by mihai.ibanescu@gmail.com over 6 years ago
- Status changed from NEW to MODIFIED
Updated by daviddavis over 6 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE