Issue #3499
closedAttempting to sync Python projects with a lot of packages from PyPI causes sqlite errors
Description
Trying to sync only 4 projects (Django, Scipy, Numpy, Pulpcore) will cause fatal sqlite OperationalErrors due to large query size. This has nothing to do with the "database is locked" OperationalErrors we saw recently due to concurrency.
I don't think this is specifically a python plugin problem. The changeset code is creating queries too large for sqlite to handle which is in core's domain.
Even by "sqlite is for development use only, don't use in production" standards, IMO this is very limiting. As a user I wouldn't expect Pulp to fall over syncing only a couple of projects, even if it expands to a couple hundred packages (I haven't counted but that sounds reasonable).
Importer settings:
http POST http://localhost:8000/api/v3/importers/python/ name='bar' download_policy='immediate' sync_mode='additive' repository=$REPO_HREF feed_url='https://pypi.python.org/pypi' projects='["D
jango", "pulpcore", "scipy", "numpy"]'
Error output:
Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,079: ERROR/ForkPoolWorker-1] Task failed : [61cf9bd4-456b-41e7-a020-5d6984dfe5f4]
Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,090: ERROR/ForkPoolWorker-1] Task pulp_python.app.tasks.sync.sync[61cf9bd4-456b-41e7-a020-5d6984dfe5f4] raised unexpected: OperationalError('Expressi
on tree is too large (maximum depth 1000)',)
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return Database.Cursor.execute(self, query, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: sqlite3.OperationalError: Expression tree is too large (maximum depth 1000)
Mar 18 01:10:40 pulp3.dev celery[25219]: The above exception was the direct cause of the following exception:
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task
Mar 18 01:10:40 pulp3.dev celery[25219]: R = retval = fun(*args, **kwargs)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/pulpcore/pulpcore/tasking/tasks.py", line 273, in __call__
Mar 18 01:10:40 pulp3.dev celery[25219]: return super(UserFacingTask, self).__call__(*args, **kwargs)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__ [0/353]
Mar 18 01:10:40 pulp3.dev celery[25219]: return self.run(*args, **kwargs)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp_python/pulp_python/app/tasks/sync.py", line 62, in sync
Mar 18 01:10:40 pulp3.dev celery[25219]: changeset.apply_and_drain()
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 218, in apply_and_drain
Mar 18 01:10:40 pulp3.dev celery[25219]: for x in self.apply():
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 196, in apply
Mar 18 01:10:40 pulp3.dev celery[25219]: for report in itertools.chain(self._apply_additions(), self._apply_removals()):
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 138, in _apply_additions
Mar 18 01:10:40 pulp3.dev celery[25219]: for artifact, download in downloads:
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 283, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]: for batch in BatchIterator(downloads, self.concurrent):
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]: batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]: generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 282, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]: downloads = ((a, a.downloader) for a in artifacts)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 232, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]: for batch in self._batch_artifacts():
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]: batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]: generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 186, in build
Mar 18 01:10:40 pulp3.dev celery[25219]: for content in self.content:
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 129, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]: for batch, fetched in self._fetch():
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 117, in _fetch
Mar 18 01:10:40 pulp3.dev celery[25219]: fetched = {c.natural_key(): c for c in q_set}
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 250, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]: self._fetch_all()
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 1118, in _fetch_all
Mar 18 01:10:40 pulp3.dev celery[25219]: self._result_cache = list(self._iterable_class(self))
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]: results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]: raise original_exception
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]: cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 79, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return super(CursorDebugWrapper, self).execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
Mar 18 01:10:40 pulp3.dev celery[25219]: six.reraise(dj_exc_type, dj_exc_value, traceback)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/utils/six.py", line 685, in reraise
Mar 18 01:10:40 pulp3.dev celery[25219]: raise value.with_traceback(tb)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]: return Database.Cursor.execute(self, query, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: django.db.utils.OperationalError: Expression tree is too large (maximum depth 1000)
Updated by dalley almost 7 years ago
- Subject changed from Attempting to sync 4 packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync packages with lots of distributions from PyPI causes sqlite errors
Updated by dalley almost 7 years ago
- Project changed from Python Support to Pulp
- Description updated (diff)
Updated by dalley almost 7 years ago
- Subject changed from Attempting to sync packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync Python projects with a lot of packages from PyPI causes sqlite errors
Updated by dalley almost 7 years ago
Those 4 packages collectively represent something close to the borderline between working and failing - dropping one of them will result in a sync that succeeds.
Updated by daviddavis almost 7 years ago
This is a limitation of sqlite in that a query can only have so many expressions. Oracle has the same limitation of 1000.
One solution would be to slice up the query.
Updated by dalley almost 7 years ago
If that is the case then would it also be likely that we would run into the same problem with Postgres if scaled up a bit?
(Honestly, I haven't tested this with Postgres, so perhaps we would hit the same problem with this same query)
Updated by daviddavis almost 7 years ago
I don't think PostgreSQL has an expression limit or at least I haven't hit it.
Updated by dalley almost 7 years ago
You're right, the limit for Postgres is 1gb which is effectively nonexistant.
Updated by bmbouter almost 7 years ago
I think sqlite is pointing out that our query is complicated. I would see a solution that would lower our code's query complexity in this area.
Updated by dalley almost 7 years ago
- Status changed from MODIFIED to POST
- Assignee set to dalley
Added by dalley almost 7 years ago
Added by dalley almost 7 years ago
Revision 5ade473a | View on GitHub
Set batch size to 800 to resolve sqlite issues
Prevents Pulp from running into sqlite's evaluation tree size limit.
Updated by dalley almost 7 years ago
- Triaged changed from No to Yes
- Sprint set to Sprint 34
Updated by dalley almost 7 years ago
- Status changed from POST to MODIFIED
Applied in changeset pulp|5ade473a0325446d7d66d007d1de71f00b731ea7.
Added by bmbouter almost 7 years ago
Revision 9727d21a | View on GitHub
Uses smaller Changeset batches
This resolves an issue where sqlite was emitting 'Too Many SQL Variables' error when mirroring parts of Galaxy.
Added by bmbouter almost 7 years ago
Revision 9727d21a | View on GitHub
Uses smaller Changeset batches
This resolves an issue where sqlite was emitting 'Too Many SQL Variables' error when mirroring parts of Galaxy.
Updated by bmbouter about 5 years ago
- Status changed from MODIFIED to CLOSED - CURRENTRELEASE
Set batch size to 800 to resolve sqlite issues
Prevents Pulp from running into sqlite's evaluation tree size limit.
closes #3499 https://pulp.plan.io/issues/3499