Project

Profile

Help

Issue #3499

Attempting to sync Python projects with a lot of packages from PyPI causes sqlite errors

Added by dalley over 1 year ago. Updated 6 months ago.

Status:
MODIFIED
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Severity:
2. Medium
Version:
Platform Release:
Blocks Release:
OS:
Backwards Incompatible:
No
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
QA Contact:
Complexity:
Smash Test:
Verified:
No
Verification Required:
No
Sprint:
Sprint 34

Description

Trying to sync only 4 projects (Django, Scipy, Numpy, Pulpcore) will cause fatal sqlite OperationalErrors due to large query size. This has nothing to do with the "database is locked" OperationalErrors we saw recently due to concurrency.

I don't think this is specifically a python plugin problem. The changeset code is creating queries too large for sqlite to handle which is in core's domain.

Even by "sqlite is for development use only, don't use in production" standards, IMO this is very limiting. As a user I wouldn't expect Pulp to fall over syncing only a couple of projects, even if it expands to a couple hundred packages (I haven't counted but that sounds reasonable).

Importer settings:

http POST http://localhost:8000/api/v3/importers/python/ name='bar' download_policy='immediate' sync_mode='additive' repository=$REPO_HREF feed_url='https://pypi.python.org/pypi' projects='["D
jango", "pulpcore", "scipy", "numpy"]'  

Error output:

Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,079: ERROR/ForkPoolWorker-1] Task failed : [61cf9bd4-456b-41e7-a020-5d6984dfe5f4]                                                                    
Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,090: ERROR/ForkPoolWorker-1] Task pulp_python.app.tasks.sync.sync[61cf9bd4-456b-41e7-a020-5d6984dfe5f4] raised unexpected: OperationalError('Expressi
on tree is too large (maximum depth 1000)',)                                                                                                                                                                       
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):                                                                                                                                        
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute                                                   
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)                                                                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute                                           
Mar 18 01:10:40 pulp3.dev celery[25219]:     return Database.Cursor.execute(self, query, params)                                                                                                                   
Mar 18 01:10:40 pulp3.dev celery[25219]: sqlite3.OperationalError: Expression tree is too large (maximum depth 1000)                                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]: The above exception was the direct cause of the following exception:                                                                                                      
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):                                                                                                                                        
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task                                                       
Mar 18 01:10:40 pulp3.dev celery[25219]:     R = retval = fun(*args, **kwargs)                                                                                                                                     
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/pulpcore/pulpcore/tasking/tasks.py", line 273, in __call__                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]:     return super(UserFacingTask, self).__call__(*args, **kwargs)  
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__                                        [0/353]
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.run(*args, **kwargs)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp_python/pulp_python/app/tasks/sync.py", line 62, in sync
Mar 18 01:10:40 pulp3.dev celery[25219]:     changeset.apply_and_drain()
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 218, in apply_and_drain
Mar 18 01:10:40 pulp3.dev celery[25219]:     for x in self.apply():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 196, in apply
Mar 18 01:10:40 pulp3.dev celery[25219]:     for report in itertools.chain(self._apply_additions(), self._apply_removals()):
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 138, in _apply_additions
Mar 18 01:10:40 pulp3.dev celery[25219]:     for artifact, download in downloads:
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 283, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch in BatchIterator(downloads, self.concurrent):
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 282, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     downloads = ((a, a.downloader) for a in artifacts)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 232, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch in self._batch_artifacts():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 186, in build
Mar 18 01:10:40 pulp3.dev celery[25219]:     for content in self.content:
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 129, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch, fetched in self._fetch():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 117, in _fetch
Mar 18 01:10:40 pulp3.dev celery[25219]:     fetched = {c.natural_key(): c for c in q_set}
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 250, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     self._fetch_all()
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 1118, in _fetch_all
Mar 18 01:10:40 pulp3.dev celery[25219]:     self._result_cache = list(self._iterable_class(self))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]:     raise original_exception
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]:     cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 79, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return super(CursorDebugWrapper, self).execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
Mar 18 01:10:40 pulp3.dev celery[25219]:     six.reraise(dj_exc_type, dj_exc_value, traceback)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/utils/six.py", line 685, in reraise
Mar 18 01:10:40 pulp3.dev celery[25219]:     raise value.with_traceback(tb)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return Database.Cursor.execute(self, query, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: django.db.utils.OperationalError: Expression tree is too large (maximum depth 1000)

Associated revisions

Revision 5ade473a View on GitHub
Added by dalley over 1 year ago

Set batch size to 800 to resolve sqlite issues

Prevents Pulp from running into sqlite's evaluation tree size limit.

closes #3499
https://pulp.plan.io/issues/3499

Revision 5ade473a View on GitHub
Added by dalley over 1 year ago

Set batch size to 800 to resolve sqlite issues

Prevents Pulp from running into sqlite's evaluation tree size limit.

closes #3499
https://pulp.plan.io/issues/3499

Revision 5ade473a View on GitHub
Added by dalley over 1 year ago

Set batch size to 800 to resolve sqlite issues

Prevents Pulp from running into sqlite's evaluation tree size limit.

closes #3499
https://pulp.plan.io/issues/3499

Revision 9727d21a View on GitHub
Added by bmbouter over 1 year ago

Uses smaller Changeset batches

This resolves an issue where sqlite was emitting 'Too Many SQL
Variables' error when mirroring parts of Galaxy.

re #3499
https://pulp.plan.io/issues/3499

Revision 9727d21a View on GitHub
Added by bmbouter over 1 year ago

Uses smaller Changeset batches

This resolves an issue where sqlite was emitting 'Too Many SQL
Variables' error when mirroring parts of Galaxy.

re #3499
https://pulp.plan.io/issues/3499

Revision 9727d21a View on GitHub
Added by bmbouter over 1 year ago

Uses smaller Changeset batches

This resolves an issue where sqlite was emitting 'Too Many SQL
Variables' error when mirroring parts of Galaxy.

re #3499
https://pulp.plan.io/issues/3499

History

#1 Updated by dalley over 1 year ago

  • Subject changed from Attempting to sync 4 packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync packages with lots of distributions from PyPI causes sqlite errors

#2 Updated by dalley over 1 year ago

  • Project changed from Python Support to Pulp
  • Description updated (diff)

#3 Updated by dalley over 1 year ago

  • Description updated (diff)

#4 Updated by dalley over 1 year ago

  • Description updated (diff)

#5 Updated by dalley over 1 year ago

  • Subject changed from Attempting to sync packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync Python projects with a lot of packages from PyPI causes sqlite errors

#6 Updated by dalley over 1 year ago

  • Description updated (diff)

#7 Updated by dalley over 1 year ago

  • Description updated (diff)

#8 Updated by dalley over 1 year ago

Those 4 packages collectively represent something close to the borderline between working and failing - dropping one of them will result in a sync that succeeds.

#9 Updated by daviddavis over 1 year ago

This is a limitation of sqlite in that a query can only have so many expressions. Oracle has the same limitation of 1000.

One solution would be to slice up the query.

#10 Updated by dalley over 1 year ago

If that is the case then would it also be likely that we would run into the same problem with Postgres if scaled up a bit?

(Honestly, I haven't tested this with Postgres, so perhaps we would hit the same problem with this same query)

#11 Updated by daviddavis over 1 year ago

I don't think PostgreSQL has an expression limit or at least I haven't hit it.

#12 Updated by dalley over 1 year ago

You're right, the limit for Postgres is 1gb which is effectively nonexistant.

#13 Updated by bmbouter over 1 year ago

I think sqlite is pointing out that our query is complicated. I would see a solution that would lower our code's query complexity in this area.

#14 Updated by dalley over 1 year ago

  • Status changed from NEW to MODIFIED

#15 Updated by dalley over 1 year ago

  • Status changed from MODIFIED to POST
  • Assignee set to dalley

#16 Updated by dalley over 1 year ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 34

#17 Updated by dalley over 1 year ago

  • Status changed from POST to MODIFIED

#18 Updated by dkliban@redhat.com over 1 year ago

  • Sprint/Milestone set to 3.0

#19 Updated by bmbouter 6 months ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)

Please register to edit this issue

Also available in: Atom PDF