Project

Profile

Help

Issue #3499

closed

Attempting to sync Python projects with a lot of packages from PyPI causes sqlite errors

Added by dalley about 6 years ago. Updated over 4 years ago.

Status:
CLOSED - CURRENTRELEASE
Priority:
Normal
Assignee:
Category:
-
Sprint/Milestone:
Start date:
Due date:
Estimated time:
Severity:
2. Medium
Version:
Platform Release:
OS:
Triaged:
Yes
Groomed:
No
Sprint Candidate:
No
Tags:
Sprint:
Sprint 34
Quarter:

Description

Trying to sync only 4 projects (Django, Scipy, Numpy, Pulpcore) will cause fatal sqlite OperationalErrors due to large query size. This has nothing to do with the "database is locked" OperationalErrors we saw recently due to concurrency.

I don't think this is specifically a python plugin problem. The changeset code is creating queries too large for sqlite to handle which is in core's domain.

Even by "sqlite is for development use only, don't use in production" standards, IMO this is very limiting. As a user I wouldn't expect Pulp to fall over syncing only a couple of projects, even if it expands to a couple hundred packages (I haven't counted but that sounds reasonable).

Importer settings:

http POST http://localhost:8000/api/v3/importers/python/ name='bar' download_policy='immediate' sync_mode='additive' repository=$REPO_HREF feed_url='https://pypi.python.org/pypi' projects='["D
jango", "pulpcore", "scipy", "numpy"]'  

Error output:

Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,079: ERROR/ForkPoolWorker-1] Task failed : [61cf9bd4-456b-41e7-a020-5d6984dfe5f4]                                                                    
Mar 18 01:10:40 pulp3.dev celery[25219]: [2018-03-18 01:10:40,090: ERROR/ForkPoolWorker-1] Task pulp_python.app.tasks.sync.sync[61cf9bd4-456b-41e7-a020-5d6984dfe5f4] raised unexpected: OperationalError('Expressi
on tree is too large (maximum depth 1000)',)                                                                                                                                                                       
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):                                                                                                                                        
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute                                                   
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)                                                                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute                                           
Mar 18 01:10:40 pulp3.dev celery[25219]:     return Database.Cursor.execute(self, query, params)                                                                                                                   
Mar 18 01:10:40 pulp3.dev celery[25219]: sqlite3.OperationalError: Expression tree is too large (maximum depth 1000)                                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]: The above exception was the direct cause of the following exception:                                                                                                      
Mar 18 01:10:40 pulp3.dev celery[25219]: Traceback (most recent call last):                                                                                                                                        
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 374, in trace_task                                                       
Mar 18 01:10:40 pulp3.dev celery[25219]:     R = retval = fun(*args, **kwargs)                                                                                                                                     
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/pulpcore/pulpcore/tasking/tasks.py", line 273, in __call__                                                                               
Mar 18 01:10:40 pulp3.dev celery[25219]:     return super(UserFacingTask, self).__call__(*args, **kwargs)  
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/celery/app/trace.py", line 629, in __protected_call__                                        [0/353]
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.run(*args, **kwargs)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp_python/pulp_python/app/tasks/sync.py", line 62, in sync
Mar 18 01:10:40 pulp3.dev celery[25219]:     changeset.apply_and_drain()
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 218, in apply_and_drain
Mar 18 01:10:40 pulp3.dev celery[25219]:     for x in self.apply():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 196, in apply
Mar 18 01:10:40 pulp3.dev celery[25219]:     for report in itertools.chain(self._apply_additions(), self._apply_removals()):
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/main.py", line 138, in _apply_additions
Mar 18 01:10:40 pulp3.dev celery[25219]:     for artifact, download in downloads:
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 283, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch in BatchIterator(downloads, self.concurrent):
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 282, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     downloads = ((a, a.downloader) for a in artifacts)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 232, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch in self._batch_artifacts():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 49, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     batch = tuple(itertools.islice(generator, 0, self.batch))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 47, in <genexpr>
Mar 18 01:10:40 pulp3.dev celery[25219]:     generator = (c for c in self.iterable)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 186, in build
Mar 18 01:10:40 pulp3.dev celery[25219]:     for content in self.content:
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 129, in _iter
Mar 18 01:10:40 pulp3.dev celery[25219]:     for batch, fetched in self._fetch():
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/devel/pulp/plugin/pulpcore/plugin/changeset/iterator.py", line 117, in _fetch
Mar 18 01:10:40 pulp3.dev celery[25219]:     fetched = {c.natural_key(): c for c in q_set}
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 250, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     self._fetch_all()
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 1118, in _fetch_all
Mar 18 01:10:40 pulp3.dev celery[25219]:     self._result_cache = list(self._iterable_class(self))
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/query.py", line 53, in __iter__
Mar 18 01:10:40 pulp3.dev celery[25219]:     results = compiler.execute_sql(chunked_fetch=self.chunked_fetch)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 899, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]:     raise original_exception
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 889, in execute_sql
Mar 18 01:10:40 pulp3.dev celery[25219]:     cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 79, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return super(CursorDebugWrapper, self).execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/utils.py", line 94, in __exit__
Mar 18 01:10:40 pulp3.dev celery[25219]:     six.reraise(dj_exc_type, dj_exc_value, traceback)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/utils/six.py", line 685, in reraise
Mar 18 01:10:40 pulp3.dev celery[25219]:     raise value.with_traceback(tb)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/utils.py", line 64, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return self.cursor.execute(sql, params)
Mar 18 01:10:40 pulp3.dev celery[25219]:   File "/home/vagrant/.virtualenvs/pulp/lib64/python3.6/site-packages/django/db/backends/sqlite3/base.py", line 328, in execute
Mar 18 01:10:40 pulp3.dev celery[25219]:     return Database.Cursor.execute(self, query, params)
Mar 18 01:10:40 pulp3.dev celery[25219]: django.db.utils.OperationalError: Expression tree is too large (maximum depth 1000)
Actions #1

Updated by dalley about 6 years ago

  • Subject changed from Attempting to sync 4 packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync packages with lots of distributions from PyPI causes sqlite errors
Actions #2

Updated by dalley about 6 years ago

  • Project changed from Python Support to Pulp
  • Description updated (diff)
Actions #3

Updated by dalley about 6 years ago

  • Description updated (diff)
Actions #4

Updated by dalley about 6 years ago

  • Description updated (diff)
Actions #5

Updated by dalley about 6 years ago

  • Subject changed from Attempting to sync packages with lots of distributions from PyPI causes sqlite errors to Attempting to sync Python projects with a lot of packages from PyPI causes sqlite errors
Actions #6

Updated by dalley about 6 years ago

  • Description updated (diff)
Actions #7

Updated by dalley about 6 years ago

  • Description updated (diff)
Actions #8

Updated by dalley about 6 years ago

Those 4 packages collectively represent something close to the borderline between working and failing - dropping one of them will result in a sync that succeeds.

Actions #9

Updated by daviddavis about 6 years ago

This is a limitation of sqlite in that a query can only have so many expressions. Oracle has the same limitation of 1000.

One solution would be to slice up the query.

Actions #10

Updated by dalley about 6 years ago

If that is the case then would it also be likely that we would run into the same problem with Postgres if scaled up a bit?

(Honestly, I haven't tested this with Postgres, so perhaps we would hit the same problem with this same query)

Actions #11

Updated by daviddavis about 6 years ago

I don't think PostgreSQL has an expression limit or at least I haven't hit it.

Actions #12

Updated by dalley about 6 years ago

You're right, the limit for Postgres is 1gb which is effectively nonexistant.

Actions #13

Updated by bmbouter about 6 years ago

I think sqlite is pointing out that our query is complicated. I would see a solution that would lower our code's query complexity in this area.

Actions #14

Updated by dalley about 6 years ago

  • Status changed from NEW to MODIFIED
Actions #15

Updated by dalley about 6 years ago

  • Status changed from MODIFIED to POST
  • Assignee set to dalley

Added by dalley about 6 years ago

Revision 5ade473a | View on GitHub

Set batch size to 800 to resolve sqlite issues

Prevents Pulp from running into sqlite's evaluation tree size limit.

closes #3499 https://pulp.plan.io/issues/3499

Added by dalley about 6 years ago

Revision 5ade473a | View on GitHub

Set batch size to 800 to resolve sqlite issues

Prevents Pulp from running into sqlite's evaluation tree size limit.

closes #3499 https://pulp.plan.io/issues/3499

Actions #16

Updated by dalley about 6 years ago

  • Triaged changed from No to Yes
  • Sprint set to Sprint 34
Actions #17

Updated by dalley about 6 years ago

  • Status changed from POST to MODIFIED

Added by bmbouter about 6 years ago

Revision 9727d21a | View on GitHub

Uses smaller Changeset batches

This resolves an issue where sqlite was emitting 'Too Many SQL Variables' error when mirroring parts of Galaxy.

re #3499 https://pulp.plan.io/issues/3499

Added by bmbouter about 6 years ago

Revision 9727d21a | View on GitHub

Uses smaller Changeset batches

This resolves an issue where sqlite was emitting 'Too Many SQL Variables' error when mirroring parts of Galaxy.

re #3499 https://pulp.plan.io/issues/3499

Actions #18

Updated by dkliban@redhat.com about 6 years ago

  • Sprint/Milestone set to 3.0.0
Actions #19

Updated by bmbouter almost 5 years ago

  • Tags deleted (Pulp 3, Pulp 3 MVP)
Actions #20

Updated by bmbouter over 4 years ago

  • Status changed from MODIFIED to CLOSED - CURRENTRELEASE

Also available in: Atom PDF