Project

Profile

Help

Refactor #2169

closed

Suggestion: Start using string format instead of Django templating in generation of repodata

Added by fdobrovo over 7 years ago. Updated about 5 years ago.

Status:
CLOSED - WONTFIX
Priority:
Normal
Assignee:
-
Sprint/Milestone:
-
Start date:
Due date:
% Done:

0%

Estimated time:
Platform Release:
Groomed:
No
Sprint Candidate:
No
Tags:
Pulp 2
Sprint:
Quarter:

Description

I figured out that we could start using plain

"".format()
# or
"" % Dict()

Instead of using Django templateing.
It has a lot simpler escaping (see below) and in my measurements on 12 474 characters long changelog, format is about 22 times faster than Django templates and modulo about double of that.

It would be quite easy transition.

# DAJNGO
t = '<package pkgid="{{ pkgid }}" name="yum" arch="noarch">'
Template(t).render(Context({"pkgid": "some_package_id"}))
# FORMAT
t = '<package pkgid="{ pkgid }" name="yum" arch="noarch">'
t.format(**{"pkgid": "some_package_id"})
# MODULO
t = '<package pkgid="%(pkgid)s" name="yum" arch="noarch">'
t % {"pkgid": "some_package_id"}

Escaping would be also easier:

template = ('<tag>{some {{ var }}</tag><some_tag>text</some_tag>'
            '<tag>some {% tag %} {# comment #} } $(test)s</tag>')

# After Django escaping:
'<tag>{% templatetag openbrace %}some {% templatetag openvariable %}'
' var {% templatetag closevariable %}</tag><some_tag>text</some_tag>'
'<tag>some {% templatetag openblock %} tag '
'{% templatetag closeblock %} {% templatetag opencomment %} comment '
'{% templatetag closecomment %} {% templatetag closebrace %} $(test)s</tag>'

# After Format escaping:
'<tag>{{some {{{{ var }}}}</tag><some_tag>text</some_tag>'
'<tag>some {{% tag %}} {{# comment #}} }} $(test)s</tag>'

# After modulo escaping:
'<tag>{some {{ var }}</tag><some_tag>text</some_tag>'
'<tag>some {% tag %} {# comment #} } $$(test)s</tag>'

FORMAT:
The only escaping needed is doubling all { and }.
This Method was introduced in python 2.6
MODULO:
The only escaping needed is doubling all $.

FORMAT Drawback:
Format does not work well with non-ascii characters. Solution is to use "from future import unicode_literals" or convert all strings it uses to unicode.

MODULO Drawback:
Support only Dict or only positional arguments. Can't be used with both. But this probably does not bothers us?

Actions #1

Updated by cduryee over 7 years ago

How slow was the old way in terms of wall clock time?

Actions #2

Updated by fdobrovo over 7 years ago

(pulp_rpm) [vagrant@dev pulp_rpm]$ django-admin shell
WARNING: Attempting to work in a virtualenv. If you encounter problems, please install IPython inside the virtualenv.
Python 2.7.12 (default, Aug  9 2016, 15:48:18) 
Type "copyright", "credits" or "license" for more information.

IPython 3.2.1 -- An enhanced Interactive Python.

In [158]: len(django_headers) + len(django_template)   
Out[158]: 12205
In [130]: %timeit Template(RpmBase._escape_django_syntax_chars(django_headers + django_template, "package")).render(Context({"pkgid" : "951e0eacf3e6e6102b10acb2e689243b5866ec2c7720e783749dbd32f4a69ab3"}))
1000 loops, best of 3: 580 us per loop

In [160]: len(modulo_header) + len(modulo_template)
Out[160]: 12193
In [117]: %timeit mod_fin = modulo_header + modulo_template.replace("%", "%%"); mod_fin % {"pkgid" : "951e0eacf3e6e6102b10acb2e689243b5866ec2c7720e783749dbd32f4a69ab3"}
100000 loops, best of 3: 11.8 us per loop

I don't know how big can the repository be but I did some calculation just render:

Repo - 10 000 RPMs
OLD_WAY - 0.000001 * 580  * 10000 * 3 = 17.4s
NEW_WAY - 0.000001 * 11.8 * 10000 * 3 = 0.354s
Repo - 100 000 RPMs
OLD_WAY - 0.000001 * 580  * 100000 * 3 / 60 = 2 minutes 54.0 second
NEW_WAY - 0.000001 * 11.8 * 100000 * 3 = 3.54s
Repo - The bigest I could find have 1 229 491 RPMs
OLD_WAY - 0.000001 * 580  * 1229491 * 3 / 60 = 35 minutes 39.3 s
NEW_WAY - 0.000001 * 11.8 * 1229491 * 3 = 43.52 s

But I think the biggest benefit of this is that we don't need that sophisticated escaping.

Actions #3

Updated by fdobrovo over 7 years ago

  • Status changed from NEW to ASSIGNED
  • Assignee set to fdobrovo
Actions #4

Updated by fdobrovo over 7 years ago

  • Status changed from ASSIGNED to POST
Actions #5

Updated by fdobrovo over 7 years ago

  • Status changed from POST to NEW
  • Assignee deleted (fdobrovo)
Actions #6

Updated by bmbouter about 5 years ago

  • Status changed from NEW to CLOSED - WONTFIX

Pulp 2 is approaching maintenance mode, and this Pulp 2 ticket is not being actively worked on. As such, it is being closed as WONTFIX. Pulp 2 is still accepting contributions though, so if you want to contribute a fix for this ticket, please reopen or comment on it. If you don't have permissions to reopen this ticket, or you want to discuss an issue, please reach out via the developer mailing list.

Actions #7

Updated by bmbouter about 5 years ago

  • Tags Pulp 2 added

Also available in: Atom PDF