Issue #2775
closedpulp worker api uses reserved character @ in url route
Description
Problem:¶
Currently Worker name is being used as the natural key, this is good because worker lookup urls are informative and expressive.
Worker name currently is {service_name}@{host}. This is also good and makes a lot of sense.
The issue is that reserved characters should not be allowed in url path variables..
Django urlquotes the generated url [0] however since @ is a reserved character that could show up in urls, django ignores it.
Passing in a prequoted worker name would not work since django would urlquote the %40
Potential Solutions:¶
1.) Create a new drf field and overwrite the get_url[1] function to take the full url django returns, and urlquote only the last path variable. To me this seems like a hack, but it would theoretically work. This allows us to keep @ as the delimiting character
2.) Use a different delimiter. Some proposed ones are .~-_
3.) Structure the lookup url path to nest host and service name /workers/{host}/{service}/
The issue with this approach is spelled out in https://pulp.plan.io/issues/2775?pn=1#note-9
4.) The easiest solution would be to, of course, switch to using UUID for the worker lookup_field. We would lose the expressiveness of using the natural key, but can keep the @ in the worker name
Or there might be some other solution that I'm not seeing
[0] https://github.com/django/django/blob/stable/1.8.x/django/utils/encoding.py#L210
[1] https://github.com/encode/django-rest-framework/blob/ee1a9fcef659d0a5885712575562acb71a502f21/rest_framework/relations.py#L300
Updated by mhrivnak over 7 years ago
Can you elaborate on "Django urlquotes the generated url ", under what circumstance that happens? When is django generating the URL exactly?
And what exactly do you mean that django "ignores it"?
Updated by bizhang over 7 years ago
When DRF generates a URL for a hyperlinked field [0] it does so using django's reverse function [1].
Django's reverse function figures out the fully qualified URL and then passes it to a iri_to_uri function [2] which urlquotes it before returning. So every fully qualified URL DRF generates is passed through this way.
When iri_to_uri performs the url quote it passes it some safe characters to not urlquote [3]. This is because these characters are allowed to be present in URLs as reserved characters. In our case @ should be allowed in cases like user:password@dev.example.com/api/v3
but we should not be using it as a non reserved character (like in a url path variable in dev.example.com/api/v3/workers/worker@dev.example.com
).
Changing the worker name to worker%40dev.example.com
would result in Django urlquoting the % sign and returning worker%2540dev.example.com
[0] https://github.com/encode/django-rest-framework/blob/master/rest_framework/relations.py#L300
[1] https://github.com/django/django/blob/stable/1.8.x/django/core/urlresolvers.py#L524
[2] https://github.com/django/django/blob/stable/1.8.x/django/utils/encoding.py#L183
[3] https://github.com/django/django/blob/stable/1.8.x/django/utils/encoding.py#L210
Updated by mhrivnak over 7 years ago
Interesting! Thanks for that clarification.
I could see this as a django bug in iri_to_uri. The urrlib "quote" function is intended for use on one component of a URL at a time, usually just the path. In that case, it's perfectly safe to run with the default "safe" characters of simply "/".
That said, I'm also not sure I'd want to wait for django to fix it even if they accepted this as a bug.
Updated by bmbouter over 7 years ago
I like the sounds of the "it might be better to keep the worker service_name and the worker host as two separate fields in the worker model" option from the original description, but I don't have a concrete idea of the URL the user would be forming then. Can a concrete example be given with URLs for the idea of splitting the model into two fields?
Updated by bizhang over 7 years ago
I was thinking of using some other character to join the service and host name together to use that for the lookup url.
Maybe something like dev.example.com/api/v3/workers/worker_dev.example.com
?
Or if we get nested url working: dev.example.com/api/v3/workers/dev.example.com/worker
But the downside of this is that the user would have to click through each host in dev.example.com/api/v3/workers/
to see all the workers.
Updated by mhrivnak over 7 years ago
bizhang wrote:
Or if we get nested url working:
dev.example.com/api/v3/workers/dev.example.com/worker
But the downside of this is that the user would have to click through each host indev.example.com/api/v3/workers/
to see all the workers.
I think it would actually be even more cumbersome. We need to match the data returned with the right endpoint. If we're going to make an endpoint for an individual host, it should be within a "hosts" collection endpoint.
/api/v3/hosts/ # list all the hosts
/api/v3/hosts/dev.example.com/ # show a representation of the host
/api/v3/hosts/dev.example.com/workers/ # show the host's workers
/api/v3/hosts/dev.example.com/workers/0 # finally show a specific worker
This gets clumsy in a containerized world where each worker appears to have a unique hostname. If there are "n" workers in a cluster, you'd see "n" entries at /api/v3/hosts/, and then would require "n * 3" requests to view all the workers.
Although it would be reasonable to embed the worker list in the detail view for a host, so that could reduce it to "n * 2" requests.
I think this approach would be tolerable, but not ideal. It's really a shame that the @ character doesn't work, because the original plan does seem the most elegant.
Dot-separating the worker name and host could be almost as natural though.
resource_worker_0.dev.example.com
I think I prefer that as the natural key vs. making endpoints for hosts.
Updated by dalley over 7 years ago
mhrivnak, that's a great idea. I'll quickly point out, though, that the tilde character is also unreserved. So we could also do something like this:
resource_worker_0~dev.example.com
One problem I'll note with using tilde is that depending on the font used it can appear centered vertically, or it can appear top-aligned. From what I can tell nearly all fonts chose middle-aligned, except for a very few counterexamples like Calibri.
The benefit of this approach is that the worker name and host name can be split apart more easily.
Updated by bizhang over 7 years ago
+1 to using ~ as a delimiting character.
Here are the files that need changing if we go with a different delimiter:
https://github.com/pulp/devel/blob/6c52914c0c3e78732fb9f0981f9b408d5640dd99/ansible/roles/dev/templates/pulp_worker%40.j2#L13
https://github.com/pulp/pulp/blob/253ae59c9d80a4698c47a0608f96e5d81a62f0d7/platform/pulpcore/tasking/services/manage_workers.py#L23
https://github.com/pulp/pulp/blob/253ae59c9d80a4698c47a0608f96e5d81a62f0d7/platform/pulpcore/tasking/services/scheduler.py#L26
Updated by bmbouter over 7 years ago
What are the options to not use a convention which replaces '@' with some other character?
That is a convention that would need to be documented, read, and understood by all users. We should avoid that if at all possible. I'm not clear on what the alternatives are. I'm going to read this ticket more to look for alternatives as I try to grok the ones already written above.
Updated by daviddavis over 7 years ago
Personally I like option 4. I think we should consider not using natural keys in our URLs because it greatly limits things like repo names. If we have natural keys in our URLs, I can't for example have a space or an non-ASCII character in my repo name. That's going to be inconvenient for our users--especially non-English speaking ones.
Updated by bizhang over 7 years ago
@daviddavis, I think using the natural key for the worker make sense (whether or not it does for repos is a different discussion)
Personally I am a fan of using the natural key and having an expressive URL, and I think it make great sense for workers (since worker names should be ASCII based, and are unique)
Updated by daviddavis over 7 years ago
bizhang I'm not so sure. I kind of view them as part of the same discussion. I think it would be odd to be using natural keys for some records/URLs and primary keys for others. I would aim for consistency across our application unless there is sufficient reason not to be consistent.
Updated by bizhang over 7 years ago
I don't think non-ascii characters are an issue in the repo name.
$ http --auth admin:admin --json POST http://127.0.0.1:3000/api/v3/repositories/ name=我的数据 description=123456 scratchpad:='{"test": "asgfdds"}' importers:='[]' publishers:='[]' notes:='{}'
HTTP/1.0 201 CREATED
Allow: GET, POST, HEAD, OPTIONS
Content-Type: application/json
Date: Mon, 19 Jun 2017 18:03:36 GMT
Location: http://127.0.0.1:3000/api/v3/repositories/%E6%88%91%E7%9A%84%E6%95%B0%E6%8D%AE/
Server: WSGIServer/0.2 CPython/3.5.3
Vary: Accept, Cookie
X-Frame-Options: SAMEORIGIN
{
"_href": "http://127.0.0.1:3000/api/v3/repositories/%E6%88%91%E7%9A%84%E6%95%B0%E6%8D%AE/",
"content": "http://127.0.0.1:3000/api/v3/repositories/%E6%88%91%E7%9A%84%E6%95%B0%E6%8D%AE/content/",
"description": "123456",
"importers": [],
"last_content_added": null,
"last_content_removed": null,
"notes": {},
"publishers": [],
"scratchpad": {
"test": "asgfdds"
}
}
The issue with @
in the worker name is that Django considers @ 'safe' and does not urlquote it [0]
[0] https://github.com/django/django/blob/stable/1.8.x/django/utils/encoding.py#L210
Updated by mhrivnak over 7 years ago
daviddavis@redhat.com wrote:
bizhang I'm not so sure. I kind of view them as part of the same discussion. I think it would be odd to be using natural keys for some records/URLs and primary keys for others. I would aim for consistency across our application unless there is sufficient reason not to be consistent.
Given that we now know how to do nested paths successfully with DRF (I'll be filing tasks this week), we have no need to put PKs in URLs, and thus we can return to the goal we started with of only using natural keys as the public-facing unique identifier for each resource.
Updated by mhrivnak over 7 years ago
Thinking about this more, if we just pick some other character as the separator, that's an implementation detail that we can change at any time in the future without the API being compromised. Perhaps we should just pick a character for now, file the bug with django, maybe even fix the bug in django, and then go back to using the @ sign after it becomes available.
Updated by bmbouter over 7 years ago
From an RFC perspective, this SO post makes a claim that having '@' in urls adheres to the RFC and that you don't need to percent encode them. That suggests that having Django ignore them at all parts of the url (both the domain and non-domain) is right and not a Django bug.
Since it seems correct urls can contain an '@', we should strive to do that. Is there a traceback that occurs if you try to use DRF+Django w/ the Worker model and a natural key in the url with a worker name like "dev@example.com" in it?
Also picking a character now and switching it post release would cause some integrations against the API to break so we can't pick one, release 3.0, and switch it later without breaking semver.
Updated by mhrivnak over 7 years ago
bmbouter wrote:
From an RFC perspective, this SO post makes a claim that having '@' in urls adheres to the RFC and that you don't need to percent encode them. That suggests that having Django ignore them at all parts of the url (both the domain and non-domain) is right and not a Django bug.
Since it seems correct urls can contain an '@', we should strive to do that. Is there a traceback that occurs if you try to use DRF+Django w/ the Worker model and a natural key in the url with a worker name like "dev@example.com" in it?
That's an interesting argument, and it seems to make sense when I re-read the RFCs. Maybe this isn't a problem at all for us?
Also picking a character now and switching it post release would cause some integrations against the API to break so we can't pick one, release 3.0, and switch it later without breaking semver.
If users aren't supplying the natural key, which I don't expect they will, then it's created by Pulp. From an API perspective, there is a field on Worker called "name", and it will be unique. We won't make any particular guarantees about it beyond it being a unique string. We can change how Pulp composes that string without breaking semver.
But of course if we can just use "@", then changing would be unnecessary.
Updated by bizhang over 7 years ago
- Status changed from NEW to CLOSED - NOTABUG
This is not an issue according to https://stackoverflow.com/questions/19509028/can-i-use-an-at-symbol-inside-urls/19737890#19737890
Therefore closing as notabug.