Fix #116280: bpy.ops.wm.url_open() cannot open file:/// #116295

Dalai Felinto · 2023-12-18T12:40:21+01:00

Dalai Felinto commented

2023-12-18 12:40:21 +01:00

This issue was introduced in a15c637e63798f61aebb777aca54acaed3bf562c.

Dalai Felinto added 1 commit 2023-12-18 12:40:27 +01:00

88a3606b33 Fix #116280 : bpy.ops.wm.url_open() cannot open file:///

This issue was introduced on a15c637e63.

Iliya Katushenock added this to the Python API project 2023-12-18 12:44:51 +01:00

Dalai Felinto requested review from Sergey Sharybin 2023-12-18 12:46:43 +01:00

Dalai Felinto requested review from Ray molenkamp 2023-12-18 12:46:43 +01:00

Dalai Felinto commented

2023-12-18 12:47:03 +01:00

I tagged the two original reviewers of the PR as reviewers here, but the change is rather simple.

Falk David reviewed 2023-12-18 12:51:29 +01:00

scripts/startup/bl_operators/wm.py Outdated

						
@@ -1031,3 +1031,3 @@
         # Make sure we have a scheme otherwise we can't parse the url.
-        if not url.startswith(("http://", "https://")):
+        if "://" not in url:

Falk David commented

2023-12-18 12:51:03 +01:00

I think it's cleaner to check for the expected protocols.
E.g

expected_protocols = ("http://", "https://", "file:///")
if not url.startswith(expected_protocols):
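
For comparison, here is a minimal sketch of how this allow-list check would behave. The helper name `has_expected_protocol` and the example URLs are hypothetical; only the tuple of prefixes and the `startswith` call come from the suggestion above.

```python
# Hypothetical helper illustrating the allow-list check suggested above.
EXPECTED_PROTOCOLS = ("http://", "https://", "file:///")


def has_expected_protocol(url: str) -> bool:
    # str.startswith accepts a tuple and is True if any prefix matches.
    return url.startswith(EXPECTED_PROTOCOLS)


print(has_expected_protocol("file:///tmp/manual.html"))  # True
print(has_expected_protocol("ftp://example.org/file"))   # False: not in the list
print(has_expected_protocol("blender.org"))              # False: no scheme at all
```

Any scheme missing from the tuple is treated the same as a URL with no scheme at all, which is the trade-off raised in the replies below.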

Dalai Felinto commented

2023-12-18 12:56:28 +01:00

@filedescriptor But where do we stop? How about gitea://, ftp://, ...?

Falk David commented

2023-12-18 13:04:05 +01:00

@dfelinto Well you can add all the protocols that make sense right? There aren't that many..

Sybren A. Stüvel reviewed 2023-12-18 13:09:54 +01:00

Sybren A. Stüvel left a comment

I think there's a bunch of issues with the surrounding code.

# Make sure we have a scheme otherwise we can't parse the url.

This is misleading, as this works fine:

>>> import urllib.parse

>>> urllib.parse.urlparse('blender')
ParseResult(scheme='', netloc='', path='blender', params='', query='', fragment='')

>>> urllib.parse.urlparse('https://blender')
ParseResult(scheme='https', netloc='blender', path='', params='', query='', fragment='')

You could argue that without a scheme the string should go into the netloc field, but that's beside the point. Without a scheme, the URL can be parsed just fine.

The second issue is the assumption that, if the URL doesn't start with http:// or https://, prepending that string will produce a valid URL. This, again, is not true in general. Checking for :// helps, in that it's a more general check and will likely work better, but I don't think it's the right approach either, as :// can appear in any other part of the URL as well.
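
To illustrate the last point, here is a small sketch with a made-up URL (the `blender.org/redirect` path and `example.com` target are assumptions for demonstration only). The substring check is fooled by a URL embedded in the query string, while `urlparse()` reports no scheme.

```python
import urllib.parse

# Hypothetical scheme-less URL whose query string embeds another URL.
url = "blender.org/redirect?to=https://example.com"

# The substring check finds "://" inside the query string...
has_separator = "://" in url
print(has_separator)  # True

# ...but urlparse() finds no scheme, because the text before the first
# ":" contains characters ("/", "?", "=") that a scheme cannot contain.
scheme = urllib.parse.urlparse(url).scheme
print(repr(scheme))  # ''
```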

Why not use urlparse() to do the parsing for us?

parsed_url = urllib.parse.urlparse(url)
if not parsed_url.scheme:
    url = f'https://{url}'
parsed_url = urllib.parse.urlparse(url)
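
Wrapped in a function, the snippet above can be tried against the `file:///` case from the bug report. This is only a sketch: the name `ensure_scheme` and the example paths are hypothetical, not taken from `wm.py`.

```python
import urllib.parse


def ensure_scheme(url: str) -> str:
    # Hypothetical wrapper around the snippet above: prepend https://
    # only when urlparse() finds no scheme.
    if not urllib.parse.urlparse(url).scheme:
        url = f"https://{url}"
    return url


print(ensure_scheme("blender.org"))             # https://blender.org
print(ensure_scheme("http://blender.org"))      # http://blender.org
print(ensure_scheme("file:///tmp/index.html"))  # file:///tmp/index.html
```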

👍 1

Ray molenkamp commented

2023-12-18 14:38:14 +01:00

this feels like a job a regex would excel at?

Stephen Boddy commented

2023-12-18 14:54:54 +01:00

First-time contributor

> parsed_url = urllib.parse.urlparse(url)
> if not parsed_url.scheme:
>     url = f'https://{url}'
> parsed_url = urllib.parse.urlparse(url)

Wouldn't you want to condition that prefix on the parsed URL having an actual netloc as well? Otherwise your 'blender' example is going to turn the path of blender into http://blender where the path becomes the domain.

(I have no idea what this code is driven by, or how invalid the passed args could get, just noticed the potential.)
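
The effect described here can be reproduced with the same `'blender'` input used earlier in the thread. This is a sketch for illustration only; the variable names are arbitrary.

```python
import urllib.parse

# Without a scheme, urlparse() puts the whole string into `path`.
before = urllib.parse.urlparse("blender")
print(repr(before.netloc), repr(before.path))  # '' 'blender'

# After prefixing a scheme, the same text is parsed as the host.
after = urllib.parse.urlparse("https://" + "blender")
print(repr(after.netloc), repr(after.path))  # 'blender' ''
```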

Dalai Felinto added 1 commit 2023-12-18 15:24:00 +01:00

428f0200e1 From review: use urlparse to get the scheme

Dalai Felinto commented

2023-12-18 15:24:41 +01:00

Incorporated @dr.sybren suggestion with some changes.

Sybren A. Stüvel commented

2023-12-18 15:57:35 +01:00

> Wouldn't you want to condition that prefix on the parsed URL having an actual netloc as well? Otherwise your 'blender' example is going to turn the path of blender into http://blender where the path becomes the domain.

This confusion is what I was fearing when I wrote "You could argue that without scheme the string should go into the netloc field". For me (and AFAIK all web browsers), blender.org is a valid URL and should be interpreted as the netloc. That's not how Python's urlparse function works, though.

> (I have no idea what this code is driven by, or how invalid the passed args could get, just noticed the potential.)

I suspect that there's only one use case for this particular piece of code, and that's turning a scheme-less URL into one with an explicit scheme. And so the logic of "if there is no scheme, chuck https:// in front of it" seems pretty stable. I doubt there will be any scheme-relative (//blender.org/path) URLs to handle.
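
For reference, this is roughly what would happen if a scheme-relative URL did reach that logic. It is a sketch of the edge case only, not something the merged code is known to encounter; the variable names are arbitrary.

```python
import urllib.parse

relative = "//blender.org/path"

# A scheme-relative URL has no scheme, but urlparse() does find a netloc.
parsed = urllib.parse.urlparse(relative)
print(repr(parsed.scheme), parsed.netloc, parsed.path)  # '' blender.org /path

# Blindly prefixing "https://" yields "https:////blender.org/path",
# which parses with an empty netloc and the host pushed into the path.
prefixed = urllib.parse.urlparse(f"https://{relative}")
print(repr(prefixed.netloc), prefixed.path)  # '' //blender.org/path
```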

> Incorporated @dr.sybren suggestion with some changes.

👍 LGTM!

Dalai Felinto commented

2023-12-18 16:43:44 +01:00

I'm taking Sybren's review as final. Thanks everyone for pitching in

Dalai Felinto merged commit 63e9cead5f into blender-v4.0-release

2023-12-18 16:44:46 +01:00

Dalai Felinto referenced this issue from a commit

2023-12-18 16:44:47 +01:00

Fix #116280: bpy.ops.wm.url_open() cannot open file:///

Dalai Felinto deleted branch fix-url-open

2023-12-18 16:44:50 +01:00

Dalai Felinto referenced this issue from a commit

2023-12-18 16:46:55 +01:00