Resolve relative URLs within RSS article description #21943
base: master
Only some preliminary comments.
.gitignore
Outdated
@@ -1,4 +1,5 @@
.vscode/
.cache
If you believe that the qBittorrent .gitignore should contain it, you need to open a separate PR describing why this needs to be done at the project level. Otherwise you could just add it to your local .gitignore file.
Thanks, I'm using the clangd extension and it produces many files under .cache/. I'll draft another PR later.
src/gui/rss/rsswidget.cpp
Outdated
QString normalizeBasePath = basePath.endsWith(u'/') ? basePath : basePath + u'/';
QRegularExpressionMatchIterator iter = rx.globalMatch(html);

while (iter.hasNext()) {
Invalid coding style.
- while (iter.hasNext()) {
+ while (iter.hasNext())
+ {
src/gui/rss/rsswidget.cpp
Outdated
rx.setPattern(
    uR"(((<a\s+[^>]*?href|<img\s+[^>]*?src)\s*=\s*["'])((https?|ftp):)?(\/\/[^\/]*)?(\/?[^\/"].*?)(["']))"_s);

QString normalizeBasePath = basePath.endsWith(u'/') ? basePath : basePath + u'/';
- QString normalizeBasePath = basePath.endsWith(u'/') ? basePath : basePath + u'/';
+ QString normalizedBasePath = basePath.endsWith(u'/') ? basePath : basePath + u'/';
src/gui/rss/rsswidget.cpp
Outdated
QString relativePath = match.captured(6);
if (relativePath.startsWith(u'/'))
    relativePath = relativePath.mid(1);
if (!match.captured(4).isEmpty() && !match.captured(5).isEmpty())
It's hard to review/maintain regex-related logic so it would be better to either use named capturing groups or just assign them to local (const) variables before using in the code.
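As a rough illustration of the second option (assigning captures to named const locals), here is a minimal sketch. It uses std::regex from the standard library as a stand-in for QRegularExpression, and a simplified three-group pattern rather than the PR's actual one. The LinkAttribute struct and findFirstLinkAttribute function are hypothetical names invented for this example.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical result type for this sketch; not part of the PR.
struct LinkAttribute
{
    std::string prefix;  // tag and attribute, up to and including the opening quote
    std::string url;     // attribute value
    std::string suffix;  // closing quote
};

// Hypothetical helper: extracts the first <a href> or <img src> attribute.
// Returns false when the HTML contains no such attribute.
bool findFirstLinkAttribute(const std::string &html, LinkAttribute &out)
{
    // Simplified pattern (an assumption of this sketch) with exactly three groups.
    static const std::regex rx(
        R"rx((<a\s+[^>]*?href\s*=\s*["']|<img\s+[^>]*?src\s*=\s*["'])([^"']*)(["']))rx");

    std::smatch match;
    if (!std::regex_search(html, match, rx))
        return false;

    // Whole match plus three capture groups.
    assert(match.size() == 4);

    // Name each capture once, so later code never refers to bare group indexes.
    const std::string prefix = match[1].str();
    const std::string url = match[2].str();
    const std::string suffix = match[3].str();

    out = {prefix, url, suffix};
    return true;
}
```

With the captures named up front, a later change to the pattern only needs the three assignments to be updated, not every use site.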
In RSS widget, a relative url will cause an infinite loop loading for resource, which won't break until unselect the article explicitly
format code
Fixed a bug with
Yet another review iteration.
src/gui/rss/rsswidget.h
Outdated
@@ -91,6 +91,7 @@ private slots:

private:
    bool eventFilter(QObject *obj, QEvent *event) override;
    void convertRelativePathToAbsolute(QString &html, const QString &basePath) const;
This function does not need (and isn't supposed to need) access to RSSWidget class members, so it is preferable to declare it in an anonymous namespace in the rsswidget.cpp file.
Co-authored-by: Vladimir Golovnev <[email protected]>
Adjusted my code. Could you review the code for this round, @glassez?
Only commenting on coding style.
I'll leave the correctness to others.
Well, using the wrong term was what looked confusing to me.
Co-authored-by: Chocobo1 <[email protected]> Co-authored-by: Vladimir Golovnev <[email protected]>
src/gui/rss/rsswidget.cpp
Outdated
@@ -54,6 +54,8 @@
#include "feedlistwidget.h"
#include "ui_rsswidget.h"

void convertRelativeUrlToAbsolute(QString &html, const QString &baseUrl);
It should be inside anonymous namespace as I said before, i.e.:
namespace
{
    void convertRelativeUrlToAbsolute(QString &html, const QString &baseUrl)
    {
        // code
    }
}
I misunderstood what an anonymous namespace is. I thought it meant a static scope in the source file. I'll correct it later.
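For readers with the same question, a minimal standalone sketch of the distinction: a free function declared at file scope without `static` has external linkage, while anything defined inside an anonymous namespace has internal linkage and is invisible to other translation units. The helper name normalizedBaseUrl is hypothetical and uses std::string instead of QString so the snippet compiles without Qt.

```cpp
#include <cassert>
#include <string>

namespace
{
    // Internal linkage: this helper cannot clash with a function of the
    // same name defined in another .cpp file.
    std::string normalizedBaseUrl(const std::string &baseUrl)
    {
        assert(!baseUrl.empty());  // precondition chosen for this sketch
        return (baseUrl.back() == '/') ? baseUrl : (baseUrl + '/');
    }
}
```

Marking a free function `static` gives it internal linkage as well, but an anonymous namespace also covers types and variables, so one block can hold every file-local helper.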
src/gui/rss/rsswidget.cpp
Outdated
const QString fullMatch = match.captured(0);
const QString prefix = match.captured(1);
const QString suffix = match.captured(7);
const QString absolutePath = normalizedBaseUrl + relativePath;
absoluteURL ???
True. Corrected. 🚀
BTW it's not a final solution. The true issue is that the RSS htmlbrowser infinitely and immediately retries loading resources from the HTML. The underlying issue is hard to track with a call stack, since the loading is triggered through signals and slots.
@glassez Shall we start the next review iteration? I'm looking forward to a release with fewer memory issues in the RSS widget. 🚀
What do you mean? Does it repeatedly download the same images etc. from the Internet?
Yes, especially when the resource is not available. In my case a relative URL I'm wondering if the htmlbrowser calls to
Yet another question. It behaves the same way if HTML initially contains absolute URLs, right?
No, the absolute URLs will be rendered normally most of the time. Sometimes, the failed elements display as a blank page icon, but I'm not confident enough to tell whether the infinite loading issue is occurring in this situation.
I'm confused. Do you still want to say that initially absolute URLs and URLs that were relative and transformed to absolute by your code behave differently? (provided that they point to existing resources)
Let's figure it out.
I mean I can't confirm that failed elements with an absolute URL cause the same infinite loading loop, only that they show the same stuck behavior as the relative ones mentioned in the issue. Sorry for my poor English that might make the answer confusing 😢
Sure. I understand the logic of the code. I am interested in the results of your investigation on the problem itself.
The most important thing that interests me is whether it "repeatedly download the same images etc." when the resource IS available and was successfully downloaded previously. As for "not available" resources, we have either endless attempts to get them from a local disk (without this patch) or from a network location (with this patch). Considering that this patch fixes the problem in the case where the URL converted to absolute points to an available resource, I approve it. The problem of endless redownload of unavailable resources seems to be on the Qt side (however, it would be nice to inform them about it, so that they can fix it further).
Co-authored-by: Vladimir Golovnev <[email protected]>
Thanks for your reviews and suggestions. Although it's not a final solution, I'm looking forward to seeing these snippets in a released version. I'd be grateful if you could predict which tag this will be included in. Moreover, it would be great if someone could update
I don't use it so I don't care. Let someone concerned take care of it.
@zent1n0
Co-authored-by: Vladimir Golovnev <[email protected]>
last few comments about coding style (hopefully)
Co-authored-by: Chocobo1 <[email protected]>
Co-authored-by: Chocobo1 <[email protected]> Co-authored-by: Vladimir Golovnev <[email protected]>
Should I squash the commits into one?
You can do it. Otherwise it will be done while merging PR.
I prefer automatic methods 🤖 😊
In the RSS widget, a relative URL causes an infinite resource-loading loop that does not stop until the article is explicitly unselected, which caused the memory leak described in #20117.
The code uses a regex to convert the first 3 forms of relative path described in the MDN web docs, and only operates on <a href> and <img src> HTML tags. With the code, the RSS widget should now follow the correct absolute URLs.
However, #20117 cannot be fully closed with this code: the log shows the RSS widget still sends around 5 reqs/sec in the condition that was causing a rapid flow of qDebug log output, which indicates it is stuck in an infinite loop of sending requests, failing fast and immediately retrying. The former situation eats up a single CPU core at 100% with a 3 MiB/s RAM increase, while the latter stays below 10% CPU usage with 1 MiB/s RAM.
From the snapshot you can see that the URL is correct but gets no valid response (QNetworkReply::RemoteHostClosedError), and I don't know why. The fact is that the site byr.pt is an IPv6-only site and the URL responds with a 30x redirect to another path.
Happy with any assistance or suggestions on further improvements. 😃 🚀
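For reference, the conversion described above can be sketched roughly as follows. This is not the PR's code: it uses std::regex instead of QRegularExpression, a simplified pattern, and the hypothetical names convertRelativeUrlsToAbsolute and hasSchemeOrAuthority. It assumes that only root-relative (/path) and document-relative (path) URLs are rewritten, by stripping a leading slash and appending to the base URL as in the reviewed snippets, while URLs with a scheme, scheme-relative (//host/path) URLs and fragment-only (#id) links are left untouched.

```cpp
#include <cassert>
#include <regex>
#include <string>

namespace
{
    // True for "scheme:..." and "//host/..." URLs, which this sketch leaves alone.
    bool hasSchemeOrAuthority(const std::string &url)
    {
        if (url.compare(0, 2, "//") == 0)
            return true;
        const auto colon = url.find(':');
        const auto slash = url.find('/');
        return (colon != std::string::npos)
            && ((slash == std::string::npos) || (colon < slash));
    }

    // Hypothetical stand-in for the PR's helper; returns the rewritten HTML.
    std::string convertRelativeUrlsToAbsolute(const std::string &html, const std::string &baseUrl)
    {
        assert(!baseUrl.empty());  // precondition chosen for this sketch
        const std::string normalizedBaseUrl = (baseUrl.back() == '/') ? baseUrl : (baseUrl + '/');

        // Group 1: tag and attribute up to the opening quote; 2: URL; 3: closing quote.
        static const std::regex rx(
            R"rx((<a\s+[^>]*?href\s*=\s*["']|<img\s+[^>]*?src\s*=\s*["'])([^"']*)(["']))rx");

        std::string result;
        std::string::const_iterator lastPos = html.begin();
        const std::sregex_iterator end;
        for (std::sregex_iterator it(html.begin(), html.end(), rx); it != end; ++it)
        {
            const std::smatch &match = *it;
            const std::string prefix = match[1].str();
            std::string url = match[2].str();
            const std::string suffix = match[3].str();

            // Copy the text between the previous match and this one unchanged.
            result.append(lastPos, match[0].first);

            const bool isRelative = !url.empty() && (url.front() != '#') && !hasSchemeOrAuthority(url);
            if (isRelative)
            {
                if (url.front() == '/')
                    url.erase(0, 1);
                url = normalizedBaseUrl + url;
            }

            result += prefix + url + suffix;
            lastPos = match[0].second;
        }
        result.append(lastPos, html.end());
        return result;
    }
}
```

Plain concatenation is not full URL resolution (for example, `../` segments are not collapsed), which is consistent with the note above that the patch covers only some relative forms.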