gdf() has performance issues with Date variables #89

Open
landroni opened this issue May 6, 2016 · 14 comments

Comments

@landroni
Contributor

landroni commented May 6, 2016

I notice that gdf() has extremely poor performance when displaying Date variables, even on relatively fast computers. Here's a reproducible example:

library(gWidgets2)
options(guiToolkit = "RGtk2")

X <- mtcars
X$date <- Sys.Date()                              # one Date column
for(i in 1:5) X[ , paste0("date", i)] <- X$date   # five more Date columns
for(i in 1:5) X <- rbind(X, X)                    # 32 * 2^5 = 1024 rows


## Layout
w <- gwindow()
g <- gvbox(cont=w)

d <- gdf(X, cont=g)

Notice how it takes >10s for the CPU to calm down. But to see real slowness, try to maximize the window, scroll down, or even minimize/raise the window. All this on Linux.

What may be causing these performance issues?

@jverzani
Owner

jverzani commented May 7, 2016

Dunno. I looked here: https://github.com/jverzani/gWidgets2RGtk2/blob/master/R/gdf.R#L165 Nothing seems amiss, but can verify it takes some time to get there. I should check if RGtk2DataFrame has this slowdown, as it might sit there.

@landroni
Contributor Author

landroni commented May 7, 2016

I'm also getting this when using gdf() purely as a viewer, e.g.:

sapply(1:ncol(X), function(j) editable(d, j) <- FALSE)

I also notice that the slowness seems to be related to the columns (apparently character vars) constantly re-adjusting width, for some reason. Notice how the column for the row.names is constantly jumping width, increasing little by little while the CPU is running wild...

@landroni
Contributor Author

landroni commented May 7, 2016

I did some rudimentary profiling with RStudio, and here are the results:

[Screenshot: screenshot_2016-05-07_06-44-29]

There seem to be a lot of calls (too many?) in some parts of the code, which might suggest a loop going haywire. Though it's hard to pinpoint what precisely is going wrong as the lion's share of the time comes from an Anonymous function call...

@landroni
Contributor Author

landroni commented May 7, 2016

If I look more closely into the calls:

[Screenshot: screenshot_2016-05-07_06-56-48]

I can pinpoint several potential suspects: .getAutoMethodByName, get_view_column, and a number of try calls...

@jverzani
Owner

These are the only two places where I use setCellDataFunc. That method formats the cell on a cell by cell basis. I'm guessing the issue sits there.

@jverzani
Owner

I'm striking out. That is definitely the issue, but I can't find out how to improve it. My only suggestion is to format the dates as character before placing them into the data frame. This isn't great, but things will at least render more quickly.
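
A minimal sketch of that workaround, assuming the data frame is only being displayed and not edited: convert every Date column to character once, a whole column at a time, before handing it to gdf(). The helper name dates_as_character is hypothetical and is not part of gWidgets2.

```r
# Hypothetical helper (not part of gWidgets2): replace each Date column
# with its character representation, formatting one whole column per call.
dates_as_character <- function(df, fmt = "%Y-%m-%d") {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  if (any(is_date)) {
    df[is_date] <- lapply(df[is_date], format, format = fmt)
  }
  df
}

X <- mtcars
X$date <- Sys.Date()

X_display <- dates_as_character(X)
str(X_display$date)   # a character vector; X itself still holds Dates

# Then display as before:
# d <- gdf(X_display, cont=g)
```

The original X is left untouched, so the real Date values stay available for any later computation.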

@landroni
Contributor Author

Is there perhaps some additional formatting going on for Date columns? Maybe it can be optionally turned off somehow...

@jverzani
Owner

Sadly no. The only formatting is a call to format to create a character
vector from the date. The issue seems to be that each cell needs to be
formatted rather than each column. Even on this small-sized example, this
is slow. When I tried replacing the format call with a fixed string, it was
still slow.
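
To illustrate the per-cell versus per-column difference in plain R (this is only an analogy and does not touch the GTK cell data function), the sketch below produces the same strings both ways. Since a fixed string was still slow in the real widget, the cost there appears to come from the number of callbacks rather than from format itself; this example only shows the general overhead of one call per value.

```r
# Illustration only: plain R, not the GTK cell-data callback.
dates <- Sys.Date() + 0:9999

# Per-column: one vectorised call for all 10,000 values.
per_column <- format(dates)

# Per-cell: one R-level function call per value, which is roughly how a
# cell-by-cell formatter is driven.
per_cell <- vapply(seq_along(dates),
                   function(i) format(dates[i]),
                   character(1))

identical(per_column, per_cell)   # both approaches yield the same strings

# Timings depend on the machine, so none are quoted here.
system.time(format(dates))
system.time(vapply(seq_along(dates),
                   function(i) format(dates[i]),
                   character(1)))
```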


@landroni
Contributor Author

Instead of formatting things per-cell, would one option be to do what we do for logical vectors and have a function that transparently transforms Date to character in the df for displaying, and then transforms things back to Date if the user wishes to save things? (While I don't like messing with the original df like this, in my experience the Date slowness is excessive in large data frames and end-users are unlikely to realize quickly how to deal with this... It took me a couple of years to pick up on it.)
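
A rough sketch of what I have in mind, in plain R. The helper names to_display and from_display are made up for illustration, and this is not how gWidgets2RGtk2 handles logical vectors internally; it only shows the round trip.

```r
# Hypothetical helpers for illustration; not gWidgets2RGtk2 code.

# Convert Date columns to character and remember which ones were converted.
to_display <- function(df) {
  date_cols <- names(df)[vapply(df, inherits, logical(1), what = "Date")]
  for (nm in date_cols) {
    df[[nm]] <- format(df[[nm]], format = "%Y-%m-%d")
  }
  attr(df, "date_cols") <- date_cols
  df
}

# Convert the remembered columns back to Date, e.g. when saving.
from_display <- function(df) {
  for (nm in attr(df, "date_cols")) {
    df[[nm]] <- as.Date(df[[nm]], format = "%Y-%m-%d")
  }
  attr(df, "date_cols") <- NULL
  df
}

X <- data.frame(id = 1:3, when = as.Date("2016-05-06") + 0:2)

shown    <- to_display(X)        # 'when' is character here
restored <- from_display(shown)  # 'when' is a Date again

class(shown$when)
identical(restored$when, X$when)
```

Strings edited by the user into something that does not match the format would come back as NA from as.Date, so a real implementation would need to validate edits.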

@PascalVaudrevange

PascalVaudrevange commented May 30, 2016

Could this be related to the general slowness of as.Date.character(), see the discussion here: http://stackoverflow.com/questions/12786335/why-is-as-date-slow-on-a-character-vector?
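
For reference, one way to reduce that kind of parsing cost in plain R is to parse each distinct string once and look the results up. This is only a sketch of the general idea; whether character-to-Date parsing is involved in the gdf() slowdown at all is an open question here.

```r
# Plain R illustration; unrelated to gWidgets2 internals.
x <- rep(format(as.Date("2016-05-01") + 0:29), times = 1000)  # 30,000 strings

# Parse each distinct string once, then map the results back with match().
ux <- unique(x)
parsed <- as.Date(ux, format = "%Y-%m-%d")[match(x, ux)]

# Same values as parsing every element directly.
all(parsed == as.Date(x, format = "%Y-%m-%d"))

# Timings depend on the machine, so none are quoted here.
system.time(as.Date(x, format = "%Y-%m-%d"))
system.time(as.Date(ux, format = "%Y-%m-%d")[match(x, ux)])
```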

@jverzani
Owner

Thanks for the link. I'll check. I need to work on this to get it back on
CRAN.


@jverzani
Owner

jverzani commented Jun 1, 2016

Hi @landroni, I checked in 7086032fac355f7bfe6119a389bcdb8928b16834 to gWidgets2RGtk2, which should address this in the manner you suggested (like logical vectors). You will need to update gWidgets2, which is on CRAN, and install gWidgets2RGtk2 from GitHub. If this is working, I will push to CRAN. Thanks.

@landroni
Contributor Author

landroni commented Jun 2, 2016

Thanks! Will check out over the weekend and report back.

@landroni
Contributor Author

landroni commented Jun 3, 2016

Performance is much better! I tried it on my real data frames, which are big and contain Date variables, and I no longer see the slowness issues I had before. (I haven't tested how this affects editing though, only viewing.)
