Dplyr summarize sum values

9/12/2023

A good way to sum a variable by group is rowsum from base R:
rowsum(numericToBeSummedUp, groups)
Among the methods benchmarked, only collapse::fsum and Rfast::group.sum were faster. Regarding speed and memory consumption,
collapse::fsum(numericToBeSummedUp, groups)
was the best in the given example, and it could be sped up further by working on a grouped data frame (GDF).

The benchmarks were run with bench::mark(check = FALSE, ...). The expressions compared include:
"collapse" = collapse::fsum(DF$x, DF$g)
"base Split Rfast" = lapply(DFS1, Rfast::colsums)
"data.table" = data.table::as.data.table(DF) …
DF %>% dplyr::group_by(g) %>% dplyr::summarise(sum = sum(x))

Summing up two columns:
"base Split Rfast" = lapply(DFS, Rfast::colsums)
"Rfast" = list2DF(lapply(DF, Rfast::group.sum, DF$g))
"tapply" = list2DF(lapply(DF, tapply, list(DF$g), sum))

I find ave very helpful (and efficient) when you need to apply different aggregation functions to different columns and you must (or want to) stick to base R. Say we want to group a data frame DF by Categ1 and Categ2 and compute the sum of Samples and the mean of Freq. Here's a possible solution using ave: it starts by creating a copy of DF that holds only the grouping columns, and the result has the columns Categ1, Categ2, GroupTotSamples and GroupAvgFreq.

When comparing data.table against base R on a larger table, the base R side can be timed with:
system.time(aggregate(data$Frequency, by = list(Category = data$Category), FUN = sum))
For multiple aggregations with data.table, you can combine lapply and .SD.
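Here is a minimal base R sketch of the ave() and rowsum() approaches described above. The data frame DF and its values are hypothetical, chosen only to match the column names used in the post (Categ1, Categ2, Samples, Freq). ave() returns a vector as long as its input with each group's result repeated on every row, so duplicated() is used afterwards to keep one row per group; the pasted group key passed to rowsum() is also my own choice.

```r
# Hypothetical example data (only the column names come from the post)
DF <- data.frame(
  Categ1  = c("A", "A", "B", "B", "A", "B", "A"),
  Categ2  = c("X", "Y", "X", "X", "X", "Y", "Y"),
  Samples = c(1, 2, 4, 3, 5, 6, 7),
  Freq    = c(10, 30, 45, 55, 80, 65, 50)
)

# Copy of DF with only the grouping columns
DF2 <- DF[, c("Categ1", "Categ2")]

# ave() repeats each group's result on every row of that group
DF2$GroupTotSamples <- ave(DF$Samples, DF$Categ1, DF$Categ2, FUN = sum)
DF2$GroupAvgFreq    <- ave(DF$Freq,    DF$Categ1, DF$Categ2, FUN = mean)

# Keep one row per Categ1/Categ2 combination
DF2 <- DF2[!duplicated(DF2[, c("Categ1", "Categ2")]), ]
print(DF2)

# rowsum() sums by group directly; the sorted group keys become row names
grp <- paste(DF$Categ1, DF$Categ2, sep = ".")
tot <- rowsum(DF$Samples, grp)
print(tot)
```

With this data, DF2 holds one row per Categ1/Categ2 pair, and tot contains the same Samples totals keyed by row name.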

With dplyr, across() applies one or more functions to several columns at once. Each of the following calls is the final step of a mtcars %>% group_by(...) %>% pipeline:
# summarise all columns except grouping columns using "sum" and "mean"
summarise(across(everything(), list(mean = mean, sum = sum)))
# summarise specific variables only
summarise(across(c(qsec, mpg, wt), list(mean = mean, sum = sum)))
# summarise specific variables (numeric columns except grouping columns)
summarise(across(where(is.numeric), list(mean = mean, sum = sum)))
For more information, including the %>% operator, see the introduction to dplyr.

The aggregate() answer provided by rcs works and is simple. However, if you are handling larger datasets and need a performance boost, there is a faster alternative:
library(data.table)
data = data.table(Category = c("First", "First", "First", "Second", "Third", "Third", "Second"), Frequency = …)
Let's compare that to the same thing using a data.frame and aggregate():
data = data.frame(Category = c("First", "First", "First", "Second", "Third", "Third", "Second"), Frequency = …)
system.time(aggregate(data$Frequency, by = list(Category = data$Category), FUN = sum))
And if you want to keep the column, this is the syntax: data …
The difference becomes more noticeable with larger datasets, for example:
data = data.table(Category = rep(c("First", "Second", "Third"), 100000), …)
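For readers who stay in base R, this sketch contrasts the two aggregate() call styles mentioned above. The Frequency values are hypothetical, and big is an illustrative random table; timings differ between machines, so none are quoted here. The by = form returns the summed column as x, while the formula form keeps the name Frequency.

```r
# Hypothetical values for the Frequency column
data <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# by= form: the summed column comes back named "x"
res_by <- aggregate(data$Frequency, by = list(Category = data$Category), FUN = sum)

# formula form: the summed column keeps the name "Frequency"
res_formula <- aggregate(Frequency ~ Category, data = data, FUN = sum)

print(res_by)
print(res_formula)

# Rough timing on a larger random table (illustrative only)
big <- data.frame(
  Category  = rep(c("First", "Second", "Third"), 100000),
  Frequency = rnorm(300000)
)
print(system.time(aggregate(Frequency ~ Category, data = big, FUN = sum)))
```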

You can also use the dplyr package for that purpose:
library(dplyr)
x %>% …
Or, for multiple summary columns (works with one column too):
x %>% …
Here are some more examples of how to summarise data by group with dplyr functions, using the built-in dataset mtcars:
# several summary columns with arbitrary names
mtcars %>%
  group_by(cyl, gear) %>%                            # multiple group columns
  summarise(max_hp = max(hp), mean_mpg = mean(mpg))  # multiple summary columns
# summarise all columns except grouping columns using "sum"
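If dplyr is not available, the mtcars examples have rough base R counterparts. This sketch (object names are my own) builds the max_hp/mean_mpg summary per cyl and gear with aggregate() and merge(), and sums every non-grouping column per cyl with rowsum(), which plays the role of summarise(across(everything(), sum)) after group_by(cyl). Unlike dplyr, rowsum() puts the group values in the row names rather than in a cyl column.

```r
# Several summary columns with arbitrary names, grouped by cyl and gear
hp_max_df   <- aggregate(hp ~ cyl + gear, data = mtcars, FUN = max)
mpg_mean_df <- aggregate(mpg ~ cyl + gear, data = mtcars, FUN = mean)
res <- merge(hp_max_df, mpg_mean_df, by = c("cyl", "gear"))

# Rename the summary columns to match the dplyr example
names(res)[names(res) == "hp"]  <- "max_hp"
names(res)[names(res) == "mpg"] <- "mean_mpg"
print(res)

# Sum of all non-grouping columns per cyl; the groups become row names
sums_by_cyl <- rowsum(mtcars[, names(mtcars) != "cyl"], mtcars$cyl)
print(sums_by_cyl)
```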
