This year was my second time attending RStudio Conference, with last year in Austin being the first. There were about 2,400 people in attendance this year, and it felt much bigger than last year, but still just as great. Since I’ve been more active on #rstats Twitter this year, it was so fun to meet Twitter friends in person and make new R friends. My introvert self was definitely exhausted by the end of the week, and I think RStudio Conf is my biggest expenditure of social energy all year. That said, even for an introvert like me, the R community and everyone at RStudio Conf were so open and friendly!
Both years, I’ve attended a preconference workshop and the main conference. I’d have a hard time recommending that someone do just one or the other, as I feel I got very different things out of each. The conference feels a bit like drinking out of an Rstats firehose. You jump from 20-minute talk to 20-minute talk, just getting a taste of cool new things you didn’t know you could do in R. I leave the conference with lots of notes on things to dig into once it’s over. The workshops, on the other hand, are a deeper hands-on dive into a specific topic in R, and I left both workshops with lots of example code and immediately applicable new skills. The workshops are also a great chance to ask R experts any questions you have. My takeaway: if you’re able to, definitely attend both!
Like last year, I collected many hex stickers, far more than can fit on my computer. I’m currently working on turning them into magnets I can display on a magnet board in my office, which I’m very excited about. It seemed like hex-mania was dialed up to 11 this year, and people went crazy every time stickers were dropped during breaks. The most elusive stickers were only available from the package maintainers themselves, which was a great excuse to talk to them about their cool packages.
Applied Machine Learning workshop
My week started with the Applied Machine Learning workshop with Max Kuhn. The workshop was a great overview of the tidymodels suite of packages for modeling. The packages within tidymodels that we focused on included:
- recipes: feature engineering and preprocessing
- parsnip: model creation
- dials: creating model tuning parameters
- tune: optimizing model tuning parameters
- yardstick: evaluating model accuracy
This is by no means a comprehensive review of everything covered, but I wanted to highlight a few things I learned that stuck out to me. I hope to do a longer blog post about tidymodels in the future to help myself consolidate what I learned.
- One of the major benefits of parsnip is that it returns missing values in the predictions when there is missingness in the original data, instead of silently dropping observations with missing data. This is a huge help when you try to join predictions back to the original data because the lengths will be the same and they will join cleanly.
- Variable prep should be done on the training data only, not the full data set. This includes things like standardizing predictors, creating dummy variables, etc. This is so you can evaluate these data prep steps as part of model evaluation.
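
To make the second point concrete, here is a minimal base R sketch (not the recipes code from the workshop) of standardizing predictors using values estimated from the training data only. The built-in mtcars data, the 24/8 split, and the choice of predictors are all arbitrary assumptions for illustration.

```r
# Split the built-in mtcars data into training and test rows
# (an arbitrary 24/8 split, purely for illustration)
set.seed(123)
train_idx <- sample(nrow(mtcars), size = 24)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

predictors <- c("disp", "hp", "wt")

# Estimate the centering and scaling values from the training data only
train_means <- colMeans(train[, predictors])
train_sds   <- apply(train[, predictors], 2, sd)

# Apply those same training-derived values to both sets,
# so no information from the test set leaks into the preprocessing
train_scaled <- scale(train[, predictors], center = train_means, scale = train_sds)
test_scaled  <- scale(test[, predictors],  center = train_means, scale = train_sds)
```

The scaled training columns have mean zero by construction, while the test columns generally will not, which is expected: the test set is treated like new data the model has never seen.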
The conference kicked off with JJ Allaire’s keynote about RStudio becoming a Public Benefit Corporation, which was super exciting.
On day two, Jenny Bryan gave an awesome talk about debugging called “Object of Type Closure is Not Subsettable,” which got quite a laugh from the room full of R users. After taking her What They Forgot To Teach You About R workshop last year, which covered a lot of the same material, this was a great refresher. Her suggested steps for debugging are:
- Try the same thing again with no success. Definitely been there!
- Restart R! (There’s even a keyboard shortcut! Cmd+Shift+0 on Mac)
- Make a reprex. Often creating a minimal reproducible example will help you pinpoint your problem. If not, it will help others pinpoint your problem. Make the reprex as simple as possible while still generating the error. Remove all packages and functions not necessary for the error.
- Use R’s built-in debugging tools.
  - traceback() shows the call stack that led to the error
  - options(error = recover) pauses execution at the error and opens debugger mode to let you explore your environment at the time of the error and step through the function line by line
  - Insert browser() into a function to pause execution and open debugger mode
  - debug(function) or debugonce(function) opens debugger mode every time you run the function or only the next time you run it, respectively. I typically use debugonce() but end up running it several times, so I should just learn to call debug() the first time…
- Run checks often if you’re building a package. R CMD check and the testthat package help with this.
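
As a small sketch of the built-in tools in action (not code from the talk), the hypothetical functions f() and g() below reproduce the kind of error the talk title refers to. The error is caught with tryCatch() so the script runs non-interactively; the debugger calls are left as comments because they only make sense at an interactive console.

```r
# Hypothetical functions: f() calls g(), and g() contains the bug
g <- function(x) mean[[x]]   # bug: subsetting `mean`, which is a function (a closure)
f <- function(x) g(x)

# Capture the error message so the script keeps running
msg <- tryCatch(f(1), error = function(e) conditionMessage(e))
print(msg)

# In an interactive session, the next steps might be:
# f(1)                      # triggers the error
# traceback()               # shows that g(x) was called from f(x)
# debugonce(g)              # the next call to g() opens the debugger
# f(1)
# options(error = recover)  # or: choose a frame to explore when any error occurs
```

This is also close to a reprex: two tiny functions, no extra packages, and the error still occurs.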
Like last year, there was serious within-conference FOMO, as there were three tracks of talks running at the same time and always multiple talks I wished I could go to. The FOMO was eased by knowing that all the talks would be posted to watch later.
A few of the highlights for me:
- Will Chase’s “The Glamour of Graphics” on making better graphics.
- Ryan Timpe’s “Learning R with humorous side projects” highlighting some of the cool projects he’s done to learn different things in R.
- Joe Cheng’s “Styling Shiny apps with Sass and Bootstrap 4” on an easier way to customize Shiny apps with CSS.
- Sharla Gelfand’s “Don’t repeat yourself, talk to yourself! Repeated reporting in the R universe” on an R package she wrote for performing repeated reporting tasks.
Tidyverse Developers’ Day
Being able to attend Tidyverse Dev Day this year was a great opportunity to see how the Tidyverse packages come together. Ahead of time, the Tidyverse team had curated package issues that would be easy for beginners to close. Many of these had to do with fleshing out documentation for different functions. These issues were on sticky notes on the wall. We all went and grabbed one and got to work. There were many mentors, including the Tidyverse team, who were available to help throughout the process. I worked on documentation for two functions, dplyr::distinct() and tibble::rownames. Writing the documentation for distinct() allowed me to learn about the new across(). Anytime anyone closed a pull request on an issue they were working on, they got to ring a gong, and everyone clapped for them.
Working on these issues gave me a great opportunity to practice using a fork -> clone -> edit -> PR workflow using the usethis package. This consisted of using usethis::create_from_github() to fork and clone the repository, set the upstream remote, and set the master branch, then using usethis::pr_init() to create a branch in my repo to work on. After making the edits that addressed the issue, I ran document() to update the documentation, and check() to make sure my edits hadn’t introduced any new problems. Finally, I committed my changes and used usethis::pr_pull_upstream() to pull any changes that had happened since I made my changes, and usethis::pr_push() to push my changes. Then someone from the team would review my PR and either suggest additional changes, or merge the PR.
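
The workflow above can be sketched roughly as a list of steps to run one at a time at the console. This assumes the usethis and devtools packages are installed and GitHub credentials are configured; the branch name is hypothetical, and document() and check() are assumed to be the devtools versions. pr_pull_upstream() is the name used at the time of the event, and newer usethis releases may offer it under a different name.

```r
repo_spec <- "tidyverse/dplyr"   # repository to contribute to
branch    <- "distinct-docs"     # hypothetical branch name

# Each step is wrapped in a function so that nothing touches GitHub
# until you call it yourself, e.g. steps$fork_and_clone()
steps <- list(
  fork_and_clone = function() usethis::create_from_github(repo_spec, fork = TRUE),
  start_branch   = function() usethis::pr_init(branch),
  update_docs    = function() devtools::document(),
  run_checks     = function() devtools::check(),
  sync_upstream  = function() usethis::pr_pull_upstream(),
  push_changes   = function() usethis::pr_push()
)

names(steps)
```

The edits and the commit happen between start_branch and sync_upstream, which is why the steps are kept separate rather than chained into a single call.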
Thanks @rbjanis5 for refreshing documentation for dplyr::distinct() at #tidydevday.— Romain François (@romain_francois) January 31, 2020
Make it loud @dataandme, the 🔊 has to travel across() an ocean, or perhaps to a #starwars galaxy far away. pic.twitter.com/b4zolnPLpS