I work on many projects at once, and have adopted a few organizational techniques that help me stay focused, keep my place, and write well-documented code. Below I describe a few of these practices.
Organize your Directory Intentionally.
Let’s say you have a working directory called my_project with 3 folders and 2 files:
data: data files that you bring into your workspace
R: where all of your R files live
output: modeled results, .rds files, figures that you generate
README.md: description of the project to be displayed on Github
repo_log.md: log of your activity in the repository
This layout is nice because when you commit to git, you can easily commit everything in the R folder (plus the repo_log), and leave out all the large files in data and output.
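To make the layout concrete, here is a minimal sketch that builds the same skeleton with base R. It writes into a temporary directory so nothing touches a real project. The .gitignore step is my own assumption about one way to keep the large folders out of commits, not something the layout requires.

```r
# Build the example layout in a throwaway location (assumption: tempdir()
# stands in for wherever your real projects live)
project <- file.path(tempdir(), "my_project")

# The three folders described above
folders <- file.path(project, c("data", "R", "output"))
for (f in folders) dir.create(f, recursive = TRUE, showWarnings = FALSE)

# The two top-level files, created empty
files <- file.path(project, c("README.md", "repo_log.md"))
invisible(file.create(files))

# Optional (assumption): tell git to skip the large folders
writeLines(c("data/", "output/"), file.path(project, ".gitignore"))

# Show the top level of the new project
list.files(project)
```

With data/ and output/ ignored, a plain commit picks up only the scripts and the two markdown files.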
Take Advantage of Default Ordering.
Let’s say you start your analysis with a script in the R folder. I like to name my scripts with a double-digit numeric prefix followed by an unambiguous description of the script’s purpose, so the scripts fall into order by default, and it’s clear what each one does. For instance, hypothetical scripts named 01_import_data.R and 02_clean_data.R would sort in the order they are meant to run.
Double-digit prefixes are nice because they preserve default ordering once you reach 10 or more scripts. If you think you’ll need 100 or more R scripts… we can only pray for your soul.
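The ordering problem is easy to see in R itself. The sketch below uses made-up script names (they are purely illustrative) and compares single-digit prefixes against zero-padded ones. I use method = "radix" so the sort follows C-locale rules and does not depend on the machine's language settings.

```r
# Hypothetical script descriptions, for illustration only
steps <- c("import", "clean", "model", "plot", "export",
           "tables", "maps", "validate", "summarize", "report")

# 1_import.R ... 10_report.R
unpadded <- paste0(seq_along(steps), "_", steps, ".R")

# 01_import.R ... 10_report.R
padded <- sprintf("%02d_%s.R", seq_along(steps), steps)

# Without padding, the tenth script sorts ahead of the second
sort(unpadded, method = "radix")

# With padding, sorted order matches the intended run order
sort(padded, method = "radix")
```

The padded names come back from sort() in exactly the order they were created, which is also how a file browser will typically list them.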
Use Snake Case.
I like to name variables and files using snake_case instead of CamelCase because i_find_snake_case_much_easier_to_read than CamelCaseWhichIsTooBunchedUpToRead. I also avoid.using.periods.as.delimiters in variable and file names because the period is an operator in languages like Python (e.g., pd.array()). Using periods in names might also confuse other programmers who are used to seeing the period as an operator, and it sets you up with bad habits if you start writing in one of these languages.
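If you inherit names in the other styles, a small helper can convert them. This is a rough sketch: to_snake_case() is a hypothetical function I made up for illustration, not part of any package, and it only handles simple CamelCase and period-delimited names.

```r
# Hypothetical helper: convert CamelCase or period.delimited names
# to snake_case
to_snake_case <- function(x) {
  # periods -> underscores
  x <- gsub(".", "_", x, fixed = TRUE)
  # insert an underscore wherever a lowercase letter or digit meets a capital
  x <- gsub("([a-z0-9])([A-Z])", "\\1_\\2", x)
  tolower(x)
}

to_snake_case(c("SurveyResults", "survey.results", "surveyResults"))
```

All three inputs end up as the same snake_case name, survey_results.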
Avoid setwd(). Instead, cleverly manage your file paths.
There are all sorts of ways to organize file paths, including using here() and augmenting your .Rprofile file to auto-detect your working directory. This page is a great read on the topic. I used to use here(), but have since moved away from it because, in my experience, it created loads of temporary files that interfered with my git workflow.
Instead, what I do now is start my scripts with a list that stores a character string for each commonly used path in my working directory (a trick my friend Ben showed me).
    # working directory
    wd <- list()

    # commonly used paths in my working directory
    wd$data   <- "C:/Users/richpauloo/Documents/Github/my_project/data/"
    wd$output <- "C:/Users/richpauloo/Documents/Github/my_project/output/"
I access each list element with the $ operator:

    wd$data
    ## [1] "C:/Users/richpauloo/Documents/Github/my_project/data/"
    wd$output
    ## [1] "C:/Users/richpauloo/Documents/Github/my_project/output/"
How would we use this? For example, let’s say we want to bring in the file survey_results.csv in the data folder, make a plot, and then export that plot to the output folder:
    # packages that provide read_csv(), ggplot(), and ggsave()
    library(readr)
    library(ggplot2)

    # read in survey results
    sr <- read_csv(paste0(wd$data, "survey_results.csv"))

    # make a plot
    p <- ggplot(sr, aes(age, height)) + geom_point()

    # save plot to output
    ggsave(filename = paste0(wd$output, "plot_1.png"), plot = p)
Notice that we use paste0(), which concatenates the two character strings without a separator. Since we ended our file paths in wd with a / earlier, this works beautifully. Alternatively, you could leave the trailing / off your file paths and use paste(sep = "/"), but that’s more typing every time you want to reference a file path.
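Here is a short side-by-side of the two options, plus base R's file.path(), which I'm adding as a third alternative since it joins its arguments with "/" for you. The relative paths are made up for illustration.

```r
# Hypothetical paths, for illustration only
with_slash    <- "my_project/data/"
without_slash <- "my_project/data"

# Option 1: trailing slash stored in the path, no separator needed
p1 <- paste0(with_slash, "survey_results.csv")

# Option 2: no trailing slash, separator supplied on every call
p2 <- paste(without_slash, "survey_results.csv", sep = "/")

# Option 3 (my addition): file.path() inserts the "/" itself
p3 <- file.path(without_slash, "survey_results.csv")

# All three build the same string
identical(p1, p2) && identical(p2, p3)
```

One caveat: file.path() does not clean up doubled separators, so pairing it with a path that already ends in / produces a // in the result.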
I should mention that this approach works very well for me personally, but it’s not portable: I can’t send the code to a collaborator without them doing some work. Anyone else opening these files on another computer or cloning them from Github will need to change the file paths to make the code run. This is where here() and, to a greater extent, Docker excel. For my independent research and side projects, though, the approach I outlined above works perfectly for me.
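One way to shrink the work a collaborator has to do is to derive every entry in wd from a single root, so only one value changes per machine. This is a sketch of a variation, not the setup described above. MY_PROJECT_ROOT is a hypothetical environment variable name, and the fallback to the current directory is my own assumption.

```r
# Assumption: MY_PROJECT_ROOT is a hypothetical environment variable that
# each user could set; if it is unset, fall back to the current directory
root <- Sys.getenv("MY_PROJECT_ROOT", unset = getwd())

# Same list as before, but every path hangs off the single root
wd <- list()
wd$data   <- paste0(file.path(root, "data"), "/")
wd$output <- paste0(file.path(root, "output"), "/")

# Paths still end in "/", so paste0() works exactly as it did earlier
paste0(wd$data, "survey_results.csv")
```

The rest of a script stays unchanged, because wd$data and wd$output keep their trailing slashes.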
If you’re interested in learning more about efficient R programming, I strongly recommend the book Efficient R Programming! It’s free online here.