You will have three individual assignments and one group project:

All deadlines are 11:59 pm on the due date.

For help with the mechanics of your homework, please see this list of assignment tips.

Practice assignment

This is a warm-up for your future assignments. Once you have completed it, you will be set up for the class and will have a better understanding of how to set up your repository, push files to GitHub, write Markdown, and use R to do simple analyses.

Go to the practice assignment and start now!

Deadline: Jan 17, 2018

Paper review

Each student should review this paper in a 450-650 word summary/report. The summary should include:

  1. A brief review of the goal, findings and conclusion of the paper.
  2. A list (or mention) of the datasets/databases and data types used in the study. For datasets, provide some details of the data matrix and metadata.
  3. A brief review of the analytical steps in the paper, with more detail on selected parts that are relevant to the course materials. You don’t need to understand all of the analysis, but you should be able to identify the key analyses/methods used to answer the question the paper addresses.
  4. Some comments and critiques of the analytical steps, with alternative suggestions or improvements.

We have provided a guideline for this task. Your review should not be limited to merely answering the questions in the guideline; rather, use them to help you cover the required items listed above. Example paper reviews are also provided. Here’s another helpful resource on how to read a research paper.

Delivery: Put your paper review in your individual course repository as a .md file (you don’t need to write it in .Rmd, since it will not contain any R code). You can edit .md files directly on GitHub and write your full report there, or you can write it in RStudio and then push it to your repository.

example paper review

Analysis assignment

This assignment assesses your understanding of the seminar and lecture materials and is split into two parts. Start early: it will take time to complete and polish. Use the issues in the Discussion repo and the seminar time to ask questions. Most of the analysis workflow for the assignment is covered in the seminar materials.

Questions for Part 1 can be downloaded here. Data for Part 1 is here.

Questions for Part 2 can be downloaded here. Part 2 is a continuation of Part 1, so please use the same dataset.

Final group project


See the rubrics for the project.

General principles

Identify a biological question of interest and a relevant dataset. Develop and apply a statistical approach that allows you to use the dataset to answer the question.

We assume the biological question and data fall in the general area of high-throughput, large-scale biological investigations targeted by the course. Beyond that, it is wide open: methylation, SNPs, miRNAs, CNVs, RNA-Seq, ChIP-Seq, gene networks, … it’s fair game. Avoid datasets with little or no quantitative data, i.e., those containing only sequence or discrete data.

Note that definitive answers are not necessarily expected. Rather, aim to provide a critical appraisal of the data, the analytical approach, and the results. You will have to handle the competing pressures to “get it right” and “get it done”. Shortcomings of the data, misfits between the data or the biological question and the statistical model, etc. are inevitable. Your goal is to identify such issues and discuss them critically, without becoming paralyzed. Demonstrate understanding of the statistical concepts and methods that are the foundation of your analytical approach.

We assume the analytical and computing task will have a substantial statistical component, probably enacted via R. So beware of a major analytical or computational undertaking that is, nonetheless, not statistical (example: constructing a database). Creating useful data visualizations can be absolutely vital and is arguably statistical, but your analysis should go beyond merely creating pretty pictures (but please do include some!). Key concepts, at least some of which should come up in your analysis:

  • the (hypothesized, probably artificial) data-generating model

  • background variation, variance, signal-to-noise ratio, estimates and their associated standard errors

  • relationship between biological factors and experimental factors, apparent relative importance in terms of “explaining” observed data

  • attention to large-scale inference, e.g. control of family-wise error rate or false discovery rate

Data considerations

Appropriate use of data

If your project involves using unpublished data, ensure your plans are known to the data providers (e.g., your supervisors), and think about the implications for publishing: are you, in effect, bringing the project team in as collaborators? Are you planning to publish the results of your project, and if so, who will be the co-authors? It is best to deal with these questions at the outset of the project.

Privacy of project data

The projects are not made public (other than being on a poster in the lobby of EOS for a few hours). The project report materials are loaded into GitHub, the secure site we use to manage the course. Apart from the members of each project group, only the course staff and instructors have access to the projects. The data used can be uploaded to the project, but this can be limited or omitted if there are special concerns about privacy, etc.; it is primarily the code and the write-up of the results that need to be provided for evaluation.

You can read GitHub’s security and privacy policies.

Group makeup

Groups should have 4 to 6 members. We strongly encourage groups to be diverse in terms of background. In practice, this probably means the members should be registered in a mix of programs/departments. All groups and group projects must be approved by the instructors.

STAT 540 Homework Submission Instructions

GitHub

You all have a private repository in the STAT540-UBC organization account, i.e., the repo zz_lastname-firstname_STAT540_2018. We assume that

IMPORTANT NOTE: Use the repository assigned to you within the organization to submit all your course work (i.e., the repo zz_lastname-firstname_STAT540_2018). Do not use branches or other repositories.

Set-up your private GitHub repo for homework

R Markdown


What to put (or not put) into your Git(Hub) repository

This is rather specific to STAT 540 and may not necessarily reflect your workflow in the future and in other contexts.

How to “turn in” your homework

If you’re concerned that something hasn’t gone right with your submission, send Emma, Julia and Ogan an e-mail with your assignment attached. Note: this is only an emergency back-up plan. We will pester and work with you until you eventually get it submitted via GitHub.