Studying game design for increasing code review participation

As @kytrinyx pointed out in #88, Exercism currently struggles to get adequate participation in peer code review. The suggestion to gamify Exercism in various ways (e.g. reputation like StackOverflow) has been raised many times, but @kytrinyx has rightly objected, fearing that review quality will decrease and that users' intrinsic love of discussing code and doing reviews will be damaged, citing Alfie Kohn’s book Punished by Rewards (1999). More recent research by Deterding (2015) and others suggests, however, that gamification done right can actually amplify intrinsic motivation.

This issue outlines an exploratory study I'll be performing with a fork of Exercism and a smaller group, with the aim of gathering evidence on whether gamification can be used to motivate code reviews in Exercism without harming intrinsic motivation. Although the limitations of this study will make it hard to extrapolate, it should give us an idea of whether the theory and its application are on the right track.

During this project, I will submit my proposed changes, including the experiment itself, as individual pull requests for discussion. @kytrinyx I would be interested to know if the Exercism maintainers would potentially be interested in running this same experiment for a random subset of Exercism's users. Doing so would greatly increase the validity of the results, because of the larger number of participants, the large body of submitted work, and users who are inherently intrinsically motivated to use Exercism. This can be done on your own timetable, independent of my graduate work, for the benefit of your own understanding. These results can help inform the next generation of Exercism's design, described in #113. Would you be open to experimenting on public exercism.io?

### Outline

* [Background Research](#user-content-background-research)
* [Study Overview](#user-content-study-overview)
* [Proposed Improvements](#user-content-proposed-changes)

<h2 id="user-content-background-research">Background Research</h2>

### Motivating Effective Learning Practices

As teachers, managers, and parents, we often want to help motivate learning practices that we believe to be effective. For instance, as a software engineer, I know from experience and from research that practice and peer review (Wang et al., 2012) are two practices that have a substantial positive impact on learning software development. Despite that, learners often do not see past these practices' challenges (e.g. putting your work out in the open for judgement), and miss the real value, which outweighs the challenges involved. The Exercism coding challenge platform struggles with this very motivation problem in its code review system. Gamification promises to be a means of providing additional motivation that can help learners overcome hesitation and discover the value of these activities (Muntean, 2011).

### Gamification Can Be Quite Effective

A growing number of empirical studies are demonstrating that gamification can be effective in shaping behavior. Seaborn & Fels (2015) performed a review of 37 empirical studies of gamification. Of these studies, 63% reported that gamification positively influenced learning, enjoyability, participation, engagement, etc. Hamari et al. (2014), authors of another literature review of empirical studies on gamification, suggest that given the right context, implementation, and users, gamification provides the positive effects that gamified system designers hope for. People see these benefits and have many times suggested that Exercism implement points, leaderboards, badges, upvotes, etc.

### Extrinsic Rewards Permanently Diminish Intrinsic Motivation

Earlier research on motivational psychology by Deci, et al. (1999) found that “extrinsic rewards… significantly undermined free-choice intrinsic motivation.” Alfie Kohn (1999) in his book Punished by Rewards expands on Deci's and others' work to suggest that points, grades, rankings, and all other extrinsic motivators that are commonly used in education replace children's innate desire to learn with a focus on rewards such as grades, and that this negative effect tends to endure. For example, children who were rewarded for playing certain math games avoided those same games when the rewards stopped coming, while children who had never been rewarded continued to play with and enjoy them.

### Avoiding Harming Intrinsic Motivation

Many researchers do recognize the importance of intrinsic motivation and suggest that gamification can be used correctly to enhance rather than reduce intrinsic motivation. According to Deterding et al. (2012), in gamifying a system, we must design to “amplify the intrinsic motivations of their employees, fans, and customers” (p. 17). “The pleasures of games arise not from such system feedback [such as points, badges, and leaderboards], but from ‘meaningful choices’ in the pursuance of ‘interestingly hard goals’” (p. 14).

Deci & Ryan (1980), who described how extrinsic motivators harm motivation, also put forth self-determination theory, which describes how to bolster internal motivation. Self-determination theory has become “arguably the empirically most well-researched psychological theory of intrinsic motivation” (Deterding, 2011) and has served as a basis for several gamification frameworks and studies (Seaborn & Fels, 2015). This theory posits that in order to facilitate intrinsic motivation, people must satisfy their need for competence, relatedness, and autonomy.

Autonomy is the need most commonly violated in gamification implementations. “Most deployments of gamification represent ‘exploitationware,’ in that they extract real value from users and employees in return for mere virtual tokens” (Deterding et al., 2011). Instead, a gamified system should be designed to help the user satisfy her intrinsic goals. When it does not, users' sense of autonomy diminishes, replaced by a feeling of being coerced or controlled.

Mekler et al. (2013) suggested that when a gamified system gives feedback, the user can perceive the feedback as either informational or controlling. When perceived as informational, feedback supports his competence need by helping communicate the progress he has made toward his learning goals. When perceived as controlling for the benefit of the system creator, the feedback thwarts his feeling of autonomy and his intrinsic motivation to use the system. Furthermore, they suggested that the harm found in some gamification studies may have been due to pressure on participants to engage in social networking in the service of their employer, a form of control. The empirical study they performed suggested that user-centered gamification did not harm intrinsic motivation, at least in the short term. Their findings were backed by psychology research by Kruglanski, suggesting that “rewards may either undermine or enhance intrinsic motivation depending on whether they are endogenous or exogenous to a given task” (as cited in Mekler et al., 2013). This research suggests that a system can be gamified without harming intrinsic motivation if goals align and feedback is more informational than controlling.

Research by Deterding (2011) also supported these ideas and added two factors important to a sense of autonomy: voluntariness and lack of consequence. Citing research by Caillois & Barash and Ludens, Deterding states, “The overwhelming majority of theoretical discussions enlist voluntary engagement and lack of serious consequence as attributes defining play”. Being voluntary and lacking consequence is a significant source of the autonomy that is such an important contributor to intrinsic motivation. As an example, leaderboards can be intrinsically motivating in voluntary games without real consequence, because they are primarily informational, showing a person where they stand. On the other hand, when leaderboards are used in a business sales context to promote competition, participation is neither voluntary nor free of consequence, being tied to cash incentives. The leaderboard in this context is a controlling, extrinsic motivator (Deterding, 2011).

In summary, a gamified system can preserve and enhance intrinsic motivation by aligning goals, and designing for autonomy through informational messaging, voluntary participation, and avoiding serious consequence. This approach has been demonstrated to work well in case studies such as Deterding's (2015), and seems applicable in the context of Exercism.

<h3 id="user-content-skill-atoms">Amplifying the Joy of Code Review</h3>

Deterding (2015) put forth a method of gameful design, a.k.a. gamification, that aims to amplify intrinsic value. As part of this method, he applies the concept of skill atoms: feedback loops organized around a challenge or skill, each consisting of smaller recurring components that make it game-like. Let's take a look at how the skill atom components apply to the Exercism platform as it exists now, both for completing exercises and for doing code review.

##### Exercise Completion

- **Goals**: Complete coding exercises 
- **Actions**: Run the tests, submit the code 
- **Objects**: Written code, provided test suite, development environment 
- **Rules**: All tests must pass to complete an exercise; you must complete an exercise before viewing others' solutions 
- **Feedback**: Individual automated tests pass, error, or fail; qualitative peer feedback 
- **Challenge**: Determine how to complete each exercise; write well enough to impress peer reviewers; implement suggestions effectively 
- **Motivation**: Learn the language being studied; improve code writing skill (satisfy competence need) 

For exercise completion, every component necessary for a gameful skill atom is present. Notably, the learners' motivations are (assuming knowledge of the benefits of practice) clearly in line with the system's goals, the challenges are sufficiently challenging, and informational feedback on one's progress is present. Participation is voluntary, and there are no serious consequences. The needs for autonomy, competence, and relatedness are largely satisfied. Therefore, we can expect this activity to be intrinsically motivating. Based on the quantity of exercise participation compared with peer review participation, it seems that we do indeed observe this to be true.

##### Code Review

- **Goals**: After completing an exercise, a call to action suggests, “see related solutions and get involved here: [url]” 
- **Actions**: Read others' code; leave comments 
- **Objects**: Others' code submissions 
- **Rules**: None 
- **Feedback**: Comment replies, if any 
- **Challenge**: Write effective feedback; communicate ideas; overcome fear of judgement 
- **Motivation**: Expand grasp of the language; improve code reading and critiquing skill; improve ability to communicate ideas (satisfy competence need). Contribute knowledge to the community, and help others learn (satisfy relatedness need).

The code review system, viewed through a skill atom lens, is poorly designed for amplifying intrinsic value. The goals that the system communicates are weakly linked with learners' most likely motivations. Therefore, there is little motivation to overcome the challenges involved with providing good feedback and asking good questions. There are no expressed rules or guidelines to help focus reviews. Review feedback appears to be uncommon, and there are no suggestions for how to gain the most benefit from the code review process. There are plenty of intrinsic benefits in code review that could be amplified by Exercism, but these are not communicated effectively such that users will learn about or be reminded of them.

### References

Deci, E. L., & Ryan, R. M. (1980). Self-determination theory: When mind mediates behavior. The Journal of Mind and Behavior, 33-43.

Deci, E. L., Koestner, R., & Ryan, R. M. (1999). A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation. Psychological Bulletin, 125(6), 627-668.

Deterding, S., Sicart, M., Nacke, L., O'Hara, K., & Dixon, D. (2011, May). Gamification: Using game-design elements in non-gaming contexts. In CHI'11 Extended Abstracts on Human Factors in Computing Systems (pp. 2425-2428). ACM.

Deterding, S. (2011). Situated motivational affordances of game elements: A conceptual model. In Gamification: Using Game Design Elements in Non-Gaming Contexts, a workshop at CHI.

Deterding, S., Antin, J., Lawley, E., & Paharia, R. (2012). Gamification: Designing for Motivation. interactions, 19(4), 14-17.

Deterding, S. (2015). The lens of intrinsic skill atoms: A method for gameful design. Human–Computer Interaction, 30(3-4), 294-335.

Hamari, J., Koivisto, J., & Sarsa, H. (2014, January). Does Gamification Work? — A Literature Review of Empirical Studies on Gamification. In System Sciences (HICSS), 2014 47th Hawaii International Conference on (pp. 3025-3034). IEEE.

Kohn, A. (1999). Punished by rewards: The trouble with gold stars, incentive plans, A's, praise, and other bribes. Houghton Mifflin Harcourt.

Mekler, E. D., Brühlmann, F., Opwis, K., & Tuch, A. N. (2013, October). Do points, levels and leaderboards harm intrinsic motivation?: an empirical analysis of common gamification elements. In Proceedings of the First International Conference on gameful design, research, and applications (pp. 66-73). ACM.

Muntean, C. I. (2011, October). Raising Engagement in E-learning Through Gamification. In Proc. 6th International Conference on Virtual Learning ICVL (pp. 323-329).

Ryan, R. M., & Deci, E. L. (2000). Self-determination theory and the facilitation of intrinsic motivation, social development, and well-being. American psychologist, 55(1), 68.

Seaborn, K., & Fels, D. I. (2015). Gamification in Theory and Action: A survey. International Journal of Human-Computer Studies, 74, 14-31.

Wang, Y., Li, H., Feng, Y., Jiang, Y., & Liu, Y. (2012). Assessment of Programming Language Learning Based on Peer Code Review Model: Implementation and Experience Report. Computers & Education, 59(2), 412-422.

Zichermann, G., & Cunningham, C. (2011). Gamification by Design: Implementing Game Mechanics in Web and Mobile Apps. O'Reilly Media, Inc.










<h2 id="user-content-study-overview">Study Overview</h2>

The aim of this project will be to use gamification techniques to increase participation in code review on Exercism, and to measure how intrinsic motivation, measured by participation, is affected after removing gamification. “Gameful design should focus on challenges inherent in the user’s goal pursuit” (Deterding, 2015). With this in mind, I will design a skill atom around code review on Exercism that helps to motivate participation in such a way that is intrinsic to the act of code review, which will continue to motivate code review beyond the scope of Exercism.

#### Problem

Can gamification enhance participation in a voluntary activity without harming the intrinsic motivation for the task being motivated?

#### Hypothesis

Removing the game elements and mechanics that encourage peer review on Exercism, after they have been present for a period, will not decrease participation below the level observed before they were introduced.

#### Control

Because participation may change over time due to extraneous variables, such as natural gaining or waning interest over time, a randomly-selected 50% of participants will form a control group that will be used to measure participation trends that are extraneous to the experiment. This control group will be presented with a version of Exercism that contains no gamification modifications.
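As a rough illustration of the 50/50 split described above, here is a minimal sketch of reproducible random assignment. The function name `assign_groups`, the seed value, and the example usernames are all hypothetical; nothing here reflects how the fork actually assigns users.

```python
import random

def assign_groups(user_ids, seed=123):
    """Randomly split users into control and treatment groups of (near) equal size."""
    rng = random.Random(seed)        # fixed seed (an assumption) keeps the split reproducible
    shuffled = sorted(user_ids)      # sort first so the input order does not affect the result
    rng.shuffle(shuffled)
    half = len(shuffled) // 2        # with an odd count, treatment gets the extra user
    return {"control": set(shuffled[:half]), "treatment": set(shuffled[half:])}

# Hypothetical usernames, for illustration only.
groups = assign_groups(["ana", "ben", "cho", "dee", "eli", "fay"])
```

Assigning each user once and storing the result would keep a participant in the same group for the whole study, which matters when comparing periods.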

#### Method

The duration of the study will be split into 3 <strike>1-week</strike> periods: baseline, gamification, and withdrawal. The gamification period will introduce game elements and mechanics, and the withdrawal period will restore things to how they were at the baseline.

Edit: See [comment](https://github.com/exercism/discussions/issues/123#issuecomment-286244547): use existing data as the baseline, try gamification for 2 weeks, and withdraw for 1 week before analysis.

#### Measurement

Only people who participate during both the baseline and gamification periods can qualify as participants in the study. Decreased code review participation is expected during the withdrawal period, but to confirm the hypothesis, participation levels should not fall below the control group's participation levels.
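The comparison above could be sketched roughly as follows, assuming participation is summarized as one comment count per participant for the withdrawal period. The function name and the sample numbers are made up for illustration, and comparing plain means is a simplification; it says nothing about statistical significance.

```python
from statistics import mean

def hypothesis_holds(treatment_counts, control_counts):
    """True if mean withdrawal-period participation in the treatment group
    is at least the control group's mean over the same period."""
    return mean(treatment_counts) >= mean(control_counts)

# Hypothetical comment counts per participant during the withdrawal period.
treatment_withdrawal = [3, 0, 2, 4]
control_withdrawal = [1, 2, 0, 3]
result = hypothesis_holds(treatment_withdrawal, control_withdrawal)
```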

Participation will be measured by the number and size of participants' comments. Size serves as a rough, quantifiable proxy for quality.
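To make these measures concrete, here is a minimal sketch that counts comments and sums their length per user and period, and then applies the rule that only people active in both the baseline and gamification periods qualify. The `(user, period, body)` tuple layout, the period labels, and the function names are assumptions for illustration, not Exercism's data model.

```python
from collections import defaultdict

def participation(comments):
    """Return {(user, period): (comment_count, total_chars)}."""
    totals = defaultdict(lambda: [0, 0])
    for user, period, body in comments:
        entry = totals[(user, period)]
        entry[0] += 1                    # quantity of comments
        entry[1] += len(body.strip())    # size, as a rough proxy for quality
    return {key: tuple(value) for key, value in totals.items()}

def qualifying_users(stats):
    """Users with at least one comment in both baseline and gamification."""
    baseline = {user for (user, period) in stats if period == "baseline"}
    gamified = {user for (user, period) in stats if period == "gamification"}
    return baseline & gamified

# Hypothetical sample data.
comments = [
    ("ana", "baseline", "Nice use of map here."),
    ("ana", "gamification", "Could this loop be a reduce?"),
    ("ben", "gamification", "Looks good!"),
]
stats = participation(comments)
eligible = qualifying_users(stats)
```

Character count is only one possible size measure; word count would work the same way with `len(body.split())`.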

Following the experiment, a survey will be distributed to participants to gain further understanding of participants' self-reported motivations, goals, and experience.

#### Internal Validity

There are a few factors that will limit internal validity, or the degree to which the results are attributable to gamification and not some other rival explanation.

- Participant count: the relatively small probable size of the participant pool will make it difficult to demonstrate statistical significance 
- Participant commitment: knowing that this project is short-lived or a toy project may decrease participants' motivation to take part in peer review. I will attempt to overcome this shortcoming by advertising that solutions and review comments may be transferable to the official exercism.io site after the project, but this issue is nonetheless present.
- The participants in this study are not of the same demographic that typically uses Exercism and may have different motivations, as described in the following section. 

#### Participant Motivation

To test my hypothesis, I will need participants to use my modified Exercism platform. The best source of participants in this study would be the existing users of Exercism—the people who are already intrinsically motivated to use the system, as it’s entirely voluntary. These users generally have no extrinsic reward motivating them to use or continue using it.

Problematically, with a project term of only 9 weeks, the project would be infeasible with the inevitable delays associated with discussing with a large group how to approach gamification, peer review, acceptance, release, distribution, and user updates. To avoid these delays, I plan to distribute my own copy of Exercism, and solicit volunteers to use it from various sources.

Our class has been given “participation tokens”, which translate to points in our grade, to help encourage classmates to participate in each other’s projects. That's great, except research suggests these participation rewards will decrease intrinsic motivation, which I’m trying to measure. Would this undermine my entire experiment? Potentially not. Although providing participation tokens is indeed extrinsic motivation, if what I reward people for doing (e.g. participate on exercism, which usually takes the form of completing exercises) isn’t the same as what I measure (i.e. comments or code reviews), then I should still be able to measure changes in intrinsic motivation toward doing code review.

#### External Validity

If participation stays at least as high, we can deduce that intrinsic motivation to perform code review was not harmed. Can this be generalized? This study is largely a test of Deterding’s (2015) method of game design that is proposed to enhance intrinsic motivation. The results of this study cannot be generalized to gamification that does not follow this approach. Furthermore, this study looks at a voluntary activity and cannot be generalized to compulsory activities such as school or work performance, unless the activity is strictly voluntary within that context.

### Schedule

- Week 1 (ending Mar 12): “Plan” 
    - Design an experiment that will explore whether or not engagement is damaged by my gamification implementation
    - Post ideas for feedback on the Exercism discussion board 

- Weeks 2–3 (ending Mar 26): “Modify” 
    - Implement the proposed designs 
    - Deploy: Host the website and a build of the command-line tool 
    - Craft a pitch for requesting participation 
    - Create installation and uninstallation instructions for the forked projects

- Weeks 4–6 (ending Apr 16): “Experiment”
    - Collect experiment data 
    - Create a survey and send to participants

- Week 7 (ending Apr 23): “Analyze”
    - Analyze and report on data findings 
    - Report findings on the Exercism discussion board 










<h2 id="user-content-proposed-changes">Proposed Improvements</h2>

These are ordered by my original estimate of usefulness and ease of implementation. Even if these change, numbering will stay consistent.

### 1. Prompts ✔️ 

Implemented in exercism/exercism.io#3427.

**Usefulness**: High | **Complexity**: Low

**Summary**: Provide varied prompts to elicit thought about some aspect of the submission being read. 

**Goal**: Turn more of the same into varying challenge. Varying challenges give the user experiences of mastery and help the task not become boring. This increases users' sense of competence and builds intrinsic motivation.

Examples:

- Is this easy for you to read and understand? What might make it easier?
- Is there anything in this submission that you don't understand or have questions about?
- Do you understand why the author made the design choices they made? In what situations might these decisions be good?
- What trade-offs can you identify being made in this submission? When would this be a good choice, and when would you want to try something different?
- Will this solution be performant at scale? Is efficient performance likely to matter for this problem?
- Where does this solution's formatting stray from community style guides? If any, do the variances matter?
- Would you want to work in a codebase composed of code similar to this? What do you like and dislike about it?
- What principles could the author learn and apply that would improve this solution?
- Did this solution use an appropriate amount of abstraction? What changes might have made the code easier to understand?
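One way to keep the prompts varied is to pick one at random while avoiding an immediate repeat. The sketch below is only an illustration of that idea, not the code from exercism/exercism.io#3427; `PROMPTS` holds a few of the examples above, and `next_prompt` is a hypothetical name.

```python
import random

# A few of the example prompts listed above.
PROMPTS = [
    "Is there anything in this submission that you don't understand or have questions about?",
    "What trade-offs can you identify being made in this submission?",
    "What principles could the author learn and apply that would improve this solution?",
]

def next_prompt(last_prompt=None, rng=random):
    """Pick a prompt at random, avoiding the one shown most recently."""
    candidates = [p for p in PROMPTS if p != last_prompt] or PROMPTS
    return rng.choice(candidates)

first = next_prompt()
second = next_prompt(last_prompt=first)   # never the same as `first`
```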

### 2. <strike>Email notifications & digests</strike>

This is probably not suitable for this experiment. Although I think restoring informational email notifications would be very helpful for improving participation, on second thought, I am wary of using this as a measurement of change in intrinsic motivation. After beginning to receive emails, people will certainly expect that notifications will continue in the future, and assume that no email equates to no activity. Discussion #91 has some good ideas for the future.

**Summary**: Summary emails of others' comments on your submissions, and when there are reviews needed for stages you've completed.

**Goal**: Good games use A's action to call B to action, and vice versa. These interactions build users' sense of relatedness with others and therefore their intrinsic motivation.

### 3. Have another!

**Usefulness**: High | **Complexity**: High

**Summary**: Reveal the next entry after leaving a review/comment.

**Goal**: Provide the next best action to maintain flow.

I envision that after clicking the submit button, the next entry is revealed above the footer. This is a similar concept to infinite scrolling, except the trigger is commenting rather than scrolling to the bottom. Below the comment button, a message appears, worded something like:

> _username_ could use a review of this recent revision. Care to look before moving to the next exercise?

### 4. Onboarding

**Usefulness**: Medium | **Complexity**: Low

**Summary**: Once per user, introduce people to the benefits of reading code, asking questions, and offering feedback.

**Goal**: Share a clear, common goal connected with the user's motivation. Help make this connection for the user, which should help draw out intrinsic motivation.

Anyone want to help write this copy?

### 5. Sentence starters ✔️ 

Implemented in exercism/exercism.io#3435

**Usefulness**: Medium | **Complexity**: Low

**Summary**: Fill the comment box with a random example lead-off sentence.

**Goal**: Templates help overcome the barrier of starting from a blank page, making the path to success easier. Helping users find success builds their sense of competence and therefore their intrinsic motivation.

### 6. Reuse comments ✔️ 

Implemented in exercism/exercism.io#3437

**Usefulness**: Medium | **Complexity**: Medium

**Summary**: Allow copying prior comments from history, so providing common feedback is less tedious.

**Goal**: Automate away what has already been mastered, so the user can focus on the real challenges. Focusing on the real challenges and not annoyances builds their sense of competence and therefore their intrinsic motivation.

### 7. Visualize impact of comments

**Usefulness**: Medium | **Complexity**: High

**Summary**: Graph over time (or otherwise display) the percentage of submissions that were revised as a new iteration after you commented.

**Goal**: Provide informational feedback to build the contributor's sense of confidence.

The causal connection between a comment and the presence of a new iteration is weak, but there can often be a correlation.

This could be gamed by leaving a short comment on everything, or on entries from users who already clearly do several iterations. But if this graph were just for yourself to see, would people really be motivated to do that?
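The percentage itself could be computed along these lines. This is a minimal sketch under stated assumptions: each reviewed submission is represented as a hypothetical `(comment_time, next_iteration_time)` pair, with `None` when no further iteration was submitted, and times are plain comparable numbers.

```python
def revision_rate(reviewed):
    """Percentage of commented-on submissions that received a later iteration.

    `reviewed` is a list of (comment_time, next_iteration_time) pairs;
    next_iteration_time is None when the author never submitted again.
    """
    if not reviewed:
        return 0.0
    revised = sum(
        1 for commented, iterated in reviewed
        if iterated is not None and iterated > commented
    )
    return 100.0 * revised / len(reviewed)

# Hypothetical history: two of the four submissions were revised after the comment.
history = [(10, 15), (20, None), (30, 31), (40, 35)]
rate = revision_rate(history)   # 50.0
```

Computing this per week or per month would give the points for a graph over time.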

### 8. Call to Action ✔️ 

Implemented in:
- exercism/exercism.io#3414
- exercism/cli#377

**Usefulness**: High | **Complexity**: Medium

**Summary**: After submitting an exercise in the CLI, give a call to action to review other submissions. Talk briefly about the benefits, and suggest a goal.

**Goal**: Provide the next best action to maintain flow. Help make the connection between code review and the user's own motivation for using exercism. Set a goal that will help accomplish this purpose.
