  • Time: 8 weeks
  • Team: 3 people
  • Role: instructional design
  • Methods used: cognitive task analysis, structured interview, think-aloud, A/B testing
  • Tools used: CMU OLI platform

We created a 40-minute online module to teach complete novices introductory Mandarin Chinese, and conducted an A/B test to see which version of the course resulted in better learning outcomes. The average improvement from pre-test to post-test (learning gain) across versions was 52%.

We followed a backwards design process: we began by defining the learning goals, created assessments to measure progress towards those goals, designed instruction to address the learning goals, then planned our A/B test. However, as the design process is iterative, we revisited different stages as necessary. The final version contains a pre-test, 4 pages of multimedia instruction including formative assessment questions, and a summative post-test.


Defining and Refining Learning Goals

We conducted a structured interview with a professor who teaches Carnegie Mellon’s introductory Mandarin course to learn how she teaches Mandarin to complete beginners. The key insights from this interview, and their implications for our instructional design, were:

  • In her course, they spend about 2 weeks focusing only on pronunciation → not feasible for us to teach this in a 40-minute module
  • In her opinion, it would not be possible to teach students to pronounce Mandarin tones properly in only 40 minutes → we should only have students understand what the tones are, not reproduce them in speech

Initially, we had wanted to teach learners to speak basic phrases. After this structured interview, however, it became clear that this was likely beyond the scope of our module. In addition, when we began to consider assessment design, we realized it would be very difficult to assess speaking goals. We therefore revised our goals to focus on listening and reading Pinyin rather than speaking Mandarin.

Our final learning goals for the module and their classification using the KLI framework were:

  1. Given audio of a Mandarin phrase, students will be able to write the English equivalent (skill)
  2. Given an English phrase, students will be able to recall the Pinyin for the equivalent phrase in Mandarin (fact)
  3. Students will be able to recall how to modulate their pitch for each of the four tones in Mandarin (fact)
  4. Given a Mandarin statement, students will be able to apply the ‘ma’ rule to make a yes/no question from the statement (rule)


Designing Assessments

We mapped our assessment questions to our learning goals to ensure that each goal was targeted by two questions in the pre-test and two questions in the post-test. We also used the KLI framework to help us assess different types of goals appropriately (for example, goals to apply rules are assessed by requiring the learner to apply the rule rather than to recall what the rule is). Formative assessment was used throughout the module to provide the learner with plenty of practice opportunities.
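The goal-to-question mapping described above can be checked mechanically. The following is a minimal sketch, not the process we actually used: the question IDs and the draft mapping are hypothetical, and the function simply flags any learning goal that is targeted by fewer than two questions in a given test.

```python
from collections import Counter

# The four learning goals listed in the previous section.
GOALS = (1, 2, 3, 4)


def undercovered_goals(question_to_goal, goals=GOALS, required=2):
    """Return the goals targeted by fewer than `required` questions."""
    counts = Counter(question_to_goal.values())
    return [goal for goal in goals if counts[goal] < required]


# Hypothetical draft mapping (question ID -> goal number), for illustration only.
# Goal 3 has a single question here, so it is flagged.
draft_pre_test = {"q1": 1, "q2": 1, "q3": 2, "q4": 2, "q5": 3, "q6": 4, "q7": 4}
print(undercovered_goals(draft_pre_test))  # [3]
```

Running the same check on both the pre-test and the post-test mapping would confirm that the two tests cover the goals equally.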

We tested our assessment questions by doing cognitive task analysis with someone who had learned some Mandarin but was not fluent. The goal of this was to see how difficult the questions were for her, and try to learn about her thought processes while answering the questions. To achieve this, I used a think-aloud protocol. The participant was able to answer all of the questions easily with only a few small mistakes, which indicated that the material would probably be at the right level of difficulty for a true novice.

We also noticed that she had a particular approach to listening questions: she would listen to the phrase and pick out a word or subphrase she knew, then listen to the phrase again to try to find another word she knew, and continue until she either knew the whole phrase or knew she didn’t know a word and had to make a guess.

Example of listening strategy used by think-aloud participant


Designing Instruction

Brainstorming ideas about learning content

To determine the order in which to introduce content, we considered the dependencies between our learning goals. For example, students must know some phrases (goal 1) before they can apply the ‘ma’ rule to turn them into questions (goal 4). We also wanted to manage the cognitive load on the learner, so we split the instruction into 4 pages.






Audio narration: “The third tone is falling and then rising, and sounds like this…” Note that there is no on-screen text

We designed our instruction using evidence-based e-Learning principles:

  • We used videos and visuals as well as text (multimedia principle)
  • In videos, audio narration was used and was not duplicated by on-screen text (modality and redundancy principles)
  • Friendly and informal language was used (personalization principle)
  • Instruction split into 4 pages rather than being all on one page (segmentation principle)



Testing the Module

We tested our first draft of the module with two novice learners using a think-aloud protocol. Learners took about 45 minutes to complete the module, which was within our time range. We made two changes to the module following observations during these learner tests:

  • We added suggestions about how to use the vocabulary table (one learner was unsure how to use the table, listening to just a few items and ignoring the rest)
  • We added a welcome message to reassure learners that they are not expected to be able to answer the questions on the pre-test, which is just for research purposes (learners were concerned that they had to guess on the pre-test)


A/B Testing

4 versions of the course

We were curious whether teaching the strategy for answering listening questions that our first think-aloud participant had used would result in better learning outcomes. To see if this was the case, we conducted an A/B test with a version of our module that suggested learners use such a strategy, and a version that did not. We also used a counterbalanced design to ensure that our results were not skewed by the pre-test being significantly more difficult than the post-test or vice versa. To do this, we created two sets of 10 different questions, which we refer to as “Form 1” and “Form 2”. For each of the A and B versions, we used Form 1 as the pre-test and Form 2 as the post-test for half of the participants, and Form 2 as the pre-test and Form 1 as the post-test for the other half. There were therefore 4 versions of the course in total.
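The 2 × 2 structure of this design (strategy instruction or not, crossed with the order of the two test forms) can be sketched in code. This is only an illustration under stated assumptions: the write-up does not describe how participants were actually assigned, so the shuffled round-robin assignment, the function name assign_versions, and the seed parameter are all hypothetical.

```python
import itertools
import random

# Version A: no listening strategy instruction; Version B: with it.
CONDITIONS = ("A", "B")
# Each tuple is (pre-test form, post-test form).
FORM_ORDERS = (("Form 1", "Form 2"), ("Form 2", "Form 1"))

# Crossing the two factors yields the 4 course versions.
VERSIONS = list(itertools.product(CONDITIONS, FORM_ORDERS))


def assign_versions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin across the 4 versions.

    Hypothetical assignment scheme: it keeps group sizes within one
    participant of each other.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    assignment = {}
    for index, participant in enumerate(shuffled):
        condition, (pre_form, post_form) = VERSIONS[index % len(VERSIONS)]
        assignment[participant] = {
            "condition": condition,
            "pre_test": pre_form,
            "post_test": post_form,
        }
    return assignment


assignment = assign_versions(range(29))
```

Because each form serves as the pre-test for half of each condition, any difference in difficulty between Form 1 and Form 2 affects both conditions equally.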

We tested the module with 29 participants: between 6 and 9 participants for each version.


Learning gains: improvement from pre-test to post-test for each course version

All 4 versions showed learning gains (52% on average). There was no significant difference between Version B, with listening strategy instruction, and Version A, without it; in fact, Version A had slightly larger learning gains than Version B (56% compared to 48%). Perhaps many learners already use such strategies to answer listening questions even when not instructed to. It is also possible that the sample sizes were simply too small to detect a difference.
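For readers who want to run a comparison like this on their own data, here is a minimal sketch. It rests on two assumptions that are not confirmed by our analysis: that learning gain is the simple difference between post-test and pre-test percentage scores, and that a two-sided permutation test on the difference in mean gains is a reasonable choice for groups this small. The function names are hypothetical, and no real participant data is included.

```python
import random
from statistics import mean


def learning_gain(pre_pct, post_pct):
    """Gain in percentage points (assumed definition: post minus pre)."""
    return post_pct - pre_pct


def permutation_p_value(gains_a, gains_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean gains.

    Repeatedly reshuffles the pooled gains into two groups of the original
    sizes and counts how often the shuffled difference is at least as large
    as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(gains_a) - mean(gains_b))
    pooled = list(gains_a) + list(gains_b)
    n_a = len(gains_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)
```

To use it, compute learning_gain for each participant, collect the gains for Version A and Version B into two lists, and pass them to permutation_p_value. With groups as small as ours, a test like this has little power, which is consistent with the caution above about sample size.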



Next Steps

If we were to continue with this project, I would want to test another iteration of the unit, making targeted improvements based on the data we gathered. I would be interested in running an A/B test with a larger sample size to provide a more definitive result about including listening strategy instruction. I would also be interested in conducting more think-alouds of learners doing the A version (no listening strategy instruction) to gain more insight into how novices answer listening questions when not prompted with listening strategies.