Cataloguing User Error Types on EIWL Exercises
Author
Roel Fledderman
Title
Cataloguing User Error Types on EIWL Exercises
Description
Our study aims to find patterns of errors and error types within the available data, using machine learning and other methods.
Category
Essays, Posts & Presentations
Keywords
EIWL, parsing, data science, syntax errors, teaching programming
URL
http://www.notebookarchive.org/2021-07-6g4gm4e/
DOI
https://notebookarchive.org/2021-07-6g4gm4e
Date Added
2021-07-14
Date Last Modified
2021-07-14
File Size
1.15 megabytes
Supplements
Rights
Redistribution rights reserved



WOLFRAM SUMMER SCHOOL 2021
Cataloguing User Error Types on EIWL Exercises
Roel Fledderman
Users can check their answers to exercises from An Elementary Introduction to the Wolfram Language in the Wolfram Cloud. Our study aims to find patterns of errors and error types within the available data, using machine learning and other methods. Within this context, we also aim to identify additional pointers for addressing misunderstandings.
Intro
In addition to reading An Elementary Introduction to the Wolfram Language online, users can find additional exercises and check their solutions in the Wolfram Cloud.
Picture 1: Doing an exercise online.

At the time of writing, a user receives as feedback either CORRECT or TRY AGAIN. In order to teach the Wolfram Language optimally, additional methods could be implemented, such as autocompletion and other forms of automated feedback and assistance.

Initially we conceived of two main types of user error: semantic errors and syntactic errors (a third category could be labeled “wrong question”). There can be some overlap between the two, but in general a semantic error originates from a misunderstanding of the concept being taught. In other words, the user needs some additional reading or thinking, or concrete examples, to correct the error. A syntactic error, on the other hand, can be considered a mistake made while writing the code, much like a typo.

For a brand-new user going through the first few chapters, there may be no real difference yet between a semantic and a syntactic error. However, making syntactic errors can prevent grasping the main concept, as the code is not evaluated correctly and the user is merely prompted to try again. Fixing this layer of errors should enable faster understanding of the semantics at hand. In addition, detecting and correcting syntax errors involves a relatively small set of patterns and rules, so it may be easier to address automatically than conceptual errors: the low-hanging fruit, so to speak.

In this study we therefore focused primarily on the syntactic error type: first by summarizing a sample of user error data, and then by applying machine learning to gauge its practical potential.
Syntactic Errors
An Example
As an example, consider the following exercise and the erroneous submission:
Exercise 12.1: Generate the sequence of notes with pitches 0, 4 and 7.
Submission: Sound[{SoundNote[0],SoundNote[4],SoundNote[7]}
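The problem can be confirmed directly on the submission string with the built-in SyntaxQ:

SyntaxQ["Sound[{SoundNote[0],SoundNote[4],SoundNote[7]}"]
(* False: the closing bracket of Sound is missing *)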
Working from the Wolfram Cloud, the user would have seen an indication of their error while typing their code:
Picture 2: Emphasized error while typing.

Maybe the emphasis wasn’t obvious enough, or the user didn’t know how to correct it. Either way, the rest of the syntax looks essentially correct, and the user was probably able to submit correct code immediately upon trying again.
Dataset and Method
Our main dataset contained 5670 recent submissions that had been flagged by the scoring engine as containing a syntax error. The syntax errors vary from cases like the previous example to somewhat more severe ones.
We then ran CodeParser (available with Version 12.3) against those submissions and analyzed the resulting categories.
As a final step we applied machine learning, first to the general data (Classify) and then to a set of manually curated data (a transformer neural net).
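As a minimal sketch of this step (the submissions here are hypothetical stand-ins; CodeInspect and InspectionObject come from the CodeInspector paclet that accompanies CodeParser):

Needs["CodeInspector`"]

(* hypothetical stand-ins for the flagged submissions *)
submissions = {"Range[10", "Sound[{SoundNote[0],SoundNote[4],SoundNote[7]}"};

(* each submission yields a list of InspectionObject[tag, description, severity, data] *)
inspections = CodeInspect /@ submissions;

(* tally submissions by error tag *)
tally = Counts[First /@ Flatten[inspections]]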
Results
The following graph shows an overview of submissions per CodeParser syntax error type:
[Bar chart: number of submissions per syntax error type]
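A chart along these lines can be produced from the tally in the sketch above (assuming tally is the Association of tag -> count built there):

BarChart[Values[tally],
 ChartLabels -> Keys[tally],
 PlotLabel -> "Submissions per syntax error type"]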
The category answerEmpty has been removed from these graphics. These were instances where the user didn’t enter any code and then checked their answer. We suspect that in these cases users were testing the platform, perhaps looking for hints. Since they provide no code body for syntax analysis, they have been dropped from further investigation.
As explained in this video, parsing code can be a tricky business, which explains the parsingError category. In most of these cases, the parser returned multiple inspection objects. We’ll come back to these in a later section.
Of the resulting types, three stand out: UnterminatedGroup, OpenSquare and CommaTopLevel:
[Bar chart: submissions for the three main syntax error types]
Let’s have a closer look at each.
UnterminatedGroup
What we see in this example is a missing closing “]” at the end of the input:
[Example: a Join[...] submission missing its final closing bracket]
In other words, the user neglected to close the brackets for Join[], giving us this bracket pattern:
[[][[]]
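Such patterns can be surfaced by keeping only the bracketing characters of a submission (bracketPattern is a hypothetical helper, not part of the study’s code):

(* reduce a submission to its bracketing characters *)
bracketPattern[code_String] := StringJoin @ StringCases[code, "[" | "]" | "{" | "}" | "(" | ")"]

bracketPattern["Join[Range[4],Reverse[Range[4]]"]
(* "[[][[]]" *)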
Unterminated groups tend to occur at the end of a line, but not always. In some cases, especially in the first chapters, users also seem confused about which type of bracket to use. For example:
[Example submission mixing bracket types]
Relatively many examples occur in the first few chapters, which suggests users quickly learn to check their brackets.
OpenSquare
In the case below, the number of opening and closing brackets matches, but there appears to be a ‘headless’ list of arguments [__]:
[Example submission with a headless argument list]
Sometimes removing the opening bracket leads to improved syntax. But note that in the example above this would reveal a semantic error.
In other cases it’s unclear whether the user forgot to include a head:
[Another example of a headless argument list]
Other causes could be confusion about which bracket type to use, or missing the Shift key while typing.
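A rough heuristic for spotting a headless opening bracket (a sketch, not the parser’s actual logic; legitimate constructs such as Part’s double bracket would be flagged too):

(* an opening "[" at the start of the input, or preceded by a non-name character *)
headlessOpenSquareQ[code_String] :=
 StringContainsQ[code, StartOfString ~~ "["] ||
  StringContainsQ[code, Except[WordCharacter] ~~ "["]

headlessOpenSquareQ["[1, 2, 3]"]  (* True *)
headlessOpenSquareQ["Max[1, 2]"]  (* False *)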
CommaTopLevel
This error seems to be related to lists and listable functions, and how to use them properly:
[Example submission with a comma at top level]
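For instance (a constructed illustration, not a submission from the dataset), code like

Range[5], Reverse[Range[5]]

has a comma at top level and fails to parse; wrapping the sequence in list braces, {Range[5], Reverse[Range[5]]}, makes it valid syntax.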
Machine Learning
Training a Classifier
We trained a classifier on a set of 900 examples, 300 for each error type. The results are shown below; they didn’t seem discouraging, so we decided to run a quick pilot training a transformer neural net on a manually curated dataset.
[ClassifierMeasurements panel for the trained classifier]
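A minimal sketch of this setup (train and test are assumed to be lists of rules from submission string to error type, e.g. "Range[10" -> "UnterminatedGroup"):

c = Classify[train];                    (* train on string -> class rules *)
cm = ClassifierMeasurements[c, test];   (* evaluate on held-out examples *)
cm["Accuracy"]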
Training a Transformer Neural Net
Below you’ll find some examples of manually curated syntax. The idea was to fix the syntax only minimally, while trying to be conscious of the exercise the student was working on and the level of knowledge the student may have had. For example, in chapter one functions haven’t been introduced yet, so “1,2” is corrected to 1+2 instead of Plus[1,2].
For each example, the user’s syntax is listed on top and the improved syntax below it:
RandomInteger [Range,[10]]
RandomInteger [Range[10]]

Join[Reverse[Range[20],[Range[20]]]
Join[Reverse[Range[20]],[Range[20]]]

Table[i*[i+1],{i,1,1000}]
Table[i*(i+1),{i,1,1000}]
The results object from the training process, based on a set of ~1100 training examples:

[NetTrain results panel]
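Schematically, the training step looks as follows (net stands for the sequence-to-sequence transformer, which is not reproduced here; curatedPairs is the list of "user syntax" -> "improved syntax" rules):

(* the third argument All returns a NetTrainResultsObject *)
results = NetTrain[net, curatedPairs, All];
codeCorrector = results["TrainedNet"];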
What is fun to see is that the transformer is actually doing at least some transformations right, as illustrated below:
"ListPlot[Join]Range[100],Reverse[Range[100]]"
"ListPlot[Join[Range[100],Reverse[Range[100]]]]"
"ListPlot[Join[Range[100],Reverse[Range[100]]]]"
codeCorrector@"Join[[Range[20]],Reverse[Range[20]]]"
"Join[Range[20],Reverse[Range[20]]"
"Join[Range[20],Reverse[Range[20]]"
"StringTake[[StringJoin[Alphabet[]]],5]"
"Sort[Table[Style[5],List[]]]"
"Sort[Table[Style[5],List[]]]"
"WordCloud[#]&/@[\"apple\", \"peach\", \"pear\"]"
"Tolumn[Stylph\"Ule[\"UpharColurarte[\"g\", \", Uph\", \"], \"], \"]"
"Tolumn[Stylph\"Ule[\"UpharColurarte[\"g\", \", Uph\", \"], \"], \"]"
"Sound[{SoundNote[0],SoundNote[4],SoundNote[7]}"
"Sound[{Stylus[],Redoloundowandour[4,\"],\"]"
"Sound[{Stylus[],Redoloundowandour[4,\"],\"]"
"List[Range[RandomInteger[10]]"
"List[RandomInteger[10],RandomInteger[10]]"
"List[RandomInteger[10],RandomInteger[10]]"
"Column[ListPlot[Range[5],ListPlot[Range[5]]]"
"Column[ListPlot[Range[5]],ListPlot[Range[5]]]]"
"Column[ListPlot[Range[5]],ListPlot[Range[5]]]]"
"Times [6,8], Times [5,9] , Max []"
"Times[Plus[8, Times[5, 9]] Powes ] "
"Times[Plus[8, Times[5, 9]] Powes ] "
"Join[Range[4],Reverse[Range[4]]"
"Join[Range[4],Reverse[Range[4]]]"
"Join[Range[4],Reverse[Range[4]]]"
"Table[Part[Table[{Yellow,Red,Green},RandomInteger[{1,3}],{100}]"
"Table[Part[Table[{1,2,3],Reverse[RandomIntegerse[n,{n,0}]]]}]}"
"Table[Part[Table[{1,2,3],Reverse[RandomIntegerse[n,{n,0}]]]}]}"
"Table[i,{i,10]"
"Table[Table[n,{1}]]"
"Table[Table[n,{1}]]"
"Range[10"
"Range[10]"
"Range[10]"
"Times 2,Plus[3,4]=14"
"Times[2,Plus[3,4]]"
"Times[2,Plus[3,4]]"
Admittedly, some transformations are still plain wrong. Some improvements will be suggested in the next section.
Conclusions
Some patterns in user errors were observed, mainly among the different types of syntax errors.
The machine learning results seem promising, and it would be interesting to train a neural net on a bigger set of curated data (in addition to using a more standardized approach). Perhaps syntax errors could also be introduced manually or programmatically into submissions that are known to be correct, as has been done in other studies (e.g. the DrRepair work listed in the references).
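A toy version of such a corruption step (a sketch; corrupt is a hypothetical helper that drops one closing bracket from a known-correct submission):

(* randomly remove one closing square bracket *)
corrupt[code_String] := Module[{pos = StringPosition[code, "]"]},
  If[pos === {}, code, StringReplacePart[code, "", RandomChoice[pos]]]]

corrupt["Join[Range[4],Reverse[Range[4]]]"]
(* e.g. "Join[Range[4],Reverse[Range[4]]" *)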
The omitted data also deserves a closer look. Most of that syntax returned two inspection objects when parsed, which seemed too complicated to analyze when starting out (some returned up to seventeen inspection objects). Combining the parser results for all 700+ omissions reveals some additional syntax error classes and gives us this chart:
[Bar chart: syntax error classes among the omitted submissions]
Finally, most of the available data has remained largely untouched, such as code without errors. This could be of interest as well, for example to study the ratio of correct to incorrect submissions as a way to rank exercise difficulty. Identifying exit points (where users drop out) could reveal additional topics users struggle with. Another approach could be to filter all available correct syntax down to unique inputs and then use Nearest on faulty submissions to train a neural net.
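A sketch of that last idea (correctSubmissions is hypothetical; the nearest known-correct input to a faulty one could serve as its training target):

(* unique known-correct inputs *)
unique = DeleteDuplicates[correctSubmissions];

(* closest correct submission to a faulty one, by edit distance *)
Nearest[unique, "Join[Range[4],Reverse[Range[4]]", DistanceFunction -> EditDistance]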
Keywords
◼
EIWL, Elementary Introduction
◼
Teaching Wolfram Language Optimally
◼
Machine Learning
◼
Data Science
◼
Syntax Errors
◼
Parsing
◼
Misunderstanding
Acknowledgment
Mentor: Jofre Espigule-Pons
I’d like to thank Jofre for his mentorship, as he was kind enough to provide gentle feedback on my coding style, on specific functions and on project focus. Also, he provided input on machine learning and did some initial training on the curated pilot set.
Others that have contributed to this project through their comments or assistance: Richard Hennigan, Silvia Hao, Jesse Friedman, Brenton Bostick & Stephen Wolfram.
References
◼
An Elementary Introduction to the Wolfram Language, 2nd ed.: https://www.wolfram.com/language/elementary-introduction/2nd-ed/
◼
CodeParser: https://community.wolfram.com/groups/-/m/t/1931315
◼
Learning to Fix Programs from Error Messages: http://ai.stanford.edu/blog/DrRepair/


Cite this as: Roel Fledderman, "Cataloguing User Error Types on EIWL Exercises" from the Notebook Archive (2021), https://notebookarchive.org/2021-07-6g4gm4e


