Automated Essay Scoring (AES) is an emerging assessment technology that is gaining the attention of many educators and policy leaders. It involves training computer engines to rate essays by considering both the mechanics and the content of the writing. Although computer scoring of essays is not yet practiced, or even tested, on a wide scale in classrooms, it is already fueling debate and creating a need for further independent research to inform decisions about how the technology should be handled.
Most AES systems were initially developed for summative writing assessments in large-scale, high-stakes settings such as the GMAT graduate admissions exam. More recent developments, however, have expanded the potential application of AES to formative assessment at the classroom level, where students can receive immediate, specific feedback on their writing while still being monitored and assisted by their teacher.
In subjects such as math and accounting, automated grading tools are already common and take a substantial workload off teachers. In many instances, however, these programs have been criticized because they allow only a right or wrong answer, giving no partial credit for correct steps toward a solution. Teachers also no longer get to know each student's work as well, which limits their ability to help students with their individual weaknesses.
Numerous software companies have developed programs for written assessment that predict essay scores from correlations between measurable qualities of a text and human-assigned scores. First, the system must be trained on what to look for. This is done by feeding it a number of essays, written on the same prompt or question, that have been marked by human raters. The system then examines a new essay on the same prompt and predicts the score a human rater would give it. Some programs claim to mark for both style and content, while others focus on one or the other.
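To make the training process concrete, here is a minimal, purely illustrative sketch in Python. The essays, scores, features, and model choice are all assumptions for demonstration, not any vendor's actual engine, which would use far richer linguistic features and proprietary methods:

    # Minimal sketch of prompt-specific AES training (illustrative only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # Hypothetical training data: essays on ONE prompt, scored by human raters.
    train_essays = [
        "The author argues that technology improves education because...",
        "Education is good. Technology is good. The end.",
        # ...in practice, hundreds of human-scored essays on the same prompt
    ]
    human_scores = [5.0, 1.0]  # scores assigned by human raters

    # Word- and phrase-level features stand in for "style and content" cues.
    model = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True),
        Ridge(alpha=1.0),  # regression onto the human score scale
    )
    model.fit(train_essays, human_scores)

    # Predict the score a human rater would likely give a new essay
    # written on the SAME prompt (prompt-specific training is the key limit).
    new_essay = "Technology changes how students learn by..."
    predicted = model.predict([new_essay])[0]
    print(f"Predicted score: {predicted:.1f}")

Because the model only learns patterns tied to one prompt's training essays, it cannot be reused on a new question without retraining, which is the limitation discussed later in this article.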
“I would not like automated grading in schools because I feel more comfortable with my professor seeing my essay in person and marking up my mistakes. There can also be a malfunction with an automated grading system that slows the grading system down,” said sophomore Alexander Yoo.
“Depending on which professor you have and for the type of class, it may limit the relationship with your professor. I personally like to talk directly to my professor about what I should edit rather than an automated grading system. By having an automated grading system, it will be easier for professors but will lack in teachers informing the student about extra content and creativity. The automated grading essay can’t possibly have everything a professor has to say because something may come up during the moment,” continued Yoo.
In terms of reliability, the technology company Phillips cautions that, to date, there appears to be a lack of independent comparative research on the effectiveness of different AES engines for specific purposes and with specific populations. While the degree of agreement between a given AES engine and human raters might seem an obvious basis for comparison, even that measure needs scrutiny, since different prompts, differing rater expertise, and other factors can all produce different levels of rater agreement.
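For readers curious how such machine-human agreement is typically quantified, AES research often reports a weighted kappa statistic. The short sketch below uses hypothetical scores and the quadratic weighting common in this literature:

    # Sketch of quadratic weighted kappa, a standard machine-human
    # agreement statistic in AES research (scores here are made up).
    from sklearn.metrics import cohen_kappa_score

    human = [4, 3, 5, 2, 4, 3, 1, 5]   # hypothetical human-rater scores
    engine = [4, 3, 4, 2, 5, 3, 2, 5]  # hypothetical AES engine scores

    # weights="quadratic" penalizes large disagreements more than small ones.
    qwk = cohen_kappa_score(human, engine, weights="quadratic")
    print(f"Quadratic weighted kappa: {qwk:.2f}")  # 1.0 = perfect agreement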
AES has great potential. It can be more objective than human scoring because a computer does not suffer from fatigue or favoritism: assessment criteria are applied in exactly the same way to the first essay marked on a prompt and to the thousandth. The potential for immediate feedback is also viewed positively when AES is used as a formative assessment tool, because it allows students to work at their own level and pace while receiving feedback on specific problem areas.
Rapid feedback also allows for more frequent testing, creating greater learning opportunities for students. Using computers to grade essays reduces teachers' marking load, freeing time for professional collaboration and student-specific instruction. And since computers are increasingly used as learning tools in the classroom, computer-based testing places assessment in the same milieu as learning and provides more accessible statistical data to inform instruction.
Automated essay grading certainly has its critics, many of whom dispute studies defending the technology and argue that students will never get a truly fair essay grade from a computer. For example, one group, Professionals Against Machine Scoring of Student Essays in High-Stakes Assessment, has collected more than 4,100 signatures from educators and others across the country protesting automated grading.
Adopting AES in schools therefore requires a careful investigation of the potential threats. Some argue that it removes human interaction from the writing process: according to the National Council of Teachers of English, writing is a form of communication between an author and a specific audience, and using AES violates the social nature of writing. Other concerns relate to whether the systems can adequately detect copied or nonsense essays. Currently, systems must be trained on specific prompts, which limits educators' ability to modify or create their own essay questions and potentially widens the separation between learning and assessment. Additionally, implementing AES in schools involves not only providing access to computers and software, likely purchased from private companies, but also the technical support and professional development needed to sustain its use.
In the end, AES will never fully replace the human interaction between a student and a professor. Whenever an automated system is in place, students will find ways to trick it or get around it. Only time will tell whether automated grading will ever take the place of teachers in grading essays.