An article by Monteleone, Panebianco and Zucchini, entitled «Evaluating the Evaluation. The Pros and Cons of ‘VQR’ in Social and Political Research», appeared in the May 2013 issue of Italian Political Science (IPS). The article offers some general reflections on the benefits and limitations of peer review, impact indices and bibliometric classifications.
The article does not examine, however, the evaluation work carried out by the political and social scientists who, representing “area” 14, served as experts in the Groups of expert evaluators (Gev) constituted by Anvur (the National Agency for the Evaluation of Universities and Research). The Gevs, as is known, were established to examine the thousands of works submitted by the scientific communities of Italian universities, in accordance with the provisions of the Call VQR 2004-2010. At the time Monteleone and associates’ article was published, however, the final Gev evaluations were not yet available.
In the early summer of 2013, Anvur published on its website the final reports drawn up by all fourteen Gevs. This publication makes it possible to take up some observations contained in Monteleone and associates’ article and to outline a few comments on the work carried out by Gev 14, the group which examined the works of Italian political scientists, along with those of practitioners in related disciplines. According to the Gev 14 report, the overall performance of our community shows a «not very bright (final) result» (p. 63).
In these few lines I will focus on the deficiencies that emerged during the evaluation process and that were caused, in my opinion, by the criteria and procedures adopted by Gev 14 for evaluating the research of our scientific community. My analysis aims to discuss those criteria and procedures in order to draw the attention of Italian political scientists to the need to promote major improvements in the work of research evaluation.
These improvements will be more effective if they arise from a general debate involving the largest possible number of scholars. Broad participation is desirable in order to avoid the passive adoption of inadequate evaluation instruments that could have a permanently negative impact on the way the scientific community assesses itself, on the allocation of ministerial funds to university structures and on the reliability of judgments on researchers’ work.
The evaluation criteria
The three evaluation criteria on which Gev 14 based its judgments are known: scientific relevance, originality and degree of internationalization. The first two carry the inevitable risk of entrusting demanding value judgments to people who do not necessarily represent the prevailing opinion among the experts of a given scientific community. To minimize this risk, each of the three individual works forwarded by Italian political scientists to Anvur was entrusted for evaluation to two different reviewers recruited by the Gev. In other words, the products conferred by each scholar to Anvur for evaluation were examined by six different people and, in cases of openly conflicting judgments, referred for a final decision to a consensus group specially set up by the Gev. In short, everything possible was done to minimize the risk of clearly discretionary evaluations.
The consequences resulting from the use of the internationalization criterion, namely the “interest and international visibility” of the individual works, deserve separate consideration. The negative consequences of adopting such a criterion were such that – Gev 14 remarks – “in many cases, publication in the Italian language led to penalizing – with a very low rating on the internationalization criterion – works that had gained the most in reference to the other two criteria” (p. 63). In other words, Gev 14 states that many excellent works were downgraded because they had no “international visibility”.
It is regrettable that Gev 14 noticed only ex post the risks arising from inadvertently endorsing the naïve assumption that everything written in English contains a surplus of ‘science’ that is missing from works written in Italian. A prompt intervention by Gev 14 with Anvur could have prevented the adoption of a criterion that proved heavily penalizing towards scholars of the political and social sciences. Alternatively, Gev 14 could have advised the reviewers recruited for the peer review to apply such a controversial criterion with due caution. No such suggestion was offered. Only after the damage was done did Gev 14 regret not having been able to operationalize a concept whose “difficult application” may have “put off track” the reviewers (p. 65). Moreover, in many cases reviewers submitted assessments without any reasoned argument, even though it was possible to add comments to the judgment resulting from the sum of the partial judgments based on the three standardized criteria.
The final report does not provide the information needed to quantify the damage resulting from the adoption of a controversial criterion and from the superficial behavior shown by many reviewers in formulating their judgments. In any case, considering that nearly 60% of the products conferred to Anvur for evaluation by scholars in political science – and 73.5% of all works in the political science sub-sector – were written in Italian, one can imagine the scale of the distortions generated by the “internationalization” criterion.
The evaluation process
The evaluation process focused on peer review and was accompanied, where possible, by bibliometric analysis. However, the Gev 14 final report shows that peer review, when applied to large numbers (503 products were referred to the SPS/04 sub-group), is difficult to manage because of the problems created by the distribution and return of the products to be evaluated.
Moreover, Anvur failed to ensure the anonymity of the authors of the products under examination; this requirement – as is known – is essential for a peer review that aims to be neutral. The immediate identifiability of the authors and their reputations, as well as the easy identification of the universities to which they belong, influenced, for better or for worse, the assessment of the individual products.
To all this, one must add the fact that Gev 14 made no attempt to raise the reviewers’ sense of responsibility: most were recruited with a couple of lines of email. As a consequence, reviewers lacked an appropriate sense of accountability, which it would have been all the more necessary to solicit once it became obvious that the gap between the insignificant remuneration provided by Miur for each assessment and the considerable commitment the evaluations required would discourage many people from accepting the job, or would incline them to do it carelessly. Within the community of political scientists, the refusal rate was 36% among Italian reviewers and 47.7% among foreign ones, which indicates that in many cases assignments probably went to the wrong people. The errors of attribution would have been more limited if a systematic cross-check between reviewers’ curricula and the content of the works assigned to them had been carried out. But the Gev 14 report suggests that in many cases it was not possible to comply with this procedure, because the group of evaluators was overwhelmed by the large number of works returned by reviewers on the grounds that they did not have enough time to carry out the task assigned to them (55.5% of Italians, 37% of foreigners).
In other words, the large number of articles and books sent back to Gev 14 by irresponsible reviewers, coupled with the need to complete the entire evaluation on schedule, led in many cases to products being allocated under emergency conditions. This certainly did not help the effective targeting of the products, which in the end were concentrated on a smaller number of reviewers than had initially offered their availability.
The narrowing of the actual pool of reviewers meant that, in many cases, each person was assigned significantly more reviews than initially planned. As a result of this changed plan, 33.5% of Italian reviewers examined from 21 to more than 25 products each (p. 29), and time “in the final phase of the evaluation, had become very restricted” (p. 27). Without underestimating the important work our colleagues carried out, it is reasonable to doubt that even the more responsible reviewers were always able to examine so many products in depth and reach a balanced judgment on each of them. These working conditions contributed, according to Gev 14, to “generate considerable variability in judgment on the products themselves” (p. 64) and often made it necessary for the consensus group (see above) to intervene in order to reach a coherent final judgment.
These remarks do not claim to offer an overall judgment on the quality of Gev 14’s results, still less on the work of all the Gevs. I merely intend to point out the shortcomings of an undertaking that, in many other respects, has considerable merits: for the first time, a massive, systematic and transparent process for assessing the quality of scientific research in our country was launched. On the other hand, the Gev 14 final report reveals several weaknesses that suggest the need to discuss an experience which, in my opinion, still has an experimental character.
The aggregated results also need to be evaluated with caution: the final rankings in the appendices, which place at the top the university structures with the most brilliant performance, appear biased by the fact that the statistical distribution of the data tends to favor smaller structures over larger ones, with the consequence of distorting the results of any list designed to identify and highlight the best scholars.