1. From the old to the new process of evaluation
In 2012 the new Italian Research Evaluation Process (VQR, Valutazione della Qualità della Ricerca) was launched. With this process Italy joined, after a long delay, the group of European countries that have decided to assess the quality of their Universities and Research institutions regularly. Under the aegis of ANVUR (the Agenzia Nazionale per la Valutazione dell’Università e della Ricerca), which devised and guided the process, a committee was created for each research area to conduct the evaluation. The process is now finished and, even if the final results are not yet public, it is possible to analyse specific aspects of this significant event. Having been part of the group responsible for scientific area 14 (Political and Social Sciences), I will describe the process and some of the problems encountered. I will also present some preliminary data concerning the sub-area of political science.
The 2012-13 VQR was not the first Italian attempt at evaluating Universities and research. It was preceded by the Triennial Evaluation of Research, the VTR 2001-2003, conducted by another body set up by the Ministry of University, the CIVR (Comitato di Indirizzo per la Valutazione della Ricerca). This evaluation was completed at the end of 2005 and the final report was published at the end of 2006. Unfortunately, instead of moving directly from the experiment to a stable process of evaluation, the whole exercise came to a halt. Only some years later, with the creation of a new body (ANVUR) and the formulation of new rules, could the evaluation process be resumed. This shows the difficulties that the Italian University system, and its highly centralised bureaucracy in particular, has in dealing with problems of evaluation and in setting up a regular process of assessment.
Leaving aside the organisational innovations, it must be noted that the new evaluation process significantly broadened its scope. For the VTR 2001-2003, each University or Research institution had to submit only a limited selection of research products in each scientific area. This time every permanent member of the scientific staff of Universities and Research institutions (Full Professors, Associate Professors and researchers) had to submit three products, presumably the best, published between 2004 and 2010. Thus, the amount of material to be examined increased dramatically compared to the previous attempt. The goal was not the individual evaluation of academic personnel, but the production of ‘institutional scores’ for Universities and Departments. In view of this choice, individual evaluations will not be made public by ANVUR; only aggregate evaluations will be available.
For each of the 14 scientific areas defined by the Italian University system a group of evaluation experts (GEV) was created. In my case, I was nominated a member of GEV 14, the group covering the history of International Relations, the history of political institutions, the history of Asia and America, political philosophy, political science and sociology. The group was composed of 13 members from the different sub-sectors represented. Two of them were from foreign institutions. The President nominated by ANVUR was Professor Ivo Colozzi, a sociologist of the University of Bologna. GEV 14 was then divided into two subgroups, one responsible for the broad sociological area, the other for political science, philosophy and the historical disciplines included in this area. Political science was represented by Jean Pierre Gaudin, Gianfranco Pasquino and myself.
The first step of our work was to articulate the evaluation criteria following the general indications of ANVUR and then to establish how the evaluation should be conducted. The second step was to recruit a sufficient number of referees to do the evaluations in full compliance with the usual conflict-of-interest rules. Then the products had to be assigned to the referees and, when all the evaluations were completed, the institutional scores were to be calculated.
With regard to the evaluation criteria, the choice was to select the following: 1) scientific relevance, 2) innovation and 3) internationalisation. For each of these criteria, which were to receive the same weight, an A-D scale was adopted. The resulting scores were to be: excellent, good, acceptable and limited. The debate about the type of evaluation procedure to be adopted, bibliometric or peer review, was quickly settled in our area, as only a small minority of products (approximately 280 out of more than 4000) were published in journals for which bibliometric (ISI or Scopus) data were available. All products therefore had to be submitted to two referees. It was, however, decided that, where possible, a double evaluation (bibliometric and peer) would be carried out in order to compare the results of the two methods. This exercise was done for study purposes only and was not meant to affect the evaluation.
At this stage of the process it was also established which products could be considered acceptable scientific products. This was done on the basis of both the internal characteristics of the product and its place of publication. For instance, a simple research report, or an article published in a journal of ‘general culture’, was to be rejected. Such cases were infrequent, but they did occur. For this purpose a list of scientific journals had to be defined, and ANVUR established a ranking of their scientific quality.
2. The evaluation process
The recruitment of referees was probably one of the most crucial moments of the whole process. There had to be enough referees to handle all the products received, they had to be qualified to do the job, ideally a good number of them were to be foreigners, and finally they had to be willing to complete their task. The process was made more difficult by the need to respect stringent rules on conflicts of interest. No referee was to evaluate any author coming from the same University, or any collective product in which both author and referee were involved. These rules are quite obvious, but in some fields, such as political science, the relatively small number of Italian scholars and their concentration in a few universities, coupled with the need to find experts in specific sub-fields, sometimes made it very hard to find suitable referees. Having a good number of foreign referees was potentially a solution to the problem of such conflicts of interest, but in practice it proved far from easy. Since a large majority of products were written in Italian, the referees had to have a sufficient command of the language. On top of this, they had to be persuaded to do a job for which the remuneration was more or less symbolic, and to be willing to overcome some of the practical problems connected with the experimental nature of the ANVUR computer system that handled the whole process of product distribution. In the end, only a relatively small group of generous foreign scholars could be involved. Their contribution was, however, very important.
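The two conflict-of-interest rules for referees described above can be read as a simple filter over the referee pool. The following is a minimal sketch under that reading, not a description of the ANVUR system: the `Product` and `Referee` records, their fields, and all names and universities are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    title: str
    authors: tuple            # names of all authors (hypothetical field)
    universities: frozenset   # universities of the authors (hypothetical field)


@dataclass(frozen=True)
class Referee:
    name: str
    university: str


def has_conflict(referee, product):
    """Apply the two rules stated in the text: same University, or co-authorship."""
    same_university = referee.university in product.universities
    is_coauthor = referee.name in product.authors
    return same_university or is_coauthor


def eligible_referees(product, pool):
    """Return the referees in the pool who may evaluate the product."""
    return [r for r in pool if not has_conflict(r, product)]


# Illustrative data only; none of these names come from the evaluation itself.
pool = [
    Referee("Referee 1", "University X"),
    Referee("Referee 2", "University Y"),
    Referee("Referee 3", "University Z"),
]
product = Product(
    title="Sample chapter",
    authors=("Author A", "Referee 3"),
    universities=frozenset({"University X"}),
)
eligible = eligible_referees(product, pool)
print([r.name for r in eligible])
```

With a small pool concentrated in a few universities, this filter quickly leaves very few candidates, which is the difficulty the paragraph above describes. Language skills and sub-field expertise would narrow the list further.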
At a given point of the process the products arrived. It was a bit like the moment in ‘western’ movies when a herd of recalcitrant cows coming from the prairies has to be corralled by the cowboys into the stables. The procedure adopted to handle this stage was that each member of the GEV had to accept responsibility for a share of these products and then proceed with their assignment to the referees. Each product had to be assigned to a referee by two different GEV members. The GEV members, too, obviously had to respect the conflict-of-interest rules. This meant that they could not assign products from their own University. Nor were the referees to know which of the GEV members had made the assignment. As a consequence, queries by the referees about a product had to be handled through a blind mechanism of communication.
This part of the process, computer-aided and in itself apparently simple, proved in practice much more difficult than expected. It was soon clear that the pool of referees selected at the beginning was too small for the number of products to be assigned. For the whole of GEV 14 there were more than 4000 products to be distributed, of which 573 were for political science. Given the need for two evaluations of each product, this meant more than 8000 evaluations for the whole GEV and 1146 for political science. For political science alone this number would require at least 55 referees, each evaluating about 20 products, mostly articles but very often books. However, it was not only a matter of sheer numbers; it was also a matter of expertise. It is true that a majority of the products were in the mainstream areas of the discipline, but a substantial number came from rather obscure and peripheral corners of research for which it was often difficult to find an expert. In addition, referees could obviously refuse an assignment and, indeed, they made extensive use of this option. In the best of cases they refused explicitly; in the worst they did not answer. It soon became clear that the original number of referees had to be greatly enlarged if the task was to be accomplished. In the end, the members of the GEV also had to review many products for which no referee could be found. It must be mentioned, though, that a number of very dedicated referees agreed to do many more reviews than was originally envisaged.
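The workload arithmetic in this paragraph can be checked with a short calculation. The sketch below uses only the figures given in the text (573 products, two evaluations each, 55 referees). The `acceptance_rate` parameter is a hypothetical addition, used only to illustrate how refusals inflate the number of referees needed; it is not a figure from the evaluation.

```python
import math


def total_evaluations(products, reviews_per_product=2):
    """Number of evaluations required when each product is reviewed twice."""
    return products * reviews_per_product


def load_per_referee(products, referees, reviews_per_product=2):
    """Average number of products each referee would have to evaluate."""
    return total_evaluations(products, reviews_per_product) / referees


def referees_needed(products, max_load, reviews_per_product=2, acceptance_rate=1.0):
    """Referees to recruit if each accepts at most `max_load` assignments.

    `acceptance_rate` is an illustrative assumption: the share of proposed
    assignments that referees actually accept.
    """
    evaluations = total_evaluations(products, reviews_per_product)
    return math.ceil(evaluations / (max_load * acceptance_rate))


political_science_products = 573  # figure given in the text

print(total_evaluations(political_science_products))
print(round(load_per_referee(political_science_products, referees=55), 1))
print(referees_needed(political_science_products, max_load=20))
print(referees_needed(political_science_products, max_load=20, acceptance_rate=0.5))
```

With 55 referees the average load is a little under 21 products each, so "about 20" is a slight understatement; a strict cap of 20 per referee would already call for 58. If, hypothetically, only half of the proposed assignments were accepted, the required pool would double.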
Whenever the evaluations of the two referees coincided the process was concluded, but this situation was far from common, and in a good number of cases the evaluations differed not just by one degree, but even by two or three. In these cases a consensus committee had to be established, involving three members of the GEV and, if necessary, a third referee, to define the final score. Only by the end of April 2013 could the whole process be finished.
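The decision rule described here can be sketched as a small function over the A-D scale. This is an illustrative reading only: it assumes that any disagreement, however small, was referred to a consensus committee, and it does not model how the committee reached its final score. The function names and return labels are hypothetical.

```python
GRADES = "ABCD"  # A = excellent, B = good, C = acceptable, D = limited


def grade_gap(first, second):
    """Distance in degrees between two grades on the A-D scale."""
    return abs(GRADES.index(first) - GRADES.index(second))


def next_step(first, second):
    """Return the outcome for a pair of referee grades.

    Matching grades close the evaluation; anything else is referred to a
    consensus committee (assumption: even a one-degree gap is referred).
    """
    gap = grade_gap(first, second)
    if gap == 0:
        return {"status": "concluded", "grade": first, "gap": gap}
    return {"status": "consensus_committee", "grade": None, "gap": gap}


print(next_step("B", "B"))
print(next_step("A", "D"))
```

Recording the gap alongside the outcome would also make it easy to count how often referees differed by one, two or three degrees.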
3. The products in the field of Political Science
Without waiting for a more systematic and detailed assessment, it may be interesting to offer a few early remarks about the products submitted for evaluation. I will limit my analysis to the subsector of political science, which was more directly under my attention. A first positive point that must be stressed is the breadth and variety of the themes covered. In spite of the still relatively small number of political scientists in the Italian academic system, the range of subfields of the discipline covered by their research efforts is greater than could have been expected. The second point concerns the type of products submitted. The results, which are still provisional but not far from final, show that for political science (SPS/04) ‘articles’ have become the largest group, followed by ‘chapters in edited volumes’ and by ‘books’ (Fig. 1). However, the last two categories together still represent the majority of products. Italian political science follows, probably with some delay, the trend prevailing abroad.
Fig. 1. Types of academic products for class SPS/04.
With regard to the evaluation of the products from A to D, where A is excellent and D is limited, it is interesting to note that articles score on average better than chapters in collective books, but not better than books (Fig. 2, 3 and 4).
Fig. 2. Evaluation of articles in class SPS/04.
Fig. 3. Evaluation of book chapters in class SPS/04.
Fig. 4. Evaluation of books in class SPS/04.
In order to assess the state of the discipline more carefully, all these data will require a much more detailed analysis and discussion within the professional organisation of the discipline. It would be important, for instance, to explore in greater depth the degree of internationalisation of Italian political science by looking at the language and places of publication.
4. A final conclusion
A final note is required on the evaluation process as a whole. The process was far from perfect, as many have noted. Some conceptual and organisational aspects will require careful review and discussion. It may be debated, for instance, whether it was wise to require such a large number of products to be evaluated. The advantages and disadvantages of peer review also need to be better assessed. It is also clear that the job done by the referees was not always optimal, for objective and subjective reasons such as the difficulty of finding enough expert referees for all products, evaluation criteria that were probably not sufficiently clear, and a lack of professionalism in some referees. This being said, I think that the process was far from pointless: it forced all the members of the academic community to think more carefully about their research achievements, and it will provide the Ministry of University, Universities and Departments with an instrument to assess the quality of academic life. The instrument can be improved, but this requires that evaluation does not remain an isolated case and instead becomes a regular exercise.