Only one way to skin a cat? Heterogeneity and equifinality in European national innovation systems

One of the most significant results of the qualitative literature on national systems of innovation (NSIs) is that different systemic arrangements (i.e. configurations of actors and institutions) can deliver similar levels of innovative performance. Using factor analysis on a novel dataset of 29 quantitative indicators of innovative activities we provide an empirical characterization of the structure of European NSIs over the last ten years. Our results cast doubt on the empirical significance of the “equifinality” of heterogeneous systemic arrangements in the context of NSI. Innovation systems show inherent complexity, which leads to a high level of complementarity among their constituent components and configuration. This result implies that successful innovation policies should be systemic, leaving little flexibility in policy design and scope.


Introduction
The period since the mid-1980s has seen the emergence and consolidation within the broad field of innovation studies of a lively stream of work on national systems of innovation (NSIs). The main thrust of the NSI approach is that innovation and technical change are the outcomes of a complex pattern of interactions among a wide variety of actors such as firms, universities and government research institutes. Against this backdrop, the NSI literature argues that the interactions that take place within national boundaries are the most relevant. The popularity of the NSI concept suggests that it has provided policymakers with a seemingly highly effective analytical toolkit, and has contributed to putting innovation policies centre stage on growth agendas (Fromhold-Eisebith, 2007; OECD, 1997).
Empirical work on innovation systems has led to the recognition of the wide heterogeneity of "successful" NSI configurations (Nelson, 1993). As Nelson puts it: "[…] we, the authors, have been impressed by the diversity of 'national systems' that seem to be compatible with relatively strong, and weak economic performance" (Nelson, 1993, p.20). Far from endorsing a hypostatization of a linear model of innovation to serve as a blueprint accessible to every country (Balconi et al., 2010), the overall picture that emerges from Nelson's comparative study is one of a wide variety of institutional arrangements and policies. Countries as diverse as Denmark, Italy and the United States have developed original solutions and institutional instruments to foster innovative activity; this reflects in part their idiosyncratic contingencies but also different trade-offs among national policy objectives. Thus, from an early stage the NSI literature has been moving beyond the traditional benchmarking literature, represented by Organisation for Economic Co-operation and Development (OECD) studies of technology gaps in the 1960s (Godin, 2004), precisely by stressing how different systemic configurations can deliver similar results regarding innovation performance.
In other words, the interpretation of the qualitative and comparative evidence put forward in the NSI literature suggests that innovation systems are possibly characterized by strong equifinality, that is, similar outcomes or levels of performance can be achieved in different ways: "there are a variety of arrangements to achieve basically the same thing; a number of our studies when looked together, suggest that this is so" (Nelson, 1993, p.20). 1 Analogous to firms developing similar competitive advantages based on substantially different competencies (Eisenhardt and Martin, 2000) or organizational designs (Gresov and Drazin, 1997; Doty et al., 1993), countries can also achieve similar levels of innovation performance by leveraging various actors, and exploiting their different characteristics and configurations. This is mostly in line with Freeman's (1987) account of the original policies developed by Japan during the post-war period (ranging from the activities of the Ministry of International Trade and Industry (MITI) to the Keiretsu organizational model). Similarly, Lundvall (1992) identifies the building blocks of an NSI and how they can be arranged in different ways to yield effective performance results. For example, in the Italian case, the weak performance of the R&D systems of large corporations has been counterbalanced by forms of informal learning among small firms in the context of industrial districts (Malerba, 1993). Accordingly, the presence of equifinality provides policymakers with a variety of workable alternatives, and their task is to identify the most suitable configuration, taking account of their countries' specificities. From this perspective, the emergence of the NSI literature during the early 1990s represented a crucial shift from what Schot and Steinmueller (2018) 2 dub "framing 1" innovation policy (the traditional linear model and market failure approach) to "framing 2" policies, which conceive innovation from a systemic view.
Building on these contributions, we aim to test the empirical significance of the property of equifinality in the context of European NSIs in the early 21st century. More specifically, the presence of equifinality relates to the extent to which countries can compensate for their shortcomings in some specific dimensions by exploiting their competitive advantage in others, thereby achieving similar performances. In contrast, non-equifinality points to the existence of strong complementarities among the building blocks of an NSI (firms, universities, etc.) that call for a holistic approach to innovation policies to avoid neglecting core components of the NSI. Which of these two perspectives is more accurate is a research question that has important implications for policy.
To address these questions, we employ factor and cluster analysis and a newly constructed country-level database of innovation indicators covering the period 2000 to 2013. In line with recent work on measuring capabilities at the country level (Fagerberg and Srholec, 2008, 2015), we take account of a wide range of indicators measuring most of the variables that might affect the innovative performance of a country. We should acknowledge the limitations of our research exercise from the outset. Providing a characterization of NSIs relying only on quantitative indicators involves many conceptual and empirical difficulties (Archibugi et al., 2009). First, quantitative indicators might not provide the depth of understanding enabled by detailed country-level case studies. Second, as Jensen et al. (2007) point out, the DUI (doing, using, interacting) mode of innovation is inherently elusive and is not easily represented by quantitative indicators. Third, since innovation is a dynamic process characterized by multiple feedbacks, it is difficult to relate innovation indicators to specific phases of the inventive process or to specific types of innovative activities (Griliches, 1990). 3 To try to mitigate these issues, we do not limit our exercise to traditional country-level indicators but include in the analysis innovation survey data that provide a detailed characterization of the nature of the innovative processes in different countries at firm level (Mairesse and Mohnen, 2010).
Our results show that it is difficult to identify empirically different sub-components (building blocks) of the innovation system that may work as "substitutes" for the attainment of an effective innovation performance in different structural configurations. Our factor analysis extracts only one factor, thereby pointing to a strong degree of non-equifinality of NSIs. The consequences for policy are that effective innovation performance should be regarded as the outcome of the interactions among all the system dimensions. In other words, in the 21st century and in the context of innovation systems, there may be only one way to skin a cat.
We contribute to the empirical literature on NSI in several ways. First, from a theoretical point of view, we frame the NSI notion against the backdrop of the concept of equifinality. While equifinality is a concept used in the management and organizational literature, it is rarely employed to inform discussions of innovation policy. Second, we contribute to the relatively small literature that tries empirically to characterize innovation systems in a systematic way. We do this by combining country indicators with innovation survey data. Finally, the results of our empirical exercise confirm recent theoretical studies that argue for a holistic approach to the design of innovation policies at the country level.
The paper is structured as follows. Section 2 presents a review of the NSI literature; Section 3 describes the data and methodology used, and Section 4 presents the results of the empirical exercise. The paper concludes with a discussion of some policy implications.

From the "founding fathers" to the second-generation NSI
Despite the wide appeal of the notion of NSI for both scholars and policymakers, the concept remains elusive and difficult to articulate clearly, and even distilling an agreed definition from the literature is far from straightforward.
The NSI concept emerged gradually during the late 1980s within the evolutionary and institutional innovation studies tradition. Lundvall (2007) reports that the first explicit use of the expression "national innovation system" was in an OECD paper prepared by Christopher Freeman in 1982, and elaborated later in his influential analysis of the Japanese economic miracle after the Second World War (Freeman, 1987). The concept gained recognition and became consolidated a few years later following the publication of Lundvall's (1992) and Nelson's (1993) edited books. In recent years, there has been some criticism and questioning of both the relevance of a (nationally bounded) systemic approach to innovation in a world characterized by increasingly globalized value chains (Pietrobelli and Rabellotti, 2011; Szapiro et al., 2016), and of the operationalization of analysis of the role of the state within the NSI literature (Vertova, 2014).
On the definition and boundaries of an NSI, Soete et al. (2010) argue that the original NSI concept, on which the literature is in broad agreement, can be understood in three ways according to the respective contributions of Freeman, Nelson and Lundvall. Table 1 summarizes the distinctive features of these three conceptualizations.
Freeman's work on Japanese technological catching up takes account of the role played by institutional embeddedness in affecting the innovative performance of agents. The focal interest is in the factors affecting the success of industry and innovation policies. Freeman adopts a "broad" conception of NSI that encompasses analysis of both formal and informal institutions (e.g. cultural and historical values) influencing learning and innovative processes. 4 The Japanese case emerges as a remarkable example of how relatively enlightened policymakers can formulate sensible industry policies while simultaneously avoiding the rigidities of a too invasive government intervention. Lundvall (1992) does not dismiss the relevant role of policies and institutions but puts more emphasis on the systemic nature of NSI. Knowledge should not be understood in static terms since knowledge production is an inherently complex learning process in which a wide variety of different types of agents (e.g. firms, universities, inventors, banks, users, etc.) are involved. In this perspective, a successful innovation system fosters and exploits the learning processes emerging from the actors' interactions. Finally, in the "Nelsonian" tradition, the focus is on the formal R&D subsystem. Although Nelson and colleagues opt for a narrow definition of NSI focused on formalized inventive activities in firms and public organizations, they are careful to consider the importance of institutions and the broader system in which industrial and academic research is embedded. However, their focus remains primarily the empirical measurement of scientific and technological performance, 5 and leads to a framework of inquiry that reflects many facets of US experience (for a historical account, see Nelson and Wright, 1992).
2 For very useful critical appraisals of the Schot and Steinmueller (2018) contribution see Fagerberg (2018) and Giuliani (2018).
3 A classic example of this issue is the extent to which patents can be regarded as indicators of both innovative output and inventive activity, as we will discuss later. The blurring line between output and input innovation indicators is open to criticisms such as those raised against total factor productivity decompositions in growth accounting exercises. When interactions and feedbacks occur among the various components of a system it can be misleading to identify the "independent" contributions of specific factors.
This early literature conceptualizes NSI as a network comprising nodes (i.e. actors such as firms, universities, users, etc.) connected by multiple links. The most recent literature has moved away from this "structural" approach and emphasizes the role of innovation system functions and processes. Edquist (2005) argues that the early NSI literature should be seen more as a (descriptive) "approach" than as an adequately formulated theoretical framework. Building on work by Liu and White (2001), Edquist calls for a more rigorous articulation of the NSI concept, and the development of a systematic list of activities (or functions) related to the creation and diffusion of knowledge (see Table 2 for a summary of the main features). In line with this reasoning, Bergek et al. (2008) discuss how NSI performance can be affected by certain critical functions. 6 Therefore, it is these tasks rather than the actors that should be at the core of NSI research and policy interventions. Bergek et al. (2008) provide a useful scheme of analysis that starts from a clear-cut division between the system's building blocks (actors, networks, institutions) and the functions that an innovation system ought to perform. This NSI conceptualization also allows for more straightforward comparison since different actors in different NSIs may carry out the same function.
The aforementioned two works were elaborated by Fagerberg (2016), who identifies five main dynamic processes affecting the performance of an innovation system. 7 His work differs from previous accounts of NSI in that he suggests that policies should target these five processes directly rather than the underlying structure. Furthermore, the existence of complementarities among system components implies that policymakers should coordinate interventions across all these domains using a holistic approach to innovation policy.

The empirical analysis of NSIs
The early NSI literature is mainly empirical and is based on detailed country case studies. Thus, its appraisals are mostly descriptive and qualitative. This approach is the consequence of its theoretical starting point: given the inherent complexity of innovation systems, "local search" based on comparative assessments was the most sensible approach to policymaking. As a result, there is a vast comparative literature on NSI, based mostly on binary comparisons. A perfect example here is the book edited by Nelson (1993), which provides detailed descriptions of NSIs in several countries and has become a cornerstone of the discipline. Similarly, in the context of developing countries, Dosi et al. (1994) use the NSI framework to compare the innovation performance of Latin America and East Asia. Using only a few statistical indicators, they group the countries into two main geographical clusters and offer a qualitative comparison of their institutional arrangements. This type of thick description is not always appropriate for policymaking purposes since the abundance of detail on each country comes at the expense of comparability, and can result in a series of stand-alone results.
The limits of these ad hoc comparisons were highlighted by Patel and Pavitt (1994) who were probably the first to call for a stricter definition of NSI and its properties in the form of quantitative indicators to improve the empirical basis for understanding and evaluating national performance. However, the empirical operationalization of the NSI concept remains challenging for three main reasons. First, developing quantitative measures that effectively capture the rich institutional details discussed in the previous subsection is not trivial. Although there are some dimensions of NSI that can be measured (e.g. patents, R&D expenditure, education level, etc.), indicators capturing the "soft" part of the system (e.g. institutions, linkages, policies, and aspects that can be labelled social capabilities in the sense of Abramovitz, 1986) or DUI-modes of learning, are less susceptible to quantitative representation, especially in international comparisons involving a wide range of countries (Jensen et al., 2007).
Second, measurement of innovation processes can be difficult. The usual indicators such as R&D expenditure and patents give only a partial view of the characteristics of a country's innovative process. Innovation survey data partially overcome this problem by providing details of the innovation processes at firm level. Several statistical offices have introduced regular innovation surveys, and they are becoming increasingly comparable across countries (Mairesse and Mohnen, 2010).
Third, since a single indicator of innovation performance clearly cannot capture all the dimensions of innovative activities, the most recent literature focuses increasingly on the construction of composite indicators of innovation performance. Composite indexes are attractive to policymakers because they provide a synthetic (and easy-to-communicate) picture of the NSI (see Archibugi and Coco, 2005, for a description and comparison of several composite indexes of technological performance). However, composite indicators usually are built by combining basic indicators with somewhat arbitrary weights. This choice often is neither guided by theory nor justified on empirical grounds but responds simply to the necessity to provide a quick measure of a very complex phenomenon. Against this backdrop, Fagerberg and Srholec (2008, 2015) propose a different approach. Rather than focusing on a limited set of variables, they assemble a relatively large set of indicators and use factor analysis to unravel the underlying data correlation structure and identify a reduced set of composite indicators (i.e. the factors). This procedure allows one to be agnostic about the ex-ante association among the variables, and to summarize attractively a large amount of information describing the overall system in a reduced number of components. They frame their exercise within the development literature comparing a large set of heterogeneous countries (e.g. sub-Saharan African countries, Scandinavian countries, etc.) along very different dimensions; they employ a wide array of indicators including civil rights, respect of private property, and political freedom. In their work, they find that the "innovation system" is one of four factors that emerge from the data (the other three being: "governance", "openness", and "political system"). Intriguingly, the "innovation system" factor identified using this method has the strongest association with GDP growth.
Table 1 National systems of innovation theorizations of the founding fathers. Source: Our elaboration from Soete et al. (2010), Freeman (1987), Lundvall (1992) and Nelson (1993).
5 […] "Triple Helix" model, which places government, universities, and firms (and their interactions) at the centre of the innovative process (Etzkowitz and Leydesdorff, 2000).
6 As Bergek et al. (2008) note, they are not the first to introduce the notion of functions in analysing innovation systems (see, for instance, Hekkert et al., 2007). However, they go much further than previous attempts in formalizing the list of functions and in applying them to the study of NSIs.
7 The sixth process refers to the influence from abroad. This is acknowledged to be important but is somewhat less central in policymaking since it is influenced by national decisions only to a limited extent.
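The concern about arbitrary weights in composite indexes can be illustrated with a minimal sketch. The indicator names, values and weights below are hypothetical and are not taken from any of the indexes or datasets cited here; the point is only that two equally defensible weighting schemes can reverse a country ranking.

```python
def composite_index(scores, weights):
    """Weighted average of already-normalised indicator scores (0-1 scale)."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


def rank_countries(data, weights):
    """Country labels ordered from highest to lowest composite score."""
    return sorted(data, key=lambda c: composite_index(data[c], weights), reverse=True)


# Hypothetical, normalised indicator values (illustrative only).
data = {
    "A": {"rd_spending": 0.9, "patents": 0.8, "education": 0.3},
    "B": {"rd_spending": 0.4, "patents": 0.5, "education": 0.9},
}

# Two arbitrary weighting schemes (assumptions, not from any published index).
equal_weights = {"rd_spending": 1, "patents": 1, "education": 1}
education_heavy = {"rd_spending": 1, "patents": 1, "education": 4}

print(rank_countries(data, equal_weights))
print(rank_countries(data, education_heavy))
```

With equal weights country A ranks first, while the education-heavy scheme puts country B first. A data-driven reduction such as factor analysis sidesteps this choice by letting the correlation structure of the indicators determine how they are combined.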

Structural heterogeneity and processes: a testable framework
As already noted, one of the main thrusts of the early NSI literature is the emphasis on a wide variety of possible configurations. This is probably the main take-away from the comparative analysis in Nelson (1993): the 15 countries analysed in that volume have few characteristics in common. However, despite this heterogeneity, Nelson and other contributors to the volume point out that radically different innovation systems can be equally successful (Nelson, 1993). Borrowing the concept from general systems theory, we can interpret this as an implicit assumption that NSIs are characterized by the property of equifinality. In other words, countries can achieve comparable levels of innovativeness using specific solutions and institutional instruments that reflect their idiosyncratic situations, their history, and the trade-offs between the objectives of national policies. Thus, rather than advocating a "one-size-fits-all" model, the early NSI literature stresses that different systemic arrangements can deliver similar innovation performance results.
Equifinality can also be regarded as the starting point of the second-generation NSI literature presented in Section 2.1. In this strand of work, functions and processes stem from the interactions among the structural components of the system; and while the configurations vary according to the solution adopted by each country, the functions are basically the same for every innovation system (Bergek et al., 2008). Thus, the contribution of the more recent literature is to emphasize a focus on greater comparability in relation to "what the system does" rather than how it looks. This new characterization resolves the difficulty of formulating policy prescriptions in the absence of best practices for the system configuration. For instance, Fagerberg (2016) shows clearly that adopting a dynamic perspective can result in clear policy suggestions valid for every NSI. Fig. 1 provides a representation that reconciles the two generations of the NSI literature and helps to set the boundaries to our empirical exercise. The central box shows the actors and their relational configuration. 8 The choice of these four actors is based on the literature review in Section 2.1 and is fairly straightforward. Indeed, the relevance of firms, government, and the scientific subsystem as key actors is also recognized in the innovation policy literature that is outside the strict NSI domain (Etzkowitz and Leydesdorff, 2000). 9 Furthermore, interactions among all the structural components, and with users, are fundamental sources of knowledge and learning in the system and constitute an essential part of the innovation process (Lundvall, 1992; Malerba, 1992).
Fig. 1. […] (2000), Lundvall (1992) and Fagerberg (2016).
8 In Fig. 1 we do not display any particular configuration and are agnostic about the presence or absence of specific links among actors.
9 In contrast to the Triple Helix model we do not focus only on universities. We prefer to refer to the scientific subsystem rather than universities. This allows for a more comprehensive characterization of other institutions involved in scientific research and advanced training.
From this central structure stem the five processes described in Fagerberg (2016) and represented in the surrounding boxes. Those generic processes are the outcome of the interactions among the different actors composing the bulk of the NSI. For example, skills are provided by education institutions at various levels (government and scientific subsystem). Similar arguments hold for knowledge, demand, institutions and finance. Taken together, the five processes determine the technological dynamics of the entire system.
In this framework, countries with similar levels of innovative performance have similar "levels" of the five processes; however, these can be achieved by different actor configurations. The NSI literature emphasizes the degree of heterogeneity (and, therefore, equifinality) at the level of the central large box, rather than outside it. Given the early finding that different configurations can be equally successful, the most recent theoretical contributions continue to assume the feasibility of heterogeneous arrangements. What is interesting in our conceptualization is its empirical testability. Assuming that we can collect sufficient indicators to describe the structural components of the core adequately, we can offer an empirical characterization of the heterogeneity and, therefore, test for the presence or absence of equifinality. The challenge here is an empirical one due to the difficulty inherent in adequately capturing the system's institutional factors and "soft" components. However, several organizations have made systematic attempts to measure these variables, which has made our task possible (e.g. MERIT, 2016, and WEF, 2016).

Data and methods
In the remainder of the paper we test the assumption of equifinality, and see whether it is possible to highlight the existence of different "varieties of NSI" across European countries. We employ our framework and a wide range of indicators to describe the four key actors in the system highlighted in Section 2.3. Following Fagerberg and Srholec (2008), we use factor analysis to extract the main NSI dimensions, which should allow us empirically to characterize each NSI along the dimensions extracted, and to provide a quantitative reconstruction of the heterogeneity emerging from more qualitative accounts.

Dataset construction
Our empirical analysis of NSI is based on an original country-level dataset including different dimensions of innovative activities. As explained in Section 2.3, since we are interested in reconstructing the possible heterogeneity of the actors and structural configurations, we include only those variables related specifically to the system actors depicted in Fig. 1. This represents an important difference between our analysis and the one in Fagerberg and Srholec (2008), which aims to capture a broader spectrum of country characteristics and, therefore, includes several variables related to the political system and social values. Table 3 lists 29 variables characterizing the innovation system and their association with the NSI actors and structures as presented in Fig. 1. For each actor, we collect two types of variables. First, we use indicators describing its features as a building block (or node in the network literature) of the system. Second, we use indicators, highlighted in italics, related to the interaction among actors characterizing the structural components of the system. 10 An example here is university-industry collaboration, a variable imputed to both scientific systems and firms since it measures the strength of the link between them. Note that our dataset includes indicators either not previously considered in NSI studies, or discarded because they refer only to short time spans (for details see Appendix A). Examples of such variables are: the Hidalgo and Hausmann (2009) indicator of economic complexity, the results of the World Economic Forum (WEF) Executive Survey and indicators from the Global Competitiveness Index dataset (WEF, 2016), and the Institutional Profile Database (MERIT, 2016). Finally, a novelty of our exercise is that it includes variables retrieved from innovation surveys conducted in various countries, which enrich the empirical characterization of the country-level innovation process.
Variables such as the percentage of firms stating that they had introduced a product or process innovation, combined with their declared relevance of internal and external sources of innovation, allow us to integrate a description of firm-level innovation processes into the analysis (Mairesse and Mohnen, 2010). This partially overcomes the limitations of currently existing indicators that tend to refer to "technological capabilities" rather than real "innovative capabilities" (Archibugi et al., 2009). 11
We assemble a comprehensive dataset encompassing all the dimensions emphasized by the structural literature on NSIs. As shown in Table 3, we collected indicators on the many dimensions through which the government influences innovative activities. Similarly, variables on the scientific subsystem provide a description of the relative specialization in scientific disciplines, as well as the quality and the interactions of the research system with the business sector. We also considered the various peculiarities of the productive subsystem: wage shares, economic complexity (Hidalgo and Hausmann, 2009) and patterns of innovation are usually thought to reflect different "varieties of capitalism" (Soskice and Hall, 2001) and thus, a fortiori, different NSI configurations. Finally, we looked at the fundamental role of users (and customers) in innovation dynamics (Lundvall, 1992), considering the degree of social dialogue as well as the possibilities they have to actively be part of the innovation process (as proxied by their access to finance and IT infrastructure). Our variables are associated with the actors and the structure of the NSI and not with specific phases of the innovation process. In fact, the distinction between inputs and outputs can be blurred in the context of dynamic processes, such as innovation activities, characterized by significant feedbacks (Aghion et al., 2009). An example here is the number of a country's patent applications in a given year, which is frequently used to proxy for innovation output (performance). In fact, patents are not just an output of innovative activity, they are also a source of knowledge (input) for innovation (see, among others, Galasso and Schankerman, 2015). Moreover, the propensity to patent can be at least as significant as a synthetic description of some specific features of inventive activity as an indicator of innovation output. In 2000, Germany filed almost eight times as many patent applications at the US Patent and Trademark Office (USPTO) as Italy, despite having similar-sized manufacturing sectors. 12 While this disparity hints at differences in performance, it also demonstrates the different patterns of innovation prevailing in these two countries: patenting is inherently more appropriate to protect product innovations, and its use differs across industry sectors (Cohen et al., 2000) and across countries, and especially countries characterized by different firm size distributions (Malerba, 1993). 13
Note to Table 3: Indicators in italics refer to more than one actor and stem from their interaction. For sources and descriptions of indicators, see Appendix A. IPRs stands for intellectual property rights.
10 Since these variables refer to the structure of the link they can be associated with multiple actors.
11 In their thorough assessment, Archibugi et al. (2009) emphasize the shortcomings of several indicators commonly used to gauge innovative activities at country level. In particular, they note that most are better suited to capturing the technological aspects of innovation, while neglecting the nontechnological side of innovation capabilities (interactions, learning, etc.).
Our 29 indicators cover the period 2000-2013. In our empirical analysis we do not use yearly data but focus on data for two benchmark points in time. The first data point is the average of 2004, 2005 and 2006 yearly data; the second is the average of the yearly data for the years 2010-2012. The choice of these time-periods is determined by the timing of the available innovation surveys.
Although there is some coordination among the innovation surveys conducted in Europe, this is not the case for the Latin American countries (which we employ for robustness checks) or Japan. Data still missing after computing the three-year averages were estimated from the information available for other indicators in the dataset, using the imputation procedure in Stata 13. A description of each indicator, its time span, and the imputed fraction of the data is provided in Appendix A; descriptive statistics of the sample over time are provided in Appendix B.
The geographical scope of this analysis is the European Union, 14 with the addition of two technological leaders, Japan and the USA, to provide an additional comparative perspective. The resulting dataset includes 33 countries. One advantage of a focus on European countries is that it provides a sample that is simultaneously comparable and heterogeneous. Indeed, Archibugi et al. (2009) argue that comparative assessments of innovative activity should be limited to countries with broadly similar features. In the context of patents, the gap between Italy and Germany might hint at different specializations and innovation modes. In contrast, the much wider gap between Germany and any developing country of a similar size can hardly be a meaningful proxy for a different NSI configuration. In capturing only the considerable differences in wealth and economic development, such a comparison would not be informative. However, a focus on the European Union is interesting because of the heterogeneity of innovative performance across European countries, which means one cannot speak of a "European system of innovation" (Borras, 2004).

Method: factor analysis
We carry out an exploratory factor analysis to condense the maximum amount of information available from the dataset of 29 indicators into a reduced number of composite variables. Factor analysis is an exploratory and unsupervised technique that employs the communalities (shared variance) of the original variables to reveal the latent factors. The method assumes an underlying causal model to identify a limited number of factors that linearly reconstruct the original variables and are able to account for common variance among the observations (Bartholomew et al., 2008).
The eigenvalue associated with each potential factor indicates the share of variance that the factor accounts for. The eigenvalue (or scree) plot is used to decide how many of the factors emerging from the analysis need to be retained. Usually, identification of the relevant factors is carried out using a rule of thumb known as the "elbow" criterion. This consists of identifying the point where the slope of the curve in the scree plot levels off. This point, which resembles the shape of an elbow, indicates how many factors are needed. The relationship between each variable and the underlying factor is called the factor loading. We compute factor loadings for each variable using squared multiple correlations as estimates of communality, in line with the literature (see Friedman et al., 2001, for a detailed treatment of unsupervised techniques). In turn, factor loadings are necessary to obtain factor scores, which are the values taken by the observations when scored on the factors extracted.
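The eigenvalue criterion described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the data are simulated (33 hypothetical countries, 6 hypothetical indicators driven by one latent factor), and the function name `scree_table` is our own assumption. The sketch computes the eigenvalues of the correlation matrix and the share of total variance associated with each.

```python
import numpy as np

def scree_table(data):
    """Eigenvalues of the correlation matrix (descending) and their variance shares."""
    # Working with the correlation matrix is equivalent to standardizing each indicator.
    corr = np.corrcoef(data, rowvar=False)
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]
    shares = eigenvalues / eigenvalues.sum()
    return eigenvalues, shares

# Hypothetical data: one latent factor plus idiosyncratic noise (illustration only).
rng = np.random.default_rng(0)
latent = rng.normal(size=(33, 1))
data = latent @ np.ones((1, 6)) + 0.5 * rng.normal(size=(33, 6))

eigenvalues, shares = scree_table(data)
for k, (ev, sh) in enumerate(zip(eigenvalues, shares), start=1):
    print(f"factor {k}: eigenvalue = {ev:.2f}, share of variance = {sh:.1%}")
```

Plotting the eigenvalues against the factor number gives the scree plot; in simulated data of this kind the curve flattens immediately after the first factor, which is the pattern the elbow criterion looks for.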
Specifically, we perform a factor analysis using the principal-component factor method on the standardized data matrix, and create factor scores using the Thurstone method. The Thurstone scoring method defines factor scores as the product of three terms: the factor loadings matrix, the inverse of the data covariance matrix, and the data vector of interest (Estabrook and Neale, 2013). Finally, we validate the factor analysis by performing the Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity. Bartlett's test verifies the null hypothesis that the correlation matrix is an identity matrix, which would indicate that the selected variables are completely unrelated and, thus, have no common factors. Similarly, the KMO test provides a measure of the suitability of the sample for factor analysis based on the degree of internal correlations. In our case, despite significant correlations among the variables (see the correlation matrix in Appendix B), the KMO measure is 0.495, which is close to the threshold of 0.5 usually adopted to assess sample suitability for factor analysis. The low KMO value is driven mainly by three variables that have low correlations with the other variables: namely, firms' market capitalization, social dialogue, and wage share. If we exclude them, the KMO value becomes 0.71, which is acceptable. However, we decided to apply the factor analysis to the whole sample since we are interested in every indicator of the NSI structure. Moreover, the result for Bartlett's test of sphericity is highly significant even when we consider the whole sample (p-value < 0.0001, reject H0). Rather than excluding these three variables, we perform extensive robustness checks to account for the peculiar pattern of some indicators. Appendix D presents the robustness checks for the factor estimations.
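For readers unfamiliar with the two diagnostics, the sketch below shows how the overall KMO measure and the Bartlett statistic are commonly computed from a correlation matrix. It uses the standard textbook formulas, not the software routines employed in the paper; the data are simulated and the function names (`kmo_measure`, `bartlett_sphericity`) are hypothetical. The p-value of Bartlett's test, which is not computed here, would be read from a chi-square distribution with the returned degrees of freedom.

```python
import numpy as np

def kmo_measure(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.diag(inv))
    # Partial correlations between each pair of variables, given all the others.
    partial = -inv / np.outer(scale, scale)
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    p2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(corr, n_obs):
    """Bartlett's chi-square statistic and degrees of freedom (H0: identity matrix)."""
    p = corr.shape[0]
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) // 2
    return statistic, dof

# Hypothetical data: 33 observations on 6 indicators sharing one latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=(33, 1))
data = latent @ np.ones((1, 6)) + 0.5 * rng.normal(size=(33, 6))
corr = np.corrcoef(data, rowvar=False)

kmo_value = kmo_measure(corr)
statistic, dof = bartlett_sphericity(corr, n_obs=33)
print(f"KMO = {kmo_value:.3f}; Bartlett chi2 = {statistic:.1f} on {dof} d.f.")
```

Variables that are weakly correlated with the rest contribute little to the squared correlations in the KMO numerator relative to the squared partial correlations in the denominator, which is why a few such indicators can pull the overall measure down.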

Empirical characterization of NSIs
Fig. 2 depicts the eigenvalues of the factor analysis performed on the first period of our sample. Using the elbow criterion we find that one factor explains most (47.5%) of the variance. Our statistical method identifies only one latent variable. The loadings of this single factor are presented in Table 4 where the variables within each actor are sorted by the highest loadings. Almost all of the most relevant and characteristic indicators for NSI actors (see Table 3) load heavily onto this factor.
Rather than finding different synthetic dimensions for the structural components depicted in Fig. 1, our factor analysis suggests that a single factor seems simultaneously to capture all the relevant characteristics of an innovation system. 15 Starting with the descriptors of the scientific subsystem, both quality and science and technology specialization of tertiary education load highly on that one factor. In addition, it accounts for the specificities of firms and their innovation patterns. It is interesting to see that the single factor also captures collaborations between universities and firms, and the high-tech specialization in the productive system. In the case of users, this one factor is clearly related to the diffusion of information and communication technology, and the proxies for the availability of finance are also correlated with it. Finally, all the institutional variables we included to account for the different roles played by governments are also related to this factor. Before providing an interpretation of the factor in the context of our framework presented in Section 2, we explore its characteristics along several dimensions.
We checked the robustness of the factor analysis in three ways. First, there could be a plausible concern over the relatively small sample size compared to the large number of indicators. This is generally described as "large p, small n", where n is the number of observations and p is the number of variables. To deal with this, we applied a shrinkage technique to estimate the empirical variance matrix (see Schafer and Strimmer, 2005, for an overview). This family of procedures strengthens the variance-covariance matrix by inflating its diagonal, making the factor analysis less prone to small sample instability. The results replicate those presented in Table 4 and Fig. 2, and again lead to a one-factor characterization. 16 Second, it could be argued that the single factor emerges because of lack of variability in our sample; we extended the countries considered to check whether our results hold. It is possible that, despite qualitative and anecdotal evidence to the contrary, the configurations of European countries are too similar. Thus, we included some Latin American countries 17 to explore the sensitivity of our analysis to the inclusion of middle-income countries with completely different institutional arrangements. Including Latin American countries is a meaningful expansion of the sample given the comparability of their innovation surveys (Bogliacino et al., 2012). The results of the factor analysis including the Latin American countries are presented in Appendix D. Again, the results are consistent with our original findings, showing that the inclusion of countries with different institutional and structural configurations does not change our result regarding the influence of a single factor. Finally, the results might be affected by the inclusion of countries of very different sizes. We controlled for this by using indicators expressed in per capita terms (Fagerberg and Srholec, 2008); however, we ran an additional check on whether the sheer size of the NSI might be at the root of our findings. We ran a weighted factor analysis using each country's population size as the weight, thereby giving more relevance to larger countries. The only appreciable difference was for the KMO test, whose result becomes acceptable, while the pattern matrix does not change. The results were the same when we excluded the smallest countries (Estonia, Latvia, Malta, Cyprus, Luxembourg) from our sample.
Given that the one-factor description seems accurate and robust against a number of checks, we examine how countries score on this single dimension. This involves computing factor scores for each observation (i.e. country) in the dataset. We exploited our second period data to assess the stability of the components and whether countries tend to maintain the same position relative to the others.
12 To be precise, the World Bank (2016) Development Indicators document 51,736 and 7,877 patent applications from German and Italian inventors respectively. See Appendix A for the sources.
13 Malerba (1993) shows that the most dynamic part of the Italian NSI arguably is composed of small and medium-sized enterprises in the mechanical sector, which are often characterized by appropriability strategies other than patenting (Pavitt, 1984).
14 We also include in the analysis Turkey, Serbia, and Norway, since innovation survey data for these countries are comparable and available from the Eurostat database.
15 A similar result is reported by Turchin et al. (2018) in the context of a study of the long-term evolution of human civilizations on a global scale. Following an empirical approach very similar to ours, they collected a large sample of data for societies all over the world and employed Principal Component Analysis to investigate the internal correlation structure of their indicators. Their statistical analysis shows that a single common component is able to account for most of the observed variation. According to Turchin et al. (2018), the result is only apparently surprising, since it actually implies, rather plausibly, the existence of strong complementarities among the indicators employed due to the very high social complexity of the phenomenon investigated.
16 To perform this test we employ the R package ShrinkCovMat described in Touloumis (2015). The optimal shrinkage intensity is determined by the package as being 0.15. We then performed the factor analysis on the shrunk covariance matrix and obtained almost the same results, the only difference being a slightly flatter scree plot that further highlights the relevance of the first factor. The results are presented in Appendix D.
17 These include Argentina, Brazil, Chile, Colombia, and Uruguay.
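Returning to the first robustness check, the idea behind covariance shrinkage can be sketched in a few lines of Python. This is a generic illustration and not a reproduction of the ShrinkCovMat routine: the shrinkage target (the diagonal of the sample covariance matrix) is our assumption, the data are simulated, and the function name `shrink_covariance` is hypothetical. Only the intensity of 0.15 is taken from the text.

```python
import numpy as np

def shrink_covariance(sample_cov, intensity):
    """Convex combination of the sample covariance matrix and a diagonal target."""
    # Assumed target: the sample variances on the diagonal, zeros elsewhere.
    target = np.diag(np.diag(sample_cov))
    return (1.0 - intensity) * sample_cov + intensity * target

# Hypothetical "large p, small n" case: 10 observations on 20 variables.
rng = np.random.default_rng(2)
data = rng.normal(size=(10, 20))
sample_cov = np.cov(data, rowvar=False)

shrunk = shrink_covariance(sample_cov, intensity=0.15)
print("smallest eigenvalue, sample covariance:", np.linalg.eigvalsh(sample_cov).min())
print("smallest eigenvalue, shrunk covariance:", np.linalg.eigvalsh(shrunk).min())
```

With fewer observations than variables the sample covariance matrix is singular, whereas the shrunk matrix is positive definite and can be inverted, which is what makes the subsequent factor extraction and scoring less sensitive to the small sample.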
The time comparison is not straightforward since we need to maintain strict comparability over time. For instance, running two separate factor analyses for the two periods and comparing their rankings could lead to factor scores that are not comparable. 18 Therefore, we ensured comparability among factor scores over time by using the factor loadings computed on the first period data (Table 4) to extrapolate the scores for period 2. We multiplied the factor loading matrix computed in period 1 by the inverse of the data covariance matrix and the data vector of interest in period 2 (Di Stefano et al., 2009). On theoretical grounds, this is equivalent to assuming that the underlying structure captured by the factor loadings is invariant over the periods considered. This assumption is reasonable given the very short time span chosen, which hardly allows for major changes in the set of structural indicators considered. As a check, we also ran the factor analysis on period 2 alone, and again found only one factor with very similar loadings; however, the small differences in their magnitude do not allow comparability over time. Our approach differs from that of Fagerberg and Srholec (2008), who prefer to perform the factoring procedure on the two periods jointly, therefore treating each country's time observations as a different unit in the pooled dataset. There is a caveat to this procedure that is acknowledged by the authors; that is, loss of perfect temporal comparability, since preliminary standardization of the data (a necessary step in factor analysis) in a pooled dataset makes it impossible to distinguish variability among countries from time variations within countries.
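In symbols, the scoring rule described above can be written as follows. The notation is ours and is introduced only for illustration: \Lambda_1 denotes the p x 1 vector of loadings estimated on period 1, \Sigma the p x p covariance matrix of the standardized indicators, and z_i^{(2)} the p x 1 vector of standardized indicators for country i in period 2. The text does not specify from which period \Sigma and the standardization moments are taken, so \Sigma is deliberately left without a period subscript.

```latex
\hat{f}_{i}^{(2)} \;=\; \Lambda_{1}^{\top}\, \Sigma^{-1}\, z_{i}^{(2)},
\qquad
\hat{f}_{i}^{(1)} \;=\; \Lambda_{1}^{\top}\, \Sigma^{-1}\, z_{i}^{(1)} .
```

Because the same weights \Lambda_1^{\top} \Sigma^{-1} are applied in both periods, any difference between the two scores of a country reflects changes in its indicators rather than changes in the estimated factor structure.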
The results of the time comparison are reported in Fig. 3. The plot shows that the scores are stable over time: countries tend to lie very close to the 45-degree line, thus showing little variation. Given the relatively short time span of our analysis, this is not so surprising since it can take a long time for the actors and structure of the core of the NSI to change. Again, this stability suggests that the factor does indeed capture the structural part of the system.
If we look at country dispersion along the bisector line, a degree of country heterogeneity emerges. Fig. 3 shows that countries differ both in their score on the single factor and in its variability over time. Some countries register high scores for that single factor but with small variation over time. Other countries' scores change over time although their original absolute values were small. The analysis of both factor scores on the single factor, and factor variation over the two periods, highlights heterogeneous patterns among countries. To investigate the strength of these differences we perform a two-step clustering procedure using as clustering variables the factor score in period 1 and the difference in factor scores between the two periods (see Appendix C). This identifies groups of countries with similar dynamics over time. Both the Duda-Hart index and the Calinski-Harabasz pseudo-F stopping rule suggest retaining two clusters. We test the null hypothesis that the two clusters have equal means with respect to the factor scores, using the MANOVA tests (Wilks' lambda, Lawley-Hotelling trace, Pillai's trace, Roy's largest root). All four tests reject the null hypothesis and confirm that two clusters are representative of our sample. Table 5 reports the list of countries in each cluster. The results are depicted in Fig. 4, which indicates that the Laggard cluster shows a larger variation in the difference between the factor scores in the two periods. In order to validate the clusters, Table 6 presents some descriptive statistics of the relevant variables in the two groups. Countries in the cluster with higher factor scores have NSIs characterized by farsighted governments that foster innovation via procurement policies. In the case of firms, the large difference in product innovations is not matched by differences in process innovations; the group means are within one standard deviation of each other. The innovation systems of the leading group also have better links among the key components: stronger collaboration between firms and universities, and greater social interactions, are likely to be conducive to innovation.
We can also see that the NSI characterization emerging from our empirical exercise presents systemic complexity but not in line with the traditional literature on the NSI published in the 1990s. Instead of finding a consistent number of relevant dimensions that are strictly interrelated in complex ways, we find a single (but highly multidimensional) factor. However, the inherent complexity of the system is not reduced: the several aspects of innovation systems proxied by our choice of indicators are found to be so closely interrelated that it is almost impossible to identify individual building blocks separately from the overall system. In this perspective, the NSI resembles an emergent complex adaptive system that cannot be reduced to the sum of its parts (Ladyman et al., 2013). The result of our analysis is very similar to an empirical study by Turchin et al. (2018), who found that social complexity in the long-term evolution of human civilizations can be well captured by a single principal component of variation. In both cases, the components of the social systems analysed show complementarities and interconnections so strong that it is almost impossible empirically to decompose them. Overall, our finding challenges the extent of multidimensionality and heterogeneity among NSIs that would be expected from traditional descriptive accounts. Thus, our findings suggest that according to the quantitative account provided, the property of equifinality of the innovation system does not hold. We discuss the policy implications of this finding in the next section.

Discussion
Our result of NSI non-equifinality offers an interesting perspective on the theory and scope of innovation policies. Our factor analysis showed that a single factor provides a comprehensive characterization of the actors and their relations. In our interpretation, this cannot be understood in terms of limited complexity of the innovation system; rather, the structure of the system is so tightly interrelated that the system's emergent properties cannot be characterized in terms of relatively independent sub-components. If we regard NSIs as competing entities, their characterization by a single synthetic factor can be considered as the outcome of a process of competition taking place among all countries on a level playing field. Furthermore, this level playing field is challenging and selective, with no shortcuts or protective niches, and selection takes place within a single dimension that encompasses all the relevant dimensions of a country's innovation system. Using an evolutionary analogy, the NSI competitive environment would seem to be more similar to the North Pole than to the Galapagos Islands. Darwin pointed out that the mild climate and friendly environment of the Galapagos allowed a large variety of alternative species to survive and thrive. However, a very selective environment does not allow for the generation of significant variety. Above the Arctic Circle, only a limited set of evolutionary traits such as thick fur or dense layers of insulating feathers can ensure survival. This does not exclude the possibility of equifinality entirely, but it certainly reduces its scope.
The policy implication of this lack of substitutability is less consolatory in tone than the message in the original NSI literature. The existence of a variety of successful configurations for the design of an innovation system is one of the most (ab)used policy implications in the NSI literature. However, the results of our exercise suggest that country-level innovation policies should be holistic. Specifically, countries keen to improve their innovative performance should rely on comprehensive and integrated policies that affect all the actors in the system, rather than on ad hoc interventions focused on a specific actor or issue.
The NSI literature is too often understood as justifying any system configuration as potentially successful. According to Patel and Pavitt (1994), this desirable variety is no justification for complacency in the face of weak performance along specific dimensions. A good example here is the history of Italian industrial districts. For too long in Italy, the structural fragility of the science and technology base was overlooked by policymakers because of the conciliatory narrative of the innovative dynamism of small and medium-sized firms (see Nuvolari and Vasta, 2015, for a historical perspective). In this respect, our finding provides empirical support for recent work on innovation policy that advocates a "systemic approach" (Fagerberg, 2016; Edler and Fagerberg, 2017) in contrast to earlier studies, such as Woolthuis et al. (2005), which outline policy prescriptions that tackle the failure of the specific structural components of NSIs in an ad hoc way. In this way, our work is related to some recent contributions on innovation policy (Edler and Fagerberg, 2017; Steinmueller, 2010) that acknowledge the need for a narrow, market-failure approach to be abandoned in favour of a comprehensive and holistic perspective, including mission-oriented policies with clearly defined goals, provided they are able to trigger developments on a broad front. Although their goals are narrowly defined, some of the best known and most successful mission-oriented policies are very broad in scope, involving several actors and exploiting their complementarities in multiple ways (see Mowery, 2011, for a description of US federal policies for the semiconductor industry and Mazzucato, 2013, for the ARPA Project).

Conclusion
The aim of this paper was to carry out an empirical investigation of the innovative process in European economies with a specific focus on testing the property of equifinality in the NSI structure. In particular, departing from the qualitative literature on the existence of a variety of successful NSIs, we applied a data-driven approach to explore the underlying structures that might be related to the NSI concept.
We reviewed the notion of NSI, highlighting the salient features of the three conceptualizations of NSI in the seminal contributions of Nelson (1993), Freeman (1987) and Lundvall (1992). We further explored the differences identified by Soete et al. (2010) and compared them to what we call the "second generation" NSI literature. This more recent body of work calls for a shift in attention from the NSI's structural characteristics (i.e. actors and configuration) to its functions. Building on these two literature waves, we operationalized their main tenets into a synthetic framework that seeks to provide a suitable empirical characterization of the main activities of the actors of national innovation systems. In particular, more recent studies put forward a more articulated description of NSI, but they tended to neglect comparative quantitative evidence. We proposed a way to test the equifinality of heterogeneous NSI configurations and our findings resonate better with the notion of tight complementarity between the functions of innovation systems.
Note: For details of indicators, sources and the scaling, see Appendix A, Table A2.
The empirical exercise was carried out on a novel dataset of 29 indicators for 33 countries (mostly European, plus the US and Japan as a benchmark) between 2000 and 2013. The data were retrieved from a variety of sources ranging from traditional innovation indicators to less common institutional variables and innovation surveys. The aim was to capture all the dimensions relevant to European national systems, and to identify common underlying factors. The exploratory factor analysis highlights that a single factor is sufficient to account for a large part of the common variance in the indicators, with the remaining factors being only marginally relevant. The identified factor loads heavily on all the measures related to the characteristics of the actors in the system and their configuration, and links different modes of innovation, interactions and institutional arrangements. This result is robust to a number of statistical checks. The existence of a single factor that is able to summarize effectively all the relevant dimensions of an NSI is interpreted as evidence of strong complementarity rather than substitutability among all NSI actors and configurations. Finally, the dynamic analysis shows strong stability of the factor, confirming that it captures the structural part of the system.
Our results have important implications for policy. The high level of complementarity among the actors and their configurations indicates that innovation policies should be systemic. This means that a good policy design should be broadly based and refer to several actors. This finding contradicts the previous qualitative literature that suggested the existence of a variety of configurations of a successful innovation system, while it lends support to recent theoretical studies calling for holistic innovation policies. Despite making several contributions to the theoretical and empirical literature on NSI, our analysis has some limitations, the most important being the extent to which our data cover all aspects of the structural core of the NSI. Jensen et al. (2007) argue that any analysis that employs quantitative indicators will be biased toward measuring "science-technology-innovation" type learning. However, to our knowledge, our work uses the most comprehensive data available on countries' innovative activity and integrates data sources ignored by previous studies. Another limitation of our analysis is that our data refer only to the most recent period, which means that our result concerning the limited equifinality of European innovation systems may not be an accurate representation for the pre-2000 period. If this is the case, our paper highlights a newly emerging feature of the European economy rather than a persistent trait in the evolution of European innovation systems. In this perspective, our paper points to a major policy challenge at European level. The holistic approach to innovation policy we have mentioned above requires large-scale investments on a broad front for most of the countries in the European periphery. Therefore, in order to avoid a deepening of economic divergence within Europe, it is crucial that economic policies at European level ensure an adequate breathing space for these kinds of investments in country budgets.
Table A1 reports the sources of the surveys' data and the concordance between the surveys' timing and the two periods considered in the analysis. Table A2 reports information about the data used for the factor and cluster analysis. The final column reports the number of missing observations.
Lack of data has partially conditioned our choice of indicators, but considering only indicators with complete data for each country would have resulted in a much more limited dataset. Instead of resorting to listwise deletion or further reducing the number of countries considered, we followed Fagerberg and Srholec (2008) and used an imputation technique (Rubin, 1987). The overall number of missing values was slightly more than 6% of the dataset. Missing observations were estimated using the regression-based technique implemented by the mi impute command in Stata 13. We performed 20 imputations for each missing value and then used their average to balance the final dataset. In a handful of cases the averaged value was negative for indicators truncated at zero, so we replaced it with the minimum observed value for that indicator.
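The post-processing of the imputed values described above can be summarized in a short Python sketch. It does not reproduce the Stata imputation itself: the imputed draws are taken as given, the numbers are invented for illustration, and the function name `combine_imputations` is hypothetical.

```python
def combine_imputations(draws, observed_min, truncated_at_zero=True):
    """Average the imputed draws for one missing cell and apply the floor rule.

    draws: imputed values for the cell (20 per cell in the procedure described).
    observed_min: smallest observed value of the same indicator.
    """
    mean_value = sum(draws) / len(draws)
    # For indicators that cannot be negative, replace a negative average
    # with the minimum value actually observed for that indicator.
    if truncated_at_zero and mean_value < 0:
        return observed_min
    return mean_value

# Hypothetical draws for two missing cells of the same indicator.
ordinary_cell = combine_imputations([1.0, 2.0, 3.0], observed_min=0.5)
negative_cell = combine_imputations([-1.0, -2.0, 0.0], observed_min=0.5)
print(ordinary_cell, negative_cell)
```

Averaging the draws yields a single balanced dataset, at the cost of discarding the between-imputation variability that a full multiple-imputation analysis would carry through to the estimates.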

Appendix B. Descriptive statistics
Tables B1 and B2  Note: For details of indicators, sources and the scaling adopted, see Table A2.

Table B2
Correlations among variables.   Note: For details of indicators, sources and the scaling adopted, see Table A2.

Appendix C. Cluster analysis
For the clustering analysis presented in Section 4, we use as clustering variables the factor score of each country registered in period 1 and the difference between factor scores in periods 1 and 2. The procedure followed to obtain factor scores for period 2 and to ensure full comparability over time is explained in Section 4. The two-step approach allows us to first conduct a hierarchical procedure to detect the number of existing groups, followed by a non-hierarchical clustering method. The hierarchical agglomerative procedure facilitates the assessment of the number of subgroups in our sample. We use a single linkage clustering based on Euclidean distance to measure the distances between objects. Selected clusters are those minimizing the increase in total sum of squares across all variables in all clusters. The Calinski-Harabasz pseudo-F stopping-rule index helps to identify the correct number of groups in the sample. Then, we perform a non-hierarchical clustering procedure based on the k-means method. The non-hierarchical procedure assigns objects to clusters given a fixed number of groups. The advantage of the k-means algorithm is that it divides the data into the number of clusters detected in the first hierarchical analysis and then iteratively reassigns observations until the distance of observations within clusters is minimized and the distance between clusters is maximized. We perform the analysis both with specific cluster seeds and without assignment (random selection performed in Stata). However, the k-means method using randomly selected starting points seems to be quite weak compared to the selection of k starting points (De Jong and Marsili, 2006). Therefore, we decide to use the centroids of the initial hierarchical solution (k = 2) as starting points. Finally, we perform a MANOVA test to assess the validity of the clustering variables and cluster stability as a post-estimation check.
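The second step of the procedure, k-means started from given centroids, can be sketched in Python as follows, together with the Calinski-Harabasz pseudo-F index. This is an illustration under stated assumptions rather than the Stata routine used in the analysis: the eight data points and the two seeds are invented, the seeds merely stand in for the centroids of a hierarchical solution (which is not computed here), and the function names are hypothetical.

```python
import numpy as np

def kmeans_from_seeds(points, seeds, max_iter=100):
    """k-means with user-supplied starting centroids (e.g. from a hierarchical step)."""
    centroids = np.asarray(seeds, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance of every point to every centroid.
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return labels, centroids

def calinski_harabasz(points, labels):
    """Pseudo-F: between-cluster over within-cluster variation, each per degree of freedom."""
    n, groups = len(points), np.unique(labels)
    overall = points.mean(axis=0)
    between = within = 0.0
    for k in groups:
        members = points[labels == k]
        center = members.mean(axis=0)
        between += len(members) * ((center - overall) ** 2).sum()
        within += ((members - center) ** 2).sum()
    g = len(groups)
    return (between / (g - 1)) / (within / (n - g))

# Hypothetical clustering variables: (factor score in period 1, change in score).
points = np.array([[1.2, 0.0], [1.0, 0.1], [0.9, -0.05], [1.4, 0.05],
                   [-0.8, 0.3], [-1.0, -0.2], [-1.2, 0.4], [-0.9, -0.3]])
seeds = [[1.0, 0.0], [-1.0, 0.0]]  # assumed centroids of a two-group hierarchical solution

labels, centroids = kmeans_from_seeds(points, seeds)
pseudo_f = calinski_harabasz(points, labels)
print(labels, centroids, round(pseudo_f, 1))
```

Seeding k-means with fixed centroids makes the final partition reproducible, which is the practical reason for preferring it to random starting points in a sample of this size.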