
Dutch PhD-workshop on research design, open access and open data

- February 1, 2013 in Economic Publishing, EDaWaX, External Projects, Featured, Open Access, Open Data, Open Economics

This blog post is written by Esther Hoorn, Copyright Librarian, University of Groningen, the Netherlands.

If Roald Dahl were still alive, he would certainly be tempted to write a book about the Dutch social psychologist Diederik Stapel. For not only did he make up the research data to support his conclusions, but he also ate all the M&M’s he had bought with public money for interviews with fictitious pupils in fictitious high schools. In the Netherlands the research fraud by Stapel was a catalyst for bringing attention to the issues of research integrity and the availability of research data. A new generation of researchers needs to be aware of the research data sharing policy of the Dutch research funder NWO, the EU policy and the services of DANS, the Dutch Data Archiving and Networked Services. In the near future, a data management plan will be required in every research proposal.

Verifiability

For some time now the library at the University of Groningen has been organizing workshops for PhD students to raise awareness of the shift towards Open Access. Open Access and copyright are the main themes. The request to also address the verifiability of research data came from SOM, the Research Institute of the Faculty of Economics and Business. The workshop is given as part of the course Research Design of the PhD program. The blog post Research data management in economic journals proved very useful for getting an overview of the related issues in this field.

Open Access

As we often see, Open Access was a new issue to most of the students. Because the library buys licenses, the students don’t perceive a problem with access to research journals. Moreover, they are not aware of the large sums that universities currently pay to finance access exclusively for their own staff and students. Once they understand the issue, there is strong interest. Some see a parallel with innovative distribution models for music. The PhD students come from all over the world, and Open Access is increasingly being addressed in every country. One PhD student from Indonesia mentioned that the Indonesian government requires his dissertation to be available through the national Open Access repository. Chinese students were surprised by the availability of information on Open Access in China.

Assignment

The students prepared an assignment with some questions on Open Access and sharing research data. The first question is still about the impact factor of the journals in which they intend to publish. The questions brought the discussion to article-level metrics and alternative ways to organize the peer review of Open Access journals.

Will availability of research data stimulate open access?

Example of the Open Access journal Economics

The blog post Research data management in economic journals presents the results of the German project EDaWaX, European Data Watch Extended. An important result of the survey points to the role of association and university presses. In particular, it appears that many journals followed the data availability policy of the American Economic Association.

“We found out that mainly university or association presses have high to very high percentages of journals owning data availability policies while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 used initially the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.”

For students it is reassuring to see how associations take up their role in addressing this issue. An example of an Open Access journal that adopted the AER policy is Economics. And yes, this journal does have an impact factor in the Social Science Citation Index, and it also offers the possibility to archive datasets in the Dataverse Network.

Re-use of research data for peer review

One of the students suggested that the public availability of research data (instead of merely research findings) may lead to innovative forms of review. This may facilitate a further shift towards Open Access. With access to the underlying research data and the methodologies used, scientists may be in a better position to evaluate the quality of the research conducted by peers. The typical quality label given by top and very good journals may then become less relevant over time.
It was also discussed that journals may no longer publish a fixed number of papers in volumes released, say, four times a year, but rather publish qualifying papers as they become available throughout the year. Another point raised was that a substantial change in the existing publication mechanics will likely require either top journals or top business schools to lead the way, while associations of leading scientists in a certain field may also play an important role in such a conversion.

Sovereign Credit Risk: An Open Database

- January 31, 2013 in Data Release, External Projects, Featured, Open Data, Open Economics, Open Research, Public Finance and Government Data, Public Sector Credit

Throughout the Eurozone, credit rating agencies have been under attack for their lack of transparency and for their pro-cyclical sovereign rating actions. In the humble belief that the crowd can outperform the credit rating oracles, we are introducing an open database of historical sovereign risk data. It is available at http://sovdefdata.appspot.com/ where community members can both view and edit the data. Once the quality of this data is sufficient, the data set can be used to create unbiased, transparent models of sovereign credit risk.

The database contains central government revenue, expenditure, public debt and interest costs from the 19th century through 2011 – along with crisis indicators taken from Reinhart and Rogoff’s public database.

[Figure: Central government interest-to-revenue ratios, 2010]

Why This Database?

Prior to the appearance of This Time is Different, discussions of sovereign credit more often revolved around political and trade-related factors. Reinhart and Rogoff have more appropriately focused the discussion on debt sustainability. As with individual and corporate debt, government debt becomes more risky as a government’s debt burden increases. While intuitively obvious, this truth too often gets lost among the multitude of criteria listed by rating agencies and within the politically charged fiscal policy debate.

In addition to emphasizing the importance of debt sustainability, Reinhart and Rogoff showed the virtues of considering a longer history of sovereign debt crises. As they state in their preface:

“Above all, our emphasis is on looking at long spans of history to catch sight of ’rare’ events that are all too often forgotten, although they turn out to be far more common and similar than people seem to think. Indeed, analysts, policy makers, and even academic economists have an unfortunate tendency to view recent experience through the narrow window opened by standard data sets, typically based on a narrow range of experience in terms of countries and time periods. A large fraction of the academic and policy literature on debt and default draws conclusions on data collected since 1980, in no small part because such data are the most readily accessible. This approach would be fine except for the fact that financial crises have much longer cycles, and a data set that covers twenty-five years simply cannot give one an adequate perspective…”

Reinhart and Rogoff greatly advanced what had been an innumerate conversation about public debt, by compiling, analyzing and promulgating a database containing a long time series of sovereign data. Their metric for analyzing debt sustainability – the ratio of general government debt to GDP – has now become a central focus of analysis.

We see this as a mixed blessing. While the general government debt to GDP ratio properly relates sovereign debt to the ability of the underlying economy to support it, the metric has three important limitations.

First, the use of a general government indicator can be misleading. General government debt refers to the aggregate borrowing of the sovereign and the country’s state, provincial and local governments. If a highly indebted local government – like Jefferson County, Alabama, USA – can default without being bailed out by the central government, it is hard to see why that local issuer’s debt should be included in the numerator of a sovereign risk metric. A counter to this argument is that the United States is almost unique in that it doesn’t guarantee sub-sovereign debts. But, clearly neither the rating agencies nor the market believe that these guarantees are ironclad: otherwise all sub-sovereign debt would carry the sovereign rating and there would be no spread between sovereign and sub-sovereign bonds – other than perhaps a small differential to accommodate liquidity concerns and transaction costs.

Second, governments vary in their ability to harvest tax revenue from their economic base. For example, the Greek and US governments are less capable of realizing revenue from a given amount of economic activity than a Scandinavian sovereign. Widespread tax evasion (as in Greece) or political barriers to tax increases (as in the US) can limit a government’s ability to raise revenue. Thus, government revenue may be a better metric than GDP for gauging a sovereign’s ability to service its debt.

Finally, the stock of debt is not the best measure of its burden. Countries that face comparatively low interest rates can sustain higher levels of debt. For example, the United Kingdom avoided default despite a debt/GDP ratio of roughly 250% at the end of World War II. The amount of interest a sovereign must pay on its debt each year may thus be a better indicator of debt burden.

Our new database attempts to address these concerns by layering central government revenue, expenditure and interest data on top of the statistics Reinhart and Rogoff previously published.

A Public Resource Requiring Public Input

Unlike many financial data sets, this compilation is being offered free of charge and without a registration requirement. It is offered in the hope that it, too, will advance our understanding of sovereign credit risk.

The database contains a large number of data points and we have made efforts to quality control the information. That said, there are substantial gaps, inconsistencies and inaccuracies in the data we are publishing.

Our goal in releasing the database is to encourage a mass collaboration process directed at enhancing the information. Just as Wikipedia articles asymptotically approach perfection through participation by the crowd, we hope that this database can be cleansed by its user community. There are tens of thousands of economists, historians, fiscal researchers and concerned citizens around the world that are capable of improving this data, and we hope that they will find us.

To encourage participation, we have added Wiki-style capabilities to the user interface. Users who wish to make changes can log in with an OpenID and edit individual data points. They can also enter comments to explain their changes. User changes are stored in an audit trail, which moderators will periodically review – accepting only those that can be verified while rolling back others.

This design leverages the trigger functionality of MySQL to build a database audit trail that moderators can view and edit. We have thus married the collaborative strengths of a Wiki to the structure of a relational database. Maintaining a consistent structure is crucial for a dataset like this because it must ultimately be analyzed by a statistical tool such as R.

The unique approach to editing database fields Wiki-style was developed by my colleague, Vadim Ivlev. Vadim will contribute the underlying Python, JavaScript and MySQL code to a public GitHub repository in a few days.
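
As a rough illustration of the trigger-based audit-trail pattern described above (not the project’s actual code, which will live in that repository), here is a minimal sketch. It uses Python’s built-in sqlite3 module so it is self-contained; the production database is MySQL, and the table and column names below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_points (
    id      INTEGER PRIMARY KEY,
    country TEXT,
    year    INTEGER,
    revenue REAL
);

CREATE TABLE data_point_audit (
    audit_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    point_id    INTEGER,
    old_revenue REAL,
    new_revenue REAL,
    changed_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    status      TEXT DEFAULT 'pending'  -- moderators later mark 'accepted' or 'rolled_back'
);

-- Every user edit is recorded automatically; moderators review the audit table.
CREATE TRIGGER log_revenue_update
AFTER UPDATE OF revenue ON data_points
BEGIN
    INSERT INTO data_point_audit (point_id, old_revenue, new_revenue)
    VALUES (OLD.id, OLD.revenue, NEW.revenue);
END;
""")

# Simulate a community edit to one data point.
conn.execute("INSERT INTO data_points (country, year, revenue) VALUES ('Brazil', 1905, 123.4)")
conn.execute("UPDATE data_points SET revenue = 125.0 WHERE country = 'Brazil' AND year = 1905")

# The trigger has captured the change for moderator review.
print(conn.execute("SELECT * FROM data_point_audit").fetchall())
```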

Implications for Sovereign Ratings

Once the dataset reaches an acceptable quality level, it can be used to support logit or probit analysis of sovereign defaults. Our belief – based on case study evidence at the sovereign level and statistical modeling of US sub-sovereigns – is that the ratio of interest expense to revenue and annual revenue change are statistically significant predictors of default. We await confirmation or refutation of this thesis from the data set. If statistically significant indicators are found, it will be possible to build a predictive model of sovereign default that could be hosted by our partners at Wikirating. The result, we hope, will be a credible, transparent and collaborative alternative to the credit ratings status quo.
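
As a sketch of what such an analysis might look like once the data is clean enough (the file name and column names below are hypothetical placeholders, and statsmodels is just one possible tool):

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical export of the sovereign risk database.
df = pd.read_csv("sovereign_history.csv")

# Candidate predictors: interest expense / revenue and annual revenue change.
X = sm.add_constant(df[["interest_to_revenue", "revenue_change"]])
y = df["default"]  # 1 in default/crisis years, 0 otherwise

# Fit a logit model of default and inspect coefficient significance.
result = sm.Logit(y, X).fit()
print(result.summary())
```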

Sources and Acknowledgements

Aside from the data set provided by Reinhart and Rogoff, we also relied heavily upon the Center for Financial Stability’s Historical Financial Statistics. The goal of HFS is “to be a source of comprehensive, authoritative, easy-to-use macroeconomic data stretching back several centuries.” This ambitious effort includes data on exchange rates, prices, interest rates, national income accounts and population in addition to government finance statistics. Kurt Schuler, the project leader for HFS, generously offered numerous suggestions about data sources as well as connections to other researchers who gave us advice.

Other key international data sources used in compiling the database were:

  • International Monetary Fund’s Government Finance Statistics
  • Eurostat
  • UN Statistical Yearbook
  • League of Nations’ Statistical Yearbook
  • B. R. Mitchell’s International Historical Statistics, Various Editions, London: Palgrave Macmillan.
  • Almanach de Gotha
  • The Statesman’s Year Book
  • Corporation of Foreign Bondholders Annual Reports
  • Statistical Abstract for the Principal and Other Foreign Countries
  • For several countries, we were able to obtain nation-specific time series from finance ministry or national statistical service websites.

We would also like to thank Dr. John Gerring of Boston University and Co-Director of the CLIO World Tables project, for sharing data and providing further leads as well as Dr. Joshua Greene, author of Public Finance: An International Perspective, for alerting us to the IMF Library in Washington, DC.

A number of researchers and developers played valuable roles in compiling the data and placing it on line. We would especially like to thank Charles Tian, T. Wayne Pugh, Amir Muhammed, Anshul Gupta and Vadim Ivlev, as well as Karthick Palaniappan and his colleagues at H-Garb Informatix in Chennai, India for their contributions.

Finally, we would like to thank the National University of Singapore’s Risk Management Institute for the generous grant that made this work possible.

First Open Economics International Workshop Recap

- January 25, 2013 in Economic Publishing, Events, Featured, Open Access, Open Data, Open Economics, Open Research, Open Tools, Workshop

The first Open Economics International Workshop gathered 40 academic economists, data publishers, funders of economics research, and other researchers and practitioners for a two-day event at Emmanuel College in Cambridge, UK. The aim of the workshop was to build an understanding of the value of open data and open tools for the Economics profession and the obstacles to opening up information, as well as the role of greater openness in the academy. This event was organised by the Open Knowledge Foundation and the Centre for Intellectual Property and Information Law and was supported by the Alfred P. Sloan Foundation. Audio and slides are available on the event’s webpage.

Open Economics Workshop

Setting the Scene

The Setting the Scene session was about giving some context to “Open Economics” in the knowledge society, looking at examples from outside the discipline and discussing reproducible research. Rufus Pollock (Open Knowledge Foundation) emphasised that there is both a need for change and substantial potential for economics: 1) open “core” economic data outside the academy, 2) open as default for data in the academy, 3) real growth in citizen economics and outside participation. Daniel Goroff (Alfred P. Sloan Foundation) drew attention to the work of the Alfred P. Sloan Foundation in emphasising the importance of knowledge and its use for making decisions, and of data and knowledge as non-rival, non-excludable public goods. Tim Hubbard (Wellcome Trust Sanger Institute) spoke about the potential of large-scale data collection around individuals for improving healthcare and how centralised global repositories work in the field of bioinformatics. Victoria Stodden (Columbia University / RunMyCode) stressed the importance of reproducibility for economic research and as an essential part of scientific methodology, and presented the RunMyCode project.

Open Data in Economics

The Open Data in Economics session was chaired by Christian Zimmermann (Federal Reserve Bank of St. Louis / RePEc) and covered several projects and ideas from various institutions. The session examined examples of open data in Economics and sought to discover whether these examples are sustainable and can be implemented in other contexts: whether the right incentives exist. Paul David (Stanford University / SIEPR) characterised the open science system as a system that is better than any other at the rapid accumulation of reliable knowledge, whereas proprietary systems are very good at extracting rent from existing knowledge. A balance between these two systems should be established so that they can work within the same organisational system, since separately they are distinctly suboptimal. Johannes Kiess (World Bank) underlined that having the data available is often not enough: “It is really important to teach people how to understand these datasets: data journalists, NGOs, citizens, coders, etc.”. The World Bank has implemented projects to incentivise the use of the data and is helping countries to open up their data. For economists, he mentioned, having a valuable dataset to publish on is an important asset; there are therefore not sufficient incentives for sharing.

Eustáquio J. Reis (Institute of Applied Economic Research – Ipea) related his experience of establishing the Ipea statistical database and other projects for historical data series and data digitisation in Brazil. He shared that the culture of the economics community is not a culture of collaboration where people willingly share or support and encourage data curation. Sven Vlaeminck (ZBW – Leibniz Information Centre for Economics) spoke about the EDaWaX project, which conducted a study of the data availability policies of economics journals and will establish a publication-related data archive for an economics journal in Germany.

Legal, Cultural and other Barriers to Information Sharing in Economics

The session presented different impediments to the disclosure of data in economics from the perspective of two lawyers and two economists. Lionel Bently (University of Cambridge / CIPIL) drew attention to the fact that there is a whole range of different legal mechanisms which operate to restrict the dissemination of information, yet on the other hand there is also a range of mechanisms which help to make information available. Lionel questioned whether the open data standard would always be the optimal way to produce high quality economic research, or whether there is also a place for modulated/intermediate positions where data is available only under certain conditions, or only in part or for certain forms of use. Mireille van Eechoud (Institute for Information Law) described the EU Public Sector Information Directive – the most generic document related to open government data – and the progress made in opening up information published by the government. Mireille also pointed out that legal norms have only limited value if you don’t have the internalised, cultural attitudes and structures in place that really make more access to information work.

David Newbery (University of Cambridge) presented an example from the electricity markets and insisted that a good supply of data requires informed demand, coming from regulators who are charged to monitor markets, detect abuse, uphold fair competition and defend consumers. John Rust (Georgetown University) said that the government is an important provider of data which is otherwise too costly to collect, yet a number of issues exist, including confidentiality, excessive bureaucratic caution and the public finance crisis. There are also many opportunities for research in the private sector, where some of the data can be made available (redacting confidential information), and the public non-profit sector can also play a tremendous role as a force to organise markets for the better, set standards and focus on targeted domains.

Current Data Deposits and Releases – Mandating Open Data?

The session was chaired by Daniel Goroff (Alfred P. Sloan Foundation) and brought together funders and publishers to discuss their role in requiring data from economic research to be publicly available and the importance of dissemination for publishing.

Albert Bravo-Biosca (NESTA) emphasised that mandating open data begins much earlier in the process, where funders can encourage the collection of particular data by the government, which is the basis for research, and can also act as an intermediary for the release of open data by the private sector. Open data is interesting, but it is even more interesting when it is appropriately linked and combined with other data, and there is value in examples and case studies for demonstrating benefits. Caution is needed, however, as opening up some data might result in less data being collected.

Toby Green (OECD Publishing) made a point of the difference between posting and publishing, where making content available does not always mean that it is accessible, discoverable, usable and understandable. In his view, the challenge is to build up an audience by putting content where people will find it, which is very costly, as proper dissemination is expensive. Nancy Lutz (National Science Foundation) explained the scope and workings of the NSF and the data management plans required from all economists who are applying for funding. Creating and maintaining data infrastructure and compliance with the data management policy might eventually mean that there would be less funding for other economic research.

Trends of Greater Participation and Growing Horizons in Economics

Chris Taggart (OpenCorporates) chaired the session, which introduced different ways of participating in and using data, and different audiences and contributors. He stressed that data is being collected in new ways and by different communities, that access to data can be an enormous privilege and can generate data gravities with very unequal access and power to use and generate more data, and that analysis is sometimes being done in new and unexpected ways and by unexpected contributors. Michael McDonald (George Mason University) related how the highly politicised process of drawing up district lines in the U.S. (also called gerrymandering) could be done in a much more transparent way through an open-source redistricting process with meaningful participation, allowing for an open conversation about public policy. Michael also underlined the importance of common data formats and told a cautionary tale about a group of academics with a political agenda misusing open data to encourage a storyline that a candidate would win a particular state.

Hans-Peter Brunner (Asian Development Bank) shared a vision of how open data and open analysis can aid decision-making about investments in infrastructure, connectivity and policy. Simulated models of investments can demonstrate different scenarios according to investment priorities and crowd-sourced ideas. Hans-Peter asked for feedback and input on how to make data and code available. Perry Walker (new economics foundation) spoke about conversation, and that a good conversation has to be designed, as it usually doesn’t happen by accident. Rufus Pollock (Open Knowledge Foundation) concluded with examples of citizen economics and the growth of contributions from the wider public, particularly through volunteer computing and volunteer thinking as a way of getting engaged in research.

During two sessions, the workshop participants also worked on a Statement on the Open Economics Principles, which will be revised with further input from the community and made public at the second Open Economics workshop, taking place on 11-12 June in Cambridge, MA.

Open Research Data Handbook Sprint

- January 17, 2013 in Events, Featured, Open Data, Open Economics, Open Research, Sprint

On February 15-16, the Open Research Data Handbook Sprint will happen at the Open Data Institute, 65 Clifton Street, London EC2A 4JE.

The Open Research Data Handbook aims to provide an introduction to the processes, tools and other areas that researchers need to consider to make their research data openly available.

Join us for a book sprint to develop the current draft, and explore ways to remix it for different disciplines and contexts.

Who it is for:

  • Researchers interested in carrying out their work in more open ways
  • Experts on sharing research and research data
  • Writers and copy editors
  • Web developers and designers to help present the handbook online
  • Anyone else interested in taking part in an intense and collaborative weekend of action

Register at Eventbrite

What will happen:

The main sprint will take place on Friday and Saturday. After initial discussions we’ll divide into open space groups to focus on research, writing and editing for different chapters of the handbook, developing a range of content including How To guidance, stories of impact, collections of links and decision tools.

A group will also look at digital tools for presenting the handbook online, including ways to easily tag content for different audiences and remix the guide for different contexts.

Agenda:

Week before & after:

  • Calling for online contributions and reviews

Friday:

  • 12:00 – 14:00: Seminar or bring your own lunch on open research data
  • 14:00 – 17:30: Planning and initial work on the handbook in small teams

Saturday:

  • 10:00 – 10:30: Arrive and coffee
  • 10:30 – 11:30: Introducing open research – lightning talks
  • 11:30 – 13:30: Forming teams and starting sprint. Groups on:
    • Writing chapters
    • Decision tools
    • Building website & framework for book
    • Remixing guide for particular contexts
  • 13:30 – 14:30: Lunch
  • 14:30 – 16:30: Working in teams
  • 17:30 – 18:30: Report back
  • 18:30 – …… : Pub

Partners:

OKF Open Science Working Group – creators of the current Open Research Data Handbook
OKF Open Economics Working Group – exploring economic aspects of open research
Open Data Research Network – exploring a remix of the handbook to support open social science research in a new global research network, focused on research in the Global South
Open Data Institute – hosting the event

The Statistical Memory of Brazil

- January 14, 2013 in Crowd-sourcing, Data Digitalization, External Projects, Featured, Open Data, Open Economics, Public Finance and Government Data, Statistical Memory of Brazil

This blog post is written by Eustáquio Reis, Senior Research Economist at the Institute of Applied Economic Research (Ipea) in Brazil and member of the Advisory Panel of the Open Economics Working Group.


The project Statistical Memory of Brazil aims to digitize and make freely available and downloadable the rare book collections of the Library of the Minister of Finance in Rio de Janeiro (BMF/RJ). The project focuses on publications containing social, demographic, economic and financial statistics for nineteenth- and early twentieth-century Brazil. At present, approximately 1,500 volumes, 400,000 pages and 200,000 tables have been republished.

Apart from democratizing the contents for both the scientific community and the general public, the project also aims at the physical preservation of the collection. The rarity, age and precarious state of conservation of the books strongly recommend restricting physical access to them, limiting their handling to specific bibliographical purposes.

For the Brazilian citizen, free access to the contents of rare historical collections and statistics provides a form of virtual appropriation of the national memory, and as such a source of knowledge, gratification and cultural identity.

The Library of the Minister of Finance in Rio de Janeiro (BMF/RJ)

Inaugurated in 1944, the BMF/RJ extends over 1,200 square meters in the Palacio da Fazenda in downtown Rio de Janeiro, the seat of the Minister of Finance up to 1972, when it was moved to Brasilia. The historical book collection dates back to the early 19th century, when the Portuguese Colonial Administration was transferred to Brazil. Thereafter, several libraries from other institutions — Brazilian Customs, the Brazilian Institute of Coffee, the Sugar and Alcohol Institute, among others — were incorporated into the collection, which today comprises over 150,000 volumes, mainly specialized in economics, law, public administration and finance.

Rare book collections

For the purposes of the project, the collection of rare books includes a few thousand statistical reports and yearbooks. To mention just a few, the annual budgets of the Brazilian Empire, 1821-1889; annual budgets of the Brazilian Republic since 1890; Ministerial and Provincial reports since the 1830s; foreign and domestic trade yearbooks since 1839; railways statistics since the 1860s; stock market reports since the 1890s; economic retrospects and financial newsletters since the 1870s; the Brazilian Demographic and Economic Censuses starting in 1872 as well as the Brazilian Statistical Yearbooks starting in 1908. En passant, it should be noted that despite their rarity, fragility, and scientific value, these collections are hardly considered for republication in printed format.

Partnerships and collaborations

Under the initiative of the Research Network on Spatial Analysis and Models (Nemesis), sponsored by the Foundation for the Support of Research of the State of Rio de Janeiro and the National Council for Scientific and Technological Development, the project is a partnership between the Regional Administration of the Minister of Finance in Rio de Janeiro (MF/GRA-RJ), the Institute of Applied Economic Research (IPEA) and the Internet Archive (IA).

In addition to generous access to its library book collection, the Minister of Finance provides the expert advice of its librarians as well as the office space and facilities required for the operation of the project. The Institute of Applied Economic Research provides advisory services in economics, history and informatics. The Internet Archive provides the Scribe® workstations and digitization technology, making the digital publications available in several different formats on the website.

The project also makes specific collaborations with other institutions to supplement the collections of the Library of the Minister of Finance. Thus, the Brazilian Statistical Office (IBGE) supplemented the collections of the Brazilian Demographic and Economic Censuses, as well as of the Brazilian Statistical Yearbooks; the National Library (BN) made possible the republication of the Budgets of the Brazilian Empire; the Provincial and Ministerial Reports; the Rio News; and the Willeman Brazilian Review, the latter in collaboration with the Department of Economics of the Catholic University of Rio de Janeiro.

Future developments and extensions

Based upon open source software designed to publish, manage, link and preserve digital contents (Drupal, Fedora and Islandora), a new webpage of the project is under construction including two collaborative / crowdsourcing platforms.

The first crowdsourcing platform will create facilities for the indexing, documentation and uploading of images and tabulations of historical documents and databases compiled by other research institutions or individuals willing to make voluntary contributions to the project. The dissemination of the digital content intends to stimulate research innovations, extensions, and synergies based upon the historical documents and databases. For such purpose, an open source solution to be considered is the Harvard University Dataverse Project.

The second crowdsourcing platform intends to foster online decentralized collaboration of volunteers to compile or transcribe into editable formats (csv, txt, xls, etc.) the content of selected digital republications of the Statistical Memory of Brazil project. Whenever possible, optical character recognition (OCR) programs and routines will be used to facilitate the transcription of the image content of the books, as sketched below. The irregular typography of older publications, however, will probably require visual character recognition and manual transcription of contents. Finally, additional routines and programs will be developed to coordinate, monitor and revise the compilations made, so as to avoid mistakes and duplications.
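
As an illustration of the kind of OCR routine envisaged (the project has not specified its tooling; pytesseract and the file name below are assumptions made for the example):

```python
from PIL import Image
import pytesseract  # requires the Tesseract OCR engine to be installed locally

# Hypothetical scanned page from one of the digitized yearbooks.
page = Image.open("anuario_estatistico_1908_p042.png")

# Extract the text using the Portuguese language model; the output would then
# be reviewed and corrected manually by volunteers.
text = pytesseract.image_to_string(page, lang="por")
print(text)
```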

Project Team

Eustáquio Reis, IPEA, Coordinator
Kátia Oliveira, BMF/RJ, Librarian
Vera Guilhon, BMF/RJ, Librarian
Jorge Morandi, IPEA, IT Coordinator
Gemma Waterston, IA, Project Manager
Ana Kreter, Nemesis, Researcher
Gabriela Carvalho, FGV, Researcher
Lucas Mation, IPEA, Researcher

Interns:
Fábio Baptista
Anna Vasconcellos
Ana Luiza Freitas
Amanda Légora

Timeline of Failed European Banks

- January 7, 2013 in Crowd-sourcing, Failed Banks, Featured, Public Finance and Government Data


A few months back Open Economics launched a project to list the European banks which have failed recently. After a successful online data sprint and follow up research, we have now collected data on 122 bank failures and bailouts since 1997.

To visualize the data collected on bank failures I created this timeline.

The data collection was initiated because neither the EU Commission, Eurostat nor the EBA was able to provide any specific data. We decided to include a broad range of bank crisis measures beyond bankruptcy filings, such as bank nationalisations and government bailouts. We also added some bank mergers, and finally we added several cases where banks entered temporary closure (i.e. “extraordinary administration” under Italian law). For each failed bank we have attempted to gather basic details such as the date of collapse, a news source and a news clip explaining the circumstances of the collapse.

We need your help to improve the failed bank tracker. Here’s how you can help:

  • Bank failures are still missing from the list, so if you know of any, please go ahead and add the information directly in the sheet. If you have corrections to any of the banks appearing, please add them with an attached source and information. If news clips are not available in English, add the information in the original language.
  • Descriptions and sources for several of the banks on the list are still missing – in particular for Italian and Portuguese banks.
  • Additional info. We hope to add more data to each bank failure, in particular a) the total assets prior to collapse and b) the auditor who signed off on the latest annual report. Let us know if you wish to help dig up any of this information.
  • We are eager to hear your view on the approach or any of the listed bank failures. Join the discussion on our mailing-list.


Research Data Management in Economic Journals

- December 7, 2012 in Economic Publishing, EDaWaX, Featured, Open Data, Open Economics, Open Research

This blog post has been written by Sven Vlaeminck | ZBW – German National Library of Economics / Leibniz Information Center for Economics

Background

In Economics, as in many other research disciplines, there is a continuous increase in the number of papers in which authors have collected their own research data or used external datasets. However, so far there have been few effective means of replicating the results of economic research within the framework of the corresponding article, of verifying them, and of making them available for repurposing or use in support of the scholarly debate.

In the light of these findings B.D. McCullough pointed out: “Results published in economic journals are accepted at face value and rarely subjected to the independent verification that is the cornerstone of the scientific method. Most results published in economics journals cannot be subjected to verification, even in principle, because authors typically are not required to make their data and code available for verification.” (McCullough/McGeary/Harrison: “Lessons from the JMCB Archive”, 2006)

Harvard Professor Gary King also asked: “[I]f the empirical basis for an article or book cannot be reproduced, of what use to the discipline are its conclusions? What purpose does an article like this serve?” (King: “Replication, Replication” 1995). Therefore, the management of research data should be considered an important aspect of the economic profession.

The project EDaWaX

Several questions came up when we considered the reasons why economics papers may not be replicable in many cases:

First: what kind of data is needed for replication attempts? Second: it is apparent that scholarly economic journals play an important role in this context: when publishing an empirical paper, do economists have to provide their data to the journal? How many scholarly journals commit their authors to do so? Do these journals require their authors to submit only the datasets, or also the code of computation? Do they oblige their authors to provide the programs used for estimations or simulations? And what about descriptions of datasets, variables, values or even a manual on how to replicate the results?

As part of generating the functional requirements for this publication-related data archive, the project analyzed the data (availability) policies of economic journals and developed some recommendations for these policies that could facilitate replication.

Data Policies of Economic Journals

The Sample

First of all, we wanted to know how many journals in Economics require their authors to provide their empirical analysis data. Of course it was not possible to analyze all of the estimated 8,000 to 10,000 journals in Economics.

We used a sample built by Bräuninger, Haucap and Muck (paper available in German only) examining the relevance and reputation of economic journals in the eyes of German economists. This sample was very useful for our approach because it allowed us to compare the international top journals with journals published in the German-speaking area. Using the sample’s rankings for relevance and reputation, we could also establish that the journals with data policies tended to be the higher-ranked ones.

In addition to the sample of Bräuninger, Haucap and Muck, we added four journals equipped with a data availability policy to have more journals in our sample for a detailed evaluation of data policies. We excluded some journals because they focus only on economic policy or theory and do not publish empirical articles.

The sample we used is not representative of economic journals, because it mainly consists of highly ranked journals. Furthermore, by adding some journals explicitly possessing a data policy, the percentage of journals equipped with such guidelines is also much higher than we would expect for economic journals in general.

Journals owning a data availability policy

In our sample we have 29 journals equipped with a data availability policy (20.6%) and 11 journals (7.8%) with a so-called “replication policy” (we only examined the websites of the journals, not the printed versions). As mentioned above, this percentage is not representative of economic journals in general. On the contrary, we assume that our sample includes the majority of economic journals with data (availability) policies.

The number of journals with a data availability policy is considerably higher than in earlier studies in which other researchers (e.g. McCullough) examined the data archives of economic journals. An additional online survey of editors of economic journals showed that most of our respondents implemented their data policies between 2004 and 2011. We therefore suppose that the number of economic journals with data policies is gradually increasing. The editors of economic scholarly journals seem to realize that the topic of data availability is becoming more important.

The largest portion of journals equipped with a data availability policy were published by Wiley-Blackwell (6) and Elsevier (4). We found that it is mainly university or association presses that have high to very high percentages of journals with data availability policies, while the major scientific publishers stayed below 20%.

Out of the 29 journals with data availability policies, 10 initially used the data availability policy implemented by the American Economic Review (AER). These journals either used exactly the same policy or a slightly modified version.

The journals with a “replication policy” were excluded from further analysis. The reason is that “replication policies” oblige authors to provide “sufficient data and other materials” on request only, so there are no files authors have to submit to the journal. This approach sounds good in theory – but it does not work in practice, because authors often simply refuse to honor the requirements of these policies. (See “Replication in Empirical Economics: The Journal of Money, Credit and Banking Project” by Dewald, Thursby and Anderson.)

Some criteria for data policies to enable replications

For a further evaluation of these data availability policies, we used some criteria for rating their quality: we extended some of the criteria previously developed by B.D. McCullough by adding standards which are important from an infrastructural point of view. The criteria we used for evaluation are as follows:

Data Policies that aim to ensure the replicability of economic research results have to:

  • be mandatory,
  • oblige authors to provide datasets, the code of computation, programs and descriptions of the data and variables (ideally in the form of a data dictionary),
  • ensure that the data is provided prior to the publication of an article,
  • have defined rules for research based on proprietary or confidential data,
  • make the data available, so that other researchers can access it without problems.

In addition, journals should:

  • have a special section for the results of replication attempts or should at least publish results of replications in addition to the dataset(s),
  • require their authors to provide the data in open formats or in ASCII-format,
  • require their authors to specify the name and version of both the software and the operation system used for analysis.

Results of our survey

The above-mentioned requirements have been used to analyze the data policies of 141 economic journals. These are some of the results we obtained:

Mandatory Data Availability Policies

We found that more than 82% of the data policies are mandatory. This is quite a good percentage, because it is crucial for obtaining data that policies mandate authors to provide it. If they do not, there is little hope that authors will provide a noteworthy amount of datasets and code – simply because preparing datasets and code is time-consuming and authors receive no rewards for doing this work. Besides, authors often do not want to publish a dataset that is not yet fully exploited. In the academic struggle for reputation, the last thing a researcher wants is to hand a substantial dataset to a competitor.

What data authors have to provide

We found that 26 of the 29 policies (89.7%) oblige authors to submit the datasets used for the computation of their results. The remaining journals do not, because their focus is often more oriented towards experimental economic research.

Regarding the question of what kinds of data authors have to submit, we found that 65.5% of the journals’ data policies require their authors to provide descriptions of the data submitted and some instructions on how to use the individual files. The quality of these descriptions ranges from very detailed instructions to just a few sentences that might not really help would-be replicators.

For the purpose of replication these descriptions of submitted data are very important, due to the structure of the data authors are providing: in most cases, the data is available as a zip file only. In these zip containers there is a wide variety of formats and files. Without proper documentation, it is extremely time-consuming to find out which part of the data corresponds to which results in the paper, if this is possible at all. Therefore it is not sufficient that only 65.5% of the data policies in our sample mandate their authors to provide descriptions. This kind of documentation is currently the most important metadata for describing the research data.

The submission of (self-written) programs used, for example, for simulation purposes is mandatory in 62% of the policies. This relatively low percentage can also be considered problematic: if another researcher wants to replicate the results of a simulation, he or she won’t be able to do so if the programs used for these simulations are not available.

Of course, whether this kind of research is published depends on the journal’s focus. But if such papers are published, a journal should ensure that the programs used and their source code are submitted. Only if the source code is available is it possible to check for inaccurate programming.

Approximately half of the policies mandate their authors to provide the code of their calculations. Given the importance of code for replication purposes, this percentage may be considered low. The code of computation is crucial for the possibility to replicate the findings of an empirical article. Without the code, would-be replicators have to program everything from scratch, and whether they will be able to produce an identical code of computation is uncertain. Therefore it is crucial that data availability policies strictly enforce the availability of the code of computation.

The point in time for providing datasets and other materials

Regarding the question of the point in time at which authors have to submit their data to the journal, we found that almost 90% of the data availability policies oblige authors to provide their data prior to the publication of the article. This is a good percentage. It is important to obtain the data prior to publication, because publication is – in the absence of other rewards – the only incentive to submit data and code. Once an article is published, this incentive is gone.

Exemptions from the data policy and the case of proprietary data

In economic research it is quite common to use proprietary datasets. Companies such as Thomson Reuters (Datastream) offer the possibility to acquire datasets, and many researchers choose such options. Research based on company data or microdata is also always proprietary or even confidential.

Normally, if researchers want to publish an article based on such data, they have to request an exemption from the data policy. More than 72% of the journals we analyzed offered this possibility. One journal (the Journal of the European Economic Association) discourages authors from publishing articles that rely on completely proprietary data.

But even if proprietary data was used for research, it is important that these research outputs are replicable in principle. Therefore journals should have a procedure in place that ensures the replicability of the results even in these cases. Consequently some journals request their authors to provide the code of computation, the version(s) of the dataset(s) and some additional information on how to obtain the dataset(s).

Of the 28 journals allowing exemptions from the data policy, we found that more than 60% possess rules for these cases. This percentage is not really satisfactory; there is still room for improvement.

Open formats

Open formats are important for two reasons. The first is that the long-term preservation of such data is much easier, because the technical specifications of open formats are known. A second reason is that open formats make it possible to use data and code on different platforms and in different software environments. It is useful to be able to use the data interoperably, and not only in one statistical package or on one platform.

Regarding these topics, only two journals made recommendations for open formats.

Version of software and OS

According to McCullough and Vinod (2003), the results achieved in economic research are often influenced by the statistical package that was used for the calculations. The operating system also has a bearing on the results. Therefore both the version of the software and the OS used for calculations should be specified.

Most of the data policies in our sample do not mandate their authors to provide these specifications. But there are differences: for example, almost every journal that has adopted the data availability policy of the American Economic Review (AER) requires its authors to “document[…] the purpose and format of each file provided” for each file they submit to the journal.

In sharp contrast, up to now not a single policy requires the specification of the operating system used for calculations.
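
A simple way for authors to meet both requirements would be to ship a short script with their replication files that records the computing environment. The sketch below is only an illustration; the packages listed are placeholders for whatever the analysis actually uses.

```python
import platform
import sys

import numpy
import pandas  # list whichever packages the analysis actually relies on

# Write the operating system, interpreter and package versions to a text file
# that accompanies the submitted datasets and code.
with open("environment.txt", "w") as f:
    f.write(f"Operating system: {platform.platform()}\n")
    f.write(f"Python: {sys.version}\n")
    f.write(f"numpy: {numpy.__version__}\n")
    f.write(f"pandas: {pandas.__version__}\n")
```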

Replication rubric

In the course of our study we also examined whether journals have a special section for the results of replication attempts. We found that only a very limited number of journals have such a section. In an additional online survey conducted by the EDaWaX project, 7 journals stated that they publish replication results or attempts. However, the quantity of these replication attempts was low: none of the respondents published more than three replication studies per annum, and most published fewer than one per year.

The main rationale for a replication section is quality control of the data submitted. If a journal does not publish the results of replications, authors may submit poor quality data.

Conclusion

In summary, it can be stated that the management of publication-related research data in economics is still in its early stages. We were able to find 29 journals with data availability policies. That is many more than other researchers found some years ago, but compared to the multitude of economic journals in total, the percentage of journals equipped with a data availability policy is still quite low. The 20.6% we found in our analysis might nevertheless represent the main share of all journals equipped with a data policy.

Nevertheless, editors and journals in economics seem to be in motion – the topic of data availability seems to be becoming more and more important in economics. This is a positive signal, and it will be interesting to monitor whether and how this upward trend continues.

A large portion of the analyzed data availability policies are mandatory, which is good practice. Moreover, the finding that 90% of the journals oblige their authors to submit the data prior to the publication of an article shows that many of them have appreciated the importance of providing data at an early stage in the publication process.

When analysing the data authors have to provide, we noticed that almost all guidelines mandate the submission of the (final) dataset(s), which is also quite positive.

But beyond that there is much room for improvement: only two thirds of all policies require the submission of descriptions and of (self-written) software. As mentioned above, research data is often not usable when descriptions or software components are missing. In particular, the lack of requirements to submit the code of computation is a big problem for potential replication attempts. Only a small majority of all policies obliges authors to provide it. It can therefore be expected that almost half of the data availability policies in our sample do not fully enable replications.

Another important aspect is the possibility to replicate the results of economic research based on proprietary or confidential data. While more than 72% of all policies allow exemptions from their regulations, only 60.7% have a procedure in place regulating which data and descriptions authors still have to provide in these cases. On balance, much research based on proprietary or confidential data is not replicable even in principle.

Open formats are used by only a small minority of journals. This might result in difficulties for the interoperable use of research data and for the long-term preservation of these important sources of science and research.
The reuse of research data is also complicated by the lack of information on which version of a software package was used for the calculations. Only a little more than a third of all policies require authors to specify the software version / the formats of the submitted research data. Furthermore, up to now, not a single journal requires the specification of the operating system used.

But there are also good practices: among the journals with data availability policies, we noticed that the data availability policy implemented by the American Economic Review (AER) is a very good example of a data availability policy for economic journals. Journals equipped with this policy form the biggest single group in our sample. We therefore see a developing trend towards a de facto standard for data policies.

In a second part to this survey (to be published in spring 2013) we will discuss the infrastructure used by economic scholarly journals for providing datasets and other materials.

This post has been added to the resources of the Open Economics Working Group.

Launching the Open Sustainability Working Group

- November 30, 2012 in Announcements, Call for participation, Environment, Energy and Sustainability, Featured, Open Data, Open Research

This blog post is written by Jorge Zapico, researcher at the Centre for Sustainable Communications at KTH Royal Institute of Technology, and Velichka Dimitrova, Project Coordinator for Economics and Energy at the Open Knowledge Foundation, and is cross-posted from the main blog.

Sign up to Open Sustainability

Sustainability is one of the most important challenges of our time. We are facing global environmental crises, such as climate change, resource depletion, deforestation, overfishing, eutrophication, loss of biodiversity, soil degradation, environmental pollution, etc. We need to move towards a more sustainable and resilient society, that ensures the well-being of current and future generations, that allows us to progress while stewarding the finite resources and the ecosystems we depend on.

Data is needed to monitor the condition of the environment and to measure how we are performing and progressing (or not) towards sustainability. Transparency and feedback are key for good decision-making, for allowing accountability and for tracking and tuning performance. This is true at an institutional level, such as working with national climate change goals; at a company level, such as deciding the materials for building a product; and at a personal level, such as deciding between chicken and salmon at the supermarket. However, most environmental information is closed, outdated, static, and/or locked in text documents that cannot be processed.

For instance, unlike gross domestic product (GDP) and other publicly available data, carbon dioxide emissions data is not published frequently and in disaggregated form. While the current international climate negotiations at Doha discuss joint global efforts for the reduction of greenhouse gas emissions, climate data is not freely and widely available.

“Demand CO2 data!” urged Hans Rosling at the Open Knowledge Festival in Helsinki last September, encouraging a data-driven discussion of energy and resources. “We can have climate change beyond our expectations, which we haven’t done anything in time for,” said Rosling, outlining the biggest challenges of our time. Yet activists don’t even demand the data. Many countries, such as Sweden, show up for climate negotiations without having done their CO2 emissions reporting for many months. Our countries should report on climate data in order for us to see the big picture.

Sustainability data should be open and freely available so that anyone is free to use, reuse and redistribute it. This data should be easy to access, usable by the public and also available in standard machine-readable formats to enable reuse and remixing. And by sustainability data we do not mean only CO2 information, but all data that is necessary for measuring the state of, and changes in, the environment, and data which supports progress towards sustainability. This includes a diversity of things: scientific climate data and temperature records, environmental impact assessments of products and services, emissions and pollution information from companies and governments, energy production data and ecosystem health indicators.

To move towards this goal, we are founding a new Working Group on Open Sustainability, which seeks to:

  • advocate and promote the opening up of sustainability information and datasets
  • collect sustainability information and maintain a knowledge base of datasets
  • act as a support environment / hub for the development of community-driven projects
  • provide a neutral platform for working towards standards and harmonization of open sustainability data between different groups and projects.

The Open Sustainability Working Group is open for anyone to join. We hope to form an interdisciplinary network from a range of backgrounds such as academics, business people, civil servants, technologists, campaigners, consultants and those from NGOs and international institutions. Relevant areas of expertise include sustainability, industrial ecology, climate and environmental science, cleanweb development, ecological economics, social science, energy, open data and transparency. Join the Open Sustainability Working Group by signing up to the mailing list to share your ideas and to contribute.

Creating a more sustainable society and mitigating climate change are some of the very hardest challenges we face. They will require us to collaborate, to create new knowledge together and to find new ways of doing things. We need open data about the state of the planet, we need transparency about emissions and the impact of products and industries, we need feedback and we need accountability. We want to leverage all the ideas, technologies and energy we can to prevent catastrophic environmental change.

This initiative was started by the OKFestival Open Knowledge and Sustainability and Green Hackathon team including Jorge Zapico, Hannes Ebner (The Centre for Sustainable Communications at KTH), James Smith (Cleanweb UK), Chris Adams (AMEE), Jack Townsend (Southampton University) and Velichka Dimitrova (Open Knowledge Foundation).

Complexity and complementarity – why more raw material alone won’t necessarily bring open data driven growth

- November 6, 2012 in Featured, Open Data, Open Research

A guest post by Tim Davies from the Web Science DTC at the University of Southampton. Cross posted from the Open Data Impacts blog.

“Data is the raw material of the 21st Century”.

It’s a claim that has been made in various forms by former US CIO Vivek Kundra (PDF), by large consultancies and tech commentators, and that is regularly repeated in speeches by UK Cabinet Office Minister Francis Maude, mostly in relation to the drive to open up government data. This raw material, it is hoped, will bring about new forms of economic activity and growth. There is certainly evidence to suggest that for some forms of government data, particularly ‘infrastructural’ data, moving to free and open access can stimulate economic activity. But, for many open data advocates, the evidence is not showing the sorts of returns on investment, or even the ‘gold rush’ of developers picking over data catalogues to exploit newly available data that they had expected.

At a hack-event held at the soon-to-be-launched Open Data Institute in London a few weeks ago, a number of speakers highlighted the challenge of getting open data used: the portals are built, but the users do not necessarily come. Data quality, poor metadata, inaccessible language, and the difficulty of finding wheat amongst the chaff of data were all diagnosed as part of the problem, with some interesting interfaces and tools developed to try and improve data description and discovery. Yet these diagnoses and solutions are still based on linear thinking: once a dataset is truly accessible, it will be used, and economic benefits will flow.

Owen Barder identifies the same sort of linear thinking in much of the macro-economic international development policy of the 70s and 80s in his recent Development Drums podcast lecture on complexity and development. The lecture explores the question of how countries with similar levels of ‘raw materials’, in terms of human and physical capital, could have had such different growth rates over the last half century. The answer, it suggests, lies in the complexity of economic development – we need not just raw materials, but diverse sets of skills and supply chains, frameworks, cultures and practices. Making the raw materials available is rarely enough for economic growth. And this is something that open data advocates focussed on economic returns from data need to grapple with.

Thinking about open data use as part of a complex system involves paying attention to many different dimensions of the environment around data. Jose Alonso highlights “the political, legal, organisation, social, technical and economic” as all being important areas to focus on. One way of grounding notions of complexity in thinking about open data use, which I was introduced to while working on a paper with George Kuk last year, is through the concept of ‘complementarity’. Essentially, A complements B if A and B together are more than the sum of their parts. For example, a mobile phone application and an app store are complements: the software in one needs the business model and delivery mechanisms of the other in order to be used.

The challenge then is to identify all the things that may complement open data for a particular use; or, more importantly, to identify all those processes already out there in the economy to which certain open datasets are a complement. Whilst the example of complements above appears at first glance technological (apps and app stores), behind it are economic, social and legal complementarities, amongst others. Investors, payment processing services, app store business models, remittance to developers and, often, stable jobs for developers in an existing buoyant IT industry (jobs that allow them either to work on apps for fun in their spare time, or to leave work with enough capital to take a risk on building their own applications) are all part of the economic background. Developer meet-ups, online fora, clear licensing of data, no fear of state censorship of the applications built, and so on contribute to the social and legal background. These parts of the complex landscape generally cannot be centrally planned or controlled, but equally they cannot be ignored when we ask why the provision of a ‘raw material’ in open data has not brought about the use and impacts that many anticipated.

The newly forming Open Data Research network is currently developing a research project on ‘Exploring the Emerging Impacts of Open Data in the South’ which will be undertaking case study research to identify a range of different factors involved beyond just raw data in enabling open data to impact on political, social and economic governance. Join the mailing list at www.opendataresearch.org to find out more about the project as it develops. 

The Role of Government: Small Public Sector or Big Cuts?

- October 30, 2012 in Fact Checking Open Data, Featured, Public Finance and Government Data

This blog post is written for the School of Data Blog and is cross-posted from here.

Second Presidential Debate 2012

News stories based on statistical arguments emphasise a single fact but may lack the broader context. Would the future involve some more interactive form of media communication? Could tools like Google Fusion Tables allow us to delve into data and make our own data visualisations while discovering aspects of the story we are not told about?

There has rarely been an issue in economic policy as controversial as the role of government. Recently it has been at the heart of the ideological divide of the US presidential debates. While Governor Romney advocates against a government-centred approach (favouring small government) and threatens to undo the role of the federal government in national life, President Obama supports the essential function of the state (big government) in promoting economic growth and empowering all societal groups through federal investment in education, healthcare and future competitive technology.


Graph 1: United Kingdom and other major country groups. Data Source: World Economic Outlook 2012. Download data from the DataHub

Two weeks ago the Guardian published an article about how the Tory government plans to shrink the state to below US levels, based on recently released data from the IMF’s World Economic Outlook. Let’s go to the source data and look at the bigger picture. On the DataHub, I uploaded all data for “General Government Total Expenditure to GDP” for all countries as well as country groups [See the dataset]. You can use the DataHub Datastore’s default visualisation tools to build a line graph (select the dataset, then in Preview choose “Graph”), or try the Google Fusion Table with the all-countries dataset to select the countries you are interested in exploring.
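If you prefer to work with the data locally, here is a minimal sketch of drawing a comparable line graph with pandas and matplotlib. The file name weo_expenditure_to_gdp.csv and the column layout (a “Country” column plus one column per year) are assumptions about a CSV export of the dataset, not its actual schema.

```python
# Sketch: plot government expenditure to GDP for a few countries.
# Assumes a hypothetical CSV export with a "Country" column and one column per year.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("weo_expenditure_to_gdp.csv")  # hypothetical file name
subset = df[df["Country"].isin(["United Kingdom", "United States", "Greece"])]

# Reshape from one column per year to long format for plotting.
tidy = subset.melt(id_vars="Country", var_name="Year", value_name="ExpToGDP")
tidy["Year"] = tidy["Year"].astype(int)

for country, grp in tidy.groupby("Country"):
    grp = grp.sort_values("Year")
    plt.plot(grp["Year"], grp["ExpToGDP"], label=country)

plt.ylabel("Total government expenditure (% of GDP)")
plt.legend()
plt.show()
```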

According to the data, Britain would have a smaller public sector1 than the average of all advanced economies by 2017; other country groups are added to show how regions of the world compare (see Graph 1). Even EU countries with staggering public debt, like Greece, would still have a higher ratio of total government expenditure to GDP according to the projections (see Graph 2).

Graph 2: United Kingdom and other European countries + United States. Data Source: World Economic Outlook 2012. Download data from the DataHub

But what is the bigger picture? Does shrinking the role of government in the economy mean that total government expenditure will fall? Not necessarily, because percentages are relative numbers. The growth or decline of total government spending ultimately depends on economic growth, that is, on the increase in the total output of the economy up to 2017. If we take the data for “General Government Total Expenditure” in national currency and compute the growth rates2, we see that for the UK the growth rate stays above zero, meaning that government expenditure would actually increase over time, despite the diminishing role of government in the economy’s total output.

Debt-ridden countries like Greece, Portugal or Spain (dropping the US and Italy to avoid an overcrowded graph) will first have to slash spending before returning to positive growth in expenditure, despite their larger public sectors. The lesson is that the size of the public sector does not always equate to actual growth in government expenditure.

Graph 3: Growth in government expenditure for United Kingdom and other European countries. Data Source: World Economic Outlook 2012

Although its practical interpretation is limited, total government expenditure to GDP is often used as an argument in ideological debates, or as a measure in policy papers investigating the impact of government spending on consumption and economic growth or the optimal size of government. And although it is lumped into a single figure, total government expenditure varies in composition between high-, middle- and low-income countries: richer societies tend to spend more on social security and welfare, middle- and low-income countries have higher relative capital expenditure, and low-income societies tend to spend a larger share of their government budget on the military (e.g. see some examples from earlier IMF publications).

Even if no cuts are actually made, the public sector may still shrink: inflation, for example, could mean that the government actually spends less in real terms. A smaller public sector in the long run would eventually mean that countries like the UK will not be able to support an ageing population or provide the same levels of infrastructure and public services as they currently do. Yet a smaller public sector might also provide an opportunity to cut taxes, give incentives to the business sector and boost economic growth. Policy choices about the size of government following the US presidential elections and the sovereign debt crisis in Europe will partly be choices of ideology, as there is no clear evidence which recipe works in an individual case.

In the next piece in this series, we will look at some of the detailed data available on government staff salaries around the world.



1 The size of the public sector is measured by total government expenditure as a percentage of GDP.

2 How do I build growth rates? Add one column containing the natural logarithm of the absolute expenditure figures, then add another column in which all observations are lagged by one year (shift the entire column down by one row). In a third column take the difference between the current year’s logarithm and the lagged value: growth rate = ln(x_t) – ln(x_{t-1}).
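For readers who would rather script this than use a spreadsheet, here is a minimal sketch of the same three-column recipe in Python. The expenditure figures below are made up for illustration and are not WEO values.

```python
# Sketch: growth rate as the difference of natural logarithms, ln(x_t) - ln(x_{t-1}).
import numpy as np
import pandas as pd

# Hypothetical total government expenditure in national currency, by year.
expenditure = pd.Series(
    [673.0, 692.5, 710.1, 731.8],
    index=[2013, 2014, 2015, 2016],
)

log_exp = np.log(expenditure)   # column 1: natural logarithm
lagged = log_exp.shift(1)       # column 2: values lagged by one year
growth = log_exp - lagged       # column 3: ln(x_t) - ln(x_{t-1})
print(growth)
```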