
Italy must expand its online franchise: a policy promising attractive side-effects for the Italian economy

- November 26, 2013 in Featured, Open Data


This blog has been reposted from the Bruegel Blog.

“55 years after its promulgation, how would you like to change the Italian constitution?” This is the rather difficult question posed to Italians in the online public consultation that closed in early October. Nonetheless, this attempt to improve the discourse between policy-making institutions and their citizens may have represented a distorted reality, skewed toward the most educated. To successfully use emerging technologies to enhance its democracy, Italy must expand the online franchise, by improving broadband access and bringing down service costs for more Italians – a policy promising attractive side-effects for the Italian economy at a time of challenge.

With the online public consultation on the Italian constitution, two democratic innovations were attempted in Italy. First, using a public consultation to collect citizen opinions on a large and challenging topic rather than a more specific one. Second, releasing the results of a public consultation as open data, under the Creative Commons license CC-BY, which authorizes sharing and remixing of the work. But to be effective, new democratic channels need to be supported by adequate infrastructure and a consistent digital culture. As the data show, this may not currently be the case in Italy.

Similarly to its European partners, Italy’s most active Internet users are young people between the ages of 16 and 35, with around 80% of this group accessing online services regularly. In comparison, the same is true for only 50% of those between 44 and 54 years old (Eurostat). If political interest prevails over lack of digital competence, the survey outcomes could be skewed towards the older part of the population, and vice versa. The first data released by ISTAT (the Italian Institute of Statistics, which has analyzed the survey outcomes) show that respondents between 38 and 57 alone account for over 40% of the total, double the share of respondents between 23 and 37.

Furthermore, as the consultation was available online only, significant parts of the population may have been at a disadvantage when wanting to participate, whether through a lack of competence or simply an absence of Internet access. Italy suffers from a substantial digital divide. In 2012, household broadband access stood at 55% (against 70% for the EU-27), with regional variations across the country of up to 20 percentage points.

The range of respondents may also be narrower than hoped. New instruments of democracy might attract parts of the population less familiar with canonical democratic channels (e.g. voting in elections, debating in public forums), but once again the data betray expectations: most of the survey participants had at least a high school diploma. This could have been foreseen from Eurostat data, which show significant positive correlations between education level and Internet usage. Not only has this technological advantage potentially favored those with higher education, but the most educated respondents are also overrepresented among those already exercising their civil rights through traditional tools.

These threats hang over the validity and efficacy of the survey, in which only around 0.4% of eligible voters participated. Introducing new democratic measures is a first step on a path of political and economic modernization. Nonetheless, to meet this challenge Italy needs to guarantee equal and homogeneous broadband access for all citizens across its territory.

Some policies to this end have already been adopted. The Italian government launched ‘the National Plan for Broadband’ in 2008, aimed at reducing the infrastructure deficit that currently excludes 8.5 million people from broadband access. The European Digital Agenda stipulates that all Europeans should have access to Internet above 30 Megabits per second by 2020. To meet these objectives, Italy presented ‘the Strategic Project for Ultra Broadband’ at the end of 2012; the public investment tranche of 900 million euros for the plan, announced last February, aims to create over 5,000 new jobs and raise GDP by 1.3 billion euros.

Access does not mean use. Equal broadband access needs to be accompanied by affordable prices. In 2012, Italian citizens paid 25% more than the OECD average for broadband access. The government should work primarily to open the market to more competition, or even to intervene directly where the market fails on price.


Source: OECD

Betting on broadband infrastructure is a winning game. Offline, it will significantly increase GDP, the number of jobs and the level of innovation across the country. Online, though the experience from other countries suggests it offers no magic pill, it could improve dialogue with institutions, providing more options for participation, for example also via new e-government tools.

Open model of an oil contract

- October 22, 2013 in External Projects, Featured, Open Data, Open Economics

Please come and kick the tires of our open model of an oil contract!

In the next month or so, OpenOil and its partners will publish what we believe will be the first financial model of an oil contract under a Creative Commons license. We would like to take this opportunity to invite the Open Economics community to come and kick the wheels on the model when it is ready, and help us improve it.

We need you because we expect a fair degree of heat from those with a financial or reputational stake in continued secrecy around these industries. We expect the brunt of attacks to be on the basis that we are wrong. And of course we will be wrong in some way. It’s inevitable. So we would like our defence to be not, “no we’re never wrong”, but “yes, sometimes we are wrong, but transparently so and for the right reasons – and look, here are a bunch of friends who have already pointed out these errors, which have been corrected. You got some specific critiques, come give them. But the price of criticism is improvement – the open source way!” We figure Open Economics is the perfect network to seek that constructive criticism.


Ultimately, we want to grow an open source community which will help grow a systematic understanding of the economics of the oil and gas industry independent of investor or government stakes, since the public policy impact of these industries and relevant flows are too vital to be left to industry specialists. There are perhaps 50 countries in the world where such models could transform public understanding of industries which dominate the political economy.

The model itself is still being fine-tuned, but I’d like to take this chance to throw out a few heuristics that have emerged in the process of building it.

Public interest modelling. The model is being built by professionals with industry experience but its primary purpose is to inform public policy, not to aid investment decisions or serve as negotiation support for either governments or companies. This has determined a distinct approach to key issues such as management of complexity and what is an acceptable margin of error.

Management of complexity. Although there are several dozen variables one could model, and which typically appear in the models produced for companies, we deliberately exclude a long tail of fiscal terms, such as ground rent and signature bonuses, on the basis that the gain in reduction of margin of error is less than the loss from increasing complexity for the end user. We also exclude many of the fine tuning implementations of the taxation system. We list these terms in a sheet so those who wish can extend the model with them. It would be great, for example, to get tax geek help on refining some of these issues.

A hierarchy of margins of error. Extractives projects can typically last 25 years. The biggest single margin of error is not within human power to solve – future price. All other uncertainties or estimates pale in comparison with its impact on returns to all stakeholders. Second are the capex and opex going into a project. The international oil company may be the only real source of these data, and may or may not share them in disaggregated form with the government – everyone else is in the dark. For public interest purposes, the margin of error created by all other fiscal terms and input assumptions combined is less significant, and manageable.
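To make that hierarchy concrete, here is a minimal Python sketch – not the OpenOil model itself, and with entirely made-up numbers – comparing how much a toy project NPV swings when the price, the capex and a minor fiscal term are each varied:

```python
# Illustrative only: a toy contractor cash flow, not the OpenOil model.
# All figures (price, capex, production profile, fiscal terms) are invented.

def npv(cash_flows, discount_rate=0.10):
    """Discount a series of annual cash flows back to year 0."""
    return sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows))

def project_npv(price=80.0, capex=900.0, opex_per_bbl=15.0, royalty=0.10,
                signature_bonus=10.0, production=(0, 10, 20, 20, 15, 10, 5)):
    """NPV of the contractor's share: revenue less royalty, opex, capex and bonus."""
    flows = []
    for year, barrels in enumerate(production):      # barrels in millions per year
        cash = barrels * price * (1 - royalty) - barrels * opex_per_bbl
        if year == 0:
            cash -= capex + signature_bonus          # up-front spending
        flows.append(cash)
    return npv(flows)

base = project_npv()
swings = {
    "price +/- 30%": project_npv(price=104) - project_npv(price=56),
    "capex +/- 15%": project_npv(capex=765) - project_npv(capex=1035),
    "signature bonus +/- 50%": project_npv(signature_bonus=5) - project_npv(signature_bonus=15),
}
for term, swing in sorted(swings.items(), key=lambda kv: -abs(kv[1])):
    print(f"{term:>24}: NPV swing of roughly {swing:,.0f} against a base NPV of {base:,.0f}")
```

With these placeholder inputs the price swing is roughly ten times the capex swing and two orders of magnitude larger than the bonus swing, which is the point of the hierarchy.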

Moving away from the zero-sum paradigm. Because modelling has traditionally been associated with the negotiation process, and perhaps because of the wider context surrounding extractive industries, a zero-sum paradigm often predominates in public thinking around the terms of these contracts. But the model shows graphically two distinct ways in which that paradigm does not apply. First, in agreements with sufficient progressivity, rising commodity prices could mean simultaneous rise of both government take and a company’s Internal Rate of Return. Second, a major issue for governments and societies depending on oil production is volatility – the difference between using minimal and maximal assumptions across all of the inputs will likely produce a difference in result which is radical. One of a country’s biggest challenges then is focusing enough attention on regulating itself, its politicians’ appetite for spending, its public’s appetite for patronage. We know this of course in the real world. Iraq received $37 billion in 2007, then $62 billion in 2008, then $43 billion or so in 2009. But it is the old journalistic difference between show and tell. A model can show this in your country, with your conditions.
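The progressivity point can also be shown numerically. The sketch below – again a purely illustrative set of terms, not those of any real contract – uses a royalty that steps up above a price trigger, so a higher price lifts both the government’s share of divisible income and the company’s internal rate of return:

```python
# Illustrative only: a stylised progressive royalty, not real contract terms.

def irr(cash_flows, lo=0.0, hi=2.0, tol=1e-6):
    """Internal rate of return by bisection (assumes one sign change in the flows)."""
    def npv(rate):
        return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if npv(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

def contract_outcome(price, years=8, output=10.0, opex_per_unit=10.0, capex=2000.0):
    """Return (government take, company IRR) under a royalty that steps up with price."""
    royalty_rate = 0.30 if price > 70 else 0.10      # the 'progressive' element
    revenue = output * price                         # per year
    gov_per_year = revenue * royalty_rate
    company_per_year = revenue - output * opex_per_unit - gov_per_year
    divisible = (revenue - output * opex_per_unit) * years - capex
    government_take = gov_per_year * years / divisible
    company_flows = [-capex] + [company_per_year] * years
    return government_take, irr(company_flows)

for price in (60.0, 100.0):
    take, rate = contract_outcome(price)
    print(f"price {price:5.1f}: government take {take:5.1%}, company IRR {rate:5.1%}")
```

With these toy numbers the government take rises from about 24% to 46% while the company IRR rises from roughly 15% to 25%: both sides gain from the higher price.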

The value of contract transparency. Last only because self-evident is the need for primary extractives contracts between states and companies to enter the public domain. So far only about seven jurisdictions around the world publish all contracts, but doing so is gaining traction as a norm in the governance community. The side-effects of the way extractive industries are managed now are almost all due to the ill-understood nature of rent. Even corruption, the hottest issue politically, may often simply be a secondary effect of the rent-based nature of the core activities. Publishing all contracts is the single biggest measure that would get us closer to being able to address the root causes of the Resource Curse.

See http://openoil.net/ for more details.

Fundamental Stock Valuation on an Open Platform

- September 3, 2013 in External Projects, Featured, Open Data

Investors have traditionally relied on Wall Street analysts for projections of companies’ intrinsic values.  Wall Street analysts typically come up with their valuations using Discounted Cash flow (DCF) analysis. However, they do not disclose the proprietary models used for arriving at buying, selling or holding recommendations. ThinkNum has a solution which allows users to build their own models.

A cash flow model is a tool for translating projections of a company’s future operating performance like revenue growth and costs of goods into an intrinsic value for the company. Without viewing the assumptions underlying a model, a leap of faith is required in order to use the model’s outputs. With Thinknum, users can view and change any formula or assumption that drives the valuation. The interactive nature of the application allows users to conduct ‘what-if’ analysis to test how sensitive a company’s valuation is to changes in a specific performance measure.

To get started, all that is needed is a stock ticker. After entering the ticker, Thinknum displays a model using the mean of analysts’ revenue growth projections. We load the historical numbers for the company’s balance sheet, income statement and the statement of cash flows from corporate filings.  We then use the growth assumptions to project how the company’s financial performance will evolve over time and how much value will ultimately accrue to shareholders. Users can modify the model or build one from scratch. Users can also download the models into Excel spreadsheets.
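For readers who want to see the mechanics, here is a minimal discounted cash flow sketch in Python. It is not Thinknum’s model – the growth rates, margin and discount rate below are placeholders – but it shows how revenue growth assumptions flow through to an intrinsic value, and how a ‘what-if’ is simply a change of inputs:

```python
# A stripped-down DCF: project free cash flow from revenue growth assumptions,
# discount it, and add a terminal value. All inputs are illustrative placeholders.

def dcf_value(last_revenue, growth_rates, fcf_margin=0.20,
              discount_rate=0.09, terminal_growth=0.02):
    """Present value of projected free cash flows plus a Gordon-growth terminal value."""
    value, revenue = 0.0, last_revenue
    for year, growth in enumerate(growth_rates, start=1):
        revenue *= 1 + growth                 # project revenue forward one year
        fcf = revenue * fcf_margin            # assume a constant free-cash-flow margin
        value += fcf / (1 + discount_rate) ** year
    terminal = fcf * (1 + terminal_growth) / (discount_rate - terminal_growth)
    return value + terminal / (1 + discount_rate) ** len(growth_rates)

# Base case versus a 'what-if' with slower revenue growth
base = dcf_value(60_000, [0.15, 0.12, 0.10, 0.08, 0.06])
bear = dcf_value(60_000, [0.08, 0.06, 0.05, 0.04, 0.03])
print(f"base-case value: {base:,.0f}   slower-growth value: {bear:,.0f}")
```

Dividing the resulting value by shares outstanding and comparing it with the market price is the final step the post describes below.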

The Google DCF 3 Statement Model pictured above is an example of a model I recently built for valuing Google’s stock price. If you disagree with my assumptions about Google’s revenue growth, you can simply change those assumptions and compute the new value. DCF models can be used to make rational investment decisions by comparing the model’s intrinsic value to the current market price.

One important caveat is that any model is only as good as the assumptions underlying it. We provide data from over 2,000 sources in an attempt to place proper context around companies and help analysts make the best assumptions based on all the information available. ThinkNum users can plot any number in the cash flow models over time. Visualizing numbers over time and comparing metrics across the industry help users gain insight into the company’s historical performance and how such performance might vary going forward. For example, simply type total_revenue(goog) into the expression window to pull up the total historical revenue for Google. You can then click on the bar graphs to pull up the corporate filings used in the charts.

We are excited about the role the web can play in helping us make better decisions by rationally analyzing available data.

Open Economics: the story so far…

- August 30, 2013 in Advisory Panel, Announcements, Events, Featured, Open Data, Open Economics, Projects

A year and a half ago we embarked on the Open Economics project with the support of the Alfred P. Sloan Foundation, and we would like to share a short recap of what we have been up to.

Our goal was to define what open data means for the economics profession and to become a central point of reference for those who wanted to learn what it means to have openness, transparency and open access to data in economics.

Advisory Panel of the Open Economics Working Group:
openeconomics.net/advisory-panel/

Advisory Panel

We brought together an Advisory Panel of twenty senior academics who advised us and provided input on people and projects we needed to contact and issues we needed to tackle. The progress of the project has depended on the valuable support of the Advisory Panel.

1st Open Economics Workshop, Dec 17-18 ’12, Cambridge, UK:
openeconomics.net/workshop-dec-2012/

2nd Open Economics Workshop, 11-12 June ’13, Cambridge, MA:
openeconomics.net/workshop-june-2013

International Workshops

We also organised two international workshops, the first held in Cambridge, UK on 17-18 December 2012 and the second in Cambridge, MA, US on 11-12 June 2013, convening academics, funders, data publishers, information professionals and students to share ideas and build an understanding of the value of open data, the barriers to opening up information that still persist, and the incentives and structures which our community should encourage.

Open Economics Principles

While defining open data for economics, we also saw the need to issue a statement on the openness of data and code – the Open Economics Principles – to emphasise that data, program code, metadata and instructions, which are necessary to replicate economics research should be open by default. Having been launched in August, this statement is now being widely endorsed by the economics community and most recently by the World Bank’s Data Development Group.

Projects

The Open Economics Working Group and several of its more involved members have worked on smaller projects to showcase how data can be made available and what tools can be built to encourage discussion and participation, as well as a wider understanding of economics. We built the award-winning app Yourtopia Italy – http://italia.yourtopia.net/ – a user-defined multidimensional index of social progress, which won a special prize in the Apps4Italy competition.




Yourtopia Italy: application of a user-defined multidimensional index of social progress: italia.yourtopia.net

We created the Failed Bank Tracker, a list and timeline visualisation of the banks in Europe which failed during the last financial crisis, and released the Automated Game Play Datasets, the data and code of papers from the Small Artificial Agents for Virtual Economies research project, implemented by Professor David Levine and Professor Yixin Chen at Washington University in St. Louis. More recently we launched Metametrik, a prototype platform for the storage and search of regression results in economics.


MetaMetrik: a prototype for the storage and search of econometric results: metametrik.openeconomics.net

We also organised several events in London and a topic stream about open knowledge and sustainability at the OKFestival with a panel bringing together a diverse range of panelists from academia, policy and the open data community to discuss how open data and technology can help improve the measurement of social progress.

Blog and Knowledge Base

We blogged about issues like the benefits of open data from the perspective of economics research, the EDaWaX survey of the data availability of economics journals, pre-registration in the social sciences, crowd-funding, as well as open access. We also presented projects like the Statistical Memory of Brazil, Quandl, and the AEA randomized controlled trials registry.

Some of the issues we raised had a wider resonance, e.g. when Thomas Herndon found significant errors in trying to replicate the results of Harvard economists Reinhart and Rogoff, we emphasised that while such errors may happen, it is a greater crime not to make the data available with published research in order to allow for replication.

Some outcomes and expectations

We found that opening up data in economics can be a difficult matter, as many economists use data which cannot be opened because of privacy or confidentiality concerns, or because they do not own the data. Sometimes there are insufficient incentives to disclose data and code. Many economists invest considerable resources in building their datasets and gain an advantage over other researchers by exploiting the resulting information rents.

Some journals have been leading the way in putting in place data availability requirements and funders have been demanding data management and sharing plans, yet more general implementation and enforcement is still lacking. There are now, however, more tools and platforms available where researchers can store and share their research content, including data and code.

There are also great benefits in sharing economics data: it enables the scrutiny of research findings and makes replication possible, it enhances the visibility of research and promotes new uses of the data, and it avoids unnecessary costs of data collection.

In the future we hope to concentrate on projects which would involve graduate students and early career professionals, a generation of economics researchers for whom sharing data and code may become more natural.

Keep in touch

Follow us on Twitter @okfnecon, sign up to the Open Economics mailing list and browse our projects and resources at openeconomics.net.

Introducing the Open Economics Principles

- August 7, 2013 in Announcements, Featured

The Open Economics Working Group would like to introduce the Open Economics Principles, a Statement on Openness of Economic Data and Code. A year and a half ago the Open Economics project began with the mission of becoming a central point of reference and support for those interested in open economic data. In the process of identifying examples and ongoing barriers to opening up data and code in the economics profession, we saw the need to present a statement on the guiding principles of transparency and accountability in economics that would enable replication and scholarly debate, as well as access to knowledge as a public good.

We wrote the Statement on the Open Economics Principles during our First and Second Open Economics International Workshops, receiving feedback from our Advisory Panel and community, with the aim of emphasising the importance of having open access to data and code by default and of addressing some of the issues around the roles of researchers, journal editors, funders and information professionals.

Second Open Economics International Workshop, June 11-12, 2013

Read the statement below and follow this link to endorse the Principles.


Open Economics Principles

Statement on Openness of Economic Data and Code

Economic research is based on building on, reusing and openly criticising the published body of economic knowledge. Furthermore, empirical economic research and data play a central role in policy-making in many important areas of our economies and societies.

Openness enables and underpins scholarly enquiry and debate, and is crucial in ensuring the reproducibility of economic research and analysis. Thus, for economics to function effectively, and for society to reap the full benefits from economic research, it is essential that economic research results, data and analysis be openly and freely available, wherever possible.

  1. Open by default: by default data in its different stages and formats, program code, experimental instructions and metadata – all of the evidence used by economists to support underlying claims – should be open as per the Open Definition [1], free for anyone to use, reuse and redistribute. Specifically, open material should be publicly available and licensed with an appropriate open licence [2].
  2. Privacy and confidentiality: we recognise that there are often cases where, for reasons of privacy, national security or commercial confidentiality, the full data cannot be made openly available. In such cases researchers should share analysis under the least restrictive terms consistent with legal requirements, abiding by the research ethics and guidelines of their community. This should include opening up non-sensitive data, summary data, metadata and code, and facilitating access if the owner of the original data grants other researchers permission to use the data.
  3. Reward structures and data citation: recognising the importance of data and code to the discipline, reward structures should be established to recognise these scholarly contributions with appropriate credit and citation, in acknowledgement that producing data and code with the documentation that makes them reusable by others requires a significant commitment of time and resources. At a minimum, all data necessary to understand, assess, or extend conclusions in scholarly work should be cited. Acknowledgements of research funding, traditionally limited to publications, could be extended to research data, and the contribution of data curators should be recognised.
  4. Data availability: Investigators should share their data by the time of publication of initial results of analyses of the data, except in compelling circumstances. Data relevant to public policy should be shared as quickly and widely as possible. Funders, journals and their editorial boards should put in place and enforce data availability policies requiring data, code and any other relevant information to be made openly available as soon as possible and at latest upon publication. Data should be in a machine-readable format, with well-documented instructions, and distributed through institutions that have demonstrated the capability to provide long-term stewardship and access. This will enable other researchers to replicate empirical results.
  5. Publicly funded data should be open: publicly funded research work that generates or uses data should ensure that the data is open, free to use, reuse and redistribute under an open licence – and specifically, it should not be kept unavailable or sold under a proprietary licence. Funding agencies and organizations disbursing public funds have a central role to play and should establish policies and mandates that support these principles, including appropriate costs for long-term data availability in the funding of research and the evaluation of such policies [3], and independent funding for systematic evaluation of open data policies and use.
  6. Usable and discoverable: as simply making data available may not be sufficient for reusing it, data publishers and repository managers should endeavour to also make the data usable and discoverable by others; for example, documentation and the use of standard code lists all help make data more interoperable and reusable, and submission of the data to standard registries together with common metadata enables greater discoverability.

See Reasons and Background: http://openeconomics.net/principles/.


1. http://opendefinition.org/

2. Open licenses for code are those conformant with the Open Source Definition (see http://opensource.org/licenses); open licenses for data should be conformant with the Open Definition (see http://opendefinition.org/licenses/#Data).

3. A good example of an important positive development in this direction from the United States is http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf

EC Consultation on open research data

- July 17, 2013 in Featured, Open Access, Open Data

The European Commission held a public consultation on open access to research data on July 2 in Brussels inviting statements from researchers, industry, funders, IT and data centre professionals, publishers and libraries. The inputs of these stakeholders will play some role in revising the Commission’s policy and are particularly important for the ongoing negotiations on the next big EU research programme Horizon 2020, where about 25-30 billion Euros would be available for academic research. Five questions formed the basis of the discussion:

  • How can we define research data and what types of research data should be open?
  • When and how does openness need to be limited?
  • How should the issue of data re-use be addressed?
  • Where should research data be stored and made accessible?
  • How can we enhance “data awareness” and a “culture of sharing”?

Here is how the Open Knowledge Foundation responded to the questions:

How can we define research data and what types of research data should be open?

Research data is extremely heterogeneous, and would include (although not be limited to) numerical data, textual records, images, audio and visual data, as well as custom-written software, other code underlying the research, and pre-analysis plans. Research data would also include metadata – data about the research data itself – including uncertainties and methodology, versioned software, standards and other tools. Metadata standards are discipline-specific, but to be considered ‘open’, at a bare minimum it would be expected to provide sufficient information that a fellow researcher in the same discipline would be able to interpret and reuse the data, as well as be itself openly available and machine-readable. Here, we are specifically concerned with data that is being produced, and therefore can be controlled by the researcher, as opposed to data the researcher may use that has been produced by others.

When we talk about open research data, we are mostly concerned with data that is digital, or the digital representation of non-digital data. While primary research artifacts, such as fossils, have obvious and substantial value, the extent to which they can be ‘opened’ is not clear. However, 3D scanning techniques can and should be used to capture many physical features or an image, enabling broad access to the artifact. This would benefit both researchers who are unable to travel to visit a physical object and interested citizens who would typically be unable to access such an item.

By default there should be an expectation that all types of research data that can be made public, including all metadata, should be made available in machine-readable form and open as per the Open Definition. This means the data resulting from public work is free for anyone to use, reuse and redistribute, with at most a requirement to attribute the original author(s) and/or share derivative works. It should be publicly available and licensed with this open license.
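As one illustration of what ‘machine-readable and openly licensed’ can look like in practice, the sketch below writes a small metadata file loosely following the Data Package convention promoted by the Open Knowledge Foundation; the dataset name, file paths and license shown are hypothetical placeholders rather than a prescribed schema:

```python
# A minimal, illustrative metadata file for an openly licensed dataset,
# loosely in the spirit of the Data Package convention; all values are placeholders.
import json

metadata = {
    "name": "household-survey-2012",                 # hypothetical dataset identifier
    "title": "Household survey microdata, 2012 wave",
    "licenses": [{"id": "odc-pddl",                  # an Open Definition conformant license
                  "url": "http://opendatacommons.org/licenses/pddl/"}],
    "resources": [
        {"path": "data/households.csv", "format": "csv",
         "description": "One row per household; variables documented in codebook.md"},
        {"path": "code/analysis.do", "format": "stata",
         "description": "Scripts reproducing the published tables"},
    ],
}

with open("datapackage.json", "w") as f:
    json.dump(metadata, f, indent=2)                 # machine-readable, kept alongside the data
```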

When and how does openness need to be limited?

The default position should be that research data should be made open in accordance with the Open Definition, as defined above. However, while access to research data is fundamentally democratising, there will be situations where the full data cannot be released; for instance for reasons of privacy.

In these cases, researchers should share analysis under the least restrictive terms consistent with legal requirements, and abiding by the research ethics as dictated by the terms of research grant. This should include opening up non-sensitive data, summary data, metadata and code; and providing access to the original data available to those who can ensure that appropriate measures are in place to mitigate any risks.

Access to research data should not be limited by the introduction of embargo periods, and arguments in support of embargo periods should be considered a reflection of inherent conservatism among some members of the academic community. Instead, the expectation should be that data is to be released before the project that funds the data production has been completed; and certainly no later than the publication of any research output resulting from it.

How should the issue of data re-use be addressed?

Data is only meaningfully open when it is available in a format and under an open license which allows re-use by others. But simply making data available is often not sufficient for reusing it. Metadata must be provided that gives sufficient documentation to enable other researchers to replicate empirical results.

There is a role here for data publishers and repository managers to endeavour to make the data usable and discoverable by others. This can be by providing further documentation, the use of standard code lists, etc., as these all help make data more interoperable and reusable. Submission of the data to standard registries and use of common metadata also enable greater discoverability. Interoperability and the availability of data in machine-readable form are crucial to ensure data-mining and text-mining of the data can be performed, a form of re-use that must not be restricted.

Arguments are sometimes made that we should monitor levels of data reuse, to allow us to dynamically determine which data sets should be retained. We reject this suggestion. There is a moral responsibility to preserve data created by taxpayer funds, including data that represents negative results or that is not obviously linked to publications. It is impossible to predict possible future uses, and reuse opportunities may currently exist that are not immediately obvious. It is also crucial to note that research interests change over time.

Where should research data be stored and made accessible?

Each discipline needs different options available to store data and open it up to their community and the world; there is no one-size-fits-all solution. The research data infrastructure should be based on open source software and interoperable based on open standards. With these provisions we would encourage researchers to use the data repository that best fits their needs and expectations, for example an institutional or subject repository. It is crucial that appropriate metadata about the data deposited is stored as well, to ensure this data is discoverable and can be re-used more easily.

Both the data and the metadata should be openly licensed. They should be deposited in machine-readable and open formats, similar to how the US government mandates this in its Executive Order on Government Information. This ensures the possibility of linking repositories and data across various portals and makes it easier to find the data. For example, the open source data portal CKAN, developed by the Open Knowledge Foundation, enables the depositing of data and metadata and makes it easy to find and re-use data. Various universities, such as the Universities of Bristol and Lincoln, already use CKAN for these purposes.
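As a concrete sketch of depositing data and metadata programmatically, the snippet below calls CKAN’s action API; the instance URL, API key, organisation and field values are hypothetical, and a real CKAN deployment may require additional or different fields:

```python
# Illustrative only: register a dataset on a hypothetical CKAN instance via the action API.
import json
import urllib.request

CKAN_URL = "https://data.example.org"        # hypothetical CKAN instance
API_KEY = "your-api-key"                     # issued to a user of that instance

dataset = {
    "name": "labour-supply-replication-2013",        # hypothetical dataset name
    "title": "Replication data: labour supply estimates, 2013",
    "notes": "Data and code underlying the published estimates.",
    "license_id": "cc-by",
    "owner_org": "economics-department",             # hypothetical organisation
}

request = urllib.request.Request(
    CKAN_URL + "/api/3/action/package_create",
    data=json.dumps(dataset).encode("utf-8"),
    headers={"Authorization": API_KEY, "Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)["result"]
    print("created dataset with id:", result["id"])
```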

How can we enhance data awareness and a culture of sharing?

Academics, research institutions, funders, and learned societies all have significant responsibilities in developing a culture of data sharing. Funding agencies and organisations disbursing public funds have a central role to play and must ensure research institutions, including publicly supported universities, have access to appropriate funds for longer-term data management. Furthermore, they should establish policies and mandates that support these principles.

Publication and, more generally, sharing of research data should be ingrained in the academic culture, and should be seen as a fundamental part of scholarly communication. However, it is often seen as detrimental to a career, partly as a result of the current incentive system set up by universities and funders, and partly as a result of much misunderstanding of the issues.

Educational and promotional activities should be set up to promote the awareness of open access to research data amongst researchers, to help disentangle the many myths, and to encourage them to self-identify as supporting open access. These activities should be set up in recognition of the fact that different disciplines are at different stages in the development of the culture of sharing. Simultaneously, universities and funders should explore options for creating incentives to encourage researchers to publish their research data openly. Acknowledgements of research funding, traditionally limited to publications, could be extended to research data and contribution of data curators should be recognised.


Open Access to Research Data: The European Commission’s consultation in progress

- July 9, 2013 in Featured, Open Access, Open Research

The European Commission held a public consultation on open access to research data on July 2 in Brussels inviting statements from researchers, industry, funders, IT and data centre professionals, publishers and libraries. The inputs of these stakeholders will play some role in revising the Commission’s policy and are particularly important for the ongoing negotiations on the next big EU research programme Horizon 2020, where about 25-30 billion Euros would be available for academic research. Five questions formed the basis of the discussion:

  • How can we define research data and what types of research data should be open?
  • When and how does openness need to be limited?
  • How should the issue of data re-use be addressed?
  • Where should research data be stored and made accessible?
  • How can we enhance “data awareness” and a “culture of sharing”?

Contributions from the researchers’ perspective emphasised that data, metadata and other documentation should be made available in order to replicate the results of a research article, and that more available data means more scrutiny and more value extracted from the data. Furthermore, there is a need for pre-registration of studies in order to understand the full picture of a research field where e.g. negative results in the biomedical sciences (as well as many other fields) are not published. There is also a need for binding mechanisms, e.g. required data management plans and better linkage between research data and scientific publications with enforcement of data availability by journals, but also for sustainable plans for making data available, where open access to data is formally part of the research budget.

Searching for and finding research data should also be made easier, as open access to data does not necessarily mean accessible data. There was also an emphasis that every contributor should be known and acknowledged, and that there is a need to establish cultures around data sharing in different disciplines and for “augmenting the scientific infrastructure to be technical, social and participatory” (Salvatore Mele, CERN).

There was some agreement that commercial data and data which can lead back to individuals should be kept closed but some aggregated data should be shared. Industry representatives (Philips Research, Federation of German Security and Defence Industries) argued for keeping some data closed, deciding on a case by case basis and having embargo periods on data produced in public-private partnerships in order to encourage investment.

Funders viewed research data as a public good, which should be managed and discoverable, and encouraged open and better access to research data, where research outputs are accessed and used in a way that maximises the public benefit. While there is a growing consensus about funder policies, these should be better implemented and enforced. Resources – infrastructure, incentives and cultures, capacity and skills, ethics and governance – should be built and sustained in recognition of the different stages that different disciplines are currently at (some really good points made by David Carr, the Wellcome Trust).

The IT, data centre professionals and librarians spoke about the need to recognise the role of data scientists and data librarians, with appropriate funding and careers. While the value of data is often recognised later on and grows over time, there is less of an understanding of who would pay for long-term preservation, since few institutions can make indefinite commitments. A key component should also be proper training and the development of core skills in dealing with research data (where librarians can assist researchers with data management plans, bridging the gap in knowledge), as well as proper citation rules and practices for data, where career recognition can be linked to the sharing of research data in order to boost incentives.

While the European Commission has been carrying the flag of open access, mandating open access to research publications funded by the last research and innovation programme FP7, there are larger hurdles on the road to open access to research data. While the EC’s communication “Towards better access to scientific information” reflects some commitment to open access to research data, there are many exceptions, e.g. privacy, trade secrets, national security, legitimate commercial interest, intellectual property, data resulting from a public-private partnership, etc. As Mireille van Echoud, professor of Information Law at IViR, stated at the Open Economics workshop in June, “any lawyer will find whatever argument they need to keep data from falling under an open access obligation”.

Have a look at more detailed notes from Ian Mulvany and his presentation on behalf of several publishers.

Second Open Economics International Workshop Recap

- July 5, 2013 in Events, Featured, Workshop


Open Knowledge Foundation, CIPIL, MIT Sloan. Supported by Alfred P. Sloan Foundation

On June 11-12, the Open Economics Working Group of the Open Knowledge Foundation organised the Second Open Economics International Workshop, hosted at the MIT Sloan School of Management – the second of two international workshops funded by the Alfred P. Sloan Foundation, aimed at bringing together economists and senior academics, funders, data publishers and data curators to discuss the progress made in the field of open data for economics and the challenges that remain. This post is an extended summary of the speakers’ input and some of the discussion. See the workshop page for more details.

Setting the Scene

The first panel addressed the current state of open data in economics research and some of the “not bad” practices in the area. Chaired by Rufus Pollock (Open Knowledge Foundation) the panel brought together senior academics and professionals from economics, science, technology and information science.

Eric von Hippel (MIT Sloan School of Management) talked about open consumer-developed innovations revealing that consumers actually innovate a lot to solve their needs as private users and while they are generally willing to let others adopt their innovations for free, they don’t actively invest in knowledge diffusion. As producers of findings, economists have high incentives to diffuse those, but as users of private research methods and data they have low or negative incentives to diffuse to rivals. Lower costs of diffusion, increasing the benefits from diffusion, more collaborative research processes and mandatory sharing are some of the ways to increase economists’ incentives to diffuse research methods and data as they diffuse findings. [See slides]

Micah Altman (MIT Libraries, Brookings Institution) stressed that best practices are often not “best” and rarely practiced, and thus preferred to discuss some probably “not bad” practices, including policy practices for the dissemination and citation of data: e.g. that data citations should be treated as first-class objects of publication, as well as reproducibility policies where more support should be given to publishing replications and registering studies. He emphasised that policies are often not self-enforcing or self-sustaining, and that compliance with data availability policies even in some of the best journals is very low. [See slides]

Shaida Badiee (Development Data Group, World Bank) shared the experience of setting the World Bank’s data free in 2010 and the exceptional popularity and impact the World Bank’s data has had since. To achieve better access, the data is legally open – with no discrimination between types of use – and comes with appropriate user support, available in multiple languages, platforms and devices, e.g. with API access, plug-ins for regression software, integration with external applications and mobile phones, etc. She reminded the audience that data is only as good as the capacity of the countries which produce it, and that working closely with countries to improve their statistical capacities is necessary for the continuous improvement of data. The World Bank works in partnership with the global open data community and provides support to countries that are willing to launch their own open data initiatives. [See slides]

Philip E. Bourne (UCSD) shared some thoughts from the biomedical sciences and indicated that while there are some success stories, many challenges still need to be addressed e.g. the lack of reproducibility and the unsolved problem of sustainability. He highlighted that change is driven by the community and there should be a perception that the community owns this culture including e.g. transparency and shared ownership, a reward system for individuals and teams, strategic policies on open access and data sharing plans, etc. and critically, the notion of “trust” in the data, which is crucial to the open data initiative. Funders and institutions may not initiate change but they would eventually follow suit: the structural biology community created successful data sharing plans before funders. He emphasised that it is all about openness: no restrictions on the usage of the data beyond attribution, running on open source software and transparency about data usage. [See slides]

Knowledge Sharing in Economics

The second panel, chaired by Eric von Hippel (MIT Sloan School of Management), dealt more closely with the discipline of economics, the technological and cultural challenges that still exist, and the possible roles and initiatives. [See audio page].

Joshua Gans (University of Toronto) analysed some of the motives for knowledge contribution – e.g. money, award and recognition, ownership and control, intrinsic motivation, etc. – and addressed other issues like the design and technology problems which could be as important as social norms. He talked about designing for contribution and the importance of managing guilt: since there is a concern that data should be accurate and almost perfect, less data is contributed, so a well-designed system should enable the possibility of contributing imperfect pieces (like Wikipedia and open-source in breaking down contributions). This should be ideally combined with an element of usefulness for the contributors – so that they are getting something useful out of it. He called for providing an easy way of sharing without the hassle and all the questions which come from data users since there are “low hanging fruit” datasets that can be shared. [See slides]

Gert Wagner (German Institute for Economic Research DIW) spoke in his capacity as Chairman of the German Data Forum, an organisation which promotes the production, re-use and re-analysis of data. He pointed out that there is no culture of data sharing in economics: “no credit is given where credit is due”, and incentives for sharing economics data should be promoted. So far only funding organisations can enforce data sharing by data producers, but this only happens at the institutional level. For individual authors there is little incentive to share data. As a way to change this culture, he suggested educating graduate students and early career professionals. In the German Socio-Economic Panel Study, a panel study of private households in Germany, they have been applying Schumpeter’s principle: producers who innovate must, if necessary, educate the consumers. Along with the workshops which educate new users in technical skills, users will also be educated to cite the data and give credit where credit is due. [See slides]

Daniel Feenberg (National Bureau of Economic Research) gave a brief introduction to NBER, which publishes about a thousand working papers a year, more than a third of which are empirical economics papers on the United States. There is the option to upload data resources in a “data appendix”, which is put on the website and made available for free. Very few authors, however, take advantage of being able to publish their data, and they are also aware that they will get questions if they make their data available. He mentioned that requiring data sharing is something only employers and funders can mandate, and that there is a limited role for the publisher. Besides the issues of knowledge-sharing design and incentives for individual researchers, there is also the issue of governments sharing data, where confidentiality is a big concern but also where politically motivated unscientific research may inform policy, in which case more access and more research is better than less research.

John Rust (Georgetown University) indicated that incentives for researchers might not be the biggest problem: there is an inherent conflict between openness and confidentiality, and a lot of economics research uses data that cannot be made publicly available. While companies and organisations are often sceptical, risk-averse and unaware of the benefits of sharing their operations data with researchers, they could save money and increase profits through research insights, especially in the field of optimising rental and replacement decisions (see e.g. the seminal paper by Rust 1987). Appealing to the self-interest of firms and showing success stories where collaborative research has worked can convince firms to share more data. The process of establishing trust and getting data could be aided by trusted intermediaries who can house and police confidential data and have the expertise to work with information protected by non-disclosure agreements.

Sharing Research Data

The panel session “Sharing research data – creating incentives and scholarly structures” was chaired by Thomas Burke (European University Institute Library) and dealt with the different incentives and opportunities researchers have for sharing their data: storing it in a curated repository like the ICPSR or in a self-service repository like DataVerse. To be citable, a dataset should obtain a DOI – a service which DataCite provides – and a dataset can also be published together with a data paper in a peer-reviewed data journal. [See audio page].

Amy Pienta (The Inter-university Consortium for Political and Social Research – ICPSR) presented some context about the ICPSR – the oldest archive for social science data in the United States, which has been supporting data archiving and dissemination for over 50 years. Among the incentives for researchers to share data, she mentioned the funding agencies’ requirements to make data available, scientific openness and stimulating new research. ICPSR has been promoting data citations, getting more journal editors to understand them, and, when archiving data, capturing how the data is being used and by what users and institutions. The ICPSR is also currently developing open access data as a new product, where researchers will be allowed to publish their original data, tied with a data citation and DOI, data downloads and usage statistics, and layered with levels of curation services. See slides.

Mercè Crosas (Institute for Quantitative Social Science, Harvard University) presented the background of the DataVerse network, a free and open-source service and software to publish, share and reference research data. Originally open only to social scientists, it now welcomes contributions from all universities and disciplines. It is a completely self-curated platform where authors can upload data and additional documentation, adding metadata to make the resource more discoverable. It builds on the incentives of data sharing, giving a persistent identifier, generating a data citation automatically (using the format suggested by Altman and King 2007), providing usage statistics and giving attribution to the contributing authors. Currently DataVerse is implementing closer integration with journals using OJS, where the data resources of an approved paper will be deposited online directly. She also mentioned the Amsterdam Manifesto on Data Citation Principles, which encourages different stakeholders – publishers, institutions, funders, researchers – to recognise the importance of data citations. See slides.

Joan Starr (DataCite, California Digital Library) talked about DataCite – an international organisation set up in 2009 to help researchers find, re-use and cite data. She mentioned some of the most important motivations for researchers to share and cite data, e.g. exposure and credit for the work of researchers and curators, scientific transparency and accountability for the authors and data stewards, citation tracking and understanding the impact of one’s work, and verification of results and re-use for producing new research (see more at ESIP – Earth Science Information Partners). One of the basic services that DataCite provides is DOIs for data (see a list of international partners who can support you in your area). Other services include usage statistics and reports, content negotiation, a citation formatter and metadata search, where one can see what kind of data is being registered in a particular field. Recently DataCite has also implemented a partnership with ORCID to have all research outputs (including data) on researchers’ profiles. See slides.

Brian Hole (Ubiquity Press) talked about data journals as a way of encouraging data sharing and improving data citation through the publication of data and methodology in data papers. He emphasised that while at the beginning of scientific publishing it was enough to share the research findings, today the data, software and methodology should be shared as well in order to enable replication and validation of the research results. Amongst the benefits of making research data available he mentioned the collective benefits for the research community, the long-term preservation of research outputs, enabling new and more research to be done more efficiently, re-use of the data in teaching, ensuring public trust in science, access to publicly-funded research outputs, and opportunities for citizen science. The publication of a data paper – where the data is stored in a repository with a DOI and linked to a short paper describing the methodology of creating the dataset – could be a way to incentivise individual researchers to share their data, as it builds up their career record of publications. An additional benefit of having data journals is having a metadata platform where data from different (sub-)disciplines can be collected and mashed up, producing new research. See slides.

The Evolving Evidence Base of Social Science

The purpose of the panel on the evolving evidence base of social science, chaired by Benjamin Mako Hill (MIT Sloan School of Management / MIT Media Lab), was to showcase examples of collecting more and better data and making more informed policy decisions based on a larger volume of evidence. See audio page.

Michael McDonald (George Mason University) presented some updates on the Public Mapping Project, which involves an open source online re-districting application that optimises re-districting according to selected criteria and allows for public participation in decision-making. Most recently there was a partnership with Mexico – with the Instituto Federal Electoral (IFE) – using redistricting criteria like population equality, compactness, travel distance, respect for municipality boundaries, respect for indigenous communities, etc. A point was made about moving beyond data to open optimisation algorithms, which can be verified – of great importance especially when they are the basis of an important public policy decision like the distribution of political representation across the country. Open code in this context is essential not just for the replication of research results but also for a transparent and accountable government. See slides.

Amparo Ballivian (Development Data Group, World Bank) presented the World Bank project for the collection of high frequency survey data using mobile phones. Some of the motivations for the pilot included the lack of recent and frequently updated data, where e.g. poverty rates are calculated on the basis of household surveys, yet such surveys involve a long and costly process of data collection. The aspiration was to have comparable data every month for thousands of households, to be able to track changes in welfare and responses to crisis, and to have data to support decisions in real time. Two half-year pilots were implemented in Peru and Honduras, where e.g. it was possible to test monetary incentives, different cellphone technologies and the responses of different income groups. In contrast to e.g. crowd-sourced surveys, such a probabilistic cellphone survey provides the opportunity to draw inferences about the whole population and can be implemented at a much lower cost than traditional household surveys. See slides.

Patrick McNeal (The Abdul Latif Jameel Poverty Action Lab) presented the AEA registry for randomised controlled trials (RCTs). Launched several weeks ago and sponsored by the AEA, the trials registry addresses the problem of publication bias in economics by providing a place that lists all ongoing RCTs in economics. The registry is open to researchers from around the world who want to register their randomised controlled trial. Some of the most interesting feedback from researchers includes e.g. having an easy and fast process for registering studies (only about 17 fields are required), including a lot of information which can be taken from the project documentation, the optional uploading of the pre-analysis plan, and the option to hide some fields until the trial is completed in order to address the fear that researchers will expose their ideas publicly too early. The J-PAL affiliates who are running RCTs will have to register them in the system according to a new policy which mandates registration, and there are also discussions on linking required registration with the funding policies of RCT funders. Registration of ongoing and completed trials is also pursued, and the training of RAs and PhD students now includes the registration of trials. See the website.

Pablo de Pedraza (University of Salamanca) chairs Webdatanet, a network that brings together web data experts from a variety of disciplines e.g. sociologists, psychologists, economists, media researchers, computer scientists working for universities, data collection institutes, companies and statistics institutes. Funded by the European Commission, the network has the goal of fostering the scientific use of web-based data like surveys, experiments, non-reactive data collection and mobile research. Webdatanet organises conferences and meetings, supports researchers to go to other institutes and do research through short scientific missions, organises training schools, web data metrics workshops, supports early career researchers and PhD students and has just started a working paper series. The network has working groups on quality issues, innovation and implementation (working with statistical institutes to obtain representative samples) and hosts bottom-up task forces which work on collaborative projects. See slides.

Mandating data availability and open licenses

The session chaired by Mireille van Echoud (IViR – Institute for Information Law) followed up on the discussions about making datasets available and citable, focusing on the roles of different stakeholders and how responsibility should be shared. Mireille reminded the audience that, as legal instruments like Creative Commons and open data licenses are already quite well developed, the role of the law in this context is in managing risk aversion, and it is important to see how legal aspects are managed at the policy level. For instance, while the new EU Framework Programme for Research and Innovation, Horizon 2020, carries the flag of open access to research publications, there are already many exceptions that would allow lawyers to contest whether data falls under an open access obligation. See audio page.

Carson Christiano (Center for Effective Global Action – CEGA) presented the perspective of CEGA, an interdisciplinary network of researchers focused on global development, which employs rigorous evaluation techniques to measure the impact of large-scale social and economic development programs. CEGA's research transparency initiative focuses on methodology and is motivated by publication bias, selective presentation of results and inadequate documentation of research projects; studies in medicine, psychology, political science and economics have pointed out the fragility of research results in the absence of methods and tools for replication. CEGA has launched an opinion series, Transparency in Social Science Research, and is looking into ways to showcase researchers who lead by example, to support and train early career researchers and PhD students in registering studies and pre-analysis plans, and to promote transparent ways of working.

Daniel Goroff (Alfred P. Sloan Foundation) raised the question of what funders should require of the people they make grants to, e.g. those undertaking economics research. While some funders may require data management plans and making research outputs entirely open, this is not a simple matter and there are trade-offs involved. The Alfred P. Sloan Foundation has funded and supported the establishment of knowledge public goods: commodities that are non-rivalrous and non-excludable, such as big open access datasets with large setup costs (e.g. the Sloan Digital Sky Survey, the Census of Marine Life, Wikipedia). Public goods, however, are notoriously hard to finance. Other funding models, including markets and commercial enterprises where the data is available openly for free but value-added services are offered at a charge, could be ways to make knowledge public goods useful and sustainable.

Nikos Askitas (Institute for the Study of Labor – IZA) heads Data and Technology at IZA, a private, independent economic research institute based in Bonn, Germany, focused on the analysis of global labor markets. He challenged the notion that funders must require data availability from researchers: researchers are already overburdened, and too many restrictions may stifle creativity and result in well-documented mediocre research. Data peer review is also a very different process from the peer review of academic research. He suggested that a new class of professionals is needed to assist researchers, and that these professionals would require a proper job title, salary and recognition for their work.

Jean Roth (National Bureau of Economic Research – NBER) mentioned that there has been a lot of interest as well as compliance from researchers since the NSF introduced data management plans. Several years ago she modified the NBER paper submission code to incorporate the option of adding data to the submission; researchers now curate their data themselves, and about 5.5% of papers have data available with the paper. A number of NBER data products are very popular in online searches, which helps people find the data in a format that is easier to use. As a Data Specialist at the NBER, she helps make data more usable and facilitates its re-use by other researchers. Over time, the resources and time invested in making data more usable decrease both for the data curator and for the users of the data.

The last session concentrated on further steps for the open economics community and ideas which should be pursued.
If you have any questions or need to get in touch with one of the presented projects, please contact us at economics[at]okfn.org.

The AEA Registry for Randomized Controlled Trials

- July 4, 2013 in External Projects, Featured, Open Tools, Trials Registration

The American Economic Association (AEA) has recently launched a registry for randomized controlled trials in economics (https://www.socialscienceregistry.org). The registry aims to address the growing number of requests for registration by funders and peer reviewers, make access to results easier and more transparent, and help solve the problem of publication bias by providing a single place where all trials are registered in advance of their start.

Screenshot of www.socialscienceregistry.org

In order to encourage registration, the process was designed to be very light. There are only 18 required fields (such as name and a small subset of IRB requirements), and the entire process should take less than 20 minutes. There is also the opportunity to add much more, including power calculations and an optional pre-analysis plan. To protect confidential and other sensitive design information, most of the information can remain hidden while the project is ongoing.

Please contact support [at] socialscienceregistry.org with any questions, comments or support issues.

Looking for the Next Open Economics Project Coordinator

- July 3, 2013 in Announcements, Featured, Open Economics

### Open Economics Project Coordinator

The Open Economics Working Group is looking for a project coordinator to lead the Open Economics project in the next phase. The Open Economics Project Coordinator will be the point of contact for the Working Group and will work closely with a community of economists, data publishers, research data professionals, lawyers and funders to make more data and content in economics open, coordinate the creation of tools which aid researchers and facilitate stakeholder dialogue. Some of the responsibilities include:

  • Coordinating the project through all phases of project development including initiating, planning, executing, controlling and closing the project.
  • Representing Open Economics Working Group at local and international events, point of contact for the Working Group.
  • Leading communications: Responsible for communications with Working Group members, the community, and interested individuals and organisations; point of contact for the project PI and the Advisory Panel, arranging the details of conference calls and coordinating individual AP members' participation in the workshop and other activities.
  • Community coordinator: Writing news to the mailing list and using social media to promote activities to the network and beyond; maintaining the website of the Open Economics project, including planning the design, content and presentation of the project and the Working Group; organising and coordinating online meetings, online sprints and other online communication.
  • Editing the blog: Inviting and supervising contributions, actively seeking authors, setting the agenda for featured content and projects, and writing posts that survey relevant projects and publish news about forthcoming events, as well as documentation (slides, audio, summaries) of past events and activities.
  • Point of contact for the project, responsible for collaboration and communication to other projects within the Open Knowledge Foundation.
  • Preparing reports: Writing both financial and substantive midterm and final reports for the funder, as well as weekly reports for the project team.
  • Point of contact and support for the Open Economics fellows: Planning and supervising the recruitment process of the fellows, maintaining regular contact with the fellows, monitoring progress of the fellows’ projects and providing necessary support.
  • Event development and management: concept, planning, research on invitees and relevant projects, programme drafting, sending and following up on invitations, event budgeting, organising the entire event.

#### Person specification

  • Self-driven, organised and an excellent communicator, comfortable running a number of initiatives at the same time, speaking at events and travelling.
  • A background in economics and knowledge of quantitative research and data analysis.
  • Preferably some knowledge of academic research and some familiarity with stakeholders in the area of economics research.
  • Comfortable using online communication tools and working from different locations.
  • The ability to engage with community members at all levels, from senior academics to policy-makers, developers and journalists.
#### Location

We will consider applicants based anywhere in the world; however, a mild preference is given to those close to one of our hubs in London, Berlin or Cambridge.

#### Pay & closing date

The rate is negotiable based on experience. The closing date for applications is July 15, 2013.

#### How to apply

To apply, please send a cover letter highlighting relevant experience and explaining your interest in the role, together with your CV, to [email protected]