Usage

Zenodo usage documentation for developers.

Running

Start a development server with the following command (if you are using Docker, run docker-compose up instead):

$ zenodo run

Celery workers can be started using the command:

$ celery worker -A zenodo.celery -l INFO

You enable debug mode by setting the FLASK_DEBUG environment variable:

$ export FLASK_DEBUG=1

An interactive Python shell is started with the command:

$ zenodo shell

or, if you prefer to use IPython, you can start an interactive IPython shell using the command:

$ zenodo konch

Configuration

Out of the box, Zenodo is configured to run in a local development environment with all services running on localhost. Some Zenodo features depend on external services that are not configured by default, e.g. ORCID/GitHub sign-in. To make these features work, follow the guide below.

Instance configuration

You can configure your specific instance by setting environment variables (prefixed with APP_), by using the instance configuration file (recommended), or both. The instance configuration file is located at:

${VIRTUAL_ENV}/var/instance/invenio.cfg
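To illustrate the APP_ prefix convention, here is a minimal sketch of how prefixed environment variables could be turned into configuration keys. The helper load_prefixed_env is hypothetical and is not Zenodo's actual loader; parsing values as Python literals is an assumption made for illustration.

```python
import ast


def load_prefixed_env(environ, prefix="APP_"):
    """Collect variables that start with ``prefix`` into a config dict."""
    config = {}
    for name, raw in environ.items():
        if not name.startswith(prefix):
            continue
        key = name[len(prefix):]  # APP_THEME_TAG -> THEME_TAG
        try:
            # Assumption: values that look like Python literals
            # (2, True, ['a', 'b']) are converted to that type.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything else is kept as a plain string.
            value = raw
        config[key] = value
    return config


example_env = {
    "APP_WSGI_PROXIES": "2",
    "APP_THEME_TAG": "Sandbox",
    "HOME": "/home/zenodo",  # ignored: no APP_ prefix
}
print(load_prefixed_env(example_env))
```

Under these assumptions, running export APP_THEME_TAG=Sandbox before starting the server would have the same effect as setting THEME_TAG = 'Sandbox' in invenio.cfg.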

Recaptcha

To enable Recaptcha on the sign up page, you need to get a public and private key from https://www.google.com/recaptcha/ and add them to your configuration:

RECAPTCHA_PUBLIC_KEY = '...'
RECAPTCHA_PRIVATE_KEY = '...'

ORCID Login

To enable ORCID login, get an OAuth client id and client secret from ORCID and add them to your configuration:

ORCID_APP_CREDENTIALS = dict(
    consumer_key='...',
    consumer_secret='...',
)

GitHub Login

To enable GitHub login, get an OAuth client id and client secret from GitHub (register a new application, e.g. at https://github.com/settings/developers) and add them to your configuration:

GITHUB_APP_CREDENTIALS = dict(
    consumer_key='...',
    consumer_secret='...',
)

For the GitHub integration to work with a self-signed SSL certificate you need to set (only use this during development):

GITHUB_INSECURE_SSL = True

For production instances, you should set the following shared secret (note, do not use your SECRET_KEY for this):

GITHUB_SHARED_SECRET = '...'
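One way to produce a value for this setting is to generate a random string with Python's standard library. This is a sketch, not a Zenodo requirement: the length of 32 bytes is an arbitrary choice, and the secrets module assumes Python 3.6 or later.

```python
import secrets

# 32 random bytes rendered as 64 hexadecimal characters.
# The length is an illustrative choice, not mandated by Zenodo.
shared_secret = secrets.token_hex(32)

# Print a line that can be pasted into the instance configuration file.
print("GITHUB_SHARED_SECRET = {!r}".format(shared_secret))
```

Generate the value once and store it in the configuration. Regenerating it on every start would change the secret between restarts.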

Finally, set the URL template to which GitHub will send webhook events:

GITHUB_WEBHOOK_RECEIVER_URL = \
    'http://example.org/' \
    'api/hooks/receivers/github/events/?access_token={token}'

Note that {token} will be replaced with the user’s OAuth access token.
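The placeholder follows Python's str.format syntax. The sketch below shows the substitution on the example template. The helper webhook_url and the token value are hypothetical, and the exact call made inside the GitHub integration is an assumption.

```python
GITHUB_WEBHOOK_RECEIVER_URL = (
    'http://example.org/'
    'api/hooks/receivers/github/events/?access_token={token}'
)


def webhook_url(token):
    """Fill the {token} placeholder of the URL template (illustrative helper)."""
    return GITHUB_WEBHOOK_RECEIVER_URL.format(token=token)


# 'abc123' is a made-up token used only for demonstration.
print(webhook_url('abc123'))
```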

GitHub for localhost

For development machines running Zenodo on localhost and/or behind a firewall, you also need a public address on which GitHub can reach you. You can achieve this by using a service such as ngrok:

  1. Go to https://ngrok.com and sign up with your GitHub account.

  2. Install ngrok (e.g. brew cask install ngrok on OS X).

  3. Get the authtoken from the dashboard and start a tunnel to localhost:5000:

    $ ngrok authtoken <your-authtoken>
    $ ngrok http 5000
    
  4. Grab the public ngrok address (e.g. http://<id>.ngrok.io) and set the GITHUB_WEBHOOK_RECEIVER_URL configuration variable:

    GITHUB_WEBHOOK_RECEIVER_URL = \
        'http://<id>.ngrok.io/' \
        'api/hooks/receivers/github/events/?access_token={token}'
    

DataCite DOI minting

For DOI minting to work you must provide the DataCite prefix and credentials like this:

PIDSTORE_DATACITE_USERNAME = '...'
PIDSTORE_DATACITE_PASSWORD = '...'
PIDSTORE_DATACITE_DOI_PREFIX = '10.5072'

Google Site Verification

If you want to use Google Webmasters toolkit, you can add the site verification ids to the templates by setting the following configuration:

GOOGLE_SITE_VERIFICATION = ['<id1>', '<id2>', ...]

Elasticsearch

If you need to connect to an Elasticsearch cluster behind an HTTPS proxy that uses HTTP Basic authentication, configure it like this:

SEARCH_ELASTIC_KWARGS = dict(
    port=443,
    http_auth=('myuser', 'mypassword'),
    use_ssl=True,
    verify_certs=False,
)
SEARCH_ELASTIC_HOSTS = [
    dict(host='es1.example.org', **SEARCH_ELASTIC_KWARGS),
    dict(host='es2.example.org', **SEARCH_ELASTIC_KWARGS),
    dict(host='es3.example.org', **SEARCH_ELASTIC_KWARGS),
]

PostgreSQL, RabbitMQ, Redis

In case you want to use a remote database, broker and cache you can change the defaults using the following configuration variables:

SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://dbhost/db'
REDIS_URL = 'redis://redishost:6379'
BROKER_URL = 'amqp://rabbitmqhost:5672/myvhost'

ACCOUNTS_SESSION_REDIS_URL = '{0}/0'.format(REDIS_URL)
CACHE_REDIS_URL = '{0}/0'.format(REDIS_URL)
CELERY_RESULT_BACKEND = '{0}/1'.format(REDIS_URL)

Storage

You can configure the default storage location using the configuration variable:

FIXTURES_FILES_LOCATION = \
 'root://eospublic.cern.ch//eos/zenodo/prod/data/'

If you need XRootD support, ensure that XRootDPyFS has been installed, e.g. by installing Zenodo with the xrootd extras:

$ pip install -e .[postgresql,elasticsearch2,xrootd]

Sentry

If you would like error logging to Sentry, set the configuration variable:

SENTRY_DSN = 'https://user:pw@sentry.example.org/'

Theme

Piwik analytics can be configured with the configuration variable:

THEME_PIWIK_ID = 123

You can add a message to all pages to show that a certain instance is not a production instance, e.g.:

THEME_TAG = 'Sandbox'

Assets

For non-development installations, be sure to set the static file collection to copy files instead of symlinking them:

COLLECT_STORAGE = 'flask_collect.storage.file'

Metrics

Zenodo uses the Invenio-Metrics module to compute application KPIs at given intervals and send them to the CERN monitoring infrastructure:

METRICS_XSLS_API_URL = "http://xsls-dev.cern.ch"
METRICS_XSLS_SERVICE_ID = "myid"

StatsD

Zenodo uses StatsD to measure request performance.

STATSD_HOST = "localhost"
STATSD_PORT = 8125
STATSD_PREFIX = "zenodo"

Proxy configuration

For Zenodo to correctly determine a client’s IP address, you must set how many proxies are in front of the application (Zenodo production has, for example, two proxies in front: HAProxy and Nginx):

WSGI_PROXIES = 2
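The proxy count matters because each proxy typically appends the address it received the request from to the X-Forwarded-For header, so the real client sits a fixed number of entries from the right. The following sketch shows that selection rule. The function client_ip and the sample addresses are hypothetical, and this is an illustration of the general technique rather than Zenodo's own code.

```python
def client_ip(remote_addr, x_forwarded_for, num_proxies):
    """Pick the client address given a number of trusted proxies.

    remote_addr: address of the peer directly connected to the application.
    x_forwarded_for: raw X-Forwarded-For header value (may be empty).
    num_proxies: number of trusted proxies in front of the application.
    """
    if num_proxies <= 0 or not x_forwarded_for:
        return remote_addr
    hops = [h.strip() for h in x_forwarded_for.split(",") if h.strip()]
    if len(hops) < num_proxies:
        # Fewer entries than trusted proxies: do not trust the header.
        return remote_addr
    # Entries to the left of this position can be forged by the client.
    return hops[-num_proxies]


# Two proxies: the first proxy recorded the client (203.0.113.7), the
# second recorded the first proxy (10.0.0.1). The leading 1.2.3.4 is a
# value the client sent itself, and it is ignored.
print(client_ip("10.0.0.2", "1.2.3.4, 203.0.113.7, 10.0.0.1", 2))
```

Setting the count too high would let clients spoof their address through the header. Setting it too low would report a proxy's address instead of the client's.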

Vocabularies

Zenodo relies on external vocabularies/authorities for linking records to funders/grants and licenses. Since some of the vocabularies can be rather big, the actual import is done using the task queue. Hence, before executing any of the commands below, first start Celery (see Running).

Licenses

Licenses are imported from opendefinition.org:

(zenodo)$ zenodo opendefinition loadlicenses

Funders and grants

Funders are imported from FundRef. Currently the dataset contains more than 10,000 funders:

(zenodo)$ zenodo openaire loadfunders

Grants are imported from OpenAIRE. Currently the full dataset contains more than 600,000 grants spread over a handful of funders. You can harvest grants selectively from the funders you need:

(zenodo)$ zenodo openaire loadgrants --setspec=FP7Projects

The --setspec option should be one of the following:

  • Australian Research Council: ARCProjects
  • European Commission FP7: FP7Projects
  • European Commission Horizon 2020: H2020Projects
  • European Commission: ECProjects (contains both FP7Projects and H2020Projects)
  • Foundation for Science and Technology, Portugal: FCTProjects
  • National Health and Medical Research Council: NHMRCProjects
  • National Science Foundation: NSFProjects
  • Wellcome Trust: WTProjects
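When grants from several funders are needed, the command can be repeated once per set. The sketch below only builds and prints the command lines for a chosen subset. The helper loadgrants_commands and the selection in SETSPECS are illustrative assumptions, and the sketch does not run the commands.

```python
# Illustrative selection; pick the set names you need from the list above.
SETSPECS = ["FP7Projects", "H2020Projects"]


def loadgrants_commands(setspecs):
    """Build one 'zenodo openaire loadgrants' argument list per set name."""
    return [
        ["zenodo", "openaire", "loadgrants", "--setspec={}".format(spec)]
        for spec in setspecs
    ]


for command in loadgrants_commands(SETSPECS):
    # Each printed line can be run in the (zenodo) virtualenv shell.
    print(" ".join(command))
```

Each argument list could also be passed to subprocess.check_call to run the imports in sequence, provided the zenodo command is on the PATH and Celery is running.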