Privacy policy

Introduction

The PADI-web team of researchers and developers is committed to protecting and respecting your privacy. These privacy statements (written in English) explain why data are processed, how we collect, process and protect all personal data provided, how this information is used, and which rights you can exercise in relation to your data.

These statements cover the monitoring of online news by PADI-web. PADI-web performs automatic monitoring of online media in order to provide media monitoring services to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security.

Privacy statements

Monitoring of news

 

Introduction

Creation

21/07/2022

Keywords

Text mining, media disease surveillance, outbreaks, animal health, public health, plant health, food security

Last update

26/11/2022

Language

English

Unit

UMR TETIS & UMR ASTRE

Target Population

Epidemic intelligence practitioners and researchers

Controller

Sylvain Villaudy <sylvain.villaudy@cirad.fr>

DPO Notes

-

DPO

DPO <dpo@cirad.fr>

Processing

Name of the processing

Monitoring of online media news

Description

PADI-web performs automatic monitoring of online media in order to provide media monitoring services to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security. For these purposes, online media sources are monitored (currently Google News) and the text of news articles is downloaded and processed. The processing includes identification of news articles, categorisation, identification of places, dates, hosts of diseases and clinical signs mentioned in the text, indexing, and presentation of the results through the PADI-web website interface. Names of persons in the content of texts are identified by a Named Entity Recognition tool (i.e. spaCy) integrated into PADI-web.

The PADI-web processing chain consists of several modules. The main modules are as follows:

  1. Scraper: The scraper module visits a pre-defined list of Google News feeds, searching for news content.
  2. Categorization: The categorization module relies on machine learning methods that categorize news according to relevance to an outbreak (relevant vs irrelevant) and categorize sentences based on sanitary topics (outbreak declaration, consequences, alert, preparedness, general epidemiological information and other information).
  3. Entity Recognition: The entity recognition module detects probable mentions of diseases, hosts, clinical signs, dates and outbreak-related keywords within the news text and stores them in a cache. A human moderation step is included, allowing entities to be amended or deleted.
  4. Entity Matcher: The entity matcher module identifies known entities in the news items using the entity database. These entities have previously been recognized by the entity recognition module mentioned above.
  5. Name removal: The name removal module obfuscates all detected names mentioned in texts by replacing them with ‘*’ characters. This module relies on the entity recognition module.
  6. Geolocation: The geolocation module matches known geolocations in the text (using the Geonames web API) and aims at identifying the most relevant geolocations in each news item.
  7. Deduplication: The deduplication module identifies news items that are mostly identical to previously retrieved news items. This module is not yet implemented in PADI-web.
  8. Filtering: The filter module allows filtering on the PADI-web interface by keywords in the text, title, language, country (country in which the article was published or country mentioned in the text), publication dates (from-to period), relevance categorization per news article, fine-grained categorization per sentence, sources, source type, reliability of the sources, or combinations of these categories.
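
The name removal step (module 5 above) can be illustrated with a minimal sketch. It is not the PADI-web implementation. It assumes that the Named Entity Recognition tool has already returned the character offsets of each detected person name (with spaCy, such offsets are available as the start_char and end_char attributes of each entity in doc.ents), and that each character of a name is replaced by one ‘*’. The function name obfuscate_names and the one-for-one replacement are assumptions made for illustration.

```python
from typing import List, Tuple


def obfuscate_names(text: str, name_spans: List[Tuple[int, int]]) -> str:
    """Replace every character inside each (start, end) span with '*'.

    Hypothetical helper: the spans are assumed to come from an NER tool
    as character offsets, with `end` exclusive.
    """
    chars = list(text)
    for start, end in name_spans:
        # Clamp to the text bounds so that a bad span cannot raise an error.
        for i in range(max(0, start), min(len(chars), end)):
            chars[i] = "*"
    return "".join(chars)


if __name__ == "__main__":
    sentence = "Dr Jane Doe confirmed the outbreak."
    # Offsets 3..11 cover "Jane Doe" in the sentence above.
    print(obfuscate_names(sentence, [(3, 11)]))
```

Replacing each character individually keeps the text length unchanged, so offsets computed by other modules (for example entity or geolocation matches) would still point at the same positions after obfuscation. Whether PADI-web preserves length in this way is not stated in this document.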

Several platforms use the results generated by the PADI-web processing chain described above, including:

  1. ISID, ProMED-mail module – Manually curated system for analysing both traditional media and official sources for early warning and alerting of disease outbreaks across the world.
  2. MOOD platform – A map view of PADI-web data highlighting the latest places of outbreak-related information and media trend analysis for the European region.

Automated / Manual operations

The processing operations are performed automatically, except for:

  • Google News RSS feeds, manually added to the system
  • List of entities (diseases, clinical signs, hosts), manually added to the system
  • Selection of articles (disease, geographical region, type of content) distributed through email alerts
  • Names of persons in the content of texts are identified by a Named Entity Recognition tool (i.e. spaCy) integrated into PADI-web
  • Filtering module for the PADI-web interface

Storage

The data gathered and generated by the system are stored electronically on CIRAD servers, in files and in a database that allows the news texts to be searched.

Purpose & legal basis

Purposes

The purpose of the processing is to:

  • provide multilingual media monitoring services to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security, with near real-time information on emerging topics of current interest;
  • remove names of real people from the texts stored in the database;
  • provide email notifications, related to topics of interest, to recipients who have requested them;
  • allow searching for historical news articles and support the analysis of spatial-temporal trends in reporting over many years; and
  • perform research in text mining, natural language processing, and computational linguistics.

Data subjects and Data Fields

Data subjects and Data Fields

No personal information contained in online news reporting from the monitored sources is collected. This covers persons mentioned in the media, such as politicians, journalists, persons of public interest and persons considered newsworthy. No specific processing is applied to this information if present.

Rights of Data Subject

Procedure to grant rights

Names of persons in the content of texts are identified by a Named Entity Recognition tool (i.e. spaCy) integrated into PADI-web. However, any person who would like to check or access personal information that was not detected by the Named Entity Recognition tool may do so by contacting the controller. Upon request, the controller will manually anonymize personal information in the relevant news items collected by PADI-web.

Retention

News article text – Data retention time is unlimited for research purposes to study long-term spatial-temporal trends of topics of relevance to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security.

Time limit

The controller will reply to all queries from data subjects within 10 working days.

Recipients

Recipients

Data are available only to registered (authorised) users, who are authenticated by login and password on the PADI-web website.

Transfer out of EU/EEA

Not applicable.

Security measures

Technical and organizational measures

Processing and data storage are performed on servers located in a secure data centre in CIRAD to which physical access requires specific authorisation. Access to the data is available only to authorised users from CIRAD or the PADI-web team, checked through login and password.

Complementary information

PADI-web, as described above, is currently used routinely by the French Epidemic Intelligence team in animal health. The CIRAD research team continuously develops, upgrades and updates PADI-web in order to meet new challenges, such as dealing with fake news, extracting fine-grained outbreak information from the news, and spatial-temporal modelling of outbreak-related events and news for improved epidemic intelligence.

Main users of PADI-web are (as of June 2022):

Data on users of media monitoring tools

Introduction

Creation

21/07/2022

Keywords

Text mining, media disease surveillance, outbreaks, animal health, public health, plant health, food security

Last update

26/11/2022

Language

English

Unit

UMR TETIS & UMR ASTRE

Target Population

Epidemic intelligence practitioners and researchers

Controller

Sylvain Villaudy <sylvain.villaudy@cirad.fr>

DPO Notes

-

DPO

DPO <dpo@cirad.fr>

Processing

Name of the processing

Data on users of PADI-web media monitoring tool

Description

PADI-web performs automatic monitoring of online media in order to provide media monitoring services to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security. For these purposes, online media sources are monitored (Google News) and the text of news articles is downloaded and processed.

In general, there are two groups of services: those which require registration and those which do not.

Automated / Manual operations

All operations are automatic, except for system configuration and the selection of articles for moderated newsletters, which are performed manually.

Storage

The data gathered and generated by the system are stored electronically on CIRAD servers, in files and in a database that allows the news texts to be searched.

Purpose & legal basis

Purposes

The purpose of the processing is to:

  • provide multilingual media monitoring services to epidemic intelligence staff and researchers across national and international institutions in the domain of animal health, public health, plant health, and food security, with near real-time information on emerging topics of current interest;
  • provide email notifications, related to topics of interest, to recipients who have requested them;
  • allow searching for historical news articles and support the analysis of spatial-temporal trends in reporting over many years; and
  • perform research in text mining, natural language processing and computational linguistics.

Legal basis and Lawfulness TBC

Data subjects and Data Fields

Data subjects

Registered users of PADI-web. The system provides an address book function which may be used by an authorised developer from CIRAD or the PADI-web team to maintain a list of recipients of newsletters and notifications, which may be sent by email. These recipients are also data subjects.

Data fields / Category

For all users, the IP address from which the system is accessed and the pages accessed are logged for statistical and debugging purposes.

For registered users the following fields are processed:

  • e-mail address
  • first name
  • surname
  • profile picture (optional)
  • subscription to an e-mail alert and sources of interest (optional)
  • date of the last access to their account

Rights of Data Subject

Procedure to grant rights

Data subjects can request access to any information held by the controller using the contact details supplied in this notification and in the privacy statement provided on the PADI-web website. Email messages provide a link through which the user is able to unsubscribe directly.

Retention

Logfiles are retained for one year. Data on registered users is retained for one year after the last access, after which the user is requested to confirm their wish to keep the account; if this confirmation is not received the account and all associated personal data are deleted. The period of one year was chosen to balance the need to remove data quickly with providing a good service to occasional users who expect their accounts to be maintained.
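
The account retention rule above can be read as a simple decision procedure, sketched below for illustration only. It is not the PADI-web implementation. "One year" is approximated as 365 days, and the 30-day period allowed for the user to confirm is hypothetical, since this statement does not specify how long the controller waits for the confirmation. The names account_action, RETENTION and GRACE_PERIOD are also assumptions.

```python
from datetime import date, timedelta
from typing import Optional

RETENTION = timedelta(days=365)    # "one year", approximated as 365 days (assumption)
GRACE_PERIOD = timedelta(days=30)  # hypothetical: no confirmation deadline is stated


def account_action(last_access: date, today: date,
                   confirmation_requested_on: Optional[date] = None) -> str:
    """Return what should happen to a registered user's account today."""
    if today - last_access <= RETENTION:
        # Accessed within the last year: nothing to do.
        return "keep"
    if confirmation_requested_on is None:
        # Retention period elapsed: ask the user whether to keep the account.
        return "request_confirmation"
    if today - confirmation_requested_on > GRACE_PERIOD:
        # No confirmation received in time: delete account and personal data.
        return "delete"
    return "await_confirmation"


if __name__ == "__main__":
    print(account_action(date(2021, 1, 1), date(2022, 6, 1)))
```

In this sketch a user who confirms is assumed to be treated as having accessed the account again, so that last_access is reset and the one-year period restarts.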

Time limit

The controller will reply to all queries from data subjects within 10 working days.

Historical purposes

Not applicable.

Recipients

Recipients

Data is available to the CIRAD PADI-web team providing, supporting and securing the services requested by the data subjects.

Transfer out of EU/EEA

Not applicable.

Security measures

Technical and organizational measures

Processing and data storage are performed on servers located in a secure data centre in CIRAD to which physical access requires specific authorisation. Access to the data is available only to authorised users from CIRAD or the PADI-web team, checked through login and password.

Complementary information

-