Research Data Management

Effective research data management safeguards the integrity, reproducibility, and accessibility of research. A structured approach to organizing, documenting, and securely storing data makes it possible for others to reuse and validate research data. This document outlines the key steps in research data management, from planning to sharing and preservation, in line with Latvia’s Open Science Strategy and EKA standards.

Planning and the Data Management Plan (DMP)

The first step is to develop a comprehensive Data Management Plan (DMP): a document that describes how research data will be collected, organized, stored, shared, and preserved, so that the data remain accessible and comply with scientific standards.

Main DMP sections:

Data Collection: What data will be collected, methods used, and formats.

Example: Interview recordings in audio format, lab measurement data in Excel.

Metadata Standards: How the data will be described and structured for easy discovery and reuse.

Example: Dublin Core metadata standard for social science research.

Storage and Backups: Where and how data will be stored securely.

Example: Main data on institutional servers, backup in cloud storage.

Data Sharing and Access: Rules and restrictions on data access and sharing.

Example: Anonymized data publicly available on Zenodo; full data on request.

Ethical and Legal Aspects: Compliance with ethical and legal norms (e.g. informed consent, GDPR compliance).

Example: All participants sign informed consent; data anonymized before publication.
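To make the Metadata Standards example more concrete, here is a minimal sketch that serializes a simple Dublin Core record as XML using only Python's standard library. It assumes the 15-element Dublin Core Metadata Element Set wrapped in the `oai_dc` container used by OAI-PMH; the function name `dublin_core_record` and the sample values are hypothetical, and a given repository may expect a different schema or additional fields.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for simple Dublin Core and the OAI-PMH oai_dc wrapper.
DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize {element: [values, ...]} as an oai_dc XML string."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        for value in values:  # Dublin Core elements are repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical record for an interview dataset.
    print(dublin_core_record({
        "title": ["Interview recordings on remote work (example)"],
        "creator": ["Example, Researcher"],
        "date": ["2024-05-01"],
        "type": ["Dataset"],
        "format": ["audio/wav", "text/csv"],
        "rights": ["CC BY 4.0"],
        "language": ["lv"],
    }))
```

Rejecting unknown element names early catches typos such as `author` instead of `creator` before the record reaches a repository.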

Latvia’s Open Science Strategy advocates for standardized practices throughout the full data lifecycle, emphasizing open access and reusability of research data to foster transparency, innovation, and international collaboration. The DMP plays a crucial role in this by systematically describing all stages of data handling in alignment with the FAIR principles.

DMPs can be self-developed or created using online templates provided by institutions or funders. Templates often include predefined questions and categories based on FAIR principles to ensure compliance with ethical, legal, and institutional requirements.

Example: The ARGOS online tool for creating DMPs, recommended for projects funded by the Latvian Council of Science.

DMPs are not static: they may evolve during the research process due to new findings, changes in methodology, or funder requirements. It’s important to review and update the plan regularly.

Example: The researcher initially plans to use XLSX, later decides that CSV is more suitable, and updates the DMP accordingly.

FAIR Data Principles

The FAIR principles ensure data are:

Findable
Accessible
Interoperable
Reusable

These principles guide the creation and management of research data to maximize usability and sustainability. FAIR data is one of the three main pillars of Latvia’s Open Science Strategy.

Findable – Data must be easy to discover: described with rich metadata and registered or indexed in searchable resources such as research data repositories. Digital objects should have a globally unique and persistent identifier (e.g., a DOI) to ensure long-term accessibility and traceability.
Example: A dataset uploaded to the public data repository Zenodo, with its own DOI.

Accessible – Data must be accessible under specific conditions and with clearly defined access permissions. FAIR data does not mean that all data must be openly available. However, metadata should remain accessible even when the data itself is no longer available.
Example: The data is stored in an open-access repository where anyone can freely access and download it without registration.

Interoperable – Data must be structured so they can be used across different systems, relying on standardized, non-proprietary file formats and standardized terminology to support data integration and comparability. References to related datasets and publications should also be stated explicitly, so that links between datasets and scientific results can be traced.
Example: Medical researchers publish patient health data in the FHIR (Fast Healthcare Interoperability Resources) format, allowing use across various healthcare systems.

Reusable – To enable reuse, data must be understandable to others. This is ensured through rich metadata, comprehensive documentation, and clear licensing terms that define conditions for reuse.
Example: A dataset is prepared and published under a CC0 license, meaning other researchers can freely analyze and use the data in their studies without restrictions.
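Persistent identifiers are what make the Findable principle work in practice. The following is a minimal sketch, assuming a simplified DOI syntax check similar to a pattern Crossref has suggested for modern DOIs (some older DOIs will not match it). It strips common prefixes and builds a resolvable `https://doi.org/` link; the helper names and the example DOI `10.1234/example-dataset` are hypothetical.

```python
import re

# Simplified DOI syntax: "10." + registrant code of 4-9 digits + "/" + suffix.
# Covers most modern DOIs, not every DOI ever registered.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Za-z0-9]+$")

# Prefixes often pasted together with the DOI itself (compared case-insensitively).
DOI_PREFIXES = (
    "https://doi.org/", "http://doi.org/",
    "https://dx.doi.org/", "http://dx.doi.org/", "doi:",
)


def normalize_doi(raw: str) -> str:
    """Return the bare DOI (e.g. '10.1234/abc') or raise ValueError."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {raw!r}")
    return doi


def doi_url(raw: str) -> str:
    """Build a resolvable link through the doi.org resolver."""
    return "https://doi.org/" + normalize_doi(raw)


if __name__ == "__main__":
    # Hypothetical DOI, for illustration only.
    print(doi_url("doi:10.1234/example-dataset"))
```

Storing the bare DOI and generating the link on demand keeps metadata records consistent even when people paste identifiers in different forms.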

The Data Management Plan (DMP) is closely linked to the FAIR principles (Findable, Accessible, Interoperable, Reusable) as it provides a structured approach to organizing, storing, and sharing data.

Findable – The DMP includes the addition of metadata and the selection of appropriate repositories to ensure data can be easily discovered.
Accessible – The plan defines how and where the data will be accessible, including access rights and long-term storage.
Interoperable – The DMP supports the selection of standardized formats and metadata to ensure the data can be used across various systems and scientific disciplines.
Reusable – The plan outlines data quality control, licensing, and documentation to ensure the data is understandable and usable by other researchers.

Thus, the DMP helps ensure responsible data management and compliance with the FAIR principles, promoting scientific transparency and sustainability.

Data Collection

Accurate and consistent data collection is essential for obtaining reliable research results. Researchers should adhere to standardized protocols and methodologies within their field to ensure data quality and integrity.

Key steps in the data collection process:

Preparation: Before starting data collection, ensure that all necessary tools and materials are available and functioning correctly.
Example: In a laboratory, measuring instruments should be ready; in a survey study, printed or digital questionnaires should be prepared.
Standardization: Use consistent data collection methods and formats so that the research team works according to the same principles.
Example: Use the same set of questions for all interviews, and identical measurement protocols for experiments.
Documentation: Keep a detailed record of the data collection process, including any deviations from the standard protocol and any problems encountered.
Example: If a sensor briefly stops working during measurements, this should be noted in the records.
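Keeping this documentation in a structured form makes deviations easier to find and report later. A minimal sketch, assuming a plain CSV log with hypothetical columns `timestamp`, `session_id`, `event`, and `description`; it uses only the standard library and writes the header only when the log is still empty.

```python
import csv
import io
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

FIELDS = ["timestamp", "session_id", "event", "description"]


@dataclass
class LogEntry:
    timestamp: str    # ISO 8601 time in UTC
    session_id: str   # hypothetical ID of a data collection session
    event: str        # e.g. "deviation", "equipment_fault", "note"
    description: str  # what happened and what was done about it


def log_event(stream, session_id, event, description, when=None):
    """Append one entry to a CSV log; write the header if the log is empty."""
    when = when or datetime.now(timezone.utc)
    entry = LogEntry(when.isoformat(timespec="seconds"), session_id, event, description)
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    if stream.tell() == 0:
        writer.writeheader()
    writer.writerow(asdict(entry))
    return entry


if __name__ == "__main__":
    log = io.StringIO()  # in practice: open("collection_log.csv", "a", newline="")
    log_event(log, "S01", "equipment_fault",
              "Temperature sensor stopped responding for 30 s; readings missing")
    print(log.getvalue())
```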

Data Processing and Cleaning

Once data has been collected, it must be processed and checked to ensure it is accurate and complete.

This process includes:

Data Entry – Enter data into an electronic system carefully and accurately. If necessary, use double data entry to reduce the likelihood of errors.
Example: Two researchers independently enter the same survey responses into separate Excel files, and the two versions are compared to detect typing mistakes.
Data Cleaning – Identify and correct errors, inconsistencies, or missing values.
Example: If a survey lists "250 years" in the age field, this is an incorrect entry and should be corrected or removed.
Data Transformation – Adapt data for analysis, such as standardizing units or coding qualitative data into numerical format.
Example: If height data is recorded in both centimeters and inches, convert all measurements to centimeters to ensure consistency.
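The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a complete cleaning workflow: records are dictionaries with hypothetical fields `age`, `height`, and `height_unit`, the plausible age range (0 to 120) is an assumed study-specific rule, and implausible values are set to `None` for follow-up instead of being guessed.

```python
# Assumed record layout (hypothetical): {"age": int, "height": float, "height_unit": "cm" | "in"}
CM_PER_INCH = 2.54
AGE_RANGE = (0, 120)  # assumed plausibility rule; set per study


def double_entry_mismatches(entry_a: dict, entry_b: dict) -> list[tuple]:
    """Compare two independent entries of the same records, keyed by record ID.

    Returns (record_id, field, value_a, value_b) for every disagreement,
    including records or fields present in only one of the two versions.
    """
    mismatches = []
    for rid in sorted(entry_a.keys() | entry_b.keys()):
        rec_a, rec_b = entry_a.get(rid, {}), entry_b.get(rid, {})
        for field in sorted(rec_a.keys() | rec_b.keys()):
            if rec_a.get(field) != rec_b.get(field):
                mismatches.append((rid, field, rec_a.get(field), rec_b.get(field)))
    return mismatches


def clean_record(rec: dict) -> dict:
    """Return a cleaned copy: implausible age -> None, height in centimeters."""
    out = dict(rec)
    age = out.get("age")
    if age is not None and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        out["age"] = None  # flag for follow-up instead of guessing a value
    if out.get("height_unit") == "in":
        out["height"] = round(out["height"] * CM_PER_INCH, 1)
        out["height_unit"] = "cm"
    return out


if __name__ == "__main__":
    first = {"r1": {"age": 34, "height": 170}, "r2": {"age": 52, "height": 165}}
    second = {"r1": {"age": 34, "height": 170}, "r2": {"age": 25, "height": 165}}
    print(double_entry_mismatches(first, second))  # one disagreement: r2, age
    print(clean_record({"age": 250, "height": 70, "height_unit": "in"}))
```

Returning a cleaned copy rather than editing records in place keeps the raw data untouched, so every correction can be traced back to the original entry.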

Data Analysis

Data analysis involves applying statistical and computational methods to understand the data and draw well-founded conclusions.

This process includes:

Choosing the Method of Analysis – Select the analytical method that best aligns with the research objectives and the type of data.
Example: If you’re comparing the academic performance of two student groups, you might use a t-test to compare the group means. If you’re analyzing a large text dataset, natural language processing methods may be more suitable.
Performing the Analysis – Carry out the analysis using appropriate software tools and statistical packages, ensuring transparency and reproducibility.
Example: Use SPSS, R, or Python for analyzing survey data, such as conducting correlation analysis or building a regression model.
Reviewing and Validating Results – To ensure reliability, validate the results using different methods.
Example: If you find that a certain variable significantly influences the outcome, you can verify this conclusion using an independent dataset or by asking other researchers to perform a similar analysis.
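For the group-comparison example, Welch's t-test (a variant that does not assume equal variances in the two groups) can be computed from first principles to show what the statistical software does. A minimal sketch using Python's `statistics` module; the function name `welch_t` and the sample scores are hypothetical. It returns only the t statistic and the Welch-Satterthwaite degrees of freedom; for a p-value, a statistical package is needed, for example `scipy.stats.ttest_ind(a, b, equal_var=False)` in SciPy.

```python
import math
from statistics import mean, variance


def welch_t(sample_a: list[float], sample_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Compares the means of two independent samples without assuming equal
    variances. Each sample needs at least two observations.
    """
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each mean (sample variance / n).
    se2_a = variance(sample_a) / na
    se2_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical exam scores for two student groups.
    group_a = [72, 85, 78, 90, 66, 81]
    group_b = [68, 74, 70, 79, 65, 72]
    t, df = welch_t(group_a, group_b)
    print(f"t = {t:.3f}, df = {df:.1f}")
```

Recomputing a statistic by hand like this is also a simple way to validate the output of a package, in the spirit of the validation step above.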

Data Storage and Backup

Secure and reliable data storage is essential to prevent data loss and ensure long-term accessibility.

Key considerations for data storage:

Primary Storage – Choose reliable storage solutions such as institutional servers, cloud storage, or specialized research data repositories.
Example: Storing research data on a university’s secure server or using a trusted cloud service like Google Drive or Dropbox for easy access and management.
Backup – Set up regular backups to create data duplicates in multiple locations, minimizing the risk of data loss in case of system failure.
Example: Regularly backing up research data on an external hard drive or a secondary cloud storage platform.
Security – Use security measures such as encryption and access restrictions to protect sensitive data from unauthorized access.
Example: Encrypting data files and using password protection for shared files or access control systems on storage platforms. When possible, multi-factor authentication is recommended (e.g., one-time codes via apps like Google Authenticator).
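Backups are only useful if they are intact. One common way to check this is to compare cryptographic checksums of the primary copy and the backup. The sketch below assumes both copies are directory trees with the same layout; the function names are hypothetical and only the standard library (`hashlib`, `pathlib`) is used.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 checksum of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(primary: Path, backup: Path) -> dict[str, list[str]]:
    """Check every file under `primary` against the same relative path under `backup`."""
    report: dict[str, list[str]] = {"ok": [], "missing": [], "mismatched": []}
    for src in sorted(p for p in primary.rglob("*") if p.is_file()):
        rel = src.relative_to(primary)
        dst = backup / rel
        if not dst.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src) != sha256_of(dst):
            report["mismatched"].append(rel.as_posix())
        else:
            report["ok"].append(rel.as_posix())
    return report
```

For example, `verify_backup(Path("data"), Path("/media/backup/data"))` (both paths hypothetical) lists files that are missing from, or differ in, an external-drive backup. Running such a check after each scheduled backup turns "regular backups" into backups that are known to be restorable.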

Data Publication and Accessibility

The availability and publication of research data promote scientific progress by ensuring that research results are verifiable, reliable, and reproducible. Published data allow other researchers to validate and reuse them, saving time and resources, and fostering new discoveries. Accessible data support international collaboration and help address global challenges. They increase public trust in science by providing transparency and making data accessible to a broader audience, including the general public, businesses, and policymakers. Moreover, many funders and institutions require data publication to comply with the FAIR principles (Findable, Accessible, Interoperable, Reusable), making scientific research more efficient and sustainable.

This phase includes:

Repository Selection: Choosing appropriate data repositories for storing and sharing data, considering factors such as disciplinary standards and repository policies.
Example: A researcher may use the international repository Zenodo, which allows scientists to share data globally, or a national or institutional repository if available.
Data Documentation: Providing clear and detailed metadata that explains the data, its context, and the methodologies used.
Example: In a climate change study, include metadata on geographic location, time period, and the methodology used to collect temperature data.
Licensing: Applying appropriate licenses to define usage terms and allow other researchers to reuse the data.
Example: Using a Creative Commons license such as CC BY allows others to freely use and share the data, provided they credit the original source.

Researchers must ensure that data sharing is secure, responsible, and aligned with data subject rights and research ethics.

Data Availability by Level of Openness:

Open Access: Data is freely available to everyone without restrictions.
Example: Research data published in open data repositories (e.g., Zenodo, Dryad, OSF).
Restricted/Closed Access: Access is granted only upon request and approval; metadata is available. For restricted datasets, users often need to submit applications, sign agreements, or undergo ethical reviews to ensure responsible use and compliance with privacy laws.
Example: Access requires registration and submission of additional information.
Embargo Period: Data becomes open after a specified period; metadata is available in the meantime.
Example: A published article used specific research data. The repository includes a description and metadata of the dataset, along with a date indicating when the data will become available to others.
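The three levels of openness can be expressed as a simple rule for when data files may be downloaded, while the metadata stays visible in every case. The sketch below borrows the access level names `open`, `embargoed`, `restricted`, and `closed` from Zenodo's deposit metadata field `access_right`, but the decision logic is an illustrative assumption, not how Zenodo or any other repository actually enforces access. Here `restricted` stands for data shared after an approved request and `closed` for data whose files are not shared at all.

```python
from datetime import date

# Access level names as used in Zenodo's deposit metadata field `access_right`.
# The decision rules below are an illustrative assumption, not Zenodo's logic.
ACCESS_LEVELS = {"open", "embargoed", "restricted", "closed"}


def files_accessible(record: dict, today: date, request_approved: bool = False) -> bool:
    """Decide whether a user may download the data files of `record` today.

    Metadata is assumed to stay publicly visible at every level; only the
    files themselves are gated.
    """
    level = record["access_right"]
    if level not in ACCESS_LEVELS:
        raise ValueError(f"unknown access level: {level!r}")
    if level == "open":
        return True
    if level == "embargoed":
        return today >= record["embargo_date"]  # opens automatically on this date
    if level == "restricted":
        return request_approved  # e.g. after an application or signed agreement
    return False  # closed: files are not shared, only the metadata


if __name__ == "__main__":
    # Hypothetical dataset under embargo until 1 January 2026.
    record = {"access_right": "embargoed", "embargo_date": date(2026, 1, 1)}
    print(files_accessible(record, date(2025, 6, 1)))  # False
    print(files_accessible(record, date(2026, 1, 1)))  # True
```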

According to the Latvian Open Science Strategy Guidelines, research data should be open by default, and if they are not published, there must be a justified reason. Non-disclosure of data may be justified in cases where the data contains sensitive information, there are legal or ethical restrictions, or the data volume is so large that its distribution involves significant costs.

The availability of research data often depends on the funding body, as funders set data management requirements, open access policies, and publication conditions. Some funders require that data be made freely available after the end of a project, while others may impose restrictions due to intellectual property protection or confidentiality concerns. The funding source can also influence how long the data must be stored and on which platforms it should be accessible.

Example: Projects funded by the European Commission under Horizon Europe require that research data be published in open access data repositories in accordance with the FAIR principles. Researchers must ensure that data is openly accessible after the project ends, unless there are legal or ethical restrictions. In contrast, if research is funded by a private company, it may require that data remains confidential or accessible only to a limited group of users, such as project partners.

Long-Term Data Preservation

Long-term data preservation, or archiving, ensures that research data remains accessible and usable for future generations.

Key steps in data preservation:

Selection of Archival Repositories: Store data in reliable, long-term archival repositories that offer ongoing maintenance and preservation services.
Example: Storing data on platforms like Zenodo or national repositories that ensure long-term accessibility and regular updates.
Format Selection: Choose formats that are openly available and usable without the need for specific software or paid licenses, ensuring broader access and long-term data preservation.
Example: Using formats such as CSV for data tables.
Regular Review: Periodically review and update data and its documentation to keep up with technological changes or evolving research practices.
Example: Updating metadata or data formats to maintain compatibility with new software tools or emerging standards in the field.

Information sources

European Union Open Science Strategy https://research-and-innovation.ec.europa.eu/strategy/strategy-research-and-innovation/our-digital-future/open-science_en
Latvia’s Open Science Strategy 2021–2027 https://www.izm.gov.lv/lv/media/17072/download
Practical Guide for International Coordination of Research Data Management https://www.scienceeurope.org/media/4brkxxe5/se_rdm_practical_guide_extended_final.pdf
FAIR Data Principles https://www.go-fair.org/fair-principles/
Research Data Lifecycle https://rdmkit.elixir-europe.org/data_life_cycle
Research Data Management https://ukdataservice.ac.uk/learning-hub/research-data-management/
Data Licensing https://data.europa.eu/elearning/en/module4/#/id/co-01
Creative Commons Licenses https://creativecommons.org/share-your-work/cclicenses/
How to Create a Data Management Plan https://www.openaire.eu/how-to-create-a-data-management-plan
Research Data Management Plan Tool ARGOS https://argos.openaire.eu/home
Workshop on Creating Data Management Plans for FLPP and VPP Projects https://www.lzp.gov.lv/lv/jaunums/seminars-par-datu-parvaldibas-planu-izveidi-flpp-un-vpp-projektiem
