Objective: To create practical recommendations for the curation of routinely collected health data and artificial intelligence (AI) in primary care with a focus on ensuring their ethical use.
Methods: We defined data curation as the process of management of data throughout its lifecycle to ensure it can be used into the future. We used a literature review and Delphi exercises to capture insights from the Primary Care Informatics Working Group (PCIWG) of the International Medical Informatics Association (IMIA).
Results: We created six recommendations: (1) Ensure consent and formal process to govern access and sharing throughout the data life cycle; (2) Sustainable data creation/collection requires trust and permission; (3) Pay attention to Extract-Transform-Load (ETL) processes as they may have unrecognised risks; (4) Integrate data governance and data quality management to support clinical practice in integrated care systems; (5) Recognise the need for new processes to address the ethical issues arising from AI in primary care; (6) Apply an ethical framework mapped to the data life cycle, including an assessment of data quality to achieve effective data curation.
Conclusions: The ethical use of data needs to be integrated within the curation process, hence running throughout the data lifecycle. Current information systems may not fully detect the risks associated with ETL and AI; they need careful scrutiny. With distributed integrated care systems where data are often used remote from documentation, harmonised data quality assessment, management, and governance is important. These recommendations should help maintain trust and connectedness in contemporary information systems and planned developments.