CoDiet Research Opportunity

We are looking for volunteers to help us verify how well our algorithms recognise words that relate to biomedicine.

Volunteers will use software to check whether our annotations of scientific articles are accurate.

Depending on your level of contribution, you will be credited in the dataset (all annotators), acknowledged in a peer-reviewed paper (most annotators), or included as a co-author (see below for what determines authorship).

More details on why we are asking are below, to inform your decision on whether you can spend some time to help out.

What is CoDiet?

CoDiet (Combating diet-related non-communicable disease through enhanced surveillance) is an EU HORIZON/UKRI-funded project involving 17 partners across 10 countries.

This project, led at Imperial by Prof Gary Frost, strives to develop new methodologies that address current gaps in our knowledge and to establish a tool that will assess diet-induced non-communicable disease (NCD) risk.

Work package (WP) 1 concerns the development of AI-driven literature-searching tools to bring a clear understanding of the large global literature on the physiological and metabolic links between diet and NCDs.

What is CoDiet aiming to do?

We will develop new methods that can ‘read’ a scientific article and pinpoint which words relate to different biomedical entities such as genes, diets, study methodologies, lipids, foods, microbiota, and many more. For example (an illustrative sentence, not one from the dataset), in ‘a Mediterranean diet reduces LDL cholesterol’, ‘Mediterranean diet’ would be annotated as a diet and ‘LDL cholesterol’ as a lipid.

WP1 is using natural language processing (NLP) tools to automatically read scientific articles; however, not all algorithms are equally accurate.

Current NLP algorithms (ChatGPT included) have gaps in their capabilities, and to create better algorithms we need better data: human-verified data, to be exact.

We want to create a set of 1,000 Open Access (OA) articles in which all biomedical entities of relevance to CoDiet have been manually annotated and verified. This will allow us to test our new algorithms, and allow others to improve theirs.

How can you contribute and get involved, and what expertise is needed?

We have used existing algorithms to annotate all entities we could find in the 1,000 OA articles; this is considered a ‘silver standard’.

Just as at the Olympics, we are aiming for gold: annotations that have been verified by at least two human experts (hence each article needs to be read twice, 2,000 readings in total).

We are using software accessible in a web browser to help you annotate words of importance in a document.

This does not mean you need to read every single word of an article; instead, you can ‘scan diagonally’ to spot any missed entities and to check whether what we annotated is actually correct.

You do not need to be an expert in every (or, indeed, any) category, but general biochemistry knowledge will be useful for most of them.

Alternatively, we are also looking at computational methods (statistical, machine learning) and study methodologies, so those with numerical expertise can focus on that subtask.

What do you gain from helping?

Anyone helping with the annotation process will be credited in the final dataset of all annotations; that is, your email will be recorded against the records you verified. This is independent of how many annotated articles you contribute.

We will also submit this dataset for publication in a peer-reviewed journal in Q2 2024; all annotators who have annotated at least 25 articles will be credited in the paper's acknowledgements.

The top annotator at each institution (measured by number of articles) will be included as an author on the publication, provided they have annotated a minimum of 50 articles. Any annotator from any institution who has successfully annotated over 100 articles with little arbitration required will also be included as an author.

The annotation software also tracks the extent to which the two annotators agree. Ideally they do; when they do not, we will use an independent arbiter to judge which annotation/annotator is correct.

We will order the annotator authors by the number of articles and the number of agreed/verified annotations.

When do we need your help?

First, we will run a training session via Teams on the 3rd of November for anyone interested in annotating any number of documents. We will record the session; however, it will involve a live demo of the software.

This is the perfect time for you to experience what annotating will be like and to try it out. To set you up with an account we need to know your email address, so please send an email to Joram Posma before 4pm on the 2nd of November.

After this session, everybody who wants to join is asked to confirm roughly how many articles they want to help with, and the annotation software will be open from the second week of November to mid-January.

We will then assign people to articles; we can also take time preferences into account so that people work on the same batch of articles at the same time.

How long does it take?

This ultimately depends on you; however, we anticipate it should take no longer than 30 minutes per article.

Most of our volunteers so far have elected to do either 50 or 100 articles each, but you are welcome to do more if you have time.

We do have to pre-assign people to articles, and we will do this in batches of 25, so that once you are done with a batch we can assign you more articles.

For example, if you want to help out with 25 articles, that averages out to fewer than 3 articles per week over the 9 weeks.

Please get in touch if you require any additional information prior to the training session, and please confirm your participation via email so that we can create an account for you.

Many thanks on behalf of CoDiet and WP1 in particular,

Antoine Lain (Imperial), Tim Beck (Nottingham), Marek Rei (Imperial), Gary Frost (Imperial) and Joram Posma (Imperial)