
INFS7410 Project - Part 1 - v3

Note: these instructions have been modified on 28/08/2019

Preamble

The due date for this assignment is 19 September 2019 17:00, Eastern Australia Standard Time (extended from the original due date of 29 August 2019 17:00), together with part 2.

This project is worth 5% of the overall mark for INFS7410. A detailed marking sheet for this assignment is provided at the end of this document.

We recommend that you make an early start on this assignment and proceed in steps. There are a number of activities you may already tackle, including setting up the pipeline, manipulating the queries, implementing some retrieval functions, and performing evaluation and analysis. There are some activities you do not yet know how to perform, in particular the implementation of the rank fusion algorithms: this will be the topic of the week 5 lecture and tutorials.

Aim

Project aim: The aim of this project is to implement a number of information retrieval methods, evaluate them and compare them in the context of a real use case.

Project Part 1 aim

The aim of part 1 is to:

• set up the evaluation infrastructure, including collection and index, topics and qrels
• implement common information retrieval baselines
• implement ranking fusion methods
• evaluate, compare and analyse the baseline and ranking fusion methods

The Information Retrieval Task: Ranking of Studies for Systematic Reviews

In this project we will consider the problem of ranking research studies identified as part of a systematic review. Systematic reviews are a widely used method to provide an overview of the current scientific consensus, by bringing together multiple studies in a reliable, transparent way.

We will use the CLEF 2017 and 2018 eHealth TAR (task 2) collections. In CLEF TAR 2017, the task we consider is referred to as subtask 1 (and is the only task); in CLEF TAR 2018, the task we consider is referred to as subtask 2. We provide the CLEF 2017 and 2018 TAR task overview papers in the assignment folder in Blackboard for your reference. These contain details about the topics, the collection, the task, etc. These details are not necessary to complete the assignment, but you may nevertheless want to know more about this task, its importance, approaches that have been tried, and so on.

The task is as follows: given, as the starting point, the results of the Boolean search created by the researchers undertaking a systematic review, rank the set of provided documents (identified by their PMID, i.e. PubMed ID, in the files provided; for each PMID there is an associated title and abstract). The goal is to produce an ordering of the documents such that all the relevant documents are retrieved above the irrelevant ones. This is to be achieved through automatic methods that rank all abstracts, with the goal of retrieving relevant documents as early in the ranking as possible.

There are two datasets to consider in this project: the CLEF 2017 TAR dataset and the CLEF 2018 TAR dataset. Each dataset consists of material for training and material for testing the developed information retrieval methods.

What we provide you with

We provide:

• for each dataset, a list of topics to be used for training. Each topic is organised into a file. Each topic contains a title and a Boolean query.
• for each dataset, a list of topics to be used for testing. Each topic is organised into a file. Each topic contains a title and a Boolean query.
• each topic file (both those for training and those for testing) includes a list of retrieved documents in the form of their PMIDs: these are the documents that you have to rank. Take note: you do not need to perform the retrieval from scratch (i.e. execute the query against the whole index); instead you need to rank (order) the provided documents.
• for each dataset, and for each train and test partition, a qrels file containing relevance assessments for the documents to be ranked. This is to be used for evaluation.
• for each dataset, and for the test partitions, a set of runs from retrieval systems that participated in CLEF 2017/2018, to be considered for fusion.
• a Terrier index of the entire PubMed collection. This index has been produced using the Terrier stopword list and the Porter stemmer.
• a Java Maven project that contains the Terrier dependencies and skeleton code to give you a start. NOTE: Tip #1 provides you with restructured skeleton code to make the processing of queries more efficient.
• a template for your project report.

What you need to produce

You need to produce:

• correct implementations of the methods required by this project specification
• correct evaluation, analysis and comparison of the evaluated methods, written up into a report following the provided template
• a project report that, following the provided template, details: an explanation of the retrieval methods used, an explanation of the evaluation settings followed, the evaluation results (as described above) inclusive of analysis, and a discussion of the findings.

Required methods to implement

In part 1 of the project you are required to implement the following retrieval methods:

1. TF-IDF: you can create your own implementation using the Terrier API to extract index statistics, or use the implementation available through the Terrier API.
2. BM25: you can create your own implementation using the Terrier API to extract index statistics, or use the implementation available through the Terrier API (a sketch of one common formulation of both models is given after this list).
3. The ranking fusion method Borda: you need to create your own implementation of this.
4. The ranking fusion method CombSUM: you need to create your own implementation of this.
5. The ranking fusion method CombMNZ: you need to create your own implementation of this.
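If you implement the two weighting models yourself, the scoring functions only need basic statistics (term frequency, document frequency, document length, average document length and collection size), which can be extracted from the provided Terrier index. The following is a minimal sketch of one common formulation of each model, assuming those statistics are passed in as plain numbers; the variable names are placeholders, and Terrier's built-in TF-IDF and BM25 may use slightly different variants of these formulas.

    // Illustrative scoring sketch. The statistics passed in (tf, df, docLength,
    // avgDocLength, numDocs) are assumed to be read from the provided Terrier
    // index; a document's score for a query is the sum of the per-term scores.
    public final class WeightingModels {

        // A common TF-IDF variant: log-scaled term frequency times inverse document frequency.
        public static double tfIdf(double tf, double df, double numDocs) {
            if (tf == 0 || df == 0) return 0.0;
            return (1.0 + Math.log(tf)) * Math.log(numDocs / df);
        }

        // Okapi BM25 contribution of a single query term occurring tf times in a document.
        // k1 and b are the parameters to be tuned on the training topics.
        public static double bm25(double tf, double df, double docLength,
                                  double avgDocLength, double numDocs,
                                  double k1, double b) {
            double idf = Math.log((numDocs - df + 0.5) / (df + 0.5));
            double lengthNorm = tf + k1 * (1.0 - b + b * (docLength / avgDocLength));
            return idf * (k1 + 1.0) * tf / lengthNorm;
        }
    }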

We strongly recommend you use the provided Maven project to implement these methods. You should have already attempted many of the implementations above as part of the tutorial exercises.

In the report, detail how the methods were implemented, i.e. (i) which formula you implemented, and (ii) whether you wrote your own implementation or leveraged Terrier's (for TF-IDF and BM25).

For the ranking fusion methods, consider fusing the runs from previous CLEF 2017/2018 participants that we provide, as well as the TF-IDF and BM25 runs you will produce.
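To illustrate the three fusion methods, the sketch below fuses several runs for a single topic: Borda assigns points based on rank positions, CombSUM sums (normalised) retrieval scores, and CombMNZ multiplies the CombSUM score by the number of runs that retrieved the document. The min-max normalisation and the data structures used here are choices made for the sketch, not requirements; whatever choices you make should be stated in the report.

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Illustrative rank-fusion sketch for a single topic. For Borda, each input
    // run is an ordered list of document IDs (rank 1 first); for CombSUM/CombMNZ,
    // each run is a map from document ID to its retrieval score.
    public final class RankFusion {

        // Borda fusion: a document at rank r (0-based) in a run of length n receives (n - r) points.
        public static Map<String, Double> borda(List<List<String>> runs) {
            Map<String, Double> fused = new HashMap<>();
            for (List<String> run : runs) {
                int n = run.size();
                for (int r = 0; r < n; r++) {
                    fused.merge(run.get(r), (double) (n - r), Double::sum);
                }
            }
            return fused;
        }

        // CombSUM: sum of min-max normalised scores across runs.
        // CombMNZ: the CombSUM score multiplied by the number of runs containing the document.
        public static Map<String, Double> combSumOrMnz(List<Map<String, Double>> scoredRuns,
                                                       boolean mnz) {
            Map<String, Double> fused = new HashMap<>();
            Map<String, Integer> hits = new HashMap<>();
            for (Map<String, Double> run : scoredRuns) {
                double min = run.values().stream().mapToDouble(Double::doubleValue).min().orElse(0.0);
                double max = run.values().stream().mapToDouble(Double::doubleValue).max().orElse(1.0);
                double range = (max - min) == 0.0 ? 1.0 : (max - min);
                for (Map.Entry<String, Double> e : run.entrySet()) {
                    fused.merge(e.getKey(), (e.getValue() - min) / range, Double::sum);
                    hits.merge(e.getKey(), 1, Integer::sum);
                }
            }
            if (mnz) {
                fused.replaceAll((doc, score) -> score * hits.get(doc));
            }
            return fused;
        }
    }

Sorting the fused scores in descending order then gives the output ranking for the topic.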

What queries to use

We ask you to consider two types of queries for each topic (the second type is optional and attracts bonus points):

1. For each topic, a query created from the topic title. For example, for the example (partial) topic listed below, the query will be Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries (you may consider performing text processing).

2. (OPTIONAL: 2% bonus if done) For each topic, a query created from the Boolean query associated with the topic. This query will be made up of the terms that appear in the Boolean query, but will ignore any operators (e.g., and, or, Exp, /, etc.) and field restrictions (e.g., .ti, .ab, .ti,ab, etc.). Note that some keywords in the Boolean query have been manually stemmed, e.g. diagnos* in the example topic below. As part of the query creation process, we ask you to use the Entrez API. For documentation on the Entrez esearch API, please refer to the Entrez Programming Utilities Help reference available at https://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.ESearch. Example usage can be found at the following URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=diagnos*. Note the terms in the TranslationStack field: these are the terms you would use to replace diagnos*, concatenated to form the query (along with the other terms).

Title: Rapid diagnostic tests for diagnosing uncomplicated P. falciparum malaria in endemic countries

Query:
1. Exp Malaria/
2. Exp Plasmodium/
3. Malaria.ti,ab
4. 1 or 2 or 3
5. Exp Reagent kits, diagnostic/
6. rapid diagnos* test*.ti,ab
7. RDT.ti,ab
8. Dipstick*.ti,ab

Above: example topic file
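A query built from the title above is simply the title text, optionally cleaned up so that it passes safely through Terrier's query parser. The snippet below is a minimal sketch of one such (assumed, not required) text-processing choice: lower-casing and keeping only letters, digits and whitespace.

    // Illustrative title-to-query cleaning; whether and how to do this is a
    // design choice that should be reported.
    public static String titleToQuery(String title) {
        return title.toLowerCase()
                    .replaceAll("[^a-z0-9\\s]", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
    }

For the example topic, this would produce "rapid diagnostic tests for diagnosing uncomplicated p falciparum malaria in endemic countries".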

More on the Entrez API

The Entrez API provides access to the PubMed search functionalities. In this part of the project we will not use this API for retrieval. However, it also provides some additional methods. One in particular is useful for expanding terms in the Boolean query that have been "wildcarded" (manually stemmed): the TranslationStack. We have shown you above an example of how to obtain the output of the TranslationStack for a stemmed term. You will have to use this method for all terms in the Boolean query that contain the wildcard operator *. Practically, you will need to make a call to this API by constructing an appropriate URL, then request that URL, and finally parse the response to obtain the list of index terms to use to substitute the wildcarded term from the Boolean query for inclusion in your text query. Note that it is likely that one wildcarded term will give rise to many terms you will add to your query.
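As a sketch of that workflow, the code below builds an esearch URL for one wildcarded term, fetches the response with the standard Java HTTP client, and pulls the <Term> entries out of the TranslationStack with a simple regular expression. This is a minimal illustration: the exact XML layout should be checked against the Entrez documentation linked above, a real implementation should use an XML parser and respect the API's usage limits, and the stripping of field qualifiers such as [All Fields] is an assumption about the query terms you want.

    import java.net.URI;
    import java.net.URLEncoder;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Illustrative expansion of a wildcarded Boolean-query term (e.g. "diagnos*")
    // via the Entrez esearch API.
    public final class EntrezExpander {

        private static final String ESEARCH =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=";

        public static List<String> expand(String wildcardedTerm) throws Exception {
            String url = ESEARCH + URLEncoder.encode(wildcardedTerm, StandardCharsets.UTF_8);
            HttpResponse<String> response = HttpClient.newHttpClient().send(
                HttpRequest.newBuilder(URI.create(url)).build(),
                HttpResponse.BodyHandlers.ofString());

            List<String> terms = new ArrayList<>();
            // Collect every <Term> element that appears inside the TranslationStack.
            Matcher stack = Pattern.compile("<TranslationStack>(.*?)</TranslationStack>",
                                            Pattern.DOTALL).matcher(response.body());
            if (stack.find()) {
                Matcher term = Pattern.compile("<Term>(.*?)</Term>").matcher(stack.group(1));
                while (term.find()) {
                    // Drop field qualifiers such as "[All Fields]" (an assumption).
                    terms.add(term.group(1).replaceAll("\\[.*?\\]", "").trim());
                }
            }
            return terms;
        }
    }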

Tips on making query processing efficient

A number of tips have been provided in Blackboard to make the execution of queries more efficient. Please consider these tips to reduce the execution time of the experiments.

Required evaluation to perform

In part 1 of the project you are required to perform the following evaluation:

1. For all methods, train on the training set for the 2017 topics (train here means you use this data to tune any parameters of a retrieval model, e.g. k1 and b for BM25, the runs to be considered for the rank fusion methods, etc.) and test on the testing set for the 2017 topics (using the parameter values you selected from the training set). Report the results of every method on the training and on the testing set, separately, in one table. Perform statistical significance analysis across the results of the methods.

2. Comment on the results reported in the previous table by comparing the methods on the 2017 dataset.

3. For all methods, train on the training set for the 2018 topics (train here means you use this data to tune any parameters of a retrieval model, e.g. k1 and b for BM25, the runs to be considered for the rank fusion methods, etc.) and test on the testing set for the 2018 topics (using the parameter values you selected from the training set). Report the results of every method on the training and on the testing set, separately, in one table. Perform statistical significance analysis across the results of the methods.

4. Comment on the results reported in the previous table by comparing the methods on the 2018 dataset.

5. Perform a topic-by-topic gains/losses analysis for both the 2017 and 2018 results on the testing datasets, considering BM25 as the baseline and each of TF-IDF, Borda, CombSUM and CombMNZ as the comparison.

6. Comment on trends and differences observed when comparing the findings from the 2017 and 2018 results. Is there a method that consistently outperforms the others?

7. Provide insights into when ranking fusion works and when it does not, e.g. with respect to the runs considered in the fusion process, the queries, etc.
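For the topic-by-topic gains/losses analysis in point 5, one simple way to obtain the data behind the plots is to compute, for every topic, the difference in average precision between each method and the BM25 baseline. The sketch below assumes the per-topic AP values have already been loaded (e.g. from trec_eval -q output) into maps keyed by topic ID; the parameter names are placeholders.

    import java.util.Map;
    import java.util.TreeMap;

    // Illustrative per-topic gain/loss computation: positive values mean the method
    // gained over the BM25 baseline on that topic, negative values mean it lost.
    // The resulting map can be exported and plotted as a gain-loss bar chart.
    public static Map<String, Double> gainLoss(Map<String, Double> baselineApPerTopic,
                                               Map<String, Double> methodApPerTopic) {
        Map<String, Double> deltas = new TreeMap<>();
        for (Map.Entry<String, Double> e : baselineApPerTopic.entrySet()) {
            double methodAp = methodApPerTopic.getOrDefault(e.getKey(), 0.0);
            deltas.put(e.getKey(), methodAp - e.getValue());
        }
        return deltas;
    }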

In terms of evaluation measures, evaluate the retrieval methods with respect to mean average precision (MAP) using trec_eval. Remember to set the cut-off value (-M, i.e. the maximum number of documents per topic to use in evaluation) to the number of documents to be reranked for each of the queries. Using trec_eval, also compute R-precision (Rprec), which is the precision after R documents have been retrieved (by default, R is the total number of relevant documents for the topic).
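trec_eval expects runs in the standard six-column TREC format (topic ID, the literal Q0, document ID, rank, score, run tag). A minimal sketch of writing one topic's ranking in that format is given below; the run tag and file names in the example invocation are placeholders, and the sketch assumes trec_eval is available on your system.

    import java.io.PrintWriter;
    import java.util.List;
    import java.util.Map;

    // Illustrative writer for one topic's ranking in TREC run format:
    //   <topicId> Q0 <pmid> <rank> <score> <runTag>
    // Example evaluation afterwards (placeholder file names):
    //   trec_eval -q -M <numCandidateDocs> -m map -m Rprec test-2017.qrels bm25-title-2017.run
    public static void appendTopicRun(PrintWriter out, String topicId, String runTag,
                                      List<Map.Entry<String, Double>> rankedDocs) {
        int rank = 1;
        for (Map.Entry<String, Double> doc : rankedDocs) {
            out.printf("%s Q0 %s %d %f %s%n", topicId, doc.getKey(), rank++, doc.getValue(), runTag);
        }
    }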

For all statistical significance analyses, use a paired t-test; distinguish between p < 0.05 and p < 0.01.
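If you run the paired t-test from Java, one option (an assumed additional dependency, not part of the provided skeleton) is the Apache Commons Math library, whose TTest class implements the paired test. The sketch below compares the per-topic scores of two methods, aligned topic by topic, and reports the two significance levels asked for.

    import org.apache.commons.math3.stat.inference.TTest;

    // Illustrative paired t-test over per-topic effectiveness scores of two methods;
    // the two arrays must be aligned so that index i refers to the same topic in both.
    public static String significance(double[] baselinePerTopic, double[] methodPerTopic) {
        double p = new TTest().pairedTTest(baselinePerTopic, methodPerTopic);
        if (p < 0.01) return "significant at p < 0.01";
        if (p < 0.05) return "significant at p < 0.05";
        return "not significant (p = " + p + ")";
    }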

Perform the above analysis for: 1. queries created from the topic files using the topic title; 2. (OPTIONAL) queries created from the topic files using the Boolean queries. Finish your analysis by comparing the effectiveness difference between the methods using topic titles and those using queries extracted from the Boolean queries (OPTIONAL: to be done only if you consider the Boolean queries and want to obtain the bonus points).

How to submit

You will have to submit 3 files:

1. the report, formatted according to the provided template, saved as a PDF or MS Word document
2. a zip file containing a folder called runs-part1, which itself contains the runs (result files) you have created for the implemented methods
3. a zip file containing a folder called code-part1, which itself contains all the code to re-run your experiments. You do not need to include in this zip file the runs we have given to you. You may need to include additional files, e.g. if you manually processed the topic files into an intermediate format (rather than automatically processing them from the files we provide), so that we can re-run your experiments to confirm your results and implementation.

All items need to be submitted via the relevant Turnitin link in the INFS7410 Blackboard site, by 19 September 2019 17:00, Eastern Australia Standard Time (together with part 2), unless you have been given an extension (according to UQ policy) before the due date of the assignment.

INFS 7410 Project Part 1 – Marking Sheet – v2

Each criterion is marked against three bands: 7 (100%), 4 (50%) and FAIL 1 (0%). The number after each criterion name is its weight.

IMPLEMENTATION (weight: 2)

The ability to:
• understand, implement and execute common IR baselines
• understand, implement and execute rank fusion methods
• perform text processing

7 (100%):
• correctly implements the specified baselines and the rank fusion methods
• implemented methods to deal with title queries
• (OPTIONAL) implemented methods to deal with Boolean queries, and wildcards are appropriately handled via expansion to possible forms using the provided API (2% bonus)

4 (50%):
• correctly implements the specified baselines and the rank fusion methods

FAIL 1 (0%):
• no implementation
• implements only the baselines, but not the rank fusion methods

EVALUATION (weight: 2)

The ability to:
• empirically evaluate and compare IR methods
• analyse the results of empirical IR evaluation
• analyse the statistical significance of differences between IR methods' effectiveness

7 (100%):
• correct empirical evaluation has been performed
• uses all required evaluation measures
• correct handling of the tuning regime (train/test)
• reports all results for the provided query sets in appropriate tables
• provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• provides correct statistical significance analysis within the result tables, and correctly describes the statistical analysis performed
• provides a written understanding and discussion of the results with respect to the methods
• provides examples of where fusion works and where it does not, and why, e.g. a discussion with respect to queries and runs

4 (50%):
• correct empirical evaluation has been performed
• uses all required evaluation measures
• correct handling of the tuning regime (train/test)
• reports all results for the provided query sets in appropriate tables
• provides graphical analysis of results on a query-by-query basis using appropriate gain-loss plots
• does not perform statistical significance analysis, or errors are present in the analysis

FAIL 1 (0%):
• no or only partial empirical evaluation has been conducted, e.g. only on one topic set, or on a subset of topics
• reports only a partial set of evaluation measures
• fails to correctly handle training and testing partitions, e.g. trains on test, or reports only overall results

WRITE UP (weight: 1; binary score: 0/1)

The ability to:
• use fluent language with correct grammar, spelling and punctuation
• use appropriate paragraph and sentence structure
• use an appropriate style and tone of writing
• produce a professionally presented document, according to the provided template

1:
• structure of the document is appropriate and meets expectations
• clarity promoted by consistent use of standard grammar, spelling and punctuation
• sentences are coherent
• paragraph structure effectively developed
• fluent, professional style and tone of writing
• no proofreading errors
• polished, professional appearance

0:
• written expression and presentation are incoherent, with little or no structure, well below the required standard
• structure of the document is not appropriate and does not meet expectations
• meaning unclear as grammar and/or spelling contain frequent errors
• disorganised or incoherent writing
