The Computer Vision Center (CVC) offers ONE pre-doctoral research  

General Rules:  

The object of this call is the selection of a candidate to work as a pre-doctoral researcher at the CVC. This selection process is governed by Law 14/2011, of 1 June, on Science, Technology and Innovation, by the CVC Statutes where applicable, and by the rest of the applicable labor law.

This call comes ahead of the final resolution of the projects funded by the AEI and the ESF+ and of the definitive award of predoctoral contracts for doctoral training, following the indications of article 12 of the call “Proyectos de Generación de Conocimiento”. Consequently, it is conditional on the final resolution of the aforementioned project call.

Outline of the fellowship: 

The CVC offers a pre-doctoral fellowship for the project “Multimodal LLMs for Document Understanding (MuDocU)” with reference PID2023-148027OB-I00 funded by MICIU/AEI/10.13039/501100011033 and by FEDER, UE.

This call has been entrusted by the Agencia Estatal de Investigación (AEI) to the Computer Vision Center through the Subprograma Estatal de Formación, within the Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023. It is carried out under Order CIN/1025/2022, of 27 October, published in the BOE of 29 October, which approves the regulatory bases for the award of public grants for the formalization of predoctoral contracts, corresponding to various programs and subprograms of that Plan.

Candidates’ Requirements and Description of the Job:  

  • Candidates should not have held a pre-doctoral contract for more than 12 months before submitting the application.
  • Candidates will have to meet all the admission requirements for a PhD program at the Universitat Autònoma de Barcelona (UAB) when accepting the grant.
Job Description:
Job title: PhD fellow position linked to the project Multimodal LLMs for Document Understanding – MuDocU (PID2023-148027OB-I00)
Dedication: Full time, 4-year grant (annual evaluation)
Supervisors: Dimosthenis Karatzas and Ernest Valveny
Background: The selected candidate will work at the Computer Vision Center (CVC), Barcelona, a research institute comprising more than 130 researchers and support staff, dedicated to computer vision research and knowledge transfer, with an excellent research production. With a strong international projection and links to industry, the Computer Vision Center offers an exciting environment for scientific career development. The Computer Vision Center has a plan for expansion of its permanent research staff base and has received the “HR Excellence in Research” award as a provider and supporter of a stimulating and favourable working environment.

The successful candidate will join the Vision and Language Group, a vibrant group of researchers at the CVC with more than two decades of experience in research. The Vision and Language Group conducts research at the frontier between computer vision and natural language processing. The group has a special focus on incorporating intelligent reading systems into multi-modal models for scene and document understanding.
Summary of the project: The field of Document Understanding encompasses the techniques and methods that allow machines to analyse and interpret documents through vision. As such, it stands at the frontier between computer vision and natural language processing. Documents convey complex 2-dimensional written communication, and holistic document understanding involves not only extracting the textual content, but also interpreting complex structures (e.g. tables) and understanding layout (sections, captions, footnotes, etc.), graphical elements (plots, figures, etc.), text styles, handwritten information, signatures, etc. In many cases, employing external knowledge is necessary to properly interpret the conveyed message.

For the past 40 years, the document analysis field has focused on information extraction techniques (e.g. character recognition, table extraction, layout analysis), basically converting document images to machine-readable content. Such techniques were then combined in modular ways to build ad-hoc document processing systems, tied to specific domains and applications. In the last 4 years, the field has experienced a revolution, moving from information extraction to generic document understanding models that aim to perform conditional analysis of documents based on a text prompt. The research team proposing this project has played a pivotal role in this shift, by bringing the techniques of Visual Question Answering (VQA) to the document analysis field for the first time (2020) and proposing the first large-scale benchmarks for Document VQA. This has created a significant research following. The subsequent introduction of LLMs and multimodal LLMs gave an extra boost to this space, and the DocVQA benchmark is now one of the standard evaluation benchmarks used by recent LLMs, such as OpenAI GPT-4 and Google DeepMind Gemini. All this context opens the possibility of building foundational models for document understanding.

This project aims to capitalize on our research background to advance holistic Document Understanding (DU) and push research forward in specific directions that are not adequately addressed by current multimodal LLMs. These directions are (i) enabling models to deal with large-scale input (multi-page, multi-document, high-resolution large-scale documents); (ii) creating interpretable / explainable DU models; and (iii) developing models that provide privacy guarantees with respect to sensitive information in their training data.
Description of the job: The candidate will develop a doctoral thesis on a topic related to the project, with the aim of advancing the current state of the art in end-to-end document understanding models, taking into account interpretability and privacy guarantees. The candidate is also expected to collaborate in tasks related to the development of the research project and to present and publish the research results in top international conferences and journals.
Specific requirements: The candidate should possess an MSc degree in computer vision, artificial intelligence, computer science, computer engineering or a related subject. Previous background in computer vision, deep learning and/or document image analysis, as well as previous experience in research activities, will be considered positively. Applicants are expected to be fluent in both oral and written communication in English. They should work well in a team while demonstrating initiative and autonomy.
Selection Committee: Dr. Dimosthenis Karatzas and Dr. Ernest Valveny

Duration of the fellowship:  

The fellowship has a maximum duration of 4 years and is carried out through a pre-doctoral contract. If the pre-doctoral fellow obtains his/her PhD degree before the start of the 4th year of the fellowship, he/she will be authorized to be hired for this last year as a post-doctoral researcher, with a salary higher than in the pre-doctoral phase.

At the end of each fellowship year, the fellow will have to submit a work report stating the objectives and work plan for the following year, as well as a positive report from his/her thesis director. The renewal of the fellowship for the following year is subject to a positive review by the academic committee of the PhD program. The contract can be terminated if the evaluation is negative.

Payment and eligible expenses according to the AEI:  

The grant will finance the contract (gross remuneration and employer's contributions), the compensation costs at the end of the contract, and additional funding to cover the expenses of research stays and the enrolment fees of the doctoral studies, in accordance with the provisions of the call.

The salary offered is established by Royal Decree 103/2019, of 1 March, on the Statute of Research Personnel in Training. To establish this salary, the Single Agreement of the General Administration of the State will be used, specifically the basic remuneration table corresponding to the Group 1 category. The remuneration per year is as follows:

  • 1st year (12 months): at least 60% of Group 1  
  • 2nd, 3rd and 4th year (36 months): at least 75% of Group 1 


The start of the contract will depend on the final resolution of the AEI project: the contract will not come into force until the project is resolved and expenses can be allocated to it. According to the call, the CVC will have 3 months to hire the candidate from the moment the AEI project is approved in the final resolution. Therefore, the expected start date of the fellowship will be between 1 January and 1 March 2025.


This fellowship is not compatible with any other fellowship or grant that has the same objective and is financed with public or private funds, whether Spanish or European, nor with any wage or salary that involves a contractual or statutory relationship.

Payments for educational tasks (teaching) may be authorized provided that they are sporadic rather than regular. These payments cannot exceed 30% of the total annual fellowship amount. The authorization is given by the Director of the CVC.

Submission of applications:  

Candidates should apply by filling in the online application form at the end of the page. The deadline to submit applications is 30 November 2024. 

Candidates should send a CV and the academic record of their degree along with a cover letter. The grades obtained in the courses should be included in the academic record. Documents issued by foreign education centers should be submitted in accordance with the PhD program requirements. Only one file can be uploaded, and it should include all required documents.

If a submission is incomplete or contains mistakes that can be corrected, the candidate will be required to correct them within the next 5 working days. If the mistakes are not corrected within the given period, the submission will be considered invalid and will be excluded.

The definitive list of candidates will be published on the CVC notice board and will be emailed to the selected candidates.

Evaluation criteria: 

Applications will be evaluated by the principal investigators (PIs) of the project, Dimosthenis Karatzas and Ernest Valveny Llobet.

A maximum of 100 points will be awarded according to the following criteria:

Criterion (points):
1) Academic and/or scientific-technical trajectory of the candidate.
1.a) Scientific-technical contributions (0 to 45 points): The academic record and other curricular merits of the candidate will be evaluated, as well as their suitability for the tasks to be carried out based on their training and professional experience.
1.b) Mobility and internationalization (0 to 5 points): The relevance and impact of the candidate's stays in national and international research centers and/or the industrial sector will be assessed, considering the prestige of the hosting entity and the activities carried out there.
2) Suitability of the candidate for the research activities to be carried out (0 to 50 points): The candidate's suitability for the project or research activities will be evaluated based on their previous training and experience. This will include the added value that the completion of the project will bring to their research career, as well as the contribution to the institution and research group.

Evaluation of applications:  

Applications received by 30 November 2024 will be taken into consideration for evaluation. The resolution of the fellowship will be published on 18 December 2024 and will name the selected candidate and, if available, a waiting list.

If no candidate is selected by that date, the process will remain open until the position is filled.

Resignation and admission of substitutes:

If the selected candidate resigns before signing the official contract, the substitute candidate can take up the position directly. The substitute candidate may also be called if the selected candidate does not fulfill all the requirements to be hired.

Transition from the pre-doctoral stage to the post-doctoral orientation period:

If the pre-doctoral researcher defends and passes his/her PhD thesis before the start of the last year of the fellowship, the post-doctoral contract will be formalized once the defense and approval of the thesis have been confirmed. These contracts will be full-time and will last for one year with a salary as established in section 5.


Any modification of the initial conditions of the fellowship or of the period of time for its execution must be authorized by the CVC, which may request any information it deems suitable.


The following situations will not count toward the contract period: temporary disability, risk during pregnancy, maternity leave, leave for adoption or foster care, risk during lactation, and paternity leave. In these cases, the candidate can request the recovery of this time at least 2 months prior to the end of the contract. In order to calculate the suspension period, the beneficiary entities should send the CVC the certification or accreditation of the corresponding registration and cancellation. The recovery period will take place at the end of the last year, for the period of time that has been justified by the beneficiary entity.

Any other interruption of the fellowship that is not contemplated in this section shall be communicated to and approved by the thesis supervisor. It will then be evaluated by the CVC board of direction, which will have to give its approval and set the conditions together with the thesis director.

Breach of contract:

Total or partial failure to fulfill the requirements and obligations established in this resolution and other applicable laws, as well as any conditions established in the corresponding award resolution, will result in the loss of the right to the fellowship, following the opening of a breach-of-contract file.

Bellaterra, 18 October 2024. 

Josep Lladós Canet