In a connected world, the volume of data that is produced, managed and stored grows exponentially, and so does the need to extract relevant and meaningful information from it. Properly exploiting this data through Big Data and advanced analytics technologies brings many benefits, and the information that can be obtained is extremely valuable and revealing.
Public Administrations are aware of this, and many of them recognize how important and necessary it is to efficiently exploit the vast amount of data they hold (health, economy and finance, environment, agriculture, etc.).
What problem does it solve?
At the end of 2014, the Secretary of State for Telecommunications and for the Information Society saw a clear opportunity to apply these technologies to the evaluation of aid applications. These aid programmes are aimed at encouraging the implementation of high-potential projects that increase the competitiveness of the Spanish ICT industry.
Evaluating aid involves a large volume of unstructured documentation backed by a limited set of structured data, all of which must be reviewed and cross-checked by a group of evaluators. Natural language processing techniques were therefore identified as an excellent tool to support and facilitate the work of these professionals.
How have we solved it?
The information system supporting aid evaluation is built on Open Source tools and technologies applied to the available data set, through a processing pipeline consisting of:
- extraction, transformation and loading (ETL) processes
- natural language processing, including:
  - tokenization, lemmatization and named entity recognition
  - topic analysis
  - semantic analysis
  - document similarity calculations based on these analyses
- downloading information from websites
- text and faceted search over the document corpus
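The similarity step can be illustrated with a minimal sketch. The source does not say which models the system used, so this example assumes plain TF-IDF weighting and cosine similarity in place of the topic and semantic analysis listed above. The helper names (`tokenize`, `tfidf_vectors`, `cosine`, `most_similar`) and the sample texts are hypothetical, and the regex tokenizer skips the lemmatization and entity recognition a real pipeline would apply.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters (including Spanish accented letters).
    # Assumption: a real pipeline would also lemmatize and tag entities here.
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()  # number of documents containing each term
    for toks in tokenized:
        df.update(set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        # Smoothed IDF keeps terms that appear in every document non-zero.
        vectors.append({t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(docs, query_index, k=3):
    """Rank the other documents by similarity to docs[query_index]."""
    vecs = tfidf_vectors(docs)
    scores = [(i, cosine(vecs[query_index], vecs[i]))
              for i in range(len(docs)) if i != query_index]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical application summaries.
docs = [
    "Big data analytics platform for the ICT industry",
    "Applying big data analytics in the ICT industry",
    "Low-power sensors for precision agriculture",
]
print(most_similar(docs, 0, k=2))  # index 1 ranks first
```

Sparse dictionaries keep the example dependency-free; on a real corpus, a topic-model or embedding representation could be swapped in for the TF-IDF vectors while `cosine` and `most_similar` stay the same.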
The implemented features provide:
- A general analysis of the topics across the complete set of grant applications, including which topics are covered, so that the allocation of evaluators can be planned more efficiently.
- A specific view of the topics addressed by each individual application.
- An analysis of the topic hierarchies detected in these applications.
- A view of how the topics detected in the aid applications evolve over time across the different calls.
- Support for determining which thematic combinations occur most frequently in the submitted documentation.
- Search functionality to help identify relevant documents when evaluating an innovative project, including text search, filtering by specific metadata (years, companies, CNAE, provinces…), thematic search and the identification of similar documents.
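Several of these features reduce to simple operations over application metadata and per-document topic labels. The sketch below assumes a hypothetical `Application` record whose `topics` field already holds labels assigned by some topic model. It shows exact-match faceted filtering, counting of topic pairs that co-occur in the same application, and per-year topic counts for the temporal view. None of these names, fields or sample records come from the actual system.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Application:
    # Hypothetical record: metadata facets plus topic labels from a topic model.
    app_id: str
    year: int
    company: str
    cnae: str
    province: str
    topics: list = field(default_factory=list)


def facet_filter(apps, **criteria):
    """Keep applications whose attributes equal every given facet value.

    An unknown facet name raises AttributeError.
    """
    return [a for a in apps
            if all(getattr(a, k) == v for k, v in criteria.items())]


def topic_cooccurrence(apps):
    """Count how often each unordered pair of topics appears in one application."""
    pairs = Counter()
    for a in apps:
        pairs.update(combinations(sorted(set(a.topics)), 2))
    return pairs


def topics_by_year(apps):
    """Count, per call year, how many applications mention each topic."""
    by_year = {}
    for a in apps:
        by_year.setdefault(a.year, Counter()).update(set(a.topics))
    return dict(sorted(by_year.items()))


# Invented sample data for illustration only.
apps = [
    Application("A1", 2015, "Acme", "6201", "Madrid", ["cloud", "iot"]),
    Application("A2", 2015, "Beta", "6202", "Sevilla", ["iot", "security", "cloud"]),
    Application("A3", 2016, "Acme", "6201", "Madrid", ["ai", "iot"]),
]
print([a.app_id for a in facet_filter(apps, year=2015)])
print(topic_cooccurrence(apps).most_common(1))  # [(('cloud', 'iot'), 2)]
print(topics_by_year(apps))
```

The in-memory version only makes the logic explicit; at corpus scale, faceted filtering would more plausibly be delegated to a search engine with facet support.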
The results are aligned with the objectives set and have helped identify new possible applications, such as:
- Identifying which companies are working on certain innovative topics.
- Expanding knowledge of Spanish R&D by relating aid applications to published scientific articles and patents.
- Measuring the impact that funded projects have on R&D and competitiveness.
- Extending the services offered by the tool to other bodies that grant aid within the AGE (General State Administration) and the Administrations of the CCAA (Autonomous Communities).
At SATEC we collaborate with the Language Technology Promotion Plan; you can find more details about the Corpus Viewer platform on its website.