Round table discussions

The future of big data for research and development

Hewlett Packard Enterprise (HPE) Ireland hosted a virtual round table discussion, bringing together experts from across third-level education and industry to explore the future of big data for research and development.

What support do industry and third-level institutions require to capitalise on the data explosion?

Maeve Culloty

Industry needs a clear framework for data governance to enable long-term planning and investment. Some data governance models exist today – such as the Data Governance Act announced by the European Commission – which could provide this certainty; however, there is a long way to go before researchers can truly access and utilise interoperable data across different platforms. It would be useful to understand the Government’s roadmap as it pertains to data. What we are seeing is that it is very difficult for researchers to access data because of the permissions required and the length of time it takes to get them. With the evolution of the Data Governance Act, we are hoping that Ireland interprets it and establishes a better roadmap for the accessibility, portability and usability of data inside the country.

Eoin O’Reilly

From a Tyndall National Institute perspective, it is crucial for our research that we have access to usable open data. It is not only important that the data exists, but that it exists in a format that we can understand and trust. That ranges from materials research, where it is necessary to know the background to how the data was developed, through to work in biophotonics. A second important factor is the ability to analyse the data. The whole research community is on a journey to learn how to use the data. Right across the system, there is much more that we could be doing, if only we knew how to do it.

Ray Walshe

We need to concentrate on why we have this data explosion. Where is it coming from and what utility can we get out of it? The greatest strides forward in relation to the data explosion will be made in two areas: data governance and standardisation. That is where the most progress will be made in attempting to shape the data economy, or the digital single market, of the future. As a member of the EU, Ireland must align itself with European Commission regulations such as the Data Governance Act and the AI directives. Having our own national policy and our own data governance strategy will be key to the success of our indigenous industries in the future.

JC Desplat

I agree with my colleagues. The key factor here is the presence of a well-thought-out, long-term vision or a national strategy. What has been missing is strong political leadership across successive governments, alongside a sense of long-term purpose. The timeliness of decisions is increasingly important, and there is a need to carefully align any national strategy with the European Commission, which has demonstrated very strong leadership in this domain. This would allow Ireland to leverage European initiatives because, increasingly, funding is allocated on the basis of co-funding with other member states. The signals – for instance, the creation of the new Department of Further and Higher Education, Research, Innovation and Science – are very positive, and now we must ensure that they are realised.

Which data storage and access innovations are you most looking forward to and why?

JC Desplat

Software innovations are often overlooked but, in my opinion, are of equal importance. The dominant POSIX I/O API used in the vast majority of scientific applications is becoming a bottleneck to performance at scale. The cloud industry has shown how applications can be ported to object-based user-space APIs, such as S3, to enable far greater scalability, but these APIs are not always suitable for scientific applications. Several research projects, such as DAOS and SAGE2, are implementing high-performance object stores which I hope will enable a break with the constraints of POSIX that we have been living with for the past 40 years.
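To make the contrast concrete, here is a minimal sketch of the two access models Desplat describes: byte-stream POSIX I/O versus a key-addressed object interface (the S3 API via the boto3 client library). The file path, bucket and key are illustrative placeholders, and the snippet assumes valid S3 credentials are configured.

```python
import boto3

# POSIX model: the application reads a byte stream through the kernel,
# whose strict consistency semantics are hard to scale across many nodes.
with open("/scratch/simulation/output.dat", "rb") as f:
    posix_bytes = f.read()

# Object model: whole objects are addressed by key through a user-space
# API; the store scales out by relaxing most POSIX guarantees.
s3 = boto3.client("s3")
response = s3.get_object(Bucket="simulation-results", Key="output.dat")
object_bytes = response["Body"].read()
```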

Additionally, I see exciting innovations at the application level in the form of the emergence of digital twins. These represent a major step change in the way in which high-performance computing [HPC] is utilised, and in how HPC coalesces with sister technologies such as AI, Big Data, IoT and Edge into a digital continuum. We are entering the era of the digital continuum, and this will have a drastic impact on how we build infrastructure and provide services.

Fred Clarke

There is so much data that there will never be enough disk space to store it all, so how we train people to manage data is fundamental. We need to train the research community of the future in how to distil the data they require and how to curate it. Another major challenge for us is the time spent shunting data around to a location where it can be used. We are computationally bound, so we move data in, use it, and then we have to delete it.

“There needs to be a standardised platform on which businesses, governments and citizens can operate their data, and the movement of that data must be secure, refined, and understood.”

— Maeve Culloty, Managing Director, Hewlett Packard Enterprise

Maeve Culloty

One innovation that HPE is working on at the moment is the swarm learning principle, where we move algorithms to where the dataset is hosted – as opposed to having to centralise data in one place before running analysis – enabling movement of the insights onto multiple platforms without the need to move the actual data. It is a development that is attracting a lot of interest, especially because we are talking to customers who remain unsure about whether their datasets should reside in an on-premises cloud or the public cloud, often at a cost. Swarm learning is an exciting proposition. Another exciting initiative is GAIA-X in Europe, creating a platform that will enable interoperability in a secure and standardised way. There are different interpretations of the initiative across Europe, but I believe the principle of it is really important. There needs to be a standardised platform on which businesses, governments and citizens can operate their data, and the movement of that data must be secure, refined, and understood.
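The principle Culloty describes – sharing learned insights rather than raw data – can be illustrated with a toy sketch. This is a simplification under stated assumptions: HPE’s Swarm Learning platform coordinates peers in a decentralised, blockchain-backed way, whereas the example below simply averages locally fitted parameters, and all names and data here are hypothetical.

```python
import numpy as np

def local_fit(X, y):
    # Ordinary least-squares fit on one site's private data.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def swarm_round(sites):
    # Each site computes parameters locally; only the parameters,
    # never the raw records, leave the site to be combined.
    local_params = [local_fit(X, y) for X, y in sites]
    return np.mean(local_params, axis=0)

# Three sites whose private datasets are drawn from the same process.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
sites = []
for _ in range(3):
    X = rng.normal(size=(100, 2))
    sites.append((X, X @ true_w + rng.normal(scale=0.1, size=100)))

print(swarm_round(sites))  # close to [2.0, -1.0]; no raw data pooled
```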

Ray Walshe

I think two major shifts are ongoing in relation to data storage. The first is an architectural shift. More and more of the storage function behind our data processing capability is being provided by software and, in particular, a software stack. The ongoing migration of software stacks to cloud-based or off-premises storage is enabling the potential for cloud-based HPC. The second shift is a geographical one. With edge computing and IoT at the edge, we are going to have a requirement for a large amount of fast storage at the edge. Flash storage and flash technology will be important in the future.

Eoin O’Reilly

I think a major challenge to data storage in the future will be the need for rapid access, and there is a need for strong innovations in this area – in particular, the development of network access that can efficiently handle large data samples, both on the research computing side and more widely. Interestingly, some of the challenges in this space can themselves be addressed by big data capabilities, such as digital twins that support very agile networks and functionality.

What are the most significant obstacles to the adoption and implementation of new data innovations?

Ray Walshe

For most innovations to become global, interoperability is required. Interoperability requires standardisation. One of the limitations I see is in the adoption of harmonised standards, whether European or international. The lack of adoption of standardised approaches, tools, technologies, instruments, services, systems etcetera is a major limitation when it comes to data innovation. Standardisation is a normal and crucial part of the innovation lifecycle. That lifecycle typically begins with graduates coming up with a smart idea. That idea leads to disruption and new innovation. The disruption then leads to a scenario where many people compete to become dominant in a market. For that chaos to subside, standards are required, and the competition over which approach becomes the standard is survival of the fittest. Once a standard has been adopted, the chaos dies down and a stable globalisation emerges, whereby people worldwide begin adopting the new technology. From university level to national level, there is a growing awareness that standardisation is becoming increasingly important for both industry and educational institutions.

Maeve Culloty

Firstly, when it comes to cookie consent banners, the understanding of the information that is being given away – from individual users right through to companies and public administrations – is really lacking. Secondly, digital skills should be more prevalent within our standard curricula. Regardless of the course being undertaken, students should be equipped with a baseline of digital skills, an understanding of cybersecurity and data, and an awareness of the repercussions. Thirdly, there needs to be a national data infrastructure; the Government needs to start thinking about data infrastructure in the same way it does about physical infrastructure – with a long-term plan for investment.

“The effectiveness of collaboration requires striking the right balance. Clarity on where the benefits to industry lie is important because, if the collaboration means just a sale or is about covert access to data, then the true value of partnership will never be delivered and a race to the bottom will ensue.”

— Jean-Christophe Desplat, Director, Irish Centre for High-End Computing (ICHEC)

JC Desplat

In HPC, the data innovations are in AI, machine learning and quantum computing. For me, the real innovation is in the effective orchestration and blending of these technologies within the digital continuum. There are several obstacles to this. Firstly, the excessive splitting of data resulting from chronic underinvestment in data infrastructure. Short-range vision leads to the targeting of opportunistic funding, which itself leads to the organic growth and fragmentation of infrastructure. In the end, capability is greatly reduced, and issues of stability and interoperability become more commonplace. Secondly, the competitiveness of the Irish national HPC infrastructure has been in relative decline for over a decade now. This situation is of great concern to me, as we are moving science and research ever closer to needing exascale computing capability as standard. Across Europe, through the EuroHPC Competence Centre initiative [EuroCC], the advancement of academic research and industrial application at this level is underway. Ireland is part of this work, but its competitiveness relative to other European countries is lower now than at any time since ICHEC was established. Thirdly, there is a silo mentality and resistance to change. There is a need for strong, long-range political leadership on technology. Perhaps technology is now so intertwined with our society that the time has come to appoint a dedicated Chief Technology Officer [CTO] to advise government, in the same way that the Government CIO and the Chief Science Advisor do in their respective domains. A government CTO would provide reliable and timely advice on matters such as HPC and its sister technologies.

Fred Clarke

What Ray is talking about is standards, and JC is talking about funding. I think the whole idea of standards-driven data storage and management is great; however, storage configurations based on standards set a very high bar, and it is hard to get funding for a standards-driven approach – you need significant infrastructure to do that.

Ray Walshe

StandICT.eu is a Horizon 2020-funded project which supports the engagement of European and Irish experts in international standardisation, and we have a good record of success in doing that. Standardisation is something that happens in the small as well as in the large. I can cite many examples where companies such as Openet have become multibillion-dollar companies by developing their technology in parallel with driving standards processes, putting standards at the forefront of their technologies. Perhaps it is not suitable for all industries, but it is possible, and we can use those exemplars to illustrate that the coevolution of technology development and standards development can be very lucrative.

Eoin O’Reilly

One of the major obstacles is the expertise capacity to take advantage of the data. There are some areas which are well positioned to rush ahead in this field, while there are others where it would be very advantageous, but they do not have the expertise or the linkages to do so. This relates to Maeve’s emphasis on the value of everyone having a baseline of digital skills. Digital experts must also be able to collaborate with the people who have the demands and applications. The evolution of ICHEC is an excellent example of this. In its initial years, ICHEC enabled a small section of the academic community to deliver very good research. Now the wider ecosystem has developed so that ICHEC is a much broader resource which serves not just academia, but also the public service, MNCs and SMEs. Another long-term challenge is resourcing; that is partly funding, but it is also the scale of the computing that is required. As more and more adapt to utilise big data, a challenge emerges in terms of natural resource usage. In the long term, we must look at how we undertake the same processes in a more efficient manner.

How can industry best assist third-level institutions in extracting the most value from their data?

Fred Clarke

Strong links with industry have already been formed and are evident in some of the initiatives already in place. I think there is a bigger role for industry on the ground within third-level institutions to demonstrate their latest initiatives. Successes have been achieved; for example, the skill levels of PhD students are rising because that is what industry is asking for.

“I think the whole idea of standards-driven data storage and management is great; however, storage configurations based on standards set a very high bar, and it is hard to get funding for a standards-driven approach – you need significant infrastructure to do that.”

— Fred Clarke, Head of Research IT Service, University College Dublin

Ray Walshe

Close collaboration is key to increasing the value associated with the analysis of data. It is important to point out that it is not just a sharing of expertise but also of the technology, the platforms, and the use cases.

Maeve Culloty

We have collaborated extensively with third-level institutions in the area of cybersecurity, and that should be a reference point when focusing on data. For example, the funding of courses within universities has ensured we have some of the highest-calibre students working with us today. Equally, the research we have done has enabled us to help develop the curriculum, meaning it is beneficial for all parties. We now need to turn that collaboration towards data.

Ray Walshe

Companies like Hewlett Packard Enterprise have a high-level overview of worldwide trends in relation to ICT and are well versed in knowing where the new technology opportunities and challenges are. It is important that they disseminate and share some of that vision to help develop the next generation of tech leaders in Ireland.

“It is important that across our academic environment we have engagement across the full spectrum, from the fundamental side through to the applied side. I also think it is important that we have different disciplines speaking to each other.”

— Eoin O’Reilly, Chief Scientist, Tyndall National Institute

Eoin O’Reilly

For me, it is through collaboration. We have come a long way with regard to collaboration between third-level institutions and industry in the last two decades, and the model we have adopted is making a strong impact. There is a mutual benefit: yes, access to skills is important, but so too is the value of higher-education institutions getting access to relevant industry problems.

JC Desplat

There are clear benefits for higher-education institutions in accessing modern, finely tuned data services, but it must not be done at any cost. The effectiveness of collaboration requires striking the right balance. Clarity on where the benefits to industry lie is important because, if the collaboration means just a sale or is about covert access to data, then the true value of partnership will never be delivered and a race to the bottom will ensue. In particular, I am wary of arrangements where HEIs get locked into specific technologies or platforms, and/or hand over access to and control over their data. I would advise the adoption of open frameworks whenever possible. As we migrate towards a more connected and federated data environment, interoperability will be key.

How can today’s students be equipped to meet the research and development demands of Industry 4.0?

Eoin O’Reilly

Again, it comes back to close engagement between academia and industry, but I believe we need to think more broadly than Industry 4.0. It is important that across our academic environment we have engagement across the full spectrum, from the fundamental side through to the applied side. I also think it is important that we have different disciplines speaking to each other.

Fred Clarke

I’ve seen students really benefiting from training with practical examples and gaining a better understanding of good practices in relation to data management. We’re working closely with the UCD Library, where there are some good resources for data management and archiving. They are very good at distilling what data people need and how long they are going to keep it. Those systems are being put in place cost-effectively and offer practical examples of the incentives for good behaviour when it comes to data management.

“Having our own national policy and our own data governance strategy will be key to the success of our indigenous industries in the future.”

— Ray Walshe, Director, EU Observatory for ICT Standards (EUOS)

Ray Walshe

DCU’s tagline is “the university of enterprise” and we have very close relationships with industry. What that shows us is that there is quite a diverse skills requirement within the ICT sector. There is no silver bullet for solving the skills gap, and what we are seeing is that it is no longer sufficient to have a niche expertise; you need to be able to communicate with other disciplines. That applies not only to the horizontal disciplines like AI, cloud, and big data but also to the vertical disciplines like manufacturing, medicine and agriculture. There isn’t one type of student who delivers all of that, so we need a mix of deliveries at graduate, diploma, postgraduate and doctoral level to meet that need. There is going to be constantly evolving change, and we can adapt to that change by being grounded in industry.

JC Desplat

A McKinsey report published last July on Industry 4.0 and Covid-19 highlighted the value of digital solutions in facing the pandemic, but it also demonstrated the value of a people-centred process and highlighted the need for new skills and training in key areas. A degree of reskilling is required if Industry 4.0 is to be successful. Similarly, a 2019 University of Cambridge report on the digitalisation of the manufacturing sector and the policy implications for Ireland identified the business benefits of digital adoption but suggested the need for digital curricula and the creation of programmes for upskilling networks of SMEs, amongst other things. For Industry 4.0, many students need foundations in data-driven decision-making. We at ICHEC have an education and training role, in particular on the topic of HPC, and we have been involved in key programmes, often in collaboration with other centres and institutions. I believe Ireland needs to be far more innovative in addressing skills development than it has been in the past.

Maeve Culloty

Across government and industry, there needs to be a recognition of the potential of the data economy. Much like the physical trade economy, the data economy needs to be underpinned by a structured strategy involving skills, funding and R&D. Data is going to be a huge part of our economy going forward, and we need to have the mechanisms and structures to capitalise on that. Understanding the data economy and how to capitalise on data as a resource will be a fundamental skill for the next generation of graduates.


Round table participants

Fred Clarke

Fred Clarke has worked in information technology for over 30 years. He is currently Head of Research IT, UCD IT Services, a service which provides IT services and infrastructure to the active research community in University College Dublin.

Maeve Culloty

Maeve Culloty is the Managing Director of HPE Ireland. A qualified chartered accountant, she joined HP/HPE 12 years ago, working in the HPE Financial Services business unit. Her career has spanned various roles across audit, customer operations, business development and global sales.

Jean-Christophe Desplat

Jean-Christophe ‘JC’ Desplat is a technology expert with over 25 years’ experience in high-performance computing (HPC). His particular interest lies in the innovative use of HPC technologies in emerging domains. He has served as advisor to several committees in Ireland and abroad, including the strategic advisory team of the UK Engineering and Physical Sciences Research Council (EPSRC), the ICT sub-committee of the Irish Medical Council and the Climate Change Research Co-ordination Committee of the Irish Environmental Protection Agency (EPA). JC has also been the national representative for Ireland on the Partnership for Advanced Computing in Europe (PRACE) AISBL Council since 2010.

Eoin O’Reilly

Eoin O’Reilly is Chief Scientist at Tyndall National Institute, Ireland’s largest research centre, focused on deep-tech research based on photonics and electronics. He joined Tyndall in 2001, as one of the first research professors funded by Science Foundation Ireland. His research on photonic materials and devices is widely recognised, including the award in 2014 of the Rank Prize for Optoelectronics for his pioneering work on advanced semiconductor lasers.

Ray Walshe

Ray Walshe is a senior researcher in the ADAPT Research Centre at Dublin City University (DCU), Ireland. He began his career in industry as a software engineer, software consultant and project manager with LM Ericsson, Software and Systems Engineering Limited and Siemens. Joining the School of Computing at DCU in 1995, he delivers AI, IoT and Data Governance modules on undergraduate, master’s, and PhD programmes and currently (2021) chairs the Graduate Diploma in Web Technologies. Ray has been a digital leader with the World Economic Forum since 2016, was appointed to the IEEE European Public Policy Committee on ICT in 2019 and is currently the AI WG Lead for IEEE EPPC. Ray was also appointed in 2020 to the OECD Network of Experts (ONE AI).
