GenSyn2026

2nd Workshop on Generation of Synthetic Datasets for Information Systems (GenSyn)

Verona, Italy

About The Event

Data is crucial for deploying and evaluating advanced information systems. However, usable benchmarks are rarely available, and collecting sufficient data to construct new benchmarks is challenging due to privacy, scarcity, and legal concerns. Synthetic datasets have become valuable for machine learning tasks and as support for data-driven applications, AI, IoT, Digital Twins, and business process models. Synthesizing representative and useful datasets is an elaborate process. The GenSyn workshop aims to discuss both generative AI and classical techniques for creating synthetic datasets for AI and non-AI information systems, and to present state-of-the-art tools and approaches.

Where

Co-located with the 38th International Conference on Advanced Information Systems Engineering (CAiSE 2026)

When

June 8, 2026

Call for papers

This workshop is a dedicated forum for exploring how synthetic datasets can be integrated across diverse information systems engineering (ISE) contexts. This is a developing field and a promising application area for AI, where many approaches are still exploratory and not yet mature enough for publication in the main track. Areas of interest include, but are not limited to:

Methods, techniques, and algorithms to generate high-quality synthetic data for IS, including traditional frameworks, natural language processing algorithms, and generative AI models.
Synthetic data generation for databases and knowledge graphs, including queries and data integration.
Process automation and mining with synthetic data: utilizing synthetic datasets to improve data-driven applications, process mining, and business process modeling and simulation.
Synthetic datasets for real-time applications: creating synthetic datasets to train digital twins, simulate real-time data streams in IoT systems, and provide data for advanced driver assistance systems (ADAS).
Data-driven compliance and governance: Addressing regulatory and privacy concerns through synthetic data generation to support decision-making and compliance in sensitive systems, e.g., eGovernment and healthcare.
Producing novel synthetic dataset benchmarks for different application domains where data scarcity represents an issue, e.g., industry settings or healthcare.
Evaluation of synthetic data in real-world contexts: in particular, methodologies to validate the quality, diversity, sustainability, and realism of existing synthetic datasets in business intelligence and industry-specific systems.
Case studies and experience reports from academia and industry.

Important dates

March 8, 2026 (AoE): Deadline for workshop paper submissions
March 28, 2026 (AoE): Compliance with Springer's requirements on acceptance rates.
March 31, 2026 (AoE): Notification of acceptance sent to workshop authors
April 7, 2026 (AoE): Deadline for camera-ready papers
June 8, 2026: Workshop date

Submission process

Submissions to the presentation-oriented workshop must conform to the Springer LNCS/LNBIP format, and the page limits include references, figures, tables, and appendices. Following the Springer guidelines, we accept the following types of submissions:

Full papers up to 12 pages, to present mature works with a rigorous evaluation
Short papers from 6 up to 8 pages, to present early-stage works with a lightweight evaluation or proof-of-concept

Papers should be submitted as PDF, following Springer’s LNCS format. Submissions that do not conform to the LNCS format or the page limits, or that are obviously out of the scope of the workshop, will be rejected without review. For Springer’s LNCS format, see the guidelines provided at Springer’s site.

To submit your paper, please use the EasyChair link and select the entry "2nd edition of Generation of Synthetic Datasets for Information Systems".

The results described in the submitted paper must be unpublished and must not be under review elsewhere. Three to five keywords characterizing the paper should be listed at the end of the abstract. The selected papers will be discussed with the paper reviewers as well as during the program board meeting. As the review process is not blind, please indicate your name and affiliation when you submit. According to the Springer standards, the overall acceptance rate cannot exceed 45%-50%.

Keynote

Liina Kamm

The Potential of Synthetic Data in e-Governance

Abstract: Synthetic data seems to be an obvious technology that would solve different data access issues in an e-government setting. Yet governments are hesitant to adopt it into their pipelines. We will look at the possibilities and barriers of synthetic data based on the example of Estonia, a country with an advanced digital society and a mature state database system.

Bio: Liina Kamm is a senior researcher and principal investigator at Cybernetica (a deep-tech SME in Estonia). Her research focuses on privacy enhancing technologies and their uptake, and the privacy and security of AI systems. She holds a PhD degree in computer science from the University of Tartu. She leads the AI security and privacy research team in the Estonian Centre of Excellence in AI (EXAI) and is the chairman of Technical Committee 4 (Information technology) of the Estonian Centre for Standardisation and Accreditation.

Organizers

PC members

Adem Ait Fonollà, SnT-University, Luxembourg
Andrea Maldonado Hernandez, Technical University of Munich, Germany
Boqi Chen, McGill University, Canada
Charlotte Verbruggen, TU Wien, Austria
James Pontes Miranda, CEA-List, France
José Antonio Hernández López, University of Murcia, Spain
Martin Kuhn, German Research Center for Artificial Intelligence (DFKI), Germany
Pablo Gomez Abajo, Universidad Autónoma de Madrid, Spain
Stefan Klikovits, Johannes Kepler Universität, Austria

Program

9:00-10:00

Keynote Speaker Liina Kamm, Senior Researcher at Cybernetica

10:00-10:30

Synthetic log generation under control-flow conditions using autoregressive models
Martin Kuhn, Tony Trinh, Joscha Grüger and Ralph Bergmann

10:30-11:00

Coffee break

11:00-11:25

Benchmarking Natural Language Database Conversational Agents
Tomás P. Lenzi, Eduardo R.S. Nascimento, Matheus O. Silva, Grettel M. García, Yenier T. Izquierdo, Michelle S.P. Facina, Isabela G. Siqueira, Melissa Lemos and Marco A. Casanova

11:25-11:50

Graphtender: An Interactive Generator of Absence-Aware Synthetic Graph Datasets
José Calderón, Daniel Ayala, Inma Hernandez and David Ruiz

11:50-12:30

Discussion session and closing remarks

Event Venue: Verona, Italy

You can find all the information on how to reach the conference location on the CAiSE main website.