1st Workshop on Generation of Synthetic Datasets for Information Systems (GenSyn)

16th June 2025, Vienna, Austria

About The Event

Data is crucial for deploying and evaluating advanced information systems. However, usable benchmarks are rarely available, and collecting sufficient data to construct new benchmarks is challenging due to privacy, scarcity, and legal concerns. Synthetic datasets have become valuable for machine learning tasks, as well as for supporting data-driven applications, AI, IoT, Digital Twins, and business process models. Synthesizing representative and useful datasets is an elaborate process. The GenSyn workshop aims to discuss both generative AI and classical techniques for creating synthetic datasets for AI and non-AI information systems, and to present state-of-the-art tools and approaches.

When

June 16th, 2025

Call for papers

This workshop is a dedicated forum to encourage the exploration of how synthetic datasets can be integrated across diverse information system engineering (ISE) contexts. This is a developing field and a promising application area for AI, where many approaches are still exploratory and not yet mature enough for publication at the main track. Areas of interest include, but are not limited to:

  • Synthetic data for AI and Machine Learning in ISE: Generating high-quality synthetic data to support machine learning applications within ISE, including deep learning, natural language processing, and generative AI models;
  • Methods, techniques, and algorithms to generate synthetic data for IS, ranging from AI/ML generative models to traditional frameworks, e.g., model-based tools;
  • Synthetic data generation to support data management systems, e.g., DBMS and knowledge graphs;
  • Process Automation and Mining with Synthetic Data: Utilizing synthetic datasets to improve data-driven applications, process mining, and business process modeling and simulation;
  • Synthetic datasets for real-time applications: Creating synthetic datasets to conceive digital twins, simulate real-time data streams in IoT systems, and provide data for advanced driver assistance systems (ADAS);
  • Data-driven compliance and governance: Addressing regulatory and privacy concerns through synthetic data generation to support decision-making and compliance in sensitive systems, e.g., eGovernment and healthcare;
  • Evaluation of Synthetic Data in Real-world Contexts: Developing benchmarks and methodologies to validate the quality, diversity, sustainability, and realism of synthetic datasets in business intelligence and industry-specific systems;
  • Case studies and experience reports from academia and industry

Important dates

  • Full paper submissions: March 21st, 2025 (extended from March 14th, 2025)
  • Notification of acceptance: April 7th, 2025
  • Camera-ready copies: April 14th, 2025
  • Author registration: April 14th, 2025
  • Workshop date: June 16th, 2025

Submission process

Submissions to the presentation-oriented workshop must conform to the Springer LNCS/LNBIP format, and the page limits include references, figures, tables, and appendixes. Following the Springer guidelines, we accept the following types of submissions:

  • Full papers up to 12 pages, to present mature works with a rigorous evaluation
  • Short papers of 6 to 8 pages, to present early-stage works with a lightweight evaluation or proof-of-concept

Papers should be submitted as PDF, following Springer’s LNCS format. Submissions that do not conform to the LNCS format or the page limits, or that are obviously out of the scope of the workshop, will be rejected without review. For Springer’s LNCS format, see the guidelines provided at Springer’s site.

To submit your paper, please use the EasyChair link and select the entry Workshops - GENSYN.

The results described in the submitted paper must be unpublished and must not be under review elsewhere. Three to five keywords characterizing the paper should be listed at the end of the abstract. The selected papers will be discussed with the paper reviewers as well as during the program board meeting. As the review process is not blind, please indicate your name and affiliation when you submit. According to the Springer standards, the overall acceptance rate cannot exceed 45%-50%.

Keynote

Speaker

Paul Tiwald, Mostly AI

Data Without Barriers: Synthetic Data as a Catalyst for Responsible Innovation

Abstract: The ability to access and use high-quality data is becoming a key enabler — and bottleneck — for innovation across AI and digital systems. Yet privacy, regulation, and data scarcity continue to limit what organizations and researchers can do. At MOSTLY AI, we believe that synthetic data generation is a key element in enabling responsible, inclusive, and scalable data-driven innovation. In this keynote, I’ll introduce the broader vision behind our work: promoting data democratization, with synthetic data playing a central role. I’ll walk through how generative AI models can be used to synthesize rich, realistic tabular datasets, and how these can be safely shared and used across use cases—from AI model development and testing to fairness research, simulation, and beyond. The session will include a live walkthrough of our open-source tools, showcasing how easy and accessible synthetic data generation can be today — and inviting the audience to follow along.

Bio: Paul Tiwald is Principal AI Researcher at MOSTLY AI, a Vienna-based pioneer in privacy-preserving synthetic data. After leading the company’s AI research team for several years, Paul now focuses on advancing open-source innovation, driving hands-on research, and sharing the potential of synthetic data with broader audiences. With a PhD in theoretical physics and deep expertise in generative modeling, he’s passionate about building ethical, accessible, and high-quality data solutions for the real world.

Organizers

Claudio Di Sipio

Post-doc researcher, University of L'Aquila

Arianna Fedeli

Post-doc researcher, Gran Sasso Science Institute

Riccardo Rubei

Post-doc researcher, University of L'Aquila

Eduard Kamburjan

Assistant professor, IT University of Copenhagen

PC members

  • Chenyu Wang, Singapore Management University, Singapore
  • Christophe Debruyne, University of Liege, Belgium
  • David Manrique Negrin, Eindhoven University of Technology, Netherlands
  • Guang Yang, Singapore Management University, Singapore
  • Hong Jin Kang, Singapore Management University, Singapore
  • José Antonio Hernández López, Linköping University, Sweden
  • Martin Weyssow, Singapore Management University, Singapore
  • Shaukat Ali, Simula Lab, Norway
  • Stefan Klikovits, Johannes Kepler Universität, Austria
  • Vittoriano Muttillo, University of Teramo, Italy
  • Xin Zhou, Singapore Management University, Singapore

Program

Program schedule (all times relate to local time in Vienna)

Opening

Keynote Paul Tiwald

Data Without Barriers: Synthetic Data as a Catalyst for Responsible Innovation

Paper session Hooman Tavakoli Ghinani, Nimesh Singh, Tatjana Legler, Achim Wagner and Martin Ruskowski

“Synthetic Data and Active Learning for Efficient Object Detection”

Event Venue: TU Wien

You can find all the information on how to reach the conference location on the CAiSE main website.