Speaker | Title
| Current research on emotions and on multimodal interfaces at Casa Paganini - InfoMus, as part of EU projects HUMAINE, ENACTIVE, and TAI-CHI (8 August 2007, Wednesday, 14:00)
Dilek Hakkani-Tür, International Computer Science Institute, UC Berkeley, USA | Spoken Language Understanding for Conversational Systems (8 August 2007, Wednesday, 16:00)
Leonello Tarabella, Research Area of the National Research Council (CNR), Italy | Gesture touchless live computer music - computerART project of ISTI-C.N.R.: music and visual art by computer (1 August 2007, Wednesday, 14:00)
Gaël Richard, ENST (Télécom Paris), France | 1. An overview of audio indexing (19 July 2007, Thursday, 14:00); 2. Transcription and separation of drum signals from polyphonic music (20 July 2007, Friday, 14:00)
Roland Bammer, PhD
Stanford University, LUCAS MRS/I Center, Dept. of Radiology, School of Medicine,
Stanford, CA, USA
Clinical Applications and User Interfaces for
DT-MRI Data: Tensor Field Visualization and Interaction
Diffusion tensor imaging (DTI) and its variants provide important diagnostic
information about tissue microstructure that is occult to conventional imaging. DTI
exploits the highly anisotropic proton self-diffusion in white matter fibers;
the orientation of the principal eigenvector of the second-order tensor therefore provides an excellent,
non-invasive surrogate for the orientation of these fibers.
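As a minimal illustration of this relationship (not material from the talk; a hypothetical sketch assuming NumPy and an invented tensor value), the following lines compute the principal eigenvector and the fractional anisotropy of a single second-order diffusion tensor:

```python
import numpy as np

# Hypothetical diffusion tensor for one white-matter voxel (units: mm^2/s).
# A real DTI data set stores one such symmetric 3x3 tensor per voxel.
D = np.array([[1.7e-3, 0.1e-3, 0.0],
              [0.1e-3, 0.3e-3, 0.0],
              [0.0,    0.0,    0.3e-3]])

# Eigen-decomposition of the symmetric tensor.
eigvals, eigvecs = np.linalg.eigh(D)
principal_dir = eigvecs[:, np.argmax(eigvals)]  # surrogate for local fiber orientation

# Fractional anisotropy: normalized spread of the eigenvalues.
md = eigvals.mean()  # mean diffusivity
fa = np.sqrt(1.5 * np.sum((eigvals - md) ** 2) / np.sum(eigvals ** 2))

print("principal eigenvector:", principal_dir)
print("fractional anisotropy:", round(float(fa), 3))
```

Voxel-wise maps of such scalar metrics (FA, mean diffusivity) are among the quantities the visualization tools discussed below must present.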
Clinically, DTI promises to be of great utility for better understanding the
pathophysiology of diffuse white matter abnormalities in a wide variety of
diseases, such as multiple sclerosis, schizophrenia, dyslexia, autism, and traumatic
brain injury, but also for focal abnormalities, such as tumors. For the
latter, the directional information obtained with DTI can be used to study the
involvement of important fiber tracts and can therefore facilitate surgical
planning. Visualization of fiber tracts, or tractography, has also gained attention
in the neuroscience community, where it is used to associate functional
connectivity (via fMRI) with anatomical connectivity (via DTI tracking).
However, the multidimensionality of the diffusion tensor itself and of the tracking
results poses challenges when it comes to presenting these data to clinicians
or neuroscientists for interpretation or quantification. Graphical user interfaces
and software tools for medical image analysis have so far focused on 2D
or 3D data sets; perception of and interaction with these multi-dimensional
data are therefore difficult and still in their infancy.
The objective of this presentation is to provide a general overview of user
interfaces for the acquisition and presentation of MRI data, tools for visualizing
scalar metrics of the diffusion tensor, and state-of-the-art methods for DTI
tractography. This will be followed by a discussion of current concepts for presenting
and interacting with tracking data, their respective strengths and weaknesses,
and of how these tools could be tailored to be more efficient in
a clinical setting.
Giovanna Varni
InfoMus Lab, Casa Paganini Intl Centre of Excellence, DIST, University of Genoa,
Italy
The EyesWeb XMI open platform for multimodal
interaction
This seminar presents an overview of the architecture and main technical features
of EyesWeb XMI (for eXtended Multimodal Interaction), a hardware and software
platform for real-time multimodal processing of multiple data streams. The
platform originates from the previous EyesWeb platform and is the result
of three years of work on a new conceptual model, design, and implementation.
The main focus of EyesWeb XMI is on multimodality and cross-modality, in order to
enable a deeper, more natural, and experience-centric approach to human-computer
interaction. In this framework, a crucial goal was to improve the synchronization
and processing of several different data streams. Concrete scenarios and interactive
systems built as EyesWeb XMI applications will be shown during the seminar.
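The seminar will present the platform itself; purely as a hypothetical illustration of what timestamp-based synchronization of heterogeneous streams involves (this is not the EyesWeb XMI API, and the sample rates and values are invented), the sketch below resamples two streams with different rates onto a common clock so that downstream processing always receives temporally aligned pairs:

```python
import bisect

# Hypothetical samples: (timestamp in seconds, value). In a real multimodal
# setup these might come from a camera (~25 Hz) and an accelerometer (~100 Hz).
video_stream = [(i * 0.04, f"frame{i}") for i in range(25)]
accel_stream = [(i * 0.01, i * 0.01 * 9.8) for i in range(100)]

def nearest(stream, t):
    """Return the sample in `stream` whose timestamp is closest to t."""
    times = [ts for ts, _ in stream]
    i = bisect.bisect_left(times, t)
    candidates = stream[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))

# Resample both streams on a common 10 Hz clock.
for k in range(10):
    t = k * 0.1
    frame = nearest(video_stream, t)
    accel = nearest(accel_stream, t)
    print(f"t={t:.1f}s  video={frame[1]}  accel={accel[1]:.2f}")
```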
Antonio Camurri, PhD
InfoMus Lab, Casa Paganini Intl Centre of Excellence, DIST, University of Genoa,
Italy
Research projects on multimodal interfaces and
emotion at Casa Paganini – InfoMus
The seminar introduces research at InfoMus Lab on multimodal interfaces for
non-verbal expressive communication: experience-centric multimedia systems
able to interpret the high-level information conveyed by users through their
non-verbal expressive gesture and to establish an effective dialogue with users,
taking emotional and affective content into account. The seminar addresses research
issues in the design of multimodal interactive systems, including the following:
multimodal analysis, i.e., approaches and techniques for extracting high-level
non-verbal information from the expressive gesture performed by users, and the interaction
strategies that such systems should apply in the dialogue with users;
the emergence of novel interface paradigms, e.g., tangible acoustic interfaces;
and research on emotional interfaces and on the measurement of emotion in subjects exposed
to music stimuli. The seminar will refer to research projects at the InfoMus
Lab (www.infomus.org, www.casapaganini.org)
based on the EyesWeb XMI open software platform (www.eyesweb.org).
Metin Tevfik Sezgin, PhD
University of Cambridge, Computer Laboratory, William Gates Building, 15 JJ
Thomson Avenue, Cambridge CB3 0FD, UK
Temporal Sketch Recognition and Sketch Based
Interfaces
Sketching is a natural mode of interaction used in a variety of settings. For
example, people sketch during early design and brainstorming sessions to guide
the thought process, and when we communicate certain ideas we use sketching as
an additional modality to convey ideas that cannot be put into words. The emergence
of hardware such as PDAs and Tablet PCs has made it possible to capture freehand sketches,
enabling the routine use of sketching as an additional human-computer interaction
modality.
But despite the availability of pen-based information capture hardware, relatively
little effort has been put into developing software capable of understanding
and reasoning about sketches. To date, most approaches to sketch recognition
have treated sketches as images (i.e., static finished products) and have applied
vision algorithms for recognition. However, unlike images, sketches are produced
incrementally and interactively, one stroke at a time, and their processing should
take advantage of this.
In this talk, I will describe ways of doing sketch recognition by extracting
as much information as possible from temporal patterns that appear during sketching.
I will present a sketch recognition framework based on hierarchical statistical
models of temporal patterns. I will show that in certain domains, stroke orderings
used in the course of drawing individual objects contain temporal patterns that
can aid recognition. I will also briefly summarize some of the current work
on sketch-based interfaces at the University of Cambridge Computer Laboratory.
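The framework presented in the talk is based on hierarchical statistical models; as a much-simplified, hypothetical illustration of why stroke ordering carries information (the object classes, stroke labels, and probabilities below are all invented for the example), a first-order Markov chain over stroke primitives can already separate two hypotheses that look identical as finished drawings:

```python
import math

# Hypothetical first-order Markov models over stroke primitives. In a real
# system these transition probabilities would be learned from observed
# drawing orders for each object class.
models = {
    "stick_figure": {("circle", "line"): 0.9, ("line", "line"): 0.8},
    "wheel":        {("circle", "line"): 0.2, ("line", "line"): 0.3},
}
start = {"stick_figure": {"circle": 0.9}, "wheel": {"circle": 0.6}}

def log_likelihood(obj, strokes, eps=1e-3):
    """Score how well an object's drawing-order model explains a stroke sequence."""
    score = math.log(start[obj].get(strokes[0], eps))
    for prev, nxt in zip(strokes, strokes[1:]):
        score += math.log(models[obj].get((prev, nxt), eps))
    return score

observed = ["circle", "line", "line", "line", "line"]  # head first, then limbs
for obj in models:
    print(obj, round(log_likelihood(obj, observed), 2))
# The stick-figure model assigns a higher likelihood because its transition
# probabilities favor this drawing order; an image-based recognizer sees only
# the finished strokes and ignores this temporal evidence.
```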
Dilek Hakkani-Tür, PhD
International Computer Science Institute, UC Berkeley, Berkeley, CA, USA
Spoken Language Understanding in Conversational
Systems
Understanding language is about extracting the "meaning" from natural
language input. One of the biggest challenges of spoken language understanding
is the nature of naturally spoken language, which varies greatly orthographically
and incorporates prosody and syntax. The same meaning can be expressed in many
different surface forms, and the same surface form can express many different
meanings. Another challenge for spoken language understanding is robustness
to noise in the input, which results from errors in the speech recognizer output
and from disfluencies in spontaneously spoken language. Furthermore, one has
to deal with the lack of typographic cues, such as paragraphs and punctuation,
in the speech recognizer output.
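To make the first challenge concrete, the toy example below (my own illustration for this summary, not from the talk; the utterances and patterns are invented, and a real system would learn such mappings from data rather than from hand-written rules) maps several surface forms of the same request onto one frame-style meaning representation:

```python
import re

# Hypothetical utterances: three surface forms, one intended meaning.
utterances = [
    "i want to fly from boston to denver on monday",
    "show me monday flights boston to denver please",
    "uh book a uh flight to denver leaving boston monday",  # with disfluencies
]

def understand(text):
    """Very rough pattern-based 'understanding': an intent plus slot values."""
    frame = {"intent": "find_flight", "origin": None, "destination": None, "date": None}
    if m := re.search(r"from (\w+)", text):
        frame["origin"] = m.group(1)
    if m := re.search(r"to (\w+)", text):
        frame["destination"] = m.group(1)
    if m := re.search(r"\b(monday|tuesday|wednesday|thursday|friday)\b", text):
        frame["date"] = m.group(1)
    return frame

for u in utterances:
    print(understand(u))
# All three map to destination=denver and date=monday, but the origin slot is
# missed in two phrasings; this surface variability is exactly why data-driven
# approaches are preferred over hand-written patterns.
```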
In this talk, I will mainly summarize previous work attacking these challenges
using data-driven approaches. I will briefly present related work on domain-dependent
and domain-independent meaning representations, and then describe the state of the art
for some of the most popular language understanding tasks.
Leonello Tarabella, PhD
Research Area of the National Research Council (CNR), Via Moruzzi 1, 56124 Pisa,
Italy
Gesture touchless live computer music
computerART project of ISTI-C.N.R.: music and visual art by computer
Here I present the practical results of my research in interactive/improvised
electro-acoustic music, for which I have developed both hardware and software tools.
The research as a whole has its roots in my active experience in jazz music.
My proposal emphasizes the importance of expressiveness and feeling in live
computer music performance. Two different original gesture recognition devices
and systems, or hyper-instruments, are described (PalmDriver and Handel), together
with the "pCM" real-time music language, based on the C language, for sound
synthesis and event management.
1) The PalmDriver hyper-instrument is an electronic device based on IR technology
which consists of 2 sets of 8 infrared sensors that measure the distance of the different zones
of the palms of the hands. The PalmDriver is stable and responsive. As a consequence,
sounds generated by the computer evoke in the performer the sensation of “touching
and modelling the sound”.
2) Image processing technology has been used to realize the Handel System hyper-instrument:
a CCD camera is connected to a video grabber card, and the digital image is then
analyzed by reconstructing an image consisting of those pixels whose luminance
is greater than a predefined threshold and whose color matches a predefined value. On the basis of Handel, the
Imaginary Piano has been realized; the extracted information is used for controlling algorithmic
compositions rather than for playing scored music.
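As a rough, hypothetical sketch of the thresholding idea only (not the actual Handel implementation, and using a synthetic NumPy frame instead of a CCD camera), the following lines keep the bright pixels of an image and locate their centroid, the kind of low-level information that can then be mapped to musical control:

```python
import numpy as np

# Synthetic 120x160 grayscale frame: dark background with a bright "hand" blob.
frame = np.zeros((120, 160))
frame[40:70, 90:120] = 0.9  # the hand region stands out in luminance

# Keep only pixels whose luminance exceeds a predefined threshold.
threshold = 0.5
mask = frame > threshold

# Centroid of the bright region: a simple control parameter for the music.
ys, xs = np.nonzero(mask)
if xs.size:
    cx, cy = xs.mean() / frame.shape[1], ys.mean() / frame.shape[0]
    print(f"hand centroid (normalized): x={cx:.2f}, y={cy:.2f}")
```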
3) For composing and performing interactive computer music with the hyper-instruments,
I have realized a framework based on pure C programming: pure-C-Music, or
pCM. This programming framework makes it possible to write a piece of music
in terms of synthesis algorithms, a score, and the management of data streaming from
external interfaces. The composition itself is a C program, which mainly consists
of Score and Orchestra parts. An object-oriented approach is used
for defining instruments: an instrument is declared once, like a class, and then instantiated as many
times as wanted. Everything is compiled into machine code that runs at full CPU speed.
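pCM itself is plain C compiled to machine code; purely as a conceptual analogue of the Orchestra/Score split and of instruments instantiated several times (a Python sketch of my own for this summary, not pCM code), the structure looks roughly like this:

```python
# Conceptual analogue of an Orchestra/Score split (not pCM itself).

class SineInstrument:
    """Orchestra part: an instrument defined once, instantiated many times."""
    def __init__(self, freq):
        self.freq = freq

    def play(self, start, dur):
        print(f"sine {self.freq:6.1f} Hz  start={start:.2f}s  dur={dur:.2f}s")

# Two instances of the same instrument ("instanced as many times as wanted").
voice_a = SineInstrument(440.0)
voice_b = SineInstrument(660.0)

# Score part: timed events, which in a live setting could also be driven by
# data streaming from external interfaces such as the hyper-instruments.
score = [
    (0.0, voice_a, 1.0),
    (0.5, voice_b, 0.5),
    (1.0, voice_a, 2.0),
]
for start, voice, dur in score:
    voice.play(start, dur)
```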
- I propose a presentation of the above-mentioned hyper-instruments and the
pCM language.
- I also propose a live performance using my tools and systems.
Gaël Richard, PhD
ENST (Télécom Paris), 75014 Paris, FRANCE
Talk 1 Title:
An overview of audio indexing
Summary: The enormous amount of unstructured digital audio
(and, more generally, multimedia) data available nowadays, and the spread of its
use as a data source in many applications, are introducing new challenges to
researchers in information and signal processing. The need for content-based
audio indexing and retrieval techniques that make audio information more
readily available to the user is becoming ever more critical. The purpose of
this talk is to provide an overview of some approaches to audio indexing, with
a focus on music signal processing.
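As a minimal, hypothetical example of the kind of content-based description such indexing relies on (not taken from the talk; it assumes the third-party librosa library and a placeholder file name), the sketch below summarizes an audio file by the mean and spread of its MFCCs, a compact feature vector that can then be indexed or compared:

```python
import numpy as np
import librosa  # third-party library, assumed installed

# Load any local audio file (the path is a placeholder) and compute 13 MFCCs.
y, sr = librosa.load("example.wav", sr=22050, mono=True)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape: (13, n_frames)

# A crude content-based "signature": per-coefficient mean and standard deviation.
signature = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])
print("feature vector length:", signature.shape[0])  # 26 numbers per file

# Signatures of two files can be compared with, e.g., Euclidean distance,
# which is a minimal building block of a retrieval system.
```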
Talk 2 Title: Transcription and separation of
drum signals from polyphonic music.
Summary: The purpose of this talk is to present current research
directions in audio indexing conducted at GET-ENST. After a brief introduction
to our subspace-based signal analysis framework, several aspects of audio indexing,
such as feature selection and harmonic/noise decomposition, will be illustrated
in the context of drum signal processing (drum signal separation and transcription).
The talk will be concluded with a demonstration of audio post-remixing (with
an enhanced or reduced drum track) and, if time remains, with a demonstration of
a drum-loop retrieval system driven by vocal queries. The content of this talk will
be largely based on a paper recently accepted for publication [1].
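The talk describes a subspace-based approach developed at GET-ENST; as a simpler, generic illustration of harmonic/percussive decomposition and drum-reduced remixing (a different, median-filtering-based technique, using the third-party librosa and soundfile libraries and a placeholder file name), a sketch could look like this:

```python
import librosa          # third-party library, assumed installed
import soundfile as sf  # for writing the remixed result

# Load a polyphonic music excerpt (placeholder path).
y, sr = librosa.load("song.wav", sr=None, mono=True)

# Median-filtering-based harmonic/percussive separation, NOT the subspace
# method discussed in the talk.
y_harmonic, y_percussive = librosa.effects.hpss(y)

# "Post-remix" with a reduced drum track: keep the harmonic layer and
# attenuate the percussive layer by about 12 dB before summing.
remix = y_harmonic + 0.25 * y_percussive
sf.write("song_reduced_drums.wav", remix, sr)
```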
[1] O. Gillet and G. Richard, "Transcription and separation of drum signals
from polyphonic music," accepted for publication in IEEE Transactions on Audio,
Speech, and Language Processing, Special Issue on Music Information Retrieval,
2007.