Scientific Program

Conference Series Ltd invites participants from across the globe to attend the 3rd Global Summit and Expo on Multimedia & Artificial Intelligence at the Holiday Inn Lisbon – Continental, Lisbon, Portugal.

Day 1:

Keynote Forum

Ching Y Suen

Concordia University, Canada

Keynote: How well can computers recognize handwriting?

Time: 10:05-10:35

Biography:

Ching Y. Suen is the Director of CENPARMI and the Concordia Honorary Chair on AI & Pattern Recognition. He received his Ph.D. from UBC (Vancouver) and his Master's degree from the University of Hong Kong. He has served as Chairman of the Department of Computer Science and as Associate Dean (Research) of the Faculty of Engineering and Computer Science at Concordia University. Prof. Suen has served numerous national and international professional societies as President, Vice-President, Governor, and Director. He has given 45 invited/keynote papers at conferences and 200 invited talks at industries and academic institutions around the world, and has been the Principal Investigator or Consultant of 30 industrial projects. His research has been funded by the ENCS Faculty and the Distinguished Chair Programs at Concordia University, FCAR (Quebec), NSERC (Canada), the National Networks of Centres of Excellence (Canada), the Canada Foundation for Innovation, and industrial sectors in various countries, including Canada, France, Japan, Italy, and the United States. Currently, he is the Editor-in-Chief of the journal Pattern Recognition, an Adviser or Associate Editor of 5 journals, and Editor of a new book series on Language Processing and Pattern Recognition; he has previously served as Editor-in-Chief, Associate Editor, or Adviser of 5 other journals. He founded three conferences (ICDAR, IWFHR/ICFHR, and VI), has organized numerous international conferences including ICPR, ICDAR, ICFHR, and ICCPOL, and has served as Honorary Chair of many others.

Abstract:

Handwriting is one of the most important media of human communication. We write and read every day. Though handwriting can vary considerably in style and neatness, we recognize handwritten materials easily. Humans develop their writing skill in childhood and gradually refine it throughout their lives. This paper examines the ways humans write (from primary school to adult writing) and ways of teaching computers to recognize what they produce, from ancient sources (such as carved scripts, old books and documents) to modern ones (such as immigration port-of-entry forms, cheques, payment slips, envelopes, and different kinds of notes and messages). Methods such as machine learning and deep classifier structures, and features such as the extraction of space and margins, slant and line direction, width and narrowness, and stroke connections and disconnections, will be analyzed over large quantities of data. Both training procedures and learning principles will be presented to illustrate methodologies for enabling computers to achieve robust recognition rates in practical applications in the office and on mobile phones. In addition, the art and science of graphology will be reviewed, and techniques for computerizing graphology will be illustrated with interesting examples.
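
A minimal sketch (not from the talk) of one classic preprocessing step behind the slant feature mentioned above: estimating handwriting slant from image moments and shearing it away, as done in MNIST-style pipelines. OpenCV and a binarized grayscale image (ink as white, 255) are assumed.

```python
import cv2
import numpy as np

def estimate_slant(img: np.ndarray) -> float:
    """Estimate slant as the mu11/mu02 central-moment ratio of the ink."""
    m = cv2.moments(img)
    if abs(m["mu02"]) < 1e-2:          # almost no vertical extent: treat as unslanted
        return 0.0
    return m["mu11"] / m["mu02"]

def deskew(img: np.ndarray) -> np.ndarray:
    """Shear the image horizontally to remove the estimated slant."""
    skew = estimate_slant(img)
    h, w = img.shape
    M = np.float32([[1, skew, -0.5 * h * skew], [0, 1, 0]])
    return cv2.warpAffine(img, M, (w, h),
                          flags=cv2.WARP_INVERSE_MAP | cv2.INTER_LINEAR)
```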
 

Keynote Forum

Jean-Marc Ogier

University of La Rochelle, France
Biography:

Jean-Marc Ogier received his PhD degree in computer science from the University of Rouen, France, in 1994. During this period (1991-1994), he worked on graphics recognition for the Matra MS&I Company. He was an associate professor at the University of Rennes 1 from 1994 to 1998 and at the University of Rouen from 1998 to 2001. Now a full professor at the University of La Rochelle, Prof. Ogier headed the L3I laboratory (the computer science research lab of the University of La Rochelle), which gathers more than 120 members and works mainly on document analysis and content management. Author of more than 200 publications/communications, he has managed several French and European projects dealing with document analysis, with both public institutions and private companies. Prof. Ogier was Deputy Director of the GDR I3 of the French National Research Centre (CNRS) between 2005 and 2013. He was Chair of Technical Committee 10 (Graphics Recognition) of the International Association for Pattern Recognition (IAPR) from 2010 to 2015, is the representative member of France on the governing board of the IAPR, and is now general chair of IAPR Technical Committee 6, dealing with computational forensics. He has been general chair or program chair of several international scientific events dealing with document analysis (DAS, ICDAR, GREC, ...). He was also Vice-Rector of the University of La Rochelle from 2005 to 2016 and President of the VALCONUM association, which aims at fostering relations between industry and research organizations. He is now President of the University of La Rochelle.

Abstract:

Document engineering is the area of knowledge concerned with principles, tools and processes that improve our ability to create, manage, store, compact, access, and maintain documents. The fields of document recognition and retrieval have grown rapidly in recent years. This development has been fueled by the emergence of new application areas such as the World Wide Web (WWW), digital libraries, computational forensics for document processing, and video- and camera-based OCR. This talk will address some recent developments in the area of document processing.

Keynote Forum

Jiankun Hu

University of New South Wales, Australia

Keynote: Multimedia security in big data era: Advances and open questions

Time: 10:00-10:40

Biography:

Jiankun Hu is a full Professor at the School of Engineering and IT, University of New South Wales, Canberra, Australia. He worked at Ruhr University, Germany, on a German Alexander von Humboldt Fellowship (1995-1996), as a research fellow at Delft University in the Netherlands (1997-1998), and as a research fellow at Melbourne University, Australia (1998-1999). Jiankun's main research interest is in the field of cyber security, including image processing/forensics and machine learning, where he has published many papers in high-quality journals. He has served on the editorial boards of up to 7 international journals, including the top venue IEEE Transactions on Information Forensics and Security, and has served as Security Symposium Chair of the IEEE flagship conferences IEEE ICC and IEEE Globecom. He has also served on the prestigious Panel of Mathematics, Information and Computing Sciences (MIC) of the ARC ERA (Excellence in Research for Australia) Evaluation Committee.

Abstract:

Multimedia content (audio, video, and images) dominates Internet traffic and constitutes major applications in our daily life. Multimedia security has long been a concern in the community due to issues such as movie and music piracy and the privacy of medical images. Conventional cryptography can be applied to multimedia, but additional challenges, such as real-time operation and efficiency, must be addressed. With emerging multimedia applications such as peer-to-peer streaming and 3D data, multimedia is embracing a big data era where security and privacy face new challenges. Most multimedia security surveys have focused on issues related to data communication. In the big data era, cloud computing is becoming a major platform that needs to attract due attention in the multimedia community. This keynote speech will report on advances in the field, and open research questions will be discussed. The focus will be placed on efficient algorithms for emerging multimedia applications such as 3D imaging, and on cloud-based applications including digital rights management.

Keynote Forum

Keynote speaker slots available

• Virtual Reality | Animation and Simulations | Computer Vision & Pattern Recognition | Computer Graphics & Applications | Image Processing | Artificial Intelligence | 3D analysis, representation and printing

Session Introduction

David Xu

Regent University, USA

Title: Maya blend shapes for 3D Facial Animation
Speaker
Biography:

David Xu is a tenured associate professor at Regent University, specializing in computer 3D animation and movie special effects. He received an MFA in Computer Graphics (3D Animation) from Pratt Institute in New York. He has served as a senior 3D animator at Sega, Japan; a senior CG special-effects artist at Pacific Digital Image Inc., Hollywood; and a professor of animation at several colleges and universities, where he developed 3D animation programs and curricula. He has been a committee member of the computer graphics organization SIGGRAPH Electronic Theater, where he was recognized with an award for his work.

Abstract:

Blend shapes, also known as morph target animation, are a powerful way of deforming geometry, such as a human face, to create various facial expressions, from happy to sad. In this presentation, after an overview of Autodesk Maya blend shapes, their features and workflow, Professor Xu will demonstrate and discuss how to create a more efficient workflow by combining blend shapes with Maya's Set Driven Key feature, and how to create more complex facial animation with advanced blend shape techniques and features. Concepts and techniques such as the blend shape deformer, the Blend Shape node, the Tweak node, Set Driven Key, morph target animation, vertex positions, and more will be introduced and demonstrated. Finally, Professor Xu will discuss the advantages and disadvantages of using morph target animation over skeletal animation in 3D facial animation.
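
A minimal sketch of the workflow described above, assuming Autodesk Maya's Python API (maya.cmds) and two existing meshes; the names 'baseHead' and 'smileTarget' are hypothetical placeholders.

```python
import maya.cmds as cmds

# Blend shape deformer: morphs baseHead toward the smileTarget sculpt.
blend = cmds.blendShape('smileTarget', 'baseHead', name='faceBlend')[0]

# A simple controller curve with a keyable 0..1 'smile' attribute.
ctrl = cmds.circle(name='faceCtrl', normal=(0, 1, 0))[0]
cmds.addAttr(ctrl, longName='smile', attributeType='double',
             minValue=0.0, maxValue=1.0, keyable=True)

# Set Driven Key: the controller attribute drives the blend-shape weight,
# so animators key a single attribute instead of the deformer directly.
for driver_value, weight in [(0.0, 0.0), (1.0, 1.0)]:
    cmds.setDrivenKeyframe(blend + '.smileTarget',
                           currentDriver=ctrl + '.smile',
                           driverValue=driver_value, value=weight)
```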

Chengjiang Long

Kitware, USA

Speaker
Biography:

Chengjiang Long received his Ph.D. degree from Stevens Institute of Technology in 2015. Currently, he is a computer vision researcher on the computer vision team at Kitware, a leader in the creation and support of state-of-the-art technology, providing robust solutions to academic and government institutions, such as DARPA, IARPA, and the Army, as well as private corporations worldwide. Prior to joining Kitware, he worked at NEC Labs America and GE Global Research in 2013 and 2015, respectively. To date, he has published more than 20 papers in reputed international journals and conferences, and has been serving as a reviewer for top international journals (e.g., TIP, MM, MVAP and TVCJ) and conferences (e.g., ICCV, ECCV, ACCV, BMVC and ICME).

Abstract:

Active learning is an effective way to relieve the tedious work of manual annotation in many applications of visual recognition. The vast majority of previous works, if not all of them, focus on active learning with a single human oracle. The problem of active learning with multiple oracles in a collaborative setting has not been well explored. Moreover, most previous works assume that the labels provided by the human oracles are noise-free, an assumption that is often violated in reality.

To address these issues, we propose two models (a distributed multi-labeler active learning model and a centralized multi-labeler active learning model) for collaborative active visual recognition from crowds, exploring how to effectively model labelers' expertise in a crowdsourcing labeling system to build better visual recognition models. Both models are not only robust to label noise but also provide a principled label-quality measure for detecting irresponsible labelers online. We also extended the centralized multi-labeler active learning model from binary to multi-class cases and incorporated the idea of reinforcement learning to actively select both informative samples and high-quality annotators, which better explores the trade-off between exploitation and exploration.

Our collaborative active learning models have been validated on real-world visual recognition benchmark datasets. The experimental results strongly demonstrate the validity and efficiency of the two proposed models.
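
Not the proposed models themselves, but a compact sketch of two ingredients the abstract combines: uncertainty-based sample selection and a running per-labeler reliability score (all names and thresholds are illustrative), using scikit-learn.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_most_uncertain(model: LogisticRegression, X_pool: np.ndarray, k: int = 5):
    """Pick the k pool samples whose predicted positive probability is closest to 0.5."""
    proba = model.predict_proba(X_pool)[:, 1]
    return np.argsort(np.abs(proba - 0.5))[:k]

def update_reliability(reliability: dict, labeler_id: str, label: int,
                       consensus: int, lr: float = 0.1) -> dict:
    """Nudge a labeler's score toward 1 on agreement with the consensus, toward 0
    otherwise; persistently low scores flag irresponsible labelers online."""
    agree = float(label == consensus)
    reliability[labeler_id] = (1 - lr) * reliability.get(labeler_id, 0.5) + lr * agree
    return reliability
```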

Mort Naraghi-Pour

Louisiana State University, USA

Speaker
Biography:

Mort Naraghi-Pour received his Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 1987. Since August 1987, he has been with the School of Electrical Engineering and Computer Science, Louisiana State University, Baton Rouge, where he is currently the Michel B. Voorhies Distinguished Professor of Electrical Engineering. From June 2000 to January 2002, he was a Senior Member of Technical Staff at Celox Networks, Inc., a network equipment manufacturer in St. Louis, MO. Dr. Naraghi-Pour received the best paper award from WINSYS 2007 for a paper co-authored with his student, Dr. X. Gao. Dr. Naraghi-Pour’s research and teaching interests include wireless communications, broadband networks, information theory, and coding. He has served as a Session Organizer, Session Chair, and member of the Technical Program Committee for numerous national and international conferences.

Abstract:

In ensemble systems, several experts, which may have access to possibly different data, make decisions that are then fused by a combiner (meta-learner) to obtain a final result. Such ensemble-based systems are well suited for processing big data from sources such as social media, in-stream monitoring systems, networks, and markets, and they provide more accurate results than single-expert systems. However, most existing ensemble learning techniques have two limitations: i) they are supervised, and hence require access to the true label, which is often unknown in practice; and ii) they are not able to evaluate the impact of the various data features/contexts on the final decision, and hence do not learn which data is required. In this paper we propose a joint estimation-detection method for evaluating the accuracy of each expert as a function of the data features/context and for fusing the experts' decisions. The proposed method is unsupervised: the true labels are not available and no prior information is assumed regarding the performance of each expert. Extensive simulation results show the improvement of the proposed method compared to state-of-the-art approaches. We also provide a systematic, unsupervised method for ranking the informativeness of each feature in the decision-making process.
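
A simplified sketch in the spirit of the unsupervised setting above (a Dawid-Skene-style EM, not the authors' joint estimation-detection method): expert accuracies and soft labels are estimated together, with no ground truth.

```python
import numpy as np

def fuse(votes: np.ndarray, n_iter: int = 20):
    """votes: (n_samples, n_experts) array of binary 0/1 decisions."""
    acc = np.full(votes.shape[1], 0.7)              # initial accuracy guess per expert
    for _ in range(n_iter):
        # E-step: posterior that each sample's true label is 1 (uniform prior)
        like1 = np.prod(np.where(votes == 1, acc, 1 - acc), axis=1)
        like0 = np.prod(np.where(votes == 0, acc, 1 - acc), axis=1)
        post = like1 / (like1 + like0)
        # M-step: each expert's accuracy = expected agreement with the soft labels
        acc = (votes * post[:, None] + (1 - votes) * (1 - post[:, None])).mean(axis=0)
    return post, acc
```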

Maria Jose Arrojo

University of A Coruña, Spain

Speaker
Biography:

Maria Jose Arrojo is a reader at the Department of Humanities of the University of A Coruña and is accredited as a Titular Professor in the field of Communication Sciences. She is a member of the research group that works on the philosophy and methodology of the sciences of the artificial. She has published papers analyzing communication sciences from the perspective of the sciences of the artificial, in general, and the design sciences, in particular. Arrojo has worked on research projects on bounded rationality and the sciences of design supported by the Spanish Ministry responsible for scientific research and technological innovation. Since 2003 she has been co-director of the postgraduate studies in Audiovisual Production and Management at the University of A Coruña. She was an advisor to the Institute for Foreign Trade of the Government of Spain on strategic issues in the internationalization of the audiovisual sector (2005-2011).

Abstract:

The procedures for finding information on the Internet are mediated by three main aspects. First, the mode of representation of knowledge, because the search is guided by conceptual content that shapes the search designs. Second, the specific procedures of heuristics, that is, the patterns or rules that agents or machines must follow so that they are adequately programmed to achieve the results they are looking for. Third, the task of the agents, who select the aims, choose the procedures and evaluate the results.

All of this occurs in the procedures for searching for information on the Internet, and it is made possible by designs based on Artificial Intelligence. The study of these designs requires research from the sciences of the artificial. In this sense, this paper starts from the conception of the Communication Sciences as Applied Sciences of Design, and therefore as Sciences of the Artificial. Thus, the epistemological point of departure is the representation of knowledge that sets aims, which in this case are forms of information on the Internet (YouTube, Snapchat, etc.). Then comes the methodological component: specific ways of finding the information based on the objectives sought with the designs. Thirdly, we have the ontological factor, where agents intervene to achieve the results, based on the representation of knowledge and the search methods.

It is assumed here that, in designing the machine learning procedure, the starting point of the designs lies in Artificial Intelligence. To that end, it is necessary to be clear about what the objectives of the search are, in order to reach the aims set by the design. Only then do the machines accomplish what they have been programmed for. The agents intervene at the beginning, to set the design aims, and at the end, to evaluate the results obtained.

Ghyslain Gagnon

École de technologie supérieure, Canada

Speaker
Biography:

Ghyslain Gagnon received the Ph.D. degree in electrical engineering from Carleton University, Canada in 2008. He is now an Associate Professor at École de technologie supérieure, Montreal, Canada. He is an executive committee member of ReSMiQ and Director of research laboratory LACIME, a group of 10 Professors and nearly 100 highly-dedicated students and researchers in microelectronics, digital signal processing and wireless communications. Highly inclined towards research partnerships with industry, his research aims at digital signal processing and machine learning with various applications, from media art to building energy management.

Abstract:

This peculiar combination of illuminated electro-technical elements honors the intellectual journey of philosopher Gaston Bachelard (1884-1962), who interlaced forward-thinking ideas on the complex interaction of reason and imagination, an important contribution to inspiring a society deeply marked by scientific and artistic creativity. Permanently installed in the main tunnel at École de technologie supérieure, Montreal, Canada, this interactive artwork reminds future engineers of the importance of rational-intuitive bilaterality in any technological innovation.

The animation of the lighting creates routes of running light blobs through the tunnel. Since the lighted tubes share the space with actual electrical and HVAC pipes, the lighting dynamics give the impression of a flow of useful elements (electricity, network data, air) through the building. A microphone is hidden in an electrical box at the center of the tunnel to allow interactive control. A sound recognition algorithm identifies blowing sounds: when users blow into an opening in this electrical box, the flow of light is accelerated, a symbol of the contribution of engineers to such technical systems.
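
The installation's actual detection algorithm is not specified; as a hedged illustration, blowing shows up as broadband noise, so a spectral-flatness-plus-energy test on short audio frames is one plausible cue (the thresholds below are made up).

```python
import numpy as np

def is_blowing(frame: np.ndarray, energy_thresh: float = 0.01,
               flatness_thresh: float = 0.4) -> bool:
    """frame: 1-D array of audio samples in [-1, 1]."""
    energy = np.mean(frame ** 2)
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    # Spectral flatness: geometric mean over arithmetic mean (close to 1 = noise-like)
    flatness = np.exp(np.mean(np.log(spectrum))) / np.mean(spectrum)
    return energy > energy_thresh and flatness > flatness_thresh
```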

The artwork was designed as an innovation platform, allowing students to add elements to the installation in the future for increased interactivity. This platform was successfully tested in 2015 by a team who created a luminous tug-of-war game in the tunnel, with players using their mobile phones as controllers.

The installation was nominated at the Media Architecture Biennale awards ceremony, Sydney, 2016.

Nuria Medina-Medina

University of Granada, CITIC-UGR, Spain

Title: Designing video games
Speaker
Biography:

Nuria Medina received her Ph.D. in computer science from the University of Granada (UGR) in 2004, proposing an adaptive and evolutionary model for hypermedia systems. She currently belongs to the management team of the Research Centre for Information and Communications Technologies (CITIC-UGR) and is a professor at the Department of Computer Languages and Systems of this Spanish university, where she directs a project that brings educational games into Andalusian school classrooms (P11-TIC7486).

Abstract:

The video game industry continues its high rate of growth and, correspondingly, video games are present in most first-world households. Thousands of people around the planet work on the development of games and billions of players enjoy these multimedia creations; however, from an engineering perspective, critical issues in the design of these video games are not being sufficiently considered. Therefore, an academic effort to identify and analyze the keys to good design, and the possible design solutions in each particular context, should be made in relation to the productive and unstoppable world of video games. With this aim, taxonomies, guidelines and design patterns are different approaches on which we have been working. On the other hand, serious games deserve special attention, since the serious purpose involved in the game implies the need to design and integrate non-ludic content, and to collaborate with non-technical professionals throughout the process. In the first case, it is essential to introduce the serious elements into the game so that they remain hidden within the ludic content. In the second case, an adequate language is crucial to facilitate communication between the technical team and the subject-domain experts (educators, doctors, etc.). In particular, our group has been researching how to achieve the indispensable balance between the ludic component and the instructive component in educational video games. As a result, our design methodology establishes a 'divide and conquer' approach where the game challenges and the educational goals are designed and interrelated using graphical notations, which allow the artefacts of the educational video game to be modeled in a form comprehensible to all the stakeholders; a toy sketch of this idea follows. As a case study, an educational adventure to promote reading comprehension has been developed and is being evaluated.
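
An illustrative toy model (hypothetical, not the group's actual notation) of that interrelation: challenges carry ludic content, educational goals stay separate, and the explicit links between them are what both engineers and educators review.

```python
from dataclasses import dataclass, field

@dataclass
class EducationalGoal:
    goal_id: str
    description: str                    # e.g. "infer word meaning from context"

@dataclass
class Challenge:
    name: str
    ludic_content: str                  # what the player actually experiences
    goals: list = field(default_factory=list)   # serious content hidden inside

goal = EducationalGoal("G1", "infer word meaning from context")
door = Challenge("riddle door", "answer a riddle to open the dungeon door", [goal])
print(f"{door.name} covers: {[g.goal_id for g in door.goals]}")
```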

Mounîm A. El-Yacoubi

Telecom SudParis, France

Speaker
Biography:

Mounîm A. El-Yacoubi (PhD, University of Rennes, France, 1996) was with the Service de Recherche Technique de la Poste (SRTP) at Nantes, France, from 1992 to 1996, where he developed software for handwritten address recognition that is still running in automatic French mail-sorting machines. He was a visiting scientist for 18 months at the Centre for Pattern Recognition and Machine Intelligence (CENPARMI) in Montréal, Canada, and then an associate professor (1998-2000) at the Catholic University of Parana (PUC-PR) in Curitiba, Brazil. From 2001 to 2008, he was a Senior Software Engineer at Parascript, Boulder (Colorado, USA), a world-leading company in automatic processing of handwritten and printed documents (mail, checks, forms). Since June 2008, he has been a Professor at Telecom SudParis, University of Paris-Saclay. His main interests include machine learning, human gesture and activity recognition, human-robot interaction, video surveillance and biometrics, information retrieval, and handwriting analysis and recognition.

Abstract:

Human action recognition (HAR) is an active research field, driven by the dramatic price decrease of powerful digital cameras, storage and computing machines, and by the potential of HAR for designing smart engines that make sense of today's ubiquitous video streams. In video surveillance, for instance, HAR can help determine abnormal events in public facilities. For e-health, HAR may be harnessed as an assistive technology for monitoring people with loss of autonomy. Recognizing human actions, however, is challenging. Known variability factors, such as intra-class and inter-class variability, are much more adverse here as they involve additional structural variability that is hard to cope with. Besides, actions are not communicative in general, which hinders relevant information acquisition and tracking. Other variability sources include viewing direction and distance with respect to the acquisition sensor, lighting conditions, etc. In this talk, we review the problem of human action and gesture recognition in general. After discussing the challenges above, we propose a new approach that focuses on modeling the sequential video input. The modeling is based on a two-layer SVM-HCRF (Support Vector Machine / Hidden Conditional Random Field) model in which the SVM acts as a discriminative front-end feature extractor. First, a sliding-window technique segments the video sequence into short overlapping segments, each described by a local Bag-of-Words (BoW) of interest points. A first-layer SVM classifier converts each BoW into a vector of class-conditional probabilities. The sequence of these vectors serves as the input observation sequence to the HCRF for actual human action recognition. We show how this hierarchical modeling optimally combines two different sources of information characterizing motion actions: local motion semantics inferred by the SVM, and long-range motion feature dependencies modeled by the HCRF at a higher level. Finally, we show how these models can be extended to the problem of joint segmentation and recognition of a sequence of actions within a continuous video stream.
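
A minimal sketch of the first layer only (the HCRF stage is omitted): segment-level bag-of-words histograms go through an SVM with probability outputs, yielding the class-conditional probability vectors that would feed the HCRF. The data here is a random placeholder.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_bow = rng.random((200, 50))           # 200 training segments, 50 visual words (dummy)
y = rng.integers(0, 4, 200)             # 4 action classes (dummy labels)

# First-layer SVM: the discriminative front-end feature extractor.
svm = SVC(kernel="rbf", probability=True).fit(X_bow, y)

# One video = a sequence of overlapping segments; its HCRF observation
# sequence is the per-segment vector of class-conditional probabilities.
video_segments = rng.random((30, 50))
observation_seq = svm.predict_proba(video_segments)   # shape (30, 4)
```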

 

Leonel Toledo

Barcelona Supercomputing Center, Spain

Title: Virtual Environments
Speaker
Biography:

Leonel Toledo received his Ph.D. from Instituto Tecnológico de Estudios Superiores de Monterrey, Campus Estado de México, in 2014, where he was a full-time professor from 2012 to 2014. He was an assistant professor and researcher and has devoted most of his research work to crowd simulation and visualization optimization. He has worked at the Barcelona Supercomputing Center using general-purpose graphics processors for high-performance graphics. His thesis work was on levels of detail used to create varied animated crowds. Currently he is a researcher at the Barcelona Supercomputing Center.

Abstract:

The process of designing virtual environments is typically expensive in terms of both resources and processing power. Creating immersive experiences in simulations or video games is a complex process: even though hardware capabilities are constantly increasing, allowing developers to create impressive scenes, sometimes it is not enough. Constant technological advances rely on heavy GPU computation so that developers can represent virtual environments composed of millions of polygons in highly realistic scenes; nevertheless, developers often face an important trade-off between realism and performance. Recently there has been a remarkable increase in the number of middleware packages and frameworks that try to meet the technical requirements of complex 3D scenes. For instance, scenes with several thousand characters are computationally expensive as well as memory-consuming. To attack this problem, several techniques must be implemented, such as level of detail, illumination, collision avoidance, animation transfer, and audio management, just to mention a few. Most approximate rendering algorithms ignore perception, or use early-vision-based perceptual metrics to accelerate performance.
 

Visual perception in computer graphics has received a lot of attention over the past few years. By understanding the limitations of the human visual system, rendering algorithms can be modified to eliminate unnecessary computations and still produce images with no perceivable difference to the observer. For instance, it is known that observers do not require a physically accurate simulation of the illumination in order to perceive a scene as realistic. Optimizing the rendering stage for any given simulation is a complex process, and there are many possible ways to reduce the detail of a geometric mesh, each with different advantages and drawbacks for its implementation on a GPU.
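
A toy sketch of one listed technique, distance-based level of detail: each crowd character picks a mesh resolution from its camera distance, so distant characters cost far fewer polygons (the thresholds are illustrative).

```python
def select_lod(distance_m: float) -> str:
    """Return a mesh level for a character at the given camera distance (meters)."""
    if distance_m < 10.0:
        return "high"      # full-resolution mesh
    if distance_m < 40.0:
        return "medium"    # decimated mesh
    return "low"           # impostor / billboard

print([select_lod(d) for d in (5.0, 25.0, 100.0)])   # ['high', 'medium', 'low']
```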

Vivek Sharma

KU Leuven, Belgium

Speaker
Biography:

Vivek Sharma is a research assistant at KU Leuven working with Prof. Luc Van Gool. His expertise is in multi/hyper-spectral imaging for computer vision. He holds an M.Sc. CIMET degree (2014) and a B.Tech. degree in Computer Science (2011). He has worked at the Karlsruhe Institute of Technology and Fraunhofer IOSB, and held internships at NASA AERONET, Zenrobotics, Tecnalia Robotiker, and the Norwegian Colorlab. He is a member of SPIE, OSA, CVF, and the SpectroNet International Collaboration Forum. He has been involved in organizing several workshops and conferences under the SPIE/OSA and IEEE chapters of KU Leuven and KIT. Outcomes of his research have been published in major machine learning and computer vision conferences and workshops, such as ICML, CVPR, and RO-MAN.

Abstract:

With hyper/multi-spectral sensor technology evolving and becoming more cost-effective, it is likely that we will see these spectral cameras replace standard RGB cameras in a multitude of applications beyond the traditional niches of medical imaging, remote sensing, and precision agriculture [5] in the near future. The rich spectral information contained in hyperspectral images can be used to characterize the objects/materials in the scene with great precision and detail. The availability of more bands than the usual three RGB bands has been shown to be advantageous in disambiguating objects [1,2,3,4,5,6]. Recently, spectral cameras have been deployed in a wide range of applications: in mammography, single-shot spectral imaging is used for detecting breast tumours and developing cancers [8]; similarly, cameras in cars exploit the V-NIR range for pedestrian detection in night vision [1]; and more. In this talk, we will discuss the hyperspectral imaging pipeline from a computer vision [4,7] standpoint, in terms of exploiting multi/hyper-spectral sensor information for more accurate classification, recognition, and detection tasks, so that the imaging system is compact, computationally efficient, cost-effective, and a perfect fit for real-time applications.

Syed Afaq Ali Shah

The University of Western Australia, Australia

Title: Deep learning for Image set based Face and Object Classification
Biography:

Syed Afaq Ali Shah completed his PhD in 3D computer vision (feature extraction, 3D object recognition, reconstruction) and machine learning in the School of Computer Science and Software Engineering (CSSE), University of Western Australia, Perth. He held highly competitive Australian scholarships, including the Scholarship for International Research Fees (SIRF) and the Research Training Scheme (RTS). He has published several research papers in high-impact-factor journals and reputable conferences. Afaq has developed machine learning systems and various feature extraction algorithms for 3D object recognition. He is a reviewer for IEEE Transactions on Cybernetics, the Journal of Real-Time Image Processing, and the IET Image Processing journal.

Abstract:

I shall present a novel technique for image set based face/object recognition, where each gallery and query example contains a face/object image set captured under different viewpoints, backgrounds, facial expressions, resolutions and illumination levels. While several image set classification approaches have been proposed in recent years, most of them represent each image set as a single linear subspace, a mixture of linear subspaces, or a Lie group on a Riemannian manifold. These techniques make prior assumptions about the specific category of geometric surface on which images of the set are believed to lie, which can result in a loss of discriminative information for classification. The proposed technique alleviates these limitations with an Iterative Deep Learning Model (IDLM) that automatically and hierarchically learns discriminative representations from raw face and object images. In the proposed approach, low-level translation-invariant features are learnt by a Pooled Convolutional Layer (PCL), followed by Artificial Neural Networks (ANNs) applied iteratively in a hierarchical fashion to learn a discriminative non-linear feature representation of the input image sets. The proposed technique was extensively evaluated for the task of image set based face and object recognition on the YouTube Celebrities, Honda/UCSD, CMU MoBo and ETH-80 (object) datasets. Experimental results and comparisons with state-of-the-art methods show that our technique achieves the best performance on all these datasets.
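
Not the proposed IDLM, but a minimal sketch of the image-set setting it targets: each gallery or query example is a set of images, summarized here by average-pooling per-image feature vectors before nearest-neighbor matching.

```python
import numpy as np

def set_descriptor(image_features: np.ndarray) -> np.ndarray:
    """image_features: (n_images_in_set, d) -> a single pooled d-vector for the set."""
    return image_features.mean(axis=0)

def classify_set(query_set: np.ndarray, gallery_sets: list, gallery_labels: list):
    """Assign the query set the label of the nearest gallery set descriptor."""
    q = set_descriptor(query_set)
    dists = [np.linalg.norm(q - set_descriptor(g)) for g in gallery_sets]
    return gallery_labels[int(np.argmin(dists))]
```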

David Zhang

Hong Kong Polytechnic University, HK

Title: Advanced Biometrics
Speaker
Biography:

David Zhang graduated in Computer Science from Peking University. He received his MSc in 1982 and his PhD in 1985, both in Computer Science, from the Harbin Institute of Technology (HIT). From 1986 to 1988 he was a Postdoctoral Fellow at Tsinghua University and then an Associate Professor at Academia Sinica, Beijing. In 1994 he received his second PhD, in Electrical and Computer Engineering, from the University of Waterloo, Ontario, Canada. He has been a Chair Professor since 2005 at the Hong Kong Polytechnic University, where he is the Founding Director of the Biometrics Research Centre (UGC/CRC), supported by the Hong Kong SAR Government since 1998. He also serves as Visiting Chair Professor at Tsinghua University, and Adjunct Professor at Peking University, Shanghai Jiao Tong University, HIT, and the University of Waterloo. He is Founder and Editor-in-Chief of the International Journal of Image and Graphics (IJIG); Founder and Series Editor of the Springer International Series on Biometrics (KISB); organizer of the 1st International Conference on Biometrics Authentication (ICBA); and Associate Editor of more than ten international journals, including IEEE Transactions. So far, he has published over 10 monographs and 400 journal papers, and holds 35 patents from the USA, Japan, Hong Kong, and China. According to Google Scholar, his papers have received over 34,000 citations and his H-index is 85. He was listed as a Highly Cited Researcher in Engineering by Thomson Reuters in 2014 and 2015. Professor Zhang is a Croucher Senior Research Fellow, a Distinguished Speaker of the IEEE Computer Society, and a Fellow of both the IEEE and the IAPR.

Abstract:

In recent times, an increasing worldwide effort has been devoted to the development of automatic personal identification systems that can be effective in a wide variety of security contexts. As one of the most powerful and reliable means of personal authentication, biometrics has been an area of particular interest. This has led to the extensive study of biometric technologies and the development of numerous algorithms, applications, and systems, which could collectively be defined as advanced biometrics. This presentation will systematically explain this new research trend. As case studies, a new biometric technology (palmprint recognition) and two new biometric applications (medical biometrics and aesthetic biometrics) are introduced, with some useful achievements presented to illustrate their effectiveness.

Li Liu

University of Shanghai for Science and Technology, China

Title: Generating graphs from key points for near-duplicate document image matching
Speaker
Biography:

Li Liu is a lecturer at the University of Shanghai for Science and Technology. She received her Ph.D. degree in pattern recognition and intelligent systems from East China Normal University, Shanghai, China, in 2015. She was with the Centre for Pattern Recognition and Machine Intelligence (CENPARMI), Concordia University, Montreal, Quebec, Canada, from 2013 to 2014 as a visiting doctoral student. Her research interests include pattern recognition, machine learning and image analysis.

Abstract:

We propose a novel near-duplicate document image matching approach. Keypoints are first detected in the image using the difference-of-Gaussian function. We then present a clustering method by which the keypoints are grouped into several clusters, with the number of clusters determined automatically according to the distribution of the keypoints. Afterwards, a graph is generated whose nodes correspond to the obtained clusters and whose edges describe the relationships between pairs of clusters. Consequently, the problem of image matching is transformed into graph matching. To compute the similarity between two graphs, we build their association graph and then find the maximum weight clique. A thorough evaluation of the performance of the proposed approach is conducted on two different datasets, and promising experimental results demonstrate the effectiveness and validity of this method.
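
A runnable sketch of the front half of this pipeline, with simplifications: OpenCV's SIFT supplies the difference-of-Gaussian keypoints, k-means groups them (the paper chooses the cluster count automatically; k is fixed here), and cluster centers become graph nodes.

```python
import cv2
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

def image_to_graph(gray: np.ndarray, k: int = 5) -> nx.Graph:
    """gray: uint8 grayscale document image -> complete graph over keypoint clusters."""
    keypoints = cv2.SIFT_create().detect(gray, None)      # DoG-based detection
    xy = np.float32([kp.pt for kp in keypoints])
    centers = KMeans(n_clusters=k, n_init=10).fit(xy).cluster_centers_
    g = nx.complete_graph(k)
    for i, j in g.edges:                                  # edge = cluster relationship
        g.edges[i, j]["dist"] = float(np.linalg.norm(centers[i] - centers[j]))
    return g
```

Matching two such graphs then reduces to building their association graph and extracting a maximum weight clique, e.g. with nx.max_weight_clique (note that NetworkX expects integer node weights there).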

Yan Xu

Shandong Normal University, P. R. China

Title: Image super-resolution reconstruction
Speaker
Biography:

Dr. Yan Xu has been a lecturer at the School of Communication of Shandong Normal University in China since 2013. She obtained her bachelor's degree in computer science and technology (2005) and her master's degree in computer software and theory (2008) from Shandong Normal University, and received her doctorate in signal and information processing from Beijing University of Posts and Telecommunications (2013). Dr. Xu is engaged in research on image processing, machine learning, artificial intelligence, data mining, etc. She has published more than 17 papers, and has attended many international conferences and academic exchanges in South Korea, Spain, Canada, Turkey and other countries.

Abstract:

Image super-resolution reconstruction restores (or reconstructs) a high-resolution (HR) image (or image sequence) from a series of low-resolution (LR) images. It recovers the high-frequency information of images and increases the pixel density to enhance spatial resolution. Super-resolution has become a research hotspot in recent years: not only individual researchers and institutes, but also governments and the military, have invested a great deal of effort and funding in this attractive task. In this presentation, we review the history and development of super-resolution technology, summarize its common and classical ideas and methods, compare it with deblurring and enhancement methods, recommend some promising techniques, and propose some interesting applications, especially in real life.
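
A small sketch of the baseline that super-resolution methods are measured against: bicubic upscaling of a downsampled image, scored by PSNR against the original. The "HR image" here is synthetic; real SR methods aim to beat this number.

```python
import cv2
import numpy as np

# Stand-in for a real high-resolution image.
hr = (np.random.default_rng(0).random((256, 256)) * 255).astype(np.uint8)

lr = cv2.resize(hr, None, fx=0.25, fy=0.25, interpolation=cv2.INTER_AREA)  # simulate LR
upscaled = cv2.resize(lr, (hr.shape[1], hr.shape[0]), interpolation=cv2.INTER_CUBIC)

mse = np.mean((hr.astype(np.float64) - upscaled) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)        # higher = closer to the true HR image
print(f"bicubic baseline PSNR: {psnr:.2f} dB")
```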

Yea-Shuan Huang

Chung-Hua University, Taiwan

Speaker
Biography:

Yea-Shuan Huang graduated from the Computer Science Department of Concordia University, Canada, in 1994. He retired from the Industrial Technology Research Institute (Taiwan) in 2006 and in the same year became an associate professor in the Computer Science & Information Engineering Department at Chung-Hua University. His research interests are in the areas of face recognition, gesture analysis, biometric authentication, OCR, image analysis, computer vision, and pattern recognition. In 2010 and 2011, his lab won, respectively, the third and second places of the Utechzone Machine Vision Prize for face detection, face recognition, gender recognition and age recognition. He has received about 50 patents and performed more than 10 technology transfers to industrial companies and research institutes.

Abstract:

Face recognition is essential for humans to communicate with each other, and due to its importance and applicability, many methods have been proposed in the last three decades. Among the various research directions, feature extraction plays an important role because it correlates considerably with recognition accuracy. However, only a few efforts have been devoted to extracting personalized landmark facial points and using their geometrical information to perform recognition. In this paper, a novel feature-point bilateral recognition (FPBR) method for recognizing human faces is proposed. First, a set of distinct feature points is extracted from a test image. Then, in every training face image, each detected feature point finds its best matched position through a block matching operation. From the detected feature points and their matched ones, two geometrical models describing their structural relationships are constructed, and the difference between the two geometric models is computed by a model-comparison design. By combining the average matching strength and the difference of the geometric models, the forward recognition score is produced. Similarly, the backward recognition score is produced by detecting feature points in a training image and locating their matches in the test image. Summing the forward and backward recognition scores gives a bilateral recognition score, which is used to produce the final recognition result. Besides the bilateral recognition, the feature used, called the local vector pattern (LVP), will also be introduced; it encodes various pairwise directions of vectors as a facial descriptor to strengthen the structure of micropatterns. Experiments on the well-known FERET face databases show that the proposed algorithm produces excellent recognition results and performs much better than two other well-known face recognition methods.
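
Not the proposed FPBR, but OpenCV's cross-check matching illustrates the bilateral idea on a small scale: a correspondence counts only if A's best match in B also selects A in return (forward and backward agreement). The image paths are hypothetical.

```python
import cv2

img_a = cv2.imread("face_test.png", cv2.IMREAD_GRAYSCALE)    # hypothetical test face
img_b = cv2.imread("face_train.png", cv2.IMREAD_GRAYSCALE)   # hypothetical training face

sift = cv2.SIFT_create()
kp_a, des_a = sift.detectAndCompute(img_a, None)
kp_b, des_b = sift.detectAndCompute(img_b, None)

# crossCheck=True keeps a match only when it is mutual in both directions.
matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
score = sum(1.0 / (1.0 + m.distance) for m in matches)       # toy bilateral similarity
```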
 

Jehan Janbi

Taif University, Saudi Arabia

Speaker
Biography:

Jehan Janbi is an assistant professor in the College of Computer Science and Information Technology at Taif University. She received her Bachelor's degree in Computer Science from King Abdul-Aziz University, Jeddah, Saudi Arabia. She started her academic career as a TA, lab supervisor, and research assistant in the Computer Science department at Qassim University, and went on to receive her Master's and PhD from Concordia University, Montreal, Canada. Her research area is text and font recognition, mainly for Arabic script. She has worked on encoding the design characteristics of Arabic digital fonts into a number composed of several digits, where each digit represents a specific design characteristic; this will enhance manipulating and searching fonts based on their appearance.

Abstract:

In the digital world there are thousands of digital fonts, which makes selecting an appropriate font a non-intuitive task. Designers can search for a font like any other file, using general information such as name and file format, but for document design purposes the design features, or visual characteristics, of fonts are more meaningful to designers than font file information. Therefore, representing fonts' design features as searchable and comparable data would facilitate searching for and selecting a desirable font. One solution is to represent a font's design features by a code composed of several digits. This solution has been implemented as a computerized system called PANOSE-1 for Latin-script fonts. It is used within several font management tools as an option for ordering and searching fonts based on their design features. It is also used in font replacement processes, when an application or an operating system detects a missing font in an imported document or website. This research defines a new model, PANOSE-A, to extend PANOSE-1's coverage to support Arabic characters. The model defines eight digits in addition to the first digit of PANOSE-1, which indicates the font script and family type. Each digit takes a value between 0 and 15, where each value indicates a specific variation of the feature it represents. Two digits of the model describe the common variations of the weight and contrast features, two essential features in any font design. Another four digits describe the shapes of strokes that usually vary in design between fonts, such as the end shape of terminal strokes, the shape of the bowl stroke, the shape of the curved stroke, and the shape of rounded strokes with an enclosed counter. The last two digits describe the characteristics of two important vertical references of Arabic font design: the tooth and loop heights.
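
A hypothetical sketch of how such digit codes support search and replacement: each font becomes a tuple of digits and similarity is a per-digit distance, so candidate substitutes can be ranked. The digit meanings and values below are illustrative, not the PANOSE-A specification.

```python
from dataclasses import dataclass

@dataclass
class FontCode:
    name: str
    digits: tuple        # e.g. (family, weight, contrast, terminal, bowl, curve, ...)

def code_distance(a: FontCode, b: FontCode) -> int:
    """Smaller = more similar; a 0 digit means 'any' and matches everything."""
    return sum(abs(x - y) for x, y in zip(a.digits, b.digits) if x and y)

naskh = FontCode("MyNaskh", (2, 5, 3, 1, 4, 2, 3, 7, 6))    # made-up codes
kufi = FontCode("MyKufi", (2, 8, 1, 2, 2, 5, 1, 4, 3))
print(code_distance(naskh, kufi))   # rank replacement candidates by this value
```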


Speaker slots available

We are also accepting proposals for Symposia and Workshops on all tracks. All proposals must be submitted to multimedia@conferenceseries.net

• Young Researchers Forum

Session Introduction

Jin Wang

Beijing University of Technology, China

Title: Packet loss rate mapped to quality of experience in the IoT network
Speaker
Biography:

Jin Wang received a Bachelor's degree in Software Engineering from Beijing University of Chemical Technology, Beijing, China, in June 2012. She won the National Scholarship in 2010 and the National Endeavor Fellowship in 2009. She received a Master's degree in Computer Application Technology from Shijiazhuang Tiedao University in January 2015, has published many papers indexed by ISTP, EI and SCI, and has participated in a National Natural Science Foundation project. From April to July 2015 she worked at the computer center of Navy General Hospital as an intern technician, participating in a naval logistics project and an anesthesia program. Since April 2015 she has been a PhD student in the School of Software Engineering, Department of Information, Beijing University of Technology. Her research interests include the Internet of Things, software engineering, embedded systems, and image and video quality assessment in distorting networks.

Abstract:

The Internet of Things encompasses Internet technology over both wired and wireless networks. In this paper, we investigate the network's quality of experience (QoE) and packet loss rate, because QoE is important in the network and packet loss rate is a key parameter in many papers. To study the influence of packet loss on the user's QoE when video is transmitted over the network, and to establish a mapping model between the two, we built an NS2 + MyEvalvid simulation platform and simulated different degrees of packet loss by modifying QoS parameters, focusing on the influence of packet loss on QoE and the mapping model between them. Experimental results show that packet loss has a significant influence on quality of experience, and that packet loss rate and quality of experience have a nonlinear relationship. Using Matlab we established the mapping model; this model is accurate, easy to operate, and can detect in real time the influence of packet loss on the user's QoE. The contributions of this paper are, first, the finding that packet loss has a significant effect on video quality, and second, a mapping model between packet loss rate and the user's QoE built on that finding. The next step is to set up a video quality evaluation model that accounts for network packet loss, considering the effects of different packet loss rates and different content complexities on QoE, and combining other factors such as different packet loss models; more accurate prediction of QoE is future work.

Fig. 1 MyEvalvid system structure

Fig. 23  Src13 fitting curve (PSNR) 
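
The paper's exact mapping model is not reproduced here; as a hedged sketch, a common choice in the QoE literature is a nonlinear (exponential) fit of quality against packet loss rate, shown below with SciPy on made-up sample points.

```python
import numpy as np
from scipy.optimize import curve_fit

loss = np.array([0.00, 0.01, 0.02, 0.05, 0.10, 0.20])   # packet loss rate
mos = np.array([4.8, 4.3, 3.9, 3.1, 2.4, 1.6])          # measured QoE (MOS), dummy data

def qoe_model(p, a, b, c):
    return a * np.exp(-b * p) + c                        # nonlinear, as observed

(a, b, c), _ = curve_fit(qoe_model, loss, mos, p0=(4.0, 10.0, 1.0))
print(f"predicted MOS at 3% loss: {qoe_model(0.03, a, b, c):.2f}")
```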

Yoshikatsu Nakajima

Keio University, Japan

Speaker
Biography:

Yoshikatsu Nakajima received his B.E. degree in information and computer science from Keio University, Japan, in 2016. Since 2016, he has been a master's student in the Department of Science and Technology and has worked as a research assistant of the Keio Program for Leading Graduate School at Keio University, Japan. He has attended many international and domestic conferences and won five awards in the two years since he started his research. In 2014, he joined the start-up company Home-tudor Tomonokai as a developer and built its online tutoring system by himself using Ruby on Rails. His research interests include augmented reality, SLAM, object recognition, and computer vision.

Abstract:

Camera pose estimation with respect to a target scene is an important technology for superimposing virtual information in augmented reality. However, it is difficult to estimate the camera pose for all possible view angles because feature descriptors such as SIFT are not completely invariant from every perspective. We propose a novel method for robust camera pose estimation using multiple feature descriptor databases, one generated for each partitioned viewpoint, within which the feature descriptor of each keypoint is almost invariant. Our method estimates the viewpoint class for each input image using deep learning on a set of training images prepared for each viewpoint class. We introduce two ways of preparing those images for deep learning and generating the databases. In the first method, images are generated by a projection matrix with varied backgrounds, so that learning is more robust to the environment. The second method uses real images to learn the entire environment around the planar pattern. Through the evaluation results, we confirmed that the number of correct matches increased and the accuracy of camera pose estimation improved compared to the conventional method.

Furthermore, we have recently been applying the concept of viewpoint classes to the field of object recognition. Object recognition is one of the major research fields in computer vision and has been applied in various domains. In general, conventional methods are not robust to obstacles, and their accuracy decreases when the camera stagnates at a position that gives a poor view of the target object. We propose a novel method of object recognition that can be carried out in real time by equally dividing the viewpoints around each object in the scene and impartially integrating the Convolutional Neural Network (CNN) outputs from each viewpoint class (see image). We confirmed its effectiveness through experiments.
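
A small sketch of the integration step described above (the surrounding recognition framework is assumed): per-viewpoint-class CNN softmax outputs are averaged so that every viewpoint contributes impartially to the final object label.

```python
import numpy as np

def integrate_viewpoints(cnn_outputs: np.ndarray) -> int:
    """cnn_outputs: (n_viewpoint_classes, n_object_classes) rows of softmax scores."""
    fused = cnn_outputs.mean(axis=0)          # impartial average over viewpoint classes
    return int(np.argmax(fused))

outputs = np.array([[0.7, 0.2, 0.1],          # viewpoint class 1
                    [0.4, 0.5, 0.1],          # viewpoint class 2 (poor camera position)
                    [0.8, 0.1, 0.1]])         # viewpoint class 3
print(integrate_viewpoints(outputs))          # -> 0
```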