We first theoretically explore the impact of neural quantization on federated knowledge transfer across quantized DNNs and provide a convergence proof for quantized federated learning.
Tag: Computer Vision
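The quantized federated transfer described above can be illustrated with a minimal sketch. The uniform quantization scheme and the plain FedAvg aggregation below are illustrative assumptions, not the project's exact method:

```python
import numpy as np

def quantize(w, num_bits=8):
    """Uniformly quantize a weight vector to 2**num_bits levels (illustrative)."""
    lo, hi = w.min(), w.max()
    if hi == lo:
        return w.copy()
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels
    return lo + np.round((w - lo) / scale) * scale

def federated_average(client_weights, num_bits=8):
    """FedAvg over clients that upload quantized weights."""
    return np.mean([quantize(w, num_bits) for w in client_weights], axis=0)

rng = np.random.default_rng(0)
clients = [rng.normal(size=16) for _ in range(4)]
avg = federated_average(clients, num_bits=8)

# The quantization error (which drives the convergence gap in such analyses)
# shrinks as the bit-width grows:
err8 = max(np.abs(quantize(w, 8) - w).max() for w in clients)
err2 = max(np.abs(quantize(w, 2) - w).max() for w in clients)
```

The per-round quantization error is what a convergence proof for quantized federated learning typically has to bound.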
M2-V Video De-Duplication
The goal of this project is to develop novel deep learning algorithms for video segment hashing and identification to support efficient and accurate duplicate identification and removal from phones and cloud storage.
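Segment-level hashing for de-duplication can be sketched with a classical baseline: an average hash per frame, majority-voted across a segment, compared by Hamming distance. This is a stand-in for the project's learned hash, not its actual algorithm:

```python
import numpy as np

def frame_ahash(frame, hash_size=8):
    """Average hash of one grayscale frame: block-mean downsample, threshold at mean."""
    h, w = frame.shape
    frame = frame[:h - h % hash_size, :w - w % hash_size]  # make divisible
    bh, bw = frame.shape[0] // hash_size, frame.shape[1] // hash_size
    small = frame.reshape(hash_size, bh, hash_size, bw).mean(axis=(1, 3))
    return (small > small.mean()).ravel()

def segment_hash(frames, hash_size=8):
    """Hash a video segment by majority-voting its per-frame hashes."""
    bits = np.stack([frame_ahash(f, hash_size) for f in frames])
    return bits.mean(axis=0) >= 0.5

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(1)
seg = [rng.random((32, 32)) for _ in range(5)]
noisy = [f + rng.normal(scale=0.005, size=f.shape) for f in seg]  # near-duplicate
other = [rng.random((32, 32)) for _ in range(5)]

d_dup = hamming(segment_hash(seg), segment_hash(noisy))      # small distance
d_other = hamming(segment_hash(seg), segment_hash(other))    # large distance
```

Near-duplicate segments land within a small Hamming radius, which is what makes hash-based lookup efficient at scale.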
O3-N Understand Video Transcripts in Live Streamed Videos
We propose comprehensive resources and models for understanding automatically transcribed videos. In particular, in this project we pursue a deep learning model for identifying the important points and questions mentioned in a video transcript. To achieve this objective, we employ two specific deep learning models.
F2-T DeepSLAM: Object Detection, Re-identification and Prediction with Implicit Mapping
The project goals and objectives are:
1. Communication privacy and security of multi-agent systems:
Develop a privacy-enhanced multi-agent system that uses shared knowledge for both (i) vision and (ii) communication tasks.
2. Ego-motion prediction under intermittent feedback:
This goal removes the assumption that a GPS signal is always available and considers GPS-denied areas. We design a hybrid system that helps a traditional error-based control method maintain an error bound on its state information, using a CNN-based localization method.
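The intermittent-feedback idea in goal 2 can be sketched as a simple estimator that resets on a GPS fix and dead-reckons with a growing error bound in GPS-denied stretches. The drift model is a placeholder assumption; in the project, the CNN-based localizer would keep this bound from growing:

```python
import numpy as np

def fuse_position(odometry_steps, gps_fixes, drift_per_step=0.05):
    """
    Track a 2D position and an error bound. A GPS fix resets both; without GPS
    we dead-reckon from odometry and let the bound grow with accumulated drift.
    (The project's CNN-based localization, omitted here, would tighten the bound.)
    """
    pos, bound = np.zeros(2), 0.0
    bounds = []
    for step, fix in zip(odometry_steps, gps_fixes):
        if fix is not None:          # GPS available: trust the fix
            pos, bound = np.asarray(fix, float), 0.0
        else:                        # GPS denied: dead-reckon, bound grows
            pos = pos + step
            bound += drift_per_step
        bounds.append(bound)
    return pos, bounds

steps = [np.array([1.0, 0.0])] * 5
fixes = [(0.0, 0.0), None, None, (3.0, 0.0), None]  # GPS drops out twice
pos, bounds = fuse_position(steps, fixes)
```

The hybrid controller's job is exactly to keep `bound` below a safety threshold between fixes.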
M1-V Deep Learning in Video Compression
We propose end-to-end video compression with motion field prediction. In video-based point cloud compression (V-PCC), a dynamic point cloud is projected onto geometry and attribute videos patch by patch for compression. We propose a CNN-based occupancy map recovery method to improve the quality of the reconstructed occupancy map video. To the best of our knowledge, this is the first deep-learning-based occupancy map recovery method for improving V-PCC coding efficiency.
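The shape of the occupancy-map recovery step can be sketched as follows. The project trains a CNN for this; the fixed 3x3 averaging filter and threshold below are only a hand-set stand-in showing how local filtering removes isolated coding errors from a binary occupancy map:

```python
import numpy as np

def recover_occupancy(lossy_map, thresh=0.34):
    """
    Clean a lossy-coded binary occupancy map with one fixed 3x3 averaging
    filter plus a threshold (illustrative stand-in for the project's CNN).
    Isolated bit errors get little neighborhood support and are removed.
    """
    kernel = np.full((3, 3), 1.0 / 9.0)
    padded = np.pad(lossy_map.astype(float), 1)
    out = np.zeros(lossy_map.shape)
    for i in range(lossy_map.shape[0]):
        for j in range(lossy_map.shape[1]):
            out[i, j] = (padded[i:i + 3, j:j + 3] * kernel).sum()
    return (out > thresh).astype(np.uint8)

clean = np.zeros((8, 8), np.uint8)
clean[2:6, 2:6] = 1                  # one occupied patch
noisy = clean.copy()
noisy[0, 7] = 1                      # isolated compression artifact
recovered = recover_occupancy(noisy)
```

A learned filter bank plays the same role but adapts its weights to the actual error statistics of the coded occupancy video.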
M3-V Video De-Duplication
The goal of this project is to develop novel deep learning algorithms for video segment hashing and identification to support efficient and accurate duplicate identification and removal from phones and cloud storage.
F2-T DeepSLAM: Object Detection, Re-identification and Prediction with Implicit Mapping
The DeepSLAM project aims to develop an end-to-end deep neural network simultaneously capable of:
(a) Object Detection using RGB images/videos;
(b) Object Re-identification/Tracking over multiple frames.
This all-in-one solution provides a better level of information integrity and reuse. To this end, a local belief of the surrounding area is trained with an occupancy grid to generate a local implicit map that captures dynamic road conditions. The local coarse implicit map is then combined with globally accurate road information for the above goals.
F10-V Robust PCA with Outlier Mitigation
Principal Component Analysis (PCA) has been widely used in computer vision and machine learning applications due to its excellent performance in compression, feature extraction and feature representation. However, PCA suffers from severely degraded performance when outliers exist in datasets. To address this issue, the project is intended to develop a robust PCA algorithm, capable of mitigating outliers. The key idea is to leverage a popularity index for each sample so that outliers will contribute little in finding the projection matrix of the PCA.
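The popularity-index idea can be sketched as an iteratively reweighted PCA: each sample's weight is inversely related to its residual from the current subspace, so outliers contribute little to the next weighted covariance estimate. The specific reweighting rule below is an illustrative assumption; the project's exact index may differ:

```python
import numpy as np

def robust_pca(X, k, n_iter=10, eps=1e-8):
    """
    PCA with a per-sample 'popularity' weight: samples that fit the current
    k-dim subspace poorly (likely outliers) get small weight when re-estimating
    the weighted covariance. (Illustrative IRLS-style reweighting.)
    """
    n = X.shape[0]
    w = np.ones(n) / n
    for _ in range(n_iter):
        mu = w @ X / w.sum()
        Xc = X - mu
        C = (Xc * w[:, None]).T @ Xc / w.sum()    # weighted covariance
        _, vecs = np.linalg.eigh(C)               # eigenvalues ascending
        P = vecs[:, -k:]                          # top-k principal directions
        resid = np.linalg.norm(Xc - Xc @ P @ P.T, axis=1)
        w = 1.0 / (resid + eps)                   # popularity: low residual = high weight
        w /= w.sum()
    return mu, P, w

rng = np.random.default_rng(2)
# 100 inliers near the line spanned by (2, 1), plus 5 gross outliers off it
inliers = rng.normal(size=(100, 1)) @ np.array([[2.0, 1.0]]) \
    + 0.05 * rng.normal(size=(100, 2))
outliers = np.array([[-8.0, 9.0], [7.0, -9.0], [-9.0, -6.0], [9.0, 8.0], [0.0, 9.0]])
X = np.vstack([inliers, outliers])
mu, P, w = robust_pca(X, k=1)
```

After a few iterations the outliers carry near-zero weight and the recovered direction aligns with the inlier subspace, which plain PCA on the same data would miss.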
M1-V DeepCompression: Deep Learning in Future Video Coding
Conventional video coding methods optimize each component separately, which can lead to a sub-optimal overall solution. Motivated by the success of deep learning on computer vision tasks, we propose deep learning for video compression in an end-to-end manner.
F2-T DeepSLAM: Object Detection, Re-identification and Prediction with Implicit Mapping
The DeepSLAM project is designed to develop an end-to-end, all-in-one deep neural network capable of the following, simultaneously, using only road-scene RGB images:
(a) Object Detection using RGB images and videos;
(b) Object Re-identification over multiple frames, with occlusions;
(c) Predicting multiple objects' future trajectories.
This all-in-one solution provides a better level of information integrity and reuse. To this end, a local belief of the surrounding area will be trained with grid cells, a navigation mechanism found in the human brain, to generate a local implicit map that captures dynamic road conditions. The local coarse implicit map is then combined with globally accurate road information for the above three goals.
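The local-map component above maintains a belief about the surrounding area. A classical stand-in for the learned implicit map is a log-odds occupancy grid, sketched below (the update constants are illustrative):

```python
import numpy as np

def update_local_grid(log_odds, hits, misses, l_hit=0.85, l_miss=-0.4):
    """
    Log-odds occupancy-grid update: cells observed occupied gain evidence,
    cells observed free lose it. (Classical stand-in for the project's
    learned local implicit map.)
    """
    log_odds = log_odds.copy()
    for i, j in hits:
        log_odds[i, j] += l_hit
    for i, j in misses:
        log_odds[i, j] += l_miss
    return log_odds

def occupancy_prob(log_odds):
    """Convert log-odds back to occupancy probability."""
    return 1.0 / (1.0 + np.exp(-log_odds))

grid = np.zeros((4, 4))                                # prior: p = 0.5 everywhere
grid = update_local_grid(grid, hits=[(1, 1), (1, 1)],  # cell seen occupied twice
                         misses=[(2, 2)])              # cell seen free once
p = occupancy_prob(grid)
```

A learned implicit map replaces the discrete cells with a network that generalizes across observations, but the role in the pipeline, fusing local dynamic evidence before combining with the accurate global map, is the same.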