From RoboMaster to Computer Vision: My AI Journey
"The opponent's energy organs are open, is there any room for reversal next?"
"Blue's infantry droid number five continues to output, constantly using projectiles to hit the opponent's armour plates!"
"Blue wins!"
I will never forget our first victory at the 2019 RoboMaster competition. It was that match that convinced me to pursue a career in computer vision.
In the competition, we needed to identify and locate the armour plates of target robots so that our robots could aim and shoot efficiently. At first, we designed our methods around thresholding and morphological operations, for example using the height-to-width ratio of the light bars to determine the position of an armour plate (a sketch of this style of pipeline appears below). Disappointed with the poor results, we then embedded the real-time object detector YOLO (You Only Look Once) (Redmon & Farhadi, 2018) into our program, which let us detect both the whole target robot and its armour plates. YOLO improved both the accuracy and the speed of detection, freeing us to focus on debugging jointly with other team members. I also used a depth camera to solve the ammunition-box grasping problem: we chose an Intel RealSense D435, combining Intel's librealsense library with the powerful image-processing library OpenCV to tackle tasks such as object contour recognition. Excited and passionate, I came away from this competition determined to continue my studies in this area.
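For a flavour of that early threshold-and-morphology approach, here is a minimal OpenCV sketch; the channel choice, threshold value, and aspect-ratio bound are illustrative placeholders, not our tuned parameters:

```python
import cv2

# Minimal sketch of a threshold-and-morphology light-bar detector.
# Values below are illustrative, not the tuned ones we used in competition.
def find_light_bars(frame_bgr, min_aspect=2.0):
    # Isolate bright regions (the LED light bars) in the blue channel.
    blue = frame_bgr[:, :, 0]
    _, mask = cv2.threshold(blue, 200, 255, cv2.THRESH_BINARY)
    # Morphological closing fills small gaps inside each bar.
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    bars = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        # Light bars are much taller than wide, so keep tall, thin regions.
        if h > min_aspect * w:
            bars.append((x, y, w, h))
    return bars
```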
In the course Parallel Processing, I was assigned to turn 3D ultrasound images of organs into 3D organ models using a volume rendering algorithm (Drebin et al., 1988). Volume rendering, which maps volume data to optical properties, can be categorized into indirect and direct volume rendering. To obtain a clear 3D model of a foetus in the womb, I compared four common volume rendering algorithms: ray casting (Pfister et al., 1999), splatting (Mueller et al., 1999), shear-warp (Guo and Mei, 2009), and hardware-assisted 3D texture mapping (Meißner et al., 1999), and implemented the ray casting algorithm for my model. I designed a novel transfer function with an appropriate gradient parameter and used a three-dimensional Sobel operator to obtain precise gradient magnitudes. Because volume rendering involves a huge amount of computation, I accelerated the experiment in parallel using CUDA (Compute Unified Device Architecture); a sketch of the core compositing loop follows.
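Below is a minimal NumPy sketch of front-to-back ray casting along one axis. The transfer functions and nearest-slice sampling are deliberately simplified stand-ins for the ones I actually designed (my version also used a 3D Sobel gradient term, omitted here):

```python
import numpy as np

# Minimal orthographic ray-casting sketch: rays travel along the z-axis,
# and samples are composited front to back. Transfer functions are toys.
def ray_cast(volume):
    color = np.zeros(volume.shape[1:])   # accumulated colour per ray
    alpha = np.zeros(volume.shape[1:])   # accumulated opacity per ray
    for sliced in volume:                # one z-slice = one sample per ray
        a = np.clip((sliced - 0.3) * 2.0, 0.0, 1.0)  # toy opacity transfer fn
        c = sliced                                   # toy colour transfer fn
        color += (1.0 - alpha) * a * c   # front-to-back compositing
        alpha += (1.0 - alpha) * a
        if alpha.min() > 0.99:           # early ray termination
            break
    return color

image = ray_cast(np.random.rand(64, 128, 128))  # (depth, height, width)
```

Each iteration of this loop is independent per pixel, which is exactly what makes the algorithm a good fit for CUDA: one GPU thread per ray.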
Since then, I have developed a strong research interest in machine learning, particularly in computer vision. To dig deeper into this field, I believe a solid foundation in mathematics and coding is key, and I have been working hard, in and out of class, to strengthen my skills and accumulate experience. In the Probability and Mathematical Statistics course, I learned that the usual way of thinking is to use an existing model to predict the probability of unknown events; Maximum Likelihood Estimation (MLE), by contrast, reasons backwards, estimating the parameters of an unknown distribution from observed data. The naive Bayes classifier, a simple yet fundamental method in machine learning, estimates its parameters in exactly this way; it also works with log-likelihoods to avoid the numerical underflow caused by multiplying long chains of small probabilities (illustrated below). Inspired by MLE, I adopted the same backwards way of thinking in my research framework: I trained a neural network to produce a probability distribution, then used the parameters of that distribution to transform source-domain images into target-domain images.
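As a quick toy illustration of why the log-likelihood trick matters (this is not code from my project): multiplying a thousand small probabilities underflows to zero in double precision, while summing their logarithms stays perfectly representable.

```python
import numpy as np

# Toy illustration of likelihood underflow and the log-space fix.
probs = np.full(1000, 1e-3)             # 1000 conditional probabilities of 0.001

naive_likelihood = np.prod(probs)       # 1e-3000 underflows to exactly 0.0
log_likelihood = np.sum(np.log(probs))  # sum of logs stays finite

print(naive_likelihood)  # 0.0
print(log_likelihood)    # approximately -6907.76
```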
"A workman wants to finish his work, but he needs a good tool. "A good grasp of machine learning library can make a contribution to facilitate the development and training of our model. Currently, the most common open-source machine learning platform are TensorFlow, PyTorch, Keras and so on. My most used training framework is PyTorch which accelerates the path from research prototyping to production deployment. To master the usage of PyTorch, I reimplemented U-Net (Ronneberger et al., 2015), a special fully-convolutional-networks (FCNs) architecture. It triggered me to build and deploy the experiment of my research on a suitable framework.
After extensive literature reading, I have become interested in self-supervised learning. Computer vision relies heavily on large, high-quality, expert-labelled image datasets to develop, train, test, and validate algorithms, and the availability of such datasets is frequently the limiting factor for research and industrial projects alike. Generating a sufficient number of ground-truth labels, however, is time-consuming and labour-intensive, and it has become a major bottleneck in the deep learning pipeline. Reducing the need for ground truth while improving representation learning has therefore become the focus of my research. Self-supervised learning aims to learn useful representations of the input data without relying on human annotation, and recent unsupervised methods have largely closed the gap with supervised representation learning on multiple vision tasks. Since Siamese networks play a fundamental role in self-supervised representation learning, I am considering designing a Siamese network for my research; a sketch of the kind of objective I have in mind follows this paragraph. While browsing the website of [University Name], I was attracted by the strong faculty and excellent teaching resources of its Faculty of Engineering. Professor Wang Wenping presented a simple yet interpretable unsupervised method (Chen et al., 2020) for learning a new structural representation in the form of 3D structure points. Their method successfully extracts semantically meaningful structure points and builds a PCA-based shape embedding that preserves shape structure well; moreover, their framework achieves state-of-the-art performance on the semantic shape correspondence and segmentation transfer tasks.
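The sketch below shows one common form of such a Siamese objective, in the style of a stop-gradient method like SimSiam. The backbone, feature dimension, and prediction head are illustrative assumptions, not my finished design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# A minimal stop-gradient Siamese objective (SimSiam-style sketch).
# `backbone` is any encoder mapping images to `dim`-dimensional features.
class SiameseSSL(nn.Module):
    def __init__(self, backbone, dim=128):
        super().__init__()
        self.backbone = backbone              # weights shared by both branches
        self.predictor = nn.Sequential(       # small prediction MLP
            nn.Linear(dim, dim), nn.ReLU(inplace=True), nn.Linear(dim, dim))

    def forward(self, view1, view2):
        # Two augmentations of the same image go through the shared encoder.
        z1, z2 = self.backbone(view1), self.backbone(view2)
        p1, p2 = self.predictor(z1), self.predictor(z2)
        # Maximize cosine similarity; the stop-gradient (detach) on the
        # targets is what prevents the representations from collapsing.
        return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean() +
                 F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2

# Toy usage with a flatten-and-project "backbone" on 32x32 grayscale images.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 128))
loss = SiameseSSL(backbone)(torch.randn(8, 1, 32, 32),
                            torch.randn(8, 1, 32, 32))
```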
I look forward to working with the outstanding faculty and research resources at [University Name] and contributing my own ideas and insights to the community. Together, we can make a real difference. I am ready.
References
[1] R. A. Drebin, L. Carpenter, and P. Hanrahan, 1988. Volume rendering. ACM SIGGRAPH Computer Graphics, 22(4):65-74. New York, NY: ACM.
[2] M. Meißner, U. Hoffmann, and W. Straßer, 1999. Enabling classification and shading for 3D texture mapping based volume rendering. In Proceedings of IEEE Visualization 1999, p. 32.
[3] K. Mueller, N. Shareef, J. Huang, and R. Crawfis, 1999. High-quality splatting on rectilinear grids with efficient culling of occluded voxels. IEEE Transactions on Visualization and Computer Graphics, 5(2):116-134. doi: 10.1109/2945.773804.
[4] H. Pfister, J. Hardenbergh, J. Knittel, H. Lauer, and L. Seiler, 1999. The VolumePro real-time ray-casting system. In Proceedings of SIGGRAPH 1999, Los Angeles, CA, pp. 251-260. New York, NY: ACM.
[5] L. Guo and X. Mei, 2009. Implementation and improvement based on shear-warp volume rendering algorithm. In 2009 International Conference on Computer Engineering and Technology, Singapore, pp. 182-185. doi: 10.1109/ICCET.2009.9.
[6] O. Ronneberger, P. Fischer, and T. Brox, 2015. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), Cham, Switzerland: Springer, pp. 234-241.
[7] J. Redmon and A. Farhadi, 2018. YOLOv3: An incremental improvement. arXiv:1804.02767.
[8] N. Chen, L. Liu, Z. Cui, R. Chen, D. Ceylan, C. Tu, and W. Wang, 2020. Unsupervised learning of intrinsic structural representation points. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9118-9127. doi: 10.1109/CVPR42600.2020.00914.