People
Datahunt utilizes AI across the machine learning lifecycle. As a small startup, we've been able to do this because our talented AI engineers have worked hard on our technology, cutting the time and cost of business projects while making them more efficient and more accurate. It's December, the beginning of winter, and the snow is falling, so we're excited to introduce Jisu Yu, an AI engineer who is having a busy holiday season.
A. Hello. I'm Jisu Yu, an AI engineer at Datahunt. There are two main areas of research in our company, natural language processing and computer vision, and I am working on computer vision, which mainly deals with image and video data.
A. Yes, I was originally working as an AI researcher. I joined Datahunt in February of this year.
In graduate school, I did research on natural language processing and text data, but somehow my first job ended up being in computer vision and image data. It was great to learn a new research area and grow my career, but at the same time, I thought it was a bit sad to abandon the natural language processing I had studied. That's when I found out about DataHunt. Since DataHunt is a company that processes various kinds of data without being limited to any one field, I decided to join because I thought I could continue to do NLP in addition to computer vision.
A. I don't know if I'm lucky or not, but I've never had a problem with research, and it's attractive to me that I can take the initiative to do the research I want and see the results immediately. If that kind of autonomy appeals to you, I think a startup will be a good fit.
A. Recently, I've been working on AI that automatically masks information that can identify specific individuals, such as faces, license plates, and the digits of social security numbers, because regulations in many countries require that they be removed when processing data.
At first, the requested data was images, so we developed an object detection model that could detect and mask faces and license plates in images. Later, we needed a de-identification model for video data, so we researched object tracking. Of course, you can simply break the video into frames and run inference on each image separately, but in object tracking, information from the previous frame and the next frame is connected, so even if the model fails to detect an object in a certain frame, it can fill in the result based on that surrounding information. The model we developed is a bidirectional object tracking model that references not only the previous frame but also the next n frames. When we compared the detection performance of frame-by-frame inference, one-way object tracking, and bidirectional object tracking, our bidirectional object tracking model performed best.
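To make the idea above concrete, here is a minimal sketch of how a forward and a backward tracking pass might be merged so that a detection missed in one direction can be recovered from the other. This is only an illustration of the concept, not DataHunt's actual model: the data layout, the assumption that track IDs match between the two passes, and the merge_bidirectional helper are all hypothetical.

```python
# Minimal sketch of bidirectional merging (illustrative only, not DataHunt's code).
# Assumes each pass produces, per frame, a mapping of track ID -> bounding box,
# and that the same object carries the same ID in both passes (a simplification).

def merge_bidirectional(forward_tracks, backward_tracks):
    """Fill gaps in the forward pass with boxes recovered by the backward pass.

    Both arguments map frame_index -> {track_id: (x1, y1, x2, y2)}.
    """
    merged = {}
    frames = sorted(set(forward_tracks) | set(backward_tracks))
    for f in frames:
        boxes = dict(forward_tracks.get(f, {}))
        # If the forward tracker lost an object here, borrow the box produced by
        # the backward pass, which has already "seen" the following frames.
        for track_id, box in backward_tracks.get(f, {}).items():
            boxes.setdefault(track_id, box)
        merged[f] = boxes
    return merged


# Toy example: the forward pass misses the face in frame 1, the backward pass recovers it.
forward = {0: {"face_1": (10, 10, 50, 50)}, 1: {}}
backward = {0: {"face_1": (10, 10, 50, 50)}, 1: {"face_1": (12, 11, 52, 51)}}
print(merge_bidirectional(forward, backward))
```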
Our process differs slightly from case to case, but basically, we first make a basic plan based on the kind of technology and data required for the project and its purpose. Then we search for data and models that suit that purpose and run preliminary experiments. Based on the baseline results, we assess feasibility and then build our own model, exchanging ideas and opinions with team members as full-scale development proceeds. It's a cycle of exploration, communication, research, and evaluation.
A. I mentioned that I've recently been working on object tracking for de-identification, and in some cases the tracking was not smooth because different people were identified as similar objects. With simple image similarity comparisons, it was not easy to distinguish them because the detailed characteristics of the faces were not reflected, so we applied face recognition to tell them apart, and we got much better results than with basic image similarity.
In addition, the model comes with a default threshold, a cut-off that says, "this is a similar person." That default may be optimal in general, but not necessarily for our study, so we experimented with different parameter values to see which worked best on our data. Also, when an object moves quickly in a video, the captured frames are often blurred (motion blur). We experimented with deblurring the images to correct for this before comparing image similarity, and we found that this gave more accurate results than comparing similarity without deblurring.
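As a rough illustration of the comparison described above, the sketch below combines an optional deblurring step, a face-embedding similarity check, and a tunable threshold. The embed_face and deblur functions are placeholders for whatever embedding model and deblurring method are actually used, and the threshold values are made up; none of this is DataHunt's production code.

```python
# Illustrative sketch only; embed_face, deblur, and the numbers are placeholders.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_person(face_a, face_b, embed_face, deblur=None, threshold=0.6):
    """Decide whether two face crops belong to the same person.

    The threshold is deliberately a parameter: the default cut-off that comes
    with a model is not necessarily optimal for a given dataset, so it is tuned.
    """
    if deblur is not None:  # optional motion-blur correction before embedding
        face_a, face_b = deblur(face_a), deblur(face_b)
    return cosine_similarity(embed_face(face_a), embed_face(face_b)) >= threshold

def pick_threshold(pairs, labels, embed_face, candidates=(0.4, 0.5, 0.6, 0.7)):
    """Sweep candidate thresholds on labelled face pairs and keep the most accurate one."""
    def accuracy(t):
        return sum(same_person(a, b, embed_face, threshold=t) == y
                   for (a, b), y in zip(pairs, labels))
    return max(candidates, key=accuracy)
```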
As you go through research like this, the performance of the model itself is important, but there may be other ideas or techniques that can complement it. I'm always trying to improve our technology by sharing thoughts with my teammates and applying new ideas, and it's a great feeling of accomplishment.
A. Right now, all of our AI engineers are working on different projects, but next year, all of our researchers will be working on one project to prepare our model service.
In order to prepare for this goal, many employees and researchers have been working on AI-related ideation, and we are now making a plan by reviewing the ideas generated in that process from various angles, such as marketability, feasibility, and the underlying technology. If the plan is well thought out, I think we will do well, because I believe in our researchers' abilities.
A. One of the things I noticed when I first joined DataHunt was that the team itself was very cohesive, and I got the impression that everyone took responsibility for their own work and did it well.
When I was in grad school and in other organizations, as a team got bigger there was always someone who was struggling or falling behind, but so far I think we're all in sync and in good spirits, so I think we'll get through the tough end of the year just fine.
The office space itself is bigger than before, and we have a conference room. Oh, and we also have a personal space where I can work alone when I want to focus on my work. I'm much more productive when I'm in my own space on days when I'm feeling particularly distracted. I can't wait to show it off sometime.
A. One of the things I realized when I started working at Datahunt, and this may seem obvious, is that all the departments involved in a project, including researchers, hold meetings together.
In my previous job, I also did AI research, but even if I was the lead on a project, I didn't involve researchers in every meeting. They would ask us for consultation or advice on technology, but that was about it. Since we were collaborating with external organizations, most of the time, people who weren't involved in AI technology would hold meetings first, and then communicate their conclusions to us.
But at DataHunt, meetings bring together the CEO, the planners, the operations team, and the AI engineers from the start, so we can be frank about where we stand in the project process, and we can give technical advice to non-developers and discuss scheduling and so on.
As a researcher, I can fully understand and participate in the flow of the project I'm working on, rather than just doing research. I think this is only possible at DataHunt, which emphasizes communication and collaboration.
A. Someone who understands their field and can keep thinking, questioning, and communicating about their own research. Someone who sees their work as something to lead and take responsibility for, and as a way to grow, rather than just as the company's work. I think that kind of person fits in with our current team members. With someone like that, I think we would have good synergy and be able to work on research more actively and with more interest.
A. If you look at ChatGPT, DALL-E 2, or NovelAI, which have been in the news recently, I think generative models are hot. I think this momentum will be maintained for a while, and there will be more demand for multimodal research that combines text and image data. DALL-E 2 and NovelAI also fall into the category of multimodal models.
A. I think the most rewarding moments are when my models are used in the field or when other people are satisfied with the results.
At my previous company, I developed a model to automatically detect dangerous items (smartphones, USB drives, storage devices, etc.) in bags for corporate security. You know how at airports we put suitcases through a machine to check for items prohibited on airplanes? Large corporations and other security-sensitive companies do the same thing at the entrance.
At the time, our USB detection performance was weaker than in other categories, and from what I heard, other companies were struggling with it too. We went through a lot of different models and approaches to try to solve the problem, and we were able to improve the performance along the way. I remember when we demoed it to our actual customers, they were amazed at the results.
It reminds me of when we were working on recommendation systems at DataHunt. For example, when you're trying to recommend an activity to a user, you might use the user's characteristics. If height is more important than gender, it should be weighted more heavily so that it has more influence on the results. To get the desired recommendation results, we need to assign an importance to each data value. The client can set the importance, but we wondered if there was a way to automatically calculate the weights as part of the service. At first, some people thought it would be too difficult, but after much thought, we found a way to make this calculation possible, and we were able to produce a convincing result. I remember the CEO sitting next to me in a meeting and saying in a quiet voice, "Good job." I think it was the first time I'd ever heard him say that.
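The interview doesn't say how the weights were actually computed, but as a toy illustration of the idea, the sketch below scores items with per-feature weights and derives those weights automatically from past feedback using a simple correlation heuristic. The feature names, the data, and the weighting method are assumptions for illustration, not DataHunt's method.

```python
# Toy illustration of feature weighting in a recommender; not DataHunt's actual method.
import numpy as np

def weighted_score(user, item, weights):
    """Score an item for a user; a larger weight gives that feature more influence."""
    return float(np.sum(weights * user * item))

def estimate_weights(features, liked):
    """One simple way to derive weights automatically: weight each feature by how
    strongly it correlates (in absolute value) with past positive feedback."""
    liked = np.asarray(liked, dtype=float)
    corr = np.array([abs(np.corrcoef(features[:, j], liked)[0, 1])
                     for j in range(features.shape[1])])
    corr = np.nan_to_num(corr)            # constant features get weight 0
    total = corr.sum()
    return corr / total if total > 0 else np.full(features.shape[1], 1.0 / features.shape[1])

# Hypothetical columns: [height, gender, age]; if height predicts past likes better,
# it ends up with a larger weight and therefore more influence on the score.
X = np.array([[1.8, 0, 25], [1.6, 1, 30], [1.9, 0, 22], [1.7, 1, 28]], dtype=float)
y = [1, 0, 1, 0]
print(estimate_weights(X, y))
```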
A. There is a lot of research being done on "explainable AI," or XAI. I think it's very important for users to be able to understand the results delivered by an AI and why those results come out the way they do, both so that we can improve performance based on that process and so that users can trust the AI's performance. We are very interested in this area and use it as a reference point for improving our services.
A. I know that everyone on the team is doing their part and being responsible. I know that when other team members are struggling, they're willing to step up and help, and I think that's what's great about our company. We're all awesome. I hope you all have a healthy holiday season.