Anirban Majumder

I am an Applied Scientist II at Amazon Science in Tempe, Arizona, USA. I completed my B.Tech in Computer Science and Engineering at Netaji Subhash Engineering College, Kolkata, West Bengal, India, and my Master of Science (MS) in Business Analytics and Project Management at the University of Connecticut School of Business, Hartford, Connecticut, USA.

Before joining Amazon Science, I worked as a Data Scientist at Google in California, USA and at Verizon in New Jersey, USA. My previous experience spans Business Intelligence, Data Analytics, Big Data, Engineering, Product, Project Management and Leadership at top consulting companies such as Accenture, Capgemini, Cognizant Technology Solutions (CTS) and Tata Consultancy Services (TCS).

Email / LinkedIn / GitHub / Twitter / Google Scholar

Research

My research focuses mainly on solving real-world problems in Data Science using Artificial Intelligence (AI) and Machine Learning (ML) methods. Currently, my work focuses on Generative AI (Gen AI) and Large Language Models (LLMs).

Industrial Experience

Applied Scientist II

March 2021 - Present

I work as an Applied Scientist to prevent fraud and abuse in Amazon stores. My work includes models that detect risky sellers using credit card risk scores, compromised or stolen credit cards, and virtual credit cards, as well as ownership transfer abuse detection and identification of risky relations.

Senior Data Scientist

September 2018 - February 2021

I worked on the Small Medium Business (SMB) Marketing Analytics team, developing solutions to profile customers and to measure the impact of events on Google Ads and YouTube revenue and product adoption using experiment design and causal inference techniques.

Data Scientist

September 2016 - September 2018

I worked on the Voice of Customer (VoC) team, using AI/ML techniques to identify digital friction points in the vzw.com ecommerce channel (2.5B+ transactions per year, 146M+ subscribers and 175M+ unique site visitors) in order to improve sales and the digital channel mix.

Associate Manager

June 2013 - August 2015

I led a team of 15+ developers in the data factory and implemented an end-to-end project pipeline using technology accelerators and advanced analytical techniques.

Senior Consultant

January 2012 - June 2013

I engineered the semantic layer and supported analytics for HR, Inventory and Supply Chain operations, analyzed enterprise-level data, and devised analytical APIs to report revenue, product category, market segments and sales.

Senior Associate

March 2011 - December 2011

I worked with the data architecture team to prototype ETL frameworks (scheduling, Change Data Capture (CDC) mechanism, target data model, data latency, and operations) and was involved in test plan formulation, unit testing, performance testing, regression testing, acceptance testing and data analysis.

IT Analyst

September 2005 - September 2007

I implemented and tested 20+ highly complex and efficient global data warehouse processes, including process automation that reduced manual intervention by nearly 80%.

October 2007 - April 2009

I managed a global data warehouse of ~5-10 TB covering 4 global regions (North America, South America, Europe and Australia) in Manufacturing and Supply Chain Management operations.

May 2009 - April 2010

I enhanced and maintained 100+ business reports providing SMART (Strategic Marketing and Research Techniques) solutions for 500+ end users in critical business areas like Sales, Purchasing, Receivables, Item Management, Inventory Management, Forex Gain Loss, Hedge Funds, GL Balances, Market Segments and Customer Analytics.

May 2010 - August 2010

I orchestrated 10+ critical ETL processes and 25+ Business Intelligence (BI) reports for Sales, Marketing, Churn Rate and Market Segment Analytics in Telecom operations.

August 2010 - February 2011

I managed critical ETL processes and mechanisms to deliver high-end decision-making frameworks and analytical solutions.

Publications

Mind Map Generation using Text Mining April 2016

Anirban Majumder, Revant Prashant Balasubramanian, Ashwin Ramanathan

Email is a fast, cheap and effective way of communicating, and it has become very popular over the past few years. People send and receive thousands of messages per day while communicating with their friends, family, relatives, colleagues and other important persons in their lives. They even share important documents and information via email. With so many emails in their inboxes, email management has become a tough and challenging job. People often overlook the most important emails, are unable to analyze the connection between two related emails, and find it difficult to scroll through long email chains. These problems motivate us to explore an option to create a quick, easy and concise mind map of the most important and most recent emails for end users, without their even logging into their email accounts.

Multimodal Information System and Speech Recognition

Anirban Majumder

This paper on Multimodal Information System and Speech Recognition was presented and published at AUTOCHONDRIAC-04, held at the Institute of Technology, Banaras Hindu University, in January 2004, and was awarded second prize in this all-India paper presentation competition.

Research Works

Intelligent AI Agents for Fraud and Abuse Detection

Anirban Majumder

Fraud and abuse in financial, healthcare, and digital systems are growing concerns that traditional rule-based methods struggle to address. This paper presents intelligent AI agents that use machine learning, NLP, and behavioral analytics to detect anomalies and identify suspicious patterns in real time. By combining supervised and unsupervised models, the system improves detection accuracy and reduces false positives.

Rise & Impact of AI Agents in Digital Landscape

Anirban Majumder

AI agents are revolutionizing industries by automating tasks, enhancing decision-making, and enabling adaptive learning through machine learning and natural language processing. While they drive efficiency and innovation across sectors like healthcare, finance, and education, they also raise ethical concerns, including bias, security risks, and the spread of misinformation. As AI-powered ecosystems evolve, a balance between technological advancements and regulatory oversight is crucial to ensuring these intelligent agents augment human potential while mitigating risks.

DetoxBench: Benchmarking large language models for multitask fraud & abuse detection

Joymallya Chakraborty, Wei Xia, Anirban Majumder, Dan Ma, Walid Chaabene, Naveed Janvekar

Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their practical application in high-stakes domains, such as fraud and abuse detection, remains an area that requires further exploration. Existing applications often focus narrowly on specific tasks like toxicity or hate speech detection. In this paper, we present a comprehensive benchmark suite designed to assess the performance of LLMs in identifying and mitigating fraudulent and abusive language across various real-world scenarios. Our benchmark encompasses a diverse set of tasks such as detecting spam emails, misogynistic language, etc. We evaluated several state-of-the-art LLMs (Anthropic, Mistral AI, AI21 family, etc.) to provide a comprehensive assessment of their capabilities in this critical domain.

DetoxBSM: LLM to Detect Abusive Language on Amazon

Wei Xia, Anirban Majumder, Joymallya Chakraborty

The Buyer Seller Messaging (BSM) system on Amazon is designed to facilitate communication between customers and sellers. However, it sometimes encounters undesired abusive language exchanges among users, including hate speech, insults, and other toxic content. This abusive content can lead to a negative customer experience and reputational risk for Amazon. Detecting abusive language is challenging because such text is written differently from traditional text. It may involve explicit mentions of abusive words, obfuscated words and typographical errors. In this paper, we propose an LLM-based solution, DetoxBSM, to detect abusive language in the Amazon BSM channel.

Seller Identity Association using Fuzzy String Matching

Joymallya Chakraborty, Anirban Majumder

Fraudsters often create a large number of digital accounts to conduct fraud or abusive activities at scale. One of the easiest ways to find associations among these accounts is to look for exact matches across user attributes. Most of the time, however, these attributes are not exactly the same, so we introduce an advanced fuzzy matching Seller Identity Association (SIDA) model that generates a fuzzy similarity score based on user attributes.

Streamlining Bad Actor Detection: A Unified Model Solution

Mojtaba Khanzadeh, Hector Flores, Diptendra Bagchi, George Pu, Anirban Majumder

Historically, Bad Actor Disincentives (BAD) has relied on the modus operandi (MO) approach to identify bad actors. MO-based methods address risk at various stages of a seller's life cycle, leading to delays in the process. The ultimate aim of BAD is to proactively identify risky sellers and expedite the detection of bad actors. To achieve this objective, we propose an extensible data processing and modeling framework that can integrate various types of data and modalities to detect high-risk sellers. In addition, ensemble learning was employed to minimize false negatives, ensuring that fewer bad actors went undetected.

Conferences and Events

ACM-ASU AI Workshop & Fireside Chat 2025, Tempe, Arizona

Amazon Guest Speaker Event 2024, Tempe, Arizona

Amazon Machine Learning Conference (AMLC) 2024, Seattle, Washington

Amazon Machine Learning Conference (AMLC) 2023, Seattle, Washington

Amazon Machine Learning Conference (AMLC) 2022, Dallas, Texas

Search Marketing Expo (SMX) West 2019, San Jose, California

Professional Memberships

Institute of Electrical and Electronics Engineers (IEEE) Senior Member

Data Science Association (DSA)

Data Science Central (DSC)

Society of Automotive Engineers (SAE)

Last updated: 03/22/2023