Anirban Majumder
I am an Applied Scientist II at Amazon Science in Tempe, Arizona, USA. I completed my B.Tech in Computer Science and Engineering at Netaji Subhash Engineering College, Kolkata, West Bengal, India, and my Master of Science (MS) in Business Analytics and Project Management at the University of Connecticut School of Business, Hartford, Connecticut, USA.
Before joining Amazon Science, I worked as a Data Scientist at Google in California, USA, and at Verizon in New Jersey, USA. My earlier experience spans Business Intelligence, Data Analytics, Big Data, Engineering, Product, Project Management and Leadership at top consulting companies such as Accenture, Capgemini, Cognizant Technology Solutions (CTS) and Tata Consultancy Services (TCS).
Email / LinkedIn / GitHub / Twitter
Research
My research focuses on solving real-world Data Science problems using Artificial Intelligence (AI) and Machine Learning (ML) methods. My current work centers on Generative AI (Gen AI) and Large Language Models (LLMs).
Applied Scientist II
March 2021 - Present
I work as an Applied Scientist on the Bad Actor Disincentives (BAD) Science team, which prevents fraud and abuse in Amazon stores. My work includes models that detect risky sellers using credit card risk scores and signals from compromised, stolen and virtual credit cards, as well as ownership transfer abuse detection and the identification of risky relations.
Senior Data Scientist
September 2018 - February 2021
I worked on the Small and Medium Business (SMB) Marketing Analytics team, developing solutions to profile customers and to measure the impact of events on Google Ads and YouTube revenue and product adoption using experimental design and causal inference techniques.
Data Scientist
September 2016 - September 2018
I worked on the Voice of Customer (VoC) team, using AI/ML techniques to identify digital friction points in the vzw.com ecommerce channel (2.5B+ transactions per year, 146M+ subscribers and 175M+ unique site visitors) in order to improve sales and the digital channel mix.
Associate Manager
June 2013 - August 2015
I led a team of 15+ developers in the data factory and implemented an end-to-end project pipeline using technology accelerators and advanced analytical techniques.
Senior Consultant
January 2012 - June 2013
I engineered a semantic layer and supporting analytics for HR, Inventory and Supply Chain operations, analyzed enterprise-level data, and devised analytical APIs to report revenue, product category, market segments and sales.
Senior Associate
March 2011 - December 2011
I worked with the data architecture team to prototype ETL frameworks (scheduling, Change Data Capture (CDC) mechanism, target data model, data latency, and operations) and took part in test plan formulation, unit testing, performance testing, regression testing, acceptance testing and data analysis.
IT Analyst
September 2005 - September 2007
I implemented and tested 20+ highly complex and efficient global data warehouse processes, including process automation that reduced manual intervention by nearly 80%.
October 2007 - April 2009
I managed a global data warehouse of roughly 5-10 TB covering four regions (North America, South America, Europe and Australia) for Manufacturing and Supply Chain Management operations.
May 2009 - April 2010
I enhanced and maintained 100+ business reports providing SMART (Strategic Marketing and Research Techniques) solutions for 500+ end users in critical business areas such as Sales, Purchasing, Receivables, Item Management, Inventory Management, Forex Gain Loss, Hedge Funds, GL Balances, Market Segments and Customer Analytics.
May 2010 - August 2010
I orchestrated 10+ critical ETL processes and 25+ Business Intelligence (BI) reports for Sales, Marketing, Churn Rate and Market Segment Analytics in Telecom operations.
August 2010 - February 2011
I managed critical ETL processes and mechanisms to deliver high-end decision-making frameworks and analytical solutions.
Mind Map Generation using Text Mining
April 2016
Anirban Majumder, Revant Prashant Balasubramanian, Ashwin Ramanathan
Email is a fast, cheap and effective means of communication, and it has become very popular over the past few years. People send and receive thousands of messages per day while communicating with friends, family, relatives, colleagues and other important people in their lives. They even share important documents and information by email. With so many emails in the inbox, email management has become a tough and challenging job. People often overlook the most important emails, are unable to analyze the connection between two related emails, and find it difficult to scroll through long email chains. These problems motivate us to explore a way to create a quick, easy and concise mind map of the most important and most recent emails for end users, without them even logging into their email accounts.
Multimodal Information System and Speech Recognition
Anirban Majumder
This paper on Multimodal Information System and Speech Recognition was presented and published at AUTOCHONDRIAC-04, held at the Institute of Technology, Banaras Hindu University in January 2004, and was awarded second prize in this all-India paper presentation competition.
DetoxBench: Benchmarking large language models for multitask fraud & abuse detection
Joymallya Chakraborty, Wei Xia, Anirban Majumder, Dan Ma, Walid Chaabene, Naveed Janvekar
Large language models (LLMs) have demonstrated remarkable capabilities in natural language processing tasks, but their practical application in high-stakes domains, such as fraud and abuse detection, remains an area that requires further exploration. Existing applications often focus narrowly on specific tasks like toxicity or hate speech detection. In this paper, we present a comprehensive benchmark suite designed to assess the performance of LLMs in identifying and mitigating fraudulent and abusive language across various real-world scenarios. Our benchmark encompasses a diverse set of tasks such as detecting spam emails, misogynistic language, etc. We evaluated several state-of-the-art LLMs (Anthropic, Mistral AI, AI21 family, etc.) to provide a comprehensive assessment of their capabilities in this critical domain.
DetoxBSM: LLM to Detect Abusive Language on Amazon
Wei Xia, Anirban Majumder, Joymallya Chakraborty
The Buyer Seller Messaging (BSM) system on Amazon is designed to facilitate communication between customers and sellers. However, it sometimes sees undesired abusive language exchanged among users, including hate speech, insults, and other toxic content. This abusive content can lead to a negative customer experience and reputational risk for Amazon. Detecting abusive language is challenging because such text is written differently from traditional text. It may involve explicit mentions of abusive words, obfuscated words and typographical errors. In this paper, we propose an LLM-based solution, DetoxBSM, to detect abusive language in the Amazon BSM channel.
Seller Identity Association using Fuzzy String Matching
Joymallya Chakraborty, Anirban Majumder
Fraudsters often create a large number of digital accounts to conduct fraudulent or abusive activities at scale. One of the easiest ways to find associations among these accounts is to look for exact matches between user attributes. Most of the time, however, these attributes are not exactly the same, so we introduce an advanced fuzzy matching model, the Seller Identity Association (SIDA) model, that generates a fuzzy similarity score based on user attributes.
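As a rough illustration of the idea of a fuzzy similarity score over user attributes, the sketch below compares two accounts field by field and averages the results. It is not the SIDA model: the attribute names, the weights, the sample accounts, and the use of Python's standard `difflib.SequenceMatcher` as the string similarity measure are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def attribute_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for two attribute strings."""
    # Normalize case and surrounding whitespace so trivial differences are ignored.
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def account_similarity(acct_a: dict, acct_b: dict, weights: dict) -> float:
    """Weighted average of per-attribute similarities.

    Only attributes present (non-empty) in both accounts contribute.
    Returns 0.0 when no attribute can be compared.
    """
    total = 0.0
    weight_sum = 0.0
    for field, weight in weights.items():
        if acct_a.get(field) and acct_b.get(field):
            total += weight * attribute_similarity(acct_a[field], acct_b[field])
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


# Hypothetical accounts and weights, for illustration only.
account_a = {"name": "Jon Smith", "address": "12 Main Street, Tempe"}
account_b = {"name": "John Smith", "address": "12 Main St, Tempe"}
weights = {"name": 0.5, "address": 0.5}

score = account_similarity(account_a, account_b, weights)
print(f"fuzzy similarity: {score:.3f}")
```

An exact-match rule would treat these two accounts as unrelated, while the fuzzy score stays high because the attributes differ only by small edits. In practice the threshold for calling two accounts associated would have to be tuned on labeled data.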
Streamlining Bad Actor Detection: A Unified Model Solution
Mojtaba Khanzadeh, Hector Flores, Diptendra Bagchi, George Pu, Anirban Majumder
Historically, Bad Actor Disincentives (BAD) has relied on the modus operandi (MO) approach to identify bad actors. These methods address risk at various stages of a seller's life cycle, leading to delays in the process. The ultimate aim of BAD is to proactively identify risky sellers and expedite the detection of bad actors. To achieve this objective, we propose an extensible data processing and modeling framework that can integrate various types of data and modalities to detect high-risk sellers. In addition, ensemble learning was employed to minimize false negatives, ensuring that fewer bad actors go undetected.