We use a variety of methods to label wallets, including heuristics and algorithms, smart-contract parsing and analysis, investigations and research by our team, user submissions, and more.
Crucially, we also rely on the interplay between all of these sources. More than 99% of our labels are algorithmically inferred. We aim for extremely high precision, meaning we'd rather leave an address unlabeled than label it incorrectly.
Adding wallet labels has a very strong network effect. If we know that wallet X is of type A and wallet Y is of type B, and wallet Z interacts with X and Y in a certain way, we can sometimes infer that Z is of type C. As a result, every wallet label we add to our platform can help us infer even more wallet labels.
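The inference described above can be sketched in a few lines of Python. This is only an illustration of the idea, not our production logic: the Rule type, the propagate_labels function, the wallet names, and the two example rules are all hypothetical, and real heuristics take far more context into account than "sends to" and "receives from".

```python
from typing import NamedTuple


class Rule(NamedTuple):
    # Hypothetical rule shape: a wallet that sends to a wallet labeled
    # `sends_to_label` and receives from a wallet labeled
    # `receives_from_label` is inferred to be `inferred_label`.
    sends_to_label: str
    receives_from_label: str
    inferred_label: str


def propagate_labels(labels, transfers, rules):
    """Apply the rules repeatedly until no new label can be inferred."""
    labels = dict(labels)  # copy so the caller's dict is left untouched
    wallets = sorted({w for transfer in transfers for w in transfer})
    changed = True
    while changed:
        changed = False
        for wallet in wallets:
            if wallet in labels:
                continue  # never overwrite an existing label
            sent_to = {labels[dst] for src, dst in transfers
                       if src == wallet and dst in labels}
            received_from = {labels[src] for src, dst in transfers
                             if dst == wallet and src in labels}
            matches = {r.inferred_label for r in rules
                       if r.sends_to_label in sent_to
                       and r.receives_from_label in received_from}
            # Favor precision: label only when exactly one rule outcome fits.
            if len(matches) == 1:
                labels[wallet] = matches.pop()
                changed = True
    return labels


# Illustrative data: (sender, receiver) pairs and two made-up rules.
known = {"X": "A", "Y": "B"}
transfers = [("Z", "X"), ("Y", "Z"), ("W", "Z"), ("X", "W")]
rules = [Rule("A", "B", "C"), Rule("C", "A", "D")]

result = propagate_labels(known, transfers, rules)
```

In this toy data, Z can be labeled C from the two known labels, and only once Z is labeled does the second rule have enough information to label W. That second step is the network effect: a newly inferred label becomes input for further inferences. Wallets that match no rule, or match conflicting rules, stay unlabeled.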
For the on-chain data itself, we rely heavily on the open-source project Ethereum ETL. The main contributors to Ethereum ETL are members of the Nansen core team.