PT1: for each sample, select one label and remove all the others.
PT2: remove every sample that has multiple labels.
PT3: treat every combination of labels that occurs in the data as a single label, e.g. A&B, A&C, etc.
PT4: (most common) create L datasets, one per label, and learn a binary classifier on each, i.e., is the label there or not.
PT5: duplicate each sample once per label, so that each copy carries only one of its labels.
PT6: read the paper
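A minimal sketch of PT1 to PT5 on a toy dataset, to make the transformations concrete. The sample names, the label names, and the function names (`pt1` ... `pt5`) are made up for illustration. PT1 picks the kept label at random here, which is only one possible selection rule.

```python
import random

# Toy multi-label dataset: (sample id, set of labels). Purely illustrative.
data = [("x1", {"A", "B"}), ("x2", {"A"}), ("x3", {"B", "C"}), ("x4", {"C"})]
labels = sorted({l for _, ys in data for l in ys})  # ['A', 'B', 'C']

def pt1(data, seed=0):
    """Keep one label per sample (chosen at random here), drop the rest."""
    rng = random.Random(seed)
    return [(x, rng.choice(sorted(ys))) for x, ys in data]

def pt2(data):
    """Drop every sample that has more than one label."""
    return [(x, next(iter(ys))) for x, ys in data if len(ys) == 1]

def pt3(data):
    """Turn each label combination into a single new label, e.g. 'A&B'."""
    return [(x, "&".join(sorted(ys))) for x, ys in data]

def pt4(data, labels):
    """One binary dataset per label: 1 if the label is present, else 0."""
    return {l: [(x, int(l in ys)) for x, ys in data] for l in labels}

def pt5(data):
    """Duplicate each sample once per label it carries."""
    return [(x, l) for x, ys in data for l in sorted(ys)]

print(pt3(data))
print(pt4(data, labels)["A"])
print(pt5(data))
```

Each output is an ordinary single-label (or binary) dataset, so any standard classifier can be trained on it afterwards.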
There are other approaches that handle multi-label data within the algorithms themselves; they rely on the ideas of PT3, PT4, PT5, and PT6 implemented inside the algorithm, or on other tricks.
They also introduce label cardinality (the average number of labels per sample) and label density (the cardinality divided by the total number of labels).
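A small sketch of how the two statistics can be computed from a binary indicator matrix (rows are samples, columns are labels). The matrix `Y` and the function names are made up for illustration.

```python
# Binary indicator matrix: Y[i][j] == 1 if sample i has label j.
Y = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

def label_cardinality(Y):
    """Average number of labels per sample."""
    return sum(sum(row) for row in Y) / len(Y)

def label_density(Y):
    """Cardinality normalised by the total number of labels."""
    return label_cardinality(Y) / len(Y[0])

print(label_cardinality(Y))  # 1.5
print(label_density(Y))      # 0.5
```

Two datasets with the same cardinality but different label counts will have different densities, which is why both numbers are worth reporting.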
Efficient net, part 2 - EfficientNet starts from a baseline network found by neural architecture search; a novel compound scaling method is then applied to build progressively larger networks, which achieve state-of-the-art accuracy on multiclass classification tasks. Compound scaling means increasing all three network dimensions (depth, width, and input resolution) together, using a fixed set of scaling ratios, instead of scaling just one of them.
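A rough sketch of the compound scaling rule, assuming the form given in the EfficientNet paper: depth, width, and resolution are multiplied by alpha^phi, beta^phi, and gamma^phi, with alpha * beta^2 * gamma^2 kept close to 2 so that FLOPS roughly double for each step of phi. The coefficient values below are the ones reported for the B0 baseline as far as I recall; the function name is made up, and the released B1 to B7 models use rounded values rather than these exact multipliers.

```python
# Assumed coefficients (reported in the paper for the B0 baseline).
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution, flops) multipliers for a given phi."""
    depth = ALPHA ** phi          # more layers
    width = BETA ** phi           # more channels per layer
    resolution = GAMMA ** phi     # larger input images
    # FLOPS grow linearly with depth, quadratically with width and resolution.
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPS x{f:.2f}")
```

The single knob phi is the point of the method: one number controls the model size, and the ratios between the three dimensions stay fixed.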
Multi label confusion matrices with sklearn
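A minimal example using `sklearn.metrics.multilabel_confusion_matrix`, which returns one 2x2 matrix per label in the layout `[[TN, FP], [FN, TP]]`. The `y_true` and `y_pred` arrays are toy data made up for illustration.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# Rows are samples, columns are labels (binary indicator format).
y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0],
                   [0, 0, 1]])
y_pred = np.array([[1, 0, 0],
                   [0, 1, 1],
                   [1, 0, 0],
                   [0, 0, 1]])

# Shape is (n_labels, 2, 2); mcm[k] is the confusion matrix for label k.
mcm = multilabel_confusion_matrix(y_true, y_pred)

for k, m in enumerate(mcm):
    tn, fp, fn, tp = m.ravel()
    print(f"label {k}: TN={tn} FP={fp} FN={fn} TP={tp}")
```

This is the PT4 view of evaluation: each label is scored as its own binary problem, so per-label precision and recall can be read directly from each matrix.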