2/ First, we are incredibly grateful to the authors of the SAYCam paper and dataset (@mcxfrank, @andyperfors, and others not on twitter), which made our work possible and which we were not involved in collecting. SAYCam provides an unprecedented look at a child's egocentric,…
— Wai Keen Vong (@wkvong) February 1, 2024
4/ To test this, what better than to train a neural network, not on enormous amounts of data from the web, but only on the input that a single child receives? What would it learn then, if anything? pic.twitter.com/bQ9aVbXUlB
— Wai Keen Vong (@wkvong) February 1, 2024
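The model in question, CVCL, learns from paired video frames and transcribed child-directed utterances using a contrastive objective. As a rough illustration only (not the authors' code), here is a minimal CLIP-style contrastive loss in PyTorch; the embedding size, temperature, and the random tensors standing in for encoder outputs are placeholder assumptions.

```python
# Minimal sketch (not the authors' code) of a CLIP-style contrastive objective,
# the general recipe behind CVCL: embed each video frame and its co-occurring
# transcribed utterance, pull matching pairs together, push mismatched pairs apart.
# Embedding size and temperature below are illustrative assumptions.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss over a batch of (frame, utterance) pairs.

    image_emb, text_emb: [batch, dim] embeddings from separate encoders.
    """
    # Normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Similarity of every frame with every utterance in the batch.
    logits = image_emb @ text_emb.t() / temperature  # [batch, batch]
    targets = torch.arange(logits.size(0), device=logits.device)

    # Matching pairs sit on the diagonal; symmetrize over both directions.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2

# Toy usage: random embeddings standing in for encoder outputs.
frames = torch.randn(32, 512)
utterances = torch.randn(32, 512)
print(contrastive_loss(frames, utterances))
```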
6/ Results: Even with limited data, we found that the model can acquire word-referent mappings from merely tens to hundreds of examples, generalize zero-shot to new visual datasets, and achieve multi-modal alignment. Again, genuine language learning is possible from a child's… pic.twitter.com/FCHfZCqftr
— Wai Keen Vong (@wkvong) February 1, 2024
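The word-referent evaluations described above can be thought of as similarity lookups in the learned joint embedding space: embed a word and several candidate images, then choose the image whose embedding lies closest to the word's. The sketch below assumes this simple forced-choice setup; the encoders and embeddings are placeholders, not CVCL's.

```python
# Hedged sketch of zero-shot referent selection in a joint embedding space:
# given a word embedding and several candidate image embeddings, pick the
# candidate with the highest cosine similarity. Placeholder tensors stand in
# for real encoder outputs.
import torch
import torch.nn.functional as F

def pick_referent(word_emb, candidate_embs):
    """Return the index of the candidate image most similar to the word.

    word_emb: [dim] embedding of a single word.
    candidate_embs: [n_candidates, dim] embeddings of candidate images.
    """
    word_emb = F.normalize(word_emb, dim=-1)
    candidate_embs = F.normalize(candidate_embs, dim=-1)
    sims = candidate_embs @ word_emb  # cosine similarities, [n_candidates]
    return sims.argmax().item()

# Toy usage: a 4-alternative trial with random stand-in embeddings.
word = torch.randn(512)
candidates = torch.randn(4, 512)
print("chosen candidate:", pick_referent(word, candidates))
```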
8/ Limitations: A typical 2-year-old's vocabulary and word learning skills are still out of reach for the current version of CVCL. What else is missing? Note that both the modeling and the data are inherently limited compared to children's actual experiences and capabilities.
— Wai Keen Vong (@wkvong) February 1, 2024
- CVCL…