A brief reading of the Direct Preference Optimization (DPO) method for alignment fine-tuning of LLMs.
Fourth-year Ph.D. student
An introduction to the community detection problem from a matrix factorization perspective.
I wanted software to manage and organize my collections of papers my own way.
Mendeley is closed source, so it is out of the picture. Zotero is pretty nice, open source, and extensible through plugins. However, as a Vim and Ranger user, I crave a purely keyboard-driven interface. I could not find anything like that, so I wrote one.
Readings on the trendy LLMs. I collected some papers here myself; the list is still being updated.
These are a few papers I found interesting, either for the work itself or for the concepts/techniques it uses, even if those concepts/techniques are old.
Many algorithms in RL are presented in a very intuitive way but feel a bit heuristic. While re-reading Reinforcement Learning in an attempt to get rid of that heuristic feeling, I tried to digest it from an optimization perspective. And, well, I realized I couldn't connect my understanding of optimization to any of the algorithms presented in RL.
So this is an attempt to make things more concrete from a somewhat first-principles view.
The note is currently quite disorganized.